Swallowing the Elephant (part 6): Fool me once…
I’ve been long overdue to update the pbrt version of Disney’s amazing Moana island scene to account for the changes in pbrt-v4’s scene description format; I finally got around to it over the past few days. Surprise, surprise, getting it rendering again wasn’t all smooth sailing, but Things Were Learned and here we are with another few blog posts about the experience. (For context, it might be worthwhile to read the earlier posts on rendering the Moana island in pbrt if you have not already.)
Converting to pbrt-v4
The latest version of pbrt provides an --upgrade flag that does a reasonably good job of automatically updating scene description files from the previous version of pbrt to work with pbrt-v4. For most scenes, --upgrade does all that is needed. For others, a few manual fixes may be necessary, though pbrt tries to give guidance (“this was ambiguous, so over to you,” and the like). For this monster of a scene, a few additional hours of manual work with sed and emacs macros were necessary to finish the job.
The first renderings of the converted scene weren’t exactly awesome…
Disney's Moana island scene rendered with pbrt-v4, with a disastrous conversion of the materials and some issues with incorrect transformations (note the neon yellow leaves that are not aligned with the tree trunk and branches to their left).
All of that manual work was due to self-inflicted wounds:
- In pbrt-v3, one could specify a material and then subsequently override its parameters with the parameters that are provided with a shape that uses that material. We removed this functionality to simplify processing the scene description thinking that it was rarely used. It turns out that this capability was used extensively in Disney’s conversion of the scene to pbrt’s format.
- One could redefine named textures in pbrt-v3, while pbrt-v4 prohibits this; again, we didn’t think this was widely used and again, guess what, it was used extensively in the pbrt-v3 version of the Moana island scene.
- pbrt-v4 no longer supports the Disney BSDF, which was used for all of the objects in this scene; it was necessary to manually map all of the uses of it to the most similar BSDFs that are provided in pbrt-v4. [1]
A reorganization of the parsing code
Before looking at pbrt’s performance and memory use, it’s worth discussing an important change to pbrt’s implementation since my earlier posts: the parts of the system responsible for parsing scene description files and converting them into objects suitable for rendering have changed substantially in pbrt-v4. In earlier versions of the system, parsing and scene object creation were intermingled. For example, if the parser saw an image texture definition, it would stop to read the texture from disk and build MIP maps before it continued. If an object instance was defined, then all of the constituent primitives would be created and a BVH built for it before parsing resumed. And so forth…
In pbrt-v4, the parser’s job is more that of deserializing the scene description to a generic intermediate representation. For example, it will record the fact that some texture of type “imagemap” has been defined and that it has a string-valued parameter “filename” with some value associated with it, but that’s it; then it is on to snarfing up more tokens from the scene description. The parser is responsible for initializing an instance of the ParsedScene class; only when parsing is complete is the ParsedScene converted to the optimized scene representation that is used for rendering. The form of ParsedScene is more or less:
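```cpp
class ParsedScene {
  public:
    // Records of what the parser saw; nothing here is ready to render yet.
    std::vector<ShapeEntity> shapes;
    std::vector<LightEntity> lights;
    // Named materials, keyed by the name given in the scene description.
    std::map<std::string, SceneEntity> namedMaterials;
    // ...
};
```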
where, for example, ShapeEntity records things like the name of the shape (“trianglemesh”, “plymesh”, “sphere”, or whatever), its transformation, the material associated with it, as well as the parameters that were provided with it (e.g., “there’s a float named ‘radius’ with value 10.”)
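To make the record-now, interpret-later split concrete, here is a minimal sketch. The types RecordedParam, RecordedShape, and RecordedScene and the functions recordShape and sphereRadius are hypothetical stand-ins invented for illustration; they are not pbrt-v4's actual classes, which carry much more information.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the parser's intermediate records.
struct RecordedParam {
    std::string type, name;       // e.g. "float", "radius"
    std::vector<double> numbers;  // raw values, not yet interpreted
};

struct RecordedShape {
    std::string shapeType;  // "sphere", "trianglemesh", ...
    std::vector<RecordedParam> params;
};

struct RecordedScene {
    std::vector<RecordedShape> shapes;
};

// Parsing phase: store what was seen and move on; no geometry is built here.
void recordShape(RecordedScene &scene, std::string shapeType,
                 std::vector<RecordedParam> params) {
    scene.shapes.push_back({std::move(shapeType), std::move(params)});
}

// Creation phase: only now is a recorded parameter given a meaning.
// Returns defaultRadius if no float parameter named "radius" was recorded.
double sphereRadius(const RecordedShape &shape, double defaultRadius) {
    for (const RecordedParam &p : shape.params)
        if (p.type == "float" && p.name == "radius" && !p.numbers.empty())
            return p.numbers[0];
    return defaultRadius;
}
```

In this sketch, a call such as recordShape(scene, "sphere", {{"float", "radius", {10.0}}}) does nothing but append a record; sphereRadius is what a later scene-creation pass would call once parsing has finished.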
The initial motivation for this restructuring was the addition of GPU rendering in pbrt-v4; while things like lights, materials, and most textures are represented by the same objects for both CPU and GPU, the respective geometric representations of the scene differ substantially. Thus, it worked well to structure the system so that the parsing code generates an intermediate representation that can then be transformed into a specific representation used for rendering.
As we will see shortly, this rewrite caused some trouble, though by the end, it redeems itself.
Fail fast
With the scene converted, all was not well. Where we left off, pbrt-next, the in-progress version of pbrt-v4 from 2.5 years ago, used 41 GB of RAM when rendering the scene, with an additional spike of about 10 GB while the top-level BVH was built. One might hope that on my current system with 64 GB of RAM it would render nicely out of the box.
One might hope…
Rather, pbrt-v4 filled up available RAM and the puny 2 GB of swap before it was killed when memory ran out. I bumped up the size of the swap file to 64 GB just to see if that would do it, but still had no luck. Time to turn to my old friend massif, which tracks memory allocations over the course of a program’s execution. I tried rendering a pared down version of the scene with massif to see where all of the memory was going.
There wasn’t much nuance in what massif had to report; by far the greatest memory consumer was instances of the InstanceSceneEntity structure. The parser creates one for each object instance in the scene; it basically wraps up a transformation matrix and the name of the object being instantiated. The transformation may be fixed or it may be specified by a pair of transformations that are interpolated. Therefore, it stores both a Transform * and an AnimatedTransform.
Here are the important parts of its definition:
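```cpp
struct InstanceSceneEntity : public SceneEntity {
    // ...
    // Stored inline in every instance, whether or not it is animated.
    AnimatedTransform renderFromInstanceAnim;
    // Used when the instance's transformation is fixed.
    const Transform *renderFromInstance;
};
```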
The reader with a good memory may now remember that AnimatedTransform was a troublemaker the first time I dug into pbrt-v3’s use of memory with the Moana island scene. (If one has forgotten, see the earlier posts in this series.) AnimatedTransform is not a small structure; in pbrt-v4, each one is 696 bytes. In this case, nothing is animated and the AnimatedTransform is unused.
Clearly I had forgotten this pitfall, since there I go again making the very same mistake, here now with InstanceSceneEntity. For the full Moana island scene, a total of 39,270,497 of them are allocated. At 696 bytes for each AnimatedTransform, that works out to 25.4 GB of unused identity matrices and associated baggage.
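The arithmetic can be checked directly, along with how much an inline member costs compared with a pointer that is only filled in when needed. This is a sketch under assumptions: FakeAnimatedTransform, InlineInstance, and IndirectInstance are hypothetical types that only mimic the sizes quoted above, and the pointer-based layout is one possible remedy, not necessarily the change that went into pbrt-v4.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical stand-in with the 696-byte size quoted above.
struct FakeAnimatedTransform {
    unsigned char bytes[696];
};
struct FakeTransform;  // opaque; only ever referenced through a pointer

// Layout like the one above: every instance pays for 696 bytes inline.
struct InlineInstance {
    FakeAnimatedTransform anim;
    const FakeTransform *xform;
};

// One alternative: allocate the animated transform only when it is used.
struct IndirectInstance {
    std::unique_ptr<FakeAnimatedTransform> anim;  // null when not animated
    const FakeTransform *xform = nullptr;
};

// Instance count reported for the full scene.
constexpr std::uint64_t kInstances = 39270497;

constexpr std::uint64_t totalBytes(std::uint64_t bytesPerInstance) {
    return kInstances * bytesPerInstance;
}

// Convert a byte count to binary gigabytes (2^30 bytes).
constexpr double toGiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

Here totalBytes(696) is 27,332,265,912 bytes, which toGiB converts to roughly 25.45, matching the figure above when a gigabyte is taken to be 2^30 bytes.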
That was an easy fix and with it, the scene successfully rendered on my system here. Here’s an image for sustenance:
Moana island rendered more successfully with pbrt-v4. This image rendered in 46m37s at 1920x804 resolution with 2048 samples per pixel on a 32-core AMD 3970X CPU.
However, pbrt still used about 66 GB of memory during rendering, with a peak of 82 GB. Plenty more stinkiness remained.
Department of redundant scene descriptions department
Another run of massif with the full scene was just as unambiguous about where the problem was as the first one was; 27 GB of vector<double>s had been allocated as part of the ParsedParameter class.
ParsedParameter is another part of the new parsing system; it is responsible for recording all of the parameter values provided for things in the scene description file. For example, if you specify "integer indices" [ 0 1 2 ] with a triangle mesh, a ParsedParameter instance records that there was this thing with “integer” type, it has the name “indices”, and those three values were specified. This is again part of the parser just recording what it sees, but not judging or interpreting.
Here are the relevant parts of its definition:
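```cpp
class ParsedParameter {
  public:
    // Declared type and name, e.g. "integer" and "indices".
    std::string type, name;
    // Values as written in the file; only the vector matching the type is filled.
    std::vector<double> numbers;
    std::vector<std::string> strings;
    std::vector<uint8_t> bools;
    // ...
};
```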
Momentarily leaving aside the use of double precision for numbers, it only took a few minutes of thinking to realize that while pbrt-v4 was creating the scene representation to use for rendering, it wasn’t freeing up parts of the ParsedScene when it was done with them. Indeed, all of it was still using up memory the whole time rendering proceeded, so there were those 27 GB and then more.
With a few changes to free ParsedScene memory when possible (1) (2) (3), peak memory use drops by 32 GB to 50 GB, with 32 GB in use at the start of rendering.
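The general pattern of handing back parsed data as soon as it has been consumed might look like the sketch below. RecordedParam and takeFloats are hypothetical names used for illustration; this is not the code from the changes mentioned above. The detail worth noting is that std::vector::clear() only destroys the elements and generally keeps the allocation, so the sketch swaps with an empty temporary to actually return the storage.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a recorded parameter.
struct RecordedParam {
    std::string type, name;
    std::vector<double> numbers;
};

// Convert the recorded values to the 32-bit floats used for rendering,
// then release the parser's copy right away instead of keeping it
// alive for the rest of the run.
std::vector<float> takeFloats(RecordedParam &param) {
    std::vector<float> out(param.numbers.begin(), param.numbers.end());
    // Swapping with an empty temporary frees the old allocation when the
    // temporary is destroyed at the end of this statement.
    std::vector<double>().swap(param.numbers);
    return out;
}
```

After takeFloats returns, param.numbers is empty and no longer owns its old buffer, so only the converted values remain in memory.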
Too much precision, because you never know
Returning to the topic of the use of double precision in ParsedParameter::numbers: I used doubles out of laziness. Although pbrt generally uses 32-bit floating point, double has the nice property that it can exactly represent all 32-bit integers. Thus, the parser could just be simpleminded and store arrays of numbers, without worrying about whether or not they were floats or integers.
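That property is easy to check: a double has a 53-bit significand, enough for any 32-bit integer, while a float has only 24 bits and starts losing odd integers just past 2^24. A small sketch follows; the helper roundTrips is a hypothetical name for illustration.

```cpp
#include <cstdint>
#include <limits>

// A double's significand is wide enough for every 32-bit integer;
// a float's is not.
static_assert(std::numeric_limits<double>::digits >= 32,
              "double holds any 32-bit integer exactly");
static_assert(std::numeric_limits<float>::digits < 32,
              "float cannot hold every 32-bit integer");

// True if value is unchanged after being stored in a T and read back.
// The comparison goes through int64_t so that a float rounded up to
// 2^31 can still be converted back without overflow.
template <typename T>
bool roundTrips(std::int32_t value) {
    return static_cast<std::int64_t>(static_cast<T>(value)) == value;
}
```

For example, roundTrips<double>(16777217) is true, while roundTrips<float>(16777217) is false because 2^24 + 1 rounds to 2^24 in single precision. The convenience is paid for with eight bytes per stored number rather than four.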
I told myself that those vectors would never get very big. I figured that big triangle meshes would usually come in via PLY files, in which case the only use of ParsedParameter is to store a single filename. Thus, I assumed that those arrays would never use an objectionable amount of memory. That assumption was mostly true, but not true enough: some of the trees in the Moana scene are represented by many small independent triangle meshes of a few tens or a hundred or so triangles each. Individually, these don’t make sense to store as PLY files; there would be tens of thousands of them for a single tree. Thus, they are left as text in the scene description. From them, those parameter vectors become large.
With another simple (once you get around to doing it) change, we’re down another 4.5 GB to 45.5 GB peak memory use and now 31 GB in use at the start of rendering, 10 GB less than the 41 GB that pbrt-next used 2.5 years ago. Victory!
Wrap-up
It took a few days of digging into regressions, but pbrt-v4 is now even better than where it had been 2.5 years ago, memory-wise. I can’t precisely account for that last 10 GB improvement, but would assume that most of it is due to switching to tagged pointers to eliminate the virtual function pointers in the shape and primitive classes (as considered earlier). The size of those classes has seen some further attention in pbrt-v4, and it seems to have added up in this case.
Here is an accounting of how memory is used now when rendering begins:
Type | Memory |
---|---|
BVH | 13.5 GB |
Transformations | 5.5 GB |
Transformation hash table | 1 GB |
Primitives | 2.5 GB |
Triangles | 1.2 GB |
Triangle vertex buffers (P, N, uv, indices) | 5.25 GB |
Curves | 0.6 GB |
Next time we’ll dig into runtime performance while parsing the scene, where things start in a better place and go fun places from there.
Notes
1. About dropping the Disney BSDF: while folks at Disney were working on converting the scene to pbrt’s format a few years ago, I added the Disney BSDF (and support for Ptex textures) to pbrt-v3 in order to make pbrt-v3 a more hospitable target. Normally new functionality isn’t added after the book comes out, since the whole idea of the book is to describe the implementation of the renderer, but it was well worth it for this prize of a scene.
For the fourth edition of the book, we have redesigned the set of materials and BSDFs from scratch and have tried to be more physically principled than before. (Among other things, pbrt’s old kitchen sink UberMaterial is gone.) In this context, an artist-friendly BSDF like the Disney one doesn’t fit with the book’s current focus, so we have cut it in the interests of simplifying the system. (Ptex support remains, at least!)