Swallowing the Elephant (part 6): Fool me once…

I’ve been long overdue to update the pbrt version of Disney’s amazing Moana island scene to account for the changes in pbrt-v4’s scene description format; I finally got around to it over the past few days. Surprise, surprise, getting it rendering again wasn’t all smooth sailing, but Things Were Learned and here we are with another few blog posts about the experience. (For context, it might be worthwhile to read the earlier posts on rendering the Moana island in pbrt if you have not already.)

Converting to pbrt-v4

The latest version of pbrt provides an --upgrade flag that does a reasonably good job of automatically updating scene description files from the previous version of pbrt to work with pbrt-v4. For most scenes, --upgrade does all that is needed. For others, a few manual fixes may be necessary, though pbrt tries to give guidance along the lines of “this was ambiguous, so over to you.” For this monster of a scene, a few additional hours of manual work with sed and emacs macros were necessary to finish the job.

The first renderings of the converted scene weren’t exactly awesome…

Disney's Moana island scene rendered with pbrt-v4, with a disastrous conversion of the materials and some issues with incorrect transformations (note the neon yellow leaves that are not aligned with the tree trunk and branches to their left).

All of that manual work was due to self-inflicted wounds:

In pbrt-v3, one could specify a material and then subsequently override its parameters with the parameters that are provided with a shape that uses that material. We removed this functionality to simplify processing the scene description thinking that it was rarely used. It turns out that this capability was used extensively in Disney’s conversion of the scene to pbrt’s format.
One could redefine named textures in pbrt-v3, while pbrt-v4 prohibits this; again, we didn’t think this was widely used and again, guess what, it was used extensively in the pbrt-v3 version of the Moana island scene.
pbrt-v4 no longer supports the Disney BSDF, which was used for all of the objects in this scene, so it was necessary to manually map all of the uses of it to the most similar BSDFs that are provided in pbrt-v4.¹

A reorganization of the parsing code

Before looking at pbrt’s performance and memory use, it’s worth discussing an important change to pbrt’s implementation since my earlier posts: the parts of the system responsible for parsing scene description files and converting them into objects suitable for rendering have changed substantially in pbrt-v4. In earlier versions of the system, parsing and scene object creation were intermingled. For example, if the parser saw an image texture definition, it would stop to read the texture from disk and build MIP maps before it continued. If an object instance was defined, then all of the constituent primitives would be created and a BVH built for it before parsing resumed. And so forth…

In pbrt-v4, the parser’s job is more that of deserializing the scene description to a generic intermediate representation. For example, it will record the fact that some texture of type “imagemap” has been defined and that it has a string-valued parameter “filename” with some value associated with it, but that’s it—on to snarfing up more tokens from the scene description. The parser is responsible for initializing an instance of the ParsedScene class; only when parsing is complete is the ParsedScene converted to the optimized scene representation that is used for rendering. The form of ParsedScene is more or less

class ParsedScene {
  public:
    std::vector<ShapeEntity> shapes;
    std::vector<LightEntity> lights;
    std::map<std::string, SceneEntity> namedMaterials;
    // ...
};

where, for example, ShapeEntity records things like the name of the shape (“trianglemesh”, “plymesh”, “sphere”, or whatever), its transformation, the material associated with it, as well as the parameters that were provided with it (e.g., “there’s a float named ‘radius’ with value 10”).
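
To make the two-phase idea concrete, here is a minimal sketch of "record first, build later". It is not pbrt-v4's code: Param, RecordShape, Sphere, and CreateSpheres are hypothetical names, the ShapeEntity and ParsedScene shown are heavily simplified stand-ins, and the default radius of 1 is an assumption made for the example.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a recorded parameter.
struct Param {
    std::string type, name;
    std::vector<double> numbers;
};

// Simplified stand-in: what the parser saw, with no interpretation.
struct ShapeEntity {
    std::string name;          // "sphere", "trianglemesh", ...
    std::string materialName;
    std::vector<Param> params;
};

struct ParsedScene {
    std::vector<ShapeEntity> shapes;
};

// Phase 1: while parsing, only record the shape and its parameters.
void RecordShape(ParsedScene &scene, std::string name, std::string material,
                 std::vector<Param> params) {
    scene.shapes.push_back(
        {std::move(name), std::move(material), std::move(params)});
}

// Hypothetical render-time object.
struct Sphere {
    float radius;
};

// Phase 2: once parsing is complete, interpret the recorded entities.
std::vector<Sphere> CreateSpheres(const ParsedScene &scene) {
    std::vector<Sphere> spheres;
    for (const ShapeEntity &s : scene.shapes) {
        if (s.name != "sphere")
            continue;
        float radius = 1.f;  // assumed default for this sketch
        for (const Param &p : s.params) {
            if (p.type == "float" && p.name == "radius") {
                assert(p.numbers.size() == 1);  // a radius is a single value
                radius = static_cast<float>(p.numbers[0]);
            }
        }
        spheres.push_back({radius});
    }
    return spheres;
}
```

Because phase 2 only reads the recorded entities, a different backend (a GPU one, say) could walk the same ParsedScene and build its own geometric representation.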

The initial motivation for this restructuring was the addition of GPU rendering in pbrt-v4; while things like lights, materials, and most textures are represented by the same objects for both CPU and GPU, the respective geometric representations of the scene differ substantially. Thus, it worked well to structure the system so that the parsing code generates an intermediate representation that can then be transformed into a specific representation used for rendering.

As we will see shortly, this rewrite caused some trouble, though by the end, it redeems itself.

Fail fast

With the scene converted, all was not well. Where we left off, pbrt-next, the in-progress version of pbrt-v4 from 2.5 years ago, used 41 GB of RAM when rendering the scene, with an additional spike of about 10 GB while the top-level BVH was built. One might hope that on my current system with 64 GB of RAM it would render nicely out of the box.

One might hope…

Rather, pbrt-v4 filled up available RAM and the puny 2 GB of swap before it was killed when memory ran out. I bumped up the size of the swap file to 64 GB just to see if that would do it, but still had no luck. Time to turn to my old friend massif, which tracks memory allocations over the course of a program’s execution. I tried rendering a pared down version of the scene with massif to see where all of the memory was going.

There wasn’t much nuance in what massif had to report; by far the greatest memory consumer was instances of the InstanceSceneEntity structure. The parser creates one for each object instance in the scene; it basically wraps up a transformation matrix and the name of the object being instantiated. The transformation may be fixed or it may be specified by a pair of transformations that are interpolated. Therefore, it stores both a Transform * and an AnimatedTransform.

Here are the important parts of its definition:

struct InstanceSceneEntity : public SceneEntity {
    // ...
    AnimatedTransform renderFromInstanceAnim;
    const Transform *renderFromInstance;
};

The reader with a good memory may now remember that AnimatedTransform was a troublemaker the first time I dug into pbrt-v3’s use of memory with the Moana island scene. (If one has forgotten, see here.) AnimatedTransform is not a small structure; in pbrt-v4, each one is 696 bytes. In this case, nothing is animated and the AnimatedTransform is unused.

Clearly I had forgotten this pitfall, since there I go again making the very same mistake, here now with InstanceSceneEntity. For the full Moana island scene, a total of 39,270,497 of them are allocated. At 696 bytes for each AnimatedTransform, that works out to 25.4 GB of unused identity matrices and associated baggage.
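
The arithmetic is easy to check, and it also shows that the 25.4 GB figure corresponds to reading "GB" as 2^30 bytes. The sketch below additionally shows one possible way to avoid paying for an unused animated transform, by keeping it behind a pointer that stays null for static instances. This is an illustration only, not necessarily the fix that went into pbrt-v4; the StandIn types, FatInstance, SlimInstance, and MakeStatic are hypothetical names, and only the 696-byte size and the instance count come from the text above.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Stand-in with the size quoted for AnimatedTransform in pbrt-v4.
struct AnimatedTransformStandIn {
    unsigned char bytes[696];
};
struct TransformStandIn {
    float m[4][4];
};

// Layout in the spirit of the original: every instance pays 696 bytes.
struct FatInstance {
    AnimatedTransformStandIn renderFromInstanceAnim;
    const TransformStandIn *renderFromInstance = nullptr;
};

// Hypothetical alternative: the animated transform is allocated only
// when an instance is actually animated; otherwise the pointer is null.
struct SlimInstance {
    std::unique_ptr<AnimatedTransformStandIn> renderFromInstanceAnim;
    const TransformStandIn *renderFromInstance = nullptr;
};

static_assert(sizeof(AnimatedTransformStandIn) == 696, "size from the text");
static_assert(sizeof(SlimInstance) < sizeof(FatInstance),
              "pointer is smaller than the embedded struct");

// Instance count reported for the full Moana island scene.
constexpr std::uint64_t kInstances = 39270497;

constexpr std::uint64_t WastedBytes(std::uint64_t count,
                                    std::uint64_t bytesEach) {
    return count * bytesEach;
}

constexpr double ToGiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}

// Builds a non-animated instance; no AnimatedTransformStandIn is allocated.
SlimInstance MakeStatic(const TransformStandIn *xform) {
    assert(xform != nullptr);
    SlimInstance inst;
    inst.renderFromInstance = xform;
    return inst;
}
```

Here WastedBytes(kInstances, 696) is 27,332,265,912 bytes, which ToGiB reports as roughly 25.45.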

That was an easy fix and with it, the scene successfully rendered on my system here. Here’s an image for sustenance:

Moana island rendered more successfully with pbrt-v4. This image rendered in 46m37s at 1920x804 resolution with 2048 samples per pixel on a 32-core AMD 3970X CPU.

However, pbrt still used about 66 GB of memory during rendering, with a peak of 82 GB. Plenty more stinkiness remained.

Department of redundant scene descriptions department

Another run of massif with the full scene was just as unambiguous as the first about where the problem was: 27 GB of vector<double>s had been allocated as part of the ParsedParameter class. ParsedParameter is another part of the new parsing system; it is responsible for recording all of the parameter values provided for things in the scene description file. For example, if you specify "integer indices" [ 0 1 2 ] with a triangle mesh, a ParsedParameter instance records that there was a parameter of “integer” type, that it has the name “indices”, and that those three values were specified. Again, the parser is just recording what it sees, not judging or interpreting.

Here are the relevant parts of its definition:

class ParsedParameter {
  public:
    std::string type, name;
    std::vector<double> numbers;
    std::vector<std::string> strings;
    std::vector<uint8_t> bools;
    // ...
};

Momentarily leaving aside the use of double precision for numbers, it only took a few minutes of thinking to realize that while pbrt-v4 was creating the scene representation to use for rendering, it wasn’t freeing up parts of the ParsedScene when it was done with them. Indeed, all of it was still using up memory the whole time rendering proceeded, so there were those 27 GB and then more.

With a few changes to free ParsedScene memory when possible (1) (2) (3), peak memory use drops by 32 GB to 50 GB, with 32 GB in use at the start of rendering.
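
The general pattern of releasing parsed data as soon as it has been consumed can be sketched as follows. This is an illustration under assumptions, not the changes linked above: ParsedShape, RenderShape, and ConvertAndRelease are hypothetical names. The detail worth noting is that std::vector::clear() keeps the allocation, so the sketch swaps with an empty temporary to actually give the memory back.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parsed form: generic and double-precision.
struct ParsedShape {
    std::string name;
    std::vector<double> numbers;
};

// Hypothetical render-time form: compact single-precision values.
struct RenderShape {
    std::string name;
    std::vector<float> values;
};

// Converts each parsed shape and frees its storage immediately, so the
// parsed and converted copies of a shape never coexist for long.
std::vector<RenderShape> ConvertAndRelease(std::vector<ParsedShape> &parsed) {
    const std::size_t count = parsed.size();
    std::vector<RenderShape> out;
    out.reserve(count);
    for (ParsedShape &p : parsed) {
        RenderShape r;
        r.name = std::move(p.name);
        r.values.reserve(p.numbers.size());
        for (double v : p.numbers)
            r.values.push_back(static_cast<float>(v));
        // Swapping with an empty temporary hands the old buffer to the
        // temporary, which frees it at the end of this statement.
        std::vector<double>().swap(p.numbers);
        out.push_back(std::move(r));
    }
    // Release the container of parsed entities itself as well.
    std::vector<ParsedShape>().swap(parsed);
    assert(out.size() == count);
    return out;
}
```

Without the two swap calls the function would produce the same result, but every parsed value would stay resident for as long as the caller kept the parsed vector alive.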

Too much precision, because you never know

Returning to the topic of the use of double precision in ParsedParameter::numbers: I used doubles out of laziness. Although pbrt generally uses 32-bit floating point, double has the nice property that it can exactly represent all 32-bit integers. Thus, the parser could just be simpleminded and store arrays of numbers, without worrying about whether they were floats or integers.
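
The exactness property is easy to demonstrate. On IEEE 754 platforms a double has a 53-bit significand, enough for any 32-bit integer, while a float has only 24 bits and starts losing odd integers just above 2^24. The helper names below are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// These hold on IEEE 754 platforms: 53 significand bits for double,
// 24 for float.
static_assert(std::numeric_limits<double>::digits >= 32,
              "double can hold every 32-bit integer exactly");
static_assert(std::numeric_limits<float>::digits < 32,
              "float cannot hold every 32-bit integer exactly");

// True if v survives being stored as a double and read back.
bool RoundTripsThroughDouble(std::int32_t v) {
    double d = static_cast<double>(v);
    return static_cast<std::int32_t>(d) == v;
}

// True if v survives being stored as a float. The comparison is done in
// double to avoid converting an out-of-range float back to an integer.
bool RoundTripsThroughFloat(std::int32_t v) {
    float f = static_cast<float>(v);
    return static_cast<double>(f) == static_cast<double>(v);
}

// Recovers an integer parameter that was stored as a double.
std::int32_t RecoverInt(double stored) {
    assert(stored >= std::numeric_limits<std::int32_t>::min() &&
           stored <= std::numeric_limits<std::int32_t>::max());
    return static_cast<std::int32_t>(stored);
}
```

For example, 16777217 (2^24 + 1) round-trips through double but not through float, which is why a single array of floats would not have been a safe place to park integer indices.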

I told myself that those vectors would never get very big. I figured that big triangle meshes would usually come in via PLY files, in which case the only use of ParsedParameter is to store a single filename. Thus, I assumed that those arrays would never use an objectionable amount of memory. That assumption was mostly true, but not true enough: some of the trees in the Moana scene are represented by many small independent triangle meshes of a few tens or a hundred or so triangles each. Individually, these don’t make sense to store as PLY files; there would be tens of thousands of them for a single tree. Thus, they are left as text in the scene description. From them, those parameter vectors become large.

With another simple-once-you-get-around-to-doing-it change, we’re down another 4.5 GB to 45.5 GB peak memory use and now 31 GB in use at the start of rendering, 10 GB less than the 41 GB that pbrt-next used. Victory!
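
The change itself is not shown here. One plausible shape for it, sketched under the assumption that a parameter's declared type is known before its values are stored, is to keep 4-byte floats and ints in separate arrays instead of 8-byte doubles. DoubleParameter, TypedParameter, AddNumber, and PayloadBytes are hypothetical names for this illustration and do not describe pbrt-v4's actual interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// In the spirit of the original: every number is stored as a double.
struct DoubleParameter {
    std::string type, name;
    std::vector<double> numbers;

    void AddNumber(double v) { numbers.push_back(v); }
    std::size_t PayloadBytes() const {
        return numbers.size() * sizeof(double);
    }
};

// Hypothetical alternative: route each value by the declared type.
struct TypedParameter {
    std::string type, name;
    std::vector<float> floats;
    std::vector<int> ints;

    void AddNumber(double v) {
        if (type == "integer") {
            assert(std::floor(v) == v);  // integer parameters hold whole numbers
            ints.push_back(static_cast<int>(v));
        } else {
            floats.push_back(static_cast<float>(v));
        }
    }
    std::size_t PayloadBytes() const {
        return floats.size() * sizeof(float) + ints.size() * sizeof(int);
    }
};
```

On platforms where int and float are 4 bytes and double is 8, the typed version halves the payload for both kinds of values, which matters when tens of thousands of small inline meshes each carry their own index and position arrays.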

Wrap-up

It took a few days of digging into regressions, but pbrt-v4 is now even better than where it had been 2.5 years ago, memory-wise. I can’t precisely account for that last 10 GB improvement, but would assume that most of it is due to switching to tagged pointers to eliminate the virtual function pointers in the shape and primitive classes (as considered earlier). The size of those classes has seen some further attention in pbrt-v4, and it seems to have added up in this case.

Here is an accounting of how memory is used now when rendering begins:

Type	Memory
BVH	13.5 GB
Transformations	5.5 GB
Transformation hash table	1 GB
Primitives	2.5 GB
Triangles	1.2 GB
Triangle vertex buffers (P, N, uv, indices)	5.25 GB
Curves	0.6 GB

Next time we’ll dig into runtime performance while parsing the scene, where things start in a better place and go fun places from there.

Note

About dropping the Disney BSDF: while folks at Disney were working on converting the scene to pbrt’s format a few years ago, I added the Disney BSDF (and support for Ptex textures) to pbrt-v3 in order to make pbrt-v3 a more hospitable target. Normally new functionality isn’t added after the book comes out, since the whole idea of the book is to describe the implementation of the renderer, but it was well worth it for this prize of a scene.

For the fourth edition of the book, we have redesigned the set of materials and BSDFs from scratch and have tried to be more physically principled than before. (Among other things, pbrt’s old kitchen sink UberMaterial is gone.) In this context, an artist-friendly BSDF like the Disney one doesn’t fit with the book’s current focus, so we have cut it in the interests of simplifying the system. (Ptex support remains, at least!)