Swallowing the elephant (part 4)

I have a branch of pbrt that I use for trying out new ideas, implementing neat things from papers, and generally exploring things that might end up in a future edition of Physically Based Rendering. Unlike pbrt-v3, which we try to keep as close as possible to the system described in the book, it’s possible to change anything in that branch. Today we’ll look at how a few changes to the system, more radical than those we’ve considered previously in this series, significantly reduce memory usage with Disney’s Moana island scene.

A note on methodology: in the previous three posts, all of the statistics were measured with the WIP version of the scene I was working with before it was released. For this one, we’ll switch to the final version, which is slightly more complex.

When rendering the latest Moana island scene, pbrt-v3 uses 81 GB of RAM to store the scene description. Today’s pbrt-next uses 41 GB, about half as much. A handful of changes totaling a few hundred lines of code was enough to get us there.

Smaller Primitives

Recall that in pbrt a Primitive is a combination of a shape, its material, its emission function (if it’s an area light), and a record of which participating media are inside and outside its surface. In pbrt-v3, GeometricPrimitives store:

    std::shared_ptr<Shape> shape;
    std::shared_ptr<Material> material;
    std::shared_ptr<AreaLight> areaLight;
    MediumInterface mediumInterface;

As discussed earlier, most of the time, areaLight is nullptr, and the MediumInterface holds a pair of nullptrs. Therefore, in pbrt-next I added a SimplePrimitive variant of Primitive that only stores pointers to a shape and a material. It is used in place of GeometricPrimitive when possible:

class SimplePrimitive : public Primitive {
    // ...
    std::shared_ptr<Shape> shape;
    std::shared_ptr<Material> material;
};
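As a rough illustration of what "when possible" might mean, here is a minimal sketch of a factory that falls back to the full GeometricPrimitive only when an area light or a participating medium is present. The makePrimitive function, the empty stand-in types, and the two-pointer MediumInterface are assumptions made for this example; they are not taken from pbrt-next.

```cpp
#include <memory>
#include <utility>

// Empty stand-ins for the real pbrt classes (assumptions for this sketch).
struct Shape {};
struct Material {};
struct AreaLight {};
struct Medium {};

struct MediumInterface {
    const Medium *inside = nullptr;
    const Medium *outside = nullptr;
};

struct Primitive {
    virtual ~Primitive() = default;
};

// Full-featured primitive: carries all four members.
struct GeometricPrimitive : Primitive {
    GeometricPrimitive(std::shared_ptr<Shape> s, std::shared_ptr<Material> m,
                       std::shared_ptr<AreaLight> l, MediumInterface mi)
        : shape(std::move(s)), material(std::move(m)),
          areaLight(std::move(l)), mediumInterface(mi) {}
    std::shared_ptr<Shape> shape;
    std::shared_ptr<Material> material;
    std::shared_ptr<AreaLight> areaLight;
    MediumInterface mediumInterface;
};

// Lean primitive: only a shape and a material.
struct SimplePrimitive : Primitive {
    SimplePrimitive(std::shared_ptr<Shape> s, std::shared_ptr<Material> m)
        : shape(std::move(s)), material(std::move(m)) {}
    std::shared_ptr<Shape> shape;
    std::shared_ptr<Material> material;
};

// Hypothetical factory: picks the smallest representation that can hold
// the data actually supplied.
std::unique_ptr<Primitive> makePrimitive(std::shared_ptr<Shape> shape,
                                         std::shared_ptr<Material> material,
                                         std::shared_ptr<AreaLight> areaLight,
                                         MediumInterface mi) {
    bool needsFull = areaLight || mi.inside || mi.outside;
    if (needsFull)
        return std::make_unique<GeometricPrimitive>(
            std::move(shape), std::move(material), std::move(areaLight), mi);
    return std::make_unique<SimplePrimitive>(std::move(shape),
                                             std::move(material));
}
```

Since most primitives in the scene have neither an area light nor a medium, nearly all of them would take the second branch.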

For non-animated object instances, there’s now TransformedPrimitive, which just stores a pointer to a primitive and a transformation, saving the nearly 500 bytes of unneeded bloat that an AnimatedTransform instance added to pbrt-v3’s TransformedPrimitive.

class TransformedPrimitive : public Primitive {
    // ...
    std::shared_ptr<Primitive> primitive;
    std::shared_ptr<Transform> PrimitiveToWorld;
};

(pbrt-next has an AnimatedPrimitive for the case where an animated transformation is actually needed.)

With these changes, the statistics report 7.8 GB used for Primitives, down from 28.9 GB in pbrt-v3. Lovely as it is to save 21 GB, that’s not as much of a reduction as we’d have expected from the back-of-the-envelope estimate earlier; we’ll come back to that discrepancy at the end of this post.

Smaller Shapes

Memory used for geometry is also significantly reduced in pbrt-next: space used for triangle meshes is down from 19.4 GB to 9.9 GB and space used for curves is down from 1.4 GB to 1.1 GB. A little more than half of this savings comes from a simplification of the Shape base class.

In pbrt-v3, the Shape base class has a few members that every Shape implementation carries along; they are values that are handy to have access to in Shape implementations.

class Shape {
    // ...
    const Transform *ObjectToWorld, *WorldToObject;
    const bool reverseOrientation;
    const bool transformSwapsHandedness;
};

To understand why those member variables are problematic, it’s helpful to understand how triangle meshes are represented in pbrt. First, there’s a TriangleMesh class, which stores the vertex and index buffers for an entire mesh:

struct TriangleMesh {
    int nTriangles, nVertices;
    std::vector<int> vertexIndices;
    std::unique_ptr<Point3f[]> p;
    std::unique_ptr<Normal3f[]> n;
    // ...
};

Each triangle in the mesh is represented by a Triangle, which inherits from Shape. The idea is to keep Triangles as small as possible: they only store a pointer to the mesh that they’re a part of and a pointer into the index buffer at the offset where their vertex indices start:

class Triangle : public Shape {
    // ...
    std::shared_ptr<TriangleMesh> mesh;
    const int *v;
};

When the Triangle implementation needs to find the positions of its vertices or the like, it does the appropriate indexing to get them from the TriangleMesh.
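The indexing can be sketched as follows. This is a simplified stand-in rather than pbrt's code: it uses std::vector for the position buffer instead of the unique_ptr array shown above, a raw mesh pointer instead of a shared_ptr, and a hypothetical vertex() accessor.

```cpp
#include <vector>

struct Point3f {
    float x, y, z;
};

// Simplified mesh: one index buffer and one position buffer for all triangles.
struct TriangleMesh {
    std::vector<int> vertexIndices;  // three entries per triangle
    std::vector<Point3f> p;          // vertex positions shared across triangles
};

// A triangle stores no geometry of its own, only where to find it.
struct Triangle {
    const TriangleMesh *mesh;
    const int *v;  // points at this triangle's first index in the mesh

    Triangle(const TriangleMesh *m, int triNumber)
        : mesh(m), v(&m->vertexIndices[3 * triNumber]) {}

    // Hypothetical accessor: position of the i-th vertex (i in 0..2).
    const Point3f &vertex(int i) const { return mesh->p[v[i]]; }
};
```

Two triangles that share an edge then refer to the same entries of p through different index triples, so vertex data is never duplicated per triangle.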

The problem with pbrt-v3’s Shape is that the values it stores are the same for all of the triangles in a mesh, so it’d be better to just store them once for each entire mesh in TriangleMesh and then to allow Triangles to access a single copy of the shared values when needed.

That’s fixed in pbrt-next: its Shape base class doesn’t have those members, so each and every Triangle is 24 bytes smaller (on a typical 64-bit system, that is two 8-byte pointers plus two bools padded out to 8 bytes). The Curve shape follows a similar strategy and also benefits from a leaner Shape.
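To see where the 24 bytes come from, one can measure a struct holding just the four removed members. ShapeV3Members is a hypothetical name for this sketch, and the figure assumes a typical 64-bit platform with 8-byte pointers; other ABIs may lay things out differently.

```cpp
#include <cstddef>

class Transform;  // only pointers are stored, so a forward declaration suffices

// The per-shape members that pbrt-v3's Shape base class carries.
struct ShapeV3Members {
    const Transform *ObjectToWorld, *WorldToObject;
    const bool reverseOrientation;
    const bool transformSwapsHandedness;
};

// Two pointers plus two bools, rounded up to pointer alignment.
constexpr std::size_t kShapeOverhead = sizeof(ShapeV3Members);
```

With 8-byte pointers, kShapeOverhead works out to 16 + 2 bytes of data plus 6 bytes of padding. Multiplied across the very large number of triangles in this scene, that padding-inflated overhead adds up quickly.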

Shared triangle buffers

Although the Moana island scene makes extensive use of object instancing for explicitly replicated geometry, I wondered how much reuse there might happen to be across things like index buffers, texture coordinate buffers, and so forth across separate triangle meshes.

I wrote a little class that hashes those buffers as they come in and stores them in a cache. I modified TriangleMesh to check the cache and use the already stored version of any redundant buffer that it needed. There was a nice benefit: 4.7 GB of redundant storage was eliminated, which was much more than I expected.
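A minimal sketch of such a cache, under the assumption that buffers arrive as std::vector and are treated as immutable once stored, might look like the following. The BufferCache name, its interface, and the hash-combining scheme are all made up for this example and are not pbrt-next's actual implementation.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical cache that returns one canonical copy per distinct buffer.
template <typename T>
class BufferCache {
  public:
    using Buffer = std::shared_ptr<const std::vector<T>>;

    // Returns an existing identical buffer if one is stored; otherwise
    // stores `buf` and returns it.
    Buffer lookupOrAdd(std::vector<T> buf) {
        std::size_t h = hashBuffer(buf);
        auto range = cache.equal_range(h);
        for (auto it = range.first; it != range.second; ++it) {
            // Compare contents so that hash collisions are handled correctly.
            if (*it->second == buf) {
                bytesSaved += buf.size() * sizeof(T);
                return it->second;
            }
        }
        Buffer stored = std::make_shared<std::vector<T>>(std::move(buf));
        cache.emplace(h, stored);
        return stored;
    }

    std::size_t bytesSaved = 0;  // running total of deduplicated storage

  private:
    // Simple order-dependent hash combine over all elements.
    static std::size_t hashBuffer(const std::vector<T> &buf) {
        std::size_t h = buf.size();
        for (const T &value : buf)
            h ^= std::hash<T>{}(value) + 0x9e3779b9u + (h << 6) + (h >> 2);
        return h;
    }

    std::unordered_multimap<std::size_t, Buffer> cache;
};
```

A mesh would call lookupOrAdd() for each of its buffers and keep the returned pointer. The shared_ptr here is per buffer rather than per triangle, so its overhead is negligible compared with the storage it can eliminate.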

The std::shared_ptr disaster

With the changes so far, the statistics reported about 36 GB of known memory allocations, while at the start of rendering, top reported 53 GB used. Sigh.

I was dreading another series of slow massif runs to figure out which allocations the statistics were missing when an email from Arseny Kapoulkine appeared in my inbox. Arseny explained to me that my earlier estimates of GeometricPrimitive memory use were significantly off. It took a few back and forths before I fully understood; many thanks to Arseny for pointing the issue out and for explaining until I understood.

Before Arseny’s email, my mental model of how std::shared_ptrs were implemented was that there was a shared descriptor that stored a reference count and the pointer to the actual allocated object, along the lines of:

template <typename T> class shared_ptr_info {
    std::atomic<int> refCount;
    T *ptr;
};

Then I assumed that a shared_ptr instance would just point to and use that:

template <typename T> class shared_ptr {
    // ...
    T *operator->() { return info->ptr; }
    shared_ptr_info<T> *info;
};

In short, I assumed that sizeof(shared_ptr<>) is the same as the size of a pointer, and I assumed that there were about 16 bytes of additional overhead for each shared pointer.

That is not so.

In the implementation on my system here, the shared descriptor is 32 bytes, and sizeof(shared_ptr<>) is 16 bytes. As such, a GeometricPrimitive that’s mostly std::shared_ptrs is about twice as big as I’d estimated. If you’re curious about the whys, these two Stack Overflow postings nicely explain the details: 1 2.
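The handle sizes are easy to check directly. The sketch below assumes one of the mainstream standard library implementations, where a shared_ptr holds two pointers (one to the object and one to the control block) and a unique_ptr with the default deleter holds just one; the C++ standard does not guarantee either size.

```cpp
#include <cstddef>
#include <memory>

struct Shape {
    virtual ~Shape() = default;
};

// Size of each kind of handle to a Shape.
constexpr std::size_t rawPtrSize = sizeof(Shape *);
constexpr std::size_t sharedPtrSize = sizeof(std::shared_ptr<Shape>);
constexpr std::size_t uniquePtrSize = sizeof(std::unique_ptr<Shape>);
```

On a 64-bit system these typically come out to 8, 16, and 8 bytes respectively. The separately allocated control block is on top of that and is not visible to sizeof.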

Almost all of my uses of std::shared_ptr in pbrt-next didn’t need to be shared pointers. In a day’s frenzy of hacking, I replaced as many as I could with std::unique_ptr, which is indeed the same size as a regular pointer. For example, here’s what SimplePrimitive looks like now:

class SimplePrimitive : public Primitive {
    // ...
    std::unique_ptr<Shape> shape;
    const Material *material;
};
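To get a feel for the per-object difference, here is a sketch comparing the two SimplePrimitive layouts. The struct names are hypothetical, the classes are stripped down to their data members, and the sizes assume a typical 64-bit standard library implementation.

```cpp
#include <cstddef>
#include <memory>

struct Shape {
    virtual ~Shape() = default;
};
struct Material {
    virtual ~Material() = default;
};
struct Primitive {
    virtual ~Primitive() = default;
};

// Earlier layout: two shared_ptrs.
struct SharedSimplePrimitive : Primitive {
    std::shared_ptr<Shape> shape;
    std::shared_ptr<Material> material;
};

// Later layout: an owning unique_ptr and a non-owning raw pointer.
struct LeanSimplePrimitive : Primitive {
    std::unique_ptr<Shape> shape;
    const Material *material = nullptr;
};

constexpr std::size_t kBytesSavedPerPrimitive =
    sizeof(SharedSimplePrimitive) - sizeof(LeanSimplePrimitive);
```

With 8-byte pointers that is 40 bytes versus 24 bytes per primitive, including the vtable pointer, and the lean version also avoids a control block allocation for each uniquely owned shape.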

The payoff was better than I’d hoped for: memory use at the start of rendering dropped from 53 GB to 41 GB—a 12 GB savings, totally unexpected just a few days ago, and down to basically half of what pbrt-v3 uses. Woop!

Next time, we’ll finally wrap up this series with a look at rendering performance with pbrt-next and discuss a few ideas about additional things to look into for reducing memory use with this scene.