Swallowing the elephant (part 4)
I have a branch of pbrt that I use for trying out new ideas, implementing neat things from papers, and generally exploring things that may end up in a future edition of Physically Based Rendering. Unlike pbrt-v3, which we try to keep as close as possible to the system described in the book, it’s possible to change anything in that branch. Today we’ll look at how a few changes to the system, more radical than those we’ve considered previously in this series, significantly reduce memory usage with Disney’s Moana island scene.
A note on methodology: in the previous three posts, all of the statistics were measured with the WIP version of the scene I was working with before it was released. For this one, we’ll switch to the final version, which is slightly more complex.
When rendering the latest Moana island scene, pbrt-v3 uses 81 GB of RAM to store the scene description. Today’s pbrt-next uses 41 GB, about half as much. A handful of changes totaling a few hundred lines of code was enough to get us there.
Smaller Primitives
Recall that in pbrt a Primitive is a combination of a shape, its
material, its emission function (if it’s an area light), and a record of
which participating media are inside and outside its surface. In pbrt-v3,
GeometricPrimitives store:
std::shared_ptr<Shape> shape;
std::shared_ptr<Material> material;
std::shared_ptr<AreaLight> areaLight;
MediumInterface mediumInterface;
As discussed earlier, most of the time areaLight is nullptr and the
MediumInterface holds a pair of nullptrs. Therefore, in pbrt-next I added a
SimplePrimitive variant of Primitive that only stores pointers to a shape
and a material. It is used in place of GeometricPrimitive when possible:
class SimplePrimitive : public Primitive {
// ...
std::shared_ptr<Shape> shape;
std::shared_ptr<Material> material;
};
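To make "when possible" concrete, here is a minimal sketch of how the choice between the two primitive types might be made at scene-creation time. The mock classes, the MakePrimitive() factory, and the assumption that MediumInterface holds two raw Medium pointers are illustrative only; pbrt-next’s actual code may make this decision differently.

```cpp
#include <memory>
#include <utility>

// Minimal stand-ins for pbrt's classes (illustrative mocks, not pbrt's code).
struct Shape { virtual ~Shape() = default; };
struct Material { virtual ~Material() = default; };
struct AreaLight { virtual ~AreaLight() = default; };
struct Medium {};

// Assumption: like pbrt-v3, this holds two raw pointers.
struct MediumInterface {
    const Medium *inside = nullptr, *outside = nullptr;
};

struct Primitive {
    virtual ~Primitive() = default;
    virtual bool IsSimple() const = 0;
};

// Full-featured primitive: also records emission and media.
struct GeometricPrimitive : Primitive {
    GeometricPrimitive(std::shared_ptr<Shape> s, std::shared_ptr<Material> m,
                       std::shared_ptr<AreaLight> l, MediumInterface mi)
        : shape(std::move(s)), material(std::move(m)),
          areaLight(std::move(l)), mediumInterface(mi) {}
    bool IsSimple() const override { return false; }
    std::shared_ptr<Shape> shape;
    std::shared_ptr<Material> material;
    std::shared_ptr<AreaLight> areaLight;
    MediumInterface mediumInterface;
};

// Compact primitive: only a shape and a material.
struct SimplePrimitive : Primitive {
    SimplePrimitive(std::shared_ptr<Shape> s, std::shared_ptr<Material> m)
        : shape(std::move(s)), material(std::move(m)) {}
    bool IsSimple() const override { return true; }
    std::shared_ptr<Shape> shape;
    std::shared_ptr<Material> material;
};

// Hypothetical factory: use SimplePrimitive when the surface isn't emissive
// and its medium interface carries no information.
std::shared_ptr<Primitive> MakePrimitive(std::shared_ptr<Shape> shape,
                                         std::shared_ptr<Material> material,
                                         std::shared_ptr<AreaLight> areaLight,
                                         const MediumInterface &mi) {
    if (!areaLight && mi.inside == nullptr && mi.outside == nullptr)
        return std::make_shared<SimplePrimitive>(std::move(shape),
                                                 std::move(material));
    return std::make_shared<GeometricPrimitive>(
        std::move(shape), std::move(material), std::move(areaLight), mi);
}
```

The factory falls back to GeometricPrimitive whenever either optional field is set, so emissive surfaces and surfaces that bound participating media lose nothing.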
For non-animated object instances, there’s now TransformedPrimitive,
which just stores a pointer to a primitive and a transformation, saving the
nearly 500 bytes of unneeded bloat that an AnimatedTransform instance added
to pbrt-v3’s TransformedPrimitive.
class TransformedPrimitive : public Primitive {
// ...
std::shared_ptr<Primitive> primitive;
std::shared_ptr<Transform> PrimitiveToWorld;
};
(pbrt-next has an AnimatedPrimitive for the case where an animated
transformation is actually needed.)
With these changes, the statistics report 7.8 GB used for Primitives,
down from 28.9 GB in pbrt-v3. Lovely as it is to save 21 GB, that’s not as
much of a reduction as we’d have expected from the earlier
back-of-the-envelope estimate; we’ll come back to that discrepancy at the
end of this post.
Smaller Shapes
Memory used for geometry is also significantly reduced in pbrt-next: space
used for triangle meshes is down from 19.4 GB to 9.9 GB and space used for
curves is down from 1.4 GB to 1.1 GB. A little more than half of these
savings comes from a simplification of the Shape base class.
In pbrt-v3, Shape has a few data members that every Shape implementation
carries along; they’re values that are handy to have access to in Shape
implementations.
class Shape {
// ...
const Transform *ObjectToWorld, *WorldToObject;
const bool reverseOrientation;
const bool transformSwapsHandedness;
};
To understand why those member variables are problematic, it’s helpful to
understand how triangle meshes are represented in pbrt. First, there’s a
TriangleMesh class, which stores the vertex and index buffers for an
entire mesh:
struct TriangleMesh {
int nTriangles, nVertices;
std::vector<int> vertexIndices;
std::unique_ptr<Point3f[]> p;
std::unique_ptr<Normal3f[]> n;
// ...
};
Each triangle in the mesh is represented by a Triangle, which inherits
from Shape. The idea is to keep Triangles as small as possible: they only
store a pointer to the mesh that they’re a part of and a pointer into the
index buffer at the position where their vertex indices start:
class Triangle : public Shape {
// ...
std::shared_ptr<TriangleMesh> mesh;
const int *v;
};
When the Triangle implementation needs to find the positions of its
vertices or the like, it does the appropriate indexing to get them from the
TriangleMesh.
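The following simplified sketch shows that indexing. Only the position buffer is modeled, and the Triangle constructor that takes a triangle number and the Vertices() method are hypothetical helpers for illustration, not pbrt’s interface.

```cpp
#include <array>
#include <memory>
#include <utility>
#include <vector>

struct Point3f { float x, y, z; };

// Simplified TriangleMesh: buffers shared by every triangle of the mesh.
struct TriangleMesh {
    int nTriangles = 0, nVertices = 0;
    std::vector<int> vertexIndices;  // 3 indices per triangle
    std::unique_ptr<Point3f[]> p;    // one position per vertex
};

// Simplified Triangle: a mesh pointer plus a pointer to its first index.
class Triangle {
  public:
    Triangle(std::shared_ptr<TriangleMesh> m, int triNumber)
        : mesh(std::move(m)), v(&mesh->vertexIndices[3 * triNumber]) {}

    // Look up this triangle's three vertex positions in the shared buffers.
    std::array<Point3f, 3> Vertices() const {
        return {{mesh->p[v[0]], mesh->p[v[1]], mesh->p[v[2]]}};
    }

  private:
    std::shared_ptr<TriangleMesh> mesh;  // shared with the mesh's other triangles
    const int *v;                        // this triangle's 3 vertex indices
};
```

Each Triangle holds only the mesh pointer and one int pointer; the positions themselves are stored once, in the mesh.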
The problem with pbrt-v3’s Shape is that the values it stores are the same for
all of the triangles in a mesh, so it’d be better to store them just once per
mesh, in TriangleMesh, and let each Triangle access that single shared copy
when needed.
That’s fixed in pbrt-next: the Shape base class in pbrt-next doesn’t have
those members and as such, each and every Triangle is 24 bytes smaller.
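The arithmetic behind those 24 bytes can be checked with two mock base classes that mirror the member lists shown earlier (the class names are hypothetical). On a typical 64-bit system, the two Transform pointers take 16 bytes and the two bools take 2 bytes, which alignment pads out to 8.

```cpp
#include <cstddef>

struct Transform;  // only used through pointers here

// pbrt-v3 style: every Shape instance carries these members.
class ShapeV3 {
  public:
    virtual ~ShapeV3() = default;
    const Transform *ObjectToWorld = nullptr, *WorldToObject = nullptr;
    bool reverseOrientation = false;
    bool transformSwapsHandedness = false;
};

// pbrt-next style: no data members beyond the vtable pointer.
class ShapeNext {
  public:
    virtual ~ShapeNext() = default;
};

// Bytes saved per Shape (and thus per Triangle) by dropping the members.
constexpr std::size_t kPerShapeSavings = sizeof(ShapeV3) - sizeof(ShapeNext);
```

With tens of millions of triangles in the scene, a per-instance saving of this size adds up quickly.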
The Curve shape follows a similar strategy and also benefits from a
leaner Shape.
Shared triangle buffers
Although the Moana island scene makes extensive use of object instancing for explicitly replicated geometry, I wondered how much reuse there might happen to be in index buffers, texture coordinate buffers, and so forth, across separate triangle meshes.
I wrote a little class that hashes those buffers as they come in and stores
them in a cache. I modified TriangleMesh to check the cache and
use the already stored version of any redundant buffer that it needed.
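The post doesn’t show that class, so the following is only a sketch of the idea, using a hypothetical BufferCache template: buffers are hashed by content, and a lookup either returns the previously stored identical buffer or stores the new one. It relies on pointers to std::unordered_set elements remaining valid when the set rehashes.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical content-addressed cache for mesh buffers (not thread-safe).
template <typename T>
class BufferCache {
  public:
    // Returns a pointer to the cached copy of buf, storing buf first if no
    // identical buffer has been seen. The pointer stays valid for the
    // cache's lifetime, since unordered_set nodes don't move on rehash.
    const std::vector<T> *Lookup(std::vector<T> buf) {
        auto result = cache.insert(std::move(buf));
        return &*result.first;
    }
    std::size_t Size() const { return cache.size(); }

  private:
    // Hash the buffer's contents (boost-style hash_combine).
    struct Hash {
        std::size_t operator()(const std::vector<T> &buf) const {
            std::size_t h = buf.size();
            for (const T &v : buf)
                h ^= std::hash<T>()(v) + 0x9e3779b9 + (h << 6) + (h >> 2);
            return h;
        }
    };
    // Equality falls back to std::vector's element-wise operator==.
    std::unordered_set<std::vector<T>, Hash> cache;
};
```

A TriangleMesh built this way would hold const pointers into the cache rather than owning its buffers; creating meshes from multiple threads would additionally need a lock around Lookup().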
There was a nice benefit: 4.7 GB of redundant storage was eliminated, which
was much more than I expected.
The std::shared_ptr disaster
With the changes so far, the statistics reported about 36 GB of known
memory allocations, while at the start of rendering, top reported 53 GB
used. Sigh.
I was dreading another series of slow massif runs to figure out which
allocations the statistics were missing when an email from Arseny
Kapoulkine appeared in my inbox. Arseny explained to me that my earlier
estimates of GeometricPrimitive memory use were significantly off. It took
a few back-and-forths before I fully understood; many thanks to Arseny for
pointing out the issue and explaining it.
Before Arseny’s email, my mental model of how std::shared_ptrs were
implemented was that there was a shared descriptor that stored a reference
count and the pointer to the actual allocated object, along the lines of:
template <typename T> class shared_ptr_info {
std::atomic<int> refCount;
T *ptr;
};
Then I assumed that a shared_ptr instance would just point to and use
that:
template <typename T> class shared_ptr {
// ...
T *operator->() { return info->ptr; }
shared_ptr_info<T> *info;
};
In short, I assumed that sizeof(shared_ptr<>) is the same as the size of
a pointer, and that there were about 16 bytes of additional overhead for
each shared pointer.
That is not so.
In the implementation on my system here, the shared descriptor is 32
bytes, and sizeof(shared_ptr<>) is 16 bytes. As such, a
GeometricPrimitive that’s mostly std::shared_ptrs is about twice as big
as I’d estimated. If you’re curious about the whys, these two Stack
Overflow postings nicely explain the details: 1, 2.
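Mainstream standard libraries (libstdc++, libc++, MSVC) lay a shared_ptr out roughly as follows: the shared_ptr object itself stores both the object pointer and a pointer to the separately allocated control block, so operator-> needs no extra indirection but every shared_ptr is two pointers wide. The layout struct below is an illustrative approximation with made-up member names, and these sizes are implementation details rather than guarantees of the C++ standard.

```cpp
#include <cstddef>
#include <memory>

// Illustrative approximation of a shared_ptr's own storage.
template <typename T>
struct shared_ptr_layout {
    T *ptr;              // used directly by operator->, no extra hop
    void *controlBlock;  // strong and weak counts, deleter, ...
};

constexpr std::size_t kRawPtrSize = sizeof(int *);
constexpr std::size_t kSharedPtrSize = sizeof(std::shared_ptr<int>);
// unique_ptr with the default (empty) deleter stores only the object pointer.
constexpr std::size_t kUniquePtrSize = sizeof(std::unique_ptr<int>);
```

On a 64-bit system this gives 8, 16, and 8 bytes respectively, which is where the "about twice as big" comes from.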
Almost all of my uses of std::shared_ptr in pbrt-next didn’t need to be
shared pointers. In a day’s frenzy of hacking, I replaced as many as I
could with std::unique_ptr, which is indeed the same size as a regular
pointer. For example, here’s what SimplePrimitive looks like now:
class SimplePrimitive : public Primitive {
// ...
std::unique_ptr<Shape> shape;
const Material *material;
};
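The effect on SimplePrimitive itself can be seen by comparing mock versions of the before and after layouts (the struct names are hypothetical). The raw const Material * reflects an assumption that materials are owned elsewhere, for example by the scene, and outlive the primitives that refer to them.

```cpp
#include <memory>

struct Shape { virtual ~Shape() = default; };
struct Material { virtual ~Material() = default; };
struct Primitive { virtual ~Primitive() = default; };

// Before: two shared_ptrs, each two pointers wide.
struct SimplePrimitiveShared : Primitive {
    std::shared_ptr<Shape> shape;
    std::shared_ptr<Material> material;
};

// After: exclusive ownership of the shape, a plain pointer to the material.
struct SimplePrimitiveUnique : Primitive {
    std::unique_ptr<Shape> shape;
    const Material *material = nullptr;
};
```

On a 64-bit system that’s 40 bytes versus 24 per primitive, before even counting the per-object control blocks that std::shared_ptr allocates.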
The payoff was better than I’d hoped for: memory use at the start of rendering dropped from 53 GB to 41 GB—a 12 GB savings, totally unexpected just a few days ago, and down to basically half of what pbrt-v3 uses. Woop!
Next time, we’ll finally wrap up this series with a look at rendering performance with pbrt-next and discuss a few ideas about additional things to look into for reducing memory use with this scene.