Swallowing the elephant (part 4)
I have a branch of pbrt that I use for trying out new ideas, implementing neat things from papers, and generally exploring things that end up in a future edition of Physically Based Rendering. Unlike pbrt-v3, which we try to keep as close as possible to the system described in the book, that branch lets me change anything. Today we’ll look at how a few changes to the system, more radical than those we’ve considered previously in this series, significantly reduce memory usage with Disney’s Moana island scene.
A note on methodology: in the previous three posts, all of the statistics were measured with the WIP version of the scene I was working with before it was released. For this one, we’ll switch to the final version, which is slightly more complex.
When rendering the latest Moana island scene, pbrt-v3 uses 81 GB of RAM to store the scene description. Today’s pbrt-next uses 41 GB, about half as much. A handful of changes totaling a few hundred lines of code was enough to get us there.
Smaller Primitives
Recall that in pbrt a Primitive is a combination of a shape, its material, its emission function (if it’s an area light), and a record of which participating media are inside and outside its surface. In pbrt-v3, GeometricPrimitives store:
std::shared_ptr<Shape> shape;
std::shared_ptr<Material> material;
std::shared_ptr<AreaLight> areaLight;
MediumInterface mediumInterface;
As discussed earlier, most of the time areaLight is nullptr, and the MediumInterface holds a pair of nullptrs. Therefore, in pbrt-next I added a SimplePrimitive variant of Primitive that only stores pointers to a shape and a material. It is used in place of GeometricPrimitive when possible:
class SimplePrimitive : public Primitive {
// ...
std::shared_ptr<Shape> shape;
std::shared_ptr<Material> material;
};
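A minimal sketch of how the choice between the two primitive types might be made when the scene is built. The stand-in Shape, Material, AreaLight, and Medium types and the MakePrimitive factory are hypothetical, not pbrt-next's code; the test applied is simply the one described above: no area light and no participating medium on either side of the surface.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins for pbrt's types; only their identity matters here.
struct Shape { virtual ~Shape() = default; };
struct Material { virtual ~Material() = default; };
struct AreaLight { virtual ~AreaLight() = default; };
struct Medium { virtual ~Medium() = default; };

// The media on the inside and outside of a surface (nullptr means none).
struct MediumInterface {
    const Medium *inside = nullptr;
    const Medium *outside = nullptr;
};

struct Primitive { virtual ~Primitive() = default; };

// pbrt-v3 style: pays for an area light and a medium interface on every primitive.
struct GeometricPrimitive : Primitive {
    GeometricPrimitive(std::shared_ptr<Shape> s, std::shared_ptr<Material> m,
                       std::shared_ptr<AreaLight> l, MediumInterface mi)
        : shape(std::move(s)), material(std::move(m)),
          areaLight(std::move(l)), mediumInterface(mi) {}
    std::shared_ptr<Shape> shape;
    std::shared_ptr<Material> material;
    std::shared_ptr<AreaLight> areaLight;
    MediumInterface mediumInterface;
};

// The common case: just a shape and a material.
struct SimplePrimitive : Primitive {
    SimplePrimitive(std::shared_ptr<Shape> s, std::shared_ptr<Material> m)
        : shape(std::move(s)), material(std::move(m)) {}
    std::shared_ptr<Shape> shape;
    std::shared_ptr<Material> material;
};

// Hypothetical factory: only use GeometricPrimitive when its extra members
// actually carry information.
std::shared_ptr<Primitive> MakePrimitive(std::shared_ptr<Shape> shape,
                                         std::shared_ptr<Material> material,
                                         std::shared_ptr<AreaLight> areaLight,
                                         const MediumInterface &mi) {
    if (!areaLight && !mi.inside && !mi.outside)
        return std::make_shared<SimplePrimitive>(std::move(shape), std::move(material));
    return std::make_shared<GeometricPrimitive>(std::move(shape), std::move(material),
                                                std::move(areaLight), mi);
}
```

On a typical 64-bit system the SimplePrimitive here is 40 bytes against 72 for the GeometricPrimitive, before counting any shared_ptr control blocks.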
For non-animated object instances, there’s now TransformedPrimitive, which just stores a pointer to a primitive and a transformation, saving the nearly 500 bytes of unneeded bloat that an AnimatedTransform instance added to pbrt-v3’s TransformedPrimitive.
class TransformedPrimitive : public Primitive {
// ...
std::shared_ptr<Primitive> primitive;
std::shared_ptr<Transform> PrimitiveToWorld;
};
(pbrt-next has an AnimatedPrimitive for the case where an animated transformation is actually needed.)
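To illustrate why a non-animated instance needs nothing more than the shared primitive and one transformation, here is a sketch of the usual transform-then-delegate intersection pattern. It simplifies heavily: the Transform is a hypothetical uniform scale plus translation rather than pbrt's general matrix, the instanced object is a unit sphere, and Intersect only reports the hit distance.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <utility>

struct Vec3 { double x, y, z; };
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(double s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline double Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { Vec3 o, d; };

// Hypothetical, simplified Transform: world = scale * object + translate.
struct Transform {
    double scale;
    Vec3 translate;
    // Map a world-space ray into object space; the direction is not renormalized.
    Ray ApplyInverse(const Ray &r) const {
        return {(1 / scale) * (r.o - translate), (1 / scale) * r.d};
    }
};

struct Primitive {
    virtual ~Primitive() = default;
    // Returns true and sets *tHit to the nearest positive ray parameter.
    virtual bool Intersect(const Ray &r, double *tHit) const = 0;
};

// A unit sphere at the origin, standing in for some instanced geometry.
struct UnitSphere : Primitive {
    bool Intersect(const Ray &r, double *tHit) const override {
        double a = Dot(r.d, r.d), b = 2 * Dot(r.o, r.d), c = Dot(r.o, r.o) - 1;
        double disc = b * b - 4 * a * c;
        if (disc < 0) return false;
        double sq = std::sqrt(disc);
        double t0 = (-b - sq) / (2 * a), t1 = (-b + sq) / (2 * a);
        double t = t0 > 0 ? t0 : t1;
        if (t <= 0) return false;
        *tHit = t;
        return true;
    }
};

// Non-animated instance: the shared primitive plus a single static transform.
struct TransformedPrimitive : Primitive {
    TransformedPrimitive(std::shared_ptr<Primitive> p, std::shared_ptr<Transform> xf)
        : primitive(std::move(p)), PrimitiveToWorld(std::move(xf)) {}
    bool Intersect(const Ray &r, double *tHit) const override {
        // Because the direction is transformed but not renormalized, the hit
        // parameter found in object space is also valid for the world-space ray.
        return primitive->Intersect(PrimitiveToWorld->ApplyInverse(r), tHit);
    }
    std::shared_ptr<Primitive> primitive;
    std::shared_ptr<Transform> PrimitiveToWorld;
};
```

For example, a sphere instanced with scale 2 at (10, 0, 0) is hit at t = 8 by a ray leaving the origin along +x.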
With these changes, the statistics report 7.8 GB used for Primitives, down from 28.9 GB in pbrt-v3. Lovely as it is to save 21 GB, that’s not as much of a reduction as we’d have expected from the back-of-the-envelope estimate earlier in this series; we’ll come back to that discrepancy at the end of this post.
Smaller Shapes
Memory used for geometry is also significantly reduced in pbrt-next: space used for triangle meshes is down from 19.4 GB to 9.9 GB, and space used for curves is down from 1.4 GB to 1.1 GB. A little more than half of these savings comes from a simplification of the Shape base class.
In pbrt-v3, Shape brings with it a few members that all Shape implementations carry along; these are just a few things that are handy to have access to in Shape implementations.
class Shape {
// ....
const Transform *ObjectToWorld, *WorldToObject;
const bool reverseOrientation;
const bool transformSwapsHandedness;
};
To understand why those member variables are problematic, it’s helpful to understand how triangle meshes are represented in pbrt. First, there’s a TriangleMesh class, which stores the vertex and index buffers for an entire mesh:
struct TriangleMesh {
int nTriangles, nVertices;
std::vector<int> vertexIndices;
std::unique_ptr<Point3f[]> p;
std::unique_ptr<Normal3f[]> n;
// ...
};
Each triangle in the mesh is represented by a Triangle, which inherits from Shape. The idea is to keep Triangles as small as possible: they only store a pointer to the mesh that they’re a part of and a pointer to the position in the index buffer where their vertex indices start:
class Triangle : public Shape {
// ...
std::shared_ptr<TriangleMesh> mesh;
const int *v;
};
When the Triangle implementation needs to find the positions of its vertices or the like, it does the appropriate indexing to get them from the TriangleMesh.
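A sketch of that indexing, assuming a stripped-down TriangleMesh with only positions and indices and a hypothetical Vertex() accessor; this Triangle also omits the Shape base class. pbrt's Triangle does the equivalent lookups inline, in the style of mesh->p[v[0]].

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Point3f { float x, y, z; };

// Stripped-down mesh: shared vertex and index buffers.
struct TriangleMesh {
    int nTriangles = 0, nVertices = 0;
    std::vector<int> vertexIndices;  // three indices per triangle
    std::unique_ptr<Point3f[]> p;    // nVertices positions
};

class Triangle {
  public:
    Triangle(std::shared_ptr<TriangleMesh> m, int triNumber)
        : mesh(std::move(m)), v(&mesh->vertexIndices[3 * triNumber]) {}
    // Hypothetical accessor: position of corner i (0, 1, or 2), found by
    // going through the index buffer into the mesh's position buffer.
    const Point3f &Vertex(int i) const { return mesh->p[v[i]]; }

  private:
    std::shared_ptr<TriangleMesh> mesh;  // shared by every triangle in the mesh
    const int *v;                        // this triangle's first vertex index
};
```

Per triangle, only the mesh pointer and the index pointer are stored; everything else is reached through the shared mesh.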
The problem with pbrt-v3’s Shape is that the values it stores are the same for all of the triangles in a mesh, so it’d be better to just store them once for each entire mesh in TriangleMesh and then to allow Triangles to access a single copy of the shared values when needed.
That’s fixed in pbrt-next: the Shape base class in pbrt-next doesn’t have those members, and as such, each and every Triangle is 24 bytes smaller (on a 64-bit system, two 8-byte pointers plus two bools padded out to 8 bytes).
The Curve shape follows a similar strategy and also benefits from a leaner Shape.
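The 24-byte difference can be checked with a small layout experiment. The classes below are hypothetical stand-ins that mirror only the data members discussed here, and the mesh-level ReverseOrientation() accessor is one assumed way a Triangle could reach the shared per-mesh values instead of storing its own copies.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Transform;  // contents irrelevant: only pointers to it are stored

// pbrt-v3-style base: every Shape instance, and so every Triangle, carries these.
class ShapeV3 {
  public:
    virtual ~ShapeV3() = default;
  protected:
    const Transform *ObjectToWorld = nullptr, *WorldToObject = nullptr;
    bool reverseOrientation = false;
    bool transformSwapsHandedness = false;
};

// Leaner base in the spirit of pbrt-next: nothing but the vtable pointer.
class ShapeNext {
  public:
    virtual ~ShapeNext() = default;
};

// Hypothetical mesh that stores the per-mesh values exactly once.
struct TriangleMesh {
    const Transform *ObjectToWorld = nullptr, *WorldToObject = nullptr;
    bool reverseOrientation = false, transformSwapsHandedness = false;
    // vertex and index buffers omitted
};

class TriangleV3 : public ShapeV3 {
    std::shared_ptr<TriangleMesh> mesh;
    const int *v = nullptr;
};

class TriangleNext : public ShapeNext {
  public:
    explicit TriangleNext(std::shared_ptr<TriangleMesh> m) : mesh(std::move(m)) {}
    // Shared values are read through the mesh rather than stored per triangle.
    bool ReverseOrientation() const { return mesh->reverseOrientation; }
  private:
    std::shared_ptr<TriangleMesh> mesh;
    const int *v = nullptr;
};
```

On common 64-bit ABIs, ShapeV3 is 32 bytes (vtable pointer, two pointers, two bools, padding) and ShapeNext is 8, so each triangle shrinks from 56 to 32 bytes.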
Shared triangle buffers
Although the Moana island scene makes extensive use of object instancing for explicitly replicated geometry, I wondered how much reuse there might happen to be in things like index buffers, texture coordinate buffers, and so forth across separate triangle meshes.
I wrote a little class that hashes those buffers as they come in and stores them in a cache. I modified TriangleMesh to check the cache and use the already stored version of any redundant buffer that it needed. There was a nice benefit: 4.7 GB of redundant storage was eliminated, which was much more than I expected.
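The class itself isn't shown here, so what follows is only a guess at its general shape: a thread-safe cache keyed by a hash of a buffer's contents, which falls back to a full comparison because different buffers can share a hash. The BufferCache name, the hash-combining formula, and handing buffers out through std::shared_ptr are all assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical cache that deduplicates identical buffers across meshes.
// T must be usable with std::hash and operator== (e.g. int or float).
template <typename T>
class BufferCache {
  public:
    // Returns a shared buffer equal to buf, reusing a stored one if possible.
    std::shared_ptr<const std::vector<T>> LookupOrAdd(std::vector<T> buf) {
        const std::size_t h = Hash(buf);
        std::lock_guard<std::mutex> lock(mutex);
        auto range = cache.equal_range(h);
        for (auto it = range.first; it != range.second; ++it)
            if (*it->second == buf)  // equal hashes may still differ in content
                return it->second;
        std::shared_ptr<const std::vector<T>> stored =
            std::make_shared<std::vector<T>>(std::move(buf));
        cache.emplace(h, stored);
        return stored;
    }
    // Number of distinct buffers stored.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex);
        return cache.size();
    }

  private:
    // Order-dependent hash of the buffer's contents (a boost-style combine).
    static std::size_t Hash(const std::vector<T> &buf) {
        std::size_t h = buf.size();
        for (const T &x : buf)
            h ^= std::hash<T>()(x) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
    mutable std::mutex mutex;
    std::unordered_multimap<std::size_t, std::shared_ptr<const std::vector<T>>> cache;
};
```

One shared_ptr per buffer is cheap in this setting, since there is one per mesh buffer rather than one per triangle.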
The std::shared_ptr disaster
With the changes so far, the statistics reported about 36 GB of known memory allocations, while at the start of rendering, top reported 53 GB used. Sigh.
I was dreading another series of slow massif runs to figure out which allocations the statistics were missing when an email from Arseny Kapoulkine appeared in my inbox. Arseny explained to me that my earlier estimates of GeometricPrimitive memory use were significantly off. It took a few back-and-forths before I fully understood; many thanks to Arseny for pointing the issue out and for explaining until I understood.
Before Arseny’s email, my mental model of how std::shared_ptrs were implemented was that there was a shared descriptor that stored a reference count and the pointer to the actual allocated object, along the lines of:
template <typename T> class shared_ptr_info {
std::atomic<int> refCount;
T *ptr;
};
Then I assumed that a shared_ptr instance would just point to and use that:
template <typename T> class shared_ptr {
// ...
T *operator->() { return info->ptr; }
shared_ptr_info<T> *info;
};
In short, I assumed that sizeof(shared_ptr<>) is the same as the size of a pointer, and that there were about 16 bytes of additional overhead for each shared pointer.
That is not so.
In the implementation on my system here, the shared descriptor is 32 bytes, and sizeof(shared_ptr<>) is 16 bytes. As such, a GeometricPrimitive that’s mostly std::shared_ptrs is about twice as big as I’d estimated. If you’re curious about the whys, these two Stack Overflow postings nicely explain the details: 1, 2.
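One reason for the two-pointer layout is the aliasing constructor, which lets a shared_ptr point at one object (for example, a member) while sharing ownership of a different one. A single descriptor pointer, as in the mental model above, could not express that, since the pointer handed out differs from the one the descriptor owns. The Mesh type and TriangleCountOf helper below are hypothetical; the sizes quoted afterward hold for common standard library implementations but are not guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <memory>

struct Mesh {
    int nTriangles = 0;
};

// Returns a pointer to mesh->nTriangles that shares ownership of the whole
// Mesh: the result stores the member's address alongside a pointer to the
// Mesh's control block.
inline std::shared_ptr<int> TriangleCountOf(const std::shared_ptr<Mesh> &mesh) {
    return std::shared_ptr<int>(mesh, &mesh->nTriangles);
}
```

After mesh.reset(), a pointer returned by TriangleCountOf(mesh) still keeps the Mesh alive. On typical implementations sizeof(std::shared_ptr<Mesh>) is two pointers, while sizeof(std::unique_ptr<Mesh>) with the default deleter is one.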
Almost all of my uses of std::shared_ptr in pbrt-next didn’t need to be shared pointers. In a day’s frenzy of hacking, I replaced as many as I could with std::unique_ptr, which (with its default deleter) is indeed the same size as a regular pointer. For example, here’s what SimplePrimitive looks like now:
class SimplePrimitive : public Primitive {
// ...
std::unique_ptr<Shape> shape;
const Material *material;
};
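A sketch of the ownership arrangement this implies, under the assumption (not stated above) that materials are shared by many primitives and owned by some longer-lived container; the Scene struct and SimplePrimitiveShared are hypothetical. Each primitive owns its shape outright, so a unique_ptr suffices, and the material is just a non-owning pointer.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Shape { virtual ~Shape() = default; };
struct Material { virtual ~Material() = default; };
struct Primitive { virtual ~Primitive() = default; };

// Before: two shared_ptrs, each two pointers wide, plus control blocks elsewhere.
struct SimplePrimitiveShared : Primitive {
    std::shared_ptr<Shape> shape;
    std::shared_ptr<Material> material;
};

// After: unique ownership of the shape, a plain pointer to the material.
struct SimplePrimitive : Primitive {
    SimplePrimitive(std::unique_ptr<Shape> s, const Material *m)
        : shape(std::move(s)), material(m) {}
    std::unique_ptr<Shape> shape;
    const Material *material;  // not owned; must outlive this primitive
};

// Hypothetical owner. Members are destroyed in reverse declaration order, so
// declaring materials first means every primitive is destroyed before them.
struct Scene {
    std::vector<std::unique_ptr<Material>> materials;
    std::vector<std::unique_ptr<Primitive>> primitives;
};
```

On a typical 64-bit system this shrinks the primitive itself from 40 to 24 bytes and removes the per-shape control block entirely.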
The payoff was better than I’d hoped for: memory use at the start of rendering dropped from 53 GB to 41 GB—a 12 GB savings, totally unexpected just a few days ago, and down to basically half of what pbrt-v3 uses. Woop!
Next time, we’ll finally wrap up this series with a look at rendering performance with pbrt-next and discuss a few ideas about additional things to look into for reducing memory use with this scene.