A frame of @beeple's Zero Day, rendered with pbrt-v4.

I’m happy to report that we’ve posted a first drop of the source code to pbrt-v4, the next version of pbrt, corresponding to the system that will be described in the next edition of Physically Based Rendering. There’s a ton of new stuff in this release; as with previous releases, I’d estimate that at least a third of the system is new.

Of course, there are known bugs and rough edges, and there isn’t much documentation beyond the code itself. This is a release for the adventurous and for people familiar with pbrt-v3, but we hope that by making it available now, the final version will be that much better, thanks to bugs being fixed and to our having the chance to make various improvements before the book text is final.

pbrt-v4 implements a range of new state-of-the-art rendering techniques (detailed extensively in the README), but what I’m perhaps most excited about with this release is the arrival of GPU rendering as an option for pbrt. (Note that pbrt-v4 still runs on the CPU, just as well as it ever did, on systems that don’t have a supported GPU; maintaining portability across a wide range of systems was critically important.)

Yasutoshi Mori (@MirageYM)'s sports car model rendered with pbrt-v4, where nearly every surface uses a measured BRDF from the RGL Material Database.

I gave a talk about the GPU work at HPG this year; the key challenge was not whether it would be possible to implement a GPU ray tracer in the first place, but whether a system that still had the soul of pbrt would remain after doing so—could it still be clean enough to fulfill the pedagogical goals of pbrt and to be understandable enough to work as a basis for the book?

I think we were successful—the GPU version runs all of the same C++ code as regular pbrt to generate camera rays, compute values for low-discrepancy sampling patterns, evaluate and sample lights and BSDFs, filter image samples—pretty much all of the core rendering computation. That is all wired up differently for the GPU than for the CPU—on the GPU, in a sequence of individual kernels connected with work queues—but it’s built out of all of the same pieces. I’ve posted a video walkthrough of the code that gives an overview of changes to the system’s organization, mostly related to the GPU path.
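To make the kernels-and-work-queues idea concrete, here is a minimal sketch of the pattern; this is not pbrt-v4’s actual code: the ShadeWork and WorkQueue types and both kernels are hypothetical, and a real renderer would have many more queues and stages. One kernel pushes work items onto a queue using an atomic counter, and a later kernel is launched over just the items that were enqueued.

```cpp
// Minimal sketch of kernels connected by a work queue (hypothetical types).
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical work item: a ray hit that still needs shading.
struct ShadeWork {
    int pixelIndex;
    float tHit;
};

// Fixed-capacity queue in unified memory; Push() uses an atomic counter
// so that many threads can append concurrently.
struct WorkQueue {
    ShadeWork *items;
    int size;
    __device__ void Push(ShadeWork w) {
        int slot = atomicAdd(&size, 1);
        items[slot] = w;
    }
};

// First kernel: trace rays (stubbed out here) and enqueue hits for shading.
__global__ void TraceRays(WorkQueue *q, int nPixels) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nPixels) return;
    // Pretend that every other ray hits something.
    if (i % 2 == 0) q->Push({i, 1.f});
}

// Second kernel: launched over only the queue's contents, so every thread
// has shading work to do and none is wasted on missed rays.
__global__ void ShadeHits(WorkQueue *q) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= q->size) return;
    ShadeWork w = q->items[i];
    // ... evaluate the BSDF, sample lights, enqueue indirect rays, etc. ...
    (void)w;
}

int main() {
    const int nPixels = 1024;
    WorkQueue *q;
    cudaMallocManaged(&q, sizeof(WorkQueue));
    cudaMallocManaged(&q->items, nPixels * sizeof(ShadeWork));
    q->size = 0;

    TraceRays<<<(nPixels + 255) / 256, 256>>>(q, nPixels);
    cudaDeviceSynchronize();
    // The CPU can read q->size directly thanks to unified memory.
    ShadeHits<<<(q->size + 255) / 256, 256>>>(q);
    cudaDeviceSynchronize();
    printf("shaded %d hits\n", q->size);
    return 0;
}
```

Launching each stage over only the enqueued items is the point of the exercise: all of the threads in a given kernel are doing the same kind of work, which is what GPUs want.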

Our old friend, San Miguel from Guillermo M. Leal Llaguno, rendered with pbrt-v4.

And GPUs nowadays are fast… After a bit of performance work following HPG, pbrt on the GPU is even faster than it was when I gave that talk. Speedups versus pbrt running on a 6-core CPU are generally 50-100x. If you feel that a 6-core CPU isn’t a fair baseline, it’s about 10-20x faster than running pbrt on a 32-core Threadripper 3970X. While I recognize that, as an employee of a GPU manufacturer, I may be expected to have some bias on this topic, I think the performance is compelling. I am super excited to see all that code that was originally written for not-GPUs running so quickly.

pbrt’s GPU path requires three things (the first two are sketched in code after the list):

  • C++17 support on the GPU, including kernel launches that take C++ lambdas.
  • Unified memory so that the CPU can allocate and initialize data structures for code that runs on the GPU.
  • An API for ray-object intersections on the GPU.
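To give a sense of what the first two requirements enable, here is a rough sketch with hypothetical names (GPUParallelFor and ForAllKernel are illustrative, not necessarily pbrt-v4’s interfaces): a parallel-for utility that launches a C++ lambda as a GPU kernel over a buffer in unified memory. With CUDA, this needs nvcc’s --extended-lambda flag.

```cpp
// Sketch: launching a C++ lambda as a GPU kernel over unified memory.
// Compile with: nvcc --extended-lambda
#include <cstdio>
#include <cuda_runtime.h>

// Generic kernel that applies a callable to each index in [0, n).
template <typename F>
__global__ void ForAllKernel(F func, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) func(i);
}

// Host-side wrapper: the lambda is captured by value and shipped to the GPU.
template <typename F>
void GPUParallelFor(int n, F func) {
    ForAllKernel<<<(n + 255) / 256, 256>>>(func, n);
}

int main() {
    // Unified memory: the CPU allocates and initializes the buffer, the GPU
    // writes it, and the CPU reads the result, all through the same pointer.
    int n = 16;
    float *a;
    cudaMallocManaged(&a, n * sizeof(float));
    for (int i = 0; i < n; ++i) a[i] = float(i);

    // The __device__ annotation on the lambda is what requirement 1 asks
    // the toolchain to support.
    GPUParallelFor(n, [=] __device__ (int i) { a[i] *= 2.f; });
    cudaDeviceSynchronize();

    printf("a[5] = %f\n", a[5]);  // 10.0
    return 0;
}
```

The appeal of this combination is that rendering code written as ordinary C++ can be handed to the GPU unchanged: the lambda body could just as easily call the same camera, sampler, and BSDF code that the CPU path uses.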

The current implementation is built using CUDA and OptiX, the only options that fulfill those requirements at this point. Almost all of the CUDA code is plain old C++17; there’s no use of shared memory, inter-thread communication, or any fancy GPU programming stuff in it. We’d be happy to take patches that get pbrt running on other vendors’ GPUs, should any of them also support modern C++, unified memory, and ray tracing.

And now, it’s time to get back to writing the book…