Sampling in Floating Point (1/3): The Unit Interval

(The following assumes basic familiarity with the IEEE floating point representation—sign, power of two exponent, and significand—but not necessarily expert-level understanding of it.)

Taking samples from various distributions is at the heart of rendering; perhaps most importantly, it allows us to use importance sampling when performing Monte Carlo integration, which gives us a powerful tool to reduce error. The associated sampling algorithms are generally derived assuming that real numbers are afoot but are then implemented using floating-point math on computers. For sampling, the differences between reals and floats usually doesn’t cause any problems, though if you look closely enough there are a few interesting subtleties. We’ll start this short series on that topic today with what seems like should be the simplest of problems: uniformly sampling a floating-point value between zero and one.

Uniform Floats by Dividing by an Integer

Just about anywhere you look, from Stack Overflow to all four editions of Physically Based Rendering, you’ll be told that it’s easy to sample a uniform floating-point value in \([0,1)\): just generate a random \(n\) bit unsigned integer and divide by \(2^{n}\). Given real numbers, that’s fine—the largest value your integer can take is \(2^n-1\) and dividing by \(2^n\) gives a value that’s strictly less than one.¹ With 32-bit floats (as we will exclusively consider today), there’s a nit: say that \(n=32\) (as is used in pbrt). After floating point rounding, one will find that \[ \frac{2^{32} - 1}{2^{32}} \rightarrow 1; \] so much for that non-inclusive upper bound. The problem is that the spacing between the floats right below 1 is \(2^{-24}\). Because \(2^{-32}\) is much less than half that, \(1-2^{-32}\) rounds to 1. Even worse, all 128 floating-point values in \([2^{32}-128, 2^{32}-1]\) round to 1.

pbrt works around that problem by bumping any such 1s down to \(1-2^{-24}\), the last representable float before 1. That gets things back to \([0,1)\) but it’s a stinkiness in the code that in retrospect should have led to the algorithm used being given more attention.

One way to avoid this issue and many of the following is to set \(n=24\), in which case all values after division are valid float32 values and no rounding is required.² However, that gives slightly more than 16 million unique values; that’s a fair number of them, but there are actually a total of 1,065,353,216 float32 values in \([0,1)\)—nearly a quarter of all possible 32-bit floats. Under that lens, those 16 million seem rather few.

How much better do we do with \(n=32\)? Although we start out with over 4 billion distinct integer values, if you divide each by \(2^{-32}\) to generate samples in \([0,1)\) and count how many float32s are generated, it turns out that those four billion yield only 83,886,081 distinct floating-point values, or 7.87% of all of the possible ones between zero and one. Not only do we have multiple integer values mapping to the same floating-point value all the way from \(2^{-8}=0.00390625\) to 1, but between 0 and \(2^{-9}=0.001953125\), the spacing between floats is less than \(2^{-32}\) and many floating-point values are never generated.

There’s another problem that comes with the choice of \(n>24\), nicely explained in the paper Generating Random Floating-Point Numbers by Dividing Integers: A Case Study, by Frédéric Goualard. When the usual round-to-nearest-even is applied after dividing by \(2^{32}\), a systemic bias is introduced in the final values, clearly shown in that paper with examples that use low floating-point precision. Thus, it’s not just “we’re not making the most of what we’ve been given”, but it’s “the distribution isn’t actually uniform.”

The rounding problem is still evident with float32s and \(n=32\) bits; if we consider all \(2^{32}\) floating-point values, we would expect for example that all floats in \([0.5,1)\) would be generated the same number of times. (Indeed, we would expect 256 of each since we have \(2^{32}\) values, the float32 spacing in that interval is \(2^{-24}\), and \(2^{32} \cdot 2^{-24} = 256\).) However, if we count them up, it turns out that alternating floating-point values are generated 255 times and 257 times, all the way from 0.5 to 1. That happens in many other intervals, becoming its worst in the interval \([0.00390625, 0.0078125)\) where alternating values are generated one and three times.³

Depending on one’s application, all of these issues may be no problem in practice, and I wouldn’t make the argument that they are likely to cause errors in rendered images. Most of the time \(n=24\) and not worrying about it is probably fine. Yet IEEE has given us all that precision and it seems wasteful not to make use of it, if it isn’t too much trouble to do so…

Uniform Floats by Sampling a Geometric Distribution

What might be done about these problems? A remarkably elegant and efficient solution dates to 1974 with Walker’s paper Fast Generation of Uniformly Distributed Pseudorandom Numbers with Floating-Point Representation, which is based on the following observation (expressed here in terms of modern IEEE float32s):

In the interval \([1/2, 1)\), there are exactly \(2^{23}\) equally-spaced numbers that can be represented in float32.
In the interval \([1/4, 1/2)\), there are exactly \(2^{23}\) equally-spaced numbers that can be represented in float32.
In the interval \([1/8, 1/4)\), there are exactly \(2^{23}\) equally-spaced numbers that can be represented in float32.
And so on…

We would like an algorithm that can generate all of those numbers but does so in a way that gives a uniform distribution over \([0,1)\). Walker observed that this can be done in two steps: first by choosing an interval with probability according to its width, and then by sampling uniformly within its interval. (This algorithm is sometimes credited to Downey, who seems to have independently derived it in an unfinished paper from 2007.)

Because each interval’s width is half that of the one above it, sampling an interval corresponds to sampling a geometric distribution with \(p=1/2\). There’s thus an easy iterative algorithm to select an interval: one can first randomly choose to generate the sample within \([1/2, 1)\) with probability \(1/2\). Otherwise, sample within \([1/4,1/2)\) with probability \(1/2\) and so forth; bottom out if you hit the denorms. Given an interval, the exponent follows and a sample within an interval can be found by uniformly sampling a significand, since values within a given interval are equally-spaced.

Choosing an interval in that way takes only two iterations in expectation, but the worst case requires many more. The associated execution divergence is especially undesirable for processors like GPUs. Walker had another trick up his sleeve, however:

Pseudorandom integer numbers with a truncated geometric distribution may be obtained by counting consecutive 1s or 0s in a binary random number, drawn from a set having a uniform frequency distribution.

In other words, generate a random binary integer and, say, count the number of leading zero bits. Use that count to choose an interval, where zero leading zero bits has you sampling in \([1/2,1)\), one leading zero bit puts you in \([1/4,1/2)\), and so forth. Given an index \(i\) into the intervals that starts at 0, the exponent is then \(-1 - i\). Modern processors offer bit counting instructions that yield such counts, so this algorithm can be implemented very efficiently.

From Theory to Implementation

With float32, the floating-point exponent factors over the \([0,1)\) interval go from \(2^{-1}\) down to \(2^{-126}\) before the denorms start. Thus, 128 random bits may be required to choose the interval. However, those intervals start becoming so small that one’s commitment to possibly sampling every possible float might start to waver; the odds of making it to one of the tiny ones becomes vanishingly small.

A blog post by Marc Reynolds has all sorts of good insights on the efficient implementation of this algorithm. (More generally, Marc’s blog is full of great sampling and floating-point content; highly recommended.) He considers multiple approaches (for example, successively generating as many random 32-bit values as needed) and ends with a pragmatic compromise that takes a single 64-bit random value, uses 41 bits to choose the exponent, and uses the remaining 23 bits to sample the significand. The remaining \([0,2^{-40})\) interval is sampled uniformly. As long as an efficient count leading zeros instruction is used, it’s only slightly more work than multiplying by \(2^{-32}\) and clamping; in practice, most of the extra expense comes from needing to generate a 64-bit pseudorandom value rather than just a 32-bit one.

Conclusion

Unless you’re bottlenecked on sample generation, it’s worth considering using an efficient implementation of Walker’s algorithm to generate uniform random floating-point numbers over \([0,1)\). It’s not much more computation than the usual, it makes the most of what floating point offers, and it eliminates a minor source of bias. Plus, you get to exercise the bit counting instructions and feel like that much more of a hacker.

Next time we’ll look at uniformly sampling intervals of floating point numbers beyond \([0,1)\). After that, on to how low-discrepancy sampling interacts with some of the topics that came up today as well as some discussion about avoiding an unnecessary waste of precision when sampling exponential functions.

notes

In practice, one multiplies by \(2^{-32}\) since dividing by a power of two and multiplying by its reciprocal give the same result with IEEE floats. ↩
If I remember correctly, Petrik Clarberg explained the superiority of \(n=24\) over \(n=32\) in this context to me a few years ago; it’s a point that I underappreciated at the time. ↩
On a processor where changing the rounding mode is inexpensive, it is probably a good idea to select rounding down in this case. For example, in CUDA, the multiplication by \(2^{-32}\) might be performed using __fmul_rd(). ↩