Sampling in Floating Point (2/3): 1D Intervals

After learning about Walker’s algorithm for uniformly sampling \([0,1)\) in floating-point, I started thinking about how his approach might be generalized to arbitrary intervals; being able to uniformly sample any interval while potentially generating all possible floating-point values inside it would certainly be a nice tool to add to the toolbox.

The good news is that it was a fun thought exercise with things learned along the way. In the end I found enough insights to come up with a solution. However, upon further digging I found two previous implementations of the approach I came up with. So much for a tour de force paper describing my findings in ACM Transactions on Modeling and Computer Simulation. They are:

Christoph Conrads’s Rademacher Floating Point Library, which dates to 2018. See the make_uniform_random_value() function there.
Olaf Bernstein’s dist_uniformf_dense() which seems to date to 2021.

(I’d be interested to hear if there are any earlier instances of it.)

Nevertheless, I still thought it might be useful to write up some of the motivation, walk through my route to a solution, and explain some of the subtleties.

What’s Wrong With Linear Interpolation?

If you want to sample a value in \([a,b)\) for arbitrary \(a\) and \(b\), the usual thing is to take a uniform value in \([0,1)\) and use it to linearly interpolate between \(a\) and \(b\). Last time we saw how to compute a gold-standard uniform floating-point value in \([0,1)\), so why not use that to interpolate? Needless to say, that works in practice, but once again, there are some subtleties.

First one must choose an equation with which to linearly interpolate. For an interval \([a,b)\) with interpolation parameter \(t\), are two common choices: \(a+t (b-a)\) and \((1-t)a +t b\). Each has its own strengths and weaknesses.

The first, \(a + t (b-a)\), requires fewer operations but has the disadvantage that even if \(t \in [0,1)\), it is not guaranteed that the interpolated value will be in \([a,b)\). For example, with \(a=2.5\), \(b=8.87385559\), and \(t=1-2^{-24}\), the last floating-point value before 1, then with float32, we find that \[ 2.5 + (1 - 2^{-24}) (8.87385559 - 2.5) \rightarrow 8.87385559; \] the result is equal to the upper bound even though \(t<1\). A similar problem occurs with the closed interval \([a,b]\): the value \(t=1\) can yield value that is greater than \(b\).

In graphics, respecting intervals is important since we often do things like bound vertex positions at different times, linearly interpolate them, and then want to be able to assert that the result is inside the bounds. In this case, \((1-t)a+tb\) is preferable, since \(t=0\) always yields \(a\) and \(t=1\) gives \(b\). However, that formulation has the surprising shortcoming that increasing \(t\) sometimes causes the interpolated value to move backwards. Consider this interpolant with \(a=2.5\) and \(b=10.53479\): \[ (1-t) \cdot 2.5 + t \cdot 10.53479. \] With \(t=0.985167086\), the interpolant gives 10.4156113. Moving \(t\) up to the next possible floating-point value, 0.985167146, the interpolant’s value is reduced, down to 10.4156103. The rounding on the terms has gone differently with that small change in \(t\) and it’s (slightly) downhill from there. In practice, these little wobbles are unlikely to cause trouble, though they do mean that an assertion that implicitly assumes that the interpolant is monotonic may fail for fairly obscure reasons.

Both of these approaches also suffer from a minor bias for reasons similar to why dividing a random 32-bit value by \(2^{-32}\) to generate a uniform random variable led to a bias in sampled floats: each \(t\) value maps to a single floating-point value and if there are more of one than the other, rounding may introduce a minor non-uniformity. (The numerics people like to say this is due to the pigeonhole principle. I can’t say that is incorrect, but I like to think of it in terms of aliasing: it’s taking something at one frequency it and then resampling it at another—things always get a little messy when you do that and aren’t careful.)

For more on the above, including an entertaining review of how linear interpolation is implemented in assorted programming languages’ standard libraries and assorted misstatements in their documentation about the characteristics of the expected results, see Drawing random floating-point numbers from an interval, by Frédéric Goualard.¹

A Few Utility Functions

Before we go further, let’s specify a few utility functions that will be used in the forthcoming implementations. (All of the following code is available in a small header-only library.)

For efficient sampling of the \(p=1/2\) geometric distribution, a function that uses native bit counting instructions will be useful:

int CountLeadingZeros(uint64_t value);

We’ll assume the existence of two functions that provide uniform random values.

uint64_t Random64Bits();
uint32_t Random32Bits();

C++20’s std::bit_cast() function makes it easy to convert from a float32 to its bitwise representation and back.

uint32_t ToBits(float f) { return std::bit_cast<uint32_t>(f); }
float FromBits(uint32_t b) { return std::bit_cast<float>(b); }

A few helper functions extract the pieces of a float32. (If necessary, see the Wikipedia page for details of the in-memory layout of float32s to understand what these are doing.) Zero is returned by SignBit() for positive values and one for negative. Because the float32 exponent is stored in memory in biased form as an unsigned value from zero to 255, Exponent() returns the unbiased exponent, which ranges from \(-126\) to 127 for normal floating-point values, with \(-127\) reserved for zero and the denormalized floats.

int SignBit(float f) { return ToBits(f) >> 31; }
int Exponent(float f) { return ((ToBits(f) >> 23) & 0xff) - 127; }
constexpr int SignificandMask = (1 << 23) - 1;
int Significand(float f) { return ToBits(f) & SignificandMask; }

We’ll also find it useful to be able to generate uniform random significands.

uint32_t RandomSignificand() { return Random32Bits() & SignificandMask; }

Float32FromParts() constructs a float32 value from the specified pieces. The assertions document the requirements for the input parameters.

float Float32FromParts(int sign, int exponent, int significand) {
    assert(sign == 0 || sign == 1);
    assert(exponent >= -127 && exponent <= 127);
    assert(significand >= 0 && significand < (1 << 23));
    return FromBits((sign << 31) | ((exponent + 127) << 23) | significand);
}

A positive power-of-two float32 value can be constructed by shifting the biased exponent into place.

float FloatPow2(int exponent) {
    assert(exponent >= -126 && exponent <= 127);
    return FromBits((exponent + 127) << 23);
}

Expressed with these primitives, Reynolds’s pragmatic compromise algorithm for uniformly sampling in \([0,1)\) is:

float Sample01() {
    uint64_t bits = RandomBits64();
    int significand = bits & SignificandMask;
    if (int lz = CountLeadingZeros(bits); lz <= 40)
        return Float32FromParts(0, -1 - lz, significand);
    return 0x1p-64f * significand;
}

First Steps

Back to our original question: can we do better than linear interpolation to sample an arbitrary interval, and more specifically, is it possible to generalize Walker’s algorithm to remove the limitation of sampling over \([0,1)\)? I had no idea how to go all the way from the original algorithm to arbitrary intervals so I started with a few small thought experiments to chip away at the edges of the problem an improve my intuition. (In all of the following, I’ll assume a half-open interval \([a,b)\) that does not span zero; we’ll come back to the generalizations of a closed interval \([a,b]\) and intervals that span zero at the end.)

An easy first case is to consider intervals that start at zero and end with an arbitrary power of two. I first took the smallest step possible and thought about \([0,2)\). Indeed, Walker’s approach just works; there’s nothing in it that requires the upper bound to be 1; we can apply the same idea of starting with the upper \([1,2)\) interval, randomly selecting it with probability 1/2 and otherwise continuing down intervals until one is chosen or we hit the denorms. There’s an easy first victory.

Have we missed anything? We should be careful. To further validate this direction, consider the case where we have a tiny power-of-two sized interval, say \([0,2^{-124})\). The minimum exponent for normal numbers is \(-126\), so we have just two regular power-of-two sized intervals and then the denorms. Here’s how that looks on the floating-point number line with valid floats marked with hashes and a 3-bit significand to keep the figure scrutable:

We sample the \([2^{-125}, 2^{-124})\) interval with probability 1/2 and otherwise sample \([2^{-126}, 2^{-125})\) with probability 1/2. If neither is selected, we uniformly sample the denorms which are between \([0,2^{-126})\). This extreme case helps us better understand how the edge case of the denorms is handled: because the width of the last interval of normal floats and the width of the denorms is equal, choosing between them with equal probability leads to uniform sampling of the full interval.

On this topic, Walker wrote:

In practice, the range of the exponent will be limited, and the probability of the number falling into either of the two smallest intervals will be the same.

Denormalized numbers were invented after his paper so it seems that this was a minor fudge in his original approach, corrected today by advances in floating-point.

Here is a function to sample a float32 exponent for this case, taking 64 random bits at a time and counting zeros to sample the distribution. An exponent is returned either if a one bit is found in the random bits or if enough zero bits have been seen to make it to the denorms. In either case, if the denorms have been reached, \(-127\) is returned so that a denormalized or zero floating-point value results.

int SampleToPowerOfTwoExponent(int exponent) {
    assert(exponent >= -127 && exponent <= 127);
    while (exponent > -126) {
        if (int lz = CountLeadingZeros(Random64Bits()); lz == 64)
            exponent -= 64;
        else
            return std::max(-127, exponent - 1 - lz);
    }
    return -127;
}

Given SampleToPowerOfTwoExponent(), the full algorithm to uniformly sample an interval \([0,2^x)\) is simple.

float SampleToPowerOfTwo(int exponent) {
    int ex = SampleToPowerOfTwoExponent(exponent);
    return Float32FromParts(0, ex, RandomSignificand());
}

An implementation that uses a fixed number of random bits can be found with a straightforward generalization of Reynolds’s “pragmatic” algorithm that always consumes only 64 random bits, though there are two differences compared to the \([0,1)\) case. First, if the initial interval is \([0,2^{-88})\) or smaller, then the 41 bits remaining after the significand is extracted from the 64-bit random value are more than are needed to consider for all of the possible power-of-two intervals. In that case, we need to be careful to finish in the denorms rather than trying to construct a float32 with an invalid exponent. Clamping the exponent at \(-127\) takes care of this.

The second difference is that if all of the bits used to select an interval are zero, then if the initial exponent is \(x\), then the remaining interval that we will sample using equal spacing is \([0,2^{x-41})\). Given a 23-bit significand \(s\), the sampled value is then \[ \frac{s}{2^{23}} 2^{x-41}. \] It is tempting to merge the division by \(2^{23}\) and the multiplication by \(2^{x-41}\) into a single constant, though doing so would lead to underflow when \(x < -63\). (Reynolds’s algorithm for \([0,1)\) could just multiply the significand by \(2^{-64}\) in this case since \(x\) was always 0 and there were no concerns about underflow.)

float SampleToPowerOfTwoFast(int exponent, uint64_t bits) {
    int significand = bits & SignificandMask;
    int lz = CountLeadingZeros(bits);
    if (lz == 41 && exponent - 41 > -127)
        return significand * 0x1p-23f * FloatPow2(exponent - 41);
    int ex = exponent - 1 - lz;
    return Float32FromParts(0, std::max(-127, ex), significand);
}

Another easy case comes with an interval where both endpoints have the same exponent. In that case, the spacing between them is uniform and a value can be sampled by randomly sampling a significand between theirs. That setting is shown on the floating-point number line below; the values marked in red are the possible results, depending on the significand.

The code is easy to write given a RandomInt() function that returns a uniform random integer between 0 and the specified value, inclusive:

float SampleSameExponent(float a, float b) {
    assert(a < b && Exponent(a) == Exponent(b));
    int sa = Significand(a), sb = Significand(b);
    int sig = sa + RandomInt(sb - sa - 1);
    return Float32FromParts(SignBit(a), Exponent(a), sig);
}

Arbitrary (Power of 2) Lower Bounds

The easy successes stop coming when we consider intervals with a power-of-two value at their lower bounds: say that we’d like to sample uniformly over \([1,7)\). Our intervals are \([1,2)\), \([2,4)\), and \([4,7)\). Their respective widths are 1, 2, and 4; the sampling probabilities are \(1/7\), \(2/7\), and \(4/7\). So much for a nice geometric distribution with \(p=1/2\). The setting is illustrated below:

Here we most definitely see the importance of the denorms and the last power-of-two sized interval of normal floating-point numbers having the same width. With a power-of-two interval that ends above zero, we no longer have two intervals at the end that should be sampled with the same probability and things fall apart.

Upon reaching this realization, I had no idea how to proceed; I feared that the cause might be lost. Lacking any other ideas, I wondered if it would work to apply Walker’s approach still with the probability \(1/2\) of sampling each interval but then cycling around when one goes past the lower interval, along these lines:

With this method, the probability of sampling the \([4,7)\) interval is then \(1/2\) the first time around. With \(1/8\) probability we cycle back around for another \(1/2\) chance, and so forth. We have: \[ \frac{1}{2} + \frac{1}{8} \frac{1}{2} + \cdots = \frac{1}{2} \sum_{i=0}^\infty \frac{1}{8^i} = \frac{4}{7} \] Success! (Needless to say, the other intervals work out with the desired probabilities as well.)

SampleExponent() implements the algorithm that consumes random bits until it successfully samples such a single power-of-two interval.

int SampleExponent(int emin, int emax) {
    int c = 0;
    while (true) {
        if (int lz = CountLeadingZeros(Random64Bits()); lz == 64)
            c += 64;
        else
            return emax - 1 - ((c + lz) % (emax - emin));
    }
}

If emin and emax are not known at compile time, computing the integer modulus in SampleExponent() may be expensive. Because the maximum value of emax-emin is 253, it may be worthwhile to maintain a table of constants for use with an efficient integer modulus algorithm (see e.g., Lemire et al. 2021.)

With SampleExponent() in hand, the full algorithm is straightforward.

// Sample uniformly and comprehensively in [2^emin, 2^emax).
float SampleExponentRange(int emin, int emax) {
    assert(emax > emin);
    int significand = RandomSignificand();
    return Float32FromParts(0, SampleExponent(emin, emax), significand);
}

For a “pragmatic” version of this algorithm that uses a fixed number of random bits, we could take the number of leading zeros modulo the number of power-of-two sized intervals under consideration to choose an interval and then use a uniform random significand. However, in the rare case where all of the bits used to sample an interval are zero, the remaining interval is of the form \([2^a, 2^c)\) where \(c=41 \mod (b-a)\); we’ve used up all of our random bits to be faced with the same general problem we started with (unless it happens that \(c=a+1\).) At that point we might use linear interpolation to sample the remaining interval, though that’s admittedly unsatisfying, as linear interpolation is the thing we’re trying to avoid.

Partial Intervals at One or Both Ends

With that we finally have enough to return to the original task, uniformly and comprehensively sampling an arbitrary interval \([a,b)\). This is, unfortunately, the point at which I haven’t been able to figure out a reasonable “pragmatic” implementation that uses a small and fixed number of random bits. The figure below shows the general setting; as before, the valid candidate values are marked in red.

An approach based on rejection sampling can be used to sample the specified interval: the idea is that we will sample all of the possible intervals as before, with probability according to their width. Then we uniformly sample a significand in the chosen interval and then accept the value if it is within \([a,b)\). For the power-of-two intervals in the middle, we will always accept the sample, and for the intervals on the ends, the probability of acceptance is proportional to how much of the power-of-two interval overlaps \([a,b)\).

The implementation isn’t much code given all the helpers we’ve already defined, though there are two important details. First, the upper exponent is bumped up by one if b’s significand is non-zero. To understand why, consider the difference between sampling the intervals \([a, 8)\) and \([a, 8.5)\). In the former case, we will never need to consider an exponent of 3, but for the later case, we must. Second, the algorithm used for sampling exponent must account for whether zero or the denorms are included in \([a,b)\); this corresponds to the differences we saw earlier in how to sample intervals like \([0,2^x)\) versus \([2^x,2^y)\).

float SampleRange(float a, float b) {
    assert(a < b && a >= 0.f && b >= 0.f);
    int ea = Exponent(a), eb = Exponent(b);
    if (Significand(b) != 0) ++eb;
    while (true) {
       int e = (ea == -127) ? SampleToPowerOfTwoExponent(eb) :
                              SampleExponent(ea, eb);
       float v = Float32FromParts(0, e, RandomSignificand());
       if (v >= a && v < b)
           return v;
    }
}

Note that it would probably be worthwhile to handle the special case of matching exponents with a call to SampleSameExponent(), as rejection sampling with a significand that spans the entire power-of-two range will be highly inefficient if the two values are close together.

The worst case for this algorithm comes when b’s significand is small—i.e., b is just past a power-of-two. The upper power-of-two range will be sampled with probability at least 1/2 but then the sampled value v will usually be rejected, requiring another time through the while loop. Conversely, having a just below a power of 2 is less trouble, since the corresponding power-of-two interval is the least likely to be sampled.

Closed Intervals

One nice thing about the algorithm implemented in SampleRange() is that handling closed intervals \([a,b]\) is mostly a matter of updating the if test in the while loop accordingly. The only other difference is that eb is always be increased by one. Thus, the worst case for this version of the algorithm is when b is an exact power of 2, again giving a \(1/2\) chance of selecting the upper interval each time, with a \(1-2^{-23}\) probability of rejecting the sample in that interval.

Further Improvements

Stratified sampling is a topic we didn’t get to today; it is often desirable when one is generating multiple samples over an interval. For a power-of-2 stratification, it’s possible to work backward from the various sampling algorithms to determine constraints on the bit patterns the achieve stratification. I’ll leave the details of that to the reader; Sample01() is a good place to start.

We also haven’t dug into the case of an interval that spans zero. To achieve uniform sampling a similar rejection-based approach is probably needed where given such an interval \([a,b)\) we define an extended interval \([-c,c)\) with \(c=\max (|a|, |b|)\) that encompasses the original interval. We can then randomly select the positive or negative side, generate a sample, and then reject it if it is not inside the original interval. However, the combination of an unbalanced interval that spans zero and also includes an exact power of two at its upper bound gives an even worse worst case: consider a highly unbalanced interval like \([-2^{-100}, 2^{64}]\): we end up with a nearly \(3/4\) chance of rejecting each candidate sample.

Discussion

It was pretty good going through the special cases until we reached the end. Unfortunately, I don’t see a good way to work around the need to do rejection sampling when there are partial power-of-two intervals at the ends of the range. Perhaps that isn’t the worst thing ever, but having an irregular amount of computation afoot is not ideal if high performance on GPUs or using SIMD instructions is of interest.

Nevertheless, a quick benchmark suggests that SampleRange() is only about 2.5 times slower than \((1-t)a+tb\) on my system here if the cost of random number generation is included. If a reasonable amount of computation is performed for each sample, the added cost may be no concern. However, lacking a clear example of a case where this first-class sampling makes a difference in the final results, it’s hard to argue for the added expense in general.

note

Goualard also suggests a sampling algorithm based on a uniform spacing over the interval that is by design not able to generate all possible floating-point values. This algorithm seems to have been previously derived by Artur Grabowski in his random-double library from 2015; see rd_positive() there. ↩