-
Notifications
You must be signed in to change notification settings - Fork 12
Description
I chatted with Andrew at POPL and he asked me to file this issue, about a floating-point arithmetic trick that isn't verified/available in VCFloat today.
It is sometimes useful to know that the product of two floating-point numbers, or of a floating-point number and an integer, is exact if the numbers have many zeros at the start/end. For example, if you have a multiplication 3 * x, you know that 3 only has the last two bits set, so if x ends in a zero bit, the multiplication is exact. This trick is prominently used in math libraries, in a few ways.
One common use case is in Cody-Waite range reduction. This algorithm computes x % PI (or another constant) in several steps:
- First, divide
xbyPIand cast to integerk - Subtract
PI * k - To handle rounding error in
PI, have another constantPI_2(so thatPI_2is real PI minus float PI), and subtractPI_2 * k
This algorithms works only if the product PI * k is exact; otherwise, that step introduces a larger error than the PI_2 * k correction. To guarantee this, the PI constant doesn't use all 53 bits; instead, it uses, say, only 33 bits, and then k can be as large as 20 bits meaning that it works for |x| < 2^20 * PI. There's a tradeoff between the number of bits used for the PI constants and the number of bits used for k, and the implementor can choose values based on their desired performance tradeoffs for different inputs.
That case uses exact integer-float multiplication, but another use case of this exact multiplication is float-float. Consider acos in fdlibm, where (in the else branch) we compute:
s = sqrt(z);
df = s;
__LO(df) = 0;
c = (z-df*df)/(s+df);
Here the s and c variables are a double-double representation of the square root of z. To compute this, we first compute s directly and then correct for the error by squaring it, subtracting from z, and rescaling. But first, we define df which drops the lower half of s, so that the squaring step is done exactly. Then z - df*df is also exact through Sterbenz subtraction, which is important to preserving the error. If the multiply isn't exact, the error in df * df is as large as z - df*df, meaning that c is basically totally useless.
I'm not sure what it would take to add exact multiplication to VCFloat, but a POPL'18 paper on verifying math libraries tracks bit widths of various values and uses that as a verification condition for exact multiplies, perhaps something similar is possible.