# Binary Search
Binary search is a well-known search algorithm with the fastest worst-case convergence: every iteration eliminates half of all remaining candidate solutions. For root-finding on floats, the bisection method is often treated as synonymous with binary search. We wish to dispel this misconception and explain the precautions taken within pyroot to produce a more faithful binary search, which improves a worst case of several thousand iterations to fewer than 300 iterations (fewer than 150 under the defaults).
Most commonly, root-finding is done in double precision (IEEE 754-2008), which uses 64 bits to represent a floating-point number. If one bit of accuracy could be recovered per iteration, then convergence would be guaranteed in no more than 64 iterations. Achieving this, however, requires an understanding of the structure of a double, which is often neglected in other root-finding software. Because of this, many other implementations can take several thousand iterations to converge, whereas our software is guaranteed to converge in at most 300 iterations, a stark difference.
The format of a double comes in three parts:
- A single sign bit.
- 11 bits for the exponent.
- 52 bits for the mantissa a.k.a. precision or fraction.
Handling the sign bit is easy: if 0 lies in the bracketing interval, check x = 0 first. All remaining iterates will then lie on one side of zero or the other.
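As an illustration, here is a minimal sketch of this sign-bit step (the helper name `split_bracket_at_zero` is hypothetical and not part of pyroot):

```python
def split_bracket_at_zero(f, a, b):
    """If 0 lies inside the bracket [a, b], evaluate f(0) first so that the
    remaining search happens entirely on one side of zero.

    Sketch only: assumes a < b and that f(a), f(b) bracket a sign change.
    """
    if a < 0.0 < b:
        f0 = f(0.0)
        if f0 == 0.0:
            return (0.0, 0.0)  # exact root found at the sign boundary
        # Keep whichever half-interval still brackets a sign change.
        if (f(a) < 0.0) != (f0 < 0.0):
            return (a, 0.0)
        return (0.0, b)
    return (a, b)
```

After this single extra evaluation, every remaining iterate shares one sign, so the exponent and mantissa can be searched without worrying about the sign bit.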
Handling the exponent from there can be done using the geometric mean (GM). In short, a binary search over the exponent bits is conducted until the relative error is sufficiently small, i.e. until x / y > 0.5 with |x| < |y|.
Handling the mantissa from there can be done using the arithmetic mean (AM). In short, a binary search over the mantissa bits is conducted.
We remark that all of this can be avoided at a sufficiently low level, where doubles can be reinterpreted as bits. In that case, it suffices to take the arithmetic mean of the bit patterns interpreted as integers. Because of this, it may be much faster to use integer arithmetic than the case-by-case approach we use below.
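A sketch of this bit-level view (assuming IEEE 754 doubles; this is illustrative, not pyroot's actual code):

```python
import struct

def to_ordered_int(x: float) -> int:
    """Reinterpret a double's bits as an integer whose ordering matches the
    float ordering. Negative doubles are stored sign-magnitude, so their
    bit patterns must be flipped to preserve the ordering."""
    (i,) = struct.unpack("<q", struct.pack("<d", x))
    return i if i >= 0 else -(1 << 63) - i

def from_ordered_int(i: int) -> float:
    """Inverse of to_ordered_int (the transform is an involution)."""
    j = i if i >= 0 else -(1 << 63) - i
    return struct.unpack("<d", struct.pack("<q", j))[0]

def bit_midpoint(x: float, y: float) -> float:
    """Midpoint in 'number of representable doubles' between x and y: one
    step of a true binary search over the float bit patterns."""
    return from_ordered_int((to_ordered_int(x) + to_ordered_int(y)) // 2)
```

Note that `bit_midpoint(1.0, 4.0)` returns 2.0, the geometric mean: averaging bit patterns averages the exponent bits, which is exactly why this single operation subsumes the exponent and mantissa cases below.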
The arithmetic mean, as used in bisection, is guaranteed to halve the absolute error. Its formula is given by
AM(x, y) = 0.5 (x + y)
and is quite easy to use. The pyroot.arithmetic_mean implementation takes additional care to avoid overflow when x and y share the same sign.
Intuitively, the arithmetic mean provides optimal worst-case convergence once the order of magnitude of the root is found.
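A minimal sketch of such an overflow-avoiding arithmetic mean (illustrative only; the actual pyroot.arithmetic_mean may differ):

```python
def arithmetic_mean(x: float, y: float) -> float:
    """Arithmetic mean that avoids spurious overflow.

    The naive 0.5 * (x + y) overflows to inf when x and y are both huge
    with the same sign, even though the mean itself is representable.
    Halving each term first avoids the oversized intermediate sum.
    """
    return 0.5 * x + 0.5 * y
```

(Halving first can cost accuracy for subnormal inputs, which a production implementation would need to special-case.)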
The geometric mean is guaranteed to "halve" the relative error. Its formula is given by
GM(x, y) = sign(x) √(x y), if sign(x) = sign(y)
GM(x, y) = 0, otherwise.
Note that a square root is required here, which, depending on the application, may be a heavy arithmetic operation. We remark, however, that at the bit level the geometric mean may be approximated for floats without any square root involved, making it much more efficient. The pyroot.geometric_mean implementation takes additional care to avoid overflow or underflow when x and y are both very large or both very small.
Intuitively, the geometric mean provides optimal worst-case convergence to the correct order of magnitude.
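A sketch of a sign-aware, overflow-avoiding geometric mean (illustrative; the actual pyroot.geometric_mean may take further precautions):

```python
import math

def geometric_mean(x: float, y: float) -> float:
    """Geometric mean: sign(x) * sqrt(x y) when the signs agree, else 0.

    Computing sqrt(|x|) * sqrt(|y|) instead of sqrt(|x| * |y|) keeps the
    intermediate values near sqrt-scale, so the product cannot overflow
    or underflow when the mean itself is representable.
    """
    if x == 0.0 or y == 0.0 or (x < 0.0) != (y < 0.0):
        return 0.0
    sign = -1.0 if x < 0.0 else 1.0
    return sign * (math.sqrt(abs(x)) * math.sqrt(abs(y)))
```

For example, `geometric_mean(1e300, 1e300)` stays finite even though the product 1e300 * 1e300 would overflow to inf.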
The geometric mean may be understood as an arithmetic mean on a log-scale. The log-log mean may be understood as an arithmetic mean on a log-log-scale, or a geometric mean on a log-scale. Its formula is given by
LM(x, y) = sign(x) exp(GM(log |x|, log |y|)), if sign(x) = sign(y)
LM(x, y) = 0, otherwise.
Even worse than the geometric mean, an exponential function, 2 log functions, and one square root are required for this computation. The upside is that, unlike the arithmetic mean and geometric mean, the log-log mean heuristically gives significantly faster convergence on extremely large intervals.
Intuitively, the log-log mean provides nearly optimal worst-case convergence to the correct order of magnitude while additionally providing faster convergence when the correct order of magnitude is near 1e0.
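A sketch of the log-log mean built from the definitions above (illustrative only; not pyroot's implementation):

```python
import math

def log_log_mean(x: float, y: float) -> float:
    """Log-log mean: sign(x) * exp(GM(log |x|, log |y|)) when the signs
    agree, and 0 otherwise."""
    if x == 0.0 or y == 0.0 or (x < 0.0) != (y < 0.0):
        return 0.0
    sign = -1.0 if x < 0.0 else 1.0
    lx, ly = math.log(abs(x)), math.log(abs(y))
    # Geometric mean of the logs: it is 0 when the logs differ in sign,
    # i.e. when one magnitude lies above 1 and the other below.
    if lx == 0.0 or ly == 0.0 or (lx < 0.0) != (ly < 0.0):
        g = 0.0
    else:
        g = (-1.0 if lx < 0.0 else 1.0) * math.sqrt(lx * ly)
    return sign * math.exp(g)
```

On the extreme bracket (1e-300, 1e300), the logs have opposite signs, so this immediately proposes 1e0, which illustrates why the log-log mean homes in on the correct order of magnitude so quickly on huge intervals.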
The generalized mean is a parameterized generalization of the arithmetic and geometric means. It allows the convergence speed to lie somewhat in-between that of the arithmetic and geometric means. Its formula is given by
GenM(x, y, order=0) = GM(x, y)
GenM(x, y, order=1) = AM(x, y)
GenM(x, y, order=r) = ((x^r + y^r) / 2)^(1/r), otherwise, where x^r is taken as sign(x) |x|^r.
Similar to the log-log mean, the generalized mean is computationally expensive. Although not strictly necessary, test cases suggest that including the generalized mean for 0 < r < 1 helps avoid unnecessary geometric or log-log means when the arithmetic mean is closer, especially when x or y is close to 0 or when they have opposite signs. The pyroot.generalized_mean implementation also avoids overflow or underflow when x and y are both very large or very small.
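A sketch of the generalized mean following the formulas above (illustrative; the overflow and underflow guards of the actual pyroot.generalized_mean are omitted here):

```python
import math

def signed_pow(v: float, p: float) -> float:
    """v^p taken as sign(v) * |v|^p, so negative inputs are allowed."""
    return (-1.0 if v < 0.0 else 1.0) * abs(v) ** p

def generalized_mean(x: float, y: float, order: float = 0.5) -> float:
    """Power mean GenM(x, y, r) = ((x^r + y^r) / 2)^(1/r) with signed
    powers. order=1 recovers the arithmetic mean; order=0 is handled as
    a special case, since the limit r -> 0 is the geometric mean."""
    if order == 0.0:
        if x == 0.0 or y == 0.0 or (x < 0.0) != (y < 0.0):
            return 0.0
        s = -1.0 if x < 0.0 else 1.0
        return s * math.sqrt(abs(x)) * math.sqrt(abs(y))
    m = (signed_pow(x, order) + signed_pow(y, order)) / 2.0
    return signed_pow(m, 1.0 / order)
```

The signed power is what lets the generalized mean remain defined when x and y have opposite signs, unlike the geometric and log-log means, which collapse to 0 in that case.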
The mean used in pyroot.solver is the generalized mean, except for order=0, where the log-log mean is used instead. In most practical cases, this improves the convergence speed. In the worst case, it only costs a few additional iterations.