The transcendental functions in glibc were written by IBM as part of their ultimate math library implementation. The code seems to be based on [SGBB] and [AZ], but there are some differences in the paper and the implementation in code. Since the code has not been adequately commented, someone relatively new to this code has to start from scratch, trying to understand the rationale behind various constants chosen or even the choice of some operations. This document is a (WIP) description of findings to make things easier. The document also details some findings that may be interesting for future improvements.
The exp function
The exp function is executed in two phases, the first being implemented as described in [SGBB] (well, almost) and the second one being a multiprecision calculation as suggested in [AZ].
Overall the logic differentiates between the following domains of input values:
Exponents less than 2-55 and numbers larger than 748. The former is approximated to 0 while the larger is bound to overflow.
From -748 to -708.000977, 708.000977 to 748 and everything in between, excluding the range between -2-55 and 2-55. The core algorithm is largely the same for these domains, with some minor differences near the end of the algorithm.
The core algorithm
[SGBB] says that the accurate tables are built with:
Xi = i/512 - ei
fi = exp(Xi)
However, the code has two levels of tables called coarse and fine (defined in uexp.tbl) and it seems to have a fairly sophisticated hash function to index into it. We look at the table lookup later.
The mathematical basis for the computation (from [SGBB]):
exp(x) = 2(x/ln(2)) = 2bexp * exp(t)
- t = x - bexp * ln(2),
bexp = INT(x / ln(2))
Now, x/ln(2) = x * log2(e), so we can simply calculate n as:
bexp = INT(x * log2(e))
- t = x - bexp * ln(2),
In the code, rounding is done by adding 1.5*252 and subtracting it again. Also to maintain accuracy of computation in t, it is broken down as follows:
t = x - bexp * ln1(2) - bexp * ln2(2)
where ln1(2) and ln2(2) are two components of ln(2), with ln1(2) having its 10 least significant bits zeroed out. The result is that x - bexp * ln,,1,,(2) is accurate. The term:
bexp * ln2(2)
is used as a correction term.
Now we massage the above equation to get the computation to a combination of a table lookup and a small enough value that can be approximated:
- Let i = INT(t*512)
and z = t - i/512 + ei
Hence, exp(x) = 2bexp * fi * exp(z) .......................[I]
While the formula above holds true, we have a two level table as opposed to the single table specified in the paper. As a result, the index into the tables are different, as shall be explained later. Also, the computation in the actual code does not proceed this way at all. Hence, this is the point where you toss the papers away and start looking only at the code.
t is now rounded to 2-18 (assigned to 'base') by adding and subtracting 1.5*234. The rounded t is used for getting the index into the accurate tables, as we shall see later. The remaining value, (i.e. t - base) is nothing but the z as described in the paper, except that it is used differently.
SGBB describes the final formula as:
exp(x) = 2bexp * fi * (1 + p(z) - eps - eps * p(z)) where:
exp(x) = 2bexp * fi * (1 + p(z))
can be deduced from [I] if exp(z) is approximated as a polynomial (1 + p(z)), adding in eps to adjust for the error. fi is obtained using our two level tables in our case.
However, what seems to be implemented is essentially:
exp(x) = 2bexp * fi * (1 + p(z - eps))
It's not clear what the advantage of that is or if it is equivalent.
The polynomial used is:
p(del) = p3 * del3 + p2 * del2 + del where del = z - eps
Once this is calculated, it only needs to be multiplied with the looked up value. This is where we come to the table lookup.
The y with 1.5*234 added is taken as input to index into two tables coarse and fine. The lower half of the number has all of the mantissa after rounding. The higher 23 bits are used to index into the coarse table while the lower 9 bits are used to index into the fine table. If the indexes are i and j respectively, the value is then obtained as:
(coarsei + coarsei+1) * (finei + finei+1)
The final value is multiplied by a binexp, which is essentially bexp computed in a strange manner. I'm not sure why the author decided to take the x * log2e+1.5*252 and take the lower 12 bytes to get the exponent. It basically amounts to the same thing as bexp within the operating bounds, unless I'm missing something.
TODO: Large negative numbers have some additional processing.
An error bound is calculated as the precision lost when computing fi * (1 + p(del)). If the error bound is effectively 0, i.e. it does not affect the output of the computation, then we have the final value and we can return. if not, the accurate phase is run in __slowexp(), which is a multiprecision phase.
The pow function
The pow function essentially computes xy as e(y * ln(x)). The first key point here is that it is very different from the crlibm pow_rn, which uses 2(y * log,,2,,(x)). The design decision seems to be to reuse the exp code to do exponentiation. There is an e_exp2.c, but it doesn't seem to have been part of the mathlib that the IBM folks wrote, so the intention must have been to avoid having to generate another lookup table. This also means that there might be a good reason to look at the possibility of using base 2 and use the exp2 tables we have available since we could massage the equation further to separate out the mantissa and exponent of the inputs, similar to the way crlibm works.
The __exp1 function used in the pow code is a replica of the exp function, with adjustments for the error bound and the delta to the input value.
The code works in three phases, with two different log functions and the third being the multiprecision accurate phase.
TODO: Describe log functions.
The first phase calculates the log of x and multiplies it with y. It sends the result to __exp1. If __exp1 is not able to calculate the result within the error bounds computed from the input error bounds (from log) and its internal bounds, it returns -10. The next phase calculates a more accurate log, resulting in a smaller error bound and sends it to __exp1. If even this fails, the accurate phase is called, i.e. the infamous slowpow.
The problem with this is the error bound for the log functions. The maximum error bound in the log functions is of the order of 10-21 (~ 2-70) and that is too small compared to the error bounds in the exp function for whom the calculated correction is of the order of 2-54. This is because the table value deltas (that form the major factor in the correction, along with the eps) is of the order of 2-32. eps is also of the order of 2-32.
Two things can be concluded due to this: Firstly, the second phase is useless since the smaller error bound even more useless than the first and secondly, the accuracy of the function pretty much depends on the accuracy of the exp() function.
[SGBB] An Accurate Elementary Mathematical Library for the IEEE Floating Point Standard
- - Shmuel Gal and Boris Bachelis, IBM Israel Science and Technology and Scientific Center
[AZ] Fast Evaluation of Elementary Mathematical Functions with Correctly Rounded Last Bit
- - Abraham Ziv, IBM Israel