This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
On 12/11/2017 2:14 AM, Siddhesh Poyarekar wrote:
On Saturday 09 December 2017 04:33 AM, Patrick McGehearty wrote:
+/* IBM exp(x) replaced by following exp(x) in 2017. IBM exp1(x,xx) remains. */
+ Hybrid algorithm of Peter Tang's Table driven method (for large
+ arguments) and an accurate table (for small arguments).
+ Written by K.C. Ng, November 1988.
+ Revised by Patrick McGehearty, Nov 2017 to use j/64 instead of j/32
+ Method (large arguments):
+ 1. Argument Reduction: given the input x, find r and integer k
+ and j such that
+ x = (k+j/64)*(ln2) + r, |r| <= (1/128)*ln2
+ 2. exp(x) = 2^k * (2^(j/64) + 2^(j/64)*expm1(r))
+ a. expm1(r) is approximated by a polynomial:
+ expm1(r) ~ r + t1*r^2 + t2*r^3 + ... + t5*r^6
+ Here t1 = 1/2 exactly.
+ b. 2^(j/64) is represented to twice double precision
+ as TBL[2j]+TBL[2j+1].
+ Note: If divide were fast enough, we could use another approximation
+ in 2.a:
+ expm1(r) ~ (2r)/(2-R), R = r - r^2*(t1 + t2*r^2)
+ (for the same t1 and t2 as above)
+ Special cases:
+ exp(INF) is INF, exp(NaN) is NaN;
+ exp(-INF)= 0;
+ for finite argument, only exp(0)=1 is exact.
+ According to an error analysis, the error is always less than
+ an ulp (unit in the last place). The largest errors observed
+ are less than 0.55 ulp for normal results and less than 0.75 ulp
+ for subnormal results.
+ Misc. info.
+ For IEEE double
+ if x > 7.09782712893383973096e+02 then exp(x) overflow
+ if x < -7.45133219101941108420e+02 then exp(x) underflow. */
Are you planning to work on the log implementation as well?
log, log10, pow, cbrt are on my short list of functions to investigate
as these all show significant performance advantage (1.8x or greater)
using the Solaris Studio libm functions vs the Linux libm functions in
the preliminary testing I did months ago. I intend to take these one
at a time, first with a trial port and extensive accuracy and perf tests.
Assuming the performance advantage applies across multiple platforms
and accuracy does not suffer to an unacceptable degree, I will
propose each in turn for patching. "log" is my next target, likely
with log10 following close behind. I'll also look at the 32 bit functions
to see if they offer similar opportunities, but I haven't written/run those
tests yet. I don't have plans to work on 80 bit or 128 bit (long double)
functions at this time.
As always, the above is not a formal commitment as my management
may redirect efforts to meet other corporate goals.
My background project right now is supporting Vladimir Mezentsev's
work on improving accuracy and range of complex divide.
By range, I mean the range of input values which currently cause
overflow/underflow but don't need to with appropriate scaling.