This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] v11 Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
On 14/02/18 16:41, Joseph Myers wrote:
> On Tue, 13 Feb 2018, Patrick McGehearty wrote:
>> Any thoughts on general principles on how to decide which patch
>> to accept, given both seem much better than the existing code?
>
> My understanding would be that Szabolcs intends (as per
> <https://sourceware.org/ml/libc-alpha/2018-02/msg00061.html>) to eliminate
> rounding mode changes from the present exp, and possibly make other
> speedups there. Then the final result of such speedups would need
i won't have a patch that keeps the current algorithm and just
removes the rounding mode change.
i'm now trying various approaches doing exp with < 0.51 worst case
ulp error and < 0.2% misroundings, and < 4K table size (i think these
are reasonable parameters and the proposed exp is similar too)
i think the rounding mode change can be eliminated from such an exp
with 1.0 ulp worst case non-nearest-rounding error (or maybe 2).
i haven't yet dealt with the subnormal range: it seems the proposed
exp and my prototype one both have about 0.75 ulp error when the
result is between 0x1p-1023 and 0x1p-1022 (because the polynomial
has one rounding and then the final scaling does another rounding
right at the next bit; this can be fixed by doing the final add
of the polynomial differently, i'm not yet sure if it's worth fixing)
and i haven't yet looked at __exp1 (which should probably be moved to
e_pow.c and if it can share tables with exp then that should be in a
separate file), but i think it should be possible to do similarly.
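
The double rounding in the 0x1p-1023 to 0x1p-1022 range mentioned above can be reproduced with a small sketch. It assumes the unscaled result has the form 1 + p with 0 <= p < 1 and is then scaled by 2^-1023; the function names are hypothetical, and the "add 2.0 instead of 1.0" variant is just one possible way of doing the final add differently, not necessarily what either implementation would use.

```c
#include <assert.h>
#include <math.h>

/* Naive: 1 + p is rounded to 53 bits, then the scaling into the
   subnormal range rounds again to 52 bits (two roundings).  */
double scale_naive(double p)
{
    assert(p >= 0.0 && p < 1.0);
    volatile double y = 1.0 + p;     /* first rounding */
    return ldexp(y, -1023);          /* second rounding */
}

/* Alternative: in [2, 4) the spacing of doubles is 2^-51, which matches
   the subnormal grid after scaling by 2^-1023.  So 2 + p rounds once at
   the right bit, and the subtraction and scaling are then exact.  */
double scale_single_rounding(double p)
{
    assert(p >= 0.0 && p < 1.0);
    volatile double z = 2.0 + p;     /* the only rounding */
    return ldexp(z - 1.0, -1023);    /* exact */
}
```

With `p = 0x1p-51 + 0x1p-52 - 0x1p-60` the exact value lies just below a midpoint of the subnormal grid, so the correct result is `ldexp(1.0 + 0x1p-51, -1023)`. The naive version first rounds up onto the midpoint and then ties to even, giving `ldexp(1.0 + 0x1p-50, -1023)` instead.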
> comparing with a version of your patch that also eliminates rounding mode
> changes (and updates libm-test-ulps expectations for other functions in
> non-default rounding modes as needed to avoid introducing failures). It
> would be best to have a precise statement of what "both my throughput and
> latency benchmarks" are in
> <https://sourceware.org/ml/libc-alpha/2018-02/msg00061.html>, to make sure
> there is a common basis of comparison so we can see if it's really the
> case that one version is faster on some architectures and another on other
> architectures, or whether different people are measuring different things.