faster expf128

Paul Zimmermann Paul.Zimmermann@inria.fr
Wed Jun 24 06:22:22 GMT 2020


       Dear Paul,

Thank you for your feedback.

> From: Paul E Murphy <murphyp@linux.ibm.com>
> Date: Mon, 22 Jun 2020 08:59:08 -0500
> 
> On 6/22/20 6:02 AM, Paul Zimmermann wrote:
> > I have written some expf128 for x86_64 that is more than 10 times faster than
> > the current glibc/libquadmath code [1] (see slide 21 of [2]).
> 
> I would highly recommend running the benchmarks against ppc64le or s390x 
> before replacing the existing implementation.  I think it would improve 
> the code to have more explicit separation between implementations 
> optimized for soft and hardfp if performance cannot be rectified.  I 
> think much of the float128 support assumes the underlying machine does 
> not natively support binary128.

I forgot to say that my code is intended mainly for machines that do not
provide hardware float128 support. However, I did compare it with the glibc
expf128 on gcc135.fsffrance.org (ppc64le GNU/Linux, with hardware float128),
and the results are below. You can reproduce them with the code from [1].
My implementation runs in about 27% less time (0.143s instead of 0.195s),
but is slightly less accurate: 999585 instead of 999999 correctly rounded
results out of 1000000, with a maximal error of 1 ulp in both cases. One
caveat, though: I did not find an efficient way to set the inexact flag, so
my code does not set it.

glibc function (with hardware float128):

[zimmerma@gcc135 ~]$ /opt/at12.0/bin/gcc -DUSE_GLIBC -DNO_WARN_X86_INTRINSICS -O3 main.c expf128.c -lm -lmpfr -lgmp
[zimmerma@gcc135 ~]$ ./a.out 
GNU libc version: 2.28
GNU libc release: stable
correct roundings: 999999/1000000 max err=1 ulp(s)
maximal error for
x=-4.2166924211009987727735597908208042e+00
y=1.47473419221889191873789731438093288e-02
z=1.47473419221889191873789731438093303e-02

[zimmerma@gcc135 ~]$ /opt/at12.0/bin/gcc -DTIMINGS -DUSE_GLIBC -DNO_WARN_X86_INTRINSICS -O3 main.c expf128.c -lm -lmpfr -lgmp
[zimmerma@gcc135 ~]$ time ./a.out 
GNU libc version: 2.28
GNU libc release: stable
s=1.09651217175878924483994909720534935e+09

real	0m0.195s
user	0m0.194s
sys	0m0.000s

my implementation:

[zimmerma@gcc135 ~]$ /opt/at12.0/bin/gcc -DNO_WARN_X86_INTRINSICS -O3 main.c expf128.c -lm -lmpfr -lgmp
[zimmerma@gcc135 ~]$ ./a.out 
correct roundings: 999585/1000000 max err=1 ulp(s)
maximal error for
x=-9.88703896394271837099996910948152675e+00
y=5.08292305698879224291515174794000669e-05
z=5.08292305698879224291515174794000728e-05

[zimmerma@gcc135 ~]$ /opt/at12.0/bin/gcc -DTIMINGS -DNO_WARN_X86_INTRINSICS -O3 main.c expf128.c -lm -lmpfr -lgmp
[zimmerma@gcc135 ~]$ time ./a.out 
s=1.09651217175878924483994909720534935e+09

real	0m0.143s
user	0m0.142s
sys	0m0.000s

> > Before making a proper patch for glibc, I'd like to make sure it fits the
> > glibc requirements. In particular, the table size is 16kb. Is that ok?
> > If too large, what table size would be ok?
> 
> I think that is acceptable.  The current tables for expf128 probably 
> aren't much smaller, if I recall correctly.

OK, then I will prepare a patch once glibc 2.32 is out.

Best regards,
Paul

[1] https://homepages.loria.fr/PZimmermann/glibc-contrib/