[PATCH 00/17] Add more CORE-MATH implementations to libm
Adhemerval Zanella
adhemerval.zanella@linaro.org
Fri Oct 25 18:21:38 GMT 2024
Following the tgammaf implementation (392b3f0971764) and its significant
performance improvement, I worked with Paul Zimmermann to check whether
more CORE-MATH routines can be integrated into glibc.
This patchset adds optimized, correctly rounded implementations of
exp10m1f, exp2m1f, expm1f, log10f, log2p1f, log1pf, and log10p1f. I
also added a benchmark for each function to evaluate the implementations.
I tested the implementations on recent hardware (Ryzen 9 5900X for
x86_64, Ampere/Neoverse for aarch64, and POWER10 for powerpc), and
most of them show impressive performance improvements. Like the
implementations from ARM optimized-routines, the CORE-MATH ones take
advantage of recent ISA and platform support (such as FMA and
rounding instructions, along with FP throughput).
For two of the functions, exp10m1f and exp2m1f, CORE-MATH shows
slightly worse performance on x86_64-v1. This is because the generic
glibc implementation calls the optimized exp10f/exp2f; when a more
recent ISA level is used (x86_64-v2 or x86_64-v3), CORE-MATH performs
better than the current implementation. For both functions I added
ifunc support to use FMA on x86_64.
Adhemerval Zanella (17):
math: Add e_gammaf_r to glibc code and style
benchtests: Add exp10m1f benchmark
benchtests: Add exp2m1f benchmark
benchtests: Add expm1f benchmark
benchtests: Add log10f benchmark
benchtests: Add log2p1f benchmark
benchtests: Add log1pf benchmark
benchtests: Add log10p1f benchmark
math: Use exp10m1f from CORE-MATH
math: Use exp2m1f from CORE-MATH
math: Use expm1f from CORE-MATH
math: Use log10f from CORE-MATH
math: Use log2p1f from CORE-MATH
math: Use log1pf from CORE-MATH
math: Use log10p1f from CORE-MATH
x86_64: Add exp10m1f with FMA
x86_64: Add exp2m1f with FMA
SHARED-FILES | 16 +
benchtests/Makefile | 7 +
benchtests/exp10m1f-inputs | 2389 ++++++++++++++
benchtests/exp2m1f-inputs | 2388 ++++++++++++++
benchtests/expm1f-inputs | 799 +++++
benchtests/log10f-inputs | 1005 ++++++
benchtests/log10p1f-inputs | 2888 +++++++++++++++++
benchtests/log1pf-inputs | 1005 ++++++
benchtests/log2p1f-inputs | 2888 +++++++++++++++++
sysdeps/aarch64/libm-test-ulps | 29 +-
sysdeps/alpha/fpu/libm-test-ulps | 12 -
sysdeps/arc/fpu/libm-test-ulps | 25 -
sysdeps/arc/nofpu/libm-test-ulps | 7 -
sysdeps/arm/libm-test-ulps | 31 +-
sysdeps/csky/fpu/libm-test-ulps | 12 -
sysdeps/csky/nofpu/libm-test-ulps | 12 -
sysdeps/hppa/fpu/libm-test-ulps | 28 -
sysdeps/i386/fpu/e_log10f.S | 66 -
sysdeps/i386/fpu/libm-test-ulps | 25 -
sysdeps/i386/fpu/s_expm1f.S | 112 -
sysdeps/i386/fpu/s_log1pf.S | 66 -
.../i386/i686/fpu/multiarch/libm-test-ulps | 25 -
sysdeps/ieee754/flt-32/e_gammaf_r.c | 178 +-
sysdeps/ieee754/flt-32/e_log10f.c | 196 +-
sysdeps/ieee754/flt-32/s_exp10m1f.c | 227 ++
sysdeps/ieee754/flt-32/s_exp2m1f.c | 194 ++
sysdeps/ieee754/flt-32/s_expm1f.c | 232 +-
sysdeps/ieee754/flt-32/s_log10p1f.c | 182 ++
sysdeps/ieee754/flt-32/s_log1pf.c | 271 +-
sysdeps/ieee754/flt-32/s_log2p1f.c | 248 ++
.../math_errf.c => ieee754/flt-32/w_log1pf.c} | 0
sysdeps/loongarch/lp64/libm-test-ulps | 28 -
sysdeps/m68k/coldfire/fpu/libm-test-ulps | 6 -
sysdeps/m68k/m680x0/fpu/libm-test-ulps | 12 -
sysdeps/m68k/m680x0/fpu/w_log1pf.c | 20 +
sysdeps/microblaze/libm-test-ulps | 3 -
sysdeps/mips/mips32/libm-test-ulps | 28 -
sysdeps/mips/mips64/libm-test-ulps | 28 -
sysdeps/nios2/libm-test-ulps | 3 -
sysdeps/or1k/fpu/libm-test-ulps | 4 -
sysdeps/or1k/nofpu/libm-test-ulps | 12 -
sysdeps/powerpc/fpu/libm-test-ulps | 29 +-
sysdeps/powerpc/nofpu/libm-test-ulps | 28 -
sysdeps/riscv/nofpu/libm-test-ulps | 16 -
sysdeps/riscv/rvd/libm-test-ulps | 28 -
sysdeps/s390/fpu/libm-test-ulps | 28 -
sysdeps/sh/libm-test-ulps | 6 -
sysdeps/sparc/fpu/libm-test-ulps | 28 -
sysdeps/x86_64/fpu/libm-test-ulps | 29 +-
sysdeps/x86_64/fpu/multiarch/Makefile | 4 +
sysdeps/x86_64/fpu/multiarch/s_exp10m1f-fma.c | 4 +
sysdeps/x86_64/fpu/multiarch/s_exp10m1f.c | 33 +
sysdeps/x86_64/fpu/multiarch/s_exp2m1f-fma.c | 4 +
sysdeps/x86_64/fpu/multiarch/s_exp2m1f.c | 33 +
54 files changed, 14873 insertions(+), 1104 deletions(-)
create mode 100644 benchtests/exp10m1f-inputs
create mode 100644 benchtests/exp2m1f-inputs
create mode 100644 benchtests/expm1f-inputs
create mode 100644 benchtests/log10f-inputs
create mode 100644 benchtests/log10p1f-inputs
create mode 100644 benchtests/log1pf-inputs
create mode 100644 benchtests/log2p1f-inputs
delete mode 100644 sysdeps/i386/fpu/e_log10f.S
delete mode 100644 sysdeps/i386/fpu/s_expm1f.S
delete mode 100644 sysdeps/i386/fpu/s_log1pf.S
create mode 100644 sysdeps/ieee754/flt-32/s_exp10m1f.c
create mode 100644 sysdeps/ieee754/flt-32/s_exp2m1f.c
create mode 100644 sysdeps/ieee754/flt-32/s_log10p1f.c
create mode 100644 sysdeps/ieee754/flt-32/s_log2p1f.c
rename sysdeps/{m68k/m680x0/fpu/math_errf.c => ieee754/flt-32/w_log1pf.c} (100%)
create mode 100644 sysdeps/m68k/m680x0/fpu/w_log1pf.c
create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp10m1f-fma.c
create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp10m1f.c
create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp2m1f-fma.c
create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp2m1f.c
--
2.43.0