[PATCH 00/17] Add more CORE-MATH on libm

Fri Oct 25 18:21:38 GMT 2024

Following the tgammaf implementation (392b3f0971764) and its notable
performance improvement, I worked with Paul Zimmermann to check whether
we can integrate more routines into glibc.

This patchset adds the optimized and correctly rounded exp10m1f,
exp2m1f, expm1f, log10f, log2p1f, log1pf, and log10p1f. I also added
a benchmark to evaluate each implementation.

I tested the implementations on recent hardware (Ryzen 9 5900X for
x86_64, Ampere/Neoverse for aarch64, and POWER10 for powerpc), and
most of them show impressive performance improvements. Like the
implementations from ARM optimized routines, the CORE-MATH ones take
advantage of recent ISA and platform support (like fma and rounding
instructions, along with FP throughput).

For a couple of implementations, exp10m1f and exp2m1f, CORE-MATH
shows slightly worse performance on x86_64-v1. This is because the
glibc generic implementation calls the optimized exp10f/exp2f. When a
more recent ISA is used (x86_64-v2 or x86_64-v3), CORE-MATH shows
better results than the current implementation. For both functions I
added ifunc support to use FMA on x86_64.
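
As a rough illustration of the dispatch idea only (not the code in this
series), the sketch below selects between a generic and an FMA build of
the same kernel at run time. The names kernel_generic, kernel_fma, and
select_kernel are hypothetical, and a plain function pointer stands in
for the ifunc resolver that glibc uses. It assumes GCC or a compatible
compiler; on targets other than x86_64 it always picks the generic
variant.

```c
typedef float (*kernel_fn) (float, float, float);

/* Generic variant: separate multiply and add (hypothetical kernel).  */
static float
kernel_generic (float a, float b, float c)
{
  return a * b + c;
}

#if defined (__GNUC__) && defined (__x86_64__)
# define HAVE_FMA_VARIANT 1
/* FMA variant: same computation, built with FMA enabled for this
   function only, so the compiler may emit a fused multiply-add.  */
__attribute__ ((target ("fma")))
static float
kernel_fma (float a, float b, float c)
{
  return __builtin_fmaf (a, b, c);
}
#endif

/* Stand-in for an ifunc resolver: pick the FMA variant only when the
   running CPU reports FMA support, otherwise fall back to generic.  */
kernel_fn
select_kernel (void)
{
#ifdef HAVE_FMA_VARIANT
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("fma"))
    return kernel_fma;
#endif
  return kernel_generic;
}
```

A caller would fetch the pointer once, for example
"kernel_fn k = select_kernel ();", and then call k (a, b, c). With a
real ifunc the selection happens once at symbol resolution time, so
callers pay no per-call check.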

Adhemerval Zanella (17):
  math: Add e_gammaf_r to glibc code and style
  benchtests: Add exp10m1f benchmark
  benchtests: Add exp2m1f benchmark
  benchtests: Add expm1f benchmark
  benchtests: Add log10f benchmark
  benchtests: Add log2p1f benchmark
  benchtests: Add log1p benchmark
  benchtests: Add log10p1f benchmark
  math: Use exp10m1f from CORE-MATH
  math: Use exp2m1f from CORE-MATH
  math: Use expm1f from CORE-MATH
  math: Use log10f from CORE-MATH
  math: Use log2p1f from CORE-MATH
  math: Use log1pf from CORE-MATH
  math: Use log10p1f from CORE-MATH
  x86_64: Add exp10m1f with FMA
  x86_64: Add exp2m1f with FMA

 SHARED-FILES                                  |   16 +
 benchtests/Makefile                           |    7 +
 benchtests/exp10m1f-inputs                    | 2389 ++++++++++++++
 benchtests/exp2m1f-inputs                     | 2388 ++++++++++++++
 benchtests/expm1f-inputs                      |  799 +++++
 benchtests/log10f-inputs                      | 1005 ++++++
 benchtests/log10p1f-inputs                    | 2888 +++++++++++++++++
 benchtests/log1pf-inputs                      | 1005 ++++++
 benchtests/log2p1f-inputs                     | 2888 +++++++++++++++++
 sysdeps/aarch64/libm-test-ulps                |   29 +-
 sysdeps/alpha/fpu/libm-test-ulps              |   12 -
 sysdeps/arc/fpu/libm-test-ulps                |   25 -
 sysdeps/arc/nofpu/libm-test-ulps              |    7 -
 sysdeps/arm/libm-test-ulps                    |   31 +-
 sysdeps/csky/fpu/libm-test-ulps               |   12 -
 sysdeps/csky/nofpu/libm-test-ulps             |   12 -
 sysdeps/hppa/fpu/libm-test-ulps               |   28 -
 sysdeps/i386/fpu/e_log10f.S                   |   66 -
 sysdeps/i386/fpu/libm-test-ulps               |   25 -
 sysdeps/i386/fpu/s_expm1f.S                   |  112 -
 sysdeps/i386/fpu/s_log1pf.S                   |   66 -
 .../i386/i686/fpu/multiarch/libm-test-ulps    |   25 -
 sysdeps/ieee754/flt-32/e_gammaf_r.c           |  178 +-
 sysdeps/ieee754/flt-32/e_log10f.c             |  196 +-
 sysdeps/ieee754/flt-32/s_exp10m1f.c           |  227 ++
 sysdeps/ieee754/flt-32/s_exp2m1f.c            |  194 ++
 sysdeps/ieee754/flt-32/s_expm1f.c             |  232 +-
 sysdeps/ieee754/flt-32/s_log10p1f.c           |  182 ++
 sysdeps/ieee754/flt-32/s_log1pf.c             |  271 +-
 sysdeps/ieee754/flt-32/s_log2p1f.c            |  248 ++
 .../math_errf.c => ieee754/flt-32/w_log1pf.c} |    0
 sysdeps/loongarch/lp64/libm-test-ulps         |   28 -
 sysdeps/m68k/coldfire/fpu/libm-test-ulps      |    6 -
 sysdeps/m68k/m680x0/fpu/libm-test-ulps        |   12 -
 sysdeps/m68k/m680x0/fpu/w_log1pf.c            |   20 +
 sysdeps/microblaze/libm-test-ulps             |    3 -
 sysdeps/mips/mips32/libm-test-ulps            |   28 -
 sysdeps/mips/mips64/libm-test-ulps            |   28 -
 sysdeps/nios2/libm-test-ulps                  |    3 -
 sysdeps/or1k/fpu/libm-test-ulps               |    4 -
 sysdeps/or1k/nofpu/libm-test-ulps             |   12 -
 sysdeps/powerpc/fpu/libm-test-ulps            |   29 +-
 sysdeps/powerpc/nofpu/libm-test-ulps          |   28 -
 sysdeps/riscv/nofpu/libm-test-ulps            |   16 -
 sysdeps/riscv/rvd/libm-test-ulps              |   28 -
 sysdeps/s390/fpu/libm-test-ulps               |   28 -
 sysdeps/sh/libm-test-ulps                     |    6 -
 sysdeps/sparc/fpu/libm-test-ulps              |   28 -
 sysdeps/x86_64/fpu/libm-test-ulps             |   29 +-
 sysdeps/x86_64/fpu/multiarch/Makefile         |    4 +
 sysdeps/x86_64/fpu/multiarch/s_exp10m1f-fma.c |    4 +
 sysdeps/x86_64/fpu/multiarch/s_exp10m1f.c     |   33 +
 sysdeps/x86_64/fpu/multiarch/s_exp2m1f-fma.c  |    4 +
 sysdeps/x86_64/fpu/multiarch/s_exp2m1f.c      |   33 +
 54 files changed, 14873 insertions(+), 1104 deletions(-)
 create mode 100644 benchtests/exp10m1f-inputs
 create mode 100644 benchtests/exp2m1f-inputs
 create mode 100644 benchtests/expm1f-inputs
 create mode 100644 benchtests/log10f-inputs
 create mode 100644 benchtests/log10p1f-inputs
 create mode 100644 benchtests/log1pf-inputs
 create mode 100644 benchtests/log2p1f-inputs
 delete mode 100644 sysdeps/i386/fpu/e_log10f.S
 delete mode 100644 sysdeps/i386/fpu/s_expm1f.S
 delete mode 100644 sysdeps/i386/fpu/s_log1pf.S
 create mode 100644 sysdeps/ieee754/flt-32/s_exp10m1f.c
 create mode 100644 sysdeps/ieee754/flt-32/s_exp2m1f.c
 create mode 100644 sysdeps/ieee754/flt-32/s_log10p1f.c
 create mode 100644 sysdeps/ieee754/flt-32/s_log2p1f.c
 rename sysdeps/{m68k/m680x0/fpu/math_errf.c => ieee754/flt-32/w_log1pf.c} (100%)
 create mode 100644 sysdeps/m68k/m680x0/fpu/w_log1pf.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp10m1f-fma.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp10m1f.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp2m1f-fma.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp2m1f.c

-- 
2.43.0