This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH 1/9 v3] Optimized generic expf and exp2f with wrappers
- From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
- To: GNU C Library <libc-alpha at sourceware dot org>
- Cc: nd at arm dot com
- Date: Wed, 20 Sep 2017 14:22:09 +0100
- References: <59C1123F.9080003@arm.com> <59C11310.5000004@arm.com>
On 19/09/17 13:52, Szabolcs Nagy wrote:
> Based on new expf and exp2f code from
> https://github.com/ARM-software/optimized-routines/
>
> with the new expf benchmark (with wrapper, aarch64):
> reciprocal-throughput: 2.3x faster
> latency: 1.7x faster
> with the new expf benchmark (without wrapper, aarch64):
> reciprocal-throughput: 3.3x faster
> latency: 1.7x faster
> with naive ubenchmark (without wrapper, aarch64):
> reciprocal-throughput: 3.7x faster
> latency: 1.3x faster
With the committed exp2f benchmark the speedup is:
reciprocal-throughput: 2.8x faster
latency: 1.3x faster
> libm.so size on aarch64:
> .text size: -152 bytes
> .rodata size: -1740 bytes
> expf/exp2f worst case nearest rounding error: 0.502 ulp
> worst case non-nearest rounding error: 1 ulp
>
> Error checks are inline and errno setting is in separate
> tail called functions, but the wrappers are kept in this
> patch to handle the _LIB_VERSION==_SVID_ case. (So e.g.
> errno is set twice for expf calls and once for __expf_finite
> calls now on targets where the new code is used.)
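A minimal sketch of the "inline checks, out-of-line errno setting" structure described above. All names (sketch_oflowf, sketch_uflowf, sketch_core, sketch_expf) are hypothetical and are not the identifiers used in the patch. The fast path is a deliberately simple stand-in (a short Taylor series followed by repeated squaring), not the patch's algorithm, and the SVID wrapper layer is left out.

```c
#include <errno.h>
#include <math.h>   /* HUGE_VALF, INFINITY */

/* Hypothetical cold helpers: kept out of line so the hot path
   carries only the comparisons, not the errno store.  */
__attribute__((noinline)) static float sketch_oflowf(void)
{
  errno = ERANGE;
  return HUGE_VALF;
}

__attribute__((noinline)) static float sketch_uflowf(void)
{
  errno = ERANGE;
  return 0.0f;
}

/* Stand-in for the fast path (NOT the patch's algorithm):
   exp(x) = exp(x / 2^20)^(2^20), with a cubic Taylor polynomial.  */
static double sketch_core(double x)
{
  double r = x / 1048576.0;
  double p = 1.0 + r * (1.0 + r * (0.5 + r / 6.0));
  for (int i = 0; i < 20; i++)
    p *= p;
  return p;
}

float sketch_expf(float x)
{
  if (x != x)                    /* NaN: propagate */
    return x;
  if (x > 0x1.62e42ep6f)         /* x > log(2^128), about 88.72 */
    {
      if (x == INFINITY)
        return x;                /* exact result, no error */
      return sketch_oflowf();    /* tail call into the cold helper */
    }
  if (x < -0x1.9fe368p6f)        /* x < log(2^-150), about -103.97 */
    {
      if (x == -INFINITY)
        return 0.0f;             /* exact result, no error */
      return sketch_uflowf();    /* tail call into the cold helper */
    }
  return (float) sketch_core(x); /* common case: errno untouched */
}
```

In this layout the error branches are plain comparisons that the compiler can place off the hot path, and the errno store costs nothing for in-range inputs.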
>
> Double precision arithmetic is used, which is expected
> to be faster on most targets (including soft-float) than
> using single precision, and it is easier to get a
> precise result with it.
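To illustrate the point about doing the work in double and rounding once to float, here is a rough sketch of a table-based exp2f. Everything in it is an assumption made for illustration: the names, the 4-entry table, and the Taylor coefficients are hypothetical and much cruder than what the patch uses. It handles only finite inputs whose result is a normal float (roughly -126 <= x < 128); NaN, infinities, overflow and underflow are not handled.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_N 4   /* hypothetical table size, must be a power of two */

/* 2^(i/SKETCH_N) for i = 0..3, as doubles.  */
static const double sketch_tab[SKETCH_N] = {
  1.0, 1.189207115002721, 1.4142135623730951, 1.681792830507429
};

static uint64_t sketch_bits(double d)
{
  uint64_t u;
  memcpy(&u, &d, sizeof u);
  return u;
}

static double sketch_double(uint64_t u)
{
  double d;
  memcpy(&d, &u, sizeof d);
  return d;
}

float sketch_exp2f(float x)
{
  double xd = x;                        /* widen once, compute in double */
  double z = xd * SKETCH_N;
  /* Round z to the nearest integer k (ties away from zero).  */
  int64_t k = (int64_t) (z + (z < 0 ? -0.5 : 0.5));
  double r = xd - (double) k / SKETCH_N;    /* |r| <= 1/(2*SKETCH_N) */

  /* 2^(k/N) = 2^q * 2^(i/N) with k = q*N + i and 0 <= i < N.
     Unsigned arithmetic keeps this well defined for negative k.  */
  uint64_t i = (uint64_t) k % SKETCH_N;
  uint64_t t = sketch_bits(sketch_tab[i]) + (((uint64_t) k - i) << 50);
  double s = sketch_double(t);          /* adds q to the exponent field */

  /* Taylor coefficients of 2^r (ln2^n / n!), not minimax ones.  */
  double p = 1.0 + r * (0.6931471805599453
           + r * (0.2402265069591007
           + r * (0.05550410866482158
           + r * (0.009618129107628477
           + r * 0.0013333558146428443))));

  return (float) (s * p);               /* single final rounding to float */
}
```

Because the intermediate result carries far more precision than a float, the only rounding that matters for the final answer is the last conversion; a single precision version would accumulate error in every step of the polynomial.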
>
> Const data is kept in a separate translation unit which
> complicates maintenance a bit, but is expected to give
> good code for literal loads on most targets and allows
> sharing data across expf, exp2f and powf. (This data is
> disabled on i386 and ia64 which have their own expf, exp2f
> and powf code.)
>
> Some configuration is in a new math_config.h; the settings
> may need further discussion.
>
> Some details may need target specific tweaks:
> - the best convert and round to int operation in the arg
> reduction may be different across targets.
> - the code was optimized on an fma target; the optimal
> polynomial evaluation may be different without fma.
> - gcc does not always generate good code for fp bit
> representation access via unions, or such access may be
> inherently slow on some targets.
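For readers unfamiliar with the operations these tweaks refer to, the following sketch shows two ways to read the bits of a float and two ways to round a double to an integer without a dedicated instruction. The function names are hypothetical and do not correspond to the roundtoint/converttoint definitions in the patch. The shift-based rounding assumes IEEE double, the default round-to-nearest mode, |x| < 2^51, and a compiler that does not reassociate floating-point arithmetic (so no -ffast-math).

```c
#include <stdint.h>
#include <string.h>

/* Bit access through a union (type punning).  */
static inline uint32_t asuint_union(float f)
{
  union { float f; uint32_t i; } u = { f };
  return u.i;
}

/* Bit access through memcpy; compilers usually turn this into a move.  */
static inline uint32_t asuint_memcpy(float f)
{
  uint32_t i;
  memcpy(&i, &f, sizeof i);
  return i;
}

static inline uint64_t asuint64_memcpy(double d)
{
  uint64_t i;
  memcpy(&i, &d, sizeof i);
  return i;
}

/* Round to integer by adding and subtracting 1.5 * 2^52: the addition
   forces the fraction bits to be rounded away in the current rounding
   mode.  Returns the rounded value and stores the integer in *ki.  */
static inline double round_by_shift(double x, int64_t *ki)
{
  const double shift = 0x1.8p52;
  double kd = x + shift;
  *ki = (int64_t) (asuint64_memcpy(kd) - asuint64_memcpy(shift));
  return kd - shift;
}

/* Round to nearest via conversion, ties away from zero.  This does not
   depend on the rounding mode, but may be slower on some targets.  */
static inline int64_t round_by_convert(double x)
{
  return (int64_t) (x + (x < 0 ? -0.5 : 0.5));
}
```

The shift variant follows the current rounding mode, so under a directed rounding mode it no longer rounds to nearest and the reduced argument can fall in a wider interval; the conversion variant always rounds to nearest but costs a float-to-int conversion.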
>
> The libm-test-ulps will need adjustment because:
> - The argument reduction ideally uses nearest rounded rint,
> but that is not efficient on most targets, so the polynomial
> can get evaluated on a wider interval in non-nearest
> rounding mode making 1 ulp errors common in that case.
> - The polynomial is evaluated such that it has 1 ulp error
> on negative tiny inputs with upward rounding, but in
> exchange the evaluation is better pipelined.
>
> v3:
> - Add sysdeps/m68k/m680x0/fpu/{math_errf/e_exp2f_data}.c
>
> 2017-09-19 Szabolcs Nagy <szabolcs.nagy@arm.com>
>
> * math/Makefile (type-float-routines): Add math_errf and e_exp2f_data.
> * sysdeps/aarch64/fpu/math_private.h (TOINT_INTRINSICS): Define.
> (roundtoint, converttoint): Likewise.
> * sysdeps/ieee754/flt-32/e_expf.c: New implementation.
> * sysdeps/ieee754/flt-32/e_exp2f.c: New implementation.
> * sysdeps/ieee754/flt-32/e_exp2f_data.c: New file.
> * sysdeps/ieee754/flt-32/math_config.h: New file.
> * sysdeps/ieee754/flt-32/math_errf.c: New file.
> * sysdeps/ieee754/flt-32/t_exp2f.h: Remove.
> * sysdeps/i386/fpu/e_exp2f_data.c: New file.
> * sysdeps/i386/fpu/math_errf.c: New file.
> * sysdeps/ia64/fpu/e_exp2f_data.c: New file.
> * sysdeps/ia64/fpu/math_errf.c: New file.
> * sysdeps/m68k/m680x0/fpu/e_exp2f_data.c: New file.
> * sysdeps/m68k/m680x0/fpu/math_errf.c: New file.
>
>