This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 1/9 v3] Optimized generic expf and exp2f with wrappers


On 19/09/17 13:52, Szabolcs Nagy wrote:
> Based on new expf and exp2f code from
> https://github.com/ARM-software/optimized-routines/
> 
> with the new expf benchmark (with wrapper, aarch64):
> reciprocal-throughput: 2.3x faster
> latency: 1.7x faster
> with the new expf benchmark (without wrapper, aarch64):
> reciprocal-throughput: 3.3x faster
> latency: 1.7x faster
> with naive ubenchmark (without wrapper, aarch64):
> reciprocal-throughput: 3.7x faster
> latency: 1.3x faster

With the committed exp2f benchmark the results are:

reciprocal-throughput: 2.8x faster
latency: 1.3x faster

> libm.so size on aarch64:
> .text size: -152 bytes
> .rodata size: -1740 bytes
> expf/exp2f worst case nearest rounding error: 0.502 ulp
> worst case non-nearest rounding error: 1 ulp
> 
> Error checks are inline and errno setting is in separate
> tail-called functions, but the wrappers are kept in this
> patch to handle the _LIB_VERSION==_SVID_ case.  (So, e.g.,
> errno is set twice for expf calls and once for __expf_finite
> calls now on targets where the new code is used.)
> 
> Double precision arithmetic is used, which is expected
> to be faster on most targets (including soft-float) than
> single precision, and it makes a precise result easier
> to obtain.
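
To illustrate why double precision makes the accuracy target easier: a double carries 53 significand bits against 24 for float, so rounding errors inside the polynomial are far below one float ulp and only the final conversion matters. The sketch below is an assumption-laden toy, not the patch's polynomial: sketch_exp_small is a hypothetical name, it uses a plain degree-4 Taylor expansion, and it is only meant for small inputs (|r| <= 0.01 assumed).

```c
#include <math.h>

/* Hypothetical helper: exp(r) for small |r| (assumed |r| <= 0.01).
   All intermediate arithmetic is done in double; the result is
   rounded to float exactly once, at the return.  */
float
sketch_exp_small (float r)
{
  double z = r;
  /* Horner form of 1 + z + z^2/2 + z^3/6 + z^4/24.  */
  double p = 1.0 + z * (1.0 + z * (0.5 + z * (1.0 / 6 + z * (1.0 / 24))));
  return (float) p;
}
```

On the assumed interval the truncation error of the degree-4 expansion is below 1e-12, so the float result is determined almost entirely by the single final rounding.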
> 
> Const data is kept in a separate translation unit which
> complicates maintenance a bit, but is expected to give
> good code for literal loads on most targets and allows
> sharing data across expf, exp2f and powf. (This data is
> disabled on i386 and ia64 which have their own expf, exp2f
> and powf code.)
> 
> Some configuration is in a new math_config.h; the settings
> may need further discussion.
> 
> Some details may need target-specific tweaks:
> - the best convert and round to int operation in the arg
> reduction may differ across targets.
> - the code was optimized on an fma target; the optimal
> polynomial evaluation may be different without fma.
> - gcc does not always generate good code for fp bit
> representation access via unions, or such access may be
> inherently slow on some targets.
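
Two of the points above can be made concrete with a small sketch. It shows bit-representation access through a union and through memcpy (compilers may generate different code for each), and a generic round-to-integer fallback using the well-known add-and-subtract shift trick for targets without a suitable instruction. All names here are hypothetical and are not the roundtoint/converttoint definitions from the patch; the shift trick assumes the default round-to-nearest mode, |x| < 2^31, and no -ffast-math or excess-precision evaluation.

```c
#include <stdint.h>
#include <string.h>

/* Bit pattern of a float via a union.  */
static inline uint32_t
asuint_union (float f)
{
  union { float f; uint32_t i; } u = { f };
  return u.i;
}

/* Same result via memcpy; some compilers handle this form better.  */
static inline uint32_t
asuint_memcpy (float f)
{
  uint32_t i;
  memcpy (&i, &f, sizeof i);
  return i;
}

/* Bit pattern of a double.  */
static inline uint64_t
asuint64_sketch (double d)
{
  uint64_t i;
  memcpy (&i, &d, sizeof i);
  return i;
}

/* Hypothetical generic fallback: round x to the nearest integer.
   Adding 0x1.8p52 pushes the fraction bits out of the significand;
   the integer then sits in the low bits of the representation.
   Returns the rounded value as a double and stores it as an int.  */
static inline double
roundtoint_shift (double x, int32_t *k)
{
  const double shift = 0x1.8p52;
  double kd = x + shift;
  /* Low 32 bits hold the two's complement integer (the conversion
     to int32_t wraps modulo 2^32 on common compilers).  */
  *k = (int32_t) (uint32_t) asuint64_sketch (kd);
  return kd - shift;
}
```

A target with a fast round-and-convert instruction would presumably replace roundtoint_shift with an intrinsic, which is the kind of tweak the list above anticipates.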
> 
> The libm-test-ulps will need adjustment because:
> - The argument reduction ideally uses nearest rounded rint,
> but that is not efficient on most targets, so the polynomial
> can get evaluated on a wider interval in non-nearest
> rounding mode making 1 ulp errors common in that case.
> - The polynomial is evaluated such that it has 1 ulp error
> on negative tiny inputs with upward rounding, but in
> exchange the evaluation is better pipelined.
> 
> v3:
> - Add sysdeps/m68k/m680x0/fpu/{math_errf/e_exp2f_data}.c
> 
> 2017-09-19  Szabolcs Nagy  <szabolcs.nagy@arm.com>
> 
> 	* math/Makefile (type-float-routines): Add math_errf and e_exp2f_data.
> 	* sysdeps/aarch64/fpu/math_private.h (TOINT_INTRINSICS): Define.
> 	(roundtoint, converttoint): Likewise.
> 	* sysdeps/ieee754/flt-32/e_expf.c: New implementation.
> 	* sysdeps/ieee754/flt-32/e_exp2f.c: New implementation.
> 	* sysdeps/ieee754/flt-32/e_exp2f_data.c: New file.
> 	* sysdeps/ieee754/flt-32/math_config.h: New file.
> 	* sysdeps/ieee754/flt-32/math_errf.c: New file.
> 	* sysdeps/ieee754/flt-32/t_exp2f.h: Remove.
> 	* sysdeps/i386/fpu/e_exp2f_data.c: New file.
> 	* sysdeps/i386/fpu/math_errf.c: New file.
> 	* sysdeps/ia64/fpu/e_exp2f_data.c: New file.
> 	* sysdeps/ia64/fpu/math_errf.c: New file.
> 	* sysdeps/m68k/m680x0/fpu/e_exp2f_data.c: New file.
> 	* sysdeps/m68k/m680x0/fpu/math_errf.c: New file.
> 
> 

