[PATCH] Inline C99 math functions
Wilco Dijkstra
wdijkstr@arm.com
Tue Jun 16 16:25:00 GMT 2015
> -----Original Message-----
> From: Joseph Myers [mailto:joseph@codesourcery.com]
> Sent: 15 June 2015 18:01
> To: Wilco Dijkstra
> Cc: GNU C Library
> Subject: RE: [PATCH] Inline C99 math functions
>
> On Mon, 15 Jun 2015, Wilco Dijkstra wrote:
>
> > > Where are the benchmarks for this? Please put them in benchtests so
> > > actual reproducible figures can be given. That's the standard practice
> > > for any change being justified on the basis of performance.
> >
> > I'll add a benchmark in another patch - it's not trivial as benchtest is not
> > suitable to accurately time very simple functions, especially when inlined...
>
> Well, the benchmark should come first....
I added a new math-inlines benchmark based on the string benchmark infrastructure.
It uses 2 sets of 1024 inputs: the first is 99% finite FP numbers (20% zeroes) and
1% inf/NaN, the second is 50% inf and 50% NaN. Here are the relative timings for
Cortex-A57:
                    1st input   2nd input
__fpclassify_t:        8.76        7.04
fpclassify_t:          4.91        5.17
__isnormal_inl_t:      8.77        7.16
isnormal_t:            3.16        3.17
__finite_inl_t:        1.91        1.91
__finite_t:           15.29       15.28
isfinite_t:            1.28        1.28
__isinf_inl_t:         1.92        2.99
__isinf_t:             8.9         6.17
isinf_t:               1.28        1.28
__isnan_inl_t:         1.91        1.92
__isnan_t:            15.28       15.28
isnan_t:               1           1.01
The plain variants (e.g. isnan_t) use the GCC built-ins, the _inl variants use
the existing math_private.h inlines (with __isinf fixed to return the sign too),
and the __ variants without _inl (e.g. __isnan_t) call the non-inline GLIBC
functions.
So this clearly shows the GCC built-ins win by a huge margin, even over the
existing inline versions (e.g. isnan_t at 1 vs 1.91 for __isnan_inl_t and 15.28
for __isnan_t). It also shows that multiple isinf/isnan calls would be faster
than a single inlined fpclassify...
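
As a sketch of that last point, code that only needs to separate NaN and inf
from finite values can be written either way. The enum and the function names
below are hypothetical, and whether the second form is actually faster depends
on the target and compiler; the figures above are for Cortex-A57 only.

```c
#include <math.h>

/* Hypothetical result type for illustration.  */
enum special { SP_FINITE = 0, SP_INF = 1, SP_NAN = 2 };

/* One fpclassify, then a switch on the result.  */
static enum special special_fpclassify(double x)
{
  switch (__builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL,
                               FP_SUBNORMAL, FP_ZERO, x))
    {
    case FP_NAN:
      return SP_NAN;
    case FP_INFINITE:
      return SP_INF;
    default:
      return SP_FINITE;
    }
}

/* Separate isnan and isinf tests using the GCC built-ins.  */
static enum special special_separate(double x)
{
  if (__builtin_isnan(x))
    return SP_NAN;
  if (__builtin_isinf(x))
    return SP_INF;
  return SP_FINITE;
}
```

Both functions return the same result for every input, so switching between
them is purely a performance decision.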
> > > What are the effects on code size (especially for fpclassify)? If code
> > > becomes larger, conditioning on !defined __OPTIMIZE_SIZE__ should be
> > > considered.
> >
> > Codesize of what? Few applications use these functions... GLIBC mathlib is
>
> Size of any code calling these macros (for nonconstant arguments).
Well, the __isinf_t function is 160 bytes vs 84 bytes for isinf_t, due to the
callee-save overhead of the function call. The builtin isinf uses
3 instructions inside the loop plus 3 lifted before it, while the call to
__isinf needs 3 plus a lot of code to save/restore the callee-saves.
> > Also I don't think having special inlines that are only used inside
> > GLIBC is a good approach - if the GCC built-ins are not fast enough then
> > we should fix them.
>
> Yes, we should improve the built-in functions, but first we should
> understand the effects on performance of glibc libm functions (I don't
> know if the existing benchtests cover cases where __finite / __isnan /
> __isinf_ns inlines were used) to see if this cleanup patch indeed doesn't
> significantly harm performance of affected libm functions (and possibly
> improves performance through the changes in cases that wouldn't previously
> have been inlined at all).
A run of the math tests doesn't show up any obvious differences beyond the
usual variations from run to run. I suspect the difference due to inlining
is in the noise for expensive math functions.
Wilco