This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Inline C99 math functions
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: Wilco Dijkstra <wdijkstr at arm dot com>
- Cc: 'Joseph Myers' <joseph at codesourcery dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Tue, 16 Jun 2015 18:40:20 +0200
- Subject: Re: [PATCH] Inline C99 math functions
- Authentication-results: sourceware.org; auth=none
- References: <001201d0a75b$921d9860$b658c920$ at com> <alpine dot DEB dot 2 dot 10 dot 1506151431490 dot 26683 at digraph dot polyomino dot org dot uk> <001701d0a789$f2ab86f0$d80294d0$ at com> <alpine dot DEB dot 2 dot 10 dot 1506151654100 dot 26683 at digraph dot polyomino dot org dot uk> <001801d0a84c$8c5cd7a0$a51686e0$ at com>
On Tue, Jun 16, 2015 at 04:53:11PM +0100, Wilco Dijkstra wrote:
>
>
> > -----Original Message-----
> > From: Joseph Myers [mailto:joseph@codesourcery.com]
> > Sent: 15 June 2015 18:01
> > To: Wilco Dijkstra
> > Cc: GNU C Library
> > Subject: RE: [PATCH] Inline C99 math functions
> >
> > On Mon, 15 Jun 2015, Wilco Dijkstra wrote:
> >
> > > > Where are the benchmarks for this? Please put them in benchtests so
> > > > actual reproducible figures can be given. That's the standard practice
> > > > for any change being justified on the basis of performance.
> > >
> > > I'll add a benchmark in another patch - it's not trivial as benchtest is not
> > > suitable to accurately time very simple functions, especially when inlined...
> >
> > Well, the benchmark should come first....
>
> I added a new math-inlines benchmark based on the string benchmark infrastructure.
> I used 2x1024 inputs, one 99% finite FP numbers (20% zeroes) and 1% inf/NaN,
> and the 2nd with 50% inf, and 50% Nan. Here are the relative timings for Cortex-A57:
>
Where is benchmark, there are several things that could go wrong with it.
> __fpclassify_t: 8.76 7.04
> fpclassify_t: 4.91 5.17
> __isnormal_inl_t: 8.77 7.16
> isnormal_t: 3.16 3.17
Where did you get inline? I couldn't find it anywhere. Also such big
number for inline implementation is suspect
> __finite_inl_t: 1.91 1.91
> __finite_t: 15.29 15.28
> isfinite_t: 1.28 1.28
> __isinf_inl_t: 1.92 2.99
> __isinf_t: 8.9 6.17
> isinf_t: 1.28 1.28
> __isnan_inl_t: 1.91 1.92
> __isnan_t: 15.28 15.28
> isnan_t: 1 1.01
>
> The plain isnan_t functions use the GCC built-ins, the _inl variant uses the
> existing math_private.h inlines (with __isinf fixed to return the sign too),
> and the __isnan variants are the non-inline GLIBC functions.
>
> So this clearly shows the GCC built-ins win by a huge margin, including the
> inline versions.
That looks bit suspect, submit a benchmark to see if its correct or not.
> It also shows that multiple isinf/isnan calls would be faster
> than a single inlined fpclassify...
>
No, thats completely wrong. When you look on assembly when using __builtin_isnan
then its identical to one of (on x64 but I doubt that arm gcc is worse)
__builtin_fpclassify (FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, FP_ZERO, x),0) == FP_NAN
So removing fpclassify wouldn't in better case didn't change performance
at all, in worse one it would harm it due to duplicated checks.
> > > > What are the effects on code size (especially for fpclassify)? If code
> > > > becomes larger, conditioning on !defined __OPTIMIZE_SIZE__ should be
> > > > considered.
> > >
> > > Codesize of what? Few applications use these functions... GLIBC mathlib is
> >
> > Size of any code calling these macros (for nonconstant arguments).
>
> Well the size of the __isinf_t function is 160 bytes vs isinf_t 84 bytes
> due to the callee-save overhead of the function call. The builtin isinf uses
> 3 instructions inside the loop plus 3 lifted before it, while the call to
> __isinf needs 3 plus a lot of code to save/restore the callee-saves.
>
> > > Also I don't think having special inlines that are only used inside
> > > GLIBC is a good approach - if the GCC built-ins are not fast enough then
> > > we should fix them.
> >
> > Yes, we should improve the built-in functions, but first we should
> > understand the effects on performance of glibc libm functions (I don't
> > know if the existing benchtests cover cases where __finite / __isnan /
> > __isinf_ns inlines were used) to see if this cleanup patch indeed doesn't
> > significantly harm performance of affected libm functions (and possibly
> > improves performance through the changes in cases that wouldn't previously
> > have been inlined at all).
>
> A run of the math tests doesn't show up any obvious differences beyond the
> usual variations from run to run. I suspect the difference due to inlining
> is in the noise for expensive math functions.
>
Look at complex math, these use it. For real math you need to pick
specific inputs to trigger unlikely path that uses isinf...