This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Inline C99 math functions

From: OndÅej BÃlka <neleai at seznam dot cz>
To: Wilco Dijkstra <wdijkstr at arm dot com>
Cc: 'Joseph Myers' <joseph at codesourcery dot com>, GNU C Library <libc-alpha at sourceware dot org>
Date: Tue, 16 Jun 2015 18:40:20 +0200
Subject: Re: [PATCH] Inline C99 math functions
Authentication-results: sourceware.org; auth=none
References: <001201d0a75b$921d9860$b658c920$ at com> <alpine dot DEB dot 2 dot 10 dot 1506151431490 dot 26683 at digraph dot polyomino dot org dot uk> <001701d0a789$f2ab86f0$d80294d0$ at com> <alpine dot DEB dot 2 dot 10 dot 1506151654100 dot 26683 at digraph dot polyomino dot org dot uk> <001801d0a84c$8c5cd7a0$a51686e0$ at com>

On Tue, Jun 16, 2015 at 04:53:11PM +0100, Wilco Dijkstra wrote:
> 
> 
> > -----Original Message-----
> > From: Joseph Myers [mailto:joseph@codesourcery.com]
> > Sent: 15 June 2015 18:01
> > To: Wilco Dijkstra
> > Cc: GNU C Library
> > Subject: RE: [PATCH] Inline C99 math functions
> > 
> > On Mon, 15 Jun 2015, Wilco Dijkstra wrote:
> > 
> > > > Where are the benchmarks for this?  Please put them in benchtests so
> > > > actual reproducible figures can be given.  That's the standard practice
> > > > for any change being justified on the basis of performance.
> > >
> > > I'll add a benchmark in another patch - it's not trivial as benchtest is not
> > > suitable to accurately time very simple functions, especially when inlined...
> > 
> > Well, the benchmark should come first....
> 
> I added a new math-inlines benchmark based on the string benchmark infrastructure.
> I used 2x1024 inputs, one 99% finite FP numbers (20% zeroes) and 1% inf/NaN,
> and the 2nd with 50% inf, and 50% Nan. Here are the relative timings for Cortex-A57:
> 
Where is benchmark, there are several things that could go wrong with it.

> __fpclassify_t:	8.76	7.04
> fpclassify_t:	4.91	5.17

> __isnormal_inl_t:	8.77	7.16
> isnormal_t:		3.16	3.17

Where did you get inline? I couldn't find it anywhere. Also such big
number for inline implementation is suspect

> __finite_inl_t:	1.91	1.91
> __finite_t:		15.29	15.28
> isfinite_t:		1.28	1.28
> __isinf_inl_t:	1.92	2.99
> __isinf_t:		8.9	6.17
> isinf_t:		1.28	1.28
> __isnan_inl_t:	1.91	1.92
> __isnan_t:		15.28	15.28
> isnan_t:		1	1.01
> 
> The plain isnan_t functions use the GCC built-ins, the _inl variant uses the
> existing math_private.h inlines (with __isinf fixed to return the sign too),
> and the __isnan variants are the non-inline GLIBC functions.
> 
> So this clearly shows the GCC built-ins win by a huge margin, including the
> inline versions.
That looks bit suspect, submit a benchmark to see if its correct or not.


>  It also shows that multiple isinf/isnan calls would be faster
> than a single inlined fpclassify...
> 
No, thats completely wrong. When you look on assembly when using __builtin_isnan
then its identical to one of (on x64 but I doubt that arm gcc is worse)

__builtin_fpclassify (FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, FP_ZERO, x),0) == FP_NAN

So removing fpclassify wouldn't in better case didn't change performance
at all, in worse one it would harm it due to duplicated checks.


> > > > What are the effects on code size (especially for fpclassify)?  If code
> > > > becomes larger, conditioning on !defined __OPTIMIZE_SIZE__ should be
> > > > considered.
> > >
> > > Codesize of what? Few applications use these functions... GLIBC mathlib is
> > 
> > Size of any code calling these macros (for nonconstant arguments).
> 
> Well the size of the __isinf_t function is 160 bytes vs isinf_t 84 bytes
> due to the callee-save overhead of the function call. The builtin isinf uses
> 3 instructions inside the loop plus 3 lifted before it, while the call to
> __isinf needs 3 plus a lot of code to save/restore the callee-saves.
> 
> > > Also I don't think having special inlines that are only used inside
> > > GLIBC is a good approach - if the GCC built-ins are not fast enough then
> > > we should fix them.
> > 
> > Yes, we should improve the built-in functions, but first we should
> > understand the effects on performance of glibc libm functions (I don't
> > know if the existing benchtests cover cases where __finite / __isnan /
> > __isinf_ns inlines were used) to see if this cleanup patch indeed doesn't
> > significantly harm performance of affected libm functions (and possibly
> > improves performance through the changes in cases that wouldn't previously
> > have been inlined at all).
> 
> A run of the math tests doesn't show up any obvious differences beyond the
> usual variations from run to run. I suspect the difference due to inlining
> is in the noise for expensive math functions.
>
Look at complex math, these use it. For real math you need to pick
specific inputs to trigger unlikely path that uses isinf...

Follow-Ups:
- RE: [PATCH] Inline C99 math functions
  - From: Wilco Dijkstra

References:
- [PATCH] Inline C99 math functions
  - From: Wilco Dijkstra
- Re: [PATCH] Inline C99 math functions
  - From: Joseph Myers
- RE: [PATCH] Inline C99 math functions
  - From: Wilco Dijkstra
- RE: [PATCH] Inline C99 math functions
  - From: Joseph Myers
- RE: [PATCH] Inline C99 math functions
  - From: Wilco Dijkstra

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]