[PATCH] Inline C99 math functions

Tue Jun 16 16:25:00 GMT 2015

> -----Original Message-----
> From: Joseph Myers [mailto:joseph@codesourcery.com]
> Sent: 15 June 2015 18:01
> To: Wilco Dijkstra
> Cc: GNU C Library
> Subject: RE: [PATCH] Inline C99 math functions
> 
> On Mon, 15 Jun 2015, Wilco Dijkstra wrote:
> 
> > > Where are the benchmarks for this?  Please put them in benchtests so
> > > actual reproducible figures can be given.  That's the standard practice
> > > for any change being justified on the basis of performance.
> >
> > I'll add a benchmark in another patch - it's not trivial as benchtest is not
> > suitable to accurately time very simple functions, especially when inlined...
> 
> Well, the benchmark should come first....

I added a new math-inlines benchmark based on the string benchmark infrastructure.
I used 2x1024 inputs, one 99% finite FP numbers (20% zeroes) and 1% inf/NaN,
and the 2nd with 50% inf, and 50% Nan. Here are the relative timings for Cortex-A57:

__fpclassify_t:	8.76	7.04
fpclassify_t:	4.91	5.17
__isnormal_inl_t:	8.77	7.16
isnormal_t:		3.16	3.17
__finite_inl_t:	1.91	1.91
__finite_t:		15.29	15.28
isfinite_t:		1.28	1.28
__isinf_inl_t:	1.92	2.99
__isinf_t:		8.9	6.17
isinf_t:		1.28	1.28
__isnan_inl_t:	1.91	1.92
__isnan_t:		15.28	15.28
isnan_t:		1	1.01

The plain isnan_t functions use the GCC built-ins, the _inl variant uses the
existing math_private.h inlines (with __isinf fixed to return the sign too),
and the __isnan variants are the non-inline GLIBC functions.

So this clearly shows the GCC built-ins win by a huge margin, including the
inline versions. It also shows that multiple isinf/isnan calls would be faster
than a single inlined fpclassify...

> > > What are the effects on code size (especially for fpclassify)?  If code
> > > becomes larger, conditioning on !defined __OPTIMIZE_SIZE__ should be
> > > considered.
> >
> > Codesize of what? Few applications use these functions... GLIBC mathlib is
> 
> Size of any code calling these macros (for nonconstant arguments).

Well the size of the __isinf_t function is 160 bytes vs isinf_t 84 bytes
due to the callee-save overhead of the function call. The builtin isinf uses
3 instructions inside the loop plus 3 lifted before it, while the call to
__isinf needs 3 plus a lot of code to save/restore the callee-saves.

> > Also I don't think having special inlines that are only used inside
> > GLIBC is a good approach - if the GCC built-ins are not fast enough then
> > we should fix them.
> 
> Yes, we should improve the built-in functions, but first we should
> understand the effects on performance of glibc libm functions (I don't
> know if the existing benchtests cover cases where __finite / __isnan /
> __isinf_ns inlines were used) to see if this cleanup patch indeed doesn't
> significantly harm performance of affected libm functions (and possibly
> improves performance through the changes in cases that wouldn't previously
> have been inlined at all).

A run of the math tests doesn't show up any obvious differences beyond the
usual variations from run to run. I suspect the difference due to inlining
is in the noise for expensive math functions.

Wilco