This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
RE: [PATCH] Inline C99 math functions
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: "'Joseph Myers'" <joseph at codesourcery dot com>
- Cc: "GNU C Library" <libc-alpha at sourceware dot org>
- Date: Wed, 17 Jun 2015 18:03:08 +0100
- Subject: RE: [PATCH] Inline C99 math functions
> Joseph Myers wrote:
> On Tue, 16 Jun 2015, Wilco Dijkstra wrote:
>
> > > Well, the benchmark should come first....
> >
> > I added a new math-inlines benchmark based on the string benchmark
> > infrastructure.
>
> Thanks. I await the patch submission.
See https://sourceware.org/ml/libc-alpha/2015-06/msg00569.html
> > So this clearly shows the GCC built-ins win by a huge margin, including the
> > inline versions. It also shows that multiple isinf/isnan calls would be faster
>
> That's interesting information - suggesting that changes in GCC to use
> integer arithmetic should be conditional on -fsignaling-nans, if doing the
> operations by integer arithmetic is slower (at least on this processor).
>
> (It also suggests it's safe to remove the existing glibc-internal inlines
> as part of moving to using the built-in functions when possible.)
Indeed. To check which sequence is better we'd need to write a more realistic
benchmark, maybe one based on a GLIBC function that uses these functions in its
hot path.
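
For reference, the two kinds of sequence being compared can be sketched as
below. This is only an illustration: the names isinf_int, isnan_int and
isinf_fp are hypothetical (they are not from the patch or from GLIBC), and the
integer versions assume a 64-bit IEEE-754 double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: double is the 64-bit IEEE-754 binary64 format.  */
static_assert (sizeof (double) == sizeof (uint64_t),
               "sketch expects a 64-bit double");

/* Bit pattern of +Inf with the sign bit shifted out.  */
#define INF_BITS_NO_SIGN (UINT64_C (0x7ff0000000000000) << 1)

/* Hypothetical helper: isinf using integer arithmetic only.
   No floating-point comparison is performed on x.  */
static inline int
isinf_int (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  /* Infinity: exponent all ones, mantissa zero, either sign.  */
  return (bits << 1) == INF_BITS_NO_SIGN;
}

/* Hypothetical helper: isnan using integer arithmetic only.  */
static inline int
isnan_int (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  /* NaN: exponent all ones, mantissa nonzero.  */
  return (bits << 1) > INF_BITS_NO_SIGN;
}

/* Floating-point variant: leave the sequence to the GCC built-in.  */
static inline int
isinf_fp (double x)
{
  return __builtin_isinf (x) != 0;
}
```

The integer versions never compare x as a floating-point value, which is why
they are the ones of interest under -fsignaling-nans. Which variant is faster
on a given processor is exactly what the benchmark would have to measure.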
> > > > Codesize of what? Few applications use these functions... GLIBC mathlib is
> > >
> > > Size of any code calling these macros (for nonconstant arguments).
> >
> > Well the size of the __isinf_t function is 160 bytes vs isinf_t 84 bytes
> > due to the callee-save overhead of the function call. The builtin isinf uses
> > 3 instructions inside the loop plus 3 lifted before it, while the call to
> > __isinf needs 3 plus a lot of code to save/restore the callee-saves.
>
> One might suppose that most functions using these macros contain other
> function calls as well, and so that the callee-save overhead should not be
> included in the comparison.
That may be true in some cases, but if you can tailcall (which might be possible
in several math veneers) then the callee-save savings would apply.
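
A rough sketch of the kind of veneer I have in mind, with made-up names
(special_case, finite_case, isinf_call, veneer_with_call and veneer_inline are
all hypothetical and only stand in for real GLIBC code):

```c
/* Hypothetical stand-ins for the real work; noinline keeps the calls
   as real calls so the two veneers can be compared.  */
__attribute__ ((noinline)) static double
special_case (double x)
{
  return x + x;                 /* Inf stays Inf.  */
}

__attribute__ ((noinline)) static double
finite_case (double x)
{
  return x * x + 1.0;
}

/* Stand-in for an out-of-line classification call such as __isinf.  */
__attribute__ ((noinline)) static int
isinf_call (double x)
{
  return __builtin_isinf (x) != 0;
}

/* x has to stay live across the isinf_call call, so the veneer needs a
   callee-saved register (or a stack slot) and the matching save/restore.  */
double
veneer_with_call (double x)
{
  if (isinf_call (x))
    return special_case (x);
  return finite_case (x);
}

/* With the test inlined, both remaining calls are in tail position, so
   the compiler may be able to emit plain branches with no save/restore.  */
double
veneer_inline (double x)
{
  if (__builtin_isinf (x))
    return special_case (x);
  return finite_case (x);
}
```

Whether the tail calls are actually emitted depends on the target and the
optimization level, so compare the generated assembly of the two veneers
rather than relying on this sketch.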
> When you exclude callee-save overhead, how do things compare for
> fpclassify (the main case where inlining may be questionable when
> optimizing for size)?
Well, in the worst-case scenario where you need all 5 tests of fpclassify, it
effectively changes a single-instruction call into 16 instructions plus 2 double
immediates. So it is best to check __OPTIMIZE_SIZE__ and not inline fpclassify
when optimizing for size for now, and revisit when the GCC implementation has
been improved. I also wonder what the difference would be once I've optimized
the __fpclassify implementation - I can do it in about 8-9 instructions.
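
As a sketch of what that gating could look like (my_fpclassify and
my_fpclassify_call are hypothetical names, the out-of-line version is just a
straightforward integer implementation assuming a 64-bit IEEE-754 double, and
it is not the optimized __fpclassify mentioned above):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Assumption: double is the 64-bit IEEE-754 binary64 format.  */
static_assert (sizeof (double) == sizeof (uint64_t),
               "sketch expects a 64-bit double");

/* Hypothetical out-of-line fallback using integer arithmetic only.  */
__attribute__ ((noinline)) int
my_fpclassify_call (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  uint64_t exponent = (bits >> 52) & 0x7ff;
  uint64_t mantissa = bits & UINT64_C (0x000fffffffffffff);

  if (exponent == 0)
    return mantissa == 0 ? FP_ZERO : FP_SUBNORMAL;
  if (exponent == 0x7ff)
    return mantissa == 0 ? FP_INFINITE : FP_NAN;
  return FP_NORMAL;
}

/* Inline via the GCC built-in unless optimizing for size (-Os defines
   __OPTIMIZE_SIZE__), in which case keep the single call.  */
#if defined __GNUC__ && !defined __OPTIMIZE_SIZE__
# define my_fpclassify(x) \
  __builtin_fpclassify (FP_NAN, FP_INFINITE, FP_NORMAL, \
                        FP_SUBNORMAL, FP_ZERO, (x))
#else
# define my_fpclassify(x) my_fpclassify_call (x)
#endif
```

Building a caller with and without -Os and comparing its size shows the
tradeoff directly: one call instruction versus the expanded sequence of tests.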
Wilco