This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH v3] Add math-inline benchmark
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Wilco Dijkstra <wdijkstr at arm dot com>
- Cc: 'GNU C Library' <libc-alpha at sourceware dot org>
- Date: Thu, 23 Jul 2015 22:59:13 +0200
- Subject: Re: [PATCH v3] Add math-inline benchmark
- Authentication-results: sourceware.org; auth=none
- References: <003601d0c546$b559d3d0$200d7b70$ at com>
On Thu, Jul 23, 2015 at 01:54:27PM +0100, Wilco Dijkstra wrote:
> Add a benchmark for isinf/isnan/isnormal/isfinite/fpclassify. The test uses 2 arrays with 1024
> doubles, one with 99% finite FP numbers (10% zeroes, 10% negative) and 1% inf/NaN, the other with
> 50% inf, and 50% NaN.
> This version removes various tests that caused confusion and only leaves the existing GLIBC
> definitions and inlines for comparison with the GCC builtins. I changed the tests to not inline
> inside the loop and use a branch on the boolean result. The 64-bit immediates used by the GLIBC
> inlines seem very expensive on some microarchitectures, so this shows even more clearly that using
> the built-ins results in a significant performance gain (see x64 results below).
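For reference, the setup described in the quoted message could be sketched roughly as follows. This is not the code from the patch: the helper names (fill_mostly_finite, fill_inf_nan, check_isinf, count_inf) and the exact value pattern are assumptions made for illustration, and keeping the check out of line is only one possible reading of "not inline inside the loop".

```c
#include <math.h>
#include <stddef.h>

#define N 1024

/* Hypothetical helper: mostly finite input.  Every 100th entry is
   alternately inf or NaN (about 1%), entries with i % 10 == 0 are zero
   (about 10%) and entries with i % 10 == 5 are negative (about 10%).  */
static void
fill_mostly_finite (double *buf, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (i % 100 == 99)
        buf[i] = (i / 100) % 2 ? NAN : INFINITY;
      else if (i % 10 == 0)
        buf[i] = 0.0;
      else if (i % 10 == 5)
        buf[i] = -(double) i;
      else
        buf[i] = (double) i;
    }
}

/* Hypothetical helper: 50% inf (even indices), 50% NaN (odd indices).  */
static void
fill_inf_nan (double *buf, size_t n)
{
  for (size_t i = 0; i < n; i++)
    buf[i] = i % 2 ? NAN : INFINITY;
}

/* Kept out of line so the check is not inlined into the loop body.  */
__attribute__ ((noinline)) static int
check_isinf (double x)
{
  return __builtin_isinf (x) != 0;
}

/* Branch on the boolean result instead of accumulating it.  */
static size_t
count_inf (const double *buf, size_t n)
{
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    if (check_isinf (buf[i]))
      hits++;
  return hits;
}
```

With N = 1024 the first array holds 5 infinities and 5 NaNs, so the branch is rarely taken; in the second array it is taken for every other element.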
That's better, but still not OK.
First, we need to make explicit what the benchmark is for. You argue both
that the benchmark is only there to show that inlining provides a speedup
and that it justifies your claims. You cannot have it both ways.
If you decide to write a benchmark just to show that inlining is better
than not inlining, then write a simple benchmark that compares the current
noinline implementation with the builtin inlines. I would be OK with just
that, with a comment that it shouldn't be used as justification for claims about
That's the simpler way if you really just want to show that builtins are
better than the noninline version.
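A minimal sketch of such a simple comparison is below. It is an illustration under assumptions, not the benchmark from the patch: isinf_noinline is a hypothetical stand-in for an out-of-line libm call, the function names are invented here, and clock() is used only to keep the example portable.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for an out-of-line isinf; not the glibc function.  */
__attribute__ ((noinline)) static int
isinf_noinline (double x)
{
  return __builtin_isinf (x) != 0;
}

/* Count infinities through the out-of-line call.  The elapsed CPU time
   in seconds is stored in *secs.  The volatile pointer forces a load
   on every iteration.  */
static size_t
bench_noinline (const volatile double *p, size_t n, size_t iters,
                double *secs)
{
  size_t hits = 0;
  clock_t start = clock ();
  for (size_t j = 0; j < iters; j++)
    for (size_t i = 0; i < n; i++)
      if (isinf_noinline (p[i]))
        hits++;
  *secs = (double) (clock () - start) / CLOCKS_PER_SEC;
  return hits;
}

/* Same loop, but with the GCC builtin expanded in place.  */
static size_t
bench_builtin (const volatile double *p, size_t n, size_t iters,
               double *secs)
{
  size_t hits = 0;
  clock_t start = clock ();
  for (size_t j = 0; j < iters; j++)
    for (size_t i = 0; i < n; i++)
      if (__builtin_isinf (p[i]))
        hits++;
  *secs = (double) (clock () - start) / CLOCKS_PER_SEC;
  return hits;
}
```

Both functions must return the same count for the same input; only the two timings differ. A comparison like this says nothing about how the builtins fare against the current inlines.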
If you also want to make claims about builtins versus the current inlines,
then you need to write a benchmark that can accurately measure both inlines
and builtins to see which claims are correct and which are wrong. That's
quite hard, as I said, because it depends on how gcc optimizes the
implementation.
There are still unresolved issues from the previous patches.
First, you still test on x64 without getting EXTRACT_WORDS64
from math_private. Without that you don't measure
Also, for isinf you omitted the current isinf inline, which is faster than
the isinf_ns and builtin variants that you do check.
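To make the kind of inline under discussion concrete, here is a sketch of an integer-based isinf check. It is not the glibc source: double_bits is a hypothetical stand-in for EXTRACT_WORDS64 written with memcpy, the names isinf_bits and isinf_ns_bits are invented for this example, and IEEE 754 binary64 doubles are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for EXTRACT_WORDS64: copy out the bit pattern
   of a double.  Assumes IEEE 754 binary64.  */
static inline uint64_t
double_bits (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

/* Sign-reporting variant: 1 for +inf, -1 for -inf, 0 otherwise.  */
static inline int
isinf_bits (double d)
{
  uint64_t u = double_bits (d);
  if ((u & UINT64_C (0x7fffffffffffffff)) != UINT64_C (0x7ff0000000000000))
    return 0;
  return (u >> 63) ? -1 : 1;
}

/* Variant without sign information: 1 for either infinity, else 0.  */
static inline int
isinf_ns_bits (double d)
{
  return (double_bits (d) & UINT64_C (0x7fffffffffffffff))
         == UINT64_C (0x7ff0000000000000);
}
```

How the bits are extracted matters for the measurement, since the compiler may turn memcpy, a union, or an architecture-specific macro into different instruction sequences.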
Then the remainder test is wrong. It is inlined, but kernel_standard isn't.
Since you wanted to use it to measure the performance of a noninline
function, it obviously doesn't measure that.
When you fix that, it clearly shows that the current inlines are better on