This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Add math-inline benchmark
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Wilco Dijkstra <wdijkstr at arm dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Thu, 9 Jul 2015 14:44:54 +0200
- Subject: Re: [PATCH] Add math-inline benchmark
- Authentication-results: sourceware.org; auth=none
- References: <001c01d0a912$42357710$c6a06530$ at com> <20150622083657 dot GA3684 at domone> <000701d0b7fb$0f27b840$2d7728c0$ at com>
On Mon, Jul 06, 2015 at 03:50:11PM +0100, Wilco Dijkstra wrote:
>
>
> > Ondřej Bílka wrote:
> > But once latency is hidden by using the argument first, suddenly even
> > isnan and isnormal become regressions.
> >
> > for (i = 0; i < n; i++){ res += 3*sin(p[i] * 2.0); \
> > if (func (p[i] * 2.0)) res += 5;} \
> >
> >
> > __fpclassify_test2_t: 92929.4 37256.8
> > __fpclassify_test1_t: 94020.1 35512.1
> > __fpclassify_t: 17321.2 13325.1
> > fpclassify_t: 8021.29 4376.89
> > __isnormal_inl2_t: 93896.9 38941.8
> > __isnormal_inl_t: 98069.2 46140.4
> > isnormal_t: 94775.6 36941.8
> > __finite_inl_t: 84059.9 38304
> > __finite_t: 96052.4 45998.2
> > isfinite_t: 93371.5 36659.1
> > __isinf_inl_t: 92532.9 36050.1
> > __isinf_t: 95929.4 46585.2
> > isinf_t: 93290.1 36445.6
> > __isnan_inl_t: 82760.7 37452.2
> > __isnan_t: 98064.6 45338.8
> > isnan_t: 93386.7 37786.4
>
> Can you try this with:
>
> for (i = 0; i < n; i++) \
> { double tmp = p[i] * 2.0; \
> if (sin(tmp) < 1.0) res++; if (func (tmp)) res += 5;} \
>
That doesn't change the outcome:
__fpclassify_test2_t: 99721 51051.6
__fpclassify_test1_t: 85015.2 43607.4
__fpclassify_t: 13997.3 10475.1
fpclassify_t: 13502.5 10253.6
__isnormal_inl2_t: 76479.4 41531.7
__isnormal_inl_t: 76526.9 41560.8
isnormal_t: 76458.6 41547.7
__finite_inl_t: 71108.6 33271.3
__finite_t: 73031 37452.3
isfinite_t: 73024.9 37447
__isinf_inl_t: 68599.2 32792.9
__isinf_t: 74851 40108.8
isinf_t: 74871.9 40109.9
__isnan_inl_t: 71100.8 33659.6
__isnan_t: 72914 37592.4
isnan_t: 72909.4 37635.8
> Basically GCC does the array read and multiply twice just like you told it
> to (remember this is not using -ffast-math). Also avoid adding unnecessary
> FP operations and conversions (which may interact badly with timing the
> code we're trying to test).
>
And how do you know that most users don't use FP conversions in their
code just before isinf? These interactions make benchtests worthless, as
in practice a different variant could be faster than the one that you
measure.
> For me the fixed version still shows the expected answer: the built-ins are
> either faster or as fast as the inlines. So I don't think there is any
> regression here (remember also that previously there were no inlines at all
> except for a few inside GLIBC, so the real speedup is much larger).
That's ARM only. So it looks like we need platform-specific headers and
testing. These give a speedup, but since the internal inlines on x64 are
better as they are, it's natural to ask whether using these in general
would give the same speedup. That leads to fixing the GCC builtins.
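
To make the inline-versus-builtin distinction concrete, here is a
minimal sketch of the two styles for isinf. It assumes IEEE 754 binary64
doubles and a GCC-compatible compiler; the names isinf_bits and
isinf_builtin are hypothetical, and this is not the glibc code.

```c
#include <stdint.h>
#include <string.h>

/* Integer-only check: nonzero if x is +inf or -inf.
   Assumes double is IEEE 754 binary64.  */
static inline int
isinf_bits (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);            /* copy the bits, no aliasing issue */
  u &= UINT64_C (0x7fffffffffffffff);   /* clear the sign bit */
  return u == UINT64_C (0x7ff0000000000000); /* exponent all ones, mantissa 0 */
}

/* Builtin-based counterpart; the generated code is up to the compiler.  */
static inline int
isinf_builtin (double x)
{
  return __builtin_isinf (x) != 0;
}
```

Both return nonzero for either infinity and zero for finite values and
NaN. Which one is faster is exactly the platform-dependent question
above, so the sketch says nothing about performance.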