This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Add math-inline benchmark

From: Ondřej Bílka <neleai at seznam dot cz>
To: Wilco Dijkstra <wdijkstr at arm dot com>
Cc: GNU C Library <libc-alpha at sourceware dot org>
Date: Thu, 9 Jul 2015 14:44:54 +0200
Subject: Re: [PATCH] Add math-inline benchmark
Authentication-results: sourceware.org; auth=none
References: <001c01d0a912$42357710$c6a06530$ at com> <20150622083657 dot GA3684 at domone> <000701d0b7fb$0f27b840$2d7728c0$ at com>

On Mon, Jul 06, 2015 at 03:50:11PM +0100, Wilco Dijkstra wrote:
> 
> 
> > Ondřej Bílka wrote:
> > But with latency hiding, by using the argument first, suddenly even isnan
> > and isnormal become regressions.
> > 
> >     for (i = 0; i < n; i++){ res += 3*sin(p[i] * 2.0);    \
> >       if (func (p[i] * 2.0)) res += 5;}                   \
> > 
> > 
> > __fpclassify_test2_t:   92929.4 37256.8
> > __fpclassify_test1_t:   94020.1 35512.1
> >       __fpclassify_t:   17321.2 13325.1
> >         fpclassify_t:   8021.29 4376.89
> >    __isnormal_inl2_t:   93896.9 38941.8
> >     __isnormal_inl_t:   98069.2 46140.4
> >           isnormal_t:   94775.6 36941.8
> >       __finite_inl_t:   84059.9 38304
> >           __finite_t:   96052.4 45998.2
> >           isfinite_t:   93371.5 36659.1
> >        __isinf_inl_t:   92532.9 36050.1
> >            __isinf_t:   95929.4 46585.2
> >              isinf_t:   93290.1 36445.6
> >        __isnan_inl_t:   82760.7 37452.2
> >            __isnan_t:   98064.6 45338.8
> >              isnan_t:   93386.7 37786.4
> 
> Can you try this with:
> 
>     for (i = 0; i < n; i++)                               \
>       { double tmp = p[i] * 2.0;    \
>       if (sin(tmp) < 1.0) res++; if (func (tmp)) res += 5;}                   \
>
That doesn't change the outcome:

__fpclassify_test2_t: 	99721	51051.6
__fpclassify_test1_t: 	85015.2	43607.4
      __fpclassify_t: 	13997.3	10475.1
        fpclassify_t: 	13502.5	10253.6
   __isnormal_inl2_t: 	76479.4	41531.7
    __isnormal_inl_t: 	76526.9	41560.8
          isnormal_t: 	76458.6	41547.7
      __finite_inl_t: 	71108.6	33271.3
          __finite_t: 	73031	37452.3
          isfinite_t: 	73024.9	37447
       __isinf_inl_t: 	68599.2	32792.9
           __isinf_t: 	74851	40108.8
             isinf_t: 	74871.9	40109.9
       __isnan_inl_t: 	71100.8	33659.6
           __isnan_t: 	72914	37592.4
             isnan_t: 	72909.4	37635.8
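The two loop shapes being compared can be reproduced in a standalone file. This is a minimal sketch under stated assumptions, not the actual benchtest: work() is a hypothetical stand-in for sin() so that the file links without -lm, and DEFINE_LOOPS and the functions it generates are names invented for illustration. As in the quoted loops, the classifier is passed as a macro argument so that it can be inlined.

```c
#include <stddef.h>

/* Hypothetical stand-in for sin(): noinline keeps it an actual call,
   although GCC may still prove that it has no side effects.  */
static __attribute__ ((noinline)) double work (double x)
{
  return x * x * 0.5 - x;
}

/* DEFINE_LOOPS generates two loops for one classifier FUNC:
   NAME_recompute: the quoted latency-hiding loop, where p[i] * 2.0 is
                   written twice and the expensive result feeds RES;
   NAME_hoisted:   the suggested rewrite, which computes the argument
                   once into tmp and only compares the expensive result.  */
#define DEFINE_LOOPS(name, func)                                \
  static double name##_recompute (const double *p, size_t n)    \
  {                                                             \
    double res = 0.0;                                           \
    for (size_t i = 0; i < n; i++)                              \
      {                                                         \
        res += 3 * work (p[i] * 2.0);                           \
        if (func (p[i] * 2.0))                                  \
          res += 5;                                             \
      }                                                         \
    return res;                                                 \
  }                                                             \
                                                                \
  static double name##_hoisted (const double *p, size_t n)      \
  {                                                             \
    double res = 0.0;                                           \
    for (size_t i = 0; i < n; i++)                              \
      {                                                         \
        double tmp = p[i] * 2.0;                                \
        if (work (tmp) < 1.0)                                   \
          res++;                                                \
        if (func (tmp))                                         \
          res += 5;                                             \
      }                                                         \
    return res;                                                 \
  }

/* One instance, using GCC's type-generic builtin as the classifier.  */
DEFINE_LOOPS (isnormal_builtin, __builtin_isnormal)
```

Each generated function can then be timed with whatever timer the benchmark harness already uses. Because work() is visible to the compiler here, GCC may still merge the duplicated p[i] * 2.0, so this sketch may not reproduce the duplicated load and multiply described for the real loop with an external sin() call.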
 
> Basically GCC does the array read and multiply twice just like you told it
> to (remember this is not using -ffast-math). Also avoid adding unnecessary
> FP operations and conversions (which may interact badly with timing the
> code we're trying to test). 
> 
And how do you know that most users don't do FP conversions in their code
just before isinf? These interactions make the benchtests worthless, as in
practice a different variant would be faster than the one that you
measure.
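One way to test this concern is a benchmark variant in which the value handed to the classifier comes straight out of a conversion. The sketch below is hypothetical: DEFINE_CONV_LOOP, the generated function names and the sample data are illustrative assumptions, and a float to double conversion is used as the example, although an int to double conversion would serve the same purpose.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical benchmark body: the value handed to FUNC is produced by
   a float -> double conversion right before the test, as it might be in
   user code.  FUNC is a macro argument so that it can be inlined.  */
#define DEFINE_CONV_LOOP(name, func)                   \
  static int name (const float *q, size_t n)           \
  {                                                    \
    int res = 0;                                       \
    for (size_t i = 0; i < n; i++)                     \
      {                                                \
        double x = (double) q[i]; /* conversion */     \
        if (func (x))                                  \
          res += 5;                                    \
      }                                                \
    return res;                                        \
  }

DEFINE_CONV_LOOP (conv_isinf_builtin, __builtin_isinf)
DEFINE_CONV_LOOP (conv_isnan_builtin, __builtin_isnan)

/* Illustrative input: two infinities and one NaN among finite values.  */
static const float conv_samples[] = { 1.0f, INFINITY, -INFINITY, NAN, 0.5f };
```

Timing such a loop next to a conversion-free version would show whether the ranking between the inline and builtin variants changes once the input arrives from a conversion.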

> For me the fixed version still shows the expected answer: the built-ins are
> either faster or as fast as the inlines. So I don't think there is any
> regression here (remember also that previously there were no inlines at all
> except for a few inside GLIBC, so the real speedup is much larger).

That's ARM only, so it looks like we need platform-specific headers and testing.

These give a speedup, but since on x64 the internal inlines are better as
they are, it is natural to ask whether using them in general would give the
same speedup. That leads to fixing the GCC builtins.
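The internal inlines and the GCC builtins are two formulations of the same test. A minimal sketch, assuming IEEE 754 binary64 doubles: the *_bits helpers are hypothetical integer bit-pattern classifiers (not glibc's actual internal inlines), checked against GCC's type-generic __builtin_isinf, __builtin_isnan, __builtin_isfinite and __builtin_isnormal.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All exponent bits set: the bit pattern of +Inf.  */
#define EXP_MASK 0x7ff0000000000000ULL

/* Raw bits of x with the sign bit cleared, i.e. the bits of |x|.  */
static inline uint64_t abs_bits (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);            /* well-defined type pun */
  return u & 0x7fffffffffffffffULL;
}

/* Hypothetical bit-pattern classifiers.  */
static inline int isinf_bits (double x)    { return abs_bits (x) == EXP_MASK; }
static inline int isnan_bits (double x)    { return abs_bits (x) > EXP_MASK; }
static inline int isfinite_bits (double x) { return abs_bits (x) < EXP_MASK; }

/* Normal: exponent field is neither 0 (zero/subnormal) nor 0x7ff.  */
static inline int isnormal_bits (double x)
{
  uint64_t e = abs_bits (x) >> 52;
  return e != 0 && e != 0x7ff;
}

/* One sample from each class: zeros, normals, subnormal, max, Inf, NaN.  */
static const double samples[] = {
  0.0, -0.0, 1.0, -2.5, DBL_MIN / 2, DBL_MAX, INFINITY, -INFINITY, NAN
};

/* Number of disagreements with GCC's builtins (expected to be 0).  */
static int count_mismatches (const double *p, size_t n)
{
  int bad = 0;
  for (size_t i = 0; i < n; i++)
    {
      double x = p[i];
      bad += isinf_bits (x) != (__builtin_isinf (x) != 0);
      bad += isnan_bits (x) != (__builtin_isnan (x) != 0);
      bad += isfinite_bits (x) != (__builtin_isfinite (x) != 0);
      bad += isnormal_bits (x) != (__builtin_isnormal (x) != 0);
    }
  return bad;
}
```

Both forms agree on every class of input; what differs is the generated code. The bit version works on the integer representation, while GCC's generic expansion of the builtins typically uses floating-point comparisons on fabs(x), so which one is faster depends on the target and on where the value lives just before the test.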

Follow-Ups:
- RE: [PATCH] Add math-inline benchmark
  - From: Wilco Dijkstra

References:
- RE: [PATCH] Add math-inline benchmark
  - From: Wilco Dijkstra
