This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


RE: [PATCH] Add math-inline benchmark

From: "Wilco Dijkstra" <wdijkstr at arm dot com>
To: 'Ondřej Bílka' <neleai at seznam dot cz>
Cc: "GNU C Library" <libc-alpha at sourceware dot org>
Date: Mon, 6 Jul 2015 15:50:11 +0100
Subject: RE: [PATCH] Add math-inline benchmark
Authentication-results: sourceware.org; auth=none
References: <001c01d0a912$42357710$c6a06530$ at com> <20150622083657 dot GA3684 at domone>


> Ondřej Bílka wrote:
> On Wed, Jun 17, 2015 at 04:28:27PM +0100, Wilco Dijkstra wrote:
> > Hi,
> >
> > Due to popular demand, here is a new benchmark that tests isinf, isnan,
> > isnormal, isfinite and fpclassify. It uses 2 arrays with 1024 doubles,
> > one with 99% finite FP numbers (10% zeroes, 10% negative) and 1% inf/NaN,
> > the other with 50% inf and 50% NaN.
> >
> > The results show that using the GCC built-ins in math.h gives huge speedups
> > by avoiding explicit calls and PLT indirection just to execute a function of
> > 3-4 instructions. The GCC built-ins have similar performance to the existing
> > math_private inlines for __isnan, __finite and __isinf_ns.
> >
> > OK for commit?
> >
> I ran these; on x64, using the builtins is a regression even with your
> benchmark.
> 
> The main problem here is what exactly you are measuring. I don't know how
> much of your results was caused by measuring the latency of the
> load/multiply/move-to-int-register chain. With OoO execution that latency
> shouldn't be a problem.
> 
> The original results follow, with isfinite also inlined:
> 
> __fpclassify_test2_t: 	3660.24	3733.22
> __fpclassify_test1_t: 	3696.33	3691.3
>       __fpclassify_t: 	14365.8	11116.5
>         fpclassify_t: 	6045.69	3128.76
>    __isnormal_inl2_t: 	5275.85	14562.6
>     __isnormal_inl_t: 	14753.3	11143.5
>           isnormal_t: 	4418.84	4411.59
>       __finite_inl_t: 	3038.75	3038.4
>           __finite_t: 	7712.42	7697.24
>           isfinite_t: 	3108.91	3107.85
>        __isinf_inl_t: 	2109.05	2817.19
>            __isinf_t: 	8555.51	8559.36
>              isinf_t: 	3472.62	3408.8
>        __isnan_inl_t: 	2682.12	2691.39
>            __isnan_t: 	7698.14	7735.29
>              isnan_t: 	2592.58	2572.82
> 
> 
> But when the latency is hidden by using the argument first, suddenly even
> isnan and isnormal become regressions.
> 
>     for (i = 0; i < n; i++){ res += 3*sin(p[i] * 2.0);    \
>       if (func (p[i] * 2.0)) res += 5;}                   \
> 
> 
> __fpclassify_test2_t:   92929.4 37256.8
> __fpclassify_test1_t:   94020.1 35512.1
>       __fpclassify_t:   17321.2 13325.1
>         fpclassify_t:   8021.29 4376.89
>    __isnormal_inl2_t:   93896.9 38941.8
>     __isnormal_inl_t:   98069.2 46140.4
>           isnormal_t:   94775.6 36941.8
>       __finite_inl_t:   84059.9 38304
>           __finite_t:   96052.4 45998.2
>           isfinite_t:   93371.5 36659.1
>        __isinf_inl_t:   92532.9 36050.1
>            __isinf_t:   95929.4 46585.2
>              isinf_t:   93290.1 36445.6
>        __isnan_inl_t:   82760.7 37452.2
>            __isnan_t:   98064.6 45338.8
>              isnan_t:   93386.7 37786.4

Can you try this with:

    for (i = 0; i < n; i++)                               \
      { double tmp = p[i] * 2.0;                          \
        if (sin (tmp) < 1.0) res++;                       \
        if (func (tmp)) res += 5; }                       \

Basically, GCC performs the array read and the multiply twice, just as you
told it to (remember, this is not built with -ffast-math). Also, avoid adding
unnecessary FP operations and conversions, which may interact badly with the
timing of the code we're trying to test.

For me the fixed version still shows the expected answer: the built-ins are
either faster than or as fast as the inlines. So I don't think there is any
regression here (remember also that previously there were no inlines at all,
except for a few inside GLIBC, so the real speedup is much larger).

__fpclassify_test2_t:   1.07
__fpclassify_test1_t:   1.07
      __fpclassify_t:   1.24
        fpclassify_t:   1
   __isnormal_inl2_t:   1.11
    __isnormal_inl_t:   1.24
          isnormal_t:   1.04
      __finite_inl_t:   1.04
          __finite_t:   1.19
          isfinite_t:   1
       __isinf_inl_t:   1.07
           __isinf_t:   1.22
             isinf_t:   1
       __isnan_inl_t:   1.04
           __isnan_t:   1.14
             isnan_t:   1
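
For readers unfamiliar with the two kinds of inline being compared, here is a
minimal sketch. The built-in version uses GCC's __builtin_isnan; the bit-test
version is a hypothetical hand-written inline assuming IEEE 754 binary64
doubles, and is not the actual math_private code. The names isnan_builtin and
isnan_bits are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Compiler built-in: GCC expands this in place rather than emitting a
   call to a library function.  */
static inline int isnan_builtin(double x)
{
    return __builtin_isnan(x);
}

/* Hypothetical hand-written inline: a double is NaN when, with the sign
   bit cleared, its bit pattern is greater than that of +infinity
   (exponent all ones, mantissa non-zero).  */
static inline int isnan_bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);   /* well-defined way to read the bits */
    return (u & UINT64_C(0x7fffffffffffffff)) > UINT64_C(0x7ff0000000000000);
}
```

Either form avoids an out-of-line call; which one is faster on a given target
is exactly what the numbers in this thread are trying to measure.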

Wilco

Follow-Ups:
- Re: [PATCH] Add math-inline benchmark
  - From: Ondřej Bílka
