This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


RE: [PATCH] Add math-inline benchmark

From: "Wilco Dijkstra" <wdijkstr at arm dot com>
To: 'Ondřej Bílka' <neleai at seznam dot cz>
Cc: "GNU C Library" <libc-alpha at sourceware dot org>
Date: Mon, 13 Jul 2015 12:02:51 +0100
Subject: RE: [PATCH] Add math-inline benchmark
References: <001c01d0a912$42357710$c6a06530$ at com> <20150622083657 dot GA3684 at domone> <000701d0b7fb$0f27b840$2d7728c0$ at com> <20150709124454 dot GA29625 at domone> <001a01d0bb2a$c4f893b0$4ee9bb10$ at com> <20150710181111 dot GA27786 at domone>

> Ondřej Bílka wrote:
> On Fri, Jul 10, 2015 at 05:09:16PM +0100, Wilco Dijkstra wrote:
> > > Ondřej Bílka wrote:
> > > On Mon, Jul 06, 2015 at 03:50:11PM +0100, Wilco Dijkstra wrote:
> > > >
> > > >
> > > > > Ondřej Bílka wrote:
> > > > > But with latency hiding by using the argument first, suddenly even isnan
> > > > > and isnormal become a regression.
> > > > >
> >
> > That doesn't look correct: it looks like this didn't use the built-ins at all.
> > Did you forget to apply that patch?
> >
> No, from what you wrote I expected that the patch already tests the builtins,
> which it doesn't. I applied the patch and got different results. With the
> patch added, the results are similar.

OK, I extended the benchmark to add the built-ins explicitly so that
you don't need to apply the math.h inline patch first.
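A sketch of how the built-ins can be benchmarked explicitly next to the math.h macros. `__builtin_isnan`, `__builtin_isinf` and `__builtin_isnormal` are real GCC built-ins; the `BENCH_KERNEL` macro, the kernel names and the scaling by 2.0 (borrowed from the quoted variants further down) are illustrative assumptions, not the benchmark's actual source.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical kernel generator: apply FUNC to each scaled input and
   accumulate, so the classification result is actually used.  FUNC may
   be a math.h macro or a GCC built-in; it is substituted textually.  */
#define BENCH_KERNEL(name, func)                \
  static double                                 \
  name (const double *p, size_t n)              \
  {                                             \
    assert (p != NULL || n == 0);               \
    double res = 0;                             \
    for (size_t i = 0; i < n; i++)              \
      {                                         \
        double tmp = p[i] * 2.0;                \
        if (func (tmp))                         \
          res += 5;                             \
      }                                         \
    return res;                                 \
  }

/* math.h classification macros versus the GCC built-ins called directly.  */
BENCH_KERNEL (isnan_macro, isnan)
BENCH_KERNEL (isnan_builtin, __builtin_isnan)
BENCH_KERNEL (isinf_macro, isinf)
BENCH_KERNEL (isinf_builtin, __builtin_isinf)
BENCH_KERNEL (isnormal_builtin, __builtin_isnormal)
```

Each generated kernel returns 5 times the number of inputs for which the predicate holds, so the macro and built-in variants can be checked against each other before being timed.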

> Which still doesn't have to mean anything; only if you test an
> application that frequently uses these will you get a result beyond
> doubt.

We don't have applications that use these, but we can say without any
doubt that they will show huge speedups if they use these functions
frequently, or use math functions that rely on them heavily. For example,
remainder() shows a ~7% gain with the new inlines.

> Here a simple modification produces different results. One of many
> objections is that with a simple addition, gcc will try to generate
> branchless code, e.g. converting it to res += 5 * (isnan(tmp)). So use a
> more difficult branch; with the following two I get __builtin_isinf a lot slower.
> 
>     { double tmp = p[i] * 2.0;    \
>        res += 3 * sin (tmp); if (func (tmp)) res += 3* sin (2 * tmp) ;} \
> 
>     { double tmp = p[i] * 2.0;    \
>        if (func (tmp)) res += 3 * sin (2 * tmp) ;} \
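The if-conversion objection can be illustrated with two equivalent loops: a branchy one as written in a benchmark, and the branch-free form a compiler may turn it into. Whether GCC actually performs this rewrite depends on the compiler version, target and flags; the function names here are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Branchy form, as a benchmark would typically be written.  */
static double
count_nan_branchy (const double *p, size_t n)
{
  double res = 0;
  for (size_t i = 0; i < n; i++)
    {
      double tmp = p[i] * 2.0;
      if (isnan (tmp))
        res += 5;
    }
  return res;
}

/* Branch-free form of the same loop.  isnan only promises a nonzero
   value for NaN, so normalize the result to 0 or 1 before multiplying.  */
static double
count_nan_branchless (const double *p, size_t n)
{
  double res = 0;
  for (size_t i = 0; i < n; i++)
    {
      double tmp = p[i] * 2.0;
      res += 5 * (isnan (tmp) != 0);
    }
  return res;
}
```

Both loops compute the same value, so a compiler is free to emit either one. Putting an expensive call such as sin inside the if, as in the quoted variants, generally prevents this rewrite, because the call cannot be executed unconditionally for free.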

So here are the results again for the original test and your two tests above, in that order:

   remainder_test2_t:   40966.3 192314
   remainder_test1_t:   43697.4 196474
      __fpclassify_t:   12665.2 9951.16
        fpclassify_t:   2979.56 2974.35
__fpclassify_test2_t:   2889.92 2984.95
__fpclassify_test1_t:   3269.67 3199.05
          isnormal_t:   4381.54 4041.78
__isnormal_builtin_t:   4586.15 4318.18
   __isnormal_inl2_t:   4371.76 10737.4
    __isnormal_inl_t:   12635.5 10418.4
          isfinite_t:   2992.79 2979.5
__isfinite_builtin_t:   2982.96 2982.92
      __finite_inl_t:   4090.2  4064.52
          __finite_t:   7058.1  7039.74
             isinf_t:   3274.14 3299.75
   __isinf_builtin_t:   3195.79 3196.05
__isinf_ns_builtin_t:   3241.91 3241.96
        __isinf_ns_t:   3500.85 3493.8
       __isinf_inl_t:   2834.83 3433.89
           __isinf_t:   8794.62 8812.5
             isnan_t:   2801.83 2801.67
   __isnan_builtin_t:   2794.7  2891.37
       __isnan_inl_t:   4216.83 3980.52
           __isnan_t:   7070.36 7088.15

   remainder_test2_t:   105654  239008
   remainder_test1_t:   107533  239310
      __fpclassify_t:   12523.5 10080.2
        fpclassify_t:   2974.47 2983.21
__fpclassify_test2_t:   64227.5 55564.1
__fpclassify_test1_t:   64036.1 55424
          isnormal_t:   122300  34529.5
__isnormal_builtin_t:   122592  34616
   __isnormal_inl2_t:   123425  35056
    __isnormal_inl_t:   129589  41615.3
          isfinite_t:   123254  34041.5
__isfinite_builtin_t:   123302  34093
      __finite_inl_t:   123455  34631.8
          __finite_t:   127298  39587.5
             isinf_t:   63744   45997.6
   __isinf_builtin_t:   63545.2 46100.2
__isinf_ns_builtin_t:   63570.9 46087.5
        __isinf_ns_t:   63890.9 45754.5
       __isinf_inl_t:   64008.5 46505.2
           __isinf_t:   68915.7 51833.8
             isnan_t:   62866.8 45023.5
   __isnan_builtin_t:   62951.9 44956.8
       __isnan_inl_t:   63855.1 45294.6
           __isnan_t:   67156.5 49505.3

   remainder_test2_t:   41147.4 216349
   remainder_test1_t:   43860.8 220614
      __fpclassify_t:   12569.1 10124.3
        fpclassify_t:   3068.91 2974.31
__fpclassify_test2_t:   4048.88 32446.5
__fpclassify_test1_t:   4005.86 32783.1
          isnormal_t:   63707.9 14550.8
__isnormal_builtin_t:   63672   14383
   __isnormal_inl2_t:   65730.4 15059.1
    __isnormal_inl_t:   73352.2 10570.7
          isfinite_t:   64756.7 2719.12
__isfinite_builtin_t:   64748.8 2664.12
      __finite_inl_t:   65331.4 2740.1
          __finite_t:   70374.5 7944.69
             isinf_t:   2927.67 20684.2
   __isinf_builtin_t:   2848.58 20050.9
__isinf_ns_builtin_t:   2932.22 21809.9
        __isinf_ns_t:   2908.41 18973.9
       __isinf_inl_t:   2971.63 18025.5
           __isinf_t:   9010.58 28392.3
             isnan_t:   2841.28 16457.3
   __isnan_builtin_t:   2841.34 15017.8
       __isnan_inl_t:   2846.25 19736.8
           __isnan_t:   8171.32 23874.3

> > From this it seems that __isinf_inl is slightly better than the builtin, but
> > it does not show up as a regression when combined with sin or in the remainder
> > test.
> >
> That doesn't hold in general; in the remainder test it could just be
> caused by isnan being slower than isinf.

No, the new isinf/isnan are both faster than the previous versions (some isinf
calls were inlined as __isinf_ns, but even that one is slower than the builtin
in most of the results). Remember once again that this patch creates new inlines
that didn't exist before, as well as replacing existing inlines in GLIBC with
even faster ones. The combination of these means it is simply impossible for
anything to become slower.
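Inline classification of this kind works on the raw IEEE 754 bit pattern instead of calling an out-of-line function. The sketch below shows that general technique for binary64 doubles; `double_bits`, `isinf_bits`, `isnan_bits` and `isfinite_bits` are hypothetical names, and this is not glibc's actual `__isinf_inl`/`__isnan_inl` code.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a double's bits without violating aliasing rules.  */
static inline uint64_t
double_bits (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

/* Shifting left by one drops the sign bit, leaving the 11 exponent bits
   at the top.  An all-ones exponent shifted this way is this constant.  */
#define EXP_ALL_ONES_SHIFTED 0xffe0000000000000ULL

/* Infinity: exponent all ones, mantissa zero.  */
static inline int
isinf_bits (double x)
{
  return (double_bits (x) << 1) == EXP_ALL_ONES_SHIFTED;
}

/* NaN: exponent all ones, mantissa nonzero.  */
static inline int
isnan_bits (double x)
{
  return (double_bits (x) << 1) > EXP_ALL_ONES_SHIFTED;
}

/* Finite: exponent not all ones.  */
static inline int
isfinite_bits (double x)
{
  return (double_bits (x) << 1) < EXP_ALL_ONES_SHIFTED;
}
```

Each check compiles to a shift and an integer compare, which is why such inlines can compete with the built-ins; which one wins depends on the target and on how the result feeds into the surrounding code.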

> > Well, I just confirmed the same gains apply to x64.
> >
> No, that doesn't confirm anything yet. You need to do more extensive
> testing to get a somewhat reliable answer, and even then you won't be sure.
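One common way to make a single-machine measurement less noisy is to repeat each kernel several times and keep the fastest run. The sketch below assumes a POSIX system providing `clock_gettime` with `CLOCK_MONOTONIC`; `kernel_fn`, `time_min_ns` and `sum_doubled` are hypothetical names, and this is not how the glibc benchmark framework itself measures.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef double (*kernel_fn) (const double *, size_t);

/* Nanoseconds between two monotonic timestamps.  */
static double
elapsed_ns (struct timespec a, struct timespec b)
{
  return (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
}

/* Run KERNEL RUNS times over the same input and return the fastest time
   in nanoseconds.  Each result is stored through SINK so the work cannot
   be optimized away.  */
static double
time_min_ns (kernel_fn kernel, const double *p, size_t n, int runs,
             volatile double *sink)
{
  assert (runs > 0);
  double best = -1;
  for (int r = 0; r < runs; r++)
    {
      struct timespec t0, t1;
      clock_gettime (CLOCK_MONOTONIC, &t0);
      double res = kernel (p, n);
      clock_gettime (CLOCK_MONOTONIC, &t1);
      *sink = res;
      double t = elapsed_ns (t0, t1);
      if (best < 0 || t < best)
        best = t;
    }
  return best;
}

/* Trivial example kernel with the same shape as the benchmark loops.  */
static double
sum_doubled (const double *p, size_t n)
{
  double res = 0;
  for (size_t i = 0; i < n; i++)
    res += p[i] * 2.0;
  return res;
}
```

Taking the minimum filters out interference from interrupts and other processes, but it does not settle the broader question raised here about how the change behaves in real applications.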

No, this benchmark does give a very clear and reliable answer: everything
speeds up by a huge factor.
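To put numbers on the speedups, the ratios can be computed directly from the first result block (original test, first column). This reads the `__name_t` rows as the previous implementations and the plain `name_t` rows as the new versions, which is an assumption based on the discussion; `timing_pair`, `first_block` and `speedup` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* First-column timings copied from the first result block above.  */
struct timing_pair
{
  const char *name;
  double old_t;   /* __name_t row, assumed to be the previous version.  */
  double new_t;   /* name_t row, assumed to be the new version.  */
};

static const struct timing_pair first_block[] = {
  { "fpclassify", 12665.2, 2979.56 },
  { "isnormal",   12635.5, 4381.54 },  /* old: __isnormal_inl_t  */
  { "isfinite",   7058.1,  2992.79 },  /* old: __finite_t  */
  { "isinf",      8794.62, 3274.14 },
  { "isnan",      7070.36, 2801.83 },
};

/* Speedup factor of the new version over the old one.  */
static double
speedup (const struct timing_pair *t)
{
  assert (t->new_t > 0);
  return t->old_t / t->new_t;
}
```

For this column the factors range from roughly 2.4x (isfinite) to 4.3x (fpclassify). The other two blocks, where sin dominates the loop, show much smaller relative differences.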

> I asked you to run my benchmark on ARM to measure the effect of inlining.
> I attached the version again. You should run it to see how the results differ.

I did run it but I don't understand what it's supposed to mean, and I can't share
the results. So do you have something simpler that shows what point you're trying
to make? Or maybe you could add your own benchmark to GLIBC?

Wilco

Follow-Ups:
- Re: [PATCH] Add math-inline benchmark
  - From: Carlos O'Donell
- Re: [PATCH] Add math-inline benchmark
  - From: Ondřej Bílka

References:
- RE: [PATCH] Add math-inline benchmark
  - From: Wilco Dijkstra
- Re: [PATCH] Add math-inline benchmark
  - From: Ondřej Bílka
- RE: [PATCH] Add math-inline benchmark
  - From: Wilco Dijkstra
- Re: [PATCH] Add math-inline benchmark
  - From: Ondřej Bílka
