This is the mail archive of the
`libc-alpha@sourceware.org`
mailing list for the glibc project.


*From*: "Wilco Dijkstra" <wdijkstr at arm dot com>
*To*: 'Ondřej Bílka' <neleai at seznam dot cz>
*Cc*: "GNU C Library" <libc-alpha at sourceware dot org>
*Date*: Mon, 13 Jul 2015 12:02:51 +0100
*Subject*: RE: [PATCH] Add math-inline benchmark
*Authentication-results*: sourceware.org; auth=none
*References*: <001c01d0a912$42357710$c6a06530$ at com> <20150622083657 dot GA3684 at domone> <000701d0b7fb$0f27b840$2d7728c0$ at com> <20150709124454 dot GA29625 at domone> <001a01d0bb2a$c4f893b0$4ee9bb10$ at com> <20150710181111 dot GA27786 at domone>

> Ondřej Bílka wrote:
> On Fri, Jul 10, 2015 at 05:09:16PM +0100, Wilco Dijkstra wrote:
> > > Ondřej Bílka wrote:
> > > On Mon, Jul 06, 2015 at 03:50:11PM +0100, Wilco Dijkstra wrote:
> > > >
> > > >
> > > > > Ondřej Bílka wrote:
> > > > > But with latency hiding by using argument first suddenly even isnan and
> > > > > isnormal become regression.
> > > >
> > >
> > That doesn't look correct - it looks like this didn't use the built-ins at all,
> > did you forget to apply that patch?
> >
> No, from what you wrote I expected that patch already tests builtins
> which doesn't. Applied patch and got different results. When I added
> patch results are similar.

OK, I extended the benchmark to add the built-ins explicitly so that you
don't need to apply the math.h inline patch first.

> Which still doesn't have to mean anything, only if you test a
> application that frequently uses these you will get result without
> doubt.

We don't have applications that use these, but we can say without any doubt
that they will show huge speedups if they do use these functions frequently
or any math functions that use them a lot. Remainder() for example shows ~7%
gain with the new inlines.

> Here a simple modification produces different results. One of many
> objections is that by simply adding gcc will try to make branchless code
> like converting that to res += 5 * (isnan(tmp)). So use more difficult
> branch and and with following two I get __builtin_isinf lot slower.
>
>     { double tmp = p[i] * 2.0; \
>       res += 3 * sin (tmp); if (func (tmp)) res += 3* sin (2 * tmp) ;} \
>
>     { double tmp = p[i] * 2.0; \
>       if (func (tmp)) res += 3 * sin (2 * tmp) ;} \

So here are the results again for the original test and your 2 tests above:

remainder_test2_t:     40966.3  192314
remainder_test1_t:     43697.4  196474
__fpclassify_t:        12665.2  9951.16
fpclassify_t:          2979.56  2974.35
__fpclassify_test2_t:  2889.92  2984.95
__fpclassify_test1_t:  3269.67  3199.05
isnormal_t:            4381.54  4041.78
__isnormal_builtin_t:  4586.15  4318.18
__isnormal_inl2_t:     4371.76  10737.4
__isnormal_inl_t:      12635.5  10418.4
isfinite_t:            2992.79  2979.5
__isfinite_builtin_t:  2982.96  2982.92
__finite_inl_t:        4090.2   4064.52
__finite_t:            7058.1   7039.74
isinf_t:               3274.14  3299.75
__isinf_builtin_t:     3195.79  3196.05
__isinf_ns_builtin_t:  3241.91  3241.96
__isinf_ns_t:          3500.85  3493.8
__isinf_inl_t:         2834.83  3433.89
__isinf_t:             8794.62  8812.5
isnan_t:               2801.83  2801.67
__isnan_builtin_t:     2794.7   2891.37
__isnan_inl_t:         4216.83  3980.52
__isnan_t:             7070.36  7088.15

remainder_test2_t:     105654   239008
remainder_test1_t:     107533   239310
__fpclassify_t:        12523.5  10080.2
fpclassify_t:          2974.47  2983.21
__fpclassify_test2_t:  64227.5  55564.1
__fpclassify_test1_t:  64036.1  55424
isnormal_t:            122300   34529.5
__isnormal_builtin_t:  122592   34616
__isnormal_inl2_t:     123425   35056
__isnormal_inl_t:      129589   41615.3
isfinite_t:            123254   34041.5
__isfinite_builtin_t:  123302   34093
__finite_inl_t:        123455   34631.8
__finite_t:            127298   39587.5
isinf_t:               63744    45997.6
__isinf_builtin_t:     63545.2  46100.2
__isinf_ns_builtin_t:  63570.9  46087.5
__isinf_ns_t:          63890.9  45754.5
__isinf_inl_t:         64008.5  46505.2
__isinf_t:             68915.7  51833.8
isnan_t:               62866.8  45023.5
__isnan_builtin_t:     62951.9  44956.8
__isnan_inl_t:         63855.1  45294.6
__isnan_t:             67156.5  49505.3

remainder_test2_t:     41147.4  216349
remainder_test1_t:     43860.8  220614
__fpclassify_t:        12569.1  10124.3
fpclassify_t:          3068.91  2974.31
__fpclassify_test2_t:  4048.88  32446.5
__fpclassify_test1_t:  4005.86  32783.1
isnormal_t:            63707.9  14550.8
__isnormal_builtin_t:  63672    14383
__isnormal_inl2_t:     65730.4  15059.1
__isnormal_inl_t:      73352.2  10570.7
isfinite_t:            64756.7  2719.12
__isfinite_builtin_t:  64748.8  2664.12
__finite_inl_t:        65331.4  2740.1
__finite_t:            70374.5  7944.69
isinf_t:               2927.67  20684.2
__isinf_builtin_t:     2848.58  20050.9
__isinf_ns_builtin_t:  2932.22  21809.9
__isinf_ns_t:          2908.41  18973.9
__isinf_inl_t:         2971.63  18025.5
__isinf_t:             9010.58  28392.3
isnan_t:               2841.28  16457.3
__isnan_builtin_t:     2841.34  15017.8
__isnan_inl_t:         2846.25  19736.8
__isnan_t:             8171.32  23874.3

> > From this it seems that __isinf_inl is slightly better than the builtin, but
> > it does not show up as a regression when combined with sin or in the remainder
> > test.
> >
> That doesn't hold generaly as remainder test it could be just caused by
> isnan being slower than isinf.

No, the new isinf/isnan are both faster than the previous versions (some isinf
calls were inlined as __isinf_ns, but even that one is clearly slower than the
builtin in all the results). Remember once again this patch creates new inlines
that didn't exist before as well as replacing existing inlines in GLIBC with
even faster ones. The combination of these means it is simply an impossibility
that anything could become slower.

> > Well I just confirmed the same gains apply to x64.
> >
> No, that doesn't confirm anything yet. You need to do more extensive
> testing to get somewhat reliable answer and still you won't be sure.

No, this benchmark does give a very clear and reliable answer: everything
speeds up by a huge factor.

> I asked you to run on arm my benchmark to measure results of inlining.
> I attached again version. You should run it to see how results will differ.

I did run it but I don't understand what it's supposed to mean, and I can't
share the results. So do you have something simpler that shows what point
you're trying to make? Or maybe you could add your own benchmark to GLIBC?

Wilco
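For readers who want to try the two loop bodies quoted in this message, the following is a minimal sketch of how they might be wrapped into a stand-alone harness. It is not the glibc benchmark itself. The names `isinf_bits`, `WORK`, `DEFINE_TEST1`, `DEFINE_TEST2` and `time_passes` are hypothetical, `WORK` is a cheap stand-in for the `sin()` calls so that the sketch builds without libm, and `__builtin_isinf` is assumed to be available (GCC or Clang).

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Stand-in for the sin() calls in the quoted loop bodies so the sketch
   links without libm; use sin() and -lm to follow the thread exactly.  */
#define WORK(x) ((x) * 0.5 + 1.0)

/* Hypothetical bit-pattern isinf; an assumption, not glibc's inline.  */
static inline int
isinf_bits (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);   /* Read the bits without aliasing issues.  */
  return (u & UINT64_C (0x7fffffffffffffff)) == UINT64_C (0x7ff0000000000000);
}

/* First quoted body: unconditional work, then work guarded by func.  */
#define DEFINE_TEST1(name, func)                        \
  static double name (const double *p, size_t n)        \
  {                                                     \
    double res = 0.0;                                   \
    for (size_t i = 0; i < n; i++)                      \
      {                                                 \
        double tmp = p[i] * 2.0;                        \
        res += 3 * WORK (tmp);                          \
        if (func (tmp))                                 \
          res += 3 * WORK (2 * tmp);                    \
      }                                                 \
    return res;                                         \
  }

/* Second quoted body: only the work guarded by func.  */
#define DEFINE_TEST2(name, func)                        \
  static double name (const double *p, size_t n)        \
  {                                                     \
    double res = 0.0;                                   \
    for (size_t i = 0; i < n; i++)                      \
      {                                                 \
        double tmp = p[i] * 2.0;                        \
        if (func (tmp))                                 \
          res += 3 * WORK (2 * tmp);                    \
      }                                                 \
    return res;                                         \
  }

DEFINE_TEST1 (test1_builtin, __builtin_isinf)
DEFINE_TEST1 (test1_bits, isinf_bits)
DEFINE_TEST2 (test2_builtin, __builtin_isinf)
DEFINE_TEST2 (test2_bits, isinf_bits)

typedef double (*bench_fn) (const double *, size_t);

/* CPU seconds for iters passes over p; results are accumulated into
   *sink so the compiler cannot discard the loops.  */
static double
time_passes (bench_fn fn, const double *p, size_t n, int iters,
             volatile double *sink)
{
  clock_t start = clock ();
  for (int k = 0; k < iters; k++)
    *sink += fn (p, n);
  return (double) (clock () - start) / CLOCKS_PER_SEC;
}
```

Calling `time_passes` on each of the four generated functions with the same input array gives one timing per variant. As the thread illustrates, the ranking can change with the loop body, the input data and the compiler, so numbers from a sketch like this should be read as indicative only.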

**Follow-Ups**:

**Re: [PATCH] Add math-inline benchmark** *From:* Carlos O'Donell

**Re: [PATCH] Add math-inline benchmark** *From:* Ondřej Bílka

**References**:

**RE: [PATCH] Add math-inline benchmark** *From:* Wilco Dijkstra

**Re: [PATCH] Add math-inline benchmark** *From:* Ondřej Bílka

**RE: [PATCH] Add math-inline benchmark** *From:* Wilco Dijkstra

**Re: [PATCH] Add math-inline benchmark** *From:* Ondřej Bílka
