This is the mail archive of the
`libc-alpha@sourceware.org`
mailing list for the glibc project.


*From*: "Wilco Dijkstra" <wdijkstr at arm dot com>
*To*: 'Ondřej Bílka' <neleai at seznam dot cz>
*Cc*: "'GNU C Library'" <libc-alpha at sourceware dot org>
*Date*: Mon, 20 Jul 2015 12:01:50 +0100
*Subject*: RE: [PATCH v2] Add math-inline benchmark
*Authentication-results*: sourceware.org; auth=none
*References*: <002001d0bfb8$b36fa330$1a4ee990$ at com> <20150716225056 dot GA24479 at domone> <002501d0c094$3ea04cd0$bbe0e670$ at com> <20150718113423 dot GC30356 at domone>

> Ondřej Bílka wrote:
> On Fri, Jul 17, 2015 at 02:26:53PM +0100, Wilco Dijkstra wrote:
> > > Ondřej Bílka wrote:
> > > On Thu, Jul 16, 2015 at 12:15:19PM +0100, Wilco Dijkstra wrote:
> > > > Add a benchmark for isinf/isnan/isnormal/isfinite/fpclassify. This new version adds
> > > > explicit tests for the GCC built-ins, uses json format as suggested, and no longer
> > > > includes any string headers. The test uses 2 arrays with 1024 doubles, one with 99%
> > > > finite FP numbers (10% zeroes, 10% negative) and 1% inf/NaN, the other with 50% inf
> > > > and 50% NaN.
> > > >
> > > > Results show that using the GCC built-ins in math.h gives huge speedups due to
> > > > avoiding explicit calls and PLT indirection to execute a function with 3-4
> > > > instructions - around 7x on AArch64 and 2.8x on x64. The GCC built-ins have better
> > > > performance than the existing math_private inlines for __isnan, __finite and
> > > > __isinf_ns, so these should be removed.
> > >
> > > No, this benchmark is invalid for the following two reasons.
> > >
> > > 1) It doesn't measure a real workload at all. Constructing a large constant
> > > could be costly, and by inlining this benchmark ignores that cost.
> >
> > It was never meant to measure a real workload. If you'd like to add your own
> > workload, that would be great, but my micro benchmark is more than sufficient
> > in proving that the new inlines give a huge performance gain.
>
> But you claimed the following in the original mail, which is wrong:
>
> "Results show that using the GCC built-ins in math.h gives huge speedups due to avoiding
> explicit calls and PLT indirection to execute a function with 3-4 instructions - around
> 7x on AArch64 and 2.8x on x64. The GCC built-ins have better performance than the
> existing math_private inlines for __isnan, __finite and __isinf_ns, so these should be
> removed."

No, that statement is 100% correct.
> Also, when inlines give a speedup you should add math inlines for the
> signaling NaN case as well. That gives a similar speedup. And it would be
> natural to ask if you should use these inlines every time, given they are
> already faster than the builtins.

I'm not sure what you mean here - I enable the new inlines in exactly the right case. Improvements to support signalling NaNs or to speed up the built-ins further will be done in GCC.

> > > So at least on x64 we should publish math_private inlines instead of
> > > using the slow builtins.
> >
> > Well, it was agreed we are going to use the GCC built-ins and then improve
> > those. If you want to propose additional patches with special inlines for
> > x64 then please go ahead, but my plan is to improve the builtins.
>
> And how are you sure that it's just an isolated x64 case? It may also happen
> on powerpc, arm, sparc and other architectures, and you need to test that.

It's obvious the huge speedup applies to all other architectures as well - it's hard to imagine that avoiding a call, a return, a PLT indirection and additional optimization of 3-4 instructions could ever cause a slowdown...

> So I ask you again to run my benchmark with EXTRACT_WORDS64 changed, to see
> if this is a problem on arm as well.
Here are the results for x64 with inlining disabled (`__always_inline` changed into `noinline`) and the `movq` instruction like you suggested:

```
"__isnan_t":            { "normal": { "duration": 3.52048e+06, "iterations": 500, "mean": 7040 } },
"__isnan_inl_t":        { "normal": { "duration": 3.09247e+06, "iterations": 500, "mean": 6184 } },
"__isnan_builtin_t":    { "normal": { "duration": 2.20378e+06, "iterations": 500, "mean": 4407 } },
"isnan_t":              { "normal": { "duration": 1.50514e+06, "iterations": 500, "mean": 3010 } },
"__isinf_t":            { "normal": { "duration": 4.54681e+06, "iterations": 500, "mean": 9093 } },
"__isinf_inl_t":        { "normal": { "duration": 3.09981e+06, "iterations": 500, "mean": 6199 } },
"__isinf_ns_t":         { "normal": { "duration": 3.08074e+06, "iterations": 500, "mean": 6161 } },
"__isinf_ns_builtin_t": { "normal": { "duration": 2.64185e+06, "iterations": 500, "mean": 5283 } },
"__isinf_builtin_t":    { "normal": { "duration": 3.13833e+06, "iterations": 500, "mean": 6276 } },
"isinf_t":              { "normal": { "duration": 1.60055e+06, "iterations": 500, "mean": 3201 } },
"__finite_t":           { "normal": { "duration": 3.54966e+06, "iterations": 500, "mean": 7099 } },
"__finite_inl_t":       { "normal": { "duration": 3.08112e+06, "iterations": 500, "mean": 6162 } },
"__isfinite_builtin_t": { "normal": { "duration": 2.6426e+06,  "iterations": 500, "mean": 5285 } },
"isfinite_t":           { "normal": { "duration": 1.49071e+06, "iterations": 500, "mean": 2981 } },
"__isnormal_inl_t":     { "normal": { "duration": 6.31925e+06, "iterations": 500, "mean": 12638 } },
"__isnormal_inl2_t":    { "normal": { "duration": 2.18113e+06, "iterations": 500, "mean": 4362 } },
"__isnormal_builtin_t": { "normal": { "duration": 3.08183e+06, "iterations": 500, "mean": 6163 } },
"isnormal_t":           { "normal": { "duration": 2.19867e+06, "iterations": 500, "mean": 4397 } },
"__fpclassify_test1_t": { "normal": { "duration": 2.71552e+06, "iterations": 500, "mean": 5431 } },
"__fpclassify_test2_t": { "normal": { "duration": 2.69459e+06, "iterations": 500, "mean": 5389 } },
"__fpclassify_t":       { "normal": { "duration": 6.42558e+06, "iterations": 500, "mean": 12851 } },
"fpclassify_t":         { "normal": { "duration": 1.4907e+06,  "iterations": 500, "mean": 2981 } },
"remainder_test1_t":    { "normal": { "duration": 2.20506e+07, "iterations": 500, "mean": 44101 } },
"remainder_test2_t":    { "normal": { "duration": 2.12782e+07, "iterations": 500, "mean": 42556 } }
```

**Follow-Ups**:

  * **Re: [PATCH v2] Add math-inline benchmark** *From:* Ondřej Bílka

**References**:

  * **[PATCH v2] Add math-inline benchmark** *From:* Wilco Dijkstra
  * **Re: [PATCH v2] Add math-inline benchmark** *From:* Ondřej Bílka
  * **RE: [PATCH v2] Add math-inline benchmark** *From:* Wilco Dijkstra
  * **Re: [PATCH v2] Add math-inline benchmark** *From:* Ondřej Bílka
