This is the mail archive of the
mailing list for the glibc project.
RE: [PATCH v2] Add math-inline benchmark
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: 'Ondřej Bílka' <neleai at seznam dot cz>
- Cc: "'GNU C Library'" <libc-alpha at sourceware dot org>
- Date: Fri, 17 Jul 2015 14:26:53 +0100
- Subject: RE: [PATCH v2] Add math-inline benchmark
- Authentication-results: sourceware.org; auth=none
- References: <002001d0bfb8$b36fa330$1a4ee990$ at com> <20150716225056 dot GA24479 at domone>
> Ondřej Bílka wrote:
> On Thu, Jul 16, 2015 at 12:15:19PM +0100, Wilco Dijkstra wrote:
> > Add a benchmark for isinf/isnan/isnormal/isfinite/fpclassify. This new version adds explicit
> > for the GCC built-ins and uses json format as suggested and no longer includes any string
> > The test uses 2 arrays with 1024 doubles, one with 99% finite FP numbers (10% zeroes, 10%
> > and 1% inf/NaN, the other with 50% inf, and 50% Nan.
> > Results shows that using the GCC built-ins in math.h gives huge speedups due to avoiding
> > calls, PLT indirection to execute a function with 3-4 instructions - around 7x on AArch64
> and 2.8x
> > on x64. The GCC builtins have better performance than the existing math_private inlines for
> > __finite and __isinf_ns, so these should be removed.
> No, this benchmark is invalid for following two reasons.
> 1) It doesn't measure real workload at all. Constructing large constant
> could be costy and by inlining this benchmark ignores cost.
It was never meant to measure a real workload. If you'd like to add your own
workload, that would be great, but my micro benchmark is more than sufficient
in proving that the new inlines give a huge performance gain.
> 2) Results on x64 don't measure inlines but inferior version as they use
> assembly to change double into integer.
> As I and Joseph told you multiple times to measure these and I send
> benchmarks to demonstrate its effects. So I fixed your benchmark and now
> it clearly shows that in all cases math_private inlines are better than
> builtins on x64 (Now its x64 only due EXTRACT_WORDS64, you need to sync that with math-
> Even remainder is slower.
It shouldn't be necessary to use assembly code, GCC is perfectly capable of
generating efficient code for EXTRACT_WORDS. If GCC doesn't for x64 then you
should report it and ensure it gets fixed. I don't understand why people keep
using target-specific hacks and assembly rather than fixing the underlying
issues and use generic code that works well on all targets...
> So at least on x64 we should publish math_private inlines instead using
> slow builtins.
Well it was agreed we are going to use the GCC built-ins and then improve
those. If you want to propose additional patches with special inlines for
x64 then please go ahead, but my plan is to improve the builtins.
> I also added better finite,isnormal inlines, these should have same
> speed as isnan. Main improvement is multiplication by 2 instead anding
> with constant to mask sign bit.
Your inlines may well be very fast but they are not correct, eg. this doesn't
check the exponent at all:
> +extern __always_inline int
> +isinf_new (double dx)
> + uint64_t di;
> + EXTRACT_WORDS64 (di, dx);
> + if (__builtin_expect ((di << 12) != 0, 1))
> + return 0;
> + return (int) (di >> 32);