This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

RE: [PATCH v2] Add math-inline benchmark

From: "Wilco Dijkstra" <wdijkstr at arm dot com>
To: 'Ondřej Bílka' <neleai at seznam dot cz>
Cc: "'GNU C Library'" <libc-alpha at sourceware dot org>
Date: Fri, 17 Jul 2015 14:26:53 +0100
Subject: RE: [PATCH v2] Add math-inline benchmark
Authentication-results: sourceware.org; auth=none
References: <002001d0bfb8$b36fa330$1a4ee990$ at com> <20150716225056 dot GA24479 at domone>

> Ondřej Bílka wrote:
> On Thu, Jul 16, 2015 at 12:15:19PM +0100, Wilco Dijkstra wrote:
> > Add a benchmark for isinf/isnan/isnormal/isfinite/fpclassify. This new version adds explicit
> tests
> > for the GCC built-ins and uses json format as suggested and no longer includes any string
> headers.
> > The test uses 2 arrays with 1024 doubles, one with 99% finite FP numbers (10% zeroes, 10%
> negative)
> > and 1% inf/NaN, the other with 50% inf, and 50% Nan.
> >
> > Results shows that using the GCC built-ins in math.h gives huge speedups due to avoiding
> explict
> > calls, PLT indirection to execute a function with 3-4 instructions - around 7x on AArch64
> and 2.8x
> > on x64. The GCC builtins have better performance than the existing math_private inlines for
> __isnan,
> > __finite and __isinf_ns, so these should be removed.
> >
> No, this benchmark is invalid for following two reasons.
> 
> 1) It doesn't measure real workload at all. Constructing large constant
> could be costy and by inlining this benchmark ignores cost.

It was never meant to measure a real workload. If you'd like to add your own 
workload, that would be great, but my micro benchmark is more than sufficient
in proving that the new inlines give a huge performance gain.

> 2) Results on x64 don't measure inlines but inferior version as they use
> assembly to change double into integer.
> 
> As I and Joseph told you multiple times to measure these and I send
> benchmarks to demonstrate its effects. So I fixed your benchmark and now
> it clearly shows that in all cases math_private inlines are better than
> builtins on x64 (Now its x64 only due EXTRACT_WORDS64, you need to sync that with math-
> private).
> Even remainder is slower.

It shouldn't be necessary to use assembly code, GCC is perfectly capable of
generating efficient code for EXTRACT_WORDS. If GCC doesn't for x64 then you
should report it and ensure it gets fixed. I don't understand why people keep
using target-specific hacks and assembly rather than fixing the underlying
issues and use generic code that works well on all targets...

> So at least on x64 we should publish math_private inlines instead using
> slow builtins.

Well it was agreed we are going to use the GCC built-ins and then improve
those. If you want to propose additional patches with special inlines for
x64 then please go ahead, but my plan is to improve the builtins.

> I also added better finite,isnormal inlines, these should have same
> speed as isnan. Main improvement is multiplication by 2 instead anding
> with constant to mask sign bit.

Your inlines may well be very fast but they are not correct, eg. this doesn't
check the exponent at all:

> +extern __always_inline int
> +isinf_new (double dx)
> +{
> +  uint64_t di;
> +  EXTRACT_WORDS64 (di, dx);
> +
> +  if (__builtin_expect ((di << 12) != 0, 1))
> +    return 0;
> +
> +  return (int) (di >> 32);
> +}

Wilco

Follow-Ups:
- Re: [PATCH v2] Add math-inline benchmark
  - From: Carlos O'Donell
- Re: [PATCH v2] Add math-inline benchmark
  - From: OndÅej BÃlka

References:
- [PATCH v2] Add math-inline benchmark
  - From: Wilco Dijkstra
- Re: [PATCH v2] Add math-inline benchmark
  - From: OndÅej BÃlka

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]