This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
RE: [PATCH v2] Add math-inline benchmark
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: 'Ondřej Bílka' <neleai at seznam dot cz>
- Cc: "'GNU C Library'" <libc-alpha at sourceware dot org>
- Date: Mon, 20 Jul 2015 12:01:50 +0100
- Subject: RE: [PATCH v2] Add math-inline benchmark
- Authentication-results: sourceware.org; auth=none
- References: <002001d0bfb8$b36fa330$1a4ee990$ at com> <20150716225056 dot GA24479 at domone> <002501d0c094$3ea04cd0$bbe0e670$ at com> <20150718113423 dot GC30356 at domone>
> Ondřej Bílka wrote:
> On Fri, Jul 17, 2015 at 02:26:53PM +0100, Wilco Dijkstra wrote:
> > > Ondřej Bílka wrote:
> > > On Thu, Jul 16, 2015 at 12:15:19PM +0100, Wilco Dijkstra wrote:
> > > > Add a benchmark for isinf/isnan/isnormal/isfinite/fpclassify. This new
> > > > version adds explicit tests for the GCC built-ins, uses JSON format as
> > > > suggested, and no longer includes any string headers.
> > > > The test uses 2 arrays with 1024 doubles: one with 99% finite FP numbers
> > > > (10% zeroes, 10% negative) and 1% inf/NaN, the other with 50% inf and
> > > > 50% NaN.
> > > >
> > > > Results show that using the GCC built-ins in math.h gives huge speedups
> > > > by avoiding explicit calls and PLT indirection to execute a function of
> > > > 3-4 instructions - around 7x on AArch64 and 2.8x on x64. The GCC
> > > > built-ins have better performance than the existing math_private inlines
> > > > for __isnan, __finite and __isinf_ns, so these should be removed.
> > > >
> > > No, this benchmark is invalid for the following two reasons.
> > >
> > > 1) It doesn't measure a real workload at all. Constructing a large
> > > constant could be costly, and by inlining, this benchmark ignores that cost.
> >
> > It was never meant to measure a real workload. If you'd like to add your own
> > workload, that would be great, but my micro benchmark is more than sufficient
> > in proving that the new inlines give a huge performance gain.
> >
> But you claimed the following in your original mail, which is wrong:
>
> "
> Results show that using the GCC built-ins in math.h gives huge speedups by
> avoiding explicit calls and PLT indirection to execute a function of 3-4
> instructions - around 7x on AArch64 and 2.8x on x64. The GCC built-ins have
> better performance than the existing math_private inlines for __isnan,
> __finite and __isinf_ns, so these should be removed.
> "
No, that statement is 100% correct.
> Also, when inlines give a speedup, you should add math inlines for the
> signaling NaN case as well. That gives a similar speedup. And it would be
> natural to ask whether you should use these inlines everywhere, if they are
> already faster than the builtins.
I'm not sure what you mean here - I enable the new inlines in exactly the
right case. Improvements to support signalling NaNs or to speed up the
built-ins further will be done in GCC.
> > > So at least on x64 we should publish the math_private inlines instead of
> > > using the slow builtins.
> >
> > Well, it was agreed that we are going to use the GCC built-ins and then
> > improve those. If you want to propose additional patches with special
> > inlines for x64 then please go ahead, but my plan is to improve the builtins.
> >
> And how are you sure that it's just an isolated x64 case? It may also happen
> on powerpc, arm, sparc and other architectures, and you need to test that.
It's obvious the huge speedup applies to all other architectures as well -
it's hard to imagine that avoiding a call, a return, a PLT indirection and
the additional optimization of 3-4 instructions could ever cause a slowdown...
> So I ask you again to run my benchmark with the changed EXTRACT_WORDS64 to
> see if this is a problem on arm as well.
Here are the results for x64 with inlining disabled (__always_inline changed
to noinline) and using the movq instruction as you suggested:
"__isnan_t": {
"normal": {
"duration": 3.52048e+06,
"iterations": 500,
"mean": 7040
}
},
"__isnan_inl_t": {
"normal": {
"duration": 3.09247e+06,
"iterations": 500,
"mean": 6184
}
},
"__isnan_builtin_t": {
"normal": {
"duration": 2.20378e+06,
"iterations": 500,
"mean": 4407
}
},
"isnan_t": {
"normal": {
"duration": 1.50514e+06,
"iterations": 500,
"mean": 3010
}
},
"__isinf_t": {
"normal": {
"duration": 4.54681e+06,
"iterations": 500,
"mean": 9093
}
},
"__isinf_inl_t": {
"normal": {
"duration": 3.09981e+06,
"iterations": 500,
"mean": 6199
}
},
"__isinf_ns_t": {
"normal": {
"duration": 3.08074e+06,
"iterations": 500,
"mean": 6161
}
},
"__isinf_ns_builtin_t": {
"normal": {
"duration": 2.64185e+06,
"iterations": 500,
"mean": 5283
}
},
"__isinf_builtin_t": {
"normal": {
"duration": 3.13833e+06,
"iterations": 500,
"mean": 6276
}
},
"isinf_t": {
"normal": {
"duration": 1.60055e+06,
"iterations": 500,
"mean": 3201
}
},
"__finite_t": {
"normal": {
"duration": 3.54966e+06,
"iterations": 500,
"mean": 7099
}
},
"__finite_inl_t": {
"normal": {
"duration": 3.08112e+06,
"iterations": 500,
"mean": 6162
}
},
"__isfinite_builtin_t": {
"normal": {
"duration": 2.6426e+06,
"iterations": 500,
"mean": 5285
}
},
"isfinite_t": {
"normal": {
"duration": 1.49071e+06,
"iterations": 500,
"mean": 2981
}
},
"__isnormal_inl_t": {
"normal": {
"duration": 6.31925e+06,
"iterations": 500,
"mean": 12638
}
},
"__isnormal_inl2_t": {
"normal": {
"duration": 2.18113e+06,
"iterations": 500,
"mean": 4362
}
},
"__isnormal_builtin_t": {
"normal": {
"duration": 3.08183e+06,
"iterations": 500,
"mean": 6163
}
},
"isnormal_t": {
"normal": {
"duration": 2.19867e+06,
"iterations": 500,
"mean": 4397
}
},
"__fpclassify_test1_t": {
"normal": {
"duration": 2.71552e+06,
"iterations": 500,
"mean": 5431
}
},
"__fpclassify_test2_t": {
"normal": {
"duration": 2.69459e+06,
"iterations": 500,
"mean": 5389
}
},
"__fpclassify_t": {
"normal": {
"duration": 6.42558e+06,
"iterations": 500,
"mean": 12851
}
},
"fpclassify_t": {
"normal": {
"duration": 1.4907e+06,
"iterations": 500,
"mean": 2981
}
},
"remainder_test1_t": {
"normal": {
"duration": 2.20506e+07,
"iterations": 500,
"mean": 44101
}
},
"remainder_test2_t": {
"normal": {
"duration": 2.12782e+07,
"iterations": 500,
"mean": 42556
}
}