This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v2] Add math-inline benchmark
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: Wilco Dijkstra <wdijkstr at arm dot com>
- Cc: Carlos O'Donell <carlos at redhat dot com>, 'Siddhesh Poyarekar' <siddhesh at redhat dot com>, 'GNU C Library' <libc-alpha at sourceware dot org>
- Date: Tue, 21 Jul 2015 20:12:14 +0200
- Subject: Re: [PATCH v2] Add math-inline benchmark
- Authentication-results: sourceware.org; auth=none
- References: <002001d0bfb8$b36fa330$1a4ee990$ at com> <55A907F6 dot 8000504 at redhat dot com> <20150717143406 dot GG19592 at spoyarek dot pnq dot redhat dot com> <002c01d0c30a$8f23c600$ad6b5200$ at com> <55AD4AC6 dot 5070401 at redhat dot com> <20150721113502 dot GA3415 at domone> <003101d0c3d0$b9fb4350$2df1c9f0$ at com> <20150721170416 dot GA2294 at domone> <003201d0c3dd$322727c0$96757740$ at com>
On Tue, Jul 21, 2015 at 06:46:38PM +0100, Wilco Dijkstra wrote:
> > OndÅej BÃlka wrote:
> > On Tue, Jul 21, 2015 at 05:17:23PM +0100, Wilco Dijkstra wrote:
>
> > > Fpclassify uses the builtin, but I think it's better to remove these
> > > tests as they don't add much value. It's pretty obvious it is better
> > > to use separate tests instead of fpclassify when you only need 2 of them.
> > >
> > I agree that its better to remove them.
> >
> > However it isn't obvious that separate tests are better as its wrong. It
> > should be same code as gcc should optimize that to better but often
> > isnt. When I looked to assembly of my benchmark of noinline tests then
> > performance difference was caused just by gcc bug. In fpclassify case it
> > increased and decreased stack pointer without writing anything to stack.
>
> Sure optimization makes a difference to these benchmarks, that's unavoidable,
> so that's why I always carefully check the generated assembly to ensure I
> really measure what I wanted to measure.
>
> > And its easy to find cases where opposite is true as this depends how
> > you write benchmark, for example if I change my benchmark to use following:
> >
> > extern __always_inline int
> > __fpclassify_test1 (double d)
> > {
> > int cl = fpclassify (d);
> > return (cl == FP_INFINITE) ? 3 * sin (d) :
> > ((cl == FP_NAN) ? 2 * cos (d) : 0) ;
> > }
> >
> > extern __always_inline int
> > __fpclassify_test2 (double d)
> > {
> > return (__builtin_isinf(d)) ? 3 * sin (d) :
> > ((__builtin_isnan(d)) ? 2 * cos (d) : 0) ;
> >
> > }
> >
> > then fpclassify becomes lot faster.
> >
> > "__fpclassify_test1_t": {
> > "normal": {
> > "duration": 5.91344e+06,
> > "iterations": 500,
> > "mean": 11826
> > }
> > },
> > "__fpclassify_test2_t": {
> > "normal": {
> > "duration": 7.45868e+06,
> > "iterations": 500,
> > "mean": 14917
> > }
> > },
>
> Well that doesn't seem correct. Here are my results for the above
> using latest GCC and modern x64 hardware:
>
Thats what your benchmark produces. Again difference between mine
benchmarks and yours is in how you handled constants loads by forbidding
inlining. My benchmark used following:
#define BOOLTEST(func) \
int __attribute__((noinline)) func## _t_t (double d,int a) \
{ \
if (func(d)) \
return 3 * sin (d) + a; \
else \
return 2; \
} \
int \
func ## _t (volatile double *p, size_t n, size_t iters) \
{ \
int i, j; \
int res = 0; \
for (j = 0; j < iters; j++) \
for (i = 0; i < n; i++) \
res += func## _t_t (2.0 * p[i], i); \
return res; \
}
> "__fpclassify_test1_t": {
> "normal": {
> "duration": 1.99627e+06,
> "iterations": 500,
> "mean": 3992
> }
> },
> "__fpclassify_test2_t": {
> "normal": {
> "duration": 1.9893e+06,
> "iterations": 500,
> "mean": 3978
> }
> },
>
> > > > Also you don't test inlines in isnormal:
> > > >
> > > > > > +/* Explicit inline similar to existing math.h implementation. */
> > > > > > +
> > > > > > +#define __isnormal_inl(X) (__fpclassify (X) == FP_NORMAL)
> > >
> > > This is an inline and exactly what the current math.h does. The goal of my
> > > benchmark is comparing the existing implementation with the new inlines.
> > >
> > Then yqu should compare existing implementation with new inlines. Now
> > both would just call fpclassify without your patch.
>
> OK, I'll add a builtin version for fpclassify as well (and use that in
> __isnormal_inl2) - then all tests have the non-inline version, an internal
> GLIBC inline if it exists, the builtin and the math.h version.
>