This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
RE: [PATCH v2] Add math-inline benchmark
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: 'OndÅej BÃlka' <neleai at seznam dot cz>
- Cc: "Carlos O'Donell" <carlos at redhat dot com>, "'Siddhesh Poyarekar'" <siddhesh at redhat dot com>, "'GNU C Library'" <libc-alpha at sourceware dot org>
- Date: Tue, 21 Jul 2015 18:46:38 +0100
- Subject: RE: [PATCH v2] Add math-inline benchmark
- Authentication-results: sourceware.org; auth=none
- References: <002001d0bfb8$b36fa330$1a4ee990$ at com> <55A907F6 dot 8000504 at redhat dot com> <20150717143406 dot GG19592 at spoyarek dot pnq dot redhat dot com> <002c01d0c30a$8f23c600$ad6b5200$ at com> <55AD4AC6 dot 5070401 at redhat dot com> <20150721113502 dot GA3415 at domone> <003101d0c3d0$b9fb4350$2df1c9f0$ at com> <20150721170416 dot GA2294 at domone>
> OndÅej BÃlka wrote:
> On Tue, Jul 21, 2015 at 05:17:23PM +0100, Wilco Dijkstra wrote:
> > Fpclassify uses the builtin, but I think it's better to remove these
> > tests as they don't add much value. It's pretty obvious it is better
> > to use separate tests instead of fpclassify when you only need 2 of them.
> >
> I agree that its better to remove them.
>
> However it isn't obvious that separate tests are better as its wrong. It
> should be same code as gcc should optimize that to better but often
> isnt. When I looked to assembly of my benchmark of noinline tests then
> performance difference was caused just by gcc bug. In fpclassify case it
> increased and decreased stack pointer without writing anything to stack.
Sure optimization makes a difference to these benchmarks, that's unavoidable,
so that's why I always carefully check the generated assembly to ensure I
really measure what I wanted to measure.
> And its easy to find cases where opposite is true as this depends how
> you write benchmark, for example if I change my benchmark to use following:
>
> extern __always_inline int
> __fpclassify_test1 (double d)
> {
> int cl = fpclassify (d);
> return (cl == FP_INFINITE) ? 3 * sin (d) :
> ((cl == FP_NAN) ? 2 * cos (d) : 0) ;
> }
>
> extern __always_inline int
> __fpclassify_test2 (double d)
> {
> return (__builtin_isinf(d)) ? 3 * sin (d) :
> ((__builtin_isnan(d)) ? 2 * cos (d) : 0) ;
>
> }
>
> then fpclassify becomes lot faster.
>
> "__fpclassify_test1_t": {
> "normal": {
> "duration": 5.91344e+06,
> "iterations": 500,
> "mean": 11826
> }
> },
> "__fpclassify_test2_t": {
> "normal": {
> "duration": 7.45868e+06,
> "iterations": 500,
> "mean": 14917
> }
> },
Well that doesn't seem correct. Here are my results for the above
using latest GCC and modern x64 hardware:
"__fpclassify_test1_t": {
"normal": {
"duration": 1.99627e+06,
"iterations": 500,
"mean": 3992
}
},
"__fpclassify_test2_t": {
"normal": {
"duration": 1.9893e+06,
"iterations": 500,
"mean": 3978
}
},
> > > Also you don't test inlines in isnormal:
> > >
> > > > > +/* Explicit inline similar to existing math.h implementation. */
> > > > > +
> > > > > +#define __isnormal_inl(X) (__fpclassify (X) == FP_NORMAL)
> >
> > This is an inline and exactly what the current math.h does. The goal of my
> > benchmark is comparing the existing implementation with the new inlines.
> >
> Then yqu should compare existing implementation with new inlines. Now
> both would just call fpclassify without your patch.
OK, I'll add a builtin version for fpclassify as well (and use that in
__isnormal_inl2) - then all tests have the non-inline version, an internal
GLIBC inline if it exists, the builtin and the math.h version.
Wilco