This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v2] Add math-inline benchmark


On Tue, Jul 21, 2015 at 05:17:23PM +0100, Wilco Dijkstra wrote:
> > OndÅej BÃlka wrote:
> > On Mon, Jul 20, 2015 at 03:23:50PM -0400, Carlos O'Donell wrote:
> > > On 07/20/2015 12:38 PM, Wilco Dijkstra wrote:
> > > >> Siddhesh Poyarekar wrote:
> > > >> > On Fri, Jul 17, 2015 at 09:49:42AM -0400, Carlos O'Donell wrote:
> > > >>> > > Maybe we can place this in another C file e.g. bench-util.c and #include that
> > > >>> > > and then call those functions?
> > > >> >
> > > >> > Yes, that is fine.
> > > > I've extracted the start function in bench-util.c, see updated version.
> > > >
> > > > Wilco
> > >
> > > One nit, OK to commit with that fixed.
> > >
> > No Carlos, this isn't ok. You need to do better review as this patch has
> > still some issues.
> 
> I don't think there is any point in blaming Carlos.
>
One needs to review patches to fix issues as early as possible. Saying
that something is ok when it isn't doesn't help. Just quickly post v3.

> > I am not talking about problems that this doesn't measure real workload
> > at all but simple problems that reviewer should point out.
> > 
> > Here its that builtins were added but not well. As this doesn't have
> > builtin_fpclassify comparing these doesn't make sense.
> > 
> > 
> > > > +  int cl = fpclassify (d);
> > > > +  return cl == FP_NAN || cl == FP_INFINITE;
> > 
> > and
> > 
> > > > +  return __builtin_isnan (d) || __builtin_isinf (d);
> 
> Fpclassify uses the builtin, but I think it's better to remove these
> tests as they don't add much value. It's pretty obvious it is better
> to use separate tests instead of fpclassify when you only need 2 of them.
> 
I agree that its better to remove them.

However it isn't obvious that separate tests are better as its wrong. It
should be same code as gcc should optimize that to better but often
isnt. When I looked to assembly of my benchmark of noinline tests then
performance difference was caused just by gcc bug. In fpclassify case it
increased and decreased stack pointer without writing anything to stack.

And its easy to find cases where opposite is true as this depends how
you write benchmark, for example if I change my benchmark to use following:

extern __always_inline int
__fpclassify_test1 (double d)
{
  int cl = fpclassify (d);
  return (cl == FP_INFINITE) ? 3 * sin (d) :
         ((cl == FP_NAN) ? 2 * cos (d) : 0) ;
}

extern __always_inline int
__fpclassify_test2 (double d)
{
  return (__builtin_isinf(d)) ? 3 * sin (d) :
         ((__builtin_isnan(d)) ? 2 * cos (d) : 0) ;

}

then fpclassify becomes lot faster.

   "__fpclassify_test1_t": {
    "normal": {
     "duration": 5.91344e+06,
     "iterations": 500,
     "mean": 11826
    }
   },
   "__fpclassify_test2_t": {
    "normal": {
     "duration": 7.45868e+06,
     "iterations": 500,
     "mean": 14917
    }
   },



> > Also you don't test inlines in isnormal:
> > 
> > > > +/* Explicit inline similar to existing math.h implementation.  */
> > > > +
> > > > +#define __isnormal_inl(X) (__fpclassify (X) == FP_NORMAL)
> 
> This is an inline and exactly what the current math.h does. The goal of my
> benchmark is comparing the existing implementation with the new inlines.
> 
Then yqu should compare existing implementation with new inlines. Now
both would just call fpclassify without your patch. 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]