This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Support separate benchmark outputs


On Wed, Apr 17, 2013 at 01:05:33PM +0530, Siddhesh Poyarekar wrote:
> On 17 April 2013 11:45, Ondřej Bílka <neleai@seznam.cz> wrote:
> > And how do you describe the conditions of your benchmark?  The
> > conditions under which we test should be carefully selected to get as
> > much information as possible.
> 
> We assume ideal conditions, i.e. how the function would behave if it
> were relatively uncontested at the processor and cache level.  That is
And exactly these ideal conditions are the problem.  They are not
ideal in the sense of being uncontested at the processor and cache
level, as you imagine.

They are ideal only in the sense of what happens in

for (i = 0; i < 100; i++) memcpy (x, y, n);

and nothing else.

A more important factor in this idealization than the uncontested
processor is the hot branch predictor, which, with every branch already
trained, gives biased results.
It is quite rare that a function is called with the same arguments in
succession (same in the sense of taking the same branches).
Even if that were true, you fail to account for the code that runs
between calls, which will trash the branch predictor state.
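
A minimal sketch of the bias (my own illustration, not code from the
patch): time the same multiset of memcpy sizes twice, once grouped so
that consecutive calls take the same branches and once in random order.
The bytes copied are identical, so any difference comes from branch
prediction and similar warm state:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static char src[4096], dst[4096];

static int
cmp (const void *a, const void *b)
{
  size_t x = *(const size_t *) a, y = *(const size_t *) b;
  return (x > y) - (x < y);
}

static double
time_copies (const size_t *sizes, int n)
{
  struct timespec t0, t1;
  int i;

  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (i = 0; i < n; i++)
    memcpy (dst, src, sizes[i]);
  clock_gettime (CLOCK_MONOTONIC, &t1);
  return (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
}

int
main (void)
{
  enum { N = 100000 };
  static size_t shuffled[N], grouped[N];
  int i;

  for (i = 0; i < N; i++)
    shuffled[i] = grouped[i] = 1 + rand () % 4096;
  /* Same sizes, but in predictable order.  */
  qsort (grouped, N, sizeof grouped[0], cmp);
  printf ("grouped:  %.0f ns\n", time_copies (grouped, N));
  printf ("shuffled: %.0f ns\n", time_copies (shuffled, N));
  return 0;
}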

> why I believe we take into consideration the best time and not the
> average.  We test various defined variations (alignment, sizes) to see
> how the functions behave and scale, again in these ideal conditions.
>
Testing alignments is important because as the alignment varies,
different paths are taken.  This causes some branches to be
mispredicted, which is exactly what we want to capture.  Setting the
alignment deterministically defeats the purpose (see the sketch below).
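
A sketch of what randomizing it could look like (buffer and offset
choices here are mine, not from the patch):

#include <stdlib.h>
#include <string.h>

static char buf1[4096 + 64], buf2[4096 + 64];

/* Copy LEN (at most 4096) bytes with source and destination
   alignments drawn at random, so the misaligned code paths and
   their mispredictions actually get exercised.  */
void
copy_random_alignment (size_t len)
{
  size_t soff = rand () % 64;
  size_t doff = rand () % 64;
  memcpy (buf2 + doff, buf1 + soff, len);
}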
 
> >>  Our goal is to provide acceptable average case performance,
> >
> > The code that I commented on is the following:
> >
> >> +      for (i = 0; i < 32; ++i)
> >> +     {
> >> +       HP_TIMING_NOW (start);
> >> +       CALL (impl, dst, src, len);
> >> +       HP_TIMING_NOW (stop);
> >> +       HP_TIMING_BEST (best_time, start, stop);
> >> +     }
> >
> > It does not provide average case performance; it computes best case
> > performance.  This favours implementations with complex control flow
> > that are faster in the 5% of cases where conditions are right, but not
> > in the 95% of bulk data.  Please compute the average.
> 
> I disagree for reasons I explained above.  In any case I'd like to
> separate the plain copying over of the tests from any changes that
> could be made to these files.  So if you have any ideas that you want
> to implement here, you could do them once these tests are in.
> 
This is not related to ideal conditions.  You first create ideal
conditions; then you pick the best result among them.

This is a basic mistake.

According to this metric the following code

int foo (int x)
{
  int i;
  if (rand () % 10 < 1)
    for (i = 0; i < 10; i++)
      x = 3 * x;
  else
    for (i = 0; i < 100000; i++)
      x = 3 * x;
  return x;
}

is 100 times faster than

int bar (int x)
{
  int i;
  for (i = 0; i < 1000; i++)
    x = 3 * x;
  return x;
}

despite the opposite being true of the average.
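
For contrast, a sketch of what computing an average in the quoted loop
could look like, using the HP_TIMING_DIFF and HP_TIMING_ACCUM_NT macros
that already exist in glibc's hp-timing.h (the plain zero
initialization assumes hp_timing_t is an integer type, as on x86_64):

      hp_timing_t start, stop, cur, total = 0;

      for (i = 0; i < 32; ++i)
	{
	  HP_TIMING_NOW (start);
	  CALL (impl, dst, src, len);
	  HP_TIMING_NOW (stop);
	  HP_TIMING_DIFF (cur, start, stop);
	  HP_TIMING_ACCUM_NT (total, cur);
	}
      /* Report total / 32 rather than the best single run.  */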

The second problem is that even if two implementations have the same
average running time, then according to the minimum metric the one with
the bigger variance looks better.

This is wrong.  We need to favor implementations with small variance,
as we can be more confident that their performance data are correct.
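
A sketch of the alternative (the helper is my own, not from the patch):
keep the individual samples and report mean and variance, so that a
high-variance implementation is penalized rather than rewarded:

double
mean_and_variance (const double *samples, int n, double *variance)
{
  double sum = 0.0, sumsq = 0.0, mean;
  int i;

  for (i = 0; i < n; i++)
    {
      sum += samples[i];
      sumsq += samples[i] * samples[i];
    }
  mean = sum / n;
  /* E[X^2] - E[X]^2; adequate for timing data of this magnitude.  */
  *variance = sumsq / n - mean * mean;
  return mean;
}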

The third problem with the minimum is that you could be picking
nothing but random noise.  When implementations have nearly the same
running time, the deciding factor could be, for example, which
implementation happened to get a run with only 2 cache misses instead
of the 5 that was the minimum for the other implementations.
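
A toy simulation of the effect (entirely my own construction): two
implementations with identical true cost plus random noise; taking the
best of 32 runs, which one "wins" is essentially a coin flip:

#include <stdio.h>
#include <stdlib.h>

/* Identical true cost, plus random noise from cache misses etc.  */
static int
sample (void)
{
  return 10000 + rand () % 1000;
}

int
main (void)
{
  int a_wins = 0, trial, i;

  for (trial = 0; trial < 1000; trial++)
    {
      int best_a = sample (), best_b = sample ();
      for (i = 0; i < 31; i++)
	{
	  int a = sample (), b = sample ();
	  if (a < best_a) best_a = a;
	  if (b < best_b) best_b = b;
	}
      if (best_a < best_b)
	a_wins++;
    }
  printf ("A wins %d of 1000 trials on the minimum metric\n", a_wins);
  return 0;
}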

