This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Support separate benchmark outputs


On Wed, Apr 17, 2013 at 01:05:33PM +0530, Siddhesh Poyarekar wrote:
> On 17 April 2013 11:45, Ondřej Bílka <neleai@seznam.cz> wrote:
> > And how do you describe the conditions of your benchmark?  The
> > conditions under which we test should be carefully selected to get as
> > much information as possible.
> 
> We assume ideal conditions, i.e. how the function would behave if it
> were relatively uncontested at the processor and cache level.  That is
And exactly these ideal conditions are the problem.  They are not
ideal in the sense of being uncontested at the processor and cache
level, as you imagine.

They are ideal only in the sense of what happens in

for (i = 0; i < 100; i++) memcpy (x, y, n);

and nothing else.

A more important factor in this idealization than the uncontested
processor is the hot branch predictor, which, with every branch already
trained, gives biased results.
It is quite rare that a function is called with the same arguments in
succession (same in the sense of taking the same branches).
Even if that were true, you fail to account for the code that runs
between calls, which will trash the branch predictor state.
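
A minimal sketch of the bias (my own illustration, not code from the
patch): time the same multiset of memcpy sizes twice, once grouped so
that consecutive calls take the same branches and once in random order.
The bytes copied are identical, so any difference comes from branch
prediction and similar warm state:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static char src[4096], dst[4096];

static int
cmp (const void *a, const void *b)
{
  size_t x = *(const size_t *) a, y = *(const size_t *) b;
  return (x > y) - (x < y);
}

static double
time_copies (const size_t *sizes, int n)
{
  struct timespec t0, t1;
  int i;

  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (i = 0; i < n; i++)
    memcpy (dst, src, sizes[i]);
  clock_gettime (CLOCK_MONOTONIC, &t1);
  return (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
}

int
main (void)
{
  enum { N = 100000 };
  static size_t shuffled[N], grouped[N];
  int i;

  for (i = 0; i < N; i++)
    shuffled[i] = grouped[i] = 1 + rand () % 4096;
  /* Same sizes, but in predictable order.  */
  qsort (grouped, N, sizeof grouped[0], cmp);
  printf ("grouped:  %.0f ns\n", time_copies (grouped, N));
  printf ("shuffled: %.0f ns\n", time_copies (shuffled, N));
  return 0;
}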

> why I believe we take into consideration the best time and not the
> average.  We test various defined variations (alignment, sizes) to see
> how the functions behave and scale, again in these ideal conditions.
>
Testing alignments is important because as the alignment varies,
different paths are taken.  This causes some branches to be
mispredicted, which is exactly what we want to capture.  Setting the
alignment deterministically defeats the purpose (see the sketch below).
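
A sketch of what randomizing it could look like (buffer and offset
choices here are mine, not from the patch):

#include <stdlib.h>
#include <string.h>

static char buf1[4096 + 64], buf2[4096 + 64];

/* Copy LEN (at most 4096) bytes with source and destination
   alignments drawn at random, so the misaligned code paths and
   their mispredictions actually get exercised.  */
void
copy_random_alignment (size_t len)
{
  size_t soff = rand () % 64;
  size_t doff = rand () % 64;
  memcpy (buf2 + doff, buf1 + soff, len);
}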
 
> >>  Our goal is to provide acceptable average case performance,
> >
> > The code that I commented on is the following:
> >
> >> +      for (i = 0; i < 32; ++i)
> >> +     {
> >> +       HP_TIMING_NOW (start);
> >> +       CALL (impl, dst, src, len);
> >> +       HP_TIMING_NOW (stop);
> >> +       HP_TIMING_BEST (best_time, start, stop);
> >> +     }
> >
> > It does not provide average case performance; it computes best case
> > performance.  This favours implementations with complex control flow
> > that are faster in the 5% of cases where conditions are right, but not
> > in the 95% of bulk data.  Please compute the average.
> 
> I disagree for reasons I explained above.  In any case I'd like to
> separate the plain copying over of the tests from any changes that
> could be made to these files.  So if you have any ideas that you want
> to implement here, you could do them once these tests are in.
> 
This is not related to ideal conditions.  You first create ideal
conditions; then you pick the best result among them.

This is a basic mistake.

According to this metric the following code

int foo (int x)
{
  int i;
  if (rand () % 10 < 1)
    for (i = 0; i < 10; i++)
      x = 3 * x;
  else
    for (i = 0; i < 100000; i++)
      x = 3 * x;
  return x;
}

is 100 times faster than

int bar (int x)
{
  int i;
  for (i = 0; i < 1000; i++)
    x = 3 * x;
  return x;
}

despite the opposite being true of the average.
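
For contrast, a sketch of what computing an average in the quoted loop
could look like, using the HP_TIMING_DIFF and HP_TIMING_ACCUM_NT macros
that already exist in glibc's hp-timing.h (the plain zero
initialization assumes hp_timing_t is an integer type, as on x86_64):

      hp_timing_t start, stop, cur, total = 0;

      for (i = 0; i < 32; ++i)
	{
	  HP_TIMING_NOW (start);
	  CALL (impl, dst, src, len);
	  HP_TIMING_NOW (stop);
	  HP_TIMING_DIFF (cur, start, stop);
	  HP_TIMING_ACCUM_NT (total, cur);
	}
      /* Report total / 32 rather than the best single run.  */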

The second problem is that even if two implementations have the same
average running time, then according to the minimum metric the one with
the bigger variance looks better.

This is wrong.  We need to favor implementations with small variance,
as we can be more confident that their performance data are correct.
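
A sketch of the alternative (the helper is my own, not from the patch):
keep the individual samples and report mean and variance, so that a
high-variance implementation is penalized rather than rewarded:

double
mean_and_variance (const double *samples, int n, double *variance)
{
  double sum = 0.0, sumsq = 0.0, mean;
  int i;

  for (i = 0; i < n; i++)
    {
      sum += samples[i];
      sumsq += samples[i] * samples[i];
    }
  mean = sum / n;
  /* E[X^2] - E[X]^2; adequate for timing data of this magnitude.  */
  *variance = sumsq / n - mean * mean;
  return mean;
}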

The third problem with the minimum is that you could be picking
nothing but random noise.  When implementations have nearly the same
running time, the deciding factor could be, for example, which
implementation happened to get a run with only 2 cache misses instead
of the 5 that was the minimum for the other implementations.
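
A toy simulation of the effect (entirely my own construction): two
implementations with identical true cost plus random noise; taking the
best of 32 runs, which one "wins" is essentially a coin flip:

#include <stdio.h>
#include <stdlib.h>

/* Identical true cost, plus random noise from cache misses etc.  */
static int
sample (void)
{
  return 10000 + rand () % 1000;
}

int
main (void)
{
  int a_wins = 0, trial, i;

  for (trial = 0; trial < 1000; trial++)
    {
      int best_a = sample (), best_b = sample ();
      for (i = 0; i < 31; i++)
	{
	  int a = sample (), b = sample ();
	  if (a < best_a) best_a = a;
	  if (b < best_b) best_b = b;
	}
      if (best_a < best_b)
	a_wins++;
    }
  printf ("A wins %d of 1000 trials on the minimum metric\n", a_wins);
  return 0;
}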

