This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Support separate benchmark outputs


I feel like I'm repeating myself, but I'm going to have a go one last
time.  I need feedback from the rest of the community as well on
whether we simply drop these tests or go ahead because this is turning
into a stalemate.

On 19 April 2013 01:19, Ondřej Bílka <neleai@seznam.cz> wrote:
> And exactly these ideal conditions are the problem.  They are not
> ideal, i.e. uncontested at the processor and cache level, the way you
> imagine.
>
> They are only ideal for what happens in
> for (i = 0; i < 100; i++) memcpy (x, y, n);
> and nothing else.
>
> A more important factor in this idealization than the uncontested
> processor is the hot branch cache, which, combined with all the
> branches being warm, gives biased results.
> It is quite rare for a function to be called with the same arguments
> in succession (same in the sense of taking the same branches).
> Even if that were true, you fail to account for the code that runs
> between calls, which will trash the branch cache.

I am not looking for 'real world' performance numbers -
microbenchmarks are not intended to provide those; that is the job of
a system benchmark.  Microbenchmarks test the performance of specific
code paths.  I want to minimize jitter caused by clobbering of the
cache.
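
To make that concrete, this is roughly the shape of measurement I have
in mind (a hand-written sketch, not the actual benchtests harness;
bench_one, the buffer size and the iteration counts are only
illustrative):

#include <stdio.h>
#include <string.h>
#include <time.h>

#define ITERS 10000

/* Sketch of a microbenchmark for one specific code path: time many
   back-to-back calls with fixed arguments so that the caches and the
   branch predictor stay warm and jitter is minimized.  This is not
   the glibc benchtests harness, just an illustration.  */
static double
bench_one (void *dst, const void *src, size_t n)
{
  struct timespec start, end;

  /* Warm up the caches and the branch predictor.  */
  for (int i = 0; i < 100; i++)
    memcpy (dst, src, n);

  clock_gettime (CLOCK_MONOTONIC, &start);
  for (int i = 0; i < ITERS; i++)
    memcpy (dst, src, n);
  clock_gettime (CLOCK_MONOTONIC, &end);

  double ns = (end.tv_sec - start.tv_sec) * 1e9
	      + (end.tv_nsec - start.tv_nsec);
  return ns / ITERS;
}

int
main (void)
{
  static char src[4096], dst[4096];
  printf ("memcpy of %zu bytes: %.2f ns/call\n", sizeof src,
	  bench_one (dst, src, sizeof src));
  return 0;
}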

>> I disagree for reasons I explained above.  In any case I'd like to
>> separate the plain copying over of the tests from any changes that
>> could be made to these files.  So if you have any ideas that you want
>> to implement here, you could do them once these tests are in.
>>
> This is not related to ideal conditions.  You first create ideal
> conditions and then you pick among the results.
>
> That is a basic mistake.
>
> According to this metric, the following code
>
> int foo (int x)
> {
>   int i;
>   if (rand () % 10 < 1)
>     for (i = 0; i < 10; i++) x = 3 * x;
>   else
>     for (i = 0; i < 100000; i++) x = 3 * x;
>   return x;
> }
>
> is 100 times faster than
>
> int bar (int x)
> {
>   int i;
>   for (i = 0; i < 1000; i++) x = 3 * x;
>   return x;
> }
>
> despite the opposite being true.

Firstly, those two are not equivalent functions.  Take these
equivalent functions as an example:

> int foo (int x)
> {
>   int i;
>   if (x != 0)
>     {
>       for (i = 0; i < 1000; i++) x = 3 * x;
>       return x;
>     }
>   else
>     return 0;
> }
>
> int bar (int x)
> {
>   int i;
>   for (i = 0; i < 1000; i++) x = 3 * x;
>   return x;
> }

Here there ought to be two benchmarks, one to cover each branch in
foo.  That way you break the performance numbers down into various
classes and then see the variance explicitly.  That is precisely what
we do with the tests when we look at misaligned and aligned data, and
at the various sizes, separately.
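
In code, that splitting looks something like the sketch below (again
not benchtests code; the iteration counts are arbitrary and foo is
made unsigned here only so that the repeated multiplication stays
well-defined):

#include <stdio.h>
#include <time.h>

/* Sketch: benchmark each branch of foo as its own class, the same way
   the benchtests report aligned and misaligned inputs (and each size)
   separately instead of folding everything into one number.  */

static unsigned int
foo (unsigned int x)
{
  if (x != 0)
    {
      for (int i = 0; i < 1000; i++)
	x = 3 * x;
      return x;
    }
  else
    return 0;
}

static void
bench_class (const char *name, unsigned int input)
{
  enum { ITERS = 100000 };
  struct timespec start, end;
  volatile unsigned int sink = 0;

  clock_gettime (CLOCK_MONOTONIC, &start);
  for (int i = 0; i < ITERS; i++)
    sink += foo (input);
  clock_gettime (CLOCK_MONOTONIC, &end);

  double ns = (end.tv_sec - start.tv_sec) * 1e9
	      + (end.tv_nsec - start.tv_nsec);
  printf ("%s: %.2f ns/call (sink %u)\n", name, ns / ITERS, sink);
}

int
main (void)
{
  bench_class ("foo, x != 0 (loop branch)", 1);
  bench_class ("foo, x == 0 (early return)", 0);
  return 0;
}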

> A third problem with the minimum is that you could be picking pure
> random noise.  When implementations take nearly the same time, the
> deciding factor could be, for example, which implementation got an
> input with only 2 cache misses instead of the 5 that were the minimum
> for the other implementations.

That is why you need to (1) control your benchmark environment and (2)
repeat the measurement enough times to reduce the effects of jitter.
Heck, I'd suggest benchmarking code in runlevel 1 if I didn't have to
access the machines over the network for benchmarking.
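
For what it's worth, this is the sort of thing I mean by controlling
the environment and repeating (a sketch that assumes pinning to one
CPU is acceptable; run_once is just a stand-in for whatever is being
measured):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <time.h>

/* Sketch: pin the process to one CPU to cut down on migration-induced
   jitter, then take many timing samples of the same run so that the
   spread of the samples, not a single lucky minimum, is visible.  */

static void
run_once (void)
{
  volatile unsigned int x = 1;
  for (int i = 0; i < 100000; i++)
    x *= 3;
}

int
main (void)
{
  cpu_set_t set;
  CPU_ZERO (&set);
  CPU_SET (0, &set);
  if (sched_setaffinity (0, sizeof set, &set) != 0)
    perror ("sched_setaffinity");

  enum { RUNS = 50 };
  double min_ns = 0, max_ns = 0;

  for (int r = 0; r < RUNS; r++)
    {
      struct timespec start, end;
      clock_gettime (CLOCK_MONOTONIC, &start);
      run_once ();
      clock_gettime (CLOCK_MONOTONIC, &end);

      double ns = (end.tv_sec - start.tv_sec) * 1e9
		  + (end.tv_nsec - start.tv_nsec);
      if (r == 0 || ns < min_ns)
	min_ns = ns;
      if (r == 0 || ns > max_ns)
	max_ns = ns;
    }

  /* If max and min are far apart, the numbers are dominated by jitter
     and need more repetitions or a quieter machine.  */
  printf ("min %.0f ns, max %.0f ns, spread %.1f%%\n",
	  min_ns, max_ns, 100.0 * (max_ns - min_ns) / min_ns);
  return 0;
}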

Siddhesh
--
http://siddhesh.in

