This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] benchtests/Makefile: Run the string benchmarks four times by default.


On Thu, Sep 05, 2013 at 04:18:18PM +0100, Will Newton wrote:
> On 5 September 2013 16:03, Ondřej Bílka <neleai@seznam.cz> wrote:
> > On Thu, Sep 05, 2013 at 08:51:53AM +0100, Will Newton wrote:
> >> The intention of my patch - which I may have not made completely clear
> >> in the commit message - is to improve test stability. What I mean by
> >> this is that with a physically indexed cache the physical pages
> >> allocated to the test can have a significant effect on the performance
> >> at large (e.g. cache size / ways and above) buffer sizes and this will
> >> cause variation when running the same test multiple times. My aim is
> >> to average out these differences as it is hard to control for them
> >> without understanding the details of the cache subsystem of the system
> >> you are running on.
> >>
> > This can be explained just by having more data. Simply multiplying
> > the iteration count by four would then do the same job.
> 
> No, it wouldn't. That would just mean four times as much data
> resulting in a reduced variance but the same systematic error.
> 
That is your claim. I am now asking you for the second time to prove it.

As I wrote in a previous mail at the same place:

Please run your patch ten times and calculate the variance. Compare
that to the variance when the iteration count is increased four times
and show whether there is an improvement.
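
Something like the following would do for the comparison (just a
sketch, not part of any patch here; it reads one per-run mean per line
on stdin and prints the variance of those means):

/* Sketch: compute the mean and sample variance of the per-run means,
   once for the "run four times" configuration and once for the
   "4x iteration count" configuration, then compare.  */
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  double x, sum = 0.0, sumsq = 0.0;
  size_t n = 0;

  /* One per-run mean (e.g. extracted from bench-memcpy output) per
     line on stdin.  */
  while (scanf ("%lf", &x) == 1)
    {
      sum += x;
      sumsq += x * x;
      n++;
    }

  if (n < 2)
    return EXIT_FAILURE;

  double mean = sum / n;
  /* Sample variance: sum of squared deviations over n - 1.  */
  double var = (sumsq - n * mean * mean) / (n - 1);

  printf ("runs=%zu mean=%g variance=%g\n", n, mean, var);
  return EXIT_SUCCESS;
}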



> >> Your test appears to be addressing concerns of test validity by
> >> running a wider range of buffer alignments, which is an important but
> >> separate concern IMO.
> >>
> > No, your patch will pick the src pointer from 4 different physical
> > pages (one allocated in each run) and calculate the average performance.
> >
> > Mine will pick src pointers from 2000000/4096 = 488 different pages
> > and calculate the average.
> 
> Yes, this would work too. But it has a number of flaws:
> 
> 1. It does not allow one to analyze the performance of the code across
> alignments, everything gets averaged together.

You cannot analyse performance across alignments now, as the benchmarks
do not print the necessary data.

For such an analysis to make sense you would need much more data. That
would make the benchmarks slower than they are now, so it would have to
be optional. If you wanted the analysis done in a separate program, you
would need to print much more data; since that would make the .out
files harder to read for humans, a better way would be to print the
results in a separate section with a separate format. If you did the
analysis inside the benchmark itself, you would implement your own
logic, which also makes this point moot.
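
To make the separate-section idea concrete, a sketch of what such a
machine-readable block could look like (the markers, the 8x8 alignment
grid and the function name are made up, not what benchtests prints
today):

/* Hypothetical sketch: dump per-(src, dst) alignment timings in a
   separate, machine-parseable block after the human-readable results,
   so an external tool can analyse performance across alignments.  */
#include <stdio.h>

static void
print_alignment_block (const char *impl, double timings[8][8])
{
  printf ("=== alignment-data: %s ===\n", impl);
  for (int srcalign = 0; srcalign < 8; srcalign++)
    for (int dstalign = 0; dstalign < 8; dstalign++)
      printf ("%d,%d,%g\n", srcalign, dstalign,
              timings[srcalign][dstalign]);
  printf ("=== end alignment-data ===\n");
}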

> 2. It has no mechanism for showing variance, whereas with multiple runs
> of the same test the variance of the means can at least be seen.

There is a pretty good mechanism for showing variance, and it is called
calculating the variance. However, adding a variance calculation is a
separate issue.


> 3. It only works for one test (memcpy).
>
It is a first step. Randomization is needed for all string functions,
and it is better to start with a concrete example.
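
For concreteness, the kind of randomization meant here, sketched for
memcpy (buffer size, copy size and iteration count are illustrative
only; this is not the actual bench-memcpy code, and the timing harness
is omitted):

/* Illustrative only: exercise memcpy with the source offset chosen at
   random inside a large buffer, so the copies touch many different
   physical pages instead of the same few pages for the whole run.  */
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 2000000        /* ~488 pages of 4096 bytes.  */
#define COPY_SIZE 256
#define ITERS 100000

int
main (void)
{
  char *src = malloc (BUF_SIZE);
  char *dst = malloc (COPY_SIZE);
  if (src == NULL || dst == NULL)
    return EXIT_FAILURE;
  memset (src, 1, BUF_SIZE);

  srandom (42);
  for (long i = 0; i < ITERS; i++)
    {
      /* Random source offset, kept inside the buffer.  */
      size_t off = (size_t) random () % (BUF_SIZE - COPY_SIZE);
      memcpy (dst, src + off, COPY_SIZE);
    }

  free (src);
  free (dst);
  return EXIT_SUCCESS;
}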

 
> -- 
> Will Newton
> Toolchain Working Group, Linaro

