This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] benchtests/Makefile: Run the string benchmarks four times by default.
- From: Will Newton <will dot newton at linaro dot org>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: libc-alpha <libc-alpha at sourceware dot org>, Patch Tracking <patches at linaro dot org>
- Date: Thu, 5 Sep 2013 18:06:40 +0100
- Subject: Re: [PATCH] benchtests/Makefile: Run the string benchmarks four times by default.
- Authentication-results: sourceware.org; auth=none
- References: <52274838 dot 7010902 at linaro dot org> <20130904161743 dot GA10358 at domone dot kolej dot mff dot cuni dot cz> <CANu=DmiVWFijri_iMjFGJEWdTWheHbBFOH8XULURRE8pLMkuLA at mail dot gmail dot com> <20130904165211 dot GA14906 at domone dot kolej dot mff dot cuni dot cz> <CANu=Dmgsr3RcN7dRge0QB0tE9GGDuwVYu-TR_vOEOhzdtT4LJw at mail dot gmail dot com> <20130905150303 dot GA18450 at domone dot kolej dot mff dot cuni dot cz> <CANu=Dmj1k+a6AAwz1Fe5vfaSojdzTDGfLuHNffaoaaGt6j3sWA at mail dot gmail dot com> <20130905160449 dot GA18784 at domone dot kolej dot mff dot cuni dot cz>
On 5 September 2013 17:04, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Thu, Sep 05, 2013 at 04:18:18PM +0100, Will Newton wrote:
>> On 5 September 2013 16:03, Ondřej Bílka <neleai@seznam.cz> wrote:
>> > On Thu, Sep 05, 2013 at 08:51:53AM +0100, Will Newton wrote:
>> >> The intention of my patch - which I may have not made completely clear
>> >> in the commit message - is to improve test stability. What I mean by
>> >> this is that with a physically indexed cache the physical pages
>> >> allocated to the test can have a significant effect on the performance
>> >> at large (e.g. cache size / ways and above) buffer sizes and this will
>> >> cause variation when running the same test multiple times. My aim is
>> >> to average out these differences as it is hard to control for them
>> >> without understanding the details of the cache subsystem of the system
>> >> you are running on.
>> >>
>> > This can be explained simply by having more data. Multiplying the
>> > iteration count by four would then do the same job.
>>
>> No, it wouldn't. That would just mean four times as much data
>> resulting in a reduced variance but the same systematic error.
>>
> That is your claim. I am now asking you for the second time to prove it.
>
> As I wrote in my previous mail, in the same place:
>
> Please run your patch ten times and calculate the variance. Compare that
> to the variance when the iteration count is increased 4 times and show
> whether there is an improvement.
The benchmarks do not currently output any measure of variance, so it is
not possible to do this with the benchmarks as they stand. I have seen
this effect with other benchmarks, however.
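To illustrate the distinction being argued here, the following is a toy Python sketch (not glibc code; the cost, offset, and noise magnitudes are invented). Each benchmark run is modeled as receiving a fixed per-run "page placement" offset, so adding iterations within one run cannot average that offset away, while averaging across separate runs can:

```python
import random
import statistics

random.seed(42)

TRUE_COST = 100.0  # hypothetical true cost of the routine, in ns

def run_benchmark(iterations):
    """One benchmark run: the OS hands us one set of physical pages,
    so every iteration shares the same systematic offset."""
    systematic = random.gauss(0, 5)   # per-run page-placement effect
    samples = [TRUE_COST + systematic + random.gauss(0, 2)
               for _ in range(iterations)]
    return statistics.mean(samples)

# 4x the iterations within a single run: per-iteration noise shrinks,
# but the run's systematic offset is unchanged.
one_run_4x = run_benchmark(4000)

# Four separate runs: each gets fresh pages, so the systematic
# offsets are averaged out as well.
four_runs = statistics.mean(run_benchmark(1000) for _ in range(4))

print(f"4x iterations, one run: {one_run_4x:.1f}")
print(f"four runs averaged:     {four_runs:.1f}")
```

Repeating each strategy many times shows the multi-run average scattering much more tightly around the true cost, which is the claimed effect of running the benchmarks four times.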
>> >> Your test appears to be addressing concerns of test validity by
>> >> running a wider range of buffer alignments, which is an important but
>> >> separate concern IMO.
>> >>
>> > No, your patch will pick the src pointer from 4 different physical pages
>> > (one allocated in each run) and calculate the average performance.
>> >
>> > Mine will pick src pointers from 2000000/4096 ≈ 488 different pages and
>> > calculate the average.
>>
>> Yes, this would work too. But it has a number of flaws:
>>
>> 1. It does not allow one to analyze the performance of the code across
>> alignments; everything gets averaged together.
>
> You cannot analyse performance across alignments now, as the benchmarks
> do not print the necessary data.
It currently prints the alignments of the buffers, which is all that is
required. I would agree, though, that the alignments chosen are a rather
poor selection.
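Assuming the benchmark prints one (src_align, dst_align, time) record per measurement, as described above (the records below are invented for illustration), grouping the output by alignment pair is enough to compare performance across alignments:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records as a string benchmark might print them:
# (src_align, dst_align, time_ns). The values are made up.
results = [
    (0, 0, 100.0), (0, 0, 104.0),
    (8, 0, 130.0), (8, 0, 126.0),
    (0, 8, 118.0), (8, 8, 110.0),
]

# Group timings by alignment pair so each combination can be
# analyzed separately instead of being averaged together.
by_alignment = defaultdict(list)
for src_align, dst_align, t in results:
    by_alignment[(src_align, dst_align)].append(t)

for key in sorted(by_alignment):
    print(key, f"{mean(by_alignment[key]):.1f} ns")
```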
>
>> 2. It has no mechanism for showing variance, whereas with multiple runs
>> of the same test the variance of the means can at least be seen.
>
> There is a pretty good mechanism for showing variance, and it is called
> calculating the variance. However, adding variance calculation is a
> separate issue.
I think you misunderstand me. The benchmarks as they stand do not
output any measure of variance. Multiple runs is a quick and easy way
to get a measure of variance without modifying the benchmarks or their
output.
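As a concrete sketch of that quick-and-easy approach (the numbers are hypothetical, not real benchmark output): collect the mean time reported by each run and compute the spread across runs.

```python
import statistics

# Hypothetical mean times (ns) reported by four runs of the same
# string benchmark; the values are illustrative only.
run_means = [812.4, 825.1, 809.7, 831.6]

mean_of_means = statistics.mean(run_means)
stdev_of_means = statistics.stdev(run_means)  # sample stdev, n-1

print(f"mean = {mean_of_means:.1f} ns, "
      f"stdev = {stdev_of_means:.1f} ns "
      f"({100 * stdev_of_means / mean_of_means:.1f}% of mean)")
```

This requires no change to the benchmarks themselves; the variance of the per-run means falls out of the existing output.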
>> 3. It only works for one test (memcpy).
>>
> It is a first step. Randomization is needed for all string functions,
> and it is better to start with a concrete example.
I agree completely: let's start by finding the best way to fix the
benchmarks, but once we have consensus I think it would be best to fix
all the benchmarks rather than leave some unfixed.
--
Will Newton
Toolchain Working Group, Linaro