This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v1.1] Randomize memcpy benchmark addresses.
- From: Will Newton <will dot newton at linaro dot org>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: libc-alpha <libc-alpha at sourceware dot org>
- Date: Thu, 5 Sep 2013 15:32:09 +0100
- Subject: Re: [PATCH v1.1] Randomize memcpy benchmark addresses.
On 5 September 2013 15:15, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Thu, Sep 05, 2013 at 01:00:17PM +0100, Will Newton wrote:
>> On 5 September 2013 12:32, Ondřej Bílka <neleai@seznam.cz> wrote:
>> >> This means we no longer print what the buffer alignment is which makes
>> >> results analysis impossible.
>> >>
>> > Could you elaborate?
>>
>> The current benchmark shows the performance for memcpy for a given
>> length and source/dest alignment. This can be analyzed to see where
>> performance is strongest and weakest. If we do not print the alignment
>> of the buffers for each test then we can't do this analysis.
>>
> There are 1024 possible alignments for a given size (assuming 64-byte
> cache lines). Some pairs tend to be slower than others as we need to
> cross cache lines when reading/writing.
>
> The current benchmark prints results for 4 pairs of alignments. Please
> explain why you think that the best and worst cases are among them.
I don't think the current tests cover all the necessary alignments, but
that is a separate issue from whether or not we should print the
benchmarked alignment. For example, a memcpy implementation whose
average-case performance equals another's across a range of random
alignments may still have quite different performance characteristics
at specific buffer alignments. I think this is something that is useful
to be able to see.
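
To make the point concrete, here is a minimal sketch of a benchmark loop
that randomizes the source and destination alignments but still prints
them next to each timing. It is not the glibc benchtests code and not
the patch under review. The names line_offset, time_copy,
bench_random_alignments, CACHE_LINE and BUF_SIZE are hypothetical, the
64-byte cache line is the assumption used earlier in this thread, and
the timing method (clock_gettime around a simple loop) is only an
illustrative choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define CACHE_LINE 64          /* assumed cache-line size */
#define BUF_SIZE   (1 << 16)   /* assumed scratch buffer size */

/* Call memcpy through a volatile pointer so the compiler cannot
   elide the copies in the timing loop.  */
static void *(*volatile copy_fn) (void *, const void *, size_t) = memcpy;

/* Offset of P within its (assumed) 64-byte cache line.  */
static size_t
line_offset (const void *p)
{
  return (size_t) ((uintptr_t) p % CACHE_LINE);
}

/* Mean nanoseconds per call over ITERS copies of LEN bytes.  */
static double
time_copy (char *dst, const char *src, size_t len, int iters)
{
  struct timespec t0, t1;
  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (int i = 0; i < iters; i++)
    copy_fn (dst, src, len);
  clock_gettime (CLOCK_MONOTONIC, &t1);
  return ((double) (t1.tv_sec - t0.tv_sec) * 1e9
          + (double) (t1.tv_nsec - t0.tv_nsec)) / iters;
}

/* Run TRIALS copies of LEN bytes at random in-line offsets and print
   the offsets with each result.  Returns 0 on success, -1 on error.  */
static int
bench_random_alignments (size_t len, int trials, int iters,
                         unsigned seed, FILE *out)
{
  void *s = NULL, *d = NULL;

  /* Any offset in [0, CACHE_LINE) plus LEN must stay in bounds.  */
  if (iters <= 0 || len > BUF_SIZE - CACHE_LINE)
    return -1;
  if (posix_memalign (&s, CACHE_LINE, BUF_SIZE) != 0)
    return -1;
  if (posix_memalign (&d, CACHE_LINE, BUF_SIZE) != 0)
    {
      free (s);
      return -1;
    }
  memset (s, 1, BUF_SIZE);
  memset (d, 0, BUF_SIZE);

  srand (seed);
  for (int t = 0; t < trials; t++)
    {
      size_t src_off = (size_t) rand () % CACHE_LINE;
      size_t dst_off = (size_t) rand () % CACHE_LINE;
      char *src = (char *) s + src_off;
      char *dst = (char *) d + dst_off;

      /* The bases are line-aligned, so the offset is the alignment.  */
      assert (line_offset (src) == src_off);
      double ns = time_copy (dst, src, len, iters);

      /* Keep the alignment in the output so results can be grouped.  */
      fprintf (out, "len=%zu src_align=%zu dst_align=%zu ns/call=%.1f\n",
               len, src_off, dst_off, ns);
    }

  free (s);
  free (d);
  return 0;
}
```

A call such as bench_random_alignments (256, 8, 100000, 1, stdout)
would print one line per random alignment pair. Because each line
carries src_align and dst_align, the output can be averaged for an
overall figure or grouped by alignment pair to see where a given
implementation is strongest and weakest.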
--
Will Newton
Toolchain Working Group, Linaro