This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: memcpy walk benchmark [was: Hoist ZVA check out of the memset function]
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: "adhemerval dot zanella at linaro dot org" <adhemerval dot zanella at linaro dot org>, "siddhesh at sourceware dot org" <siddhesh at sourceware dot org>
- Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>
- Date: Mon, 16 Oct 2017 13:12:16 +0000
- Subject: Re: memcpy walk benchmark [was: Hoist ZVA check out of the memset function]
- References: <DB6PR0801MB20531D1099A99E5A4DF1E042834A0@DB6PR0801MB2053.eurprd08.prod.outlook.com>,<e4041b71-6b37-f8d0-d2c4-6f4224cf03bf@sourceware.org>
Siddhesh Poyarekar wrote:
> On Thursday 12 October 2017 02:50 AM, Wilco Dijkstra wrote:
>> Finally we'll need to look into more detail at the new memcpy benchmarks -
>> while looping through memory seems like a good idea, it appears like it
>> only increments by 1. So at first sight it's still testing the exact same thing as
>> the existing benchmarks - all data is always cached in L1. For memset I guess
>> we're still missing a randomized benchmark based on real trace data.
>
> That's a slightly incorrect description of the benchmark. The benchmark
> walks through two buffers, one forward and one backward. While it
> increments one buffer, it decrements the other. Also, it interchanges
> the source and destination buffers for alternate memcpy calls.
> Altogether it ends up ensuring that the L1 hit impact is significantly
> reduced; the reduction in impact is proportional to the size of the
> memcpy, so larger sizes would almost never hit L1.
No, this is not the case. For larger sizes it accesses less and less memory.
See e.g. bench-memset-walk.c:
  size_t i, iters = MIN_PAGE_SIZE / n;
  for (i = 0; i < iters && s <= s_end; s++, i++)
    CALL (impl, s, c, n);
Since s advances by only one byte per call, the working set is clearly
(MIN_PAGE_SIZE / n) + n bytes. For n=256 that is 128KB; for n=1024 it is
just 33KB. I wouldn't call either a 32MB walk...
Presumably the intention was to do s += n?
Wilco