This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: memcpy walk benchmark [was: Hoist ZVA check out of the memset function]
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: "adhemerval dot zanella at linaro dot org" <adhemerval dot zanella at linaro dot org>, "siddhesh at sourceware dot org" <siddhesh at sourceware dot org>
- Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>
- Date: Mon, 16 Oct 2017 13:12:16 +0000
- Subject: Re: memcpy walk benchmark [was: Hoist ZVA check out of the memset function]
- References: <DB6PR0801MB20531D1099A99E5A4DF1E042834A0@DB6PR0801MB2053.eurprd08.prod.outlook.com>,<e4041b71-6b37-f8d0-d2c4-6f4224cf03bf@sourceware.org>
Siddhesh Poyarekar wrote:
> On Thursday 12 October 2017 02:50 AM, Wilco Dijkstra wrote:
>> Finally we'll need to look into more detail at the new memcpy benchmarks -
>> while looping through memory seems like a good idea, it appears like it
>> only increments by 1. So at first sight it's still testing the exact same thing as
>> the existing benchmarks - all data is always cached in L1. For memset I guess
>> we're still missing a randomized benchmark based on real trace data.
>
> That's a slightly incorrect description of the benchmark. The benchmark
> walks through two buffers, one forward and one backward. While it
> increments one buffer, it decrements the other. Also, it interchanges
> the source and destination buffers for alternate memcpy calls.
> Altogether it ends up ensuring that the L1 hit impact is significantly
> reduced; the reduction in impact is proportional to the size of the
> memcpy, so larger sizes would almost never hit L1.
No, this is not the case. For larger sizes it accesses less and less memory.
See e.g. bench-memset-walk.c:
  size_t i, iters = MIN_PAGE_SIZE / n;
  for (i = 0; i < iters && s <= s_end; s++, i++)
    CALL (impl, s, c, n);
Since s advances by only one byte per call, the working set is clearly
(MIN_PAGE_SIZE / n) + n bytes. For n=256 that is 128KB; for n=1024 it is
just 33KB. I wouldn't call either a 32MB walk...
Presumably the intention was to do s += n?
Wilco