This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: memcpy walk benchmark [was: Hoist ZVA check out of the memset function]


Siddhesh Poyarekar wrote:  
> On Thursday 12 October 2017 02:50 AM, Wilco Dijkstra wrote:
>> Finally we'll need to look into more detail at the new memcpy benchmarks -
>> while looping through memory seems like a good idea, it appears like it
>> only increments by 1. So at first sight it's still testing the exact same thing as
>> the existing benchmarks - all data is always cached in L1. For memset I guess
>> we're still missing a randomized benchmark based on real trace data.
>
> That's a slightly incorrect description of the benchmark.  The benchmark
> walks through two buffers, one forward and one backward.  While it
> increments one buffer, it decrements the other.  Also, it interchanges
> the source and destination buffers for alternate memcpy calls.
> Altogether it ends up ensuring that the L1 hit impact is significantly
> reduced; the reduction in impact is proportional to the size of the
> memcpy, so larger sizes would almost never hit L1.
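
In code form, the description above corresponds roughly to the sketch
below. The buffer names and the exact pairing of the calls are
assumptions for illustration, not the actual bench-memcpy-walk.c source:

  /* src walks forward through buf1 while dst walks backward through
     buf2; each iteration copies in both directions, so the source and
     destination roles alternate between calls.  */
  char *src = buf1;
  char *dst = buf2 + MIN_PAGE_SIZE - n;
  for (i = 0; i < iters; src++, dst--, i++)
    {
      CALL (impl, dst, src, n);  /* buf1 -> buf2 */
      CALL (impl, src, dst, n);  /* buf2 -> buf1 */
    }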

No, this is not the case: for larger sizes the benchmark accesses less
and less memory. See e.g. bench-memset-walk.c:

  size_t i, iters = MIN_PAGE_SIZE / n;

  for (i = 0; i < iters && s <= s_end; s++, i++)
    CALL (impl, s, c, n);

Clearly the working set is (MIN_PAGE_SIZE / n) + n bytes, since s only
advances by one byte per call. With MIN_PAGE_SIZE at 32MB, for n=256 that
is 128KB, and for n=1024 it is just 33KB. I wouldn't call either a 32MB walk...

Presumably the intention was to do s += n?
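
With that fix the loop becomes (a sketch, assuming the rest of the
function stays as quoted above):

  size_t i, iters = MIN_PAGE_SIZE / n;

  /* Advance s by n so successive calls touch disjoint blocks; the loop
     then covers iters * n = MIN_PAGE_SIZE bytes in total.  */
  for (i = 0; i < iters && s <= s_end; s += n, i++)
    CALL (impl, s, c, n);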

Wilco
