This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: memcpy walk benchmark [was: Hoist ZVA check out of the memset function]


On Monday 16 October 2017 08:28 PM, Wilco Dijkstra wrote:
> They are identical so have the same issues. For memcpy/memmove I don't
> think it is a good idea to copy the same data back and forth, since that's not
> a common usage scenario, but also because it might penalize cores that
> bypass L1 for write streams.

Those benchmarks actually emulate the behaviour of an internal
proprietary workload, i.e. it is not an exact 1:1 reproduction but it
does track its performance.  However, that is a fair point.  I'll just
make it a copy that walks backwards, like memset, and see what the
difference is and whether it continues to track the internal workload.
My guess is that it simply depends on invalidation, which happens
automatically on falkor since it bypasses L1 for write streams.
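
Something along these lines is what I have in mind (a standalone
sketch, not the actual benchtest code; the two-buffer layout, the
32 MiB working set and the walk_backwards_copy name are only
assumptions for illustration):

/* Sketch of a copy that walks backwards through the working set,
   like the memset walk, so no block is copied back and forth.  */
#include <stdlib.h>
#include <string.h>

#define WORKING_SET (32u * 1024 * 1024)

static void
walk_backwards_copy (unsigned char *dst, unsigned char *src, size_t size)
{
  /* Walk both buffers from the top down in SIZE-byte blocks.  */
  for (size_t off = WORKING_SET; off >= size; off -= size)
    memcpy (dst + off - size, src + off - size, size);
}

int
main (void)
{
  unsigned char *src = malloc (WORKING_SET);
  unsigned char *dst = malloc (WORKING_SET);
  if (src == NULL || dst == NULL)
    return 1;
  memset (src, 0xa5, WORKING_SET);
  walk_backwards_copy (dst, src, 256);  /* one example block size */
  free (src);
  free (dst);
  return 0;
}

Since source and destination move strictly downwards and never
overlap, nothing that was just written gets read back, which should
avoid the pattern that could penalize cores bypassing L1 for write
streams.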

> Generally the tests don't run long enough (even if they do access all 32MB),
> so I'd say they need an outer loop to repeat say 20 times. Also if we do

The total run time of the benchmark is already quite long.  If we
repeat the runs then we should consider reducing the sizes that are
measured, perhaps limiting them to just those > 128 bytes.

> exactly the same amount of work for each possible size, printing the total
> time would make comparing results between different sizes a bit easier.

Agreed.  I had done that in my first iteration, which reported the
transfer rate, but I switched to time in the end to keep it consistent
with the other tests.  Reporting total time per size is a better
choice than per-call time.
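
Concretely, something like this illustrative sketch (not the
benchtest harness; the constants, the do_one_walk placeholder and the
output format are assumptions made only for the example):

/* Do the same total amount of copying for every size, repeat it with
   a 20-iteration outer loop and print one total-time figure per
   size.  */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define WORKING_SET (32u * 1024 * 1024)
#define OUTER_REPS  20
#define MIN_SIZE    256   /* only measure sizes > 128 bytes */

/* One backwards-walking pass over the working set, as in the sketch
   above.  */
static void
do_one_walk (unsigned char *dst, unsigned char *src, size_t size)
{
  for (size_t off = WORKING_SET; off >= size; off -= size)
    memcpy (dst + off - size, src + off - size, size);
}

int
main (void)
{
  unsigned char *src = malloc (WORKING_SET);
  unsigned char *dst = malloc (WORKING_SET);
  if (src == NULL || dst == NULL)
    return 1;
  memset (src, 0xa5, WORKING_SET);

  for (size_t size = MIN_SIZE; size <= WORKING_SET; size *= 2)
    {
      struct timespec t0, t1;
      clock_gettime (CLOCK_MONOTONIC, &t0);
      for (int rep = 0; rep < OUTER_REPS; rep++)
        do_one_walk (dst, src, size);
      clock_gettime (CLOCK_MONOTONIC, &t1);

      double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                  + (t1.tv_nsec - t0.tv_nsec);
      /* Every size copies exactly OUTER_REPS * WORKING_SET bytes, so
         one total-time figure per size is directly comparable.  */
      printf ("size %8zu: total %.0f ns\n", size, ns);
    }

  free (src);
  free (dst);
  return 0;
}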

Siddhesh

