This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: memcpy walk benchmark [was: Hoist ZVA check out of the memset function]


On Monday 16 October 2017 08:28 PM, Wilco Dijkstra wrote:
> They are identical so have the same issues. For memcpy/memmove I don't
> think it is a good idea to copy the same data back and forth, since that's not
> a common usage scenario, but also because it might penalize cores that
> bypass L1 for write streams.

Those benchmarks actually emulate the behaviour of an internal
proprietary workload, i.e. it is not an exact 1:1 reproduction but it
does track its performance.  However, that is a fair point.  I'll just
make it a copy that walks backwards, like memset, and see what the
difference is and whether it continues to track the internal workload.
My guess is that it simply depends on invalidation, which happens
automatically on falkor since it bypasses L1 for write streams.
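
Something along these lines is what I have in mind (a standalone
sketch, not the actual benchtest code; the two-buffer layout, the
32 MiB working set and the walk_backwards_copy name are only
assumptions for illustration):

/* Sketch of a copy that walks backwards through the working set,
   like the memset walk, so no block is copied back and forth.  */
#include <stdlib.h>
#include <string.h>

#define WORKING_SET (32u * 1024 * 1024)

static void
walk_backwards_copy (unsigned char *dst, unsigned char *src, size_t size)
{
  /* Walk both buffers from the top down in SIZE-byte blocks.  */
  for (size_t off = WORKING_SET; off >= size; off -= size)
    memcpy (dst + off - size, src + off - size, size);
}

int
main (void)
{
  unsigned char *src = malloc (WORKING_SET);
  unsigned char *dst = malloc (WORKING_SET);
  if (src == NULL || dst == NULL)
    return 1;
  memset (src, 0xa5, WORKING_SET);
  walk_backwards_copy (dst, src, 256);  /* one example block size */
  free (src);
  free (dst);
  return 0;
}

Since source and destination move strictly downwards and never
overlap, nothing that was just written gets read back, which should
avoid the pattern that could penalize cores bypassing L1 for write
streams.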

> Generally the tests don't run long enough (even if they do access all 32MB),
> so I'd say they need an outer loop to repeat say 20 times. Also if we do

The total run time of the benchmark is already quite long.  If we
repeat the runs then we should consider reducing the sizes that are
measured, perhaps limiting them to just those > 128 bytes.

> exactly the same amount of work for each possible size, printing the total
> time would make comparing results between different sizes a bit easier.

Agreed.  I had done that in my first iteration, which reported the
transfer rate, but I switched to time in the end to keep it consistent
with the other tests.  Reporting total time per size is a better
choice than per-call time.
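
Concretely, something like this illustrative sketch (not the
benchtest harness; the constants, the do_one_walk placeholder and the
output format are assumptions made only for the example):

/* Do the same total amount of copying for every size, repeat it with
   a 20-iteration outer loop and print one total-time figure per
   size.  */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define WORKING_SET (32u * 1024 * 1024)
#define OUTER_REPS  20
#define MIN_SIZE    256   /* only measure sizes > 128 bytes */

/* One backwards-walking pass over the working set, as in the sketch
   above.  */
static void
do_one_walk (unsigned char *dst, unsigned char *src, size_t size)
{
  for (size_t off = WORKING_SET; off >= size; off -= size)
    memcpy (dst + off - size, src + off - size, size);
}

int
main (void)
{
  unsigned char *src = malloc (WORKING_SET);
  unsigned char *dst = malloc (WORKING_SET);
  if (src == NULL || dst == NULL)
    return 1;
  memset (src, 0xa5, WORKING_SET);

  for (size_t size = MIN_SIZE; size <= WORKING_SET; size *= 2)
    {
      struct timespec t0, t1;
      clock_gettime (CLOCK_MONOTONIC, &t0);
      for (int rep = 0; rep < OUTER_REPS; rep++)
        do_one_walk (dst, src, size);
      clock_gettime (CLOCK_MONOTONIC, &t1);

      double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                  + (t1.tv_nsec - t0.tv_nsec);
      /* Every size copies exactly OUTER_REPS * WORKING_SET bytes, so
         one total-time figure per size is directly comparable.  */
      printf ("size %8zu: total %.0f ns\n", size, ns);
    }

  free (src);
  free (dst);
  return 0;
}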

Siddhesh

