This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PING][PATCHv3 1/2] aarch64: Hoist ZVA check out of the memset function
- From: Siddhesh Poyarekar <siddhesh at sourceware dot org>
- To: Andrew Pinski <pinskia at gmail dot com>, Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- Cc: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>
- Date: Thu, 12 Oct 2017 06:59:57 +0530
- Subject: Re: [PING][PATCHv3 1/2] aarch64: Hoist ZVA check out of the memset function
- References: <DB6PR0801MB20531D1099A99E5A4DF1E042834A0@DB6PR0801MB2053.eurprd08.prod.outlook.com> <ac874303-6330-1a84-6970-7768ea408e96@linaro.org> <CA+=Sn1kkFgQE1yKq6vODmjkbsqao01giVSr3fbYvYErPFfC-Sg@mail.gmail.com>
- Reply-to: siddhesh at sourceware dot org
On Thursday 12 October 2017 03:14 AM, Andrew Pinski wrote:
> For at least the micro-archs I work with, reading dczid_el0 can and
> will most likely be faster than reading from global memory.
> Especially if the global memory is not in the L1 cache. This is one
> case where a micro-benchmark can fall down. If the global memory is
> in L1 cache the read is 3 cycles while reading from dczid_el0 is 4
> cycles, but once it is out of L1 cache, reading becomes 10x worse plus
> it pollutes the L1 cache.
This is a Falkor-specific caveat: on that core, mrs ends up being
significantly slower. Also, the question about reading a global is moot
since I've dropped that code path. It only affected non-standard ZVA
sizes anyway, so it doesn't affect current cores.
Siddhesh
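
For readers unfamiliar with the register under discussion, here is a minimal sketch of how a dczid_el0 value can be decoded into a DC ZVA block size. It is not taken from the patch. The helper names (zva_block_bytes, read_dczid_el0) and the macros are hypothetical, and the field layout (BS in bits [3:0] as log2 of the block size in 4-byte words, DZP in bit 4) is assumed from the Arm architecture documentation.

```c
#include <stdint.h>

/* Assumed DCZID_EL0 layout: BS in bits [3:0], DZP in bit 4. */
#define DCZID_BS_MASK 0xfu
#define DCZID_DZP_BIT (1u << 4)

/* Hypothetical helper: return the DC ZVA block size in bytes,
   or 0 if DC ZVA is prohibited (DZP set).  BS is log2 of the
   block size in 4-byte words, so bytes = 4 << BS. */
static inline unsigned
zva_block_bytes (uint64_t dczid)
{
  if (dczid & DCZID_DZP_BIT)
    return 0;
  return 4u << (dczid & DCZID_BS_MASK);
}

#if defined (__aarch64__)
/* Hypothetical helper: read the register directly with mrs.
   Only compiled on AArch64 targets. */
static inline uint64_t
read_dczid_el0 (void)
{
  uint64_t v;
  __asm__ ("mrs %0, dczid_el0" : "=r" (v));
  return v;
}
#endif
```

With this decoding, a register value of 4 corresponds to a 64-byte block and 5 to a 128-byte block. Whether the value should be read with mrs on every call or cached elsewhere is the trade-off discussed in the thread, and the sketch takes no position on it.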