This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PING][PATCHv3 1/2] aarch64: Hoist ZVA check out of the memset function
- From: Siddhesh Poyarekar <siddhesh at sourceware dot org>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, "adhemerval dot zanella at linaro dot org" <adhemerval dot zanella at linaro dot org>
- Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>
- Date: Thu, 12 Oct 2017 07:18:32 +0530
- Subject: Re: [PING][PATCHv3 1/2] aarch64: Hoist ZVA check out of the memset function
- References: <DB6PR0801MB20531D1099A99E5A4DF1E042834A0@DB6PR0801MB2053.eurprd08.prod.outlook.com>
- Reply-to: siddhesh at sourceware dot org
[I've responded to your point about macros elsewhere in the thread]
On Thursday 12 October 2017 02:50 AM, Wilco Dijkstra wrote:
> Note we also should benchmark this on various other cores to see what
> the impact is. I wrote this memset code for specific reasons - changing that
> could have a big impact on some cores, so we need to show this doesn't
> cause regressions.
Yes, and that's something I'll lean on you to study and fix since I
don't have access to all that hardware :)
That said, the effect of dropping ZVA checks should be positive on every
core for memset(0). The alignments might change things a bit, but I
think we should fix those as we go along rather than wait for them to be
right, since this gain is too big to hold back. This also adds
incentive to document our alignment decisions, since they weren't
documented in the earlier code.
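
For readers following along, the check in question amounts to reading
DCZID_EL0 and decoding it. Below is a minimal sketch in C of that decode,
not the code from the patch: the function names are hypothetical, and the
non-AArch64 fallback of 0 is an assumption made only so the sketch builds
elsewhere. Bit 4 (DZP) says DC ZVA is prohibited, and bits 3:0 (BS) hold
log2 of the block size in 4-byte words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a raw DCZID_EL0 value into the DC ZVA block size in bytes.
   Returns 0 when the DZP bit (bit 4) says DC ZVA is prohibited.
   Hypothetical helper, not a glibc function.  */
size_t
zva_block_size (uint64_t dczid)
{
  if (dczid & (UINT64_C (1) << 4))
    return 0;
  /* BS (bits 3:0) is log2 of the block size in 4-byte words.  */
  return (size_t) 4 << (dczid & 0xf);
}

/* Read the register and decode it.  On anything other than AArch64
   this sketch assumes ZVA is unavailable and returns 0.  */
size_t
read_zva_block_size (void)
{
#if defined (__aarch64__)
  uint64_t dczid;
  __asm__ ("mrs %0, dczid_el0" : "=r" (dczid));
  return zva_block_size (dczid);
#else
  return 0;
#endif
}

/* Quick self-check of the decode on two encodings.  */
void
zva_decode_selfcheck (void)
{
  assert (zva_block_size (0x04) == 64);  /* BS = 4: 16 words.  */
  assert (zva_block_size (0x14) == 0);   /* DZP set: prohibited.  */
}
```

Doing this read and branch on every memset(0) call is the cost that
goes away once the decision is made a single time up front.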
> Also we need to decide on how to read the ZVA size. I don't think there is
> any point in supporting that many options (reading it, using a global, using
> an ifunc and not using ZVA at all). The very first memset had a few of these
> options, and I removed them precisely because it created way too many
> permutations to test and benchmark. So if we add ifuncs, then we shouldn't
> need to use a global too. For the dynamic linker we could just use a basic
> memset without any ifuncs (so then it becomes 2 variants, ifunc using ZVA,
> and no ZVA).
Sorry, I just noticed that I had not updated the commit text; I dropped
the static variable after discussion with Szabolcs. The code now simply
uses the old-style memset for non-standard ZVA sizes, since we agreed
that it's not a problem worth solving right now given that no announced
cores have non-standard ZVA sizes.
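
The selection described above can be sketched roughly as follows. This is
an illustration only, not the patch: all names are hypothetical, both
variants simply forward to memset so the example builds anywhere, and
treating 64 bytes as the only "standard" size is an assumption, since the
thread does not say which sizes get a specialised variant. A real ifunc
resolver would return the chosen symbol at relocation time; a plain
function pointer stands in for that here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memset_fn) (void *, int, size_t);

/* Hypothetical stand-in for a variant specialised for one ZVA size.  */
void *
memset_zva64_sketch (void *s, int c, size_t n)
{
  return memset (s, c, n);
}

/* Hypothetical stand-in for the old-style memset that makes no
   assumption about the ZVA size.  */
void *
memset_generic_sketch (void *s, int c, size_t n)
{
  return memset (s, c, n);
}

/* Pick a variant once, from a ZVA block size in bytes (0 meaning DC ZVA
   is unavailable).  Assumption: only 64 counts as a standard size.  */
memset_fn
select_memset_sketch (size_t zva_size)
{
  /* A valid block size is either 0 or a power of two.  */
  assert (zva_size == 0 || (zva_size & (zva_size - 1)) == 0);
  return zva_size == 64 ? memset_zva64_sketch : memset_generic_sketch;
}
```

With this shape there are only two paths to test and benchmark, and any
other block size falls through to the generic one.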
Siddhesh