This is the mail archive of the
mailing list for the glibc project.
RE: [RFC PATCH] aarch64: improve memset
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: "'Richard Henderson'" <rth at twiddle dot net>
- Cc: <will dot newton at linaro dot org>, <marcus dot shawcroft at gmail dot com>, <libc-alpha at sourceware dot org>
- Date: Fri, 7 Nov 2014 16:14:33 -0000
- Subject: RE: [RFC PATCH] aarch64: improve memset
- Authentication-results: sourceware.org; auth=none
- References: <002701cffaa0$77623570$6626a050$ at com>
> Richard Henderson wrote:
> On 11/05/2014 03:35 PM, Will Newton wrote:
> > On 30 September 2014 12:03, Marcus Shawcroft wrote:
> >> On 14 June 2014 08:06, Richard Henderson wrote:
> >>> The major idea here is to use IFUNC to check the zva line size once, and use
> >>> that to select different entry points. This saves 3 branches during startup,
> >>> and allows significantly more flexibility.
> >>> Also, I've cribbed several of the unaligned store ideas that Ondrej has done
> >>> with the x86 versions.
> >>> I've done some performance testing using cachebench, which suggests that the
> >>> unrolled memset_zva_64 path is 1.5x faster than the current memset at 1024
> >>> bytes and above. The non-zva path appears to be largely unchanged.
> >> OK Thanks /Marcus
> > It looks like this patch has slipped through the cracks. Richard, are
> > you happy to apply this or do you think it warrants further
> > discussion?
> Sorry for the radio silence.
> Just before I went to apply it I thought I spotted a bug that would affect
> ld.so. I haven't had time to make sure one way or another.
I've got a few comments on this patch:
* Do we really need variants for cache line sizes that are never going to be used?
I'd say just support 64 and 128, and default higher sizes to no_zva.
* Why special-case line size=64 only? Unrolling might not help for 128 but should not
harm either, and the alignment overhead only increases with larger line sizes, so you
want to bypass the zva code in all cases if N is less than 3-4x the line size.
* Is the no-ifunc variant still required/used? We now have at least 4 different
variants, all of which need to be tested and maintained...
* Finally, which version is used when linking statically? I presume there is some
makefile magic that causes the no-zva version to be used, however that might not be
optimal for all targets.
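
To make the first two points concrete, here is a rough sketch in C of the selection policy I have in mind. It is not code from the patch: the enum, the function names and the ratio parameter are all made up for illustration. The only architectural fact assumed is the DCZID_EL0 layout (bits [3:0] hold log2 of the block size in 4-byte words, bit 4 set means DC ZVA is prohibited). The register value is passed in as an argument so the logic can be checked on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant identifiers, not the entry point names from the patch. */
enum memset_variant { MEMSET_NO_ZVA, MEMSET_ZVA_64, MEMSET_ZVA_128 };

/* Decode a DCZID_EL0 value into the DC ZVA block size in bytes.
   Returns 0 when bit 4 (DZP) is set, i.e. DC ZVA is prohibited. */
static size_t zva_block_size(uint64_t dczid)
{
    if (dczid & 0x10)
        return 0;
    /* Bits [3:0] are log2 of the block size in 4-byte words. */
    return (size_t)4 << (dczid & 0xf);
}

/* Support only the 64 and 128 byte sizes; anything else, including
   a prohibited DC ZVA, falls back to the non-zva path. */
static enum memset_variant select_memset_variant(uint64_t dczid)
{
    switch (zva_block_size(dczid)) {
    case 64:
        return MEMSET_ZVA_64;
    case 128:
        return MEMSET_ZVA_128;
    default:
        return MEMSET_NO_ZVA;
    }
}

/* Bypass the zva code for small sizes: only use it when n is at least
   `ratio` line sizes. The ratio (3 or 4, say) is a tuning assumption. */
static int use_zva(size_t n, size_t line_size, unsigned ratio)
{
    return line_size != 0 && n >= (size_t)ratio * line_size;
}
```

With a ratio of 4, a 64-byte line would take the zva path only for N >= 256, and a 128-byte line only for N >= 512. The right ratio would have to come from benchmarking on real targets.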