This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
RE: [RFC PATCH] aarch64: improve memset
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: "'Richard Henderson'" <rth at twiddle dot net>
- Cc: <will dot newton at linaro dot org>, <marcus dot shawcroft at gmail dot com>, <libc-alpha at sourceware dot org>
- Date: Mon, 10 Nov 2014 20:09:34 -0000
- References: <002701cffaa0$77623570$6626a050$ at com> <002801cffaa5$eb2852f0$c178f8d0$ at com> <545F237A dot 8070808 at twiddle dot net>
> Richard Henderson wrote:
> On 11/07/2014 05:14 PM, Wilco Dijkstra wrote:
> >
> > * Finally, which version is used when linking statically? I presume there is some
> > makefile magic that causes the no-zva version to be used, however that might not be
> > optimal for all targets.
So it turns out ifuncs are used even with static linking.
> That leaves ld.so using the no-zva path, which is perhaps a tad unfortunate
> given that it needs to zero partial .bss pages during startup, and on a
> system with 64k pages, we probably wind up with larger clears more often
> than not...
I'm not sure how often ld.so calls memset, but I'm guessing the cost is minor
compared to the total load time.
> Thoughts?
I spotted one issue in the alignment code:
+ stp xzr, xzr, [tmp2, #64]
+
+ /* Store up to first SIZE, aligned 16. */
+.ifgt \size - 64
+ stp xzr, xzr, [tmp2, #80]
+ stp xzr, xzr, [tmp2, #96]
+ stp xzr, xzr, [tmp2, #112]
+ stp xzr, xzr, [tmp2, #128]
+.ifgt \size - 128
+.err
+.endif
+.endif
This should be:
+ /* Store up to first SIZE, aligned 16. */
+.ifgt \size - 64
+ stp xzr, xzr, [tmp2, #64]
+ stp xzr, xzr, [tmp2, #80]
+ stp xzr, xzr, [tmp2, #96]
+ stp xzr, xzr, [tmp2, #112]
+.ifgt \size - 128
+.err
+.endif
+.endif
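To make the difference concrete, here is a rough C sketch of which bytes (relative to tmp2) each fragment writes. It reads the two quoted fragments literally and relies only on the fact that one "stp xzr, xzr" stores 16 bytes. The names span, posted and suggested are made up for this illustration and are not part of the patch; what the code before the fragment stores is not modelled.

```c
/* One "stp xzr, xzr, [tmp2, #off]" zeroes the 16 bytes [off, off + 16). */
enum { STP_BYTES = 16 };

/* Half-open byte range [lo, hi) relative to tmp2; lo == hi means no store.
   Hypothetical helper type, used only for this illustration. */
struct span { int lo; int hi; };

/* Fragment as quoted from the patch: the store at #64 is unconditional,
   and the stores at #80, #96, #112, #128 happen only when size > 64. */
static struct span posted(int size)
{
    struct span s = { 64, 64 + STP_BYTES };   /* stp at #64 */
    if (size > 64)
        s.hi = 128 + STP_BYTES;               /* last stp at #128 */
    return s;
}

/* Fragment as suggested above: the stores at #64, #80, #96, #112
   all sit inside the size > 64 conditional. */
static struct span suggested(int size)
{
    struct span s = { 64, 64 };               /* nothing stored for size <= 64 */
    if (size > 64)
        s.hi = 112 + STP_BYTES;               /* last stp at #112 */
    return s;
}
```

For size 128, posted() gives [64, 144) while suggested() gives [64, 128). For size 64, posted() still gives [64, 80) while suggested() stores nothing in this fragment.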
Other than that it looks good to me.
Wilco