This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] aarch64: optimized memcpy implementation for thunderx2
- From: Siddhesh Poyarekar <siddhesh at gotplt dot org>
- To: Anton Youdkevitch <anton dot youdkevitch at bell-sw dot com>, libc-alpha at sourceware dot org
- Date: Sat, 29 Sep 2018 06:25:26 +0530
- Subject: Re: [PATCH] aarch64: optimized memcpy implementation for thunderx2
- References: <f5bd5315-c9ff-faf0-5522-78a3d33e68b0@bell-sw.com>
Hi!
This seems like your first contribution to glibc. Welcome! Can you (or
one of the stewards) please let us know if you have signed the FSF
copyright assignment for glibc? That is necessary for any contribution
to be included in glibc. In general, please review the Contribution
Checklist[1] to understand the prerequisites for contributing patches to
glibc.
On 28/09/18 11:38 PM, Anton Youdkevitch wrote:
> Optimized memcpy implementation using "ext" instruction. The
> speedup is up to 30% on larger lengths comparing to the existing
> thunderx2 implementation. Performance comparison is done using
> the standard lib's benchmarks.
Could you please name the microbenchmarks you've used to make the
comparison? That is, have you checked memcpy-large or memcpy-walk?
Given that large copies are typically uncached, I would trust
memcpy-walk more than memcpy-large for them, because in the latter case
the instruction costs will tend to dominate over the cost of loads from
memory, which is not very useful for large copies.
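
To illustrate the distinction, here is a rough sketch of the two access
patterns. It is not the code from glibc's benchtests; the function names
copy_same and copy_walk and the arena layout are made up for this
example. The first loop keeps hitting the same buffers, so they tend to
stay in cache; the second moves to fresh addresses on every call, so
more of the measured time should come from memory loads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the same src/dst pair over and over.
   After the first pass the data is likely cached (if it fits), so
   timings mostly reflect instruction cost.  Returns bytes copied.  */
static size_t
copy_same (char *dst, const char *src, size_t len, size_t iters)
{
  size_t total = 0;
  for (size_t i = 0; i < iters; i++)
    {
      memcpy (dst, src, len);
      total += len;
    }
  return total;
}

/* Hypothetical helper: treat the first half of ARENA as the source
   region and the second half as the destination region, and advance
   through both on every call.  With an arena much larger than the
   caches, each copy tends to touch cold memory.  Returns bytes
   copied.  */
static size_t
copy_walk (char *arena, size_t arena_size, size_t len, size_t iters)
{
  size_t half = arena_size / 2;
  size_t off = 0;
  size_t total = 0;

  /* Assumption of this sketch: one copy must fit in half the arena.  */
  assert (len <= half);

  for (size_t i = 0; i < iters; i++)
    {
      /* Wrap around once the next copy would run past the region.  */
      if (off + len > half)
        off = 0;
      memcpy (arena + half + off, arena + off, len);
      off += len;
      total += len;
    }
  return total;
}
```

Timing each loop (for example with clock_gettime) over an arena well
beyond the last-level cache size would be one way to see how much of a
reported speedup survives once loads from memory are part of the cost.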
Also (and this is a personal nit, so you don't have to take it
seriously), it would be really nice if the patch were inline and not an
attachment, since that allows me to respond to the patch contents inline :)
Thanks,
Siddhesh
[1] https://sourceware.org/glibc/wiki/Contribution%20checklist