This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH][AArch64] Optimized memcpy/memmove
- From: Marcus Shawcroft <marcus.shawcroft@gmail.com>
- To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
- Cc: GNU C Library <libc-alpha@sourceware.org>
- Date: Fri, 10 Jun 2016 08:05:47 +0100
- Subject: Re: [PATCH][AArch64] Optimized memcpy/memmove
On 12 May 2016 at 17:25, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Marcus Shawcroft wrote:
>> Hi, there appear to be odd tab characters inserted throughout the
>> comments, for example:
>
> Hmm, it appears unexpand is buggy, so you end up having to do the tabs
> by hand... Attached is the updated version.
>
> Wilco
>
> ---
> This is an optimized memcpy/memmove for AArch64. Copies are split into 3 main
> cases: small copies of up to 16 bytes; medium copies of 17..96 bytes, which
> are fully unrolled; and large copies of more than 96 bytes, which align the
> destination and use an unrolled loop processing 64 bytes per iteration. In
> order to share code with memmove, small and medium copies read all data
> before writing, allowing any kind of overlap. All memmoves except for the
> large backwards case fall into memcpy for optimal performance. On a random
> copy test memcpy/memmove are 40.8% faster on A57 and 28.4% faster on A53.
>
> ChangeLog:
> 2015-07-08 Wilco Dijkstra <wdijkstr@arm.com>
>
> * sysdeps/aarch64/memcpy.S (memcpy):
> Rewrite of optimized memcpy and memmove.
> * sysdeps/aarch64/memmove.S (memmove): Remove
> memmove code (merged into memcpy.S).
>
Thanks Wilco. OK. /Marcus