[PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX

naohirot@fujitsu.com
Tue Apr 27 11:03:48 GMT 2021


Hi Wilco-san,

This mail continues the discussion about removing redundant instructions.

> From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
 
> For the first 2 CBZ cases in both [1] and [2] the fastest option is to use
> ANDS+BEQ. ANDS only requires 1 ALU operation while AND+CBZ uses 2 ALU
> operations on A64FX.

I see. I hadn't used ANDS before. Thanks for the advice.
I updated memcpy[1] and memset[2].

[1] https://github.com/NaohiroTamura/glibc/commit/fca2c1cf1fd80ec7ecb93f7cd08be9aab9ca9412
[2] https://github.com/NaohiroTamura/glibc/commit/5004e34c35a20faf3e12e6ce915845a75b778cbf
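
As a rough illustration of the kind of change involved, the sketch below
shows the AND+CBZ pattern next to the ANDS+BEQ replacement. The register
names, the mask value and the local label are hypothetical and are not
taken from the commits above.

```
	// Before: AND computes the masked value, then CBZ tests it.
	// Per the advice quoted above, this costs 2 ALU operations on A64FX.
	and	x8, x2, #63		// x8 = x2 & 63 (hypothetical mask)
	cbz	x8, 1f			// branch if the masked value is zero

	// After: ANDS computes the same value and also sets the NZCV flags,
	// so the branch can test the Z flag directly (1 ALU operation).
	ands	x8, x2, #63		// x8 = x2 & 63, Z = (x8 == 0)
	b.eq	1f			// branch if the masked value is zero
1:
```

Note that ANDS overwrites the condition flags, so the replacement is only
valid where no earlier flag result is still needed after that point. If the
masked value itself is not used afterwards, TST (an alias of ANDS that
discards the result) expresses the same test.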

Thanks.
Naohiro


