[PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX
naohirot@fujitsu.com
Tue Apr 27 11:03:48 GMT 2021
Hi Wilco-san,
This mail continues the discussion on removing redundant instructions.
> From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
> For the first 2 CBZ cases in both [1] and [2] the fastest option is to use
> ANDS+BEQ. ANDS only requires 1 ALU operation while AND+CBZ uses 2 ALU
> operations on A64FX.
I see, I haven't used ANDS before. Thanks for the advice.
I updated memcpy[1] and memset[2].
[1] https://github.com/NaohiroTamura/glibc/commit/fca2c1cf1fd80ec7ecb93f7cd08be9aab9ca9412
[2] https://github.com/NaohiroTamura/glibc/commit/5004e34c35a20faf3e12e6ce915845a75b778cbf
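
For readers following the thread, the change suggested above can be
sketched roughly as follows. This is only an illustration, not the code
from the commits: the registers (x0, x1), the mask value 63, and the
label .Laligned are assumptions chosen for the example.

    // Before: AND followed by CBZ. Per Wilco's comment, this costs
    // 2 ALU operations on A64FX.
    and     x1, x0, #63         // x1 = x0 & 63, flags unchanged
    cbz     x1, .Laligned       // branch if x1 == 0

    // After: ANDS sets the condition flags from the result, so the
    // branch can test the Z flag instead of re-reading the register.
    ands    x1, x0, #63         // x1 = x0 & 63, NZCV updated
    b.eq    .Laligned           // branch if result was zero (Z set)

If the masked value is not needed afterwards, "tst x0, #63" is the alias
of ANDS that discards the result and only sets the flags.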
Thanks.
Naohiro