[PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX
Wilco Dijkstra
Wilco.Dijkstra@arm.com
Tue Apr 20 11:39:41 GMT 2021
Hi Naohiro,
> I removed redundant instructions using cbz and prfm offset address [1][2].
>
> [1] https://github.com/NaohiroTamura/glibc/commit/94363b4ab2e5b4b29843a47a6970b9645a8e4eeb
> [2] https://github.com/NaohiroTamura/glibc/commit/4648eb559e46d978ded65d40c6bf8c38dd2519d7
For the first 2 CBZ cases in both [1] and [2], the fastest option is to use ANDS+BEQ. On A64FX,
ANDS requires only 1 ALU operation, while AND+CBZ uses 2 ALU operations.
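
As a rough sketch of the two forms (the register numbers, the mask value and the
local label are made up for illustration and are not taken from either commit):

```asm
	// AND + CBZ: per the note above, 2 ALU operations on A64FX
	and	x6, x2, 63	// hypothetical: low bits of a length in x2
	cbz	x6, 1f		// compare-and-branch on the AND result

	// ANDS + B.EQ: 1 ALU operation, the branch only reads the Z flag
	ands	x6, x2, 63	// same AND, but also sets the condition flags
	b.eq	1f		// taken when the result is zero
1:
```

Note that ANDS overwrites the condition flags, so this form only works where no
later instruction depends on flags set before it.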
Wilco