[PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX

Wilco Dijkstra Wilco.Dijkstra@arm.com
Tue Apr 20 11:39:41 GMT 2021


Hi Naohiro,

> I removed redundant instructions using cbz and prfm offset address [1][2].
>
> [1] https://github.com/NaohiroTamura/glibc/commit/94363b4ab2e5b4b29843a47a6970b9645a8e4eeb
> [2] https://github.com/NaohiroTamura/glibc/commit/4648eb559e46d978ded65d40c6bf8c38dd2519d7

For the first 2 CBZ cases in both [1] and [2], the fastest option is to use ANDS+BEQ: ANDS
requires only 1 ALU operation, while AND+CBZ uses 2 ALU operations on A64FX.
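
A minimal sketch of the difference, for illustration only. The registers (x2, x3),
the mask (63) and the local label are hypothetical and are not copied from [1] or [2]:

```asm
	// Before: AND leaves the flags untouched, so CBZ has to test the
	// result register itself (2 ALU operations on A64FX, as noted above).
	and	x3, x2, 63
	cbz	x3, 1f

	// After: ANDS sets the Z flag as part of the AND, so the branch
	// only has to read the flags (1 ALU operation).
	ands	x3, x2, 63
	b.eq	1f

	// ... code for the non-zero case would go here ...
1:
```

Note that ANDS overwrites the NZCV flags, so the substitution is only safe where no
earlier flag result is still needed after that point.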

Wilco

