[PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX

Wilco Dijkstra Wilco.Dijkstra@arm.com
Thu May 6 17:31:26 GMT 2021


Hi Naohiro,

> I've read the mail thread regarding BTI, but I think I couldn't fully understand the
> problem. BTI seems available from ARMv8.5, and A64FX is ARMv8.2.

BTI instructions are encoded in the NOP hint space, so it is possible to enable BTI
even on ARMv8.0. Using BTI instructions is harmless on CPUs that don't support BTI,
provided a NOP hint is as cheap as a NOP (which generally doesn't need any execution
resources).
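
To illustrate the "NOP hint" point, here is a minimal sketch in C that computes the
A64 HINT encodings. The helper names (hint_encoding, is_hint_space) and the enum are
made up for this example and are not glibc code; the immediates 32/34/36/38 are the
architectural HINT numbers for BTI, BTI c, BTI j and BTI jc.

```c
#include <stdint.h>

/* HINT immediates; NOP is HINT #0 and the BTI variants are HINT #32..#38. */
enum {
    HINT_NOP    = 0,
    HINT_BTI    = 32,
    HINT_BTI_C  = 34,
    HINT_BTI_J  = 36,
    HINT_BTI_JC = 38
};

/* A64 "HINT #imm7": fixed bits 0xD503201F with imm7 (CRm:op2) in bits [11:5]. */
static uint32_t hint_encoding(unsigned imm7)
{
    return UINT32_C(0xD503201F) | ((uint32_t)(imm7 & 0x7Fu) << 5);
}

/* True if insn lies in the HINT space, i.e. it differs from NOP only in imm7. */
static int is_hint_space(uint32_t insn)
{
    return (insn & ~(UINT32_C(0x7F) << 5)) == UINT32_C(0xD503201F);
}
```

With these helpers, hint_encoding(HINT_BTI_C) gives 0xD503245F, which differs from the
NOP encoding 0xD503201F only in the immediate field. That is why a CPU without BTI
support executes it as a NOP; how cheaply it does so is implementation specific.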

> Even though distro distributed BTI enabled binary, BTI doesn't work on A64FX.

It works (i.e. it is binary compatible with A64FX) and should have no effect. However,
it seems to cause an unexpected slowdown.

> So BTI_J macro can be removed from A64FX IFUNC code at least, because A64FX
> IFUNC code is executed only on A64FX.

How is removing it just from memcpy going to help? The worry is not about memcpy
but the slowdown from all the BTI instructions that will be added to most functions.

Note it is still worthwhile to change BTI_C to NOP as suggested: that covers the case
where BTI is not enabled, and there you want to avoid inserting BTI instructions that
are not needed.
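
One possible reading of that suggestion, as a rough sketch only: select the marker at
build time. EXAMPLE_BTI_C, EXAMPLE_BTI_ENABLED and example_entry_marker are
hypothetical names for illustration, not the glibc macros, and keying on the ACLE macro
__ARM_FEATURE_BTI_DEFAULT is an assumption about how "BTI enabled" would be detected.

```c
/* Hypothetical build-time selection of the marker placed at a branch target.
   __ARM_FEATURE_BTI_DEFAULT is set by the compiler when BTI protection is
   enabled by default (e.g. -mbranch-protection=bti).  */
#if defined(__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT
# define EXAMPLE_BTI_ENABLED 1
# define EXAMPLE_BTI_C "hint 34"   /* BTI c */
#else
# define EXAMPLE_BTI_ENABLED 0
# define EXAMPLE_BTI_C "nop"       /* plain NOP keeps code size and alignment */
#endif

/* Returns the assembly text this build would emit for the marker. */
static const char *example_entry_marker(void)
{
    return EXAMPLE_BTI_C;
}
```

Using a plain NOP rather than an empty expansion keeps the instruction count, and so
any alignment that depends on it, the same in both configurations.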

Cheers,
Wilco

