[PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX

Florian Weimer fweimer@redhat.com
Tue May 4 11:07:37 GMT 2021


* Szabolcs Nagy:

> The 05/04/2021 12:17, Florian Weimer wrote:
>> * Szabolcs Nagy:
>> 
>> > The 04/30/2021 16:40, Wilco Dijkstra wrote:
>> >> >> Well it doesn't seem to behave like a NOP. So to avoid slowing down
>> >> >> all string functions, bti c must be removed completely, not just from
>> >> >> A64FX memcpy.  Using a real NOP is fine in all cases as long as
>> >> >> HAVE_AARCH64_BTI is not defined.
>> >> >
>> >> > I'm probably confused, but: If BTI is active, many more glibc functions
>> >> > will have BTI markers.  What makes the string functions special?
>> >> 
>> >> Exactly. And at that point trying to remove it from memcpy is just pointless.
>> >> 
>> >> The case we are discussing is where BTI is not turned on in GLIBC but we still
>> >> emit a BTI at the start of assembler functions for simplicity. By using a NOP
>> >> instead, A64FX will not execute BTI anywhere in GLIBC.
>> >
>> > the asm ENTRY was written with the assumption that bti c
>> > behaves like a nop when bti is disabled, so we don't have
>> > to make the asm conditional based on cflags.
>> >
>> > if that's not the case i agree with the patch, however we
>> > will have to review some other code (e.g. libgcc outline
>> > atomics asm) where we made the same assumption.
>> 
>> I find this discussion extremely worrisome.  If bti c does not behave
>> like a nop, then we need a new AArch64 ABI variant to enable BTI.
>> 
>> That being said, a distribution with lots of bti c instructions in
>> binaries seems to run on A64FX CPUs, so I'm not sure what is going on.
>
> this does not have correctness impact, only performance impact.
>
> hint space instructions seem slower than expected on a64fx.
>
> which means unconditionally adding bti c to asm entry code is not
> ideal if somebody tries to build a system without branch-protection.
> distros that build all binaries with branch protection will just
> take a performance hit on a64fx, we can't fix that easily.

I think I see it now.  It's not critically slow, but there appears to be
an observable performance impact.  I'm still worried.
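
To make the option under discussion concrete, here is a minimal sketch
of an entry marker that depends on the build configuration.  It is an
illustration only, not glibc's actual sysdep.h: HAVE_AARCH64_BTI is the
macro named earlier in the thread, while BTI_C_MARKER, entry_marker and
entry_marker_is_hint are hypothetical names, and the instruction is kept
as a string so the selection can be inspected on any host.

```c
/* Sketch only: hypothetical names, not glibc's real ENTRY macro.  */
#include <string.h>

/* Treat both "undefined" and "defined to 0" as BTI being disabled.  */
#if defined (HAVE_AARCH64_BTI) && HAVE_AARCH64_BTI
/* bti c is encoded in the hint space as hint #34.  */
# define BTI_C_MARKER "hint 34"
#else
/* Without branch protection, emit a real nop instead of a hint.  */
# define BTI_C_MARKER "nop"
#endif

/* Return the instruction text that would open each asm function.  */
const char *
entry_marker (void)
{
  return BTI_C_MARKER;
}

/* Return nonzero if the selected marker is a hint-space instruction.  */
int
entry_marker_is_hint (void)
{
  return strncmp (entry_marker (), "hint", 4) == 0;
}
```

Built without HAVE_AARCH64_BTI, entry_marker returns "nop", so no
hint-space instruction would be placed at function entry; building with
-DHAVE_AARCH64_BTI=1 selects "hint 34" instead.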

Thanks,
Florian


