This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH 0/3] aarch64: Update ld.so for vector abi
On 08/02/2018 11:24 AM, Carlos O'Donell wrote:
> On 08/01/2018 06:23 PM, email@example.com wrote:
>> From: Richard Henderson <firstname.lastname@example.org>
>> There is a new calling convention defined for vectorized functions.
> Correct me if I'm wrong.
> (a) We have a lot of stuff to save/restore in SVE.
> (b) It appears Szabolcs really wants to avoid the PLT entirely with the
> new vector procedure call standard, since this avoids ever
> having to save/restore the large amounts of register data.
> (b.1) Assumes that save and restore of SVE has serious negative performance
> consequences, both in userspace and in kernel save/restore for
> context switches.
Yes. This is the biggie, really.
> (c) A PLT generally has only one kind of save/restore ABI that it follows
> and it pessimistically follows the worst case to support all possible
> calling conventions.
> (d) The compiler you are using is generating calls using the new ABI and
> those are going through the PLT; something in the dynamic loader is
> also using these registers and corrupting call results, otherwise
> you would never have made this patch to fix the problem.
Yes. Indeed, the normal AdvSIMD restore within _dl_runtime_resolve is exactly
what clobbers the SVE state.
> The better solution for aarch64 is:
> (1) All new-style SVE calls do *not* go through the PLT by default, but
> indirect through the GOT and are always bind-now.
This would be the ideal solution, yes.
> I don't expect you signed up for this, but that's my analysis.
The right people are now talking about the problem, which is the main thing.
>> I have *not* attempted to extend the <bits/link.h> interface for
>> the new ABI. This should be done with more discussion on list.
>> I have instead simply saved and restored registers as the ABI
>> requires, so that the actual callee gets the correct data.
> We *should* adjust bits/link.h at the same time and extend it like
> we did for x86_64. LD_AUDIT should work.
Yes. We do need to talk about the design for this though; it's not as simple
as for x86_64. I did comment on this more in patch 3:
+ ??? Extending the profiling hook for full SVE register export
+ is tricky given the variable register size. Perhaps the new
+ La_aarch64_regs should contain pointers to Z0 and P0, and
+ the current VL, and one infers the addresses from there.
+ This one new form could be used for all, with AdvSIMD
+ devolving into VL=16 with no predicate registers.
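
To make the idea in that comment concrete, here is a minimal sketch in C of what such a register view could look like. Everything in it is hypothetical: the struct name `sve_regs_view`, its field names, and the two helpers are invented for illustration and are not part of <bits/link.h> or of the patch. It also assumes that the saved Z registers are stored contiguously starting at Z0, and likewise the P registers starting at P0, which the comment does not state. The only architectural facts relied on are that SVE has 32 Z registers of VL bytes each and 16 P registers of VL/8 bytes each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; NOT the real
   La_aarch64_regs and not a proposed glibc interface.  */
struct sve_regs_view
{
  const unsigned char *z0; /* Saved Z0; assumed followed by Z1..Z31.  */
  const unsigned char *p0; /* Saved P0; assumed followed by P1..P15.
                              NULL when there are no predicate
                              registers (the AdvSIMD case).  */
  size_t vl;               /* Vector length in bytes; 16 for AdvSIMD.  */
};

/* Infer the address of saved vector register Zn from Z0 and VL.  */
static inline const unsigned char *
sve_regs_z (const struct sve_regs_view *r, unsigned int n)
{
  assert (n < 32);
  return r->z0 + (size_t) n * r->vl;
}

/* Infer the address of saved predicate register Pn from P0 and VL.
   Each predicate register holds one bit per vector byte, so it
   occupies VL/8 bytes.  Returns NULL if no predicates were saved.  */
static inline const unsigned char *
sve_regs_p (const struct sve_regs_view *r, unsigned int n)
{
  assert (n < 16);
  if (r->p0 == NULL)
    return NULL;
  return r->p0 + (size_t) n * (r->vl / 8);
}
```

Under these assumptions an audit module would read argument register Z3 as `sve_regs_z (regs, 3)` without knowing the vector length at compile time, and the AdvSIMD case falls out by setting `vl` to 16 and `p0` to NULL.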