This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH 0/3] aarch64: Update ld.so for vector abi
- From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
- To: rth at twiddle dot net, libc-alpha at sourceware dot org
- Cc: nd at arm dot com, marcus dot shawcroft at linaro dot org, Richard Henderson <richard dot henderson at linaro dot org>
- Date: Thu, 2 Aug 2018 11:44:08 +0100
- Subject: Re: [PATCH 0/3] aarch64: Update ld.so for vector abi
- References: <firstname.lastname@example.org>
On 01/08/18 23:23, email@example.com wrote:
From: Richard Henderson <firstname.lastname@example.org>
There is a new calling convention defined for vectorized functions.
This is similar to what has happened for x86_64, where the original ABI
did not pass or preserve full vector contents, but a new ABI was later
defined that does.
There was an old patch for [BZ #15128] that saves full AdvSIMD registers
across _dl_runtime_resolve, but failed to do so for _dl_runtime_profile.
In the chatter for the BZ, Marcus Shawcroft mentions that it should
save and restore d0-d7, which is indeed correct for the original ABI.
It is not clear from the BZ why q0-q7 are saved instead.
i think that comment was wrong, q0-q7 are argument registers in
the pcs so they have to be saved/restored (otherwise long double
args would be clobbered)
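
To illustrate the point about q0-q7, here is a small, portable C sketch. It is a toy model only, not the code in dl-trampoline.S: the type qreg_t and all function names are hypothetical, and a "register" is just a pair of 64-bit halves. It shows that saving only the d view (the low 64 bits) around a call that clobbers the registers loses the upper half of a 128-bit argument such as a long double.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of one 128-bit AdvSIMD register: dN is the low half of qN. */
typedef struct { uint64_t lo, hi; } qreg_t;

enum { NARGREGS = 8 };  /* q0-q7 carry floating-point/vector arguments */

/* Save and restore only the d view (low 64 bits of each register). */
static void save_d(const qreg_t *regs, uint64_t *slot) {
    for (int i = 0; i < NARGREGS; i++)
        slot[i] = regs[i].lo;
}

static void restore_d(qreg_t *regs, const uint64_t *slot) {
    for (int i = 0; i < NARGREGS; i++)
        regs[i].lo = slot[i];
}

/* Save and restore the full q view (all 128 bits of each register). */
static void save_q(const qreg_t *regs, qreg_t *slot) {
    memcpy(slot, regs, NARGREGS * sizeof *regs);
}

static void restore_q(qreg_t *regs, const qreg_t *slot) {
    memcpy(regs, slot, NARGREGS * sizeof *regs);
}

/* Stand-in for resolver code that is allowed to clobber q0-q7. */
static void clobber(qreg_t *regs) {
    memset(regs, 0xff, NARGREGS * sizeof *regs);
}

/* Returns 1 if a 128-bit value in q0 survives a d-only save/restore. */
int survives_d_save(void) {
    qreg_t regs[NARGREGS] = {{0}};
    uint64_t slot[NARGREGS];
    regs[0].lo = 0x1111;
    regs[0].hi = 0x2222;
    save_d(regs, slot);
    clobber(regs);
    restore_d(regs, slot);
    return regs[0].lo == 0x1111 && regs[0].hi == 0x2222;
}

/* Returns 1 if a 128-bit value in q0 survives a full q save/restore. */
int survives_q_save(void) {
    qreg_t regs[NARGREGS] = {{0}};
    qreg_t slot[NARGREGS];
    regs[0].lo = 0x1111;
    regs[0].hi = 0x2222;
    save_q(regs, slot);
    clobber(regs);
    restore_q(regs, slot);
    return regs[0].lo == 0x1111 && regs[0].hi == 0x2222;
}
```

In this model survives_d_save() reports failure because the high half is never written back, while survives_q_save() reports success.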
That said, the new abi for AdvSIMD does use q0-q7.
the new abi also requires q8-q23 to be preserved (callee saved regs
for the vector pcs, but not for the normal pcs so the dynamic linker
may clobber them, although current code may not do so)
When SVE is enabled, we need to save even more: z0-z7 plus p0-p3.
note that z8-z23 and p4-p15 have to be preserved across an sve
vector call, but the dynamic linker may clobber the z regs so
saving z0-z7 is not enough for lazy binding to work.
if a process touches sve regs then the kernel remembers that
the process uses sve and starts saving/restoring regs at kernel
entry/return; always accessing sve regs in the dynamic linker
defeats the purpose of that optimization.
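
The cost argument above can be sketched with a toy model in C. This is not the kernel's actual logic: task_t, touch_sve and switch_cost_bytes are hypothetical names, and the only facts assumed are the architectural register sizes (32 vector registers of 128 bits without SVE; with SVE, 32 Z registers of VL bytes plus 16 predicate registers and the FFR at VL/8 bytes each).

```c
#include <stddef.h>

/* Toy per-task state: has the task ever touched SVE, and its vector length. */
typedef struct {
    int sve_used;      /* set on first SVE access, never cleared in this model */
    size_t vl_bytes;   /* SVE vector length in bytes, e.g. 32 for 256-bit */
} task_t;

/* Models any SVE access, including one made by the dynamic linker. */
void touch_sve(task_t *t) {
    t->sve_used = 1;
}

/* Bytes of vector state saved/restored at kernel entry/return in this model. */
size_t switch_cost_bytes(const task_t *t) {
    if (!t->sve_used)
        return 32 * 16;                 /* v0-v31, 16 bytes each */
    return 32 * t->vl_bytes             /* z0-z31 */
         + 17 * (t->vl_bytes / 8);      /* p0-p15 plus FFR */
}
```

If the dynamic linker calls the equivalent of touch_sve() in every process, every process pays the larger cost from then on, even when the program itself never uses SVE.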
This fixes a number of minor issues with _dl_runtime_resolve itself,
and more major issues with _dl_runtime_profile before copying those
routines and making the modifications required for SVE.
i will review the fixes, but the sve parts have to wait
(i will have to discuss with other ppl what to do with that).
I have *not* attempted to extend the <bits/link.h> interface for
the new ABI. This should be done with more discussion on list.
I have instead simply saved and restored registers as the abi
requires, so that the actual callee gets the correct data.
ok (ld audit is another reason to avoid plt for vector functions..)
I have lightly tested this under QEMU, in that the new sve paths
pass the same glibc tests as the old paths.
Richard Henderson (3):
  aarch64: Clean up _dl_runtime_resolve
  aarch64: Clean up _dl_runtime_profile
  aarch64: Save and restore SVE registers in ld.so

 sysdeps/aarch64/dl-machine.h    |  13 +-
 sysdeps/aarch64/dl-trampoline.S | 531 +++++++++++++++++++++++++-------
 2 files changed, 438 insertions(+), 106 deletions(-)