This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
[AArch64 ELF ABI] Vector calls and lazy binding on AArch64
- From: Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>
- To: GNU C Library <libc-alpha at sourceware dot org>, Binutils <binutils at sourceware dot org>, GCC Development <gcc at gcc dot gnu dot org>, "gnu-gabi at sourceware dot org" <gnu-gabi at sourceware dot org>
- Cc: nd <nd at arm dot com>, Ramana Radhakrishnan <Ramana dot Radhakrishnan at arm dot com>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>, Tejas Belagod <Tejas dot Belagod at arm dot com>, Richard Sandiford <Richard dot Sandiford at arm dot com>, Steve Ellcey <sellcey at marvell dot com>, Richard Henderson <richard dot henderson at linaro dot org>
- Date: Wed, 22 May 2019 14:42:05 +0000
- Subject: [AArch64 ELF ABI] Vector calls and lazy binding on AArch64
The lazy binding code of aarch64 currently only preserves q0-q7 of the
fp registers, but for an SVE call [AAPCS64+SVE] it should preserve p0-p3
and z0-z23, and for an AdvSIMD vector call [VABI64] it should preserve
q0-q23. (Vector calls are extensions of the base PCS [AAPCS64].)
A possible fix is to save and restore the additional register state in
the lazy binding entry code. This was discussed in
https://sourceware.org/ml/libc-alpha/2018-08/msg00017.html
where the main objections were
(1) Linux may optimize the kernel entry code for processes that don't
use SVE, so lazy binding should avoid accessing SVE registers.
(2) If this is fixed in the dynamic linker, vector calls will not be
backward compatible with old glibc.
(3) The saved SVE register state can be large (> 8K), so binaries that
work today may run out of stack space on an SVE system during lazy
binding (which can e.g. happen in a signal handler on a tiny stack).
and the proposed solution was to force bind-now semantics for vector
functions, e.g. by not calling them via the PLT. This turned out to be
harder than I expected. I no longer think (1) and (2) are critically
important, but (3) is a correctness issue that is hard to argue away:
saving the extra state would require larger stack allocations to
accommodate the worst-case stack size increase, but stack allocation is
not always under the control of glibc, so glibc cannot provide strict
guarantees.
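To put objection (3) in numbers, here is a rough sketch of the spill
size arithmetic. It assumes the architectural SVE limits (vector length
a multiple of 128 bits, at most 2048 bits, with each predicate register
one eighth the size of a vector register). The function name is made up
for illustration and is not part of any existing code base.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to spill n_z Z registers and n_p P registers at a given
   SVE vector length (in bits).  A Z register holds vl_bits bits; a P
   register holds one bit per Z register byte, i.e. vl_bits / 8 bits.  */
size_t sve_spill_bytes(unsigned vl_bits, unsigned n_z, unsigned n_p)
{
    /* Assumption: vector lengths are multiples of 128 up to 2048 bits.  */
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);

    size_t z_bytes = vl_bits / 8;   /* size of one Z register */
    size_t p_bytes = vl_bits / 64;  /* size of one P register */
    return n_z * z_bytes + n_p * p_bytes;
}
```

Under these assumptions, sve_spill_bytes(2048, 32, 16) gives 8704 bytes
for the whole Z and P register file, which is one way to arrive at a
figure above 8K, and sve_spill_bytes(2048, 24, 4) gives 6272 bytes for
just z0-z23 and p0-p3. The q0-q7 save done today takes 8 * 16 = 128
bytes.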
Some approaches to make symbols "bind now" were discussed at
https://groups.google.com/forum/#!topic/generic-abi/Bfb2CwX-u4M
The ABI change draft is below the notes. It requires marking symbols
in the ELF symbol table that follow the vector PCS (or other variant
PCS conventions). This is most relevant to dynamic linkers with lazy
binding support and to ELF linkers targeting AArch64, but assemblers
will need to be updated too.
Note 1: the dynamic linker may have to run user code during lazy binding
because of ifunc resolvers, so it cannot avoid clobbering fp regs.
Note 2: the tlsdesc entry is also affected by (3), so either the
initial DTV setup should avoid clobbering fp regs or the SVE register
state should not be callee-preserved by the tlsdesc call ABI (the latter
was chosen, which is backward compatible with old dynamic linkers, but
tls access from SVE code is as expensive as an extern call now: the
caller has to spill).
Note 3: signal frame and SVE register spills in code using SVE can also
lead to variable stack usage (AT_MINSIGSTKSZ was introduced to address
the former issue on Linux), so it is a valid approach to just increase
min stack size limits on aarch64 compared to other targets (this is less
invasive, but does not fix old binaries).
Note 4: the proposal requires marking symbols in asm and elf objects, so
it is not compatible with existing tooling (old as or ld cannot create
valid vector function symbol references or definitions) and it is only
effective with a new dynamic linker.
Note 5: -fno-plt style code generation for vector function calls might
have worked too, but on aarch64 it requires compiler and linker changes
to avoid PLT in position dependent code when that is emitted for the
sake of pointer equality. It also requires tightening the ABI to ensure
the static linker does not introduce PLT when processing certain static
relocations. This approach would generate suboptimal statically linked
code (the no-plt code is hard to relax into direct calls on aarch64),
would be fragile (it is easy to accidentally introduce a PLT) and would
be hard to diagnose.
Note 6: the proposed solution applies to both SVE calls and AdvSIMD
vector calls, even though some issues only apply to SVE.
Note 7: a separate dynamic linker entry point for variant PCS calls
may be introduced (requires further ELF changes for a PLT0 like stub)
or the dynamic linker may decide to always preserve all registers or
decide to always bind symbols at load time.
AAELF64: in the Symbol Table section add
st_other Values
The st_other member of a symbol table entry specifies the symbol's
visibility in the lowest 2 bits. The top 6 bits are unused in the
generic ELF ABI [SCO-ELF], and while there are no values reserved for
processor-specific semantics, many other architectures have used these
bits.
The defined processor-specific st_other flag values are listed in
Table 4-5-1.
Table 4-5-1, Processor specific st_other flags
+------------------------+------+---------------------+
|Name                    | Mask | Comment             |
+------------------------+------+---------------------+
|STO_AARCH64_VARIANT_PCS | 0x80 | The function        |
|                        |      | associated with the |
|                        |      | symbol may follow a |
|                        |      | variant procedure   |
|                        |      | call standard with  |
|                        |      | different register  |
|                        |      | usage convention.   |
+------------------------+------+---------------------+
A symbol table entry that is marked with the STO_AARCH64_VARIANT_PCS
flag set in its st_other field may be associated with a function that
follows a variant procedure call standard with different register
usage convention from the one defined in the base procedure call
standard for the list of argument, caller-saved and callee-saved
registers [AAPCS64]. The rules in the Call and Jump relocations
section still apply to such functions, and if a subroutine is called
via a symbol reference that is marked with STO_AARCH64_VARIANT_PCS
then code that runs between the calling routine and called subroutine
must preserve the contents of all registers except IP0, IP1 and the
condition code flags [AAPCS64].
Static linkers must preserve the marking and propagate it to the
dynamic symbol table if any reference or definition of the symbol is
marked with STO_AARCH64_VARIANT_PCS, and add a DT_AARCH64_VARIANT_PCS
dynamic tag if required by the Dynamic Section section.
NOTE:
In particular, when a call is made via the PLT entry of a symbol
marked with STO_AARCH64_VARIANT_PCS, a dynamic linker cannot assume
that the call follows the register usage convention of the base
procedure call standard.
An example of a function that follows a variant procedure call
standard with different register usage convention is one that takes
parameters in scalable vector or predicate registers.
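As a rough illustration of how a tool might read the st_other layout
described above, here is a minimal sketch. Only the 0x80 mask and the
"lowest 2 bits" visibility rule come from the draft. The helper names
are hypothetical, and may_bind_lazily shows just one possible dynamic
linker policy (binding marked symbols at load time, as in Note 7), not
a required one.

```c
#include <assert.h>
#include <stdint.h>

#ifndef STO_AARCH64_VARIANT_PCS
#define STO_AARCH64_VARIANT_PCS 0x80  /* mask from Table 4-5-1 */
#endif

/* Symbol visibility lives in the lowest 2 bits of st_other.  */
unsigned st_visibility(uint8_t st_other)
{
    return st_other & 0x3;
}

/* Nonzero if the symbol may follow a variant PCS.  */
int is_variant_pcs(uint8_t st_other)
{
    return (st_other & STO_AARCH64_VARIANT_PCS) != 0;
}

/* Hypothetical policy: only resolve a PLT call lazily when the base
   PCS register usage can be assumed for that symbol.  */
int may_bind_lazily(uint8_t st_other)
{
    return !is_variant_pcs(st_other);
}

/* Set the marking (e.g. in an assembler or static linker) without
   disturbing the visibility bits.  */
uint8_t mark_variant_pcs(uint8_t st_other)
{
    uint8_t marked = (uint8_t)(st_other | STO_AARCH64_VARIANT_PCS);
    assert(st_visibility(marked) == st_visibility(st_other));
    return marked;
}
```

For example, mark_variant_pcs(0x02) yields 0x82, for which
is_variant_pcs is true, st_visibility is still 2 and may_bind_lazily
is false.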
AAELF64: in the Dynamic Section section add
Table 5-4, AArch64 specific dynamic array tags
+-----------------------+------------+-------+------------+---------------+
|Name                   | Value      | d_un  | Executable | Shared Object |
+-----------------------+------------+-------+------------+---------------+
|DT_AARCH64_VARIANT_PCS | 0x70000005 | d_val | Platform   | Platform      |
|                       |            |       | specific   | specific      |
+-----------------------+------------+-------+------------+---------------+
DT_AARCH64_VARIANT_PCS must be present if there are R_<CLS>_JUMP_SLOT
relocations that reference symbols marked with the
STO_AARCH64_VARIANT_PCS flag set in their st_other field.
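The rule above could be checked by a static linker roughly as follows.
This is only a sketch: struct sym and struct reloc are simplified,
hypothetical stand-ins for a linker's internal state (not the real ELF
structures), and is_jump_slot stands for "the relocation type is
R_<CLS>_JUMP_SLOT".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef STO_AARCH64_VARIANT_PCS
#define STO_AARCH64_VARIANT_PCS 0x80
#endif

/* Hypothetical, simplified views of linker state.  */
struct sym   { uint8_t st_other; };
struct reloc { int is_jump_slot; size_t sym_index; };

/* Returns 1 if DT_AARCH64_VARIANT_PCS must be emitted, i.e. if any
   JUMP_SLOT relocation references a symbol that carries the
   STO_AARCH64_VARIANT_PCS marking; returns 0 otherwise.  */
int needs_dt_variant_pcs(const struct sym *syms, size_t nsyms,
                         const struct reloc *relocs, size_t nrelocs)
{
    for (size_t i = 0; i < nrelocs; i++) {
        assert(relocs[i].sym_index < nsyms);
        if (relocs[i].is_jump_slot
            && (syms[relocs[i].sym_index].st_other
                & STO_AARCH64_VARIANT_PCS))
            return 1;
    }
    return 0;
}
```

A dynamic linker can then use the presence of the tag as a cheap hint:
if it is absent, no JUMP_SLOT relocation needs special treatment and
the symbol table does not have to be consulted for the marking.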
VABI64: after the Vector Procedure Call Standard section add
Dynamic linking for AAVPCS
On ELF platforms with dynamic linking support, symbol definitions and
references must be marked with the STO_AARCH64_VARIANT_PCS flag set in
their st_other field if both of the following hold:
1. the symbol is visible outside of its defining component (executable
file or shared object), and
2. the symbol is associated with a function following the AAVPCS
convention.
For more information on STO_AARCH64_VARIANT_PCS, see AAELF64.
NOTE:
Marking all function symbol definitions and references is a valid
way of implementing this requirement.
[AAELF64]: ELF for the Arm 64-bit Architecture (AArch64)
https://developer.arm.com/docs/ihi0056/latest
[VABI64]: Vector Function ABI Specification for AArch64
https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-compiler-for-hpc/vector-function-abi
[AAPCS64]: Procedure Call Standard for the Arm 64-bit Architecture (AArch64)
https://developer.arm.com/docs/ihi0055/latest
[AAPCS64+SVE]: Procedure Call Standard for the ARM 64-bit Architecture
(AArch64) with SVE support
https://developer.arm.com/docs/100986/latest
[SCO-ELF]: System V Application Binary Interface
http://www.sco.com/developers/gabi/