CREL dynamic relocations
Fangrui Song
maskray@google.com
Mon Mar 25 18:51:09 GMT 2024
On Mon, Mar 25, 2024 at 4:53 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Fangrui Song:
>
> > I have proposed a compact relocation format CREL at
> > https://groups.google.com/g/generic-abi/c/yb0rjw56ORw/m/eiBcYxSfAQAJ
> > (previously named RELLEB).
> >
> > CREL primarily targets static relocations, achieving significant .o
> > file size reduction for lld builds: 18.0% for x86-64/aarch64 and 34.3%
> > for riscv64.
> > CREL holds promise for dynamic relocations as well, surpassing
> > Android's packed relocation format.
>
> As I said elsewhere, I'm concerned about the use of the ULEB128
> encoding. It's unnecessarily difficult to decode.
>
> Thanks,
> Florian
Thanks. I have seen your question at
https://groups.google.com/g/generic-abi/c/yb0rjw56ORw/m/osMXhg5XAgAJ
and replied there that
since one-byte encodings are dominant for our use cases, LEB128 is
actually the best choice (in terms of both performance and
simplicity).
I researched the dynamic relocation problem over the weekend and
incorporated the following text into my blog post:
Traditionally, we have two dynamic relocation ranges for executables
and shared objects (except static position-dependent executables):
* `.rela.dyn` (`[DT_RELA, DT_RELA + DT_RELASZ)`) or `.rel.dyn`
(`[DT_REL, DT_REL + DT_RELSZ)`)
* `.rela.plt` (`[DT_JMPREL, DT_JMPREL + DT_PLTRELSZ)`): Stores
JUMP_SLOT relocations. `DT_PLTREL` specifies `DT_REL` or `DT_RELA`.
IRELATIVE relocations can be placed in either range, but preferably
in `.rel[a].dyn`.
Some GNU ld ports (e.g. SPARC) treat `.rela.plt` as a subset of
`.rela.dyn`, introducing complexity for dynamic loaders.
**CREL adoption considerations**
* New dynamic tag (`DT_CREL`): To identify CREL relocations, separate
from existing `DT_REL`/`DT_RELA`.
* No `DT_CRELSZ`: Relocation count can be derived from the CREL header.
* Output section description `.rela.dyn : { *(.rela.dyn) *(.rela.plt)
}` is incompatible with CREL.
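To illustrate why no `DT_CRELSZ` is needed, here is a minimal sketch of decoding the CREL header. It assumes the header is a single ULEB128 value laid out as `count * 8 + addend_bit * 4 + shift`, which is my reading of the proposal; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields packed into the CREL header (assumed layout:
   count * 8 + addend_bit * 4 + shift).  */
struct crel_header {
  size_t count;        /* number of relocation entries */
  unsigned addend_bit; /* 1 if entries carry explicit addends */
  unsigned shift;      /* offsets are stored right-shifted by this amount */
};

/* Hypothetical helper: decode the ULEB128 header at *p and advance *p
   past it.  */
static struct crel_header crel_read_header(const uint8_t **p) {
  uint64_t hdr = 0;
  unsigned bits = 0;
  uint8_t byte;
  do {
    byte = *(*p)++;
    if (bits < 64)
      hdr |= (uint64_t)(byte & 0x7f) << bits;
    bits += 7;
  } while (byte & 0x80);

  struct crel_header h;
  h.count = (size_t)(hdr >> 3);
  h.addend_bit = (unsigned)((hdr >> 2) & 1);
  h.shift = (unsigned)(hdr & 3);
  return h;
}
```

A loader would read the header once at the address given by `DT_CREL` and then decode exactly `count` entries, so the end of the table never needs to be stored separately.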
**Challenges with lazy binding**
glibc's lazy binding scheme relies on [random access to relocation
entries within the `DT_JMPREL`
table](https://maskray.me/blog/2021-09-19-all-about-procedure-linkage-table#:~:text=_dl_fixup).
CREL's sequential nature prevents this. However, eager binding doesn't
require random access.
Therefore, when `-z now` (eager binding) is enabled, we can:
* Set `DT_PLTREL` to `DT_CREL`.
* Replace `.rel[a].plt` with `.crel.plt`.
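The random-access problem can be sketched as follows. With fixed-size `Elf64_Rela` records, entry `n` is found by simple pointer arithmetic, whereas a CREL table has variable-length entries, so reaching entry `n` means decoding every entry before it. The function below is a hypothetical illustration for `addend_bit==0` only; it assumes flag bit 0 marks a symbol index delta and flag bit 1 marks a type delta, as I understand the proposal.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t read_uleb128(const uint8_t **p) {
  uint64_t value = 0;
  unsigned shift = 0;
  uint8_t byte;
  do {
    byte = *(*p)++;
    if (shift < 64)
      value |= (uint64_t)(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return value;
}

/* Skip one ULEB128/SLEB128 value without decoding it.  */
static void skip_leb128(const uint8_t **p) {
  while (*(*p)++ & 0x80)
    ;
}

/* Hypothetical helper: store the r_offset of entry n in *r_offset.
   Returns 0 on success, -1 if n is out of range or the table uses
   explicit addends.  Cost is O(n), not O(1).  */
static int crel_nth_offset(const uint8_t *sec, size_t n, uint64_t *r_offset) {
  const uint8_t *p = sec;
  uint64_t hdr = read_uleb128(&p);
  size_t count = (size_t)(hdr >> 3);
  unsigned shift = (unsigned)(hdr & 3);
  if ((hdr & 4) || n >= count)
    return -1;

  uint64_t offset = 0;
  for (size_t i = 0; i <= n; i++) {
    uint8_t head = *p++;
    offset += head >> 2;
    if (head & 0x80)
      offset += (read_uleb128(&p) << 5) - 32;
    if (head & 1)
      skip_leb128(&p); /* symbol index delta (assumed flag bit 0) */
    if (head & 2)
      skip_leb128(&p); /* type delta (flag bit 1) */
  }
  *r_offset = offset << shift;
  return 0;
}
```

A lazy resolver that receives a relocation index from the PLT stub would pay this linear walk on every first call, which is why the text restricts `.crel.plt` to eager binding.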
**Challenges with statically linked position-dependent executables**
glibc introduces additional complexity for IRELATIVE relocations in
statically linked position-dependent executables.
They should only contain IRELATIVE relocations and no other dynamic relocations.
glibc's `csu/libc-start.c` processes IRELATIVE relocations in the
range [`[__rela_iplt_start,
__rela_iplt_end)`](https://maskray.me/blog/2021-01-18-gnu-indirect-function#non-preemptible-ifunc#rela_iplt_start-and-__rela_iplt_end)
(or `[__rel_iplt_start, __rel_iplt_end)`, determined at build time
through `ELF_MACHINE_IREL`).
While CREL relocations cannot be decoded in the middle of the section,
we can still place IRELATIVE relocations in `.crel.dyn` because there
wouldn't be any other relocation types (position-dependent executables
don't have RELATIVE relocations).
When CREL is enabled, we can define `__crel_iplt_start` and
`__crel_iplt_end` for statically linked position-dependent
executables.
If glibc only intends to support `addend_bit==0`, the code can simply be:
```c
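/* Linker-defined bounds of the CREL-encoded IRELATIVE relocations.  */
extern const uint8_t __crel_iplt_start[] __attribute__ ((weak));
extern const uint8_t __crel_iplt_end[] __attribute__ ((weak));

/* In the startup code: */
if (&__crel_iplt_start != &__crel_iplt_end) {
  const uint8_t *p = __crel_iplt_start;
  /* Header: the low 2 bits hold the offset shift, bits 3 and up the count.  */
  size_t offset = 0, count = read_uleb128 (&p), shift = count & 3;
  for (count >>= 3; count; count--) {
    uint8_t rel_head = *p++;
    /* 5 offset-delta bits live above the two flag bits.  */
    offset += rel_head >> 2;
    /* Continuation bit: add the remaining bits and cancel the 32 that
       bit 7 contributed to the shift above.  */
    if (rel_head & 128)
      offset += (read_uleb128 (&p) << 5) - 32;
    /* Skip the type delta, if present.  */
    if (rel_head & 2)
      read_sleb128 (&p);
    elf_crel_irel ((ElfW (Addr) *) (offset << shift));
  }
}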
```
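The snippet above calls `read_uleb128` and `read_sleb128` without showing them. A minimal sketch of such helpers follows; the signatures are assumed from the call sites (a pointer to the cursor, advanced past the value), and this is not glibc's actual code.

```c
#include <stdint.h>

/* Decode an unsigned LEB128 value at *p and advance *p past it.  */
static uint64_t read_uleb128(const uint8_t **p) {
  uint64_t value = 0;
  unsigned shift = 0;
  uint8_t byte;
  do {
    byte = *(*p)++;
    if (shift < 64)
      value |= (uint64_t)(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return value;
}

/* Decode a signed LEB128 value at *p and advance *p past it.  */
static int64_t read_sleb128(const uint8_t **p) {
  uint64_t value = 0;
  unsigned shift = 0;
  uint8_t byte;
  do {
    byte = *(*p)++;
    if (shift < 64)
      value |= (uint64_t)(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  /* Sign-extend when bit 6 of the final byte is set.  */
  if (shift < 64 && (byte & 0x40))
    value |= ~(uint64_t)0 << shift;
  return (int64_t)value;
}
```

When one-byte encodings dominate, each loop body runs exactly once, so the decode cost per field is one load, one mask, and one well-predicted branch.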
**Considering implicit addends for CREL**
Many dynamic relocations have zero addends:
* COPY/GLOB_DAT/JUMP_SLOT relocations only use zero addends.
* Absolute relocations could use non-zero addends with `STT_SECTION`
symbol, but linkers convert them to relative relocations.
Usually only RELATIVE/IRELATIVE and potentially TPREL/TPOFF might
require non-zero addends.
Switching from `DT_RELA` to `DT_REL` offers a minor size advantage.
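As a reminder of what the two addend styles mean for a loader, here is a minimal sketch of applying a RELATIVE relocation both ways. The function names and the `load_bias` parameter are hypothetical. For the classic formats the size difference is 24 bytes for `Elf64_Rela` versus 16 bytes for `Elf64_Rel`.

```c
#include <stdint.h>

/* Explicit addend (RELA style): the addend comes from the relocation
   record, and the previous contents of the location are ignored.  */
static void apply_relative_rela(uint64_t *where, uint64_t load_bias,
                                int64_t addend) {
  *where = load_bias + (uint64_t)addend;
}

/* Implicit addend (REL style): the linker has already stored the addend
   at the location, so the loader only adds the load bias.  */
static void apply_relative_rel(uint64_t *where, uint64_t load_bias) {
  *where += load_bias;
}
```

Both produce the same final value; the implicit form simply moves the addend out of the relocation table and into the relocated word.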
I considered defining two separate dynamic tags (`DT_CREL` and
`DT_CRELA`) to distinguish between implicit and explicit addends.
However, this would have introduced complexity:
* Should `llvm-readelf -r` dump the zero addends for `DT_CRELA`?
* Should dynamic loaders support both dynamic tags?
I placed the delta addend bit next to offset bits so that it can be
reused for offsets.
Thanks to Stefan O'Rear for making me believe that my original
thought of reserving a single bit flag (`addend_bit`) within the CREL
header is elegant.
Dynamic loaders prioritizing simplicity can hardcode the desired
`addend_bit` value.
`ld.lld -z crel` defaults to implicit addends (`addend_bit==0`), but
the option of using in-relocation addends is available with `-z crel
-z rela`.
**DT_AARCH64_AUTH_RELR vs CREL**
The AArch64 PAuth ABI introduces `DT_AARCH64_AUTH_RELR` as a variant
of RELR for signed relocations.
However, its benefit seems limited.
In a release build of Clang 16, using `-z crel -z rela` resulted in a
`.crel.dyn` section size of only 1.0% of the file size.
Notably, enabling implicit addends with `-z crel -z rel` further
reduced the size to just 0.3%.
While `DT_AARCH64_AUTH_RELR` will achieve a noticeably smaller
relocation size if most relative relocations are encoded with it, the
advantage seems less significant considering CREL's already compact
size.
Furthermore, `DT_AARCH64_AUTH_RELR` introduces additional complexity
to the linker due to its 32-bit addend limitation: the in-place 64-bit
value encodes a 32-bit schema, giving just 32 bits to the implicit
addend.
If the addend does not fit into 32 bits, `DT_AARCH64_AUTH_RELR` cannot be used.
CREL with addends would avoid this complexity.
I have filed [Quantifying the benefits of
DT_AARCH64_AUTH_RELR](https://github.com/ARM-software/abi-aa/issues/252).
--
宋方睿