x86-64: new CET-enabled PLT format proposal
Rui Ueyama
rui314@gmail.com
Sun Feb 27 03:18:47 GMT 2022
Hello,
I'd like to propose an alternative instruction sequence for the Intel
CET-enabled PLT section. Compared to the existing one, the new scheme is
simpler, more compact (16 bytes vs. 32 bytes per PLT entry), and does not
require a separate second PLT section (.plt.sec).
Here is the proposed code sequence:
PLT0:
  f3 0f 1e fa           // endbr64
  41 53                 // push %r11
  ff 35 00 00 00 00     // push GOT[1]
  ff 25 00 00 00 00     // jmp *GOT[2]
  0f 1f 40 00           // nop
  0f 1f 40 00           // nop
  0f 1f 40 00           // nop
  66 90                 // nop
PLTn:
  f3 0f 1e fa           // endbr64
  41 bb 00 00 00 00     // mov $namen_reloc_index, %r11d
  ff 25 00 00 00 00     // jmp *GOT[namen_index]
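To make the zero placeholders in the listing concrete, here is a minimal Python sketch of how a linker might fill them in. It is not mold's actual code. The function names, the 8-byte GOT slot size, and the assumption that got_addr is the address of GOT[0] are mine. The only facts relied on are that `ff 35` and `ff 25` take a RIP-relative disp32 measured from the end of the instruction, and that `41 bb` takes a 32-bit immediate.

```python
import struct

# Byte templates copied from the proposed listing; zeros are patched below.
PLT0_TEMPLATE = bytes.fromhex(
    "f30f1efa"                               # endbr64
    "4153"                                   # push %r11
    "ff3500000000"                           # push GOT[1]
    "ff2500000000"                           # jmp *GOT[2]
    "0f1f4000" "0f1f4000" "0f1f4000" "6690"  # nop padding up to 32 bytes
)
PLTN_TEMPLATE = bytes.fromhex(
    "f30f1efa"                               # endbr64
    "41bb00000000"                           # mov $reloc_index, %r11d
    "ff2500000000"                           # jmp *GOT[n]
)

def encode_plt0(plt_addr, got_addr):
    """Hypothetical helper: got_addr is assumed to be the address of GOT[0]."""
    buf = bytearray(PLT0_TEMPLATE)
    # push GOT[1]: the instruction occupies bytes 6..11, so RIP is plt_addr + 12.
    struct.pack_into("<i", buf, 8, (got_addr + 8) - (plt_addr + 12))
    # jmp *GOT[2]: the instruction occupies bytes 12..17, so RIP is plt_addr + 18.
    struct.pack_into("<i", buf, 14, (got_addr + 16) - (plt_addr + 18))
    return bytes(buf)

def encode_pltn(entry_addr, got_slot_addr, reloc_index):
    """Hypothetical helper: got_slot_addr is the address of this symbol's GOT slot."""
    buf = bytearray(PLTN_TEMPLATE)
    struct.pack_into("<I", buf, 6, reloc_index)
    # The jmp is the last instruction, so RIP is the end of the 16-byte entry.
    struct.pack_into("<i", buf, 12, got_slot_addr - (entry_addr + 16))
    return bytes(buf)
```

For example, encode_pltn(0x1020, 0x3018, 5) yields a 16-byte entry whose mov immediate is 5 and whose jmp displacement is 0x3018 - 0x1030.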
GOT[namen_index] is initialized to the address of PLT0 for all PLT entries, so
that when a PLT entry is called for the first time, control is passed to PLT0,
which calls the resolver function.
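The lazy-binding flow described above can be modelled with a toy Python sketch. This is only an illustration of the control flow, not real dynamic-loader code. The class and method names are hypothetical, and Python callables stand in for resolved function addresses.

```python
PLT0 = object()  # stand-in for the address of the PLT header

class ToyProcess:
    def __init__(self, symbols, library):
        self.symbols = symbols             # reloc index -> symbol name
        self.library = library             # symbol name -> callable
        self.got = [PLT0] * len(symbols)   # every slot initially points at PLT0
        self.resolver_calls = 0

    def call_plt(self, n, *args):
        r11 = n                            # PLTn: mov $reloc_index, %r11d
        target = self.got[n]               # PLTn: jmp *GOT[n]
        if target is PLT0:
            # PLT0: push %r11; push GOT[1]; jmp *GOT[2] (the resolver)
            target = self.resolve(r11)
        return target(*args)

    def resolve(self, reloc_index):
        self.resolver_calls += 1
        func = self.library[self.symbols[reloc_index]]
        self.got[reloc_index] = func       # later calls no longer go through PLT0
        return func

proc = ToyProcess(["length", "absolute"], {"length": len, "absolute": abs})
first = proc.call_plt(0, "abc")     # goes through PLT0 and the resolver
second = proc.call_plt(0, "hello")  # jumps straight to the resolved target
```

After both calls, the resolver has run once and only GOT slot 0 has been patched; slot 1 still points at PLT0.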
The proposed PLT uses %r11 as a scratch register. The x86-64 psABI explicitly
allows PLT entries to clobber this register (*1), and the resolver function
(_dl_runtime_resolve) already clobbers it.
(*1) x86-64 psABI p.24 footnote 17: "Note that %r11 is neither required to be
preserved, nor is it used to pass arguments. Making this register available as
scratch register means that code in the PLT need not spill any registers when
computing the address to which control needs to be transferred."
FYI, this is the current CET-enabled PLT:
PLT0:
  ff 35 00 00 00 00     // push GOT[1]
  f2 ff 25 e3 2f 00 00  // bnd jmp *GOT[2]
  0f 1f 00              // nop
PLTn in .plt:
  f3 0f 1e fa           // endbr64
  68 00 00 00 00        // push $namen_reloc_index
  f2 e9 e1 ff ff ff     // bnd jmpq PLT0
  90                    // nop
PLTn in .plt.sec:
  f3 0f 1e fa           // endbr64
  f2 ff 25 ad 2f 00 00  // bnd jmpq *GOT[namen_index]
  0f 1f 44 00 00        // nop
In the proposed format, PLT0 is 32 bytes long and each entry is 16 bytes. In
the existing format, PLT0 is 16 bytes and each entry is 32 bytes (16 bytes in
.plt plus 16 bytes in .plt.sec). Usually, we have many PLT entries but only one
header, so in practice, the proposed format is almost 50% smaller than the
existing one.
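The size comparison can be checked with a few lines of Python. This is just the arithmetic from the byte counts above; the function names are made up for illustration.

```python
def plt_sizes(num_entries):
    """Return (proposed, existing) total PLT size in bytes."""
    proposed = 32 + 16 * num_entries   # 32-byte PLT0 + 16 bytes per entry
    existing = 16 + 32 * num_entries   # 16-byte PLT0 + 16 bytes each in .plt and .plt.sec
    return proposed, existing

def savings(num_entries):
    """Fraction by which the proposed format is smaller than the existing one."""
    proposed, existing = plt_sizes(num_entries)
    return 1 - proposed / existing
```

With a single entry the two formats are the same size (48 bytes each). With 1000 entries the totals are 16032 vs. 32016 bytes, and the saving approaches 50% as the entry count grows.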
The proposed PLT does not use the BND prefix on its jump instructions, as Intel
MPX has been deprecated.
I have already implemented the proposed scheme in my linker
(https://github.com/rui314/mold) and it appears to be working fine.
Any thoughts?