This is the mail archive of the
mailing list for the binutils project.
On the implementation of IBT-enabled PLT with lazy binding
- From: "Fāng-ruì Sòng via binutils" <binutils at sourceware dot org>
- To: x86-64-abi at googlegroups dot com, binutils at sourceware dot org
- Date: Wed, 3 Apr 2019 14:11:31 +0800
- Subject: On the implementation of IBT-enabled PLT with lazy binding
- Reply-to: Fāng-ruì Sòng <maskray at google dot com>
Chapter 13 "Intel CET Extension" of x86-64 psABI describes an
alternative PLT scheme for IBT (Indirect Branch Tracking). With GCC>=8
and the latest ld.bfd (in binutils-gdb), we can see the synthesized PLT:
gcc -g -fuse-ld=bfd -fcf-protection=branch a.c -Wl,-z,ibtplt,-z,now -o a # -mibt for some older GCC 8 releases
objdump -d a
A PLT function, say `putchar`, has instruction sequences in both .plt
and .plt.sec:
.plt (16 bytes)
1030: f3 0f 1e fa endbr64
1034: 68 00 00 00 00 pushq $0x0
1039: f2 e9 e1 ff ff ff bnd jmpq 1020 <.plt>
103f: 90 nop
.plt.sec (16 bytes)
1060: f3 0f 1e fa endbr64
1064: f2 ff 25 5d 2f 00 00 bnd jmpq *0x2f5d(%rip) # 3fc8 <putchar@GLIBC_2.2.5>
106b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
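.text uses `callq 1060` to call putchar@plt. On the initial call (lazy
binding), the indirect jump at 0x1064 lands on 0x1030 because the GOT
slot still points there. The stub at 0x1030 pushes the relocation index
and jumps to the .plt header at 0x1020, which hands control to the
resolver. Once the GOT slot has been resolved to the real entry, future
0x1060 calls will jump directly to the real entry.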
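The control flow above can be sketched as a toy model. This is only an illustration of the GOT-slot patching idea, not how ld.so is implemented: the names `REAL_PUTCHAR`, `resolve` and `call_via_plt_sec` are hypothetical, and the "real" address is made up.

```python
# Toy model of lazy binding through a GOT slot (illustrative only).
PLT_STUB = 0x1030          # .plt entry: pushq index; jmp to the .plt header
PLT_SEC = 0x1060           # .plt.sec entry: jmp *GOT_slot
REAL_PUTCHAR = 0x7F0000001234  # hypothetical address of putchar in libc

# Before the first call the GOT slot points back at the .plt stub.
got = {"putchar": PLT_STUB}
trace = []

def resolve(symbol):
    """Stand-in for the resolver reached through the .plt header."""
    trace.append("resolver")
    got[symbol] = REAL_PUTCHAR  # patch the GOT slot with the real entry
    return REAL_PUTCHAR

def call_via_plt_sec(symbol):
    """Model `callq 1060`: jump indirectly through the GOT slot."""
    target = got[symbol]
    if target == PLT_STUB:      # first call: slot still points at the stub
        target = resolve(symbol)
    trace.append(hex(target))
    return target

first = call_via_plt_sec("putchar")   # goes through the resolver
second = call_via_plt_sec("putchar")  # jumps straight to the real entry
```

In this model the resolver runs once; the second call reads the patched slot and never touches the .plt stub, which is why .plt is only needed for the first call.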
I have several questions regarding the second PLT scheme.
1. Should psABI change .splt to .plt.sec?
The implementation uses .plt.sec for this feature.
PLT sections do not have a dedicated section type, and in practice they
are usually recognized by the name .plt . The tools that do so include,
but are not limited to, disassemblers (objdump, llvm-objdump),
assemblers (e.g. llvm-mc emits warnings for unusual flags), binary
instrumentation tools, profilers, and debuggers.
If the implementations pick names different from the ABI, tools have
to understand both .plt.sec and .splt to be ABI conforming. The
complexity could have been avoided if implementations and the ABI
agreed on the same name: .plt.sec
I prefer .plt.sec to .splt because the convention is already used in
several other places to assign fine-grained semantics to sections,
e.g. .text.hot .text.startup .text.unlikely
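To make the cost concrete, here is a minimal sketch of the name-based check a PLT-aware tool might carry. The function name `is_plt_section` and the set of accepted names are assumptions for illustration; no real tool's code is being quoted.

```python
# Hypothetical name-based PLT detection, as a tool might do it today.
# Accepting both .plt.sec (implementation) and .splt (ABI text) is the
# duplication that a single agreed name would avoid.
PLT_SECTION_NAMES = frozenset({".plt", ".plt.sec", ".splt"})

def is_plt_section(name: str) -> bool:
    """Return True if the section name is one a tool must treat as PLT."""
    return name in PLT_SECTION_NAMES

sections = [".text", ".plt", ".plt.sec", ".splt", ".text.hot"]
plt_like = [s for s in sections if is_plt_section(s)]
```

If the ABI and the implementations agreed on .plt.sec, the `.splt` entry could simply be dropped from the set.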
2. Merge .plt and .plt.sec
As I proposed at https://reviews.llvm.org/D59780#1451608 , since we
don't emit the bnd prefix (0xf2) for MPX (dropped by GCC 9), we can
merge .plt and .plt.sec entries as follows:
4 endbr64
5 jmpq *xxx(%rip) ; jump to the next endbr64 for lazy binding
4 endbr64
5 pushq ; relocation index
5 jmpq *xxx(%rip) ; jump to .plt
This PLT entry takes 4+5+4+5+5=23 bytes, and fits in a 24-byte entry
size if we aim for 8-byte alignment.
Not having to deal with .plt.sec simplifies implementation of PLT-aware tools.
(If MPX is resurrected (I am not sure about the likelihood), the bnd
prefixes before jmpq will take another 2 bytes and the PLT entry will
no longer fit in a 24-byte entry. We can expand it to 32 bytes then.)
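The size arithmetic above can be checked with a small sketch. The per-instruction byte counts are taken exactly as this message lists them (4+5+4+5+5) and are not re-derived from instruction encodings; `align_up` and the variable names are hypothetical helpers.

```python
def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment

# Byte counts as stated in the message for the merged entry.
MERGED_ENTRY = [
    ("endbr64", 4),
    ("jmpq via GOT slot", 5),
    ("endbr64", 4),
    ("pushq relocation index", 5),
    ("jmpq to .plt", 5),
]

raw_size = sum(nbytes for _, nbytes in MERGED_ENTRY)
entry_size = align_up(raw_size, 8)           # 8-byte aligned entry
# Two bnd prefixes (one per jmpq) add 2 bytes if MPX comes back.
entry_size_with_bnd = align_up(raw_size + 2, 8)
```

With these inputs the entry rounds up to 24 bytes, and the two extra prefix bytes push it to the next 8-byte multiple, 32.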
3. Necessity of the second PLT
It was raised in https://reviews.llvm.org/D58102 that splitting the
instruction sequences into .plt and .plt.sec may improve code cache
locality. According to my understanding, in theory .plt.sec is hot
while .plt is cold (only used on the first call). That being said, we
have seen no evidence or benchmark results supporting the claim.
The other argument is that it provides compatibility with other tools
that have a hardcoded PLT entry size of 16 bytes.
I found top-of-tree gdb/objdump cannot symbolize 16-byte .plt and
.plt.sec entries without the bnd prefixes
(https://reviews.llvm.org/D59780#1451608) => even if the entry size
sticks with 16, existing tools have to adopt new rules to symbolize
them.
Thus, we are causing trouble for existing tools whether or not we
introduce the second PLT.
Given the complexity of the second PLT, not having the second PLT
might be better.
BTW, I want to remind readers that the subject of this email contains
"lazy binding". A 32-byte entry imposes more overhead on libcs that do
not have the lazy binding functionality. For musl, a 5-byte `jmpq`
instruction suffices, but of course `-fno-plt` may be a better solution
that avoids dealing with PLT stuff at all.
P.S. Don't get me wrong. The new security enhancement technology
appeals to me. I've done a few ROP-style CTF pwn challenges in the past
and can imagine how useful IBT is, but I hope it introduces less
complexity to toolchains :)