This is the mail archive of the mailing list for the binutils project.


Re: On the implementation of IBT-enabled PLT with lazy binding

On Tue, Apr 2, 2019 at 11:11 PM 'Fāng-ruì Sòng' via X86-64 System V
Application Binary Interface wrote:
> Chapter 13 "Intel CET Extension" of the x86-64 psABI describes an
> alternative PLT scheme for IBT (Indirect Branch Tracking). With GCC >= 8
> and the latest ld.bfd (in binutils-gdb), we can see the synthesized PLT
> with:
>     gcc -g -fuse-ld=bfd -fcf-protection=branch a.c -Wl,-z,ibtplt,-z,now -o a
>     # (-mibt for some older GCC 8 releases)
>     objdump -d a
> A PLT function, say `putchar`, has instruction sequences in both .plt
> and .plt.sec:
> .plt (16 bytes)
>     1030:       f3 0f 1e fa             endbr64
>     1034:       68 00 00 00 00          pushq  $0x0
>     1039:       f2 e9 e1 ff ff ff       bnd jmpq 1020 <.plt>
>     103f:       90                      nop
> .plt.sec (16 bytes)
> 0000000000001060 <putchar@plt>:
>     1060:       f3 0f 1e fa             endbr64
>     1064:       f2 ff 25 5d 2f 00 00    bnd jmpq *0x2f5d(%rip)        # 3fc8 <putchar@GLIBC_2.2.5>
>     106b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> .text uses `callq 1060` to call putchar@plt. On the initial call, the
> indirect jump at 0x1064 goes through the GOT slot to 0x1030 (lazy
> binding). After the stub at 0x1030 resolves the GOT slot to the real
> entry, future calls to 0x1060 jump directly to the real entry.
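
As an aside, the entry sizes and branch targets in the listing above can be checked from the raw bytes alone. The following is only a minimal sketch: the byte strings are copied from the objdump output quoted above, while the helper names (`entry_size`, `rel32_target`) are made up for illustration and are not part of any tool.

```python
# Bytes copied from the objdump listing in the quoted message.
# Each tuple is (address, hex bytes, mnemonic as shown by objdump).
PLT_ENTRY = [
    (0x1030, "f3 0f 1e fa", "endbr64"),
    (0x1034, "68 00 00 00 00", "pushq $0x0"),
    (0x1039, "f2 e9 e1 ff ff ff", "bnd jmpq 1020"),
    (0x103F, "90", "nop"),
]
PLT_SEC_ENTRY = [
    (0x1060, "f3 0f 1e fa", "endbr64"),
    (0x1064, "f2 ff 25 5d 2f 00 00", "bnd jmpq *0x2f5d(%rip)"),
    (0x106B, "0f 1f 44 00 00", "nopl 0x0(%rax,%rax,1)"),
]


def entry_size(entry):
    """Total number of bytes in one PLT entry (hypothetical helper)."""
    return sum(len(bytes.fromhex(hexbytes)) for _, hexbytes, _ in entry)


def rel32_target(addr, hexbytes):
    """Address reached by adding the trailing signed 32-bit little-endian
    displacement to the address of the next instruction (hypothetical helper).
    For `jmpq rel32` this is the jump target; for `jmpq *disp(%rip)` it is
    the address of the memory slot that holds the target."""
    raw = bytes.fromhex(hexbytes)
    disp = int.from_bytes(raw[-4:], "little", signed=True)
    return addr + len(raw) + disp


print(entry_size(PLT_ENTRY), entry_size(PLT_SEC_ENTRY))  # 16 16
print(hex(rel32_target(0x1039, "f2 e9 e1 ff ff ff")))     # 0x1020
print(hex(rel32_target(0x1064, "f2 ff 25 5d 2f 00 00")))  # 0x3fc8
```

Both entries come out at 16 bytes, the direct jump in .plt lands on 0x1020 (the start of .plt), and the indirect jump in .plt.sec reads its target from the GOT slot at 0x3fc8, matching the objdump annotations.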
> I have several questions regarding the second PLT scheme.
> 1. Should psABI change .splt to .plt.sec?
> The implementation uses .plt.sec for this feature.
> PLT sections do not have a dedicated section type and in practice they
> are usually recognized by the name .plt. Such tools include, but are
> not limited to, disassemblers (objdump, llvm-objdump), assemblers
> (e.g. llvm-mc, which emits warnings for unusual flags), binary
> instrumentation tools, profilers, and debuggers.
> If the implementations pick names different from the ABI, tools have
> to understand both .plt.sec and .splt to be ABI conforming. The
> complexity could have been avoided if implementations and the ABI
> agreed on the same name: .plt.sec

Sounds reasonable.
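
To make the cost of the naming mismatch concrete, here is a minimal sketch of name-based PLT recognition as a tool might do it. The table and the function `classify_plt_section` are hypothetical, and the labels "first-plt" and "second-plt" are illustrative only. The section names themselves are the ones discussed in this thread.

```python
# Hypothetical lookup table: section name -> role of the section.
# A tool that wants to follow both the psABI text and the ld.bfd
# implementation has to list two spellings for the same role.
PLT_SECTION_ROLES = {
    ".plt": "first-plt",       # lazy-binding stubs
    ".plt.sec": "second-plt",  # name used by the implementation
    ".splt": "second-plt",     # name used in the psABI text
}


def classify_plt_section(name):
    """Return the role of a PLT section, or None for any other section."""
    return PLT_SECTION_ROLES.get(name)


print(classify_plt_section(".plt.sec"))  # second-plt
print(classify_plt_section(".splt"))     # second-plt
print(classify_plt_section(".text"))     # None
```

If the ABI and the implementations agreed on one name, the table would need a single entry for the second PLT.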

> I prefer .plt.sec to .splt because the convention is already used in
> several other places to assign fine-grained semantics to sections,
> e.g. .text.startup .text.unlikely
> 2. Merge .plt and .plt.sec
> As I proposed at , since we
> don't emit the bnd prefix (0xf2) for MPX
> (dropped by GCC 9), we can merge .plt and .plt.sec entries as follows:
> 4 endbr64
> 5 jmpq *xxx(%rip) ; jump to the next endbr64 for lazy binding
> 4 endbr64
> 5 pushq           ; relocation index
> 5 jmpq *xxx(%rip) ; jump to .plt
> This PLT entry takes 4+5+4+5+5=23 bytes, and fits in a 24-byte entry
> size if we aim for 8-byte alignment.
> Not having to deal with .plt.sec simplifies implementation of PLT-aware tools.
> (If MPX is resurrected (I am not sure about the likelihood), the bnd
> prefixes before jmpq will take another 2 bytes and the PLT entry will
> no longer fit in a 24-byte entry. We can expand it to 32 bytes then.)
> 3. Necessity of the second PLT
> It was raised in that having
> instruction sequences split into .plt and .plt.sec may improve
> code cache locality. As I understand it, in theory .plt.sec is hot
> while .plt is cold (used only for the first call). That said, we have
> seen no evidence or benchmark results supporting the claim.

It may not show up in your benchmarks.  But improving cache locality
is a good thing for overall system performance.  We need every bit
of performance for CET.

> The other argument is that it provides compatibility with other tools
> that have a hardcoded limit of 16.
> I found that top-of-tree gdb/objdump cannot symbolize 16-byte .plt and
> .plt.sec entries without the bnd prefixes, so even if the entry size
> stays at 16, existing tools have to adopt new rules to symbolize
> PLT entries.
> Thus, we cause trouble for existing tools whether or not we introduce
> the second PLT.
> Given the complexity of the second PLT, not having the second PLT
> might be better.

A single PLT is simpler to implement.  We designed 2 PLTs with performance
in mind.  We implemented them many years ago, starting with MPX.  The design
shouldn't be changed just because it is "hard" to implement.

> BTW, I want to remind readers that the subject of this email contains
> "lazy binding". A 32-byte entry imposes more overhead on libcs without
> lazy binding functionality. For musl, a 5-byte `jmpq` instruction
> suffices, but of course `-fno-plt` may be a better solution to avoid
> dealing with PLT stuff at all.
> P.S. Don't get me wrong. The new security enhancement technology
> attracts me. I've done a few ROP-style CTF pwn challenges in the past
> and can imagine how useful IBT is, but I hope it introduces less
> complexity to toolchains :)

2 PLTs are a small piece of the CET run-time compared with the kernel, GCC
and glibc, partly because ld has a very flexible PLT framework that
accommodates different PLT schemes with a lazy PLT (the first PLT)
and a non-lazy PLT (the second PLT).

