This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
- From: Ilya Enkovich <enkovich dot gnu at gmail dot com>
- To: Ian Lance Taylor <iant at google dot com>
- Cc: Roland McGrath <roland at hack dot frob dot com>, "H.J. Lu" <hjl dot tools at gmail dot com>, GNU C Library <libc-alpha at sourceware dot org>, GCC Development <gcc at gcc dot gnu dot org>, Binutils <binutils at sourceware dot org>, "Girkar, Milind" <milind dot girkar at intel dot com>, "Kreitzer, David L" <david dot l dot kreitzer at intel dot com>
- Date: Thu, 25 Jul 2013 15:08:51 +0400
- Subject: Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
- References: <CAMe9rOp=1v38F_aV-pbv50YOGSEr_ju+byZP1L_G_h4bm5Ad3w at mail dot gmail dot com> <20130724233621 dot DA6942C08C at topped-with-meat dot com> <CAKOQZ8zn2KHayUrdKPOB0ys0Y794c5-t8Zw6hefeB5NGXTXKRw at mail dot gmail dot com>
2013/7/25 Ian Lance Taylor <iant@google.com>:
> On Wed, Jul 24, 2013 at 4:36 PM, Roland McGrath <roland@hack.frob.com> wrote:
>>
>> Will an MPX-using binary require an MPX-supporting dynamic linker to run
>> correctly?
>>
>> * An old dynamic linker won't clobber %bndN directly, so that's not a
>> problem.
>
> These are my answers and likely incorrect.
Hi,
I want to add some comments to your answers.
>
> It will clobber the registers indirectly, though, as soon as it
> executes a branching instruction. The effect will be that calls from
> bnd-checked code to bnd-checked code through the dynamic linker will
> not succeed.
I would not say that the call will fail. Some bounds information will
just be lost. MPX binaries should still work correctly with an old
dynamic linker. The problem is that when you decrease the level of MPX
support (by using a legacy dynamic linker and legacy libraries), you
decrease the quality of bounds violation detection. BTW, if the new PLT
section is used, then the table fixup after the first call will lead to
correct bounds transfer in subsequent calls.
>
> I have not yet seen the changes this will require to the ABI, but I'm
> making the natural assumptions: the first four pointer arguments to a
> function will be associated with a pair of bound registers, and
> similarly for a returned pointer. I don't know what the proposal is
> for struct parameters and return values.
The general idea is to use bound registers for pointers passed in
registers. It does not matter whether the pointer is part of a
structure. BND0 is used to return the bounds for a returned pointer.
Of course, there are some more details (e.g. when more than 4 pointers
are passed in registers or when a vararg call is made).
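To make the convention above concrete, here is a small Python model. It
is only a sketch under assumptions not stated in this message: pointer
arguments are taken in the order of the six x86-64 integer argument
registers, bound registers are handed out in that same order, and a
fifth or later pointer is simply marked as not covered, since those
details are left open here. The function name and the argument encoding
are hypothetical.

```python
# Integer argument registers of the x86-64 System V calling convention.
INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
BOUND_REGS = ["bnd0", "bnd1", "bnd2", "bnd3"]

# Bounds of a returned pointer (in rax) travel in BND0, per the text above.
RETURN_BOUND_REG = "bnd0"


def assign_bound_regs(arg_kinds):
    """Map each pointer passed in a register to a bound register.

    arg_kinds is a list of "ptr" or "int", one entry per integer-class
    argument (hypothetical encoding). Returns {arg_reg: bound_reg or None}.
    ASSUMPTION: bound registers are assigned in argument order; None means
    "more than four pointers, handling not described in this message".
    """
    mapping = {}
    next_bnd = 0
    for reg, kind in zip(INT_ARG_REGS, arg_kinds):
        if kind != "ptr":
            continue  # non-pointer arguments carry no bounds
        if next_bnd < len(BOUND_REGS):
            mapping[reg] = BOUND_REGS[next_bnd]
            next_bnd += 1
        else:
            mapping[reg] = None
    return mapping
```

For example, a call such as f(char *p, int n, char *q) would, under
these assumptions, pair rdi with bnd0 and rdx with bnd1.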
>
>
>> * Does having the bounds registers set have any effect on regular/legacy
>> code, or only when bndc[lun] instructions are used?
>
> As far as I can tell, only when the bndXX instructions are used,
> though I'd be happy to hear otherwise.
As usual, new registers affect the context save/restore instructions.
>
>
>> If it doesn't affect normal instructions, then I don't entirely
>> understand why it would matter to clear %bnd* when entering or leaving
>> legacy code. Is it solely for the case of legacy code returning a
>> pointer value, so that the new code would expect the new ABI wherein
>> %bnd0 has been set to correspond to the pointer returned in %rax?
>
> There is no problem with clearing the bnd registers when calling in or
> out of legacy code. The issue is avoiding clearing the pointers when
> calling from bnd-enabled code to bnd-enabled code.
When legacy code returns a pointer, we need to clear at least BND0 to
avoid wrong bounds for the returned pointer.
We may also have a call sequence mpx code -> legacy code -> mpx code.
In that case we have to clear all bound registers before calling mpx
code from legacy code. Otherwise the nested mpx code gets wrong bounds.
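The mpx -> legacy -> mpx case can be illustrated with a toy Python
model. Assumptions: a bound register is modeled as a (lower, upper)
pair, a cleared register is modeled as INIT = (0, 2**64 - 1), i.e.
bounds that permit any address, and legacy code is modeled as never
writing bound registers itself. All names are hypothetical.

```python
INIT = (0, 2**64 - 1)  # "cleared" bounds: every address passes the check


def new_bound_state():
    """All four bound registers start out cleared."""
    return {f"bnd{i}": INIT for i in range(4)}


def clear_all(bnd):
    """What has to happen before legacy code calls into mpx code."""
    for reg in bnd:
        bnd[reg] = INIT


def in_bounds(bnd, reg, addr):
    """Model of a bndcl/bndcu pair: is addr within the bounds held in reg?"""
    lower, upper = bnd[reg]
    return lower <= addr <= upper


def legacy_calls_mpx(bnd, ptr, clear_first):
    """Legacy code passes ptr to an mpx callee that checks it against bnd0.

    Legacy code never set bnd0 for ptr, so without clearing, the callee
    sees whatever an earlier mpx caller left there.
    """
    if clear_first:
        clear_all(bnd)
    return in_bounds(bnd, "bnd0", ptr)
```

If the outer mpx code left bnd0 = (0x1000, 0x10ff) and the legacy code
then passes an unrelated pointer 0x5000, the nested mpx code reports a
false violation unless the registers were cleared first.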
Thanks,
Ilya
>
>
>> * What's the effect of entering the dynamic linker via "bnd jmp"
>> (i.e. new MPX-using binary with new PLT, old dynamic linker)? The old
>> dynamic linker will leave %bndN et al exactly as they are, until its
>> first unadorned branching instruction implicitly clears them. So the
>> only problem would be if the work _dl_runtime_{resolve,profile} does
>> before its first branch/call were affected by the %bndN state.
>
> "It's not a problem."
>
>> In a related vein, what's the effect of entering some legacy code via
>> "bnd jmp" (i.e. new binary using PLT call into legacy DSO)?
>>
>> * If the state of %bndN et al does not affect legacy code directly, then
>> it's not a problem. The legacy code will eventually use an unadorned
>> branch instruction, and that will implicitly clear %bnd*. (Even if
>> it's a leaf function that's entirely branch-free, its return will
>> count as such an unadorned branch instruction.)
>
> Yes.
>
>> * If that's not the case, ....
>
> It is the case.
>
>> I can't tell if you are proposing that a single object might contain
>> both 16-byte and 32-byte PLT slots next to each other in the same .plt
>> section. That seems like a bad idea. I can think of two things off
>> hand that expect PLT entries to be of uniform size, and there may well
>> be more.
>>
>> * The foo@plt pseudo-symbols that e.g. objdump will display are based on
>> the BFD backend knowing the size of PLT entries. Arguably this ought
>> to look at sh_entsize of .plt instead of using baked-in knowledge, but
>> it doesn't.
>
> This seems fixable. Of course, we could also keep the PLT the same
> length by changing it. The current PLT entries are
>
> jmpq *GOT(sym)
> pushq offset
> jmpq plt0
>
> The linker or dynamic linker initializes *GOT(sym) to point to the
> second instruction in this sequence. So we can keep the PLT at 16
> bytes by simply changing it to jump somewhere else.
>
> bnd jmpq *GOT(sym)
> .skip 9
>
> We have the linker or dynamic linker fill in *GOT(sym) to point to the
> second PLT table. When the dynamic linker is involved, we use another
> DT tag to point to the second PLT. The offsets are consistent: there
> is one entry in each PLT table, so the dynamic linker can compute the
> right value. Then in the second PLT we have the sequence
>
> pushq offset
> bnd jmpq plt0
>
> That gives the dynamic linker the offset that it needs to update
> *GOT(sym) to point to the runtime symbol value. So we get slightly
> worse instruction cache handling the first time a function is called,
> but after that we are the same as before. And PLT entries are the
> same size as always so everything is simpler.
>
> The special DT tag will tell the dynamic linker to apply the special
> processing. No attribute is needed to change behaviour. The issue
> then is: a program linked in this way will not work with an old
> dynamic linker, because the old dynamic linker will not initialize
> GOT(sym) to the right value. That is a problem for any scheme, so I
> think that is OK. But if that is a concern, we could actually handle it
> by generating two PLTs. One conventional PLT, and another as I just
> outlined. The linker branches to the new PLT, and initializes
> GOT(sym) to point to the old PLT. The dynamic linker spots this
> because it recognizes the new DT tags, and cunningly rewrites the GOT
> to point to the new PLT. Cost is an extra jump the first time a
> function is called when using the old dynamic linker.
>
> Ian
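
As a rough illustration of the two-table layout described in the quoted
text, the following Python sketch computes the initial GOT value for
each symbol and simulates lazy binding. It is not binutils or glibc
code. Assumptions: the second table also uses 16-byte entries, entry i
of each table belongs to symbol i, and the base address, the `resolve`
callback and the class name are made up for the example.

```python
PLT_ENTRY_SIZE = 16   # first PLT keeps the existing 16-byte slots
PLT2_ENTRY_SIZE = 16  # ASSUMPTION: second table also uses 16-byte slots


def initial_got_value(plt2_base, index):
    """Address the GOT slot of symbol `index` holds before the first call:
    the matching `pushq offset; bnd jmpq plt0` entry in the second table."""
    return plt2_base + index * PLT2_ENTRY_SIZE


class LazyGot:
    """Toy model of lazy binding through the GOT (hypothetical class)."""

    def __init__(self, plt2_base, nsyms, resolve):
        self.plt2_base = plt2_base
        self.resolve = resolve  # stand-in for the dynamic linker lookup
        self.got = [initial_got_value(plt2_base, i) for i in range(nsyms)]
        self.resolver_calls = 0

    def call(self, index):
        """Follow `bnd jmpq *GOT(sym)`; return the address finally reached."""
        target = self.got[index]
        if target == initial_got_value(self.plt2_base, index):
            # First call: the second PLT pushes the offset and enters plt0;
            # the dynamic linker then rewrites the GOT slot.
            self.resolver_calls += 1
            self.got[index] = self.resolve(index)
            target = self.got[index]
        return target
```

In this model only the first call to a symbol goes through the second
table; later calls jump straight from the 16-byte first-PLT slot to the
resolved address, which matches the cost argument made above.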