This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Relocations to use when eliding plts
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Richard Henderson <rth at redhat dot com>
- Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, Binutils <binutils at sourceware dot org>, libc-alpha <libc-alpha at sourceware dot org>
- Date: Wed, 27 May 2015 17:44:57 -0700
- Subject: Re: Relocations to use when eliding plts
- References: <5566232B dot 4080904 at redhat dot com>
On Wed, May 27, 2015 at 1:03 PM, Richard Henderson <rth@redhat.com> wrote:
> There's one problem with the couple of patches that I've seen go by wrt eliding
> PLTs with -z now, and relaxing inlined PLTs (aka -fno-plt):
>
> They're currently using the same relocations used by data, and thus the linker
> and dynamic linker must ensure that pointer equality is maintained. Which
> results in branch-to-branch-(to-branch) situations.
>
> E.g. the attached test case, in which main has a plt entry for function A in
> a.so, and the function B in b.so calls A.
>
> $ LD_BIND_NOW=1 gdb main
> ...
> (gdb) b b
> Breakpoint 1 at 0x400540
> (gdb) run
> Starting program: /home/rth/x/main
> Breakpoint 1, b () at b.c:2
> 2 void b(void) { a(); }
> (gdb) si
> 2 void b(void) { a(); }
> => 0x7ffff7bf75f4 <b+4>: callq 0x7ffff7bf74e0
> (gdb)
> 0x00007ffff7bf74e0 in ?? () from ./b.so
> => 0x7ffff7bf74e0: jmpq *0x20034a(%rip) # 0x7ffff7df7830
> (gdb)
> 0x0000000000400560 in a@plt ()
> => 0x400560 <a@plt>: jmpq *0x20057a(%rip) # 0x600ae0
> (gdb)
> a () at a.c:2
> 2 void a() { printf("Hello, World!\n"); }
> => 0x7ffff7df95f0 <a>: sub $0x8,%rsp
>
>
> If we use -fno-plt, we eliminate the first callq, but do still have two
> consecutive jmpq's.
>
> It seems to me that we ought to have different relocations when we're only
> going to use a pointer for branching, and when we need a pointer to be
> canonicalized for pointer comparisons.
>
> In the linked image, we already have these: R_X86_64_GLOB_DAT vs
> R_X86_64_JUMP_SLOT. Namely, GLOB_DAT implies "data" (and therefore pointer
> equality), while JUMP_SLOT implies "code" (and therefore we can resolve past
> plt stubs in the main executable).
>
> Which means that HJ's patch of May 16 (git hash 25070364) is less than ideal.
> I do like the smaller PLT entries, but I don't like the fact that it now emits
> GLOB_DAT for the relocations instead of JUMP_SLOT.
ld.so just does whatever ld arranges. I am not sure that changing ld.so
is a good idea. I don't know what kind of optimization we can do when a
function is both called and has its address taken.
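
To make the called-and-address-taken case concrete, here is a minimal C
sketch. It is only an illustration: everything lives in one translation
unit, so it shows the pointer-equality requirement at the source level
rather than actual PLT or GOT behavior. The names address_of_a, call_a and
same_function are hypothetical, and a() stands in for the function from
a.so in the quoted test case, changed to return a value so the call can be
checked.

```c
typedef int (*fn_t)(void);

/* Stand-in for "a" from a.so in the quoted test case.
   Assumption: it returns a value instead of printing. */
int a(void) { return 42; }

/* Address taken: C requires all pointers to "a" to compare equal, so the
   toolchain must hand out one canonical address for the function. */
fn_t address_of_a(void) { return a; }

/* Only called: the result does not depend on which address is branched
   to, so a call could in principle be resolved past a PLT stub. */
int call_a(void) { return a(); }

/* The comparison that must hold regardless of where the address was taken. */
int same_function(fn_t p, fn_t q) { return p == q; }
```

When a function is used both ways, the two uses pull in different
directions: the comparison needs the canonical address, while the call
would be happy with the final target.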
>
> In the relocatable image, when we're talking about -fno-plt, we should think
> about what relocation we'd like to emit. Yes, the existing R_X86_64_GOTPCREL
> works with existing toolchains, and there's something to be said for that.
> However, if we're talking about adding a new relocation for relaxing an
> indirect call via GOTPCREL, then:
>
> If we want -fno-plt to be able to hoist function addresses, then we're going to
> want the address that we load for the call to also not be subject to possible
> jump-to-jump.
>
> Unless we want the linker to do an unreasonable amount of x86 code examination
> in order to determine mov vs call for relaxation, we need two different
> relocations (preferably using the same assembler mnemonic, and thus the correct
> relocation is enforced by the assembler).
>
> On the users/hjl/relax branch (and posted on list somewhere), the new
> relocation is called R_X86_64_RELAX_GOTPCREL. I'm not keen on that "relax"
> name, despite that being exactly what it's for.
>
> I suggest R_X86_64_GOTPLTPCREL_{CALL,LOAD} for the two relocation names. That
> is, the address is in the .got.plt section, it's a pc-relative relocation, and
> it's being used by a call or load (mov) insn.
Since it is used for indirect calls, how about R_X86_64_INBR_GOTPCREL?
I updated the users/hjl/relax branch to convert the relocation in
*foo@GOTPCREL(%rip) from R_X86_64_GOTPCREL to R_X86_64_RELAX_GOTPCREL so
that existing assembly code works automatically with a new binutils.
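
As a rough source-level analogy for the relaxation this relocation is
meant to allow, the C sketch below models a GOT slot as a function
pointer. call_via_slot corresponds loosely to an indirect call through
*foo@GOTPCREL(%rip), and call_direct to the direct call the linker could
substitute when the target turns out to be defined locally. All names here
(foo_local, foo_slot, call_via_slot, call_direct) are hypothetical, and the
real relaxation operates on machine code in the linker, not on C.

```c
/* Hypothetical locally defined target. */
static int foo_local(void) { return 7; }

/* Models the GOT entry that holds the target's address. It is volatile
   so the compiler keeps the load instead of folding it away. */
static int (*volatile foo_slot)(void) = foo_local;

/* Unrelaxed form: load the address from the slot, then branch to it. */
int call_via_slot(void) { return foo_slot(); }

/* Relaxed form: branch straight to the target, with no load. */
int call_direct(void) { return foo_local(); }
```

Both functions reach the same target; the relaxed form only saves the
load from the slot.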
> With those two, we can fairly easily relax call/jmp to direct branches, and mov
> to lea. Yes, LTO can perform the same optimization, but I'll also agree that
> there are many projects for which LTO is both overkill and unworkable.
>
> This does leave open other optimization questions, mostly around weak
> functions. Consider constructs like
>
> if (foo) foo();
>
> Do we, within the compiler, try to CSE GOTPCREL and GOTPLTPCREL, accepting the
> possibility (not certainty) of jump-to-jump but definitely avoiding a separate
> load insn and the latency implied by that?
>
>
> Comments?
>
>
> r~
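
The weak-function construct quoted above can be written out as a small C
sketch. It assumes an ELF target and a GCC-compatible compiler that
supports __attribute__((weak)); the wrapper call_foo_if_present is a
hypothetical name, and no definition of foo is supplied, so the reference
is expected to resolve to a null address.

```c
/* Weak reference: no definition of foo is provided in this sketch. */
extern void foo(void) __attribute__((weak));

int call_foo_if_present(void)
{
    if (foo) {      /* the address is needed for a comparison */
        foo();      /* the address is needed only as a branch target */
        return 1;
    }
    return 0;
}
```

The two uses of foo are exactly the pair in question: the test wants a
value to compare, the call wants a branch target, and the compiler has to
decide whether one load can serve both.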
--
H.J.