This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



Re: Allow pie links to create PLT entries


On Thu, Jan 29, 2015 at 3:13 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Jan 29, 2015 at 2:17 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Thu, Jan 29, 2015 at 12:17 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Thu, Jan 29, 2015 at 12:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Thu, Jan 29, 2015 at 11:48 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Thu, Jan 29, 2015 at 11:00 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>>     Here is a simple example that fails to link with -pie but which
>>>>>> should work just fine without having to use -fPIE.
>>>>>>
>>>>>> foo.cc
>>>>>> ======
>>>>>> int extern_func();
>>>>>> int main()
>>>>>> {
>>>>>>   extern_func();
>>>>>>   return 0;
>>>>>> }
>>>>>>
>>>>>> bar.cc
>>>>>> =====
>>>>>> int extern_func()
>>>>>> {
>>>>>>   return 1;
>>>>>> }
>>>>>>
>>>>>> $ g++ -fPIC -shared bar.cc -o libbar.so
>>>>>> $ g++ foo.cc -lbar -pie
>>>>>>
>>>>>> ld: error: foo.o: requires dynamic R_X86_64_PC32 reloc against
>>>>>> '_Z11extern_funcv' which may overflow at runtime; recompile with -fPIC
>>>>>>
>>>>>> It fails because the linker refuses to create a PLT entry for an
>>>>>> R_X86_64_PC32 reloc even though it is perfectly fine to do so.  Note that I
>>>>>> could have recompiled foo.cc with -fPIE or -fPIC, but I still think
>>>>>> this should be allowed.  With the support for copy relocations in PIE in gold
>>>>>> and with this change, there are fewer cases where we need -fPIE to
>>>>>> get a working PIE link.  This would help us link non-PIE
>>>>>> objects into PIE executables.
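For reference, the x86-64 psABI computes an R_X86_64_PC32 value as S + A - P and stores it in a signed 32-bit field. The sketch below (a hypothetical helper `pc32_fits`, with made-up addresses) illustrates why ld says the reloc "may overflow at runtime": a function in a shared library can be loaded far more than 2 GiB away from the call site, whereas a PLT entry created inside the PIE moves together with the call site, so the displacement to it is fixed at link time.

```cpp
#include <cassert>
#include <cstdint>

// R_X86_64_PC32: value = S + A - P, stored as a signed 32-bit field.
//   S: address of the target symbol (or of its PLT entry)
//   A: addend from the relocation (typically -4 for a call)
//   P: address of the 4-byte field being patched
// Returns true if the value fits without overflow.
bool pc32_fits(uint64_t S, int64_t A, uint64_t P) {
  // Unsigned arithmetic wraps modulo 2^64; the cast recovers the signed distance.
  int64_t value = static_cast<int64_t>(S + static_cast<uint64_t>(A) - P);
  return value >= INT32_MIN && value <= INT32_MAX;
}
```

With the illustrative addresses in mind, a call field at 0x555555555130 cannot reach a library function near 0x7ffff7a2d000 directly, but it can reach a PLT entry a few hundred bytes away in the same executable.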
>>>>>
>>>>> You can't do it for x86 since EBX isn't set up for calling via PLT.
>>>>> For x86-64, there should be little difference between PIE
>>>>> and non-PIE code.
>>>>
>>>> True, but that little difference sometimes causes non-trivial
>>>> performance penalties. With copy relocation support for PIE added
>>>> recently, one big difference causing a non-trivial performance penalty
>>>> went away.  However, there are still differences in the way global
>>>> arrays are accessed.  For instance,
>>>>
>>>> uint32_t a[] = {1, 2, 3, 4};
>>>>
>>>> a[i] can be accessed with one insn without -fPIE, whereas with -fPIE
>>>> we need two: the extra one computes the 64-bit address of a.
>>>>
>>>> Without -fPIE:
>>>>
>>>> movslq   0x1655(%rip),%rax  # 401b80 <i>
>>>> mov    0x401b30(,%rax,4),%esi # a[i]
>
> If you link it with -pie, you will have TEXTREL in executable.
> Do you want relocations in text sections in PIE?
>
>>>> With -fPIE:
>>>>
>>>> movslq 0x16c5(%rip),%rdx        # <i>
>>>> lea    0x166e(%rip),%rax      # <&a>
>>>> mov    (%rax,%rdx,4),%esi   # a[i]
>>>>
>>>> I wish we could use just one insn to do the last two in the -fPIE
>>>> case, using PC-relative addressing like:
>>>> mov  0x166e(%rip, %rdx, 4), %esi
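A minimal sketch, using hypothetical helper names and made-up addresses, of why the two forms behave differently when the image is loaded at a random address. The non-PIE form encodes the absolute address of `a` in a sign-extended 32-bit displacement, so moving the image changes bytes in .text (the TEXTREL problem), and above 2 GiB the value no longer fits at all. The RIP-relative displacement depends only on the distance between `a` and the next instruction, which stays the same when the whole image moves. In 64-bit mode the RIP-relative addressing form cannot be combined with an index register, which is why the -fPIE sequence needs the separate lea.

```cpp
#include <cassert>
#include <cstdint>

// True if v fits in a sign-extended 32-bit displacement field.
bool fits_simm32(int64_t v) { return v >= INT32_MIN && v <= INT32_MAX; }

// Non-PIE `mov a(,%rax,4),%esi`: the displacement is a's absolute address.
int64_t absolute_disp(uint64_t a_addr) {
  return static_cast<int64_t>(a_addr);
}

// PIE `lea a(%rip),%rax`: the displacement is a minus the address of the
// next instruction (the value of RIP when the instruction executes).
int64_t rip_relative_disp(uint64_t a_addr, uint64_t next_insn_addr) {
  return static_cast<int64_t>(a_addr - next_insn_addr);
}
```

Shifting both `a` and the instruction by the same load offset leaves `rip_relative_disp` unchanged, while `absolute_disp` changes and, for a typical high PIE load address, stops fitting in 32 bits.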
>>>
>>> Can you improve GCC codegen for this?
>>
>> I didn't find a similar instruction that I could use.  Is there one?
>>
>>> I implemented an
>>> optimization in ld to convert
>>>
>>>    mov foo@GOTPCREL(%rip), %reg
>>>    to
>>>    lea foo(%rip), %reg
>>>
>>> for the locally defined symbol, foo.  It improves PIE performance
>>> by as much as 10%.  You may want to implement it in gold.  See
>>> elf_x86_64_convert_mov_to_lea for details.
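A minimal sketch of the idea, not a transcription of elf_x86_64_convert_mov_to_lea: the `Reloc` type and the `convert_mov_to_lea` helper are hypothetical. In `mov foo@GOTPCREL(%rip), %reg` the opcode byte 0x8b sits two bytes before the 32-bit displacement (opcode, ModRM, disp32). When foo is defined locally, switching the opcode to 0x8d (lea) and resolving the displacement against foo itself, instead of its GOT slot, removes the load from the GOT. The real linker checks more conditions than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class RelocType { GOTPCREL, PC32 };

// Hypothetical relocation record: `offset` is where the 4-byte
// RIP-relative displacement starts inside `code`.
struct Reloc {
  std::size_t offset;
  RelocType type;
};

// Rewrites `mov foo@GOTPCREL(%rip), %reg` (opcode 0x8b, ModRM, disp32) into
// `lea foo(%rip), %reg` (opcode 0x8d) when foo is defined locally.
// Returns true if the instruction was changed.
bool convert_mov_to_lea(std::vector<uint8_t>& code, Reloc& r,
                        bool symbol_is_local) {
  if (r.type != RelocType::GOTPCREL || !symbol_is_local) return false;
  if (r.offset < 2 || r.offset + 4 > code.size()) return false;
  uint8_t& opcode = code[r.offset - 2];  // opcode, then ModRM, then disp32
  if (opcode != 0x8b) return false;      // only handle the mov load form
  opcode = 0x8d;                         // lea computes the address directly
  r.type = RelocType::PC32;              // displacement now targets foo, not its GOT slot
  return true;
}
```

For example, the bytes 48 8b 05 <disp32> (`mov foo@GOTPCREL(%rip), %rax`) become 48 8d 05 <disp32> (`lea foo(%rip), %rax`).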
>>
>> Wow, this is cool! But with copy relocation support for PIE, I think
>> this should be fixed since the compiler can safely assume that the
>> global is defined in the executable no matter what.  Do you have an
>> example where foo@GOTPCREL is still used for globals?
>>
>> foo.cc
>> ---------
>> #include <cstdio>
>> extern int a;
>> int main()
>> {
>>   printf("%p", (void *) &a);
>> }
>>
>> Before copy relocation support for PIE was checked into GCC:
>>
>> foo.s
>> ------
>>
>> ....
>> movq a@GOTPCREL(%rip), %rax
>> .....
>>
>> and after copy relocation support:
>>
>> foo.s
>> ------
>>
>> .......
>> leaq a(%rip), %rsi
>> ......
>>
>> Did I miss something?
>>
>>
>
> If you don't have GOTPCREL relocations against locally
> defined symbols, this optimization won't apply.

Here is the same libstdc++.so.6.0.21, built from today's GCC 5 on
Linux/x86-64 and linked with each linker.  With ld.bfd:

[hjl@gnu-6 src]$ readelf -r /tmp/libstdc++.so.6.0.21 |wc -l
4659
[hjl@gnu-6 src]$

With ld.gold:

[hjl@gnu-6 src]$ readelf -r .libs/libstdc++.so.6.0.21 |wc -l
5516
[hjl@gnu-6 src]$
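Note that `wc -l` counts every line that `readelf -r` prints, including each relocation section's title and column-header lines, so these are approximate relocation counts. Taking them at face value, the gold-linked library has roughly this many more lines:

```latex
5516 - 4659 = 857, \qquad \frac{857}{4659} \approx 0.184 \quad (\text{about } 18\%\ \text{more})
```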

ld.bfd has another optimization:

commit dd7e64d45b317128f5fe813a8da0b13b4ad046ae
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Nov 25 05:05:39 2014 -0800

    Optimize out i386/x86-64 JUMP_SLOT relocation

    When there are both PLT and GOT references to the same function symbol,
    the linker will create a GOTPLT slot for the PLT entry and a GOT slot for the GOT
    reference.  A run-time JUMP_SLOT relocation is created to update the
    GOTPLT slot and a run-time GLOB_DAT relocation is created to update the
    GOT slot.  Both JUMP_SLOT and GLOB_DAT relocations will apply the same
    symbol value to GOTPLT and GOT slots, respectively, at run-time.

    This optimization combines GOTPLT and GOT slots into a single GOT slot
    and removes the run-time JUMP_SLOT relocation.  It replaces the regular
    PLT entry:

      indirect jump  [GOTPLT slot]
      push     relocation index
      jump     PLT0

    with a GOT PLT entry with an indirect jump via the GOT slot:

      indirect jump  [GOT slot]
      nop

    and resolves PLT references to the GOT PLT entry.

    We must avoid this optimization if pointer equality is needed, since
    we don't clear the symbol value in this case and the dynamic linker won't
    update the GOT slot.  Otherwise, the resulting binary will get into an
    infinite loop at run-time.
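The decision described in the commit message can be modeled as a toy count of dynamic relocations per function symbol. This is a hypothetical sketch (`FuncRefs`, `dynamic_relocs` and the flag names are illustrative, not ld.bfd's data structures): when a symbol has both PLT and GOT references and pointer equality is not needed, the PLT entry can jump through the GOT slot, so the JUMP_SLOT relocation disappears while the GLOB_DAT relocation remains.

```cpp
#include <cassert>

// Hypothetical summary of how one function symbol is referenced.
struct FuncRefs {
  bool has_plt_ref;              // called through the PLT
  bool has_got_ref;              // address loaded through the GOT
  bool pointer_equality_needed;  // the case in which the optimization is unsafe
};

struct DynRelocs {
  int jump_slot;  // R_X86_64_JUMP_SLOT relocations (GOTPLT slot)
  int glob_dat;   // R_X86_64_GLOB_DAT relocations (GOT slot)
};

// Counts the run-time relocations needed for one symbol, with or without
// combining the GOTPLT and GOT slots into a single GOT slot.
DynRelocs dynamic_relocs(const FuncRefs& f, bool combine_got_plt) {
  DynRelocs out{0, 0};
  if (f.has_got_ref) out.glob_dat = 1;
  if (f.has_plt_ref) {
    bool can_share_got_slot =
        combine_got_plt && f.has_got_ref && !f.pointer_equality_needed;
    // A regular PLT entry needs its own GOTPLT slot, updated via JUMP_SLOT.
    if (!can_share_got_slot) out.jump_slot = 1;
  }
  return out;
}
```

A symbol referenced only through the PLT keeps its JUMP_SLOT relocation, since there is no GOT slot to share.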

You may want to implement it in gold.

-- 
H.J.

