The following testcase fails with gold $ cat test.s .globl _start _start: callq vcall $ cat test2.s .global vcall .type vcall,@function vcall: nop $ ld.gold test2.o -o test2.so -shared $ ld.gold test.o test2.so -o t -pie ld.gold: error: test.o: requires dynamic R_X86_64_PC32 reloc against 'vcall' which may overflow at runtime; recompile with -fPIC But there is no reason why trick of defining a symbol pointing to the plt entry would not work with a PIE. Starting with c5bb8910e80c6cd80c63541f86471c18375c8198 bfd uses TEXTREL, but that seems less desirable than creating a symbol pointing to the plt entry as is done in the non -pie case.
> But there is no reason why trick of defining a symbol pointing to the plt entry > would not work with a PIE. Does ld.so process SHN_UNDEF symbols with non-zero value properly in a PIE?
Created attachment 10786 [details] testcase
It seems to work. The attached hello world has Type: DYN (Shared object file) 6: 00000000000011e0 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 and it works.
This would be fine if we knew for a fact that the relocation is on a call instruction. The problem is that R_X86_64_PC32 isn't always a call instruction.
(In reply to Cary Coutant from comment #4) > This would be fine if we knew for a fact that the relocation is on a call > instruction. The problem is that R_X86_64_PC32 isn't always a call > instruction. It still works. The library will see the address of the got entry. I will upload a testcase.
Created attachment 10787 [details] testcase that prints the address of a function
(In reply to Rafael Ávila de Espíndola from comment #5) > (In reply to Cary Coutant from comment #4) > > This would be fine if we knew for a fact that the relocation is on a call > > instruction. The problem is that R_X86_64_PC32 isn't always a call > > instruction. > > It still works. The library will see the address of the got entry. > > I will upload a testcase. s/got/plt/
(In reply to Rafael Ávila de Espíndola from comment #5) > (In reply to Cary Coutant from comment #4) > > This would be fine if we knew for a fact that the relocation is on a call > > instruction. The problem is that R_X86_64_PC32 isn't always a call > > instruction. > > It still works. The library will see the address of the got entry. Which won't necessarily be the "official" PLT entry. Function point comparison may break.
(In reply to Cary Coutant from comment #8) > (In reply to Rafael Ávila de Espíndola from comment #5) > > (In reply to Cary Coutant from comment #4) > > > This would be fine if we knew for a fact that the relocation is on a call > > > instruction. The problem is that R_X86_64_PC32 isn't always a call > > > instruction. > > > > It still works. The library will see the address of the got entry. > > Which won't necessarily be the "official" PLT entry. Function point > comparison may break. Not sure what you mean by "official". The testcase I uploaded computes address of an external function foo with movslq bar(%rip), %rax leaq bar(%rip), %rdi addq %rax, %rdi ... .data .p2align 3 bar: .long foo - . and in a shared library foo is defined as void foo(void *bar) { printf("%p %p\n", bar, foo); } I always get the same value printed twice.
The "official" or "canonical" PLT entry is the one that serves as the address of the function throughout the program. If you make the function protected in your shared library, I think the PIC sequence to get its address will always load the address of the function itself, while the PC32 relocation in the main program will have no alternative but to load the address of its own PLT entry. (Gnu ld will refuse to build a shared library with a PC32 ref to a protected symbol, but gold will build it.) That's probably not the only way to break function pointer comparison. I'm just really leery of treating this relocation as PIC-/PIE-compatible.
(In reply to Cary Coutant from comment #10) > The "official" or "canonical" PLT entry is the one that serves as the > address of the function throughout the program. OK, so we were talking about the same plt entry. > If you make the function protected in your shared library, I think the PIC > sequence to get its address will always load the address of the function > itself, while the PC32 relocation in the main program will have no > alternative but to load the address of its own PLT entry. (Gnu ld will > refuse to build a shared library with a PC32 ref to a protected symbol, but > gold will build it.) That is what I would expect. Using a PLT entry in the main executable as the canonical PLT entry should work in the same cases where a copy relocation works: The use is in the main program and the definition has default visibility.
Since there is no need to prepare for PLT branch on x86-64, we can treat PC32 relocation with branch as PLT32 relocation. I posted a patch for ld: https://sourceware.org/ml/binutils/2018-02/msg00065.html
(In reply to H.J. Lu from comment #12) > Since there is no need to prepare for PLT branch on x86-64, we can treat > PC32 relocation with branch as PLT32 relocation. I posted a patch for ld: > > https://sourceware.org/ml/binutils/2018-02/msg00065.html What do you mean by "prepare for PLT branch"? It looks like your patch examines the opcode(s) preceding the relocated value to determine whether it's a branch or not. How is this safe? It seems fragile to me. In addition, it requires the linker to read section contents while scanning relocations, which is not normally necessary, and will slow the linker down significantly -- we normally don't need to read most sections' contents until it's time to apply relocations. If your argument is that you can always treat PC32 relocations on branches as if they were PLT32 relocations, why not just have the compiler emit PLT32 relocations in the first place?
(In reply to Cary Coutant from comment #13) > (In reply to H.J. Lu from comment #12) > > Since there is no need to prepare for PLT branch on x86-64, we can treat > > PC32 relocation with branch as PLT32 relocation. I posted a patch for ld: > > > > https://sourceware.org/ml/binutils/2018-02/msg00065.html > > What do you mean by "prepare for PLT branch"? On i386, there are 2 types of PLTs, PIC and non-PIC. PIE and shared objects must use PIC PLT. To use PIC PLT, you need to load _GLOBAL_OFFSET_TABLE_ into EBX first. There is no need for that on x86-64. > It looks like your patch examines the opcode(s) preceding the relocated > value to determine whether it's a branch or not. How is this safe? It seems > fragile to me. In addition, it requires the linker to read section contents > while scanning relocations, which is not normally necessary, and will slow > the linker down significantly -- we normally don't need to read most > sections' contents until it's time to apply relocations. > > If your argument is that you can always treat PC32 relocations on branches > as if they were PLT32 relocations, why not just have the compiler emit PLT32 > relocations in the first place? Yes, this patch does that: https://sourceware.org/ml/binutils/2018-02/msg00074.html
On i386, there are 2 types of PLTs, PIC and non-PIC. PIE and shared objects must use PIC PLT. To use PIC PLT, you need to load _GLOBAL_OFFSET_TABLE_ into EBX first. There is no need for that on x86-64 since x86-64 uses PC-relative PLT. For 32-bit PC-relative branches, we can generate PLT32 relocation, instead of PC32 relocation, which can also be used as a marker for 32-bit PC-relative branches. Linker can always reduce PLT32 relocation to PC32 if function is defined locally. Local functions should use PC32 relocation.
> > If your argument is that you can always treat PC32 relocations on branches > > as if they were PLT32 relocations, why not just have the compiler emit PLT32 > > relocations in the first place? > > Yes, this patch does that: > > https://sourceware.org/ml/binutils/2018-02/msg00074.html If I understand it correctly, the idea is to generate a R_X86_64_PLT32 for "call foo" but a R_86_64_PC32 for ".long foo - .". If that is indeed what the patch is doing, I love it :-)
(In reply to Rafael Ávila de Espíndola from comment #16) > > > If your argument is that you can always treat PC32 relocations on branches > > > as if they were PLT32 relocations, why not just have the compiler emit PLT32 > > > relocations in the first place? > > > > Yes, this patch does that: > > > > https://sourceware.org/ml/binutils/2018-02/msg00074.html > > If I understand it correctly, the idea is to generate a R_X86_64_PLT32 for > "call foo" but a R_86_64_PC32 for ".long foo - .". If that is indeed what > the patch is doing, I love it :-) Yes, that is what patch does. My current patch is at https://github.com/hjl-tools/binutils-gdb/commit/80de23823c97db558436c9ad13d6728f43329706
The master branch has been updated by H.J. Lu <hjl@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=bd7ab16b4537788ad53521c45469a1bdae84ad4a commit bd7ab16b4537788ad53521c45469a1bdae84ad4a Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Feb 13 07:34:22 2018 -0800 x86-64: Generate branch with PLT32 relocation Since there is no need to prepare for PLT branch on x86-64, generate R_X86_64_PLT32, instead of R_X86_64_PC32, if possible, which can be used as a marker for 32-bit PC-relative branches. To compile Linux kernel, this patch: From: "H.J. Lu" <hjl.tools@gmail.com> Subject: [PATCH] x86: Treat R_X86_64_PLT32 as R_X86_64_PC32 On i386, there are 2 types of PLTs, PIC and non-PIC. PIE and shared objects must use PIC PLT. To use PIC PLT, you need to load _GLOBAL_OFFSET_TABLE_ into EBX first. There is no need for that on x86-64 since x86-64 uses PC-relative PLT. On x86-64, for 32-bit PC-relative branches, we can generate PLT32 relocation, instead of PC32 relocation, which can also be used as a marker for 32-bit PC-relative branches. Linker can always reduce PLT32 relocation to PC32 if function is defined locally. Local functions should use PC32 relocation. As far as Linux kernel is concerned, R_X86_64_PLT32 can be treated the same as R_X86_64_PC32 since Linux kernel doesn't use PLT. is needed. It is available on hjl/plt32/master branch at https://github.com/hjl-tools/linux bfd/ PR gas/22791 * elf64-x86-64.c (is_32bit_relative_branch): Removed. (elf_x86_64_relocate_section): Check PIC relocations in PIE. Remove is_32bit_relative_branch usage. Disallow PC32 reloc against protected function in shared object. gas/ PR gas/22791 * config/tc-i386.c (need_plt32_p): New function. (output_jump): Generate BFD_RELOC_X86_64_PLT32 if possible. (md_estimate_size_before_relax): Likewise. * testsuite/gas/i386/reloc64.d: Updated. * testsuite/gas/i386/x86-64-jump.d: Likewise. * testsuite/gas/i386/x86-64-mpx-branch-1.d: Likewise. * testsuite/gas/i386/x86-64-mpx-branch-2.d: Likewise. * testsuite/gas/i386/x86-64-relax-2.d: Likewise. * testsuite/gas/i386/x86-64-relax-3.d: Likewise. * testsuite/gas/i386/ilp32/reloc64.d: Likewise. * testsuite/gas/i386/ilp32/x86-64-branch.d: Likewise. ld/ PR gas/22791 * testsuite/ld-x86-64/mpx1c.rd: Updated. * testsuite/ld-x86-64/pr22791-1.err: New file. * testsuite/ld-x86-64/pr22791-1a.c: Likewise. * testsuite/ld-x86-64/pr22791-1b.s: Likewise. * testsuite/ld-x86-64/pr22791-2.rd: Likewise. * testsuite/ld-x86-64/pr22791-2a.s: Likewise. * testsuite/ld-x86-64/pr22791-2b.c: Likewise. * testsuite/ld-x86-64/pr22791-2c.s: Likewise. * testsuite/ld-x86-64/x86-64.exp: Run PR ld/22791 tests.
Fixed for 2.31.
Note that while the assembler change is a nice improvement, the original issue still exists. In the testcase that I attached before, the call to foo is now assembled to R_X86_64_PLT32, but .long foo - . Still produces a R_X86_64_PC32 as it has to. Since foo has default visibility, the linker could create a canonical plt entry to be the runtime address of foo.
(In reply to Rafael Ávila de Espíndola from comment #20) > Note that while the assembler change is a nice improvement, the original > issue still exists. > > In the testcase that I attached before, the call to foo is now assembled to > R_X86_64_PLT32, but > > .long foo - . > > Still produces a R_X86_64_PC32 as it has to. > > Since foo has default visibility, the linker could create a canonical plt > entry to be the runtime address of foo. Please open a separate bug for this.
(In reply to H.J. Lu from comment #21) > (In reply to Rafael Ávila de Espíndola from comment #20) > > Note that while the assembler change is a nice improvement, the original > > issue still exists. > > > > In the testcase that I attached before, the call to foo is now assembled to > > R_X86_64_PLT32, but > > > > .long foo - . > > > > Still produces a R_X86_64_PC32 as it has to. > > > > Since foo has default visibility, the linker could create a canonical plt > > entry to be the runtime address of foo. > > Please open a separate bug for this. https://sourceware.org/bugzilla/show_bug.cgi?id=22842
The master branch has been updated by Rainer Orth <ro@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=a5def729be2596496aec225e843903b25c672e01 commit a5def729be2596496aec225e843903b25c672e01 Author: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> Date: Wed Apr 10 09:48:43 2019 +0200 Disable R_X86_64_PLT32 generation as branch marker on Solaris/x86 The fix H.J. implemented for PR gas/22791 in the thread starting at [PATCH] x86-64: Treat PC32 relocation with branch as PLT32 https://sourceware.org/ml/binutils/2018-02/msg00065.html is causing problems on Solaris/x86. The native linker is strongly preferred there, and there's no intention of implementing the linker optimization he plans there. Besides, the kernel runtime linker, otherwise has no need to deal with that reloc at all, and instead of adding (possibly even more) workarounds with no benefit, it seems appropriate to disable the R_X86_64_PLT32 generation as branch marker on Solaris/x86 in the first place. The patch itself is trivial, the only complication is adapting the testsuite. Since I've found no way to have conditional sections in the .d files, I've instead used the solution already found elsewhere of having separate .d files for the affected tests in an i386/solaris subdirectory and skipping the original ones. Tested on amd64-pc-solaris2.11 and x86_64-pc-linux-gnu without regressions. * config/tc-i386.c (need_plt32_p) [TE_SOLARIS]: Return FALSE. * testsuite/gas/i386/solaris/solaris.exp: New driver. * testsuite/gas/i386/solaris/reloc64.d, testsuite/gas/i386/solaris/x86-64-jump.d, testsuite/gas/i386/solaris/x86-64-mpx-branch-1.d, testsuite/gas/i386/solaris/x86-64-mpx-branch-2.d, testsuite/gas/i386/solaris/x86-64-nop-3.d, testsuite/gas/i386/solaris/x86-64-nop-4.d, testsuite/gas/i386/solaris/x86-64-nop-5.d, testsuite/gas/i386/solaris/x86-64-relax-2.d, testsuite/gas/i386/solaris/x86-64-relax-3.d: New tests. * testsuite/gas/i386/reloc64.d, testsuite/gas/i386/x86-64-jump.d, testsuite/gas/i386/x86-64-mpx-branch-1.d, testsuite/gas/i386/x86-64-mpx-branch-2.d, testsuite/gas/i386/x86-64-nop-3.d, testsuite/gas/i386/x86-64-nop-4.d, testsuite/gas/i386/x86-64-nop-5.d, testsuite/gas/i386/x86-64-relax-2.d, testsuite/gas/i386/x86-64-relax-3.d: Skip on *-*-solaris*.