Bug 22791 - PLT32 should be used for 32-bit PC-relative branches
Summary: PLT32 should be used for 32-bit PC-relative branches
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: gas (show other bugs)
Version: 2.31
: P2 normal
Target Milestone: 2.31
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: 24151
  Show dependency treegraph
 
Reported: 2018-02-05 23:01 UTC by Rafael Ávila de Espíndola
Modified: 2025-04-18 04:55 UTC (History)
5 users (show)

See Also:
Host:
Target: x86-64
Build:
Last reconfirmed: 2018-02-07 00:00:00


Attachments
testcase (2.20 KB, application/x-sharedlib)
2018-02-06 00:21 UTC, Rafael Ávila de Espíndola
Details
testcase that prints the address of a function (3.85 KB, application/x-xz)
2018-02-06 01:04 UTC, Rafael Ávila de Espíndola
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Rafael Ávila de Espíndola 2018-02-05 23:01:17 UTC
The following testcase fails with gold

$ cat test.s
        .globl  _start
_start:
        callq   vcall
$ cat test2.s
        .global vcall
        .type vcall,@function
vcall:
        nop
$ ld.gold test2.o -o test2.so -shared
$ ld.gold test.o test2.so -o t -pie
ld.gold: error: test.o: requires dynamic R_X86_64_PC32 reloc against 'vcall' which may overflow at runtime; recompile with -fPIC

But there is no reason why trick of defining a symbol pointing to the plt entry would not work with a PIE.

Starting with c5bb8910e80c6cd80c63541f86471c18375c8198 bfd uses TEXTREL, but that seems less desirable than creating a symbol pointing to the plt entry as is done in the non -pie case.
Comment 1 Alan Modra 2018-02-05 23:24:42 UTC
> But there is no reason why trick of defining a symbol pointing to the plt entry > would not work with a PIE.

Does ld.so process SHN_UNDEF symbols with non-zero value properly in a PIE?
Comment 2 Rafael Ávila de Espíndola 2018-02-06 00:21:32 UTC
Created attachment 10786 [details]
testcase
Comment 3 Rafael Ávila de Espíndola 2018-02-06 00:22:32 UTC
It seems to work. The attached hello world has

Type:                              DYN (Shared object file)

 6: 00000000000011e0     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.2.5

and it works.
Comment 4 Cary Coutant 2018-02-06 00:32:00 UTC
This would be fine if we knew for a fact that the relocation is on a call instruction. The problem is that R_X86_64_PC32 isn't always a call instruction.
Comment 5 Rafael Ávila de Espíndola 2018-02-06 01:04:16 UTC
(In reply to Cary Coutant from comment #4)
> This would be fine if we knew for a fact that the relocation is on a call
> instruction. The problem is that R_X86_64_PC32 isn't always a call
> instruction.

It still works. The library will see the address of the got entry.

I will upload a testcase.
Comment 6 Rafael Ávila de Espíndola 2018-02-06 01:04:49 UTC
Created attachment 10787 [details]
testcase that prints the address of a function
Comment 7 Rafael Ávila de Espíndola 2018-02-06 01:07:38 UTC
(In reply to Rafael Ávila de Espíndola from comment #5)
> (In reply to Cary Coutant from comment #4)
> > This would be fine if we knew for a fact that the relocation is on a call
> > instruction. The problem is that R_X86_64_PC32 isn't always a call
> > instruction.
> 
> It still works. The library will see the address of the got entry.
> 
> I will upload a testcase.

s/got/plt/
Comment 8 Cary Coutant 2018-02-06 01:29:27 UTC
(In reply to Rafael Ávila de Espíndola from comment #5)
> (In reply to Cary Coutant from comment #4)
> > This would be fine if we knew for a fact that the relocation is on a call
> > instruction. The problem is that R_X86_64_PC32 isn't always a call
> > instruction.
> 
> It still works. The library will see the address of the got entry.

Which won't necessarily be the "official" PLT entry. Function point comparison may break.
Comment 9 Rafael Ávila de Espíndola 2018-02-06 01:49:33 UTC
(In reply to Cary Coutant from comment #8)
> (In reply to Rafael Ávila de Espíndola from comment #5)
> > (In reply to Cary Coutant from comment #4)
> > > This would be fine if we knew for a fact that the relocation is on a call
> > > instruction. The problem is that R_X86_64_PC32 isn't always a call
> > > instruction.
> > 
> > It still works. The library will see the address of the got entry.
> 
> Which won't necessarily be the "official" PLT entry. Function point
> comparison may break.

Not sure what you mean by "official". The testcase I uploaded computes address of an external function foo with

        movslq  bar(%rip), %rax
        leaq    bar(%rip), %rdi
        addq    %rax, %rdi
...
        .data
        .p2align        3
bar:
        .long   foo - .

and in a shared library foo is defined as

void foo(void *bar) {
  printf("%p %p\n", bar, foo);
}

I always get the same value printed twice.
Comment 10 Cary Coutant 2018-02-06 03:38:40 UTC
The "official" or "canonical" PLT entry is the one that serves as the address of the function throughout the program.

If you make the function protected in your shared library, I think the PIC sequence to get its address will always load the address of the function itself, while the PC32 relocation in the main program will have no alternative but to load the address of its own PLT entry. (Gnu ld will refuse to build a shared library with a PC32 ref to a protected symbol, but gold will build it.)

That's probably not the only way to break function pointer comparison. I'm just really leery of treating this relocation as PIC-/PIE-compatible.
Comment 11 Rafael Ávila de Espíndola 2018-02-06 04:28:47 UTC
(In reply to Cary Coutant from comment #10)
> The "official" or "canonical" PLT entry is the one that serves as the
> address of the function throughout the program.

OK, so we were talking about the same plt entry.

> If you make the function protected in your shared library, I think the PIC
> sequence to get its address will always load the address of the function
> itself, while the PC32 relocation in the main program will have no
> alternative but to load the address of its own PLT entry. (Gnu ld will
> refuse to build a shared library with a PC32 ref to a protected symbol, but
> gold will build it.)

That is what I would expect. Using a PLT entry in the main executable as the canonical PLT entry should work in the same cases where a copy relocation works: The use is in the main program and the definition has default visibility.
Comment 12 H.J. Lu 2018-02-06 13:36:48 UTC
Since there is no need to prepare for PLT branch on x86-64, we can treat
PC32 relocation with branch as PLT32 relocation.  I posted a patch for ld:

https://sourceware.org/ml/binutils/2018-02/msg00065.html
Comment 13 Cary Coutant 2018-02-06 18:08:27 UTC
(In reply to H.J. Lu from comment #12)
> Since there is no need to prepare for PLT branch on x86-64, we can treat
> PC32 relocation with branch as PLT32 relocation.  I posted a patch for ld:
> 
> https://sourceware.org/ml/binutils/2018-02/msg00065.html

What do you mean by "prepare for PLT branch"?

It looks like your patch examines the opcode(s) preceding the relocated value to determine whether it's a branch or not. How is this safe? It seems fragile to me. In addition, it requires the linker to read section contents while scanning relocations, which is not normally necessary, and will slow the linker down significantly -- we normally don't need to read most sections' contents until it's time to apply relocations.

If your argument is that you can always treat PC32 relocations on branches as if they were PLT32 relocations, why not just have the compiler emit PLT32 relocations in the first place?
Comment 14 H.J. Lu 2018-02-06 18:50:50 UTC
(In reply to Cary Coutant from comment #13)
> (In reply to H.J. Lu from comment #12)
> > Since there is no need to prepare for PLT branch on x86-64, we can treat
> > PC32 relocation with branch as PLT32 relocation.  I posted a patch for ld:
> > 
> > https://sourceware.org/ml/binutils/2018-02/msg00065.html
> 
> What do you mean by "prepare for PLT branch"?

On i386, there are 2 types of PLTs, PIC and non-PIC.  PIE and shared objects
must use PIC PLT.  To use PIC PLT, you need to load _GLOBAL_OFFSET_TABLE_
into EBX first.  There is no need for that on x86-64.

> It looks like your patch examines the opcode(s) preceding the relocated
> value to determine whether it's a branch or not. How is this safe? It seems
> fragile to me. In addition, it requires the linker to read section contents
> while scanning relocations, which is not normally necessary, and will slow
> the linker down significantly -- we normally don't need to read most
> sections' contents until it's time to apply relocations.
> 
> If your argument is that you can always treat PC32 relocations on branches
> as if they were PLT32 relocations, why not just have the compiler emit PLT32
> relocations in the first place?

Yes, this patch does that:

https://sourceware.org/ml/binutils/2018-02/msg00074.html
Comment 15 H.J. Lu 2018-02-07 14:16:05 UTC
On i386, there are 2 types of PLTs, PIC and non-PIC.  PIE and shared objects 
must use PIC PLT.  To use PIC PLT, you need to load _GLOBAL_OFFSET_TABLE_ 
into EBX first.  There is no need for that on x86-64 since x86-64 uses 
PC-relative PLT. 

For 32-bit PC-relative branches, we can generate PLT32 relocation, instead of 
PC32 relocation, which can also be used as a marker for 32-bit PC-relative 
branches.   Linker can always reduce PLT32 relocation to PC32 if function is 
defined locally.   Local functions should use PC32 relocation.
Comment 16 Rafael Ávila de Espíndola 2018-02-07 21:42:43 UTC
> > If your argument is that you can always treat PC32 relocations on branches
> > as if they were PLT32 relocations, why not just have the compiler emit PLT32
> > relocations in the first place?
> 
> Yes, this patch does that:
> 
> https://sourceware.org/ml/binutils/2018-02/msg00074.html

If I understand it correctly, the idea is to generate a R_X86_64_PLT32 for "call foo" but a R_86_64_PC32 for ".long foo - .". If that is indeed what the patch is doing, I love it :-)
Comment 17 H.J. Lu 2018-02-07 22:11:36 UTC
(In reply to Rafael Ávila de Espíndola from comment #16)
> > > If your argument is that you can always treat PC32 relocations on branches
> > > as if they were PLT32 relocations, why not just have the compiler emit PLT32
> > > relocations in the first place?
> > 
> > Yes, this patch does that:
> > 
> > https://sourceware.org/ml/binutils/2018-02/msg00074.html
> 
> If I understand it correctly, the idea is to generate a R_X86_64_PLT32 for
> "call foo" but a R_86_64_PC32 for ".long foo - .". If that is indeed what
> the patch is doing, I love it :-)

Yes, that is what patch does.  My current patch is at

https://github.com/hjl-tools/binutils-gdb/commit/80de23823c97db558436c9ad13d6728f43329706
Comment 18 Sourceware Commits 2018-02-13 15:35:39 UTC
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=bd7ab16b4537788ad53521c45469a1bdae84ad4a

commit bd7ab16b4537788ad53521c45469a1bdae84ad4a
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Feb 13 07:34:22 2018 -0800

    x86-64: Generate branch with PLT32 relocation
    
    Since there is no need to prepare for PLT branch on x86-64, generate
    R_X86_64_PLT32, instead of R_X86_64_PC32, if possible, which can be
    used as a marker for 32-bit PC-relative branches.
    
    To compile Linux kernel, this patch:
    
    From: "H.J. Lu" <hjl.tools@gmail.com>
    Subject: [PATCH] x86: Treat R_X86_64_PLT32 as R_X86_64_PC32
    
    On i386, there are 2 types of PLTs, PIC and non-PIC.  PIE and shared
    objects must use PIC PLT.  To use PIC PLT, you need to load
    _GLOBAL_OFFSET_TABLE_ into EBX first.  There is no need for that on
    x86-64 since x86-64 uses PC-relative PLT.
    
    On x86-64, for 32-bit PC-relative branches, we can generate PLT32
    relocation, instead of PC32 relocation, which can also be used as
    a marker for 32-bit PC-relative branches.  Linker can always reduce
    PLT32 relocation to PC32 if function is defined locally.   Local
    functions should use PC32 relocation.  As far as Linux kernel is
    concerned, R_X86_64_PLT32 can be treated the same as R_X86_64_PC32
    since Linux kernel doesn't use PLT.
    
    is needed.  It is available on hjl/plt32/master branch at
    
    https://github.com/hjl-tools/linux
    
    bfd/
    
    	PR gas/22791
    	* elf64-x86-64.c (is_32bit_relative_branch): Removed.
    	(elf_x86_64_relocate_section): Check PIC relocations in PIE.
    	Remove is_32bit_relative_branch usage.  Disallow PC32 reloc
    	against protected function in shared object.
    
    gas/
    
    	PR gas/22791
    	* config/tc-i386.c (need_plt32_p): New function.
    	(output_jump): Generate BFD_RELOC_X86_64_PLT32 if possible.
    	(md_estimate_size_before_relax): Likewise.
    	* testsuite/gas/i386/reloc64.d: Updated.
    	* testsuite/gas/i386/x86-64-jump.d: Likewise.
    	* testsuite/gas/i386/x86-64-mpx-branch-1.d: Likewise.
    	* testsuite/gas/i386/x86-64-mpx-branch-2.d: Likewise.
    	* testsuite/gas/i386/x86-64-relax-2.d: Likewise.
    	* testsuite/gas/i386/x86-64-relax-3.d: Likewise.
    	* testsuite/gas/i386/ilp32/reloc64.d: Likewise.
    	* testsuite/gas/i386/ilp32/x86-64-branch.d: Likewise.
    
    ld/
    
    	PR gas/22791
    	* testsuite/ld-x86-64/mpx1c.rd: Updated.
    	* testsuite/ld-x86-64/pr22791-1.err: New file.
    	* testsuite/ld-x86-64/pr22791-1a.c: Likewise.
    	* testsuite/ld-x86-64/pr22791-1b.s: Likewise.
    	* testsuite/ld-x86-64/pr22791-2.rd: Likewise.
    	* testsuite/ld-x86-64/pr22791-2a.s: Likewise.
    	* testsuite/ld-x86-64/pr22791-2b.c: Likewise.
    	* testsuite/ld-x86-64/pr22791-2c.s: Likewise.
    	* testsuite/ld-x86-64/x86-64.exp: Run PR ld/22791 tests.
Comment 19 H.J. Lu 2018-02-13 15:37:09 UTC
Fixed for 2.31.
Comment 20 Rafael Ávila de Espíndola 2018-02-13 17:09:32 UTC
Note that while the assembler change is a nice improvement, the original issue still exists.

In the testcase that I attached before, the call to foo is now assembled to R_X86_64_PLT32, but

        .long   foo - .

Still produces a R_X86_64_PC32 as it has to.

Since foo has default visibility, the linker could create a canonical plt entry to be the runtime address of foo.
Comment 21 H.J. Lu 2018-02-13 17:23:16 UTC
(In reply to Rafael Ávila de Espíndola from comment #20)
> Note that while the assembler change is a nice improvement, the original
> issue still exists.
> 
> In the testcase that I attached before, the call to foo is now assembled to
> R_X86_64_PLT32, but
> 
>         .long   foo - .
> 
> Still produces a R_X86_64_PC32 as it has to.
> 
> Since foo has default visibility, the linker could create a canonical plt
> entry to be the runtime address of foo.

Please open a separate bug for this.
Comment 22 Rafael Ávila de Espíndola 2018-02-13 19:30:05 UTC
(In reply to H.J. Lu from comment #21)
> (In reply to Rafael Ávila de Espíndola from comment #20)
> > Note that while the assembler change is a nice improvement, the original
> > issue still exists.
> > 
> > In the testcase that I attached before, the call to foo is now assembled to
> > R_X86_64_PLT32, but
> > 
> >         .long   foo - .
> > 
> > Still produces a R_X86_64_PC32 as it has to.
> > 
> > Since foo has default visibility, the linker could create a canonical plt
> > entry to be the runtime address of foo.
> 
> Please open a separate bug for this.

https://sourceware.org/bugzilla/show_bug.cgi?id=22842
Comment 23 Sourceware Commits 2019-04-10 07:51:32 UTC
The master branch has been updated by Rainer Orth <ro@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=a5def729be2596496aec225e843903b25c672e01

commit a5def729be2596496aec225e843903b25c672e01
Author: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
Date:   Wed Apr 10 09:48:43 2019 +0200

    Disable R_X86_64_PLT32 generation as branch marker on Solaris/x86
    
    The fix H.J. implemented for PR gas/22791 in the thread starting at
    
    	[PATCH] x86-64: Treat PC32 relocation with branch as PLT32
    	https://sourceware.org/ml/binutils/2018-02/msg00065.html
    
    is causing problems on Solaris/x86.  The native linker is strongly
    preferred there, and there's no intention of implementing the linker
    optimization he plans there.  Besides, the kernel runtime linker,
    otherwise has no need to deal with that reloc at all, and instead of
    adding (possibly even more) workarounds with no benefit, it seems
    appropriate to disable the R_X86_64_PLT32 generation as branch marker on
    Solaris/x86 in the first place.
    
    The patch itself is trivial, the only complication is adapting the
    testsuite.  Since I've found no way to have conditional sections in the
    .d files, I've instead used the solution already found elsewhere of
    having separate .d files for the affected tests in an i386/solaris
    subdirectory and skipping the original ones.
    
    Tested on amd64-pc-solaris2.11 and x86_64-pc-linux-gnu without
    regressions.
    
    	* config/tc-i386.c (need_plt32_p) [TE_SOLARIS]: Return FALSE.
    	* testsuite/gas/i386/solaris/solaris.exp: New driver.
    	* testsuite/gas/i386/solaris/reloc64.d,
    	testsuite/gas/i386/solaris/x86-64-jump.d,
    	testsuite/gas/i386/solaris/x86-64-mpx-branch-1.d,
    	testsuite/gas/i386/solaris/x86-64-mpx-branch-2.d,
    	testsuite/gas/i386/solaris/x86-64-nop-3.d,
    	testsuite/gas/i386/solaris/x86-64-nop-4.d,
    	testsuite/gas/i386/solaris/x86-64-nop-5.d,
    	testsuite/gas/i386/solaris/x86-64-relax-2.d,
    	testsuite/gas/i386/solaris/x86-64-relax-3.d: New tests.
    	* testsuite/gas/i386/reloc64.d,
    	testsuite/gas/i386/x86-64-jump.d,
    	testsuite/gas/i386/x86-64-mpx-branch-1.d,
    	testsuite/gas/i386/x86-64-mpx-branch-2.d,
    	testsuite/gas/i386/x86-64-nop-3.d,
    	testsuite/gas/i386/x86-64-nop-4.d,
    	testsuite/gas/i386/x86-64-nop-5.d,
    	testsuite/gas/i386/x86-64-relax-2.d,
    	testsuite/gas/i386/x86-64-relax-3.d: Skip on *-*-solaris*.