Bug 27169 - i386: Emit R_386_PLT32 instead of R_386_PC32 for `call/jmp foo`
Status: RESOLVED WONTFIX
Alias: None
Product: binutils
Classification: Unclassified
Component: gas
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2021-01-10 00:54 UTC by Fangrui Song
Modified: 2022-06-08 20:06 UTC
Target: i386-*

Attachments
A patch to generate R_386_PLT32 (591 bytes, patch)
2021-01-10 22:55 UTC, H.J. Lu

Description Fangrui Song 2021-01-10 00:54:00 UTC
gcc for i386 with -fno-pic emits `call/jmp foo`, which produces an R_386_PC32 relocation that is indistinguishable from an address-taken operation. If the symbol turns out to be external, the linker has to employ a trick called a "canonical PLT entry" (st_shndx=0, st_value!=0), which is similar to a copy relocation.

If a shared object is linked with -Bsymbolic or --dynamic-list and defines a function symbol which needs to be interposed by a canonical PLT entry, the linker may report an error, because the address of the symbol may differ between the shared object and the executable.

i386 needs a change similar to https://sourceware.org/bugzilla/show_bug.cgi?id=22791
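
The ambiguity described above can be illustrated with a small model. The following Python sketch is a hypothetical simplification, not code from any linker: the function name `plt_kind` and its string results are invented for illustration. It assumes only the psABI relocation numbers (R_386_PC32 = 2, R_386_PLT32 = 4) and the behavior stated in this description.

```python
# Hypothetical model of how a linker might treat a reference from non-PIC
# code to a symbol defined in a shared object. All names are illustrative.
R_386_PC32 = 2   # i386 psABI: S + A - P (may be a branch or an address-taken use)
R_386_PLT32 = 4  # i386 psABI: L + A - P (only meaningful for branches)


def plt_kind(reloc_type: int, defined_in_shared_object: bool) -> str:
    """Return which kind of PLT entry the reference would need."""
    if not defined_in_shared_object:
        # The symbol is resolved at link time; no PLT entry is required.
        return "none"
    if reloc_type == R_386_PLT32:
        # Known to be a branch: a regular PLT entry suffices, and the
        # symbol's address does not have to be the PLT entry.
        return "regular"
    if reloc_type == R_386_PC32:
        # Possibly address-taken: the PLT entry has to become the symbol's
        # address (st_shndx=0, st_value!=0), i.e. a canonical PLT entry.
        return "canonical"
    raise ValueError(f"unhandled relocation type {reloc_type}")
```

Under these assumptions, `plt_kind(R_386_PC32, True)` yields `"canonical"` while `plt_kind(R_386_PLT32, True)` yields `"regular"`, which is the distinction this report asks the assembler to make possible.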
Comment 1 H.J. Lu 2021-01-10 13:34:04 UTC
Since i386 doesn't have IP-relative addressing, non-PIC PLT is different
from PIC PLT.  Using R_386_PLT32 for "call/jmp foo" isn't appropriate.
Comment 2 Fangrui Song 2021-01-10 18:05:08 UTC
(In reply to H.J. Lu from comment #1)
> Since i386 doesn't have IP-relative addressing, non-PIC PLT is different
> from PIC PLT.  Using R_386_PLT32 for "call/jmp foo" isn't appropriate.

I know that there is a convention of using R_386_PC32 for non-PIC PLT and R_386_PLT32 for PIC PLT. It is artificial, and the assembler, linker, and ld.so do not need this convention for interop.

On most other architectures branch relocation types are distinguishable from address taken relocation types (direct access).
Comment 3 H.J. Lu 2021-01-10 19:24:24 UTC
(In reply to Fangrui Song from comment #2)
> (In reply to H.J. Lu from comment #1)
> > Since i386 doesn't have IP-relative addressing, non-PIC PLT is different
> > from PIC PLT.  Using R_386_PLT32 for "call/jmp foo" isn't appropriate.
> 
> I know that there is a convention of using R_386_PC32 for non-PIC PLT and
> R_386_PLT32 for PIC PLT. It is artificial, and the assembler, linker, and
> ld.so do not need this convention for interop.

R_386_PLT32 should be used with the EBX based PLT and "call foo" doesn't
require setting up EBX for PLT.

> On most other architectures branch relocation types are distinguishable from
> address taken relocation types (direct access).

This can't be fixed with R_386_PLT32.
Comment 4 Fangrui Song 2021-01-10 21:43:22 UTC
(In reply to H.J. Lu from comment #3)
> (In reply to Fangrui Song from comment #2)
> > (In reply to H.J. Lu from comment #1)
> > > Since i386 doesn't have IP-relative addressing, non-PIC PLT is different
> > > from PIC PLT.  Using R_386_PLT32 for "call/jmp foo" isn't appropriate.
> > 
> > I know that there is a convention of using R_386_PC32 for non-PIC PLT and
> > R_386_PLT32 for PIC PLT. It is artificial, and the assembler, linker, and
> > ld.so do not need this convention for interop.
> 
> R_386_PLT32 should be used with the EBX based PLT and "call foo" doesn't
> require setting up EBX for PLT.
> 
> > On most other architectures branch relocation types are distinguishable from
> > address taken relocation types (direct access).
> 
> This can't be fixed with R_386_PLT32.

Does GNU ld use R_386_PC32/R_386_PLT32 to decide whether a non-PIC PLT or a PIC PLT should be used? It can use a non-PIC PLT in -no-pie mode and a PIC PLT in -pie/-shared mode. Then branch R_386_PC32 can be freely converted to PLT32.

# a.s
call foo
# b.s
call foo@plt

gcc -fno-pic a.s -shared -o a.so -fuse-ld=bfd
gcc -fno-pic b.s -shared -o b.so -fuse-ld=bfd
The two outputs show no difference in their instructions.
Comment 5 Fangrui Song 2021-01-10 21:45:38 UTC
Sorry, the example should be:

# a.s
.globl main
main:
  call puts

# b.s
.globl main
main:
  call puts@plt

gcc -m32 -no-pie a.s -o a -fuse-ld=bfd
gcc -m32 -no-pie b.s -o b -fuse-ld=bfd
The two outputs show no difference in their instructions.
Comment 6 H.J. Lu 2021-01-10 22:55:14 UTC
Created attachment 13109
A patch to generate R_386_PLT32

I tried this patch when I made:

commit bd7ab16b4537788ad53521c45469a1bdae84ad4a
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Feb 13 07:34:22 2018 -0800

    x86-64: Generate branch with PLT32 relocation
    
    Since there is no need to prepare for PLT branch on x86-64, generate
    R_X86_64_PLT32, instead of R_X86_64_PC32, if possible, which can be
    used as a marker for 32-bit PC-relative branches.
    
    To compile Linux kernel, this patch:
    
    From: "H.J. Lu" <hjl.tools@gmail.com>
    Subject: [PATCH] x86: Treat R_X86_64_PLT32 as R_X86_64_PC32
    
    On i386, there are 2 types of PLTs, PIC and non-PIC.  PIE and shared
    objects must use PIC PLT.  To use PIC PLT, you need to load
    _GLOBAL_OFFSET_TABLE_ into EBX first.  There is no need for that on
    x86-64 since x86-64 uses PC-relative PLT.

and got

FAIL: visibility (hidden_normal) (non PIC)
FAIL: visibility (hidden_normal) (non PIC, load offset)
FAIL: visibility (hidden_normal) (PIC main, non PIC so)
FAIL: visibility (hidden_weak) (non PIC)
FAIL: visibility (hidden_weak) (non PIC, load offset)
FAIL: visibility (hidden_weak) (PIC main, non PIC so)
FAIL: visibility (protected) (non PIC)
FAIL: visibility (protected) (non PIC, load offset)
FAIL: visibility (protected) (PIC main, non PIC so)
FAIL: visibility (protected_undef_def) (non PIC)
FAIL: visibility (protected_undef_def) (non PIC, load offset)
FAIL: visibility (protected_undef_def) (PIC main, non PIC so)
FAIL: visibility (protected_weak) (non PIC)
FAIL: visibility (protected_weak) (non PIC, load offset)
FAIL: visibility (protected_weak) (PIC main, non PIC so)
FAIL: visibility (normal) (non PIC)
FAIL: visibility (normal) (non PIC, load offset)
FAIL: visibility (normal) (PIC main, non PIC so)
FAIL: ld-i386/pr19636-2a
FAIL: ld-i386/pr19636-2b
FAIL: ld-i386/pr19636-2c
FAIL: ld-i386/pr19636-2d
FAIL: ld-i386/pr19636-2e
FAIL: ld-i386/pr20515
FAIL: shared (non PIC)
FAIL: shared (non PIC, load offset)
FAIL: shared (PIC main, non PIC so)

in i386 linker tests.
Comment 7 Fangrui Song 2021-01-11 02:26:03 UTC
Applied your R_386_PLT32 patch.

# of unexpected failures        6

make -C Debug check-ld RUNTESTFLAGS=ld-shared/shared.exp # passed for me.

ld/testsuite/ld-i386/pr20515.d is an expected failure due to no-longer-relevant diagnostic. It can be repaired by using .reloc .-4, R_386_PC32, foo-4

#error: unsupported non-PIC call to IFUNC `foo'

For
ld/testsuite/ld-i386/pr19636-2a.d (I think 2b, 2c, and 2d are similar)
both gold and LLD export undefined weak foo in .dynsym, regardless of R_386_PC32/R_386_PLT32.
GNU ld exports undefined weak foo for R_386_PC32 but not for R_386_PLT32.
I think there is some simplification which can be made.

Branch relocations to undefined weak symbols behave differently across architectures: some resolve them to the current PC, some to link-time 0, and ppc32 might resolve them to the beginning of the text segment/section (I forgot the details, but it is strange). In addition, I don't think users understand or use dynamic undefined weak symbols in practice, so altering the behavior that was changed in 2017 should not be a problem...
Comment 8 H.J. Lu 2021-01-11 02:45:11 UTC
(In reply to Fangrui Song from comment #7)
> Applied your R_386_PLT32 patch.
> 
> # of unexpected failures        6
> 
> make -C Debug check-ld RUNTESTFLAGS=ld-shared/shared.exp # passed for me.
> 

You need to build i386 native linker to see all failures.  I used:

CC="/usr/gcc-10.1.1-32bit/bin/gcc -m32 -fcf-protection" CXX="/usr/gcc-10.1.1-32bit/bin/g++ -m32 -fcf-protection" /export/gnu/import/git/sources/binutils-gdb/configure \
	i686-linux \
	--enable-plugins --disable-gdb --disable-gdbserver --disable-libdecnumber --disable-readline --disable-sim --with-sysroot=/ --with-system-zlib \
	--prefix=/usr/local \
	--with-local-prefix=/usr/local

to configure binutils on Linux/x86-64.  /usr/gcc-10.1.1-32bit/bin/gcc is
a 32-bit GCC.
Comment 9 H.J. Lu 2021-01-13 00:00:55 UTC
ld uses R_386_PC32 to tell whether a call site supports PIC PLT.  Calling an
IFUNC function in static PIE requires PLT. If the call site doesn't support
PIC PLT, the linker will issue an error:

https://sourceware.org/bugzilla/show_bug.cgi?id=20515
Comment 10 Fangrui Song 2021-01-13 03:11:00 UTC
(In reply to H.J. Lu from comment #9)
> ld uses R_386_PC32 to tell whether a call site supports PIC PLT.  Calling an
> IFUNC function in static PIE requires PLT. If the call site doesn't support
> PIC PLT, the linker will issue an error:
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=20515

Let me rephrase what PR20515 is about:

For a call to a hidden function declaration, the compiler produces an R_386_PC32 relocation. The relocation is an indicator that EBX may not be set up.

If the declaration refers to an ifunc definition, the linker will resolve the R_386_PC32 to an IPLT entry. For -pie and -shared links, the IPLT entry references EBX. If the call site does not set up EBX, the IPLT entry call will be incorrect.

The fix for PR20515 implemented this diagnostic. If we change the compiler/assembler to use R_386_PLT32 for non-default visibility function declarations, this diagnostic will be lost.

So unfortunately we cannot find a satisfactory relocation type for branches to undefined symbols:

* R_386_PC32: canonical PLT entries (similar to copy relocations) which may break -Bsymbolic or --dynamic-list usage.
* R_386_PLT32: lose a diagnostic for non-default ifunc in -pie/-shared modules.
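
The trade-off in the two bullets can be sketched as a toy model. The following Python is hypothetical: `check_branch` and its parameters are invented names, and the logic is a simplification of the behavior described in this comment, not GNU ld's implementation. Only the error text is taken from the pr20515 test expectation quoted in comment 7.

```python
R_386_PC32 = 2   # signals that the call site may not have set up EBX
R_386_PLT32 = 4  # signals a call site prepared for a PLT


def check_branch(reloc_type: int, symbol: str, is_ifunc: bool,
                 output_is_pic: bool) -> str:
    """Toy model of the PR20515 diagnostic for branches to IFUNC symbols."""
    if is_ifunc and output_is_pic and reloc_type == R_386_PC32:
        # In -pie/-shared output the IPLT entry reads the GOT through EBX,
        # so a call site that did not set up EBX would branch incorrectly.
        raise RuntimeError(f"unsupported non-PIC call to IFUNC `{symbol}'")
    # Any other combination is accepted by this model.
    return "ok"
```

In this model a hidden IFUNC reached through R_386_PC32 in a PIC link raises the error, while the same reference expressed as R_386_PLT32 returns `"ok"`, which is how the diagnostic would be lost if the assembler emitted R_386_PLT32 unconditionally.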
Comment 11 Fangrui Song 2021-01-13 03:15:00 UTC
I agree that the assembler needs a notation to differentiate R_386_PC32/R_386_PLT32 branches. So perhaps this should be implemented in GCC instead: for a default visibility function declaration, emit `call/jmp foo@plt` instead of `call/jmp foo`.

This does not degrade the ld diagnostics for non-default visibility ifunc.
Comment 12 H.J. Lu 2021-01-13 12:43:59 UTC
i386 is legacy.  Let's leave it alone.
Comment 13 H.J. Lu 2021-01-13 18:41:29 UTC
These native i386 tests:

FAIL: visibility (hidden_normal) (non PIC)
FAIL: visibility (hidden_normal) (non PIC, load offset)
FAIL: visibility (hidden_normal) (PIC main, non PIC so)
FAIL: visibility (hidden_weak) (non PIC)
FAIL: visibility (hidden_weak) (non PIC, load offset)
FAIL: visibility (hidden_weak) (PIC main, non PIC so)
FAIL: visibility (protected) (non PIC)
FAIL: visibility (protected) (non PIC, load offset)
FAIL: visibility (protected) (PIC main, non PIC so)
FAIL: visibility (protected_undef_def) (non PIC)
FAIL: visibility (protected_undef_def) (non PIC, load offset)
FAIL: visibility (protected_undef_def) (PIC main, non PIC so)
FAIL: visibility (protected_weak) (non PIC)
FAIL: visibility (protected_weak) (non PIC, load offset)
FAIL: visibility (protected_weak) (PIC main, non PIC so)
FAIL: visibility (normal) (non PIC)
FAIL: visibility (normal) (non PIC, load offset)
FAIL: visibility (normal) (PIC main, non PIC so)
FAIL: shared (non PIC)
FAIL: shared (non PIC, load offset)
FAIL: shared (PIC main, non PIC so)

track the R_386_PLT32 vs R_386_PC32 issue.
Comment 14 H.J. Lu 2022-06-08 20:06:41 UTC
Just to be clear: some i386 shared libraries are compiled without
-fPIC on purpose to improve performance.  When ld sees R_386_PC32
of an undefined symbol in a shared library, it creates a dynamic
R_386_PC32 relocation in the .text section.  Replacing R_386_PC32
with R_386_PLT32 will break this.
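
To illustrate what such a dynamic R_386_PC32 relocation in .text amounts to at load time, the following Python sketch applies the psABI formula S + A - P to the 4-byte displacement of a `call`. The addresses are made-up values and `apply_pc32` is a hypothetical helper, not ld.so code. It assumes the REL convention used on i386, where the addend A is stored in the relocated field (-4 for a `call`).

```python
import struct


def apply_pc32(text: bytearray, offset: int, load_base: int, sym_addr: int) -> None:
    """Patch a 4-byte field in place with S + A - P (R_386_PC32, REL form)."""
    (addend,) = struct.unpack_from("<i", text, offset)   # A, read from the field
    place = load_base + offset                           # P, address of the field
    value = (sym_addr + addend - place) & 0xFFFFFFFF     # S + A - P, wrapped to 32 bits
    struct.pack_into("<I", text, offset, value)


# `call rel32`: opcode 0xE8 followed by a displacement holding the addend -4.
text = bytearray(b"\xe8" + struct.pack("<i", -4))
load_base = 0xF7D00000   # made-up load address of this code
sym_addr = 0xF7E12340    # made-up runtime address of the callee

apply_pc32(text, 1, load_base, sym_addr)

# The CPU adds the displacement to the address of the next instruction.
(disp,) = struct.unpack_from("<i", text, 1)
target = (load_base + len(text) + disp) & 0xFFFFFFFF
```

After the patch, `target` equals `sym_addr`: the call reaches the callee directly, without going through a PLT entry, at the cost of the dynamic loader writing into .text.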