Bug 28021 - riscv: wrong double relaxation with LTO
Summary: riscv: wrong double relaxation with LTO
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: 2.37
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-28 15:55 UTC by Michael Matz
Modified: 2021-07-07 16:07 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Matz 2021-06-28 15:55:35 UTC
This is related to the fix for PR22756, which turns out to be incomplete.
We have seen the problem with GCC11 and it needs LTO, which makes this a bit
involved to reproduce.  See https://bugzilla.suse.com/show_bug.cgi?id=1187542
for how it came to be, but it contains unrelated things.  Eventually I came up
with a single file testcase without compiler, but that involved a hex-edited
.o file.  After much back and forth I came up with a two-file testcase that only involves assembler, no compiler, and that I believe faithfully reflects the error
mode.  So, the files:

% cat relax-twice-1.s
        .file   "<artificial>"
        .option pic
        .text
        # this align is here so that the .text section starts at 0x1000,
        # in order to make matching of testcase results easier
        .align 12
        .globl  foobar_new
        .weak   foobar_new
        .type   foobar_new, @function
foobar_new:
        jr ra
        .size   foobar_new, .-foobar_new
        .symver foobar_new, foobar@@New
        .section        .note.GNU-stack,"",@progbits

% cat relax-twice-2.s
        .file   "<artificial>"
        .option pic
        .text
        .section        .rodata.str1.8,"aMS",@progbits,1
        .align  3
.LC0:
        .string "%u"
        .text
        .align  1
        .globl  relaxme
        .type   relaxme, @function
relaxme:
        addi    sp,sp,-32
        addi    a2,sp,12
        lla     a1,.LC0
        li      a0,0
        sd      ra,24(sp)
        call    sscanf@plt
        ld      ra,24(sp)
        addi    sp,sp,32
        jr      ra
        .size   relaxme, .-relaxme
        .align  1
        .globl  foobar_new
        .type   foobar_new, @function
foobar_new:
        li      a0,1
        ret
        .size   foobar_new, .-foobar_new
        .symver foobar_new, foobar@@New
        .align  1
        .globl  foobar_old
        .type   foobar_old, @function
foobar_old:
        addi    sp,sp,-16
        sd      ra,8(sp)
        call    foobar@plt
        ld      ra,8(sp)
        snez    a0,a0
        addi    sp,sp,16
        jr      ra
        .size   foobar_old, .-foobar_old
        .symver foobar_old, foobar@Old
        .section        .note.GNU-stack,"",@progbits

% cat relax-twice.ver
Old {
        global:
                foobar;
                relaxme;
        local:
                *;
};
New {
        global:
                foobar;
} Old;

% ../as-new -o relax-twice-1.o relax-twice-1.s
% ../as-new -o relax-twice-2.o relax-twice-2.s
% ../ld-new -o bla.so --relax -shared --version-script relax-twice.ver relax-twice-1.o relax-twice-2.o
% nm bla.so
...
000000000000102c t foobar_new
0000000000001028 T foobar@@New
...

Note how these two symbols are supposed to point to the same address.  (The stunt with the weak variant of foobar_new and foobar@@New in relax-twice-1.s
is only there to prime the internal symbol table of ld in the same way as
it would have been with LTO and the linker plugin.  The prevailing variants
are the ones from relax-twice-2.s)

The reason is the one already analyzed in PR22756: multiple entries in
sym_hashes[] pointing to the same symbol entry, such that the relaxation code
adjusts the same entry multiple times.  That is the situation always created
by aliases, and the PR22756 patch fixes it for hidden aliases (created by 
non-default symbol versions, i.e. foobar@BLA).  Of course the same can happen
with non-hidden aliases, as above demonstrates.

A definitely working fix would be to always check for duplicates, but a slowdown
for that can then only be avoided by extending the elf symbol hash structure
to contain an already-handled flag.  The more immediate, but possibly also
still incomplete fix is simply:

--- a/bfd/elfnn-riscv.c
+++ b/bfd/elfnn-riscv.c
@@ -3993,7 +3993,7 @@ riscv_relax_delete_bytes (bfd *abfd, asection *sec, bfd_vma addr, size_t count,
         foo becomes an alias for foo@BAR, and hence they need the same
         treatment.  */
       if (link_info->wrap_hash != NULL
-         || sym_hash->versioned == versioned_hidden)
+         || sym_hash->versioned != unversioned)
        {
          struct elf_link_hash_entry **cur_sym_hashes;
 

I will note that the problem exists for all targets that implement relaxation
that can also delete bytes from sections (it seems to all have been copied
around for decades), at least MIPS and AVR seem somewhat relevant still, but it's also m10[23]00, cr16, h8300, ip2k, m32c, m68hc11, msp430, rl78, rx and
v850.  Only riscv goes to an effort to remidy the situation with symbol aliases.
Comment 1 Andreas Schwab 2021-06-29 16:51:54 UTC
I'd guess the other archs never see that issue since versioned aliases are a rare thing in the embedded world.
Comment 2 Sourceware Commits 2021-07-06 13:49:16 UTC
The master branch has been updated by Michael Matz <matz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=235f5ef4a6b8fbdcfaea8b629f7c6a9792a789de

commit 235f5ef4a6b8fbdcfaea8b629f7c6a9792a789de
Author: Michael Matz <matz@suse.de>
Date:   Mon Jun 28 17:57:17 2021 +0200

    elf/riscv: Fix relaxation with aliases [PR28021]
    
    the fix for PR22756 only changed behaviour for hidden aliases,
    but the same situation exists for non-hidden aliases: sym_hashes[]
    can contain multiple entries pointing to the same symbol structure
    leading to relaxation adjustment to be applied twice.
    
    Fix this by testing for duplicates for everything that looks like it
    has a version.
    
    PR ld/28021
    
    bfd/
            * elfnn-riscv.c (riscv_relax_delete_bytes): Check for any
            versioning.
    
    ld/
            * testsuite/ld-riscv-elf/relax-twice.ver: New.
            * testsuite/ld-riscv-elf/relax-twice-1.s: New.
            * testsuite/ld-riscv-elf/relax-twice-2.s: New.
            * testsuite/ld-riscv-elf/ld-riscv-elf.exp
            (run_relax_twice_test): New, and call it.
Comment 3 Sourceware Commits 2021-07-07 16:06:05 UTC
The binutils-2_37-branch branch has been updated by Michael Matz <matz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=7764dd1f5fdc9a4d6d6233b35bbb14ab161df299

commit 7764dd1f5fdc9a4d6d6233b35bbb14ab161df299
Author: Michael Matz <matz@suse.de>
Date:   Mon Jun 28 17:57:17 2021 +0200

    elf/riscv: Fix relaxation with aliases [PR28021]
    
    the fix for PR22756 only changed behaviour for hidden aliases,
    but the same situation exists for non-hidden aliases: sym_hashes[]
    can contain multiple entries pointing to the same symbol structure
    leading to relaxation adjustment to be applied twice.
    
    Fix this by testing for duplicates for everything that looks like it
    has a version.
    
    PR ld/28021
    
    bfd/
            * elfnn-riscv.c (riscv_relax_delete_bytes): Check for any
            versioning.
    
    ld/
            * testsuite/ld-riscv-elf/relax-twice.ver: New.
            * testsuite/ld-riscv-elf/relax-twice-1.s: New.
            * testsuite/ld-riscv-elf/relax-twice-2.s: New.
            * testsuite/ld-riscv-elf/ld-riscv-elf.exp
            (run_relax_twice_test): New, and call it.
    
    (cherry picked from commit 235f5ef4a6b8fbdcfaea8b629f7c6a9792a789de)
Comment 4 Michael Matz 2021-07-07 16:07:27 UTC
Fixed in master and 2.37.  Not planning further backports.