Bug 31615 - Hang when linking vorbis-tools
Summary: Hang when linking vorbis-tools
Status: NEW
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: 2.43 (HEAD)
: P2 normal
Target Milestone: 2.43
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-04-06 15:33 UTC by Sam James
Modified: 2024-04-08 12:20 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
A testcase (764 bytes, application/octet-stream)
2024-04-07 15:15 UTC, H.J. Lu
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sam James 2024-04-06 15:33:48 UTC
I noticed that building vorbis-tools-1.4.2 hangs and took > 10 minutes before I killed it.

During the build, the hanging command is:
```
x86_64-pc-linux-gnu-gcc -O2 -Wall -ffast-math -fsigned-char -O3 -march=native -mtls-dialect=gnu2 -flto=jobserver -fno-semantic-interposition -pipe -fcf-protection=none -fdiagnostics-color=always -fdiagnostics-urls=never -frecord-gcc-switches -Wa,-O2 -Wa,-mtune=znver2 -Wstrict-aliasing -Wfree-nonheap-object -Werror=lto-type-mismatch -Werror=strict-aliasing -Werror=odr -Wstrict-aliasing -Wfree-nonheap-object -Werror=lto-type-mismatch -Werror=strict-aliasing -Werror=odr -Wbuiltin-declaration-mismatch -ggdb3 -Wformat -Wformat-security -Waddress -Warray-bounds -Wfree-nonheap-object -Wint-to-pointer-cast -Wmain -Wnonnull -Wodr -Wreturn-type -Wsizeof-pointer-memaccess -Wstrict-aliasing -Wstring-compare -Wuninitialized -Wvarargs -Wl,-O1 -Wl,-z -Wl,pack-relative-relocs -flto=jobserver -Wl,--defsym=__gentoo_check_ldflags__=0 -ggdb3 -o oggenc flac.o easyflac.o oggenc.o audio.o encode.o platform.o resample.o skeleton.o  -Wl,--as-needed ../share/libutf8.a ../share/libgetopt.a -lvorbisenc -lvorbis -lFLAC -logg -lm
```

Reduced is:
```
gcc -o oggenc flac.o easyflac.o oggenc.o audio.o encode.o platform.o resample.o skeleton.o  -Wl,--as-needed libutf8.a libgetopt.a libvorbisenc.so.2.0.12 libvorbis.so.0.4.9 libFLAC.so.12.1.0 libogg.so.0.8.5 -lm
```

* Appending -fuse-ld=mold makes it complete (I used mold as I used LTO so needed the plugin support which lld didn't have here.)
* Dropping -Wl,--as-needed makes it complete.

gdb says:
```
0x00007fce71af2674 in _bfd_generic_link_add_one_symbol (info=0x5630c7418ea0 <link_info>, abfd=0x5630c86a12e0, name=0x5630c9014ef8 "ldexp", flags=<optimized out>,
    section=0x7fce72114388 <_bfd_std_section+840>, value=0, string=0x5630c9014ee0 "ldexp@@GLIBC_2.2.5", copy=false, collect=false, hashp=0x7ffdfb92ea80)
    at /usr/src/debug/sys-devel/binutils-9999/binutils/bfd/linker.c:1477
1477          action = link_action[(int) row][prev];
(gdb) bt
#0  0x00007fce71af2674 in _bfd_generic_link_add_one_symbol (info=0x5630c7418ea0 <link_info>, abfd=0x5630c86a12e0, name=0x5630c9014ef8 "ldexp", flags=<optimized out>,
    section=0x7fce72114388 <_bfd_std_section+840>, value=0, string=0x5630c9014ee0 "ldexp@@GLIBC_2.2.5", copy=false, collect=false, hashp=0x7ffdfb92ea80)
    at /usr/src/debug/sys-devel/binutils-9999/binutils/bfd/linker.c:1477
#1  0x00007fce71b3bfe2 in _bfd_elf_add_default_symbol (abfd=<optimized out>, info=<optimized out>, h=0x5630c88fcc30, name=<optimized out>, sym=0x5630c8e01400,
    sec=0x7fce72114158 <_bfd_std_section+280>, value=<optimized out>, poldbfd=0x7ffdfb92ea68, dynsym=<synthetic pointer>)
    at /usr/src/debug/sys-devel/binutils-9999/binutils/bfd/elflink.c:2023
#2  elf_link_add_object_symbols (abfd=<optimized out>, info=<optimized out>) at /usr/src/debug/sys-devel/binutils-9999/binutils/bfd/elflink.c:5419
#3  bfd_elf_link_add_symbols (abfd=<optimized out>, info=<optimized out>) at /usr/src/debug/sys-devel/binutils-9999/binutils/bfd/elflink.c:6339
#4  0x00005630c72b5f9a in load_symbols (entry=entry@entry=0x5630c84d8370, place=place@entry=0x7ffdfb92eb70) at /usr/src/debug/sys-devel/binutils-9999/binutils/ld/ldlang.c:3129
#5  0x00005630c72a89a0 in open_input_bfds (s=0x5630c84d8370, os=<optimized out>, mode=(OPEN_BFD_FORCE | OPEN_BFD_RESCAN))
    at /usr/src/debug/sys-devel/binutils-9999/binutils/ld/ldlang.c:3621
#6  0x00005630c72a8941 in open_input_bfds (s=0x5630c84d8350, os=<optimized out>, mode=OPEN_BFD_RESCAN) at /usr/src/debug/sys-devel/binutils-9999/binutils/ld/ldlang.c:3569
#7  0x00005630c72a7932 in lang_process () at /usr/src/debug/sys-devel/binutils-9999/binutils/ld/ldlang.c:8248
#8  0x00005630c72afcb8 in main (argc=63, argv=<optimized out>) at /usr/src/debug/sys-devel/binutils-9999/binutils/ld/ldmain.c:511
```

perf but perhaps not so insightful ;) says:
``
   100.00%  ld       libbfd-2.42.50.20240406.gentoo-sys-devel-binutils-mt.so  [.] _bfd_generic_link_add_one_symbol
     0.00%  ld       [kernel.kallsyms]                                        [k] stackleak_erase
     0.00%  ld       [kernel.kallsyms]                                        [k] arch_perf_update_userpage
     0.00%  ld       [kernel.kallsyms]                                        [k] perf_ibs_handle_irq
     0.00%  ld       [kernel.kallsyms]                                        [k] perf_ibs_start
```

```
$ ld --version | head -1
GNU ld (Gentoo 9999 p1) 2.42.50.20240406
```

I can try reduce it to C if needed but it will take more time. I have attached a reproducible tarball with object files. Tell me if need to do anything else. Thanks.
Comment 1 Sam James 2024-04-06 15:35:00 UTC
Tarball of objects: PR31615.tar.xz">https://dev.gentoo.org/~sam/bugs/sourceware/31615/binutils-PR31615.tar.xz.
Comment 2 Sam James 2024-04-06 15:35:44 UTC
My guess is the recent LTO rescan fixes but I did not bisect yet.
Comment 3 Sam James 2024-04-06 15:36:34 UTC
Given LTO is involved, you may need matching GCC version: gcc version 14.0.1 20240405 (experimental) 4b02dd48f531ea88587edd2b75b6e5243b4389e8 (Gentoo Hardened 14.0.9999 p, commit 7bbfb01a32b73842f8908de028703510a0e12057).
Comment 4 H.J. Lu 2024-04-06 16:07:49 UTC
It is caused by

commit e7e05a9dd0c93038fdd5ed1904ca660e52beabdc
Author: Alan Modra <amodra@gmail.com>
Date:   Sat Apr 6 15:49:44 2024 +1030

    Don't have first_hash entries of strings that can be freed.
Comment 5 H.J. Lu 2024-04-06 16:48:17 UTC
#0  0x000000000043c171 in _bfd_generic_link_add_one_symbol (
    info=0x61ef80 <link_info>, abfd=0x7f7d80, name=0xeb69a8 "ldexp", 
    flags=<optimized out>, section=0x61e008 <_bfd_std_section+840>, value=0, 
    string=0xeb6990 "ldexp@@GLIBC_2.2.5", copy=false, collect=false, 
    hashp=0x7fffffffc770)

became an infinite loop.
Comment 6 Alan Modra 2024-04-06 22:58:03 UTC
(In reply to H.J. Lu from comment #4)
> It is caused by
> 
> commit e7e05a9dd0c93038fdd5ed1904ca660e52beabdc
Not possible.  The real cause must be something else.  At most, this patch exposed another bug that was hidden due to first_hash entries pointing to freed memory.
Comment 7 Alan Modra 2024-04-07 04:54:57 UTC
/* gcc -O3 -flto -c xxx.c; gcc -o xxx xxx.o -lm */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int
main (int argc, char **argv)
{
  double x = 1.0;
  long y = 0;
  if (--argc > 0)
    x = strtod (*++argv, NULL);
  if (--argc > 0)
    y = strtol (*++argv, NULL, 0);
  printf ("%f\n", ldexp (x, y));
  return 0;
}

The above simpler testcase shows the problem.  Prior to my commit e7e05a9dd0, lookup of "ldexp@@GLIBC_2.2.5" (which I verified is put in first_hash) returns NULL due to the string pointing into freed and reused memory.

 0x00005555558a66b3 in bfd_hash_lookup (table=0x555555ffe5f0, string=0x5555563f72c8 "ldexp@@GLIBC_2.2.5", create=create@entry=false, copy=copy@entry=false) at /home/alan/src/binutils-gdb/bfd/hash.c:564
564		  && strcmp (hashp->string, string) == 0)
(gdb) p *hashp
$5 = {next = 0x0, string = 0x5555561113f0 "", hash = 337613448}

This results in no reload of libm.so.6.  If libm.so.6 is reloaded (post e7e05a9dd0) ldexp is already an indirect symbol to a defweak versioned sym from libc.so.6, and attempting to mash in the duplicated libm.so ldexp results in a u.i.link loop back to itself.  We don't normally try to override symbols like that, but you could argue it is a long-standing bug in _bfd_generic_link_add_one_symbol that case is not handled gracefully.
Comment 8 H.J. Lu 2024-04-07 13:57:31 UTC
(In reply to Alan Modra from comment #7)
> /* gcc -O3 -flto -c xxx.c; gcc -o xxx xxx.o -lm */
> #include <stdio.h>
> #include <stdlib.h>
> #include <math.h>
> 
> int
> main (int argc, char **argv)
> {
>   double x = 1.0;
>   long y = 0;
>   if (--argc > 0)
>     x = strtod (*++argv, NULL);
>   if (--argc > 0)
>     y = strtol (*++argv, NULL, 0);
>   printf ("%f\n", ldexp (x, y));
>   return 0;
> }
> 
> The above simpler testcase shows the problem.  Prior to my commit

I can't reproduce it with GCC 13 nor GCC 14 using this testcase.
Comment 9 Sam James 2024-04-07 14:07:48 UTC
H.J, try with `gcc -O3 -flto -c foo.c ; gcc -o foo -Wl,--as-needed foo.o -lm`. I needed that to repro.
Comment 10 H.J. Lu 2024-04-07 14:09:06 UTC
(In reply to Alan Modra from comment #7)
> This results in no reload of libm.so.6.  If libm.so.6 is reloaded (post
> e7e05a9dd0) ldexp is already an indirect symbol to a defweak versioned sym
> from libc.so.6, and attempting to mash in the duplicated libm.so ldexp
> results in a u.i.link loop back to itself.  We don't normally try to
> override symbols like that, but you could argue it is a long-standing bug in
> _bfd_generic_link_add_one_symbol that case is not handled gracefully.

Before my commit, ldexp@@GLIBC_2.2.5 in libc.so.6 is used even when it is
also defined in libm.so.6.  After my commit, ldexp@@GLIBC_2.2.5 in libm.so.6
is used.  However, info->hash->undefs_tail == &h->root isn't handled properly.
Comment 11 H.J. Lu 2024-04-07 14:10:29 UTC
(In reply to Sam James from comment #9)
> H.J, try with `gcc -O3 -flto -c foo.c ; gcc -o foo -Wl,--as-needed foo.o
> -lm`. I needed that to repro.

Yes, it works.
Comment 12 H.J. Lu 2024-04-07 14:19:29 UTC
(In reply to H.J. Lu from comment #10)
> Before my commit, ldexp@@GLIBC_2.2.5 in libc.so.6 is used even when it is
> also defined in libm.so.6.  After my commit, ldexp@@GLIBC_2.2.5 in libm.so.6
> is used.  However, info->hash->undefs_tail == &h->root isn't handled
> properly.

We need to handle prior indirection to properly override ldexp@@GLIBC_2.2.5 in
libc.so.6.
Comment 13 H.J. Lu 2024-04-07 15:15:15 UTC
Created attachment 15452 [details]
A testcase
Comment 14 Sourceware Commits 2024-04-08 02:01:59 UTC
The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=248b6326a49ed49e2f627d3bddbac514a074bac0

commit 248b6326a49ed49e2f627d3bddbac514a074bac0
Author: Alan Modra <amodra@gmail.com>
Date:   Mon Apr 8 08:16:20 2024 +0930

    Re: PR26978, Inconsistency for strong foo@v1 and weak foo@@v1
    
    Commit 726d7d1ecf opened a hole that allowed a u.i.link loop to be
    created, resulting in _bfd_generic_link_add_one_symbol never
    returning.  Fix that.  Note that the MIND case handles two types of
    redefinition.  For a new indirect symbol we'll have string non-NULL.
    For a new def, string will be NULL.  So moving the string comparison
    earlier would work.  However, we've already looked up inh in the first
    case so can dispense with name comparisons.  Either way, for a new def
    we'll get to the defweak test and possibly cycle.  Which is what we
    want here.
    
            PR 31615
            PR 26978
            * linker.c (_bfd_generic_link_add_one_symbol <MIND>): Test for
            exactly matching indirect symbols before cycling on a defweak.
Comment 15 Sourceware Commits 2024-04-08 09:18:39 UTC
The binutils-2_42-branch branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=6224493e457e72b11818c87cdc112bdb0fee5f81

commit 6224493e457e72b11818c87cdc112bdb0fee5f81
Author: Alan Modra <amodra@gmail.com>
Date:   Mon Apr 8 08:16:20 2024 +0930

    Re: PR26978, Inconsistency for strong foo@v1 and weak foo@@v1
    
    Commit 726d7d1ecf opened a hole that allowed a u.i.link loop to be
    created, resulting in _bfd_generic_link_add_one_symbol never
    returning.  Fix that.  Note that the MIND case handles two types of
    redefinition.  For a new indirect symbol we'll have string non-NULL.
    For a new def, string will be NULL.  So moving the string comparison
    earlier would work.  However, we've already looked up inh in the first
    case so can dispense with name comparisons.  Either way, for a new def
    we'll get to the defweak test and possibly cycle.  Which is what we
    want here.
    
            PR 31615
            PR 26978
            * linker.c (_bfd_generic_link_add_one_symbol <MIND>): Test for
            exactly matching indirect symbols before cycling on a defweak.
    
    (cherry picked from commit 248b6326a49ed49e2f627d3bddbac514a074bac0)
Comment 16 Sourceware Commits 2024-04-08 12:20:56 UTC
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d05e1a4a6d438c11af5a2b9b0ac88a74727b5f0f

commit d05e1a4a6d438c11af5a2b9b0ac88a74727b5f0f
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Apr 7 19:52:49 2024 -0700

    ld: Add PR ld/31615 tests
    
    PR ld/31615
            * testsuite/ld-plugin/lto.exp: Run ld/31615 tests.
            * testsuite/ld-plugin/pr31615.ver: New file.
            * testsuite/ld-plugin/pr31615a.c: Likewise.
            * testsuite/ld-plugin/pr31615b.c: Likewise.
            * testsuite/ld-plugin/pr31615c.c: Likewise.
            * testsuite/ld-plugin/pr31615d.c: Likewise.