Bug 23607 - gold linker --threads --thread-count,2 causes ld segmentation fault
Status: RESOLVED DUPLICATE of bug 26200
Alias: None
Product: binutils
Classification: Unclassified
Component: gold
Version: 2.31
Importance: P2 normal
Target Milestone: ---
Assignee: Cary Coutant
Reported: 2018-09-05 15:26 UTC by Alexandru Dascalu
Modified: 2020-11-25 00:05 UTC

Description Alexandru Dascalu 2018-09-05 15:26:46 UTC
The gold linker segfaults at link time when it uses multiple threads.

The scenario:
  1. I have multiple source files which are compiled with gcc 8.2 with some optimization flags
  2. I use cmake to generate the makefile.
  3. The linker crashes with segfault:  

```  
collect2: fatal error: ld terminated with signal 11 [Segmentation fault], core dumped
compilation terminated.  
```  

  I have reproduced the crash on CentOS 7.5, Ubuntu, and Debian 9.5.



Test case here:
 https://github.com/alexandrudsc/gold-linker-threads-segfault  

The repository includes a Dockerfile that recreates the environment I used.

For a simple test just run
```
bash build-docker-image.sh 
```  
This creates the Docker image, compiles the code, and attempts to link.


You can also start your own container if you want to run the build multiple times.
Comment 1 Alexandru Dascalu 2018-09-05 15:34:55 UTC
I forgot to mention that on CentOS and Ubuntu I compiled binutils from source.
Comment 2 Alexandru Dascalu 2019-03-01 07:59:39 UTC
Hi guys, I have reproduced the same bug using the gold linker from binutils 2.32. I don't know if there is any news on this, but this is a gentle reminder.
Comment 3 Richard 2020-06-03 21:45:48 UTC
I see a segmentation fault trying to link a very simple program using `--threads --thread-count 2`.

The full command is:
```
 "/usr/bin/ld.gold" -z relro --hash-style=gnu --build-id --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o test_1.exe /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/crt1.o /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/crti.o /usr/bin/../lib/gcc/x86_64-linux-gnu/9/crtbegin.o -L/usr/bin/../lib/gcc/x86_64-linux-gnu/9 -L/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu -L/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib64 -L/usr/lib/x86_64-linux-gnu/../../lib64 -L/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../.. -L/usr/lib/llvm-10/bin/../lib -L/lib -L/usr/lib -plugin /usr/lib/llvm-10/bin/../lib/LLVMgold.so -plugin-opt=mcpu=x86-64 -plugin-opt=thinlto hilbert.o --threads --thread-count 2 -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/bin/../lib/gcc/x86_64-linux-gnu/9/crtend.o /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/crtn.o
```

This is with `GNU gold (GNU Binutils for Ubuntu 2.34) 1.16` on Lubuntu 20.04.
Comment 4 Richard 2020-06-03 21:47:17 UTC
A traceback with `gdb` gives:

```
#0  0x00007ffff3879f40 in llvm::PMTopLevelManager::addImmutablePass(llvm::ImmutablePass*) ()
   from /usr/lib/llvm-10/bin/../lib/../lib/libLLVM-10.so.1
#1  0x00007ffff387991f in llvm::PMTopLevelManager::schedulePass(llvm::Pass*) () from /usr/lib/llvm-10/bin/../lib/../lib/libLLVM-10.so.1
#2  0x00007ffff469a58e in ?? () from /usr/lib/llvm-10/bin/../lib/../lib/libLLVM-10.so.1
#3  0x00007ffff469989c in llvm::lto::backend(llvm::lto::Config const&, std::function<std::unique_ptr<llvm::lto::NativeObjectStream, std::default_delete<llvm::lto::NativeObjectStream> > (unsigned int)>, unsigned int, std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, llvm::ModuleSummaryIndex&) () from /usr/lib/llvm-10/bin/../lib/../lib/libLLVM-10.so.1
#4  0x00007ffff4693434 in llvm::lto::LTO::runRegularLTO(std::function<std::unique_ptr<llvm::lto::NativeObjectStream, std::default_delete<llvm::lto::NativeObjectStream> > (unsigned int)>) () from /usr/lib/llvm-10/bin/../lib/../lib/libLLVM-10.so.1
#5  0x00007ffff4692f22 in llvm::lto::LTO::run(std::function<std::unique_ptr<llvm::lto::NativeObjectStream, std::default_delete<llvm::lto::NativeObjectStream> > (unsigned int)>, std::function<std::function<std::unique_ptr<llvm::lto::NativeObjectStream, std::default_delete<llvm::lto::NativeObjectStream> > (unsigned int)> (unsigned int, llvm::StringRef)>) ()
   from /usr/lib/llvm-10/bin/../lib/../lib/libLLVM-10.so.1
#6  0x00007ffff749f4ac in ?? () from /usr/lib/llvm-10/bin/../lib/LLVMgold.so
#7  0x000055555567c27f in ?? ()
#8  0x000055555567c3d0 in ?? ()
#9  0x00005555556c1b58 in ?? ()
#10 0x00005555556c1daa in ?? ()
#11 0x0000555555591ee5 in ?? ()
#12 0x00007ffff7ba20b3 in __libc_start_main (main=0x5555555918d0, argc=46, argv=0x7fffffffd458, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd448) at ../csu/libc-start.c:308
#13 0x0000555555592ace in ?? ()
```
Comment 5 Richard 2020-06-03 22:08:23 UTC
I've made a repo with a Makefile that reproduces the bug here: https://github.com/r-barnes/gold_segfault_reproducer

The program I'm trying to compile is a simple "Hello, World!".

The compilation string is:

clang main.c -flto -fuse-ld=gold -Wl,--threads -Wl,--thread-count,4 -v --save-temps

I observe the segfault with both clang-8 and clang-10 using `GNU gold (GNU Binutils for Ubuntu 2.34) 1.16` running on Lubuntu 20.04.

The segfault is intermittent: not every run will trigger the segfault.

Removing `-flto` seems to stop the segfault.

Using `-Wl,--thread-count,1` stops the segfault.
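
Because the crash is intermittent, a small loop helps estimate how often it fires. The following is a minimal sketch, assuming a POSIX shell. `count_failures` is a hypothetical helper and not part of the reproducer repo. It counts any non-zero exit status, so compile errors would be counted alongside segfaults.

```shell
# Hypothetical helper (not from the reproducer repo).
# usage: count_failures RUNS COMMAND [ARGS...]
# Runs COMMAND the given number of times and prints how many runs
# exited with a non-zero status.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Output is discarded; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example invocation, left commented out because it needs clang and gold:
# count_failures 50 clang main.c -flto -fuse-ld=gold -Wl,--threads -Wl,--thread-count,4
```

Running the same loop with `-Wl,--thread-count,1` gives a baseline to compare against.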
Comment 7 Sergei Trofimovich 2020-11-24 22:38:04 UTC
For me the simplest reproducer is the following one-liner:

"""
$ echo 'int main() {}' | x86_64-pc-linux-gnu-gcc -flto -fuse-ld=gold -Wl,--threads -Wl,--thread-count,32 -x c -
collect2: fatal error: ld terminated with signal 11 [Segmentation fault], core dumped
compilation terminated.
"""

(gcc-master, binutils-2.35.1, x86_64-pc-linux-gnu target)

binutils backtrace:

"""
(gdb) bt
#0  gold::Pluginobj::get_symbol_resolution_info (this=0x7fdc10001010, symtab=0x7ffe9622ef50, nsyms=<optimized out>, syms=<optimized out>, version=<optimized out>)
    at ../../binutils-2.35.1/gold/plugin.cc:1293
#1  0x00007fdc94747c7a in write_resolution () at /usr/src/debug/sys-devel/gcc-11.0.0_pre9999/gcc-11.0.0_pre9999/lto-plugin/lto-plugin.c:569
#2  all_symbols_read_handler () at /usr/src/debug/sys-devel/gcc-11.0.0_pre9999/gcc-11.0.0_pre9999/lto-plugin/lto-plugin.c:749
#3  0x000055e7fdf1004f in gold::Plugin::all_symbols_read (this=<optimized out>) at ../../binutils-2.35.1/gold/plugin.cc:403
#4  gold::Plugin_manager::all_symbols_read (this=0x55e7fe561360, workqueue=workqueue@entry=0x7ffe9622ec50, task=task@entry=0x55e7fe5bacc0, input_objects=<optimized out>,
    symtab=<optimized out>, dirpath=<optimized out>, mapfile=0x0, last_blocker=0x55e7fe5bad20) at ../../binutils-2.35.1/gold/plugin.cc:856
#5  0x000055e7fdf1018c in gold::Plugin_hook::run (this=0x55e7fe5bacc0, workqueue=0x7ffe9622ec50) at ../../binutils-2.35.1/gold/plugin.cc:1770
#6  0x000055e7fdf6ba70 in gold::Workqueue::find_and_run_task (this=0x7ffe9622ec50, thread_number=23) at ../../binutils-2.35.1/gold/workqueue.cc:319
#7  0x000055e7fdf6bcca in gold::Workqueue::process (this=0x7ffe9622ec50, thread_number=23) at ../../binutils-2.35.1/gold/workqueue.cc:495
#8  0x000055e7fdf6be23 in gold::Workqueue_threader_threadpool::process (thread_number=<optimized out>, this=<optimized out>) at ../../binutils-2.35.1/gold/workqueue-internal.h:92
#9  gold::Workqueue_thread::thread_body (arg=0x55e7fe5b97d0) at ../../binutils-2.35.1/gold/workqueue-threads.cc:117
#10 0x00007fdc9444be6e in start_thread (arg=0x7fdc3c132640) at pthread_create.c:463
#11 0x00007fdc94381a5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) info threads
  Id   Target Id                           Frame
* 1    Thread 0x7fdc3c132640 (LWP 1087079) gold::Pluginobj::get_symbol_resolution_info (this=0x7fdc10001010, symtab=0x7ffe9622ef50, nsyms=<optimized out>, syms=<optimized out>,
    version=<optimized out>) at ../../binutils-2.35.1/gold/plugin.cc:1293
  2    Thread 0x7fdc90147640 (LWP 1087058) futex_wait_cancelable (private=0, expected=0, futex_word=0x55e7fe567d24) at ../sysdeps/nptl/futex-internal.h:183
  ...
  32   Thread 0x7fdc2812d640 (LWP 1087084) futex_wait_cancelable (private=0, expected=0, futex_word=0x55e7fe567d20) at ../sysdeps/nptl/futex-internal.h:183
"""

valgrind reports an invalid read at the same location:

"""
==1087267== Thread 30:
==1087267== Invalid read of size 1
==1087267==    at 0x458800: gold::Pluginobj::get_symbol_resolution_info(gold::Symbol_table*, int, ld_plugin_symbol*, int) const (plugin.cc:1295)
==1087267==    by 0x484BC79: write_resolution (lto-plugin.c:569)
==1087267==    by 0x484BC79: all_symbols_read_handler (lto-plugin.c:749)
==1087267==    by 0x45704E: all_symbols_read (plugin.cc:403)
==1087267==    by 0x45704E: gold::Plugin_manager::all_symbols_read(gold::Workqueue*, gold::Task*, gold::Input_objects*, gold::Symbol_table*, gold::Dirsearch*, gold::Mapfile*, gold::Task_token**) (plugin.cc:856)
==1087267==    by 0x45718B: gold::Plugin_hook::run(gold::Workqueue*) (plugin.cc:1770)
==1087267==    by 0x4B2A6F: gold::Workqueue::find_and_run_task(int) (workqueue.cc:319)
==1087267==    by 0x4B2CC9: gold::Workqueue::process(int) (workqueue.cc:495)
==1087267==    by 0x4B2E22: process (workqueue-internal.h:92)
==1087267==    by 0x4B2E22: gold::Workqueue_thread::thread_body(void*) (workqueue-threads.cc:117)
==1087267==    by 0x4B42E6D: start_thread (pthread_create.c:463)
==1087267==    by 0x4C55A5E: clone (clone.S:95)
==1087267==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
==1087267==
==1087267==
==1087267== Process terminating with default action of signal 11 (SIGSEGV): dumping core
"""
Comment 8 Sergei Trofimovich 2020-11-24 23:50:50 UTC
binutils built from master does not seem to crash on the simple test from comment 7.

Bisected the fix down to https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=d4820dac5e7608e24fba6d08cde9248b4c4b2928

"""
$ git bisect bad
d4820dac5e7608e24fba6d08cde9248b4c4b2928 is the first bad commit
commit d4820dac5e7608e24fba6d08cde9248b4c4b2928
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Nov 8 04:10:01 2020 -0800

    gold: Avoid sharing Plugin_list::iterator

    class Plugin_manager has

      // A pointer to the current plugin.  Used while loading plugins.
      Plugin_list::iterator current_;

    The same iterator is shared by all threads. It is OK to use it to load
    plugins since only one thread loads plugins.  Avoid sharing Plugin_list
    iterator in all other cases.

            PR gold/26200
            * plugin.cc (Plugin_manager::claim_file): Don't share Plugin_list
            iterator.
            (Plugin_manager::all_symbols_read): Likewise.
            (Plugin_manager::cleanup): Likewise.

 gold/ChangeLog |  8 ++++++++
 gold/plugin.cc | 34 +++++++++++++++++-----------------
 2 files changed, 25 insertions(+), 17 deletions(-)
"""

Looks related. Dupe of bug #26200?
Comment 9 H.J. Lu 2020-11-25 00:05:50 UTC
Dup.

*** This bug has been marked as a duplicate of bug 26200 ***