Bug 23016 - assert in output.h on mix of .eh_frame types for x86_64
Summary: assert in output.h on mix of .eh_frame types for x86_64
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: gold (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Cary Coutant
URL:
Keywords:
: 23155 (view as bug list)
Depends on:
Blocks:
 
Reported: 2018-03-29 13:47 UTC by Than McIntosh
Modified: 2018-05-10 17:46 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
Project(s) to access:
ssh public key:


Attachments
tar file containing C sources, makefile (551 bytes, application/x-tar)
2018-03-29 13:47 UTC, Than McIntosh
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Than McIntosh 2018-03-29 13:47:30 UTC
Created attachment 10924 [details]
tar file containing C sources, makefile

The code in the gold linker that handles relocatable links seems to be having problems when there are a mix of input object files from GCC and Clang (x86_64), in some cases triggering an assert.

Given two C source files (contents unimportant, but each should have at least one function), bug can be triggered by running the following commands:

  gcc -fPIC -g -O2 -c this.c
  clang -fPIC -g -O2 -c other.c
  gcc -fuse-ld=gold  -fPIC -g -O2  -Wl,-r -nostdlib -no-pie this.o other.o -o rel.o

resulting in this assert:

  ld.gold: internal error in set_info_section, at ../../binutils/gold/output.h:3386

I've reproduced this on linux x86_64 with several versions of clang, gcc, and ld.gold (including top of trunk for each). GCC 7 + clang 3.9 + gold 1.15 is one starting point.

Stack trace for the assert looks like:

#0  gold::Output_section::set_info_section (this=0x<...> os=0x<...> at ../../binutils/gold/output.h:3382
#1  0x<...> in gold::Layout::layout_reloc<64, false> (this=0x<...> object=0x<...> shdr=..., data_section=0x<...> rr=0x<...> at ../../binutils/gold/layout.cc:1354
#2  0x<...> in gold::Sized_relobj_file<64, false>::do_layout (this=0x<...> symtab=0x<...> layout=0x<...> sd=0x<...> at ../../binutils/gold/object.cc:1856
#3  0x<...> in gold::Object::layout (this=0x<...> symtab=0x<...> layout=0x<...> sd=0x<...> at ../../binutils/gold/object.h:651
#4  0x<...> in gold::Add_symbols::run (this=0x<...> at ../../binutils/gold/readsyms.cc:634
#5  0x<...> in gold::Workqueue::find_and_run_task (this=0x<...> thread_number=0) at ../../binutils/gold/workqueue.cc:319
#6  0x<...> in gold::Workqueue::process (this=0x<...> thread_number=0) at ../../binutils/gold/workqueue.cc:495
#7  0x<...> in main (argc=6, argv=0x<...> at ../../binutils/gold/main.cc:252

The relocation section being laid out at the point of the assert is one of the .rela.eh_frame sections. 

When I look at the gcc-compiled objects feeding into the link, I see .eh_frame sections that look like:

  [ 7] .eh_frame
       PROGBITS        0000000000000000 000088 000038 00   0   0  8
       [0000000000000002]: ALLOC
  [ 8] .rela.eh_frame
       RELA            0000000000000000 0001e0 000018 18   9   7  8
       [0000000000000040]: INFO LINK

The corresponding sections from LLVM-backend compiled objects look like

  [16] .eh_frame
       X86_64_UNWIND   0000000000000000 0001e8 000030 00   0   0  8
       [0000000000000002]: ALLOC
  [17] .rela.eh_frame
       RELA            0000000000000000 000470 000018 18  20  16  8
       [0000000000000000]:

Note the section types -- this seems to be the crux of the problem. LLVM (as of 2013 or so) sets the section type to X86_64_UNWIND, not PROGBITS. The X86_64 ps-ABI recommends using this section type, but not all producers do (gcc does not).

Googling assert in question turns up a few other instances, including:

  https://stackoverflow.com/questions/47797817/ld-gold-internal-error-in-set-info-section-at-output-h3209
  https://sourceware.org/bugzilla/show_bug.cgi?id=15861

however it's not clear whether this is the same bug (some of the instances out in the wild seem to involve the use of a linker script).

I spent a while in the debugger looking at what is happening. When gold processes the two different .eh_frame sections, it effectively treats them as distinct sections in the output file (since output section lookup is based on name, type, and flags). In the case of the .rela.eh_frame sections, however, we get only a single output section (since both input ".rela.eh_frame" sections have the same name/type/flags). The assert is happening when Layout::layout_reloc is called on the second relocation section. The code looks like:

    gold_assert((this->info_section_ == NULL
		 || (this->info_section_ == os
		     && this->info_uses_section_index_))
		&& this->info_symndx_ == NULL
		&& this->info_ == 0);

Here "info_section_" is already pointing to the first output section for .eh_frame, however we're trying to set it to the second output section.

I have been experimenting with fixes.  I note that for non-relocatable links, gold is already effectively merging PROGBITS + X86_64_UNWIND .eh_frame sections into a single merged PROGBITS section (see the "FIXME" comment in Layout::make_eh_frame_section).  With this in mind, a similar way to solve the problem would be to do the same merging for relocatable links. 

A patch to do this appears below; this fixes the assert.

A second possibility would be to change things so that the two separate "flavors" of .eh_frame section are preserved in the output object for relocatable links, meaning that there would have to be two separate relocation sections as well. This would mean special handling of some sort when creating output sections for .rela.eh_frame sections (e.g. not just calling Layout::choose_output_section).

I am attaching a reproducer; unpack the tar file and run "make" (may need to edit the makefile so that correct gcc and clang paths are picked up). Ex:

$ make THISCC=gcc-7 OTHERCC=clang-3.9 all
gcc-7 -fPIC -g -O2 -c this.c
clang-3.9 -fPIC -g -O2 -c other.c
gcc-7 -fuse-ld=gold  -fPIC -g -O2 -Wl,-r -nostdlib -no-pie this.o other.o -o rel.o 
/usr/bin/ld.gold: internal error in set_info_section, at ../../gold/output.h:3386
collect2: error: ld returned 1 exit status
makefile:19: recipe for target 'rel.o' failed
make: *** [rel.o] Error 1
$

-----

Tentative patch:

diff --git a/gold/layout.cc b/gold/layout.cc
index f5fe805ea..987e12092 100644
--- a/gold/layout.cc
+++ b/gold/layout.cc
@@ -829,6 +829,13 @@ Layout::get_output_section(const char* name, Stringpool::Key name_key,
       || lookup_type == elfcpp::SHT_PREINIT_ARRAY)
     lookup_type = elfcpp::SHT_PROGBITS;
 
+  // Some compilers assign a type of SHT_PROGBITS to the .eh_frame
+  // section, while others use the psABI-recommended SHT_X86_64_UNWIND.
+  // Insure that we combine the two in the resulting output file (mainly
+  // an issue for relocatable links).
+  if (lookup_type == elfcpp::SHT_X86_64_UNWIND)
+    lookup_type = elfcpp::SHT_PROGBITS;
+
   elfcpp::Elf_Xword lookup_flags = flags;
 
   // Ignoring SHF_WRITE and SHF_EXECINSTR here means that we combine
Comment 1 Than McIntosh 2018-03-30 16:14:31 UTC
After discussion with colleagues, I'm now opting in favor of the second proposal (being more selective about merging relocation sections). Sent a patch to the mailing list: https://sourceware.org/ml/binutils/2018-03/msg00416.html.
Comment 2 Sourceware Commits 2018-04-03 02:07:57 UTC
The master branch has been updated by Cary Coutant <ccoutant@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=bce5a025d2ed7eda2c5bbb85bd9b33333ca5d556

commit bce5a025d2ed7eda2c5bbb85bd9b33333ca5d556
Author: Cary Coutant <ccoutant@gmail.com>
Date:   Mon Apr 2 16:12:10 2018 -0700

    Fix problem where mixed section types can cause internal error during a -r link.
    
    During a -r (or --emit-relocs) link, if two sections had the same name but
    different section types, gold would put relocations for both sections into
    the same relocation section even though the data sections remained separate.
    
    For .eh_frame sections, when one section is PROGBITS and another is
    X86_64_UNWIND, we really should be using the UNWIND section type and
    combining the sections anyway.  For other sections, we should be
    creating one relocation section for each output data section.
    
    gold/
    	PR gold/23016
    	* incremental.cc (can_incremental_update): Check for unwind section
    	type.
    	* layout.h (Layout::layout): Add sh_type parameter.
    	* layout.cc (Layout::layout): Likewise.
    	(Layout::layout_reloc): Create new output reloc section if data
    	section does not already have one.
    	(Layout::layout_eh_frame): Check for unwind section type.
    	(Layout::make_eh_frame_section): Use unwind section type for .eh_frame
    	and .eh_frame_hdr.
    	* object.h (Sized_relobj_file::Shdr_write): New typedef.
    	(Sized_relobj_file::layout_section): Add sh_type parameter.
    	(Sized_relobj_file::Deferred_layout::Deferred_layout): Add sh_type
    	parameter.
    	* object.cc (Sized_relobj_file::check_eh_frame_flags): Check for
    	unwind section type.
    	(Sized_relobj_file::layout_section): Add sh_type parameter; pass it
    	to Layout::layout.
    	(Sized_relobj_file::do_layout): Make local copy of sh_type.
    	Force .eh_frame sections to unwind section type.
    	Pass sh_type to layout_section.
    	(Sized_relobj_file<size, big_endian>::do_layout_deferred_sections):
    	Pass sh_type to layout_section.
    	* output.cc (Output_section::Output_section): Initialize reloc_section_.
    	* output.h (Output_section::reloc_section): New method.
    	(Output_section::set_reloc_section): New method.
    	(Output_section::reloc_section_): New data member.
    	* target.h (Target::unwind_section_type): New method.
    	(Target::Target_info::unwind_section_type): New data member.
    
    	* aarch64.cc (aarch64_info): Add unwind_section_type.
    	* arm.cc (arm_info, arm_nacl_info): Likewise.
    	* i386.cc (i386_info, i386_nacl_info, iamcu_info): Likewise.
    	* mips.cc (mips_info, mips_nacl_info): Likewise.
    	* powerpc.cc (powerpc_info): Likewise.
    	* s390.cc (s390_info): Likewise.
    	* sparc.cc (sparc_info): Likewise.
    	* tilegx.cc (tilegx_info): Likewise.
    	* x86_64.cc (x86_64_info, x86_64_nacl_info): Likewise.
    
    	* testsuite/Makefile.am (pr23016_1, pr23016_2): New test cases.
    	* testsuite/Makefile.in: Regenerate.
    	* testsuite/testfile.cc: Add unwind_section_type.
    	* testsuite/pr23016_1.sh: New test script.
    	* testsuite/pr23016_1a.s: New source file.
    	* testsuite/pr23016_1b.s: New source file.
    	* testsuite/pr23016_2.sh: New test script.
    	* testsuite/pr23016_2a.s: New source file.
    	* testsuite/pr23016_2b.s: New source file.
Comment 3 Cary Coutant 2018-04-04 16:50:18 UTC
Fixed on trunk.
Comment 4 Cary Coutant 2018-05-10 17:46:54 UTC
*** Bug 23155 has been marked as a duplicate of this bug. ***