Bug 20642 - internal error in get_section_contents, at icf.cc:467 with --icf=safe
Summary: internal error in get_section_contents, at icf.cc:467 with --icf=safe
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: gold (show other bugs)
Version: 2.28
: P2 normal
Target Milestone: ---
Assignee: Cary Coutant
URL:
Keywords:
: 22820 (view as bug list)
Depends on:
Blocks:
 
Reported: 2016-09-27 17:35 UTC by Guillaume Morin
Modified: 2018-04-25 12:23 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2018-04-24 00:00:00


Attachments
readelf -S output (30.26 KB, text/plain)
2016-09-27 17:35 UTC, Guillaume Morin
Details
Replaces assert with fatal error providing additional info (479 bytes, patch)
2018-04-23 17:13 UTC, Cary Coutant
Details | Diff
objects_32/std/encoding.o from an ldc2 1.8.0 build. (61.38 KB, application/x-object)
2018-04-24 10:37 UTC, Marco Leise
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Guillaume Morin 2016-09-27 17:35:10 UTC
Created attachment 9531 [details]
readelf -S output

This assertion triggers when linking objects that were created with clang 3.8.0. I can reproduce this with the current git master

The cause of the assertion is that the section_info vector contains twice the same exact reloc (Note that I have modified the code to call abort in do_gold_unreachable to simplify debugging)

(gdb) bt
#0  0x00007ffff7561067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff7562448 in __GI_abort () at abort.c:89
#2  0x00000000006c1bd4 in gold::do_gold_unreachable (filename=0x99f9f5 "icf.cc", lineno=467, 
    function=0x99fba0 <gold::get_section_contents(bool, std::pair<gold::Relobj*, unsigned int> const&, unsigned int, unsigned int*, gold::Symbol_table*, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<std::string, std::allocator<std::string> >*)::__FUNCTION__> "get_section_contents") at gold.cc:102
#3  0x00000000006c7baf in gold::get_section_contents (first_iteration=true, secn=..., section_num=1141, num_tracked_relocs=0x7fffffff6498, symtab=0x7fffffff6ef0, kept_section_id=..., section_contents=0x7fffffff6660) at icf.cc:467
#4  0x00000000006c8252 in gold::match_sections (iteration_num=1, symtab=0x7fffffff6ef0, num_tracked_relocs=0x7fffffff6680, kept_section_id=0x7fffffff6e68, id_section=..., is_secn_or_group_unique=0x7fffffff65f0, section_contents=0x7fffffff6660)
    at icf.cc:624
#5  0x00000000006c8d81 in gold::Icf::find_identical_sections (this=0x7fffffff6e20, input_objects=0x7fffffff6d00, symtab=0x7fffffff6ef0) at icf.cc:783
#6  0x00000000006c2ef7 in gold::queue_middle_tasks (options=..., task=0xea3230, input_objects=0x7fffffff6d00, symtab=0x7fffffff6ef0, layout=0x7fffffff7170, workqueue=0x7fffffff6be0, mapfile=0x0) at gold.cc:526
#7  0x00000000006c1c25 in gold::Middle_runner::run (this=0xea31f0, workqueue=0x7fffffff6be0, task=0xea3230) at gold.cc:135
#8  0x00000000006c48af in gold::Task_function::run (this=0xea3230, workqueue=0x7fffffff6be0) at workqueue.h:178
#9  0x000000000084f2d9 in gold::Workqueue::find_and_run_task (this=0x7fffffff6be0, thread_number=0) at workqueue.cc:319
#10 0x000000000084f8de in gold::Workqueue::process (this=0x7fffffff6be0, thread_number=0) at workqueue.cc:495
#11 0x0000000000406e06 in main (argc=121, argv=0x7fffffffd728) at main.cc:252
(gdb) frame 3
#3  0x00000000006c7baf in gold::get_section_contents (first_iteration=true, secn=..., section_num=1141, num_tracked_relocs=0x7fffffff6498, symtab=0x7fffffff6ef0, kept_section_id=..., section_contents=0x7fffffff6660) at icf.cc:467
467			  gold_assert (offset < (long long) secn_len);
(gdb) print v
$1 = (gold::Icf::Sections_reachable_info &) @0x6d71c70: {<std::_Vector_base<std::pair<gold::Relobj*, unsigned int>, std::allocator<std::pair<gold::Relobj*, unsigned int> > >> = {
    _M_impl = {<std::allocator<std::pair<gold::Relobj*, unsigned int> >> = {<__gnu_cxx::new_allocator<std::pair<gold::Relobj*, unsigned int> >> = {<No data fields>}, <No data fields>}, _M_start = 0x6d718a0, _M_finish = 0x6d718c0, 
      _M_end_of_storage = 0x6d718c0}}, <No data fields>}
(gdb) print *(v->_M_impl->_M_start)
$2 = {first = 0x13cba80, second = 2105}
(gdb) print *(v->_M_impl->_M_start+1)
$3 = {first = 0x13cba80, second = 2105}

I am attaching the full readelf -S output for the .o
Comment 1 Cary Coutant 2016-09-27 19:07:18 UTC
Sri, can you take a look at this, please?
Comment 2 Sriraman Tallam 2016-09-27 19:10:04 UTC
(In reply to Cary Coutant from comment #1)
> Sri, can you take a look at this, please?

Will do!
Comment 3 Nicolas Capens 2017-06-05 17:44:28 UTC
Hi! Has there been any kind of progress on this bug? It's preventing Google Chrome from building with either --icf=all or --icf=safe: crbug.com/729532
Comment 4 Sriraman Tallam 2017-06-05 17:53:48 UTC
I guess I dropped the ball on this, been unable to get to it, sorry.  Let me take a look today.
Comment 5 Nicolas Capens 2017-06-08 19:40:33 UTC
We worked around this by using --icf=none for the affected project. It resulted in a 10% library size increase, but fortunately that's not too significant overall and it only affects 32-bit builds.
Comment 6 Sriraman Tallam 2017-06-08 20:06:20 UTC
On Thu, Jun 8, 2017 at 12:40 PM, nicolas.capens at gmail dot com
<sourceware-bugzilla@sourceware.org> wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=20642
>
> --- Comment #5 from Nicolas Capens <nicolas.capens at gmail dot com> ---
> We worked around this by using --icf=none for the affected project. It resulted
> in a 10% library size increase, but fortunately that's not too significant
> overall and it only affects 32-bit builds.

Sorry about that, I am having trouble reproing it though.  Do you have
a simpler repro? I could only work on it a little this week.

>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
> You are the assignee for the bug.
Comment 7 Guillaume Morin 2017-12-05 15:31:15 UTC
fwiw I can't reproduce this anymore with 2.29.1 (using clang 5.0)
Comment 8 Marco Leise 2018-04-23 15:10:58 UTC
I see this error when building ldc2 1.7.0 or 1.8.0 (the Dlang compiler) with gcc 7.2.0 and binutils 2.29.1 with --ifc=safe when compiling the 32-bit version (using -DMULTILIB=ON):

/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../x86_64-pc-linux-gnu/bin/ld: internal error in get_section_contents, at ../../binutils-2.29.1/gold/icf.cc:467

Unfortunately this is not a small repro case, only another data point. All other special linker flags like --gc-sections or --flto do not affect the outcome.
Comment 9 Cary Coutant 2018-04-23 17:13:24 UTC
Created attachment 10973 [details]
Replaces assert with fatal error providing additional info

If you can, could you please apply this patch to gold, and rerun your failing build? The additional info in the error message should at least help narrow down the cause of the problem.
Comment 10 Marco Leise 2018-04-24 10:31:29 UTC
It fails on the file:
https://github.com/ldc-developers/phobos/blob/91ebe89f3e7fc96594afa84c53f93d3bfcb5be6e/std/encoding.d

The error is:
fatal error: objects_32/std/encoding.o (shndx 150): reloc target offset 0xfffffebe beyond end of section

Whatever that means, 0xfffffebe is probably not a good starting address for anything that grows upwards.
Comment 11 Marco Leise 2018-04-24 10:37:08 UTC
Created attachment 10986 [details]
objects_32/std/encoding.o from an ldc2 1.8.0 build.
Comment 12 Cary Coutant 2018-04-24 21:49:40 UTC
Here's a disassembly of the first part of the function in section 150:

Disassembly of section .text._D3std8encoding__T6decodeTAxEQBbQBa10Latin2CharZQBeFNaNbNiNfKQBjZw:

00000000 <_D3std8encoding__T6decodeTAxEQBbQBa10Latin2CharZQBeFNaNbNiNfKQBjZw>:
   0:   53                      push   %ebx
   1:   56                      push   %esi
   2:   50                      push   %eax
   3:   89 c1                   mov    %eax,%ecx
   5:   e8 00 00 00 00          call   a <_D3std8encoding__T6decodeTAxEQBbQBa10Latin2CharZQBeFNaNbNiNfKQBjZw+0xa>
   a:   5b                      pop    %ebx
   b:   8b 11                   mov    (%ecx),%edx
   d:   81 c3 05 00 00 00       add    $0x5,%ebx
                        f: R_386_GOTPC  _GLOBAL_OFFSET_TABLE_
  13:   85 d2                   test   %edx,%edx
  15:   74 2d                   je     44 <_D3std8encoding__T6decodeTAxEQBbQBa10Latin2CharZQBeFNaNbNiNfKQBjZw+0x44>
  17:   8b 71 04                mov    0x4(%ecx),%esi
  1a:   4a                      dec    %edx
  1b:   0f b6 06                movzbl (%esi),%eax
  1e:   46                      inc    %esi
  1f:   89 11                   mov    %edx,(%ecx)
  21:   89 71 04                mov    %esi,0x4(%ecx)
  24:   3d a1 00 00 00          cmp    $0xa1,%eax
  29:   72 13                   jb     3e <_D3std8encoding__T6decodeTAxEQBbQBa10Latin2CharZQBeFNaNbNiNfKQBjZw+0x3e>
  2b:   8d 88 5f ff ff ff       lea    -0xa1(%eax),%ecx
  31:   83 f9 5e                cmp    $0x5e,%ecx
  34:   77 27                   ja     5d <_D3std8encoding__T6decodeTAxEQBbQBa10Latin2CharZQBeFNaNbNiNfKQBjZw+0x5d>
  36:   0f b7 84 43 be fe ff    movzwl -0x142(%ebx,%eax,2),%eax
  3d:   ff 
                        3a: R_386_GOTOFF        .L.str.69

(Side question: where does this name mangling scheme come from? It's not the standard C++ scheme.)

We have there at offset 0x36 a reference to .L.str.69 - 0x142 (aka 0xfffffebe). That symbol is a local symbol defined at offset 0 in section 2386:

    52: 00000000   192 OBJECT  LOCAL  DEFAULT 2386 .L.str.69

And section 2386 is a string merge section:

  [2386] .rodata.str2.16   PROGBITS        00000000 011eb0 000322 02 AMS  0   0 16

I'm not sure what that -0x142 offset is being used for, but I'll have to assume that the instruction is meant to address the string at .L.str.69, and the offset is compensation for some positive offset introduced elsewhere. In other words, for the purposes of figuring out what piece of the merge section we're referencing, we should ignore the addend.

We have a heuristic to determine whether we should ignore the addend or not in these cases -- we treat an addend in the range 0xffffff00-0xffffffff as a small negative addend as a compensation for a pc-relative reference, and ignore it. We treat everything else as a positive addend and consider it as part of the expression to determine what item is being referenced.

Unfortunately, that heuristic should only be applied for pc-relative relocations, and only for references to section symbols. Neither of those conditions apply here, and we should ignore the addend regardless of its value. For relocating the code normally, that is the case, and we should generate the right code eventually. For --icf, however, we are incorrectly applying that heuristic to this case, and that's leading to this error.
Comment 13 Marco Leise 2018-04-25 00:33:39 UTC
(In reply to Cary Coutant from comment #12)
> Here's a disassembly of the first part of the function in section 150:
> 
> Disassembly of section
> .text._D3std8encoding__T6decodeTAxEQBbQBa10Latin2CharZQBeFNaNbNiNfKQBjZw:
> [...]
> 
> (Side question: where does this name mangling scheme come from? It's not the
> standard C++ scheme.)

That is Dlang mangling and means "pure nothrow @nogc @safe dchar std.encoding.decode!(const(std.encoding.Latin2Char)[]).decode(ref const(std.encoding.Latin2Char)[])", which is an instance of this generic/template function:
https://github.com/ldc-developers/phobos/blob/91ebe89f3e7fc96594afa84c53f93d3bfcb5be6e/std/encoding.d#L1961
instanced with this type (basically a #typedef ubyte):
https://github.com/ldc-developers/phobos/blob/91ebe89f3e7fc96594afa84c53f93d3bfcb5be6e/std/encoding.d#L957

It boils down to reading a character 'c' off the string and then translating it through a lookup table to a UTF-16 code point:

    private static immutable dchar m_charMapStart = 0xa1;
    private static immutable dchar m_charMapEnd = 0xff;

    private immutable wstring charMap =
        "\u0104\u02D8\u0141\u00A4\u013D\u015A\u00A7\u00A8"~
        "\u0160\u015E\u0164\u0179\u00AD\u017D\u017B\u00B0"~
        "\u0105\u02DB\u0142\u00B4\u013E\u015B\u02C7\u00B8"~
        "\u0161\u015F\u0165\u017A\u02DD\u017E\u017C\u0154"~
        "\u00C1\u00C2\u0102\u00C4\u0139\u0106\u00C7\u010C"~
        "\u00C9\u0118\u00CB\u011A\u00CD\u00CE\u010E\u0110"~
        "\u0143\u0147\u00D3\u00D4\u0150\u00D6\u00D7\u0158"~
        "\u016E\u00DA\u0170\u00DC\u00DD\u0162\u00DF\u0155"~
        "\u00E1\u00E2\u0103\u00E4\u013A\u0107\u00E7\u010D"~
        "\u00E9\u0119\u00EB\u011B\u00ED\u00EE\u010F\u0111"~
        "\u0144\u0148\u00F3\u00F4\u0151\u00F6\u00F7\u0159"~
        "\u016F\u00FA\u0171\u00FC\u00FD\u0163\u02D9";

    return (c >= m_charMapStart && c <= m_charMapEnd) ? charMap[c-m_charMapStart] : c;

There is a safety range check in place here. That's where the "cmp    $0x5e,%ecx" comes from in the assembly before the table 'charMap' is indexed and the heuristic fails.
Comment 14 Cary Coutant 2018-04-25 00:53:56 UTC
> It boils down to reading a character 'c' off the string and then translating it
> through a lookup table to a UTF-16 code point:
>
>     private static immutable dchar m_charMapStart = 0xa1;
>     private static immutable dchar m_charMapEnd = 0xff;
>
>     private immutable wstring charMap =
>         "\u0104\u02D8\u0141\u00A4\u013D\u015A\u00A7\u00A8"~
>         "\u0160\u015E\u0164\u0179\u00AD\u017D\u017B\u00B0"~
>         "\u0105\u02DB\u0142\u00B4\u013E\u015B\u02C7\u00B8"~
>         "\u0161\u015F\u0165\u017A\u02DD\u017E\u017C\u0154"~
>         "\u00C1\u00C2\u0102\u00C4\u0139\u0106\u00C7\u010C"~
>         "\u00C9\u0118\u00CB\u011A\u00CD\u00CE\u010E\u0110"~
>         "\u0143\u0147\u00D3\u00D4\u0150\u00D6\u00D7\u0158"~
>         "\u016E\u00DA\u0170\u00DC\u00DD\u0162\u00DF\u0155"~
>         "\u00E1\u00E2\u0103\u00E4\u013A\u0107\u00E7\u010D"~
>         "\u00E9\u0119\u00EB\u011B\u00ED\u00EE\u010F\u0111"~
>         "\u0144\u0148\u00F3\u00F4\u0151\u00F6\u00F7\u0159"~
>         "\u016F\u00FA\u0171\u00FC\u00FD\u0163\u02D9";
>
>     return (c >= m_charMapStart && c <= m_charMapEnd) ?
> charMap[c-m_charMapStart] : c;
>
> There is a safety range check in place here. That's where the "cmp
> $0x5e,%ecx" comes from in the assembly before the table 'charMap' is indexed
> and the heuristic fails.

Aha! And 0x142 == m_charMapStart * sizeof(wchar). So it's just folding
the index computation into the base address.

-cary
Comment 15 Cary Coutant 2018-04-25 05:10:09 UTC
*** Bug 22820 has been marked as a duplicate of this bug. ***
Comment 16 Sourceware Commits 2018-04-25 05:15:04 UTC
The master branch has been updated by Cary Coutant <ccoutant@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=651d16203867f8013a0f78a2f2e24df8c02d1377

commit 651d16203867f8013a0f78a2f2e24df8c02d1377
Author: Cary Coutant <ccoutant@gmail.com>
Date:   Tue Apr 24 21:57:37 2018 -0700

    Fix bug with relocation addends and merge sections with --icf.
    
    During --icf processing, gold was incorrectly processing the relocation
    addend for references to items in a merge section. PC-relative references
    and other forms of reference with a biased base address require a
    non-section local symbol, where the addend is purely the bias.
    
    gold/
    	PR gold/20642
    	PR gold/22820
    	* gc.h (gc_process_relocs): Flag STT_SECTION symbols in symvec.
    	* icf.cc (get_section_contents): For merge sections, ignore the
    	addend for relocations against non-section symbols.
Comment 17 Cary Coutant 2018-04-25 05:17:12 UTC
Fixed on master branch.
Comment 18 Marco Leise 2018-04-25 11:55:19 UTC
And here is the confirmation from my end: ldc2 1.8.0 compiles now! I'll check if any other package that used to fail with --ifc=safe on my system got fixed by this.
Comment 19 Marco Leise 2018-04-25 12:23:18 UTC
It looks like two more Dlang packages were affected by this, the first of which compiles now. The second is a shared library and is hit by "relocation R_386_GOTOFF against preemptible symbol <...> cannot be used when making a shared object" issues when compiled with dmd (the language's reference compiler based on a formerly proprietary backend) targeting i386. Dlang compilers ldc2 and gdc (based on llvm and gcc) work fine, so I think it is just a backend issue.