Created attachment 9531 [details] readelf -S output This assertion triggers when linking objects that were created with clang 3.8.0. I can reproduce this with the current git master The cause of the assertion is that the section_info vector contains twice the same exact reloc (Note that I have modified the code to call abort in do_gold_unreachable to simplify debugging) (gdb) bt #0 0x00007ffff7561067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x00007ffff7562448 in __GI_abort () at abort.c:89 #2 0x00000000006c1bd4 in gold::do_gold_unreachable (filename=0x99f9f5 "icf.cc", lineno=467, function=0x99fba0 <gold::get_section_contents(bool, std::pair<gold::Relobj*, unsigned int> const&, unsigned int, unsigned int*, gold::Symbol_table*, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<std::string, std::allocator<std::string> >*)::__FUNCTION__> "get_section_contents") at gold.cc:102 #3 0x00000000006c7baf in gold::get_section_contents (first_iteration=true, secn=..., section_num=1141, num_tracked_relocs=0x7fffffff6498, symtab=0x7fffffff6ef0, kept_section_id=..., section_contents=0x7fffffff6660) at icf.cc:467 #4 0x00000000006c8252 in gold::match_sections (iteration_num=1, symtab=0x7fffffff6ef0, num_tracked_relocs=0x7fffffff6680, kept_section_id=0x7fffffff6e68, id_section=..., is_secn_or_group_unique=0x7fffffff65f0, section_contents=0x7fffffff6660) at icf.cc:624 #5 0x00000000006c8d81 in gold::Icf::find_identical_sections (this=0x7fffffff6e20, input_objects=0x7fffffff6d00, symtab=0x7fffffff6ef0) at icf.cc:783 #6 0x00000000006c2ef7 in gold::queue_middle_tasks (options=..., task=0xea3230, input_objects=0x7fffffff6d00, symtab=0x7fffffff6ef0, layout=0x7fffffff7170, workqueue=0x7fffffff6be0, mapfile=0x0) at gold.cc:526 #7 0x00000000006c1c25 in gold::Middle_runner::run (this=0xea31f0, workqueue=0x7fffffff6be0, task=0xea3230) at gold.cc:135 #8 0x00000000006c48af in gold::Task_function::run (this=0xea3230, workqueue=0x7fffffff6be0) at workqueue.h:178 #9 0x000000000084f2d9 in gold::Workqueue::find_and_run_task (this=0x7fffffff6be0, thread_number=0) at workqueue.cc:319 #10 0x000000000084f8de in gold::Workqueue::process (this=0x7fffffff6be0, thread_number=0) at workqueue.cc:495 #11 0x0000000000406e06 in main (argc=121, argv=0x7fffffffd728) at main.cc:252 (gdb) frame 3 #3 0x00000000006c7baf in gold::get_section_contents (first_iteration=true, secn=..., section_num=1141, num_tracked_relocs=0x7fffffff6498, symtab=0x7fffffff6ef0, kept_section_id=..., section_contents=0x7fffffff6660) at icf.cc:467 467 gold_assert (offset < (long long) secn_len); (gdb) print v $1 = (gold::Icf::Sections_reachable_info &) @0x6d71c70: {<std::_Vector_base<std::pair<gold::Relobj*, unsigned int>, std::allocator<std::pair<gold::Relobj*, unsigned int> > >> = { _M_impl = {<std::allocator<std::pair<gold::Relobj*, unsigned int> >> = {<__gnu_cxx::new_allocator<std::pair<gold::Relobj*, unsigned int> >> = {<No data fields>}, <No data fields>}, _M_start = 0x6d718a0, _M_finish = 0x6d718c0, _M_end_of_storage = 0x6d718c0}}, <No data fields>} (gdb) print *(v->_M_impl->_M_start) $2 = {first = 0x13cba80, second = 2105} (gdb) print *(v->_M_impl->_M_start+1) $3 = {first = 0x13cba80, second = 2105} I am attaching the full readelf -S output for the .o
Sri, can you take a look at this, please?
(In reply to Cary Coutant from comment #1) > Sri, can you take a look at this, please? Will do!
Hi! Has there been any kind of progress on this bug? It's preventing Google Chrome from building with either --icf=all or --icf=safe: crbug.com/729532
I guess I dropped the ball on this, been unable to get to it, sorry. Let me take a look today.
We worked around this by using --icf=none for the affected project. It resulted in a 10% library size increase, but fortunately that's not too significant overall and it only affects 32-bit builds.
On Thu, Jun 8, 2017 at 12:40 PM, nicolas.capens at gmail dot com <sourceware-bugzilla@sourceware.org> wrote: > https://sourceware.org/bugzilla/show_bug.cgi?id=20642 > > --- Comment #5 from Nicolas Capens <nicolas.capens at gmail dot com> --- > We worked around this by using --icf=none for the affected project. It resulted > in a 10% library size increase, but fortunately that's not too significant > overall and it only affects 32-bit builds. Sorry about that, I am having trouble reproing it though. Do you have a simpler repro? I could only work on it a little this week. > > -- > You are receiving this mail because: > You are on the CC list for the bug. > You are the assignee for the bug.
fwiw I can't reproduce this anymore with 2.29.1 (using clang 5.0)
I see this error when building ldc2 1.7.0 or 1.8.0 (the Dlang compiler) with gcc 7.2.0 and binutils 2.29.1 with --ifc=safe when compiling the 32-bit version (using -DMULTILIB=ON): /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../x86_64-pc-linux-gnu/bin/ld: internal error in get_section_contents, at ../../binutils-2.29.1/gold/icf.cc:467 Unfortunately this is not a small repro case, only another data point. All other special linker flags like --gc-sections or --flto do not affect the outcome.
Created attachment 10973 [details] Replaces assert with fatal error providing additional info If you can, could you please apply this patch to gold, and rerun your failing build? The additional info in the error message should at least help narrow down the cause of the problem.
It fails on the file: https://github.com/ldc-developers/phobos/blob/91ebe89f3e7fc96594afa84c53f93d3bfcb5be6e/std/encoding.d The error is: fatal error: objects_32/std/encoding.o (shndx 150): reloc target offset 0xfffffebe beyond end of section Whatever that means, 0xfffffebe is probably not a good starting address for anything that grows upwards.
Created attachment 10986 [details] objects_32/std/encoding.o from an ldc2 1.8.0 build.
Here's a disassembly of the first part of the function in section 150: Disassembly of section .text._D3std8encoding__T6decodeTAxEQBbQBa10Latin2CharZQBeFNaNbNiNfKQBjZw: 00000000 <_D3std8encoding__T6decodeTAxEQBbQBa10Latin2CharZQBeFNaNbNiNfKQBjZw>: 0: 53 push %ebx 1: 56 push %esi 2: 50 push %eax 3: 89 c1 mov %eax,%ecx 5: e8 00 00 00 00 call a <_D3std8encoding__T6decodeTAxEQBbQBa10Latin2CharZQBeFNaNbNiNfKQBjZw+0xa> a: 5b pop %ebx b: 8b 11 mov (%ecx),%edx d: 81 c3 05 00 00 00 add $0x5,%ebx f: R_386_GOTPC _GLOBAL_OFFSET_TABLE_ 13: 85 d2 test %edx,%edx 15: 74 2d je 44 <_D3std8encoding__T6decodeTAxEQBbQBa10Latin2CharZQBeFNaNbNiNfKQBjZw+0x44> 17: 8b 71 04 mov 0x4(%ecx),%esi 1a: 4a dec %edx 1b: 0f b6 06 movzbl (%esi),%eax 1e: 46 inc %esi 1f: 89 11 mov %edx,(%ecx) 21: 89 71 04 mov %esi,0x4(%ecx) 24: 3d a1 00 00 00 cmp $0xa1,%eax 29: 72 13 jb 3e <_D3std8encoding__T6decodeTAxEQBbQBa10Latin2CharZQBeFNaNbNiNfKQBjZw+0x3e> 2b: 8d 88 5f ff ff ff lea -0xa1(%eax),%ecx 31: 83 f9 5e cmp $0x5e,%ecx 34: 77 27 ja 5d <_D3std8encoding__T6decodeTAxEQBbQBa10Latin2CharZQBeFNaNbNiNfKQBjZw+0x5d> 36: 0f b7 84 43 be fe ff movzwl -0x142(%ebx,%eax,2),%eax 3d: ff 3a: R_386_GOTOFF .L.str.69 (Side question: where does this name mangling scheme come from? It's not the standard C++ scheme.) We have there at offset 0x36 a reference to .L.str.69 - 0x142 (aka 0xfffffebe). That symbol is a local symbol defined at offset 0 in section 2386: 52: 00000000 192 OBJECT LOCAL DEFAULT 2386 .L.str.69 And section 2386 is a string merge section: [2386] .rodata.str2.16 PROGBITS 00000000 011eb0 000322 02 AMS 0 0 16 I'm not sure what that -0x142 offset is being used for, but I'll have to assume that the instruction is meant to address the string at .L.str.69, and the offset is compensation for some positive offset introduced elsewhere. In other words, for the purposes of figuring out what piece of the merge section we're referencing, we should ignore the addend. We have a heuristic to determine whether we should ignore the addend or not in these cases -- we treat an addend in the range 0xffffff00-0xffffffff as a small negative addend as a compensation for a pc-relative reference, and ignore it. We treat everything else as a positive addend and consider it as part of the expression to determine what item is being referenced. Unfortunately, that heuristic should only be applied for pc-relative relocations, and only for references to section symbols. Neither of those conditions apply here, and we should ignore the addend regardless of its value. For relocating the code normally, that is the case, and we should generate the right code eventually. For --icf, however, we are incorrectly applying that heuristic to this case, and that's leading to this error.
(In reply to Cary Coutant from comment #12) > Here's a disassembly of the first part of the function in section 150: > > Disassembly of section > .text._D3std8encoding__T6decodeTAxEQBbQBa10Latin2CharZQBeFNaNbNiNfKQBjZw: > [...] > > (Side question: where does this name mangling scheme come from? It's not the > standard C++ scheme.) That is Dlang mangling and means "pure nothrow @nogc @safe dchar std.encoding.decode!(const(std.encoding.Latin2Char)[]).decode(ref const(std.encoding.Latin2Char)[])", which is an instance of this generic/template function: https://github.com/ldc-developers/phobos/blob/91ebe89f3e7fc96594afa84c53f93d3bfcb5be6e/std/encoding.d#L1961 instanced with this type (basically a #typedef ubyte): https://github.com/ldc-developers/phobos/blob/91ebe89f3e7fc96594afa84c53f93d3bfcb5be6e/std/encoding.d#L957 It boils down to reading a character 'c' off the string and then translating it through a lookup table to a UTF-16 code point: private static immutable dchar m_charMapStart = 0xa1; private static immutable dchar m_charMapEnd = 0xff; private immutable wstring charMap = "\u0104\u02D8\u0141\u00A4\u013D\u015A\u00A7\u00A8"~ "\u0160\u015E\u0164\u0179\u00AD\u017D\u017B\u00B0"~ "\u0105\u02DB\u0142\u00B4\u013E\u015B\u02C7\u00B8"~ "\u0161\u015F\u0165\u017A\u02DD\u017E\u017C\u0154"~ "\u00C1\u00C2\u0102\u00C4\u0139\u0106\u00C7\u010C"~ "\u00C9\u0118\u00CB\u011A\u00CD\u00CE\u010E\u0110"~ "\u0143\u0147\u00D3\u00D4\u0150\u00D6\u00D7\u0158"~ "\u016E\u00DA\u0170\u00DC\u00DD\u0162\u00DF\u0155"~ "\u00E1\u00E2\u0103\u00E4\u013A\u0107\u00E7\u010D"~ "\u00E9\u0119\u00EB\u011B\u00ED\u00EE\u010F\u0111"~ "\u0144\u0148\u00F3\u00F4\u0151\u00F6\u00F7\u0159"~ "\u016F\u00FA\u0171\u00FC\u00FD\u0163\u02D9"; return (c >= m_charMapStart && c <= m_charMapEnd) ? charMap[c-m_charMapStart] : c; There is a safety range check in place here. That's where the "cmp $0x5e,%ecx" comes from in the assembly before the table 'charMap' is indexed and the heuristic fails.
> It boils down to reading a character 'c' off the string and then translating it > through a lookup table to a UTF-16 code point: > > private static immutable dchar m_charMapStart = 0xa1; > private static immutable dchar m_charMapEnd = 0xff; > > private immutable wstring charMap = > "\u0104\u02D8\u0141\u00A4\u013D\u015A\u00A7\u00A8"~ > "\u0160\u015E\u0164\u0179\u00AD\u017D\u017B\u00B0"~ > "\u0105\u02DB\u0142\u00B4\u013E\u015B\u02C7\u00B8"~ > "\u0161\u015F\u0165\u017A\u02DD\u017E\u017C\u0154"~ > "\u00C1\u00C2\u0102\u00C4\u0139\u0106\u00C7\u010C"~ > "\u00C9\u0118\u00CB\u011A\u00CD\u00CE\u010E\u0110"~ > "\u0143\u0147\u00D3\u00D4\u0150\u00D6\u00D7\u0158"~ > "\u016E\u00DA\u0170\u00DC\u00DD\u0162\u00DF\u0155"~ > "\u00E1\u00E2\u0103\u00E4\u013A\u0107\u00E7\u010D"~ > "\u00E9\u0119\u00EB\u011B\u00ED\u00EE\u010F\u0111"~ > "\u0144\u0148\u00F3\u00F4\u0151\u00F6\u00F7\u0159"~ > "\u016F\u00FA\u0171\u00FC\u00FD\u0163\u02D9"; > > return (c >= m_charMapStart && c <= m_charMapEnd) ? > charMap[c-m_charMapStart] : c; > > There is a safety range check in place here. That's where the "cmp > $0x5e,%ecx" comes from in the assembly before the table 'charMap' is indexed > and the heuristic fails. Aha! And 0x142 == m_charMapStart * sizeof(wchar). So it's just folding the index computation into the base address. -cary
*** Bug 22820 has been marked as a duplicate of this bug. ***
The master branch has been updated by Cary Coutant <ccoutant@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=651d16203867f8013a0f78a2f2e24df8c02d1377 commit 651d16203867f8013a0f78a2f2e24df8c02d1377 Author: Cary Coutant <ccoutant@gmail.com> Date: Tue Apr 24 21:57:37 2018 -0700 Fix bug with relocation addends and merge sections with --icf. During --icf processing, gold was incorrectly processing the relocation addend for references to items in a merge section. PC-relative references and other forms of reference with a biased base address require a non-section local symbol, where the addend is purely the bias. gold/ PR gold/20642 PR gold/22820 * gc.h (gc_process_relocs): Flag STT_SECTION symbols in symvec. * icf.cc (get_section_contents): For merge sections, ignore the addend for relocations against non-section symbols.
Fixed on master branch.
And here is the confirmation from my end: ldc2 1.8.0 compiles now! I'll check if any other package that used to fail with --ifc=safe on my system got fixed by this.
It looks like two more Dlang packages were affected by this, the first of which compiles now. The second is a shared library and is hit by "relocation R_386_GOTOFF against preemptible symbol <...> cannot be used when making a shared object" issues when compiled with dmd (the language's reference compiler based on a formerly proprietary backend) targeting i386. Dlang compilers ldc2 and gdc (based on llvm and gcc) work fine, so I think it is just a backend issue.