After upgrading my binutils installation from 2.22 to 2.23.1, many (all?) C++ EH tests on Solaris/SPARC started failing like this: Program received signal SIGILL, Illegal instruction. [Switching to Thread 1 (LWP 1)] 0xff2e4f60 in __CTOR_LIST__ () from ../../../sparc-sun-solaris2.11/libstdc++-v3/src/.libs/libstdc++.so.6 (gdb) where #0 0xff2e4f60 in __CTOR_LIST__ () from ../../../sparc-sun-solaris2.11/libstdc++-v3/src/.libs/libstdc++.so.6 #1 0xff249618 in __cxxabiv1::__cxa_get_globals () at /vol/gcc/src/hg/trunk/local/libstdc++-v3/libsupc++/eh_globals.cc:63 #2 0xff248660 in __cxxabiv1::__cxa_allocate_exception (thrown_size=134464) at /vol/gcc/src/hg/trunk/local/libstdc++-v3/libsupc++/eh_alloc.cc:136 #3 0x000108d4 in h () at /vol/gcc/src/hg/trunk/local/gcc/testsuite/g++.dg/eh/alias1.C:25 #4 0x00010918 in main () at /vol/gcc/src/hg/trunk/local/gcc/testsuite/g++.dg/eh/alias1.C:34 If I relink libstdc++.so with gld 2.22 (no other changes), the test passes. Looking at the __cxa_get_globals call site with gdb, I find (with gld 2.23.1): => 0xff249610 <__cxxabiv1::__cxa_get_globals()+32>: call 0xff2e4f60 0xff249614 <__cxxabiv1::__cxa_get_globals()+36>: xor %i0, 0, %i0 (gdb) 1: x/i $pc => 0xff249614 <__cxxabiv1::__cxa_get_globals()+36>: xor %i0, 0, %i0 (gdb) 0xff2e4f60 in __CTOR_LIST__ () from ../../../sparc-sun-solaris2.11/libstdc++-v3/src/.libs/save/libstdc++.so.6 1: x/i $pc => 0xff2e4f60: unknown Instead, with libstdc++.so relinked with gld 2.22, I see => 0xff2496a4 <__cxxabiv1::__cxa_get_globals()+32>: call 0xff2e987c <__tls_get_addr@plt> 0xff2496a8 <__cxxabiv1::__cxa_get_globals()+36>: xor %i0, 0, %i0 In the input object (eh_globals.o), elfdump -r shows (mangled and unmangled respectively): R_SPARC_TLS_LDM_CALL 0x20 0 .rela.text.__c _ZZN12_GLOBAL__ N_110get_globalEvE6global R_SPARC_TLS_LDM_CALL 0x20 0 .rela.text.(char) (anonymous n amespace)::get_global()::global I don't yet have an idea what gld change is causing this, but it's a gld 2.23 regression that makes binutils 2.23 unusable on Solaris/SPARC. Rainer
My first guess would be that this is caused by: 2011-10-18 David S. Miller <davem@davemloft.net> PR binutils/13301 * elfxx-sparc.c (sparc_elf_find_reloc_at_ofs): New function. (_bfd_sparc_elf_relocate_section): Always move the __tls_get_addr call delay slot instruction forward 4 bytes when performing relaxation. I wonder why Eric never saw this problem, maybe he didn't test it on Solaris or something like that.
> My first guess would be that this is caused by: > > 2011-10-18 David S. Miller <davem@davemloft.net> > > PR binutils/13301 > * elfxx-sparc.c (sparc_elf_find_reloc_at_ofs): New function. > (_bfd_sparc_elf_relocate_section): Always move the __tls_get_addr > call delay slot instruction forward 4 bytes when performing > relaxation. Not clear IMO, this patch is supposed to be in the 2.22 release. > I wonder why Eric never saw this problem, maybe he didn't test it > on Solaris or something like that. I seldom use the GNU linker on Solaris and, on cross-platforms where I do use it, TLS isn't supported.
Indeed, my change is in the 2.22 release, I just double checked.
> --- Comment #3 from David S. Miller <davem at davemloft dot net> 2013-01-23 > 21:34:17 UTC --- > Indeed, my change is in the 2.22 release, I just double checked. Seems like I'll have to run a reghunt to find which binutils patch caused this. Rainer
The reghunt identified the following patch as the culprit: 2011-12-07 Alan Modra <amodra@gmail.com> PR ld/12772 * elflink.c (elf_gc_sweep_symbol): Discard unmarked symbols defined in shared libraries. To verify, I've reverted just this patch in a binutils 2.23.1 tree, rebuilt gld and all the EH failure were gone in a fresh gcc mainline bootstrap. Rainer
What differences are shown by nm -D -n between the good and bad libstdc++.so?
Created attachment 6836 [details] good nm -D -n output
Created attachment 6837 [details] bad nm -D -n output Among many others, the undefined reference to __tls_get_addr is gone.
The question then is "Why didn't _bfd_elf_gc_mark_rsec mark __tls_get_addr?" _bfd_elf_gc_mark_rsec ought to be called for every relocation in kept sections of libstdc++.so, and so set h->mark for __tls_get_addr. Answering my own question, I guess this is because one or more SPARC relocs reference __tls_get_addr *implicitly*, R_SPARC_TLS_LDM_CALL and R_SPARC_TLS_GD_CALL by the look of it. _bfd_sparc_elf_gc_mark_hook will need to do the marking. I'm sure you or David can throw together a patch to fix this. If not, I'll hack on it myself tomorrow.
Thanks a lot Alan, I'll take it from here.
Created attachment 6838 [details] Fix for __tls_get_addr garbage collection on sparc. Rainer please give this patch a test. I'll try to put together a test case meanwhile.
So there's no need for gc_mark_hook to return the section of the symbol on the R_SPARC_TLS_GD_CALL? ie. you're sure that section will be marked from some other reloc? If not then something like the following is needed. if (info->shared) switch (SPARC_ELF_R_TYPE (rel->r_info)) { case R_SPARC_TLS_GD_CALL: case R_SPARC_TLS_LDM_CALL: tga = elf_link_hash_lookup (elf_hash_table (info), "__tls_get_addr", FALSE, FALSE, TRUE); BFD_ASSERT (tga != NULL); tga->mark = 1; if (tga->u.weakdef != NULL) tga->u.weakdef->mark = 1; /* If __tls_get_addr might ever have a regular definition, then we need to mark its section too. */ rsec = _bfd_elf_gc_mark_hook (sec, info, rel, tga, NULL); if (rsec && !rsec->gc_mark) { if (bfd_get_flavour (rsec->owner) != bfd_target_elf_flavour || (rsec->owner->flags & DYNAMIC) != 0) rsec->gc_mark = 1; else /* FIXME: return error from _bfd_elf_gc_mark somehow. */ _bfd_elf_gc_mark (info, rsec, _bfd_sparc_elf_gc_mark_hook); } }
Yes, we are guarenteed that the symbol that's actually attached to the LDM_CALL and GD_CALL relocs will be referenced by other relocations. The TLS calls never exist by themselves, they always exist alongside other TLS relocations over the same symbol. BTW, tilegx and tilepro BFD elf backends have this same exact GC bug.
OK, I thought that might be the case. tilegx and tilepro probably copied sparc. :) BTW, some nits in your patch. You should zap sym as well, just in case the tls sym is local. Also, condition on info->shared to match check_relocs creation of __tls_get_addr sym and relocate_section optimization. And something for later, they probably all should be !info->executable.
Ok I ran Rainer's test case and my patch as attached doesn't work, but something along the line of Alan's suggestion does. I'll go figure this out.
Created attachment 6840 [details] revised patch This one sets h->mark, critical to fixing this bug, and the same problem in tilepro and tilegx.
Alan that patch works, please check it in to mainline (and on any relevant active branch too). Thanks!
CVSROOT: /cvs/src Module name: src Changes by: amodra@sourceware.org 2013-01-31 07:32:45 Modified files: bfd : ChangeLog elfxx-sparc.c elf32-tilepro.c elfxx-tilegx.c Log message: PR ld/15056 * elfxx-sparc.c (_bfd_sparc_elf_gc_mark_hook): Handle implicit references to __tls_get_addr. * elf32-tilpro.c (tilepro_elf_gc_mark_hook): Likewise. Correct vtinherit and vtentry reloc handling too. * elfxx-tilegx.c (tilegx_elf_gc_mark_hook): As for tilepro. Patches: http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/ChangeLog.diff?cvsroot=src&r1=1.5922&r2=1.5923 http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/elfxx-sparc.c.diff?cvsroot=src&r1=1.71&r2=1.72 http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/elf32-tilepro.c.diff?cvsroot=src&r1=1.16&r2=1.17 http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/elfxx-tilegx.c.diff?cvsroot=src&r1=1.18&r2=1.19
CVSROOT: /cvs/src Module name: src Branch: binutils-2_23-branch Changes by: amodra@sourceware.org 2013-01-31 07:35:26 Modified files: bfd : ChangeLog elfxx-sparc.c elf32-tilepro.c elfxx-tilegx.c Log message: PR ld/15056 * elfxx-sparc.c (_bfd_sparc_elf_gc_mark_hook): Handle implicit references to __tls_get_addr. * elf32-tilpro.c (tilepro_elf_gc_mark_hook): Likewise. Correct vtinherit and vtentry reloc handling too. * elfxx-tilegx.c (tilegx_elf_gc_mark_hook): As for tilepro. Patches: http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/ChangeLog.diff?cvsroot=src&only_with_tag=binutils-2_23-branch&r1=1.5758.2.39&r2=1.5758.2.40 http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/elfxx-sparc.c.diff?cvsroot=src&only_with_tag=binutils-2_23-branch&r1=1.69&r2=1.69.4.1 http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/elf32-tilepro.c.diff?cvsroot=src&only_with_tag=binutils-2_23-branch&r1=1.10.4.3&r2=1.10.4.4 http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/elfxx-tilegx.c.diff?cvsroot=src&only_with_tag=binutils-2_23-branch&r1=1.11.4.3&r2=1.11.4.4
Fixed
This fixed the full libstdc++.so testcase. I'm now running a full gcc mainline bootstrap to check that no other problems turn up. Many thanks to both of you for the quick resolution. Rainer
The bootstrap has now concluded without regressions, so all is certainly fine. Thanks again. Rainer