Bug 15056 - gld 2.23.1 mishandles R_SPARC_TLS_LDM_CALL
Summary: gld 2.23.1 mishandles R_SPARC_TLS_LDM_CALL
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: 2.23
: P2 normal
Target Milestone: ---
Assignee: David S. Miller
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-23 12:58 UTC by Rainer Orth
Modified: 2013-02-01 09:31 UTC (History)
3 users (show)

See Also:
Host: sparc*-*-solaris2*
Target:
Build:
Last reconfirmed:


Attachments
good nm -D -n output (30.20 KB, text/plain)
2013-01-30 11:43 UTC, Rainer Orth
Details
bad nm -D -n output (30.14 KB, text/plain)
2013-01-30 11:44 UTC, Rainer Orth
Details
Fix for __tls_get_addr garbage collection on sparc. (507 bytes, patch)
2013-01-30 19:59 UTC, David S. Miller
Details | Diff
revised patch (1.03 KB, patch)
2013-01-31 06:10 UTC, Alan Modra
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Rainer Orth 2013-01-23 12:58:03 UTC
After upgrading my binutils installation from 2.22 to 2.23.1, many (all?) C++
EH tests on Solaris/SPARC started failing like this:

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 1 (LWP 1)]
0xff2e4f60 in __CTOR_LIST__ ()
   from ../../../sparc-sun-solaris2.11/libstdc++-v3/src/.libs/libstdc++.so.6
(gdb) where
#0  0xff2e4f60 in __CTOR_LIST__ ()
   from ../../../sparc-sun-solaris2.11/libstdc++-v3/src/.libs/libstdc++.so.6
#1  0xff249618 in __cxxabiv1::__cxa_get_globals ()
    at /vol/gcc/src/hg/trunk/local/libstdc++-v3/libsupc++/eh_globals.cc:63
#2  0xff248660 in __cxxabiv1::__cxa_allocate_exception (thrown_size=134464)
    at /vol/gcc/src/hg/trunk/local/libstdc++-v3/libsupc++/eh_alloc.cc:136
#3  0x000108d4 in h ()
    at /vol/gcc/src/hg/trunk/local/gcc/testsuite/g++.dg/eh/alias1.C:25
#4  0x00010918 in main ()
    at /vol/gcc/src/hg/trunk/local/gcc/testsuite/g++.dg/eh/alias1.C:34

If I relink libstdc++.so with gld 2.22 (no other changes), the test passes.

Looking at the __cxa_get_globals call site with gdb, I find (with gld 2.23.1):

=> 0xff249610 <__cxxabiv1::__cxa_get_globals()+32>:	call  0xff2e4f60
   0xff249614 <__cxxabiv1::__cxa_get_globals()+36>:	xor  %i0, 0, %i0
(gdb) 
1: x/i $pc
=> 0xff249614 <__cxxabiv1::__cxa_get_globals()+36>:	xor  %i0, 0, %i0
(gdb) 
0xff2e4f60 in __CTOR_LIST__ () from ../../../sparc-sun-solaris2.11/libstdc++-v3/src/.libs/save/libstdc++.so.6
1: x/i $pc
=> 0xff2e4f60:	unknown

Instead, with libstdc++.so relinked with gld 2.22, I see

=> 0xff2496a4 <__cxxabiv1::__cxa_get_globals()+32>:	call  0xff2e987c <__tls_get_addr@plt>
   0xff2496a8 <__cxxabiv1::__cxa_get_globals()+36>:	xor  %i0, 0, %i0

In the input object (eh_globals.o), elfdump -r shows (mangled and unmangled
respectively):

  R_SPARC_TLS_LDM_CALL           0x20          0  .rela.text.__c _ZZN12_GLOBAL__
N_110get_globalEvE6global

  R_SPARC_TLS_LDM_CALL           0x20          0  .rela.text.(char) (anonymous n
amespace)::get_global()::global

I don't yet have an idea what gld change is causing this, but it's a gld 2.23 regression that makes binutils 2.23 unusable on Solaris/SPARC.

  Rainer
Comment 1 David S. Miller 2013-01-23 19:23:47 UTC
My first guess would be that this is caused by:

2011-10-18  David S. Miller  <davem@davemloft.net>

	PR binutils/13301
	* elfxx-sparc.c (sparc_elf_find_reloc_at_ofs): New function.
	(_bfd_sparc_elf_relocate_section): Always move the __tls_get_addr
	call delay slot instruction forward 4 bytes when performing
	relaxation.

I wonder why Eric never saw this problem, maybe he didn't test it
on Solaris or something like that.
Comment 2 Eric Botcazou 2013-01-23 20:03:32 UTC
> My first guess would be that this is caused by:
> 
> 2011-10-18  David S. Miller  <davem@davemloft.net>
> 
>     PR binutils/13301
>     * elfxx-sparc.c (sparc_elf_find_reloc_at_ofs): New function.
>     (_bfd_sparc_elf_relocate_section): Always move the __tls_get_addr
>     call delay slot instruction forward 4 bytes when performing
>     relaxation.

Not clear IMO, this patch is supposed to be in the 2.22 release.

> I wonder why Eric never saw this problem, maybe he didn't test it
> on Solaris or something like that.

I seldom use the GNU linker on Solaris and, on cross-platforms where I do use it, TLS isn't supported.
Comment 3 David S. Miller 2013-01-23 21:34:17 UTC
Indeed, my change is in the 2.22 release, I just double checked.
Comment 4 Rainer Orth 2013-01-24 13:25:16 UTC
> --- Comment #3 from David S. Miller <davem at davemloft dot net> 2013-01-23
> 21:34:17 UTC ---
> Indeed, my change is in the 2.22 release, I just double checked.

Seems like I'll have to run a reghunt to find which binutils patch
caused this.

	Rainer
Comment 5 Rainer Orth 2013-01-30 09:31:13 UTC
The reghunt identified the following patch as the culprit:

2011-12-07  Alan Modra  <amodra@gmail.com>

       PR ld/12772
       * elflink.c (elf_gc_sweep_symbol): Discard unmarked symbols
       defined in shared libraries.

To verify, I've reverted just this patch in a binutils 2.23.1 tree,
rebuilt gld and all the EH failure were gone in a fresh gcc mainline
bootstrap.

	Rainer
Comment 6 Alan Modra 2013-01-30 11:36:37 UTC
What differences are shown by nm -D -n between the good and bad libstdc++.so?
Comment 7 Rainer Orth 2013-01-30 11:43:01 UTC
Created attachment 6836 [details]
good nm -D -n output
Comment 8 Rainer Orth 2013-01-30 11:44:10 UTC
Created attachment 6837 [details]
bad nm -D -n output

Among many others, the undefined reference to __tls_get_addr is gone.
Comment 9 Alan Modra 2013-01-30 12:57:20 UTC
The question then is "Why didn't _bfd_elf_gc_mark_rsec mark __tls_get_addr?"  _bfd_elf_gc_mark_rsec ought to be called for every relocation in kept sections of libstdc++.so, and so set h->mark for __tls_get_addr.  Answering my own question, I guess this is because one or more SPARC relocs reference __tls_get_addr *implicitly*, R_SPARC_TLS_LDM_CALL and R_SPARC_TLS_GD_CALL by the look of it.  _bfd_sparc_elf_gc_mark_hook will need to do the marking.  I'm sure you or David can throw together a patch to fix this.  If not, I'll hack on it myself tomorrow.
Comment 10 David S. Miller 2013-01-30 18:11:21 UTC
Thanks a lot Alan, I'll take it from here.
Comment 11 David S. Miller 2013-01-30 19:59:03 UTC
Created attachment 6838 [details]
Fix for __tls_get_addr garbage collection on sparc.

Rainer please give this patch a test.  I'll try to put together a test case meanwhile.
Comment 12 Alan Modra 2013-01-31 04:03:49 UTC
So there's no need for gc_mark_hook to return the section of the symbol on the R_SPARC_TLS_GD_CALL?  ie. you're sure that section will be marked from some other reloc?  If not then something like the following is needed.

  if (info->shared)
    switch (SPARC_ELF_R_TYPE (rel->r_info))
      {
      case R_SPARC_TLS_GD_CALL:
      case R_SPARC_TLS_LDM_CALL:
	tga = elf_link_hash_lookup (elf_hash_table (info), "__tls_get_addr",
				    FALSE, FALSE, TRUE);
	BFD_ASSERT (tga != NULL);
	tga->mark = 1;
	if (tga->u.weakdef != NULL)
	  tga->u.weakdef->mark = 1;
	/* If __tls_get_addr might ever have a regular definition,
	   then we need to mark its section too.  */
	rsec = _bfd_elf_gc_mark_hook (sec, info, rel, tga, NULL);
	if (rsec && !rsec->gc_mark)
	  {
	    if (bfd_get_flavour (rsec->owner) != bfd_target_elf_flavour
		|| (rsec->owner->flags & DYNAMIC) != 0)
	      rsec->gc_mark = 1;
	    else
	      /* FIXME: return error from _bfd_elf_gc_mark somehow.  */
	      _bfd_elf_gc_mark (info, rsec, _bfd_sparc_elf_gc_mark_hook);
	  }
      }
Comment 13 David S. Miller 2013-01-31 04:07:01 UTC
Yes, we are guarenteed that the symbol that's actually attached to the LDM_CALL and GD_CALL relocs will be referenced by other relocations.

The TLS calls never exist by themselves, they always exist alongside other TLS relocations over the same symbol.

BTW, tilegx and tilepro BFD elf backends have this same exact GC bug.
Comment 14 Alan Modra 2013-01-31 04:33:22 UTC
OK, I thought that might be the case.  tilegx and tilepro probably copied sparc. :)  BTW, some nits in your patch.  You should zap sym as well, just in case the tls sym is local.  Also, condition on info->shared to match check_relocs creation of __tls_get_addr sym and relocate_section optimization.  And something for later, they probably all should be !info->executable.
Comment 15 David S. Miller 2013-01-31 05:28:40 UTC
Ok I ran Rainer's test case and my patch as attached doesn't work, but something along the line of Alan's suggestion does.  I'll go figure this out.
Comment 16 Alan Modra 2013-01-31 06:10:42 UTC
Created attachment 6840 [details]
revised patch

This one sets h->mark, critical to fixing this bug, and the same problem in tilepro and tilegx.
Comment 17 David S. Miller 2013-01-31 07:14:10 UTC
Alan that patch works, please check it in to mainline (and on any relevant active branch too).  Thanks!
Comment 18 Sourceware Commits 2013-01-31 07:32:50 UTC
CVSROOT:	/cvs/src
Module name:	src
Changes by:	amodra@sourceware.org	2013-01-31 07:32:45

Modified files:
	bfd            : ChangeLog elfxx-sparc.c elf32-tilepro.c 
	                 elfxx-tilegx.c 

Log message:
	PR ld/15056
	* elfxx-sparc.c (_bfd_sparc_elf_gc_mark_hook): Handle implicit
	references to __tls_get_addr.
	* elf32-tilpro.c (tilepro_elf_gc_mark_hook): Likewise.  Correct
	vtinherit and vtentry reloc handling too.
	* elfxx-tilegx.c (tilegx_elf_gc_mark_hook): As for tilepro.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/ChangeLog.diff?cvsroot=src&r1=1.5922&r2=1.5923
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/elfxx-sparc.c.diff?cvsroot=src&r1=1.71&r2=1.72
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/elf32-tilepro.c.diff?cvsroot=src&r1=1.16&r2=1.17
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/elfxx-tilegx.c.diff?cvsroot=src&r1=1.18&r2=1.19
Comment 19 Sourceware Commits 2013-01-31 07:35:33 UTC
CVSROOT:	/cvs/src
Module name:	src
Branch: 	binutils-2_23-branch
Changes by:	amodra@sourceware.org	2013-01-31 07:35:26

Modified files:
	bfd            : ChangeLog elfxx-sparc.c elf32-tilepro.c 
	                 elfxx-tilegx.c 

Log message:
	PR ld/15056
	* elfxx-sparc.c (_bfd_sparc_elf_gc_mark_hook): Handle implicit
	references to __tls_get_addr.
	* elf32-tilpro.c (tilepro_elf_gc_mark_hook): Likewise.  Correct
	vtinherit and vtentry reloc handling too.
	* elfxx-tilegx.c (tilegx_elf_gc_mark_hook): As for tilepro.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/ChangeLog.diff?cvsroot=src&only_with_tag=binutils-2_23-branch&r1=1.5758.2.39&r2=1.5758.2.40
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/elfxx-sparc.c.diff?cvsroot=src&only_with_tag=binutils-2_23-branch&r1=1.69&r2=1.69.4.1
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/elf32-tilepro.c.diff?cvsroot=src&only_with_tag=binutils-2_23-branch&r1=1.10.4.3&r2=1.10.4.4
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/elfxx-tilegx.c.diff?cvsroot=src&only_with_tag=binutils-2_23-branch&r1=1.11.4.3&r2=1.11.4.4
Comment 20 Alan Modra 2013-01-31 07:36:12 UTC
Fixed
Comment 21 Rainer Orth 2013-01-31 12:22:42 UTC
This fixed the full libstdc++.so testcase.  I'm now running a full gcc
mainline bootstrap to check that no other problems turn up.

Many thanks to both of you for the quick resolution.

	Rainer
Comment 22 Rainer Orth 2013-02-01 09:31:00 UTC
The bootstrap has now concluded without regressions, so all is certainly
fine.

Thanks again.

	Rainer