Bug 22721 - [2.30, 2.31 regression] Solaris/x86 TLS transition failures with linker plugin
Summary: [2.30, 2.31 regression] Solaris/x86 TLS transition failures with linker plugin
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld
Version: 2.31
Importance: P2 normal
Target Milestone: 2.30
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-17 10:49 UTC by Rainer Orth
Modified: 2018-03-31 12:42 UTC
CC List: 2 users

See Also:
Host: i386-pc-solaris2.11
Target: i386-pc-solaris2.11
Build: i386-pc-solaris2.11
Last reconfirmed:


Description Rainer Orth 2018-01-17 10:49:57 UTC
When I recently tried a gcc mainline bootstrap on Solaris 11/x86 with gas and
gld from the binutils 2.30 branch, I found a couple of gcc testsuite regressions:

UNRESOLVED: gcc.dg/lto/20090210 c_lto_20090210_0.o-c_lto_20090210_1.o execute -O0 -flto -flto-partition=none -fuse-linker-plugin
UNRESOLVED: gcc.dg/lto/20090210 c_lto_20090210_0.o-c_lto_20090210_1.o execute -O0 -flto -fuse-linker-plugin -fno-fat-lto-objects 
UNRESOLVED: gcc.dg/lto/20090210 c_lto_20090210_0.o-c_lto_20090210_1.o execute -O2 -flto -flto-partition=none -fuse-linker-plugin -fno-fat-lto-objects 
UNRESOLVED: gcc.dg/lto/20090210 c_lto_20090210_0.o-c_lto_20090210_1.o execute -O2 -flto -fuse-linker-plugin
FAIL: gcc.dg/lto/20090210 c_lto_20090210_0.o-c_lto_20090210_1.o link, -O0 -flto -flto-partition=none -fuse-linker-plugin
FAIL: gcc.dg/lto/20090210 c_lto_20090210_0.o-c_lto_20090210_1.o link, -O0 -flto -fuse-linker-plugin -fno-fat-lto-objects 
FAIL: gcc.dg/lto/20090210 c_lto_20090210_0.o-c_lto_20090210_1.o link, -O2 -flto -flto-partition=none -fuse-linker-plugin -fno-fat-lto-objects 
FAIL: gcc.dg/lto/20090210 c_lto_20090210_0.o-c_lto_20090210_1.o link, -O2 -flto -fuse-linker-plugin
FAIL: gcc.dg/torture/tls/run-gd.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
WARNING: gcc.dg/torture/tls/run-gd.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  compilation failed to produce executable
FAIL: gcc.dg/torture/tls/run-ld.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
WARNING: gcc.dg/torture/tls/run-ld.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  compilation failed to produce executable

both for 32 and 64-bit.  The regression is present on binutils mainline, too.

E.g. for the run-gd.c test, the link fails with

/vol/gcc/bin/gld-2.29.90: /var/tmp//ccDiJVLa.ltrans0.ltrans.o: TLS transition from R_386_TLS_GD to R_386_TLS_IE_32 against `tls_gd' at 0x15 in section `.text.startup' failed
/vol/gcc/bin/gld-2.29.90: final link failed: Nonrepresentable section on output

Since this only happens with -fuse-linker-plugin, I'm a bit at a loss how to produce
a standalone testcase.

A reghunt identified this patch as the culprit:

The first bad revision is:
changeset:   91181:891260873d9a
user:        H.J. Lu <hjl.tools@gmail.com>
date:        Sun Aug 06 08:40:56 2017 -0700
summary:     x86: Lookup __tls_get_addr or ___tls_get_addr once

  Rainer
Comment 1 H.J. Lu 2018-01-17 11:16:15 UTC
A couple questions:

1. Do all tests under ld/testsuite/ld-i386 pass on Solaris?
2. Does Solaris use the same TLS code sequences as Linux?
3. Does GCC generate different TLS code sequences on Solaris?
4. If GCC generates different TLS code sequences on Solaris,
   a. Did GNU ld ever support Solaris TLS code sequences?
   b. If yes, are there any linker testcases for Solaris TLS code sequences?
Comment 2 Rainer Orth 2018-01-18 13:40:22 UTC
> --- Comment #1 from H.J. Lu <hjl.tools at gmail dot com> ---
> A couple questions:
>
> 1. Do all tests under ld/testsuite/ld-i386 pass on Solaris?

No, but that's a preexisting condition:

FAIL: Build libno-plt-1b.so
FAIL: No PLT (dynamic 1a)
FAIL: No PLT (dynamic 1b)
FAIL: No PLT (dynamic 1c)
FAIL: No PLT (static 1d)
FAIL: No PLT (PIE 1e)
FAIL: No PLT (PIE 1f)
FAIL: No PLT (PIE 1g)
FAIL: No PLT (static 1j)
FAIL: No PLT (static 1d)
FAIL: No PLT (static 1j)

The static tests fail because there are no libc.a and libm.a on Solaris,
so full-static linking just isn't possible.  Testing static linking when
the platform doesn't support it is a testsuite bug. 

All other failures follow the same pattern:

regexp_diff match failure
regexp "^ +[a-f0-9]+:   8b 80 ([0-9a-f]{2} ){4}[        ]+mov +-0x[a-f0-9]+\(%eax\),%eax$"
line   " 52a:   8b 80 10 00 00 00       mov    0x10(%eax),%eax"
regexp_diff match failure
regexp "^ +[a-f0-9]+:   ff a0 ([0-9a-f]{2} ){4}[        ]+jmp +\*-0x[0-9a-f]+\(%eax\)$"
line   " 54a:   ff a0 10 00 00 00       jmp    *0x10(%eax)"
FAIL: Build libno-plt-1b.so

where i686-pc-linux-gnu has

 48a:   8b 80 f8 ff ff ff       mov    -0x8(%eax),%eax

 4aa:   ff a0 f8 ff ff ff       jmp    *-0x8(%eax)

Maybe an -fno-omit-frame-pointer thing?

The tls.exp tests there are only run on Linux/x86 at the moment.
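To make the mismatch concrete: the expected dump line requires a negative displacement ("-0x...") in the mov, so the Solaris line with "0x10" cannot match regardless of frame-pointer settings. The minimal sketch below checks both quoted disassembly lines against that pattern with POSIX regcomp/regexec. It assumes each whitespace run can be written as [ \t]+ (an assumption, since tabs are not preserved in this log), and line_matches is a hypothetical helper, not part of the ld testsuite.

```c
#include <regex.h>
#include <stddef.h>

/* Expected mov line from the libno-plt-1b.so dump check as a C string.
   Assumption: whitespace runs are spelled [ \t]+ because the log lost
   the tabs.  Note the literal "-0x": only a negative displacement
   is accepted.  */
static const char expected_mov[] =
    "^ +[a-f0-9]+:[ \t]+8b 80 ([0-9a-f]{2} ){4}[ \t]+mov +-0x[a-f0-9]+"
    "\\(%eax\\),%eax$";

/* Disassembly lines as quoted in this report.  */
static const char linux_line[] =
    " 48a:   8b 80 f8 ff ff ff       mov    -0x8(%eax),%eax";
static const char solaris_line[] =
    " 52a:   8b 80 10 00 00 00       mov    0x10(%eax),%eax";

/* Return 1 if LINE matches PATTERN as a POSIX extended regex, 0 if it
   does not, and -1 if PATTERN fails to compile.  */
static int line_matches(const char *pattern, const char *line)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

line_matches(expected_mov, linux_line) returns 1 while line_matches(expected_mov, solaris_line) returns 0. Because the displacement comes from the GOT layout rather than from code generation, this is consistent with -fomit-frame-pointer making no difference.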

> 2. Does Solaris use the same TLS code sequences as Linux?

Mostly.  See HAVE_AS_IX86_TLSGDPLT and HAVE_AS_IX86_TLSLDMPLT in gcc's
i386.md.  They are only used if both assembler and linker support those
relocs, which in practice means only when using the Solaris as and ld,
so they are not relevant to this bug.

For full documentation, see
https://docs.oracle.com/cd/E53394_01/html/E54813/chapter8-1.html#scrolltoc

> 3. Does GCC generate different TLS code sequences on Solaris?

Not as far as the present bug is concerned, I believe.

> 4. If GCC generates different TLS code sequences on Solaris,
>    a. Did GNU ld ever support Solaris TLS code sequences?
>    b. If yes, are there any linker testcases for Solaris TLS code sequences?

No to both: I think I once tried to teach gas and gld about them.  gas
was reasonably easy, but I totally failed for gld, so I gave up.

	Rainer
Comment 3 H.J. Lu 2018-01-18 13:58:25 UTC
(In reply to Rainer Orth from comment #2)
> > --- Comment #1 from H.J. Lu <hjl.tools at gmail dot com> ---
> > A couple questions:
> >
> > 1. Do all tests under ld/testsuite/ld-i386 pass on Solaris?
> 
> No, but that's a preexisting condition:
> 
> FAIL: Build libno-plt-1b.so
> FAIL: No PLT (dynamic 1a)
> FAIL: No PLT (dynamic 1b)
> FAIL: No PLT (dynamic 1c)
> FAIL: No PLT (static 1d)
> FAIL: No PLT (PIE 1e)
> FAIL: No PLT (PIE 1f)
> FAIL: No PLT (PIE 1g)
> FAIL: No PLT (static 1j)
> FAIL: No PLT (static 1d)
> FAIL: No PLT (static 1j)
> 
> The static tests fail because there are no libc.a and libm.a on Solaris,
> so full-static linking just isn't possible.  Testing static linking when
> the platform doesn't support it is a testsuite bug. 

Please open a bug report.

> All other failures follow the same pattern:
> 
> regexp_diff match failure
> regexp "^ +[a-f0-9]+:   8b 80 ([0-9a-f]{2} ){4}[        ]+mov +-0x[a-f0-9]+\(%eax\),%eax$"
> line   " 52a:   8b 80 10 00 00 00       mov    0x10(%eax),%eax"
> regexp_diff match failure
> regexp "^ +[a-f0-9]+:   ff a0 ([0-9a-f]{2} ){4}[        ]+jmp +\*-0x[0-9a-f]+\(%eax\)$"
> line   " 54a:   ff a0 10 00 00 00       jmp    *0x10(%eax)"
> FAIL: Build libno-plt-1b.so
> 
> where i686-pc-linux-gnu has
> 
>  48a:   8b 80 f8 ff ff ff       mov    -0x8(%eax),%eax
> 
>  4aa:   ff a0 f8 ff ff ff       jmp    *-0x8(%eax)
> 
> Maybe an -fno-omit-frame-pointer thing?

Can you add -fomit-frame-pointer to i386.exp to see if it works?

> The tls.exp tests there are only run on Linux/x86 at the moment.
> 
> > 2. Does Solaris use the same TLS code sequences as Linux?
> 
> Mostly.  See HAVE_AS_IX86_TLSGDPLT and HAVE_AS_IX86_TLSLDMPLT in gcc's
> i386.md.  They are only used if both assembler and linker support those
> relocs, which in practice means only when using the Solaris as and ld,
> so they are not relevant to this bug.
> 
> For full documentation, see
> https://docs.oracle.com/cd/E53394_01/html/E54813/chapter8-1.html#scrolltoc
> 
> > 3. Does GCC generate different TLS code sequences on Solaris?
> 
> Not as far as the present bug is concerned, I believe.

Please do

1. Get the users/hjl/lto-mixed/master branch and build/install it.
2. Pass -v -save-temps -Wl,-plugin-save-temps to gcc.
   This should save all temporary files used by ld LTO.
3. Create a .gdbinit that sets the environment variables used by ld LTO,
   taken from the output of "gcc -v".
4. Capture the command-line options of the failing ld LTO invocation.
5. Run ld under gdb to investigate why ld fails.
Comment 4 Rainer Orth 2018-01-19 11:16:04 UTC
> --- Comment #3 from H.J. Lu <hjl.tools at gmail dot com> ---
[...]
>> The static tests fail because there are no libc.a and libm.a on Solaris,
>> so full-static linking just isn't possible.  Testing static linking when
>> the platform doesn't support it is a testsuite bug. 
>
> Please open a bug report.

Done: PR ld/22732.

>> All other failures follow the same pattern:
>> 
>> regexp_diff match failure
>> regexp "^ +[a-f0-9]+:   8b 80 ([0-9a-f]{2} ){4}[        ]+mov +-0x[a-f0-9]+\(%eax\),%eax$"
>> line   " 52a:   8b 80 10 00 00 00       mov    0x10(%eax),%eax"
>> regexp_diff match failure
>> regexp "^ +[a-f0-9]+:   ff a0 ([0-9a-f]{2} ){4}[        ]+jmp +\*-0x[0-9a-f]+\(%eax\)$"
>> line   " 54a:   ff a0 10 00 00 00       jmp    *0x10(%eax)"
>> FAIL: Build libno-plt-1b.so
>> 
>> where i686-pc-linux-gnu has
>> 
>>  48a:   8b 80 f8 ff ff ff       mov    -0x8(%eax),%eax
>> 
>>  4aa:   ff a0 f8 ff ff ff       jmp    *-0x8(%eax)
>> 
>> Maybe an -fno-omit-frame-pointer thing?
>
> Can you add -fomit-frame-pointer to i386.exp to see if it works?

I set CFLAGS to '-g -O2 -fomit-frame-pointer' in site.exp instead:
doesn't make a difference for test results.

	Rainer
Comment 5 H.J. Lu 2018-01-19 11:39:19 UTC
(In reply to Rainer Orth from comment #4)
> > --- Comment #3 from H.J. Lu <hjl.tools at gmail dot com> ---
> [...]
> >> The static tests fail because there are no libc.a and libm.a on Solaris,
> >> so full-static linking just isn't possible.  Testing static linking when
> >> the platform doesn't support it is a testsuite bug. 
> >
> > Please open a bug report.
> 
> Done: PR ld/22732.

I will take a look.
 
> >> All other failures follow the same pattern:
> >> 
> >> regexp_diff match failure
> >> regexp "^ +[a-f0-9]+:   8b 80 ([0-9a-f]{2} ){4}[        ]+mov +-0x[a-f0-9]+\(%eax\),%eax$"
> >> line   " 52a:   8b 80 10 00 00 00       mov    0x10(%eax),%eax"
> >> regexp_diff match failure
> >> regexp "^ +[a-f0-9]+:   ff a0 ([0-9a-f]{2} ){4}[        ]+jmp +\*-0x[0-9a-f]+\(%eax\)$"
> >> line   " 54a:   ff a0 10 00 00 00       jmp    *0x10(%eax)"
> >> FAIL: Build libno-plt-1b.so
> >> 
> >> where i686-pc-linux-gnu has
> >> 
> >>  48a:   8b 80 f8 ff ff ff       mov    -0x8(%eax),%eax
> >> 
> >>  4aa:   ff a0 f8 ff ff ff       jmp    *-0x8(%eax)
> >> 
> >> Maybe an -fno-omit-frame-pointer thing?
> >
> > Can you add -fomit-frame-pointer to i386.exp to see if it works?
> 
> I set CFLAGS to '-g -O2 -fomit-frame-pointer' in site.exp instead:
> doesn't make a difference for test results.
> 

This is due to a different PLT/GOT layout on Solaris. I can xfail it
on Solaris.

Please try the current master branch again.
Comment 6 Rainer Orth 2018-01-19 16:18:14 UTC
> --- Comment #3 from H.J. Lu <hjl.tools at gmail dot com> ---
[...]
> Please do
>
> 1. Get users/hjl/lto-mixed/master branch and build/install it.
> 2. Pass -v -save-temps -Wl,-plugin-save-temps to gcc
> This should save all temporary files used by ld LTO.
> 3. Create .gdbinit to set environment variables used by ld LTO with
> output from "gcc -v".
> 4. Capture the failed ld LTO command line option.
> 5. Run ld under gdb to investigate why ld fails.

I've done some digging so far.  That branch didn't include the culprit
patch.  If I use gld built from it as is, the testcase works fine.
Next, I applied that patch and could reproduce the failure, so the
reghunt was correct in identifying the culprit.

I find that this call to elf_i386_check_tls_transition returns FALSE

Thread 2 hit Breakpoint 2, elf_i386_check_tls_transition (sec=0x8303f48, contents=0x82b6c78 "U\211\345\213E\b]\303U\211\345S\203\354\024\307E\364", symtab_hdr=0x82fa2ac, sym_hashes=0x83050f4, r_type=18, rel=0x8305184, relend=0x83051b4) at /var/gcc/reghunt/binutils-lto-mixed/bfd/elf32-i386.c:1381

here:

      h = sym_hashes[r_symndx - symtab_hdr->sh_info];
      if (h == NULL
	  || !((struct elf_i386_link_hash_entry *) h)->tls_get_addr)
	return FALSE;

(gdb) p *$33
$34 = {elf = {root = {root = {next = 0x0, string = 0x846d570 "___tls_get_addr@@SUNWprivate_1.1", hash = 624670595}, type = bfd_link_hash_defined, non_ir_ref_regular = 0, non_ir_ref_dynamic = 0, linker_def = 0, ldscript_def = 0, u = {undef = {next = 0x846d594, abfd = 0x82d88d4}, def = {next = 0x846d594, section = 0x82d88d4, value = 1081620}, i = {next = 0x846d594, link = 0x82d88d4, warning = 0x108114 <error: Cannot access memory at address 0x108114>}, c = {next = 0x846d594, p = 0x82d88d4, size = 1081620}}}, indx = -1, dynindx = 8, got = {refcount = 0, offset = 0, glist = 0x0, plist = 0x0}, plt = {refcount = 0, offset = 0, glist = 0x0, plist = 0x0}, size = 45, type = 2, other = 0, target_internal = 0, ref_regular = 1, def_regular = 0, ref_dynamic = 0, def_dynamic = 1, ref_regular_nonweak = 1, dynamic_adjusted = 0, needs_copy = 0, needs_plt = 0, non_elf = 0, versioned = versioned, forced_local = 0, dynamic = 0, mark = 0, non_got_ref = 0, dynamic_def = 1, ref_dynamic_nonweak = 0, pointer_equality_needed = 0, unique_global = 0, protected_def = 0, start_stop = 0, dynstr_index = 9, u = {weakdef = 0x0, elf_hash_value = 0}, verinfo = {verdef = 0x82da720, vertree = 0x82da720}, u2 = {start_stop_section = 0x0, vtable = 0x0}}, dyn_relocs = 0x0, tls_type = 0 '\000', gotoff_ref = 0, has_got_reloc = 0, has_non_got_reloc = 0, no_finish_dynamic_symbol = 0, tls_get_addr = 0, func_pointer_refcount = 0, plt_got = {refcount = -1, offset = 18446744073709551615, glist = 0xffffffff, plist = 0xffffffff}, plt_second = {refcount = 0, offset = 0, glist = 0x0, plist = 0x0}, tlsdesc_got = 18446744073709551615}

It's pretty obvious that this *is* ___tls_get_addr, but h->tls_get_addr
is 0 nonetheless.

	Rainer
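The gdb dump shows the core of the problem: the hash entry the relocation refers to is named ___tls_get_addr@@SUNWprivate_1.1, so a lookup by the plain name ___tls_get_addr does not reach it. As a minimal sketch, a name comparison that also accepts an ELF version suffix could look like the following. The helper names_symbol is hypothetical and does not reflect how bfd actually performs the lookup.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not bfd code: return true if SYM is TARGET
   itself or TARGET followed by an ELF version tag ("@VER" for a
   non-default version, "@@VER" for the default version).  */
static bool names_symbol(const char *sym, const char *target)
{
    size_t n = strlen(target);

    /* The base name must match exactly...  */
    if (strncmp(sym, target, n) != 0)
        return false;
    /* ...and be followed either by the end of the string or by '@',
       so a longer name such as "___tls_get_addr_x" is rejected.  */
    return sym[n] == '\0' || sym[n] == '@';
}
```

With this helper, names_symbol("___tls_get_addr@@SUNWprivate_1.1", "___tls_get_addr") is true, while strcmp on the same two strings reports a difference, which is exactly the case a plain-name check misses.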
Comment 7 H.J. Lu 2018-01-19 17:33:32 UTC
(In reply to Rainer Orth from comment #6)
> > --- Comment #3 from H.J. Lu <hjl.tools at gmail dot com> ---
> [...]
> > Please do
> >
> > 1. Get users/hjl/lto-mixed/master branch and build/install it.
> > 2. Pass -v -save-temps -Wl,-plugin-save-temps to gcc
> > This should save all temporary files used by ld LTO.
> > 3. Create .gdbinit to set environment variables used by ld LTO with
> > output from "gcc -v".
> > 4. Capture the failed ld LTO command line option.
> > 5. Run ld under gdb to investigate why ld fails.
> 
> I've done some digging so far.  That branch didn't include the culprit
> patch.  If I use gld built from it as is, the testcase works fine.
> Next, I applied that patch and could reproduce the failure, so the
> reghunt was correct in identifying the culprit.
> 
> I find that this call to elf_i386_check_tls_transition returns FALSE
...
> It's pretty obvious that this *is* ___tls_get_addr, but h->tls_get_addr
> is 0 nonetheless.
> 

What are the command-line options passed to ld?
Comment 8 Rainer Orth 2018-01-19 17:36:59 UTC
> --- Comment #7 from H.J. Lu <hjl.tools at gmail dot com> ---
[...]
> What are the command-line options passed to ld?

ld -plugin ./liblto_plugin.so -plugin-opt=./lto-wrapper \
        -plugin-opt=-fresolution=-plugin-save-temps.res \
        -flto \
        -o gcc-dg-lto-20090210-01.exe \
        /usr/lib/crt1.o \
        c_lto_20090210_0.o c_lto_20090210_1.o \
        -lc
Comment 9 H.J. Lu 2018-01-19 17:59:43 UTC
[hjl@gnu-6 pr22721]$ cat foo1.c 
extern __thread int foo_var;

int
main (void)
{
  return foo_var;
}
[hjl@gnu-6 pr22721]$ cat foo2.c 
__thread int foo_var = 2;
[hjl@gnu-6 pr22721]$ cat tls.S
	.text
	.globl __tls_get_addr
	.globl ___tls_get_addr
__tls_get_addr:
___tls_get_addr:
	ret
[hjl@gnu-6 pr22721]$ cat tls.v
SUNWprivate_1.1 {
global:
  __tls_get_addr;
  ___tls_get_addr;
local:
  *;
};
[hjl@gnu-6 pr22721]$ make
gcc -m32 -O2 -flto -fPIC   -c -o foo1.o foo1.c
gcc -m32    -c -o tls.o tls.S
./ld -m elf_i386 -shared -o libtls.so tls.o --version-script tls.v
gcc -m32 -O2 -flto -fPIC   -c -o foo2.o foo2.c
gcc -m32 -O2 -flto -o foo1 foo1.o libtls.so foo2.o
/usr/local/bin/ld: /tmp/ccmYn52R.ltrans0.ltrans.o: TLS transition from R_386_TLS_GD to R_386_TLS_IE_32 against `foo_var' at 0x15 in section `.text.startup' failed
/usr/local/bin/ld: warning: type and size of dynamic symbol `___tls_get_addr@@SUNWprivate_1.1' are not defined
/usr/local/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
make: *** [Makefile:26: foo1] Error 1
[hjl@gnu-6 pr22721]$
Comment 10 H.J. Lu 2018-01-19 18:58:33 UTC
Please try:

https://sourceware.org/ml/binutils/2018-01/msg00293.html
Comment 11 Rainer Orth 2018-01-19 19:27:51 UTC
> --- Comment #10 from H.J. Lu <hjl.tools at gmail dot com> ---
> Please try:
>
> https://sourceware.org/ml/binutils/2018-01/msg00293.html

As a quick test, I've just rebuilt binutils with that patch and run the
failing LTO test and the various tls.exp tests in gcc.dg: no failures anymore.

Also, upon make check in ld, the new tests PASS as well.

Thanks.
        Rainer
Comment 12 Sourceware Commits 2018-01-20 22:31:24 UTC
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=8a1b824af786989f879ab1421a4279f60bba141a

commit 8a1b824af786989f879ab1421a4279f60bba141a
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sat Jan 20 14:25:24 2018 -0800

    x86: Check the versioned __tls_get_addr symbol
    
    We need to check the versioned __tls_get_addr symbol when looking up
    "__tls_get_addr".
    
    bfd/
    
    	PR ld/22721
    	* elfxx-x86.c (_bfd_x86_elf_link_check_relocs): Check the
    	versioned __tls_get_addr symbol.
    
    ld/
    
    	PR ld/22721
    	* testsuite/ld-plugin/lto.exp: Run PR ld/22721 tests.
    	* testsuite/ld-plugin/pr22721.t: New file.
    	* testsuite/ld-plugin/pr22721a.s: Likewise.
    	* testsuite/ld-plugin/pr22721b.c: Likewise.
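The commit message states only that the versioned __tls_get_addr symbol is now checked. Purely as an illustration, assume (this report does not establish it) that the plain name resolves to an indirect hash entry that forwards to the versioned definition seen in the gdb dump. The tls_get_addr flag then has to be set on every entry along that chain, not just the first. Below is a toy model; struct toy_entry and mark_tls_get_addr are hypothetical and far simpler than bfd's elf_link_hash_entry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy hash entry (hypothetical): an indirect entry forwards to LINK,
   a real definition has LINK == NULL.  */
struct toy_entry {
    const char *name;
    struct toy_entry *link;
    bool tls_get_addr;   /* flag consulted by the TLS transition check */
};

/* Mark H and every entry reachable through its indirect links, so the
   versioned definition is flagged as well as the plain name.  */
static void mark_tls_get_addr(struct toy_entry *h)
{
    for (; h != NULL; h = h->link)
        h->tls_get_addr = true;
}
```

Marking only the first entry would leave the versioned entry with tls_get_addr = 0, which is the state shown in the gdb session in comment 6.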
Comment 13 Sourceware Commits 2018-01-20 22:35:47 UTC
The binutils-2_30-branch branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=0a99d34019855d0dafa171796826efb0ff3d18b7

commit 0a99d34019855d0dafa171796826efb0ff3d18b7
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sat Jan 20 14:25:24 2018 -0800

    x86: Check the versioned __tls_get_addr symbol
    
    We need to check the versioned __tls_get_addr symbol when looking up
    "__tls_get_addr".
    
    bfd/
    
    	PR ld/22721
    	* elfxx-x86.c (_bfd_x86_elf_link_check_relocs): Check the
    	versioned __tls_get_addr symbol.
    
    ld/
    
    	PR ld/22721
    	* testsuite/ld-plugin/lto.exp: Run PR ld/22721 tests.
    	* testsuite/ld-plugin/pr22721.t: New file.
    	* testsuite/ld-plugin/pr22721a.s: Likewise.
    	* testsuite/ld-plugin/pr22721b.c: Likewise.
    
    (cherry picked from commit 8a1b824af786989f879ab1421a4279f60bba141a)
Comment 14 H.J. Lu 2018-01-20 22:36:17 UTC
Fixed on master and binutils-2_30-branch.