Bug 20805 - gcc's ThreadSanitizer broken by gold
Summary: gcc's ThreadSanitizer broken by gold
Status: REOPENED
Alias: None
Product: binutils
Classification: Unclassified
Component: gold (show other bugs)
Version: 2.28
: P2 normal
Target Milestone: ---
Assignee: Cary Coutant
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-11-11 21:05 UTC by Markus Trippelsdorf
Modified: 2020-07-29 13:31 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2016-11-15 00:00:00


Attachments
testcase (426.41 KB, application/x-xz)
2016-11-11 21:05 UTC, Markus Trippelsdorf
Details
new testcase (572.34 KB, application/x-xz)
2016-11-12 09:20 UTC, Markus Trippelsdorf
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Markus Trippelsdorf 2016-11-11 21:05:08 UTC
Created attachment 9624 [details]
testcase

When gcc's ThreadSanitizer is linked with gold it crashes all instrumented
binaries because of calls to __tls_get_addr.

However ThreadSanitizer uses the initial-exec tls model:

__attribute__((tls_model("initial-exec")))                                                                                                                                         
extern THREADLOCAL char cur_thread_placeholder[];                                                                                                                                  
INLINE ThreadState *cur_thread() {                                                                                                                                                 
  return reinterpret_cast<ThreadState *>(&cur_thread_placeholder);                                                                                                                 
} 

So no __tls_get_addr calls should be emitted.

For the attached testcase:

markus@x4 libsanitizer % gcc -fuse-ld=bfd -shared tsan_interceptors.o
markus@x4 libsanitizer % objdump -dS ./a.out | grep __tls_get_addr@plt
markus@x4 libsanitizer % gcc -fuse-ld=gold -shared tsan_interceptors.o
markus@x4 libsanitizer % objdump -dS ./a.out | grep __tls_get_addr@plt | wc -l
779

Also see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78294
Comment 1 Dmitry Vyukov 2016-11-11 21:38:08 UTC
ld.bfd compiles access to the initial-exec tls variable:

   6da8a:       64 48 8b 04 25 00 00    mov    %fs:0x0,%rax
   6da93:       48 03 05 fe 9e 26 00    add    0x269efe(%rip),%rax
   6daa8:       83 80 98 02 02 00 01    addl   $0x1,0x20298(%rax)

which looks correct.

while gold compiles it to:

   6effa:       66 48 8d 3d 66 cf 06    data16 lea 0x6cf66(%rip),%rdi        # dbf68 <_ZN6__tsan22cur_thread_placeholderE@@Base+0xdbf28>                                           
   6f001:       00                                                                                                                                                                 
    cur_thread()->ignore_interceptors++;                                                                                                                                           
   6f002:       66 66 48 e8 16 3d fb    data16 data16 callq 22d20 <__tls_get_addr@plt>              


Markus, what version of gold do you have? Maybe it is old?
Comment 2 Markus Trippelsdorf 2016-11-11 21:43:13 UTC
 % ld -v
GNU gold (Gentoo git 2.27.51.20161111) 1.12
Comment 3 Markus Trippelsdorf 2016-11-12 09:20:56 UTC
Created attachment 9626 [details]
new testcase

Here's a better testcase:

gcc -fuse-ld=bfd -shared tsan_rtl.o tsan_interceptors.o tsan_rtl_amd64.o
vs.
gcc -fuse-ld=gold -shared tsan_rtl.o tsan_interceptors.o tsan_rtl_amd64.o

Look at the disassembly of function _ZN6__tsan10InitializeEPNS_11ThreadStateE.

bfd: 
    cur_thread()->ignore_interceptors++;                                                                                                                                           
   2d4a3:       48 03 05 ce 67 25 00    add    0x2567ce(%rip),%rax        # 283c78 <.got+0x1780> 

gold:
    cur_thread()->ignore_interceptors++;                                                                                                                                           
   2ed42:       66 66 48 e8 36 e4 ff    data16 data16 callq 2d180 <__tls_get_addr@plt>
Comment 4 Cary Coutant 2016-11-15 16:31:44 UTC
Your testcase is not using the initial-exec model:

     e4a:       66 48 8d 3d 00 00 00    data16 lea 0x0(%rip),%rdi        # e52 <_ZN6__tsan10InitializeEPNS_11ThreadStateE+0x32>
     e51:       00 
                        e4e: R_X86_64_TLSGD     _ZN6__tsan22cur_thread_placeholderE-0x4
     e52:       66 66 48 e8 00 00 00    data16 data16 callq e5a <_ZN6__tsan10InitializeEPNS_11ThreadStateE+0x3a>
     e59:       00 
                        e56: R_X86_64_PLT32     __tls_get_addr-0x4

Gold does not optimize any TLS references when building a shared library.

In this case, where cur_thread_placeholder has hidden visibility, I think we should be able to optimize it down to IE, as ld.bfd does.
Comment 5 Cary Coutant 2016-11-15 17:39:58 UTC
> Gold does not optimize any TLS references when building a shared library.
> 
> In this case, where cur_thread_placeholder has hidden visibility, I think we
> should be able to optimize it down to IE, as ld.bfd does.

I take that back. We can't convert the GD model to IE unless we know that the shared library we're building will never be dlopen'ed, and there's no current way to tell the linker that.

If you want the IE model, it needs to be compiled that way. Apparently, you were actually trying to do that, but the attribute didn't have the desired effect (perhaps because the inliner didn't propagate the tls model to the outer function), and you lucked out because ld.bfd went ahead and made the optimization anyway. I'm not sure why it did that, but I don't think it should have.
Comment 6 Markus Trippelsdorf 2016-11-15 17:43:25 UTC
OK, fair enough. I will reopen the gcc bug.
Comment 7 Markus Trippelsdorf 2016-11-16 07:45:22 UTC
CCing H.J. for the wrong ld.bfd optimization.

H.J.: on the attached testcase function _ZN6__tsan10InitializeEPNS_11ThreadStateE
from tsan_rtl.o which uses the general dynamic model, is relaxed by ld.bfd to initial exec in the shared library.
Because the shared library could be dynamically loaded this is not allowed.
Comment 8 Markus Trippelsdorf 2016-11-16 07:56:53 UTC
But Jakub wrote in the gcc bug:

»As for the ld.bfd optimization that makes linking with ld.bfd work, that is an optimization if there are any IE relocations then other GD/LD relocations are turned into them too, because once you have any IE relocations, all of the TLS has to be static anyway, so the TLS GD/LD calls are just a waste of time.«

So either this is a missed optimization for gold, or ld.bfd generates wrong code.
Comment 9 H.J. Lu 2016-11-16 16:32:24 UTC
(In reply to Markus Trippelsdorf from comment #7)
> CCing H.J. for the wrong ld.bfd optimization.
> 
> H.J.: on the attached testcase function
> _ZN6__tsan10InitializeEPNS_11ThreadStateE
> from tsan_rtl.o which uses the general dynamic model, is relaxed by ld.bfd
> to initial exec in the shared library.
> Because the shared library could be dynamically loaded this is not allowed.

ld.bfd doesn't convert LD to LE.
Comment 10 H.J. Lu 2016-11-16 19:07:19 UTC
Gold generates:

Symbol table '.dynsym' contains 1821 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000068d58    32 TLS     LOCAL  DEFAULT   16 _ZN11__sanitizerL4dtlsE
     2: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND getpagesize@GLIBC_2.2.5 (2)
     4: 0000000000021f30     0 FUNC    GLOBAL DEFAULT    9 _init
     5: 00000000000a07a8     0 FUNC    GLOBAL DEFAULT   12 _fini
     6: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab
     7: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
     8: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.2.5 (2)
     9: 000000000007da00    69 FUNC    GLOBAL DEFAULT   11 _ZN11__sanitizer11CheckFa
    10: 0000000000024950     8 FUNC    WEAK   DEFAULT   11 __tsan_default_options
    11: 00000000000252b0     6 FUNC    WEAK   DEFAULT   11 __tls_get_addr
    12: 00000000000252b0     6 FUNC    GLOBAL DEFAULT   11 __interceptor___tls_get_a
    13: 0000000000000140 0x68c18 TLS     LOCAL  HIDDEN    16 _ZN6__tsan22cur_thread_pl
                                         ^^^^^^  This is wrong.

[hjl@gnu-6 tsan]$ readelf -s .libs/libtsan.so.0  > /dev/null
readelf: Warning: local symbol 13 found at index >= .dynsym's sh_info value of 2
[hjl@gnu-6 tsan]$
Comment 11 H.J. Lu 2016-11-16 19:37:45 UTC
ld performs GD->IE optimization since __tsan::cur_thread_placeholder
is accessed as IE in some place.
Comment 12 Cary Coutant 2016-11-21 18:26:19 UTC
(In reply to Markus Trippelsdorf from comment #8)
> But Jakub wrote in the gcc bug:
> 
> »As for the ld.bfd optimization that makes linking with ld.bfd work, that is
> an optimization if there are any IE relocations then other GD/LD relocations
> are turned into them too, because once you have any IE relocations, all of
> the TLS has to be static anyway, so the TLS GD/LD calls are just a waste of
> time.«
> 
> So either this is a missed optimization for gold, or ld.bfd generates wrong
> code.

Yes, we could take the presence of a R_X86_64_GOTTPOFF relocation as a signal that it's OK to optimize for the IE model. But to do it reliably, it would require us to pre-scan the relocations before making any optimization decisions. If we see a GD relocation that could have been optimized before we see any GOTTPOFF relocations, we wouldn't be able to optimize that relocation.

I don't think that slowing the link down by adding another pass over the relocations is worth it.
Comment 13 Blair Sansford 2020-07-29 13:31:05 UTC
ld.bfd doesn't convert LD to LE.
https://airgunmaniac.com/