Created attachment 9624 [details] testcase When gcc's ThreadSanitizer is linked with gold it crashes all instrumented binaries because of calls to __tls_get_addr. However ThreadSanitizer uses the initial-exec tls model: __attribute__((tls_model("initial-exec"))) extern THREADLOCAL char cur_thread_placeholder[]; INLINE ThreadState *cur_thread() { return reinterpret_cast<ThreadState *>(&cur_thread_placeholder); } So no __tls_get_addr calls should be emitted. For the attached testcase: markus@x4 libsanitizer % gcc -fuse-ld=bfd -shared tsan_interceptors.o markus@x4 libsanitizer % objdump -dS ./a.out | grep __tls_get_addr@plt markus@x4 libsanitizer % gcc -fuse-ld=gold -shared tsan_interceptors.o markus@x4 libsanitizer % objdump -dS ./a.out | grep __tls_get_addr@plt | wc -l 779 Also see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78294
ld.bfd compiles access to the initial-exec tls variable: 6da8a: 64 48 8b 04 25 00 00 mov %fs:0x0,%rax 6da93: 48 03 05 fe 9e 26 00 add 0x269efe(%rip),%rax 6daa8: 83 80 98 02 02 00 01 addl $0x1,0x20298(%rax) which looks correct. while gold compiles it to: 6effa: 66 48 8d 3d 66 cf 06 data16 lea 0x6cf66(%rip),%rdi # dbf68 <_ZN6__tsan22cur_thread_placeholderE@@Base+0xdbf28> 6f001: 00 cur_thread()->ignore_interceptors++; 6f002: 66 66 48 e8 16 3d fb data16 data16 callq 22d20 <__tls_get_addr@plt> Markus, what version of gold do you have? Maybe it is old?
% ld -v GNU gold (Gentoo git 2.27.51.20161111) 1.12
Created attachment 9626 [details] new testcase Here's a better testcase: gcc -fuse-ld=bfd -shared tsan_rtl.o tsan_interceptors.o tsan_rtl_amd64.o vs. gcc -fuse-ld=gold -shared tsan_rtl.o tsan_interceptors.o tsan_rtl_amd64.o Look at the disassembly of function _ZN6__tsan10InitializeEPNS_11ThreadStateE. bfd: cur_thread()->ignore_interceptors++; 2d4a3: 48 03 05 ce 67 25 00 add 0x2567ce(%rip),%rax # 283c78 <.got+0x1780> gold: cur_thread()->ignore_interceptors++; 2ed42: 66 66 48 e8 36 e4 ff data16 data16 callq 2d180 <__tls_get_addr@plt>
Your testcase is not using the initial-exec model: e4a: 66 48 8d 3d 00 00 00 data16 lea 0x0(%rip),%rdi # e52 <_ZN6__tsan10InitializeEPNS_11ThreadStateE+0x32> e51: 00 e4e: R_X86_64_TLSGD _ZN6__tsan22cur_thread_placeholderE-0x4 e52: 66 66 48 e8 00 00 00 data16 data16 callq e5a <_ZN6__tsan10InitializeEPNS_11ThreadStateE+0x3a> e59: 00 e56: R_X86_64_PLT32 __tls_get_addr-0x4 Gold does not optimize any TLS references when building a shared library. In this case, where cur_thread_placeholder has hidden visibility, I think we should be able to optimize it down to IE, as ld.bfd does.
> Gold does not optimize any TLS references when building a shared library. > > In this case, where cur_thread_placeholder has hidden visibility, I think we > should be able to optimize it down to IE, as ld.bfd does. I take that back. We can't convert the GD model to IE unless we know that the shared library we're building will never be dlopen'ed, and there's no current way to tell the linker that. If you want the IE model, it needs to be compiled that way. Apparently, you were actually trying to do that, but the attribute didn't have the desired effect (perhaps because the inliner didn't propagate the tls model to the outer function), and you lucked out because ld.bfd went ahead and made the optimization anyway. I'm not sure why it did that, but I don't think it should have.
OK, fair enough. I will reopen the gcc bug.
CCing H.J. for the wrong ld.bfd optimization. H.J.: on the attached testcase function _ZN6__tsan10InitializeEPNS_11ThreadStateE from tsan_rtl.o which uses the general dynamic model, is relaxed by ld.bfd to initial exec in the shared library. Because the shared library could be dynamically loaded this is not allowed.
But Jakub wrote in the gcc bug: »As for the ld.bfd optimization that makes linking with ld.bfd work, that is an optimization if there are any IE relocations then other GD/LD relocations are turned into them too, because once you have any IE relocations, all of the TLS has to be static anyway, so the TLS GD/LD calls are just a waste of time.« So either this is a missed optimization for gold, or ld.bfd generates wrong code.
(In reply to Markus Trippelsdorf from comment #7) > CCing H.J. for the wrong ld.bfd optimization. > > H.J.: on the attached testcase function > _ZN6__tsan10InitializeEPNS_11ThreadStateE > from tsan_rtl.o which uses the general dynamic model, is relaxed by ld.bfd > to initial exec in the shared library. > Because the shared library could be dynamically loaded this is not allowed. ld.bfd doesn't convert LD to LE.
Gold generates: Symbol table '.dynsym' contains 1821 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000068d58 32 TLS LOCAL DEFAULT 16 _ZN11__sanitizerL4dtlsE 2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__ 3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND getpagesize@GLIBC_2.2.5 (2) 4: 0000000000021f30 0 FUNC GLOBAL DEFAULT 9 _init 5: 00000000000a07a8 0 FUNC GLOBAL DEFAULT 12 _fini 6: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab 7: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable 8: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@GLIBC_2.2.5 (2) 9: 000000000007da00 69 FUNC GLOBAL DEFAULT 11 _ZN11__sanitizer11CheckFa 10: 0000000000024950 8 FUNC WEAK DEFAULT 11 __tsan_default_options 11: 00000000000252b0 6 FUNC WEAK DEFAULT 11 __tls_get_addr 12: 00000000000252b0 6 FUNC GLOBAL DEFAULT 11 __interceptor___tls_get_a 13: 0000000000000140 0x68c18 TLS LOCAL HIDDEN 16 _ZN6__tsan22cur_thread_pl ^^^^^^ This is wrong. [hjl@gnu-6 tsan]$ readelf -s .libs/libtsan.so.0 > /dev/null readelf: Warning: local symbol 13 found at index >= .dynsym's sh_info value of 2 [hjl@gnu-6 tsan]$
ld performs GD->IE optimization since __tsan::cur_thread_placeholder is accessed as IE in some place.
(In reply to Markus Trippelsdorf from comment #8) > But Jakub wrote in the gcc bug: > > »As for the ld.bfd optimization that makes linking with ld.bfd work, that is > an optimization if there are any IE relocations then other GD/LD relocations > are turned into them too, because once you have any IE relocations, all of > the TLS has to be static anyway, so the TLS GD/LD calls are just a waste of > time.« > > So either this is a missed optimization for gold, or ld.bfd generates wrong > code. Yes, we could take the presence of a R_X86_64_GOTTPOFF relocation as a signal that it's OK to optimize for the IE model. But to do it reliably, it would require us to pre-scan the relocations before making any optimization decisions. If we see a GD relocation that could have been optimized before we see any GOTTPOFF relocations, we wouldn't be able to optimize that relocation. I don't think that slowing the link down by adding another pass over the relocations is worth it.
ld.bfd doesn't convert LD to LE. https://airgunmaniac.com/