on x86 tlsdesc can be lazily relocated, but that (1) requires locks for _dl_make_tlsdesc_dynamic (2) allocations in _dl_make_tlsdesc_dynamic can fail (3) calls _dl_try_allocate_static_tls without locks from these issues (1) and (2) applies to normal tls access too (bug 16133 and bug 16134) but (3) is specific to lazy tlsdesc relocations: there are a lot of accesses to global state that was not designed to be done without holding locks such as writing to GL(dl_tls_static_optional), GL(dl_tls_static_used) and map->l_tls_offset. on arm and aarch64 lazy tlsdesc reloc was removed earlier because of data races between accesses to the same tls variable, these are data races with concurrent thread creation and independent tls accesses.
It is hard to argue that this is the same problem but the same bit of code is to blame. The calculation for the TLSDESC relocation ends up crashing the dynamic linker when done lazily in some cases. The patch that addresses these race conditions by disabling lazy relocations also fixes this problem. In this particular case, the problem doesn't appear to be a data race, somehow some libraries are being compiled in such a way that they use the gnu2 TLS variant but they do not contain a PLT or GOT this causes the relocation calculation to go awry and execution crashes in the dynamic linker. Steps to Reproduce: 1. compile up auditor.so and dlmain.c 2. LD_AUDIT=/path/to/auditor.so /path/to/dlmain /path/to/libmemkind.so The patch that addresses this problem also addresses the problem I am describing. The key one needed to fix this problem is: https://sourceware.org/pipermail/libc-alpha/2021-February/122637.html It simply disables lazy binding of gnu2 TLSDESC relocations. Unfortunately, there is a minor bug in that patch which the author said that he will address with the second version of the patch set. https://sourceware.org/pipermail/libc-alpha/2021-April/124889.html It is fairly likely that a variant of this problem also exists in i386 but I have yet to see that in the wild. It therefore may be worth considering also adding patch 12 from Szabolcs Nagy as well when fixing this problem. https://sourceware.org/pipermail/libc-alpha/2021-February/122638.html Patches 13 and 14 are not strictly necessary to address this bug but they remove the code that has the bug in it. https://sourceware.org/pipermail/libc-alpha/2021-February/122639.html https://sourceware.org/pipermail/libc-alpha/2021-February/122640.html
Created attachment 13359 [details] simple reproducer of this problem
(In reply to Ben Woodard from comment #1) > somehow some libraries are being compiled in such a way that they use the > gnu2 TLS variant but they do not contain a PLT or GOT this causes the > relocation calculation to go awry and execution crashes in the dynamic > linker. that would be a linker bug as the DT_TLSDESC_PLT/GOT are needed for lazy tlsdesc to work, however.. libmemkind.so DYNAMIC section has 0x000000000000001e (FLAGS) BIND_NOW 0x000000006ffffffb (FLAGS_1) Flags: NOW i.e. it does not support lazy entry points. The bug in this case is in ldaudit: it disables bind now and tries to relocate everything lazily (so it can use plt hooks) that's wrong for tlsdesc, i opened bug 27721.
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8f7e09f4dbdb5c815a18b8285fbc5d5d7bc17d86 commit 8f7e09f4dbdb5c815a18b8285fbc5d5d7bc17d86 Author: Szabolcs Nagy <szabolcs.nagy@arm.com> Date: Thu Feb 11 11:29:23 2021 +0000 x86_64: Avoid lazy relocation of tlsdesc [BZ #27137] Lazy tlsdesc relocation is racy because the static tls optimization and tlsdesc management operations are done without holding the dlopen lock. This similar to the commit b7cf203b5c17dd6d9878537d41e0c7cc3d270a67 for aarch64, but it fixes a different race: bug 27137. Another issue is that ld auditing ignores DT_BIND_NOW and thus tries to relocate tlsdesc lazily, but that does not work in a BIND_NOW module due to missing DT_TLSDESC_PLT. Unconditionally relocating tlsdesc at load time fixes this bug 27721 too.
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ddcacd91cc10ff92d6201eda87047d029c14158d commit ddcacd91cc10ff92d6201eda87047d029c14158d Author: Szabolcs Nagy <szabolcs.nagy@arm.com> Date: Thu Feb 11 11:40:11 2021 +0000 i386: Avoid lazy relocation of tlsdesc [BZ #27137] Lazy tlsdesc relocation is racy because the static tls optimization and tlsdesc management operations are done without holding the dlopen lock. This similar to the commit b7cf203b5c17dd6d9878537d41e0c7cc3d270a67 for aarch64, but it fixes a different race: bug 27137. On i386 the code is a bit more complicated than on x86_64 because both rel and rela relocs are supported.
fixed for 2.34
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1387ad6225c2222f027790e3f460e31aa5dd2c54 commit 1387ad6225c2222f027790e3f460e31aa5dd2c54 Author: Szabolcs Nagy <szabolcs.nagy@arm.com> Date: Wed Dec 30 19:19:37 2020 +0000 elf: Fix data races in pthread_create and TLS access [BZ #19329] DTV setup at thread creation (_dl_allocate_tls_init) is changed to take the dlopen lock, GL(dl_load_lock). Avoiding data races here without locks would require design changes: the map that is accessed for static TLS initialization here may be concurrently freed by dlclose. That use after free may be solved by only locking around static TLS setup or by ensuring dlclose does not free modules with static TLS, however currently every link map with TLS has to be accessed at least to see if it needs static TLS. And even if that's solved, still a lot of atomics would be needed to synchronize DTV related globals without a lock. So fix both bug 19329 and bug 27111 with a lock that prevents DTV setup running concurrently with dlopen or dlclose. _dl_update_slotinfo at TLS access still does not use any locks so CONCURRENCY NOTES are added to explain the synchronization. The early exit from the slotinfo walk when max_modid is reached is not strictly necessary, but does not hurt either. An incorrect acquire load was removed from _dl_resize_dtv: it did not synchronize with any release store or fence and synchronization is now handled separately at thread creation and TLS access time. There are still a number of racy read accesses to globals that will be changed to relaxed MO atomics in a followup patch. This should not introduce regressions compared to existing behaviour and avoid cluttering the main part of the fix. Not all TLS access related data races got fixed here: there are additional races at lazy tlsdesc relocations see bug 27137. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(In reply to Ben Woodard from comment #2) > Created attachment 13359 [details] > simple reproducer of this problem Hi,how can I reproduce on i386? I try to reproduce it on x86_64 by building 32bits program. The steps are as follows: 1. add -m32 to make, like this: dlmain: dlmain.c gcc -O2 -m32 -g -o dlmain dlmain.c -ldl auditor.so: auditor.c gcc -O2 -m32 -g -fPIC -shared -o auditor.so auditor.c 2. make 3. make auditor.so 4. LD_AUDIT=/path/to/auditor.so /path/to/dlmain /path/to/libmemkind.so And I try to build(get) libmemkind.so(ELF 32-bit LSB shared object), but I can't success. I also try to use other shared library(libreadline.so), I can't reproduce too(on x86_64 can't reproduce too. I think any shared library can replace libmemkind.so to reproduce but not). Is it only produce on 32bits mathine? Or other problem? Thanks.