Created attachment 8096 [details]
test code / main

when starting several threads which load and unload a module with tls data, accessing the tls data segfaults sometimes.

i can reproduce the segfault often with the attached code running it as

cc -g -Wl,--rpath=. -o a a.c -ldl -lpthread
cc -g -shared -fPIC -o mod.so mod.c
for i in `seq 0 99`; do cp mod.so mod-$i.so; done
./a

i saw this with glibc-2.15, glibc-2.19 and latest git (glibc-2.20-578-gedac0a6) on x86_64 and aarch64

the backtrace and local vars are

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffeaffff700 (LWP 24803)]
__GI___libc_free (mem=0x29) at malloc.c:2930
2930      if (chunk_is_mmapped (p))  /* release mmapped memory. */
(gdb) i local
ar_ptr = <optimised out>
p = 0x19
hook = 0
(gdb) bt
#0  __GI___libc_free (mem=0x29) at malloc.c:2930
#1  0x00007ffff7dedf75 in _dl_update_slotinfo (req_modid=36) at dl-tls.c:687
#2  0x00007ffff7dee091 in update_get_addr (ti=0x7fff0c3f3fc0) at dl-tls.c:801
#3  0x00007ffff7dee0df in __GI___tls_get_addr (ti=<optimised out>) at dl-tls.c:831
#4  0x00007fff0c1f36c2 in fun () at mod.c:7
#5  0x000000000040083f in start (a=0x4c) at a.c:21
#6  0x00007ffff79c4298 in start_thread (arg=0x7ffeaffff700) at pthread_create.c:333
#7  0x00007ffff770c9ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) i frame
Stack level 0, frame at 0x7ffeafffed98:
 rip = 0x7ffff76a35ef in __GI___libc_free (malloc.c:2930); saved rip 0x7ffff7dedf75
 called by frame at 0x7ffeafffee08
 source language c.
 Arglist at 0x7ffeafffed80, args: mem=0x29
 Locals at 0x7ffeafffed80, Previous frame's sp is 0x7ffeafffed98
 Saved registers:
  rip at 0x7ffeafffed90
(gdb) up 1
#1  0x00007ffff7dedf75 in _dl_update_slotinfo (req_modid=36) at dl-tls.c:687
687       free (dtv[total + cnt].pointer.val);
(gdb) i frame
Stack level 1, frame at 0x7ffeafffee08:
 rip = 0x7ffff7dedf75 in _dl_update_slotinfo (dl-tls.c:687); saved rip 0x7ffff7dee091
 called by frame at 0x7ffeafffee18, caller of frame at 0x7ffeafffed98
 source language c.
 Arglist at 0x7ffeafffed90, args: req_modid=36
 Locals at 0x7ffeafffed90, Previous frame's sp is 0x7ffeafffee08
 Saved registers:
  rbx at 0x7ffeafffedd0, rbp at 0x7ffeafffedd8, r12 at 0x7ffeafffede0, r13 at 0x7ffeafffede8,
  r14 at 0x7ffeafffedf0, r15 at 0x7ffeafffedf8, rip at 0x7ffeafffee00
(gdb) i local
gen = <optimised out>
map = 0x0
modid = <optimised out>
cnt = 42
total = 0
the_map = 0x7ffea0000910
dtv = 0x60aed0
idx = <optimised out>
listp = 0x7ffff7ff8300
__PRETTY_FUNCTION__ = "_dl_update_slotinfo"
(gdb) p *listp
$1 = {len = 64, next = 0x0, slotinfo = 0x7ffff7ff8310}
(gdb) p *listp->slotinfo@64
$2 = {{gen = 0, map = 0x7ffff7ff94f0}, {gen = 1, map = 0x7ffff7ff94f0}, {gen = 10, map = 0x0},
  {gen = 3, map = 0x7fffec000910}, {gen = 4, map = 0x7fffe8000910}, {gen = 11, map = 0x0},
  {gen = 9, map = 0x0}, {gen = 7, map = 0x7fffcc000910}, {gen = 8, map = 0x7fffc8000910},
  {gen = 12, map = 0x7fffe4000910}, {gen = 13, map = 0x7fff94000910}, {gen = 14, map = 0x7fffd0000910},
  {gen = 17, map = 0x7fff70000910}, {gen = 18, map = 0x7fff5c000910}, {gen = 19, map = 0x7fff58000910},
  {gen = 20, map = 0x7fff54000910}, {gen = 21, map = 0x7fff38000910}, {gen = 22, map = 0x7fff34000910},
  {gen = 23, map = 0x7fff30000910}, {gen = 24, map = 0x7fff10000910}, {gen = 25, map = 0x7fff08000910},
  {gen = 26, map = 0x7ffef8000910}, {gen = 27, map = 0x7ffef4000910}, {gen = 28, map = 0x7ffee4000910},
  {gen = 29, map = 0x7ffedc000910}, {gen = 30, map = 0x7ffed4000910}, {gen = 31, map = 0x7ffec8000910},
  {gen = 32, map = 0x7ffebc000910}, {gen = 33, map = 0x7ffea8000910}, {gen = 34, map = 0x7ffea4000910},
  {gen = 35, map = 0x7ffe98000910}, {gen = 65, map = 0x0}, {gen = 37, map = 0x7ffe84000910},
  {gen = 38, map = 0x7ffe80000910}, {gen = 66, map = 0x0}, {gen = 64, map = 0x7ffe94000910},
  {gen = 67, map = 0x7ffea0000910}, {gen = 60, map = 0x0}, {gen = 62, map = 0x0},
  {gen = 51, map = 0x0}, {gen = 48, map = 0x0}, {gen = 50, map = 0x0},
  {gen = 54, map = 0x0}, {gen = 55, map = 0x0}, {gen = 0, map = 0x0} <repeats 20 times>}
(gdb) p listp->slotinfo[cnt]
$3 = {gen = 54, map = 0x0}
(gdb) p listp->slotinfo[36]
$6 = {gen = 67, map = 0x7ffea0000910}
(gdb) p *dtv@64
$4 = {{counter = 31, pointer = {val = 0x1f, is_static = false}},
  {counter = 140731851208320, pointer = {val = 0x7ffeaffff680, is_static = true}},
  {counter = 0, pointer = {val = 0x0, is_static = false}},
  {counter = 18446744073709551615, pointer = {val = 0xffffffffffffffff, is_static = false}},
  {counter = 18446744073709551615, pointer = {val = 0xffffffffffffffff, is_static = false}},
  {counter = 0, pointer = {val = 0x0, is_static = false}},
  {counter = 0, pointer = {val = 0x0, is_static = false}},
  {counter = 18446744073709551615, pointer = {val = 0xffffffffffffffff, is_static = false}} <repeats 34 times>,
  {counter = 0, pointer = {val = 0x0, is_static = 193}},
  {counter = 41, pointer = {val = 0x29, is_static = false}},
  {counter = 34, pointer = {val = 0x22, is_static = false}},
  {counter = 140731842815616, pointer = {val = 0x7ffeaf7fe680, is_static = true}},
  {counter = 0, pointer = {val = 0x0, is_static = false}},
  {counter = 18446744073709551615, pointer = {val = 0xffffffffffffffff, is_static = false}},
  {counter = 18446744073709551615, pointer = {val = 0xffffffffffffffff, is_static = false}},
  {counter = 0, pointer = {val = 0x0, is_static = false}},
  {counter = 0, pointer = {val = 0x0, is_static = false}},
  {counter = 18446744073709551615, pointer = {val = 0xffffffffffffffff, is_static = false}} <repeats 14 times>}
(gdb) p dtv[cnt]
$5 = {counter = 41, pointer = {val = 0x29, is_static = false}}
Created attachment 8097 [details]
test code / module
the testcase is very similar to the glibc test nptl/tst-stack4, so https://sourceware.org/ml/libc-alpha/2015-01/msg00531.html might be related.

without dlclose i don't see dtv corruption
the patch in https://sourceware.org/ml/libc-alpha/2015-03/msg00563.html gets rid of the dtv corruption, but i still see data races in the code and failures on both x86_64 and aarch64:

Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= _rtld_local._dl_tls_generation' failed!

i think this happens because pthread_create accesses GL(dl_tls_max_dtv_idx) (== cnt at the assert failure) and GL(dl_tls_generation) without holding the global rtld lock or using atomics.

both of those _rtld_local fields are updated in dlopen and dlclose independently while holding a lock (i think dlopen first updates the max dtv idx when the module is mapped and, if there is tls, the generation count is updated afterwards, so there is a window where the idx is already new but gen is outdated)

a simple fix is to just remove the assert (assuming the logic is otherwise sound) and only access the max dtv idx with an atomic load during pthread_create

(it's hard to reproduce it on x86_64, easier on aarch64)
Fixed in https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f8aeae347377f3dfa8cbadde057adf1827fb1d44

the remaining issue is independent of the original dtv corruption (and does not require dlclose), so i created bug 19329 for that.