Bug 17918 - corrupt dtv causes segfault with multithreaded dlopen/dlclose of shared objects with tls
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link
Version: unspecified
Importance: P2 normal
Target Milestone: 2.22
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-03 13:08 UTC by Szabolcs Nagy
Modified: 2015-12-04 12:48 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
test code / main (307 bytes, text/x-csrc)
2015-02-03 13:08 UTC, Szabolcs Nagy
test code / module (97 bytes, text/x-csrc)
2015-02-03 13:09 UTC, Szabolcs Nagy

Description Szabolcs Nagy 2015-02-03 13:08:15 UTC
Created attachment 8096
test code / main

When several threads are started that load and unload a module with TLS data, accessing the TLS data sometimes segfaults.

I can reproduce the segfault frequently by building and running the attached code as follows:

 cc -g -Wl,--rpath=. -o a a.c -ldl -lpthread
 cc -g -shared -fPIC -o mod.so mod.c
 for i in `seq 0 99`; do cp mod.so mod-$i.so; done
 ./a
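The attached a.c and mod.c are not reproduced on this page. The sketch below is a hypothetical reconstruction of the pattern they exercise, inferred only from the commands above and the backtrace (a thread function `start` in a.c calling `fun` in mod.c, which goes through `__tls_get_addr`). The module contents, the `fun` signature, and the names `load_call_unload`, `run_all` and `NTHREADS` are assumptions, not the reporter's code.

```c
#include <assert.h>
#include <dlfcn.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical reconstruction, not the attached code.  Assumes each
   mod-N.so (a copy of mod.so) exports a function that touches a
   __thread variable, e.g.:
     static __thread int tls_var;
     int fun(void) { return ++tls_var; }                               */

enum { NTHREADS = 100 };

/* Load module N, call fun() (which reaches __tls_get_addr), then unload
   the module again.  Returns fun()'s result, or -1 if loading failed.  */
static int load_call_unload(int n)
{
    char path[64];
    snprintf(path, sizeof path, "./mod-%d.so", n);
    void *h = dlopen(path, RTLD_NOW);
    if (h == NULL)
        return -1;
    int (*fun)(void) = (int (*)(void))dlsym(h, "fun");
    int r = (fun != NULL) ? fun() : -1;
    dlclose(h);
    return r;
}

/* Thread entry: each thread works on its own module copy.  */
static void *start(void *a)
{
    load_call_unload((int)(intptr_t)a);
    return NULL;
}

/* Start NTHREADS threads concurrently and wait for all of them.
   Returns the number of threads that were created and joined.  */
static int run_all(void)
{
    pthread_t t[NTHREADS];
    int created = 0;
    for (int i = 0; i < NTHREADS; i++) {
        if (pthread_create(&t[i], NULL, start, (void *)(intptr_t)i) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return created;
}
```

A `main` would simply call `run_all()`; link with `-ldl -lpthread` as in the commands above (both live in libc itself on newer glibc).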

I have seen this with glibc-2.15, glibc-2.19 and the latest git (glibc-2.20-578-gedac0a6) on both x86_64 and aarch64.

The backtrace and local variables are:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffeaffff700 (LWP 24803)]
__GI___libc_free (mem=0x29) at malloc.c:2930
2930  if (chunk_is_mmapped (p))                       /* release mmapped memory. */
(gdb) i local
ar_ptr = <optimised out>
p = 0x19
hook = 0
(gdb) bt
#0  __GI___libc_free (mem=0x29) at malloc.c:2930
#1  0x00007ffff7dedf75 in _dl_update_slotinfo (req_modid=36) at dl-tls.c:687
#2  0x00007ffff7dee091 in update_get_addr (ti=0x7fff0c3f3fc0) at dl-tls.c:801
#3  0x00007ffff7dee0df in __GI___tls_get_addr (ti=<optimised out>) at dl-tls.c:831
#4  0x00007fff0c1f36c2 in fun () at mod.c:7
#5  0x000000000040083f in start (a=0x4c) at a.c:21
#6  0x00007ffff79c4298 in start_thread (arg=0x7ffeaffff700) at pthread_create.c:333
#7  0x00007ffff770c9ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) i frame
Stack level 0, frame at 0x7ffeafffed98:
 rip = 0x7ffff76a35ef in __GI___libc_free (malloc.c:2930); saved rip 0x7ffff7dedf75
 called by frame at 0x7ffeafffee08
 source language c.
 Arglist at 0x7ffeafffed80, args: mem=0x29
 Locals at 0x7ffeafffed80, Previous frame's sp is 0x7ffeafffed98
 Saved registers:
  rip at 0x7ffeafffed90
(gdb) up 1
#1  0x00007ffff7dedf75 in _dl_update_slotinfo (req_modid=36) at dl-tls.c:687
687      free (dtv[total + cnt].pointer.val);
(gdb) i frame
Stack level 1, frame at 0x7ffeafffee08:
 rip = 0x7ffff7dedf75 in _dl_update_slotinfo (dl-tls.c:687); saved rip 0x7ffff7dee091
 called by frame at 0x7ffeafffee18, caller of frame at 0x7ffeafffed98
 source language c.
 Arglist at 0x7ffeafffed90, args: req_modid=36
 Locals at 0x7ffeafffed90, Previous frame's sp is 0x7ffeafffee08
 Saved registers:
  rbx at 0x7ffeafffedd0, rbp at 0x7ffeafffedd8, r12 at 0x7ffeafffede0, r13 at 0x7ffeafffede8, r14 at 0x7ffeafffedf0, r15 at 0x7ffeafffedf8, rip at 0x7ffeafffee00
(gdb) i local
gen = <optimised out>
map = 0x0
modid = <optimised out>
cnt = 42
total = 0
the_map = 0x7ffea0000910
dtv = 0x60aed0
idx = <optimised out>
listp = 0x7ffff7ff8300
__PRETTY_FUNCTION__ = "_dl_update_slotinfo"
(gdb) p *listp
$1 = {len = 64, next = 0x0, slotinfo = 0x7ffff7ff8310}
(gdb) p *listp->slotinfo@64
$2 = {{gen = 0, map = 0x7ffff7ff94f0}, {gen = 1, map = 0x7ffff7ff94f0}, {gen = 10, map = 0x0}, {gen = 3, map = 0x7fffec000910}, {gen = 4, map = 0x7fffe8000910}, {gen = 11, map = 0x0}, {gen = 9, map = 0x0}, {gen = 7, map = 0x7fffcc000910}, {gen = 8, map = 0x7fffc8000910}, {gen = 12, map = 0x7fffe4000910}, {gen = 13, 
    map = 0x7fff94000910}, {gen = 14, map = 0x7fffd0000910}, {gen = 17, map = 0x7fff70000910}, {gen = 18, map = 0x7fff5c000910}, {gen = 19, map = 0x7fff58000910}, {gen = 20, map = 0x7fff54000910}, {gen = 21, map = 0x7fff38000910}, {gen = 22, map = 0x7fff34000910}, {gen = 23, map = 0x7fff30000910}, {gen = 24, 
    map = 0x7fff10000910}, {gen = 25, map = 0x7fff08000910}, {gen = 26, map = 0x7ffef8000910}, {gen = 27, map = 0x7ffef4000910}, {gen = 28, map = 0x7ffee4000910}, {gen = 29, map = 0x7ffedc000910}, {gen = 30, map = 0x7ffed4000910}, {gen = 31, map = 0x7ffec8000910}, {gen = 32, map = 0x7ffebc000910}, {gen = 33, 
    map = 0x7ffea8000910}, {gen = 34, map = 0x7ffea4000910}, {gen = 35, map = 0x7ffe98000910}, {gen = 65, map = 0x0}, {gen = 37, map = 0x7ffe84000910}, {gen = 38, map = 0x7ffe80000910}, {gen = 66, map = 0x0}, {gen = 64, map = 0x7ffe94000910}, {gen = 67, map = 0x7ffea0000910}, {gen = 60, map = 0x0}, {gen = 62, 
    map = 0x0}, {gen = 51, map = 0x0}, {gen = 48, map = 0x0}, {gen = 50, map = 0x0}, {gen = 54, map = 0x0}, {gen = 55, map = 0x0}, {gen = 0, map = 0x0} <repeats 20 times>}
(gdb) p listp->slotinfo[cnt]
$3 = {gen = 54, map = 0x0}
(gdb) p listp->slotinfo[36]
$6 = {gen = 67, map = 0x7ffea0000910}
(gdb) p *dtv@64
$4 = {{counter = 31, pointer = {val = 0x1f, is_static = false}}, {counter = 140731851208320, pointer = {val = 0x7ffeaffff680, is_static = true}}, {counter = 0, pointer = {val = 0x0, is_static = false}}, {counter = 18446744073709551615, pointer = {val = 0xffffffffffffffff, is_static = false}}, {
    counter = 18446744073709551615, pointer = {val = 0xffffffffffffffff, is_static = false}}, {counter = 0, pointer = {val = 0x0, is_static = false}}, {counter = 0, pointer = {val = 0x0, is_static = false}}, {counter = 18446744073709551615, pointer = {val = 0xffffffffffffffff, 
      is_static = false}} <repeats 34 times>, {counter = 0, pointer = {val = 0x0, is_static = 193}}, {counter = 41, pointer = {val = 0x29, is_static = false}}, {counter = 34, pointer = {val = 0x22, is_static = false}}, {counter = 140731842815616, pointer = {val = 0x7ffeaf7fe680, is_static = true}}, {counter = 0, 
    pointer = {val = 0x0, is_static = false}}, {counter = 18446744073709551615, pointer = {val = 0xffffffffffffffff, is_static = false}}, {counter = 18446744073709551615, pointer = {val = 0xffffffffffffffff, is_static = false}}, {counter = 0, pointer = {val = 0x0, is_static = false}}, {counter = 0, pointer = {
      val = 0x0, is_static = false}}, {counter = 18446744073709551615, pointer = {val = 0xffffffffffffffff, is_static = false}} <repeats 14 times>}
(gdb) p dtv[cnt]
$5 = {counter = 41, pointer = {val = 0x29, is_static = false}}
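One plausible reading of the dump (an assumption, not something stated in the report): index 42 lies past the end of this thread's dtv. From index 41 on, the entries look like a malloc chunk header followed by another dtv (0x29 = 41, then 0x22 = 34, then a static TLS pointer), and slotinfo[42] describes a module that has already been unloaded (map = 0x0). The toy model below, with hypothetical names and not glibc's code, shows why an update loop must check the dtv length before touching the slot of an unloaded module. glibc keeps the length in dtv[-1].counter and marks unused slots with -1 (the 0xffffffffffffffff entries above); the toy keeps the length in a separate field and omits glibc's is_static handling.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model with hypothetical names; not glibc's implementation.  */
#define TOY_UNALLOCATED ((void *) -1L)

struct toy_dtv {
    size_t len;   /* number of slots in val[] (indices 0..len-1) */
    void **val;   /* per-module TLS block pointers; index 0 unused */
};

/* Handle a slotinfo entry whose module has been dlclosed (map == NULL):
   release this thread's block for modid, if it has one.  The bounds
   check comes first: a dtv sized before modid existed has no slot for
   it, and indexing it anyway reads whatever memory follows the array.
   Returns 1 if a slot was cleared, 0 if modid was out of range.  */
static int toy_forget_module(struct toy_dtv *d, size_t modid)
{
    if (modid >= d->len)
        return 0;                      /* no slot in this thread's dtv */
    if (d->val[modid] != TOY_UNALLOCATED)
        free(d->val[modid]);           /* block was allocated lazily */
    d->val[modid] = TOY_UNALLOCATED;
    return 1;
}
```

Without the first check, the call for modid 42 on a shorter dtv would pass an unrelated word to free(), which is what the frame `__GI___libc_free (mem=0x29)` looks like.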
Comment 1 Szabolcs Nagy 2015-02-03 13:09:20 UTC
Created attachment 8097
test code / module
Comment 2 Szabolcs Nagy 2015-02-03 13:19:50 UTC
The test case is very similar to the glibc test nptl/tst-stack4, so

https://sourceware.org/ml/libc-alpha/2015-01/msg00531.html

might be related.

Without dlclose I don't see the dtv corruption.
Comment 3 Szabolcs Nagy 2015-03-18 14:35:58 UTC
The patch in

https://sourceware.org/ml/libc-alpha/2015-03/msg00563.html

gets rid of the dtv corruption, but I still see data races in the code, and failures on both x86_64 and aarch64:

Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= _rtld_local._dl_tls_generation' failed!

I think this happens because pthread_create accesses GL(dl_tls_max_dtv_idx) (which equals cnt at the assertion failure) and GL(dl_tls_generation) without holding the global rtld lock or using atomics.

Both of those _rtld_local fields are updated independently in dlopen and dlclose while holding a lock. I think dlopen first updates the max dtv idx when the module is mapped, and only afterwards, if the module has TLS, updates the generation count, so there is a window in which the idx is already new but the generation is still outdated.

A simple fix is to remove the assert (assuming the logic is otherwise sound) and to read the max dtv idx only with an atomic load during pthread_create.
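The window described above can be modelled with C11 atomics. This is a toy sketch with hypothetical names, not glibc's code: a dlopen-like writer first publishes a new slot (stamped with the next generation) and the new max index, and only afterwards bumps the global generation. A pthread_create-like reader that takes no lock can therefore see a slot whose generation is ahead of the global one, which is the condition the failing assertion rejects. Following the suggestion above, the reader loads the max index atomically and tolerates such slots instead of asserting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical) for the slotinfo generations,
   GL(dl_tls_max_dtv_idx) and GL(dl_tls_generation).  */
enum { TOY_MAX_SLOTS = 64 };
static _Atomic size_t toy_slot_gen[TOY_MAX_SLOTS];
static _Atomic size_t toy_max_dtv_idx;
static _Atomic size_t toy_generation;
static pthread_mutex_t toy_load_lock = PTHREAD_MUTEX_INITIALIZER;

/* Step 1 of a load: claim the next index and stamp its slot with the
   generation it will belong to, then publish the new max index.  */
static void toy_publish_slot(void)
{
    size_t idx = atomic_load(&toy_max_dtv_idx) + 1;
    assert(idx < TOY_MAX_SLOTS);
    atomic_store(&toy_slot_gen[idx], atomic_load(&toy_generation) + 1);
    atomic_store(&toy_max_dtv_idx, idx);
}

/* Step 2 of a load: only now does the global generation catch up.  */
static void toy_bump_generation(void)
{
    atomic_fetch_add(&toy_generation, 1);
}

/* Writers serialize on the lock, like dlopen holding the rtld lock.  */
static void toy_load_module(void)
{
    pthread_mutex_lock(&toy_load_lock);
    toy_publish_slot();
    toy_bump_generation();
    pthread_mutex_unlock(&toy_load_lock);
}

/* Lock-free reader, like pthread_create setting up a new dtv.  Rather
   than asserting slot_gen <= generation, it skips slots newer than the
   generation it observed.  Returns the number of slots it set up.  */
static size_t toy_init_new_thread(void)
{
    size_t max = atomic_load(&toy_max_dtv_idx);
    size_t gen = atomic_load(&toy_generation);
    size_t done = 0;
    for (size_t i = 1; i <= max; i++) {
        if (atomic_load(&toy_slot_gen[i]) > gen)
            continue;  /* published, but its generation is not current yet */
        done++;
    }
    return done;
}
```

Whether skipping is the right treatment for such slots inside glibc is not established here; the sketch only shows that the reader has to tolerate an index that is newer than the generation it read.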

(It is hard to reproduce this on x86_64; it is easier on aarch64.)
Comment 4 Szabolcs Nagy 2015-12-04 12:48:56 UTC
Fixed in
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f8aeae347377f3dfa8cbadde057adf1827fb1d44

The remaining issue is independent of the original
dtv corruption (and does not require dlclose), so
I created bug 19329 for it.