The test machine is an x86_64 machine running the 3.10 kernel from Red Hat Enterprise Linux 7, two Xeon E5-2450 CPUs with 8 cores each, for a total number of 32 hardware threads. I can reproduce this with 2.23 master as of today, but I have trouble getting proper backtraces because I cannot get GDB to use the right libthread_db, so I am not completely certain the observation below applies to glibc master. With the glibc 2.17 in Red Hat Enterprise Linux 7, this pthread_mutex_lock call blocks indefinitely: 232 for (int n = 0; n < N; ++n) 233 { 234 int e = pthread_mutex_lock (&map[n]); 235 if (e != 0 && e != EOWNERDEAD) Sometimes, this happens on the first round, sometimes on subsequent rounds. (gdb) print/x map[n] $4 = {__data = {__lock = 0x80000000, __count = 0x1, __owner = 0x0, __nusers = 0x0, __kind = 0x90, __spins = 0x0, __list = {__prev = 0x0, __next = 0x0}}, __size = {0x0, 0x0, 0x0, 0x80, 0x1, 0x0 <repeats 11 times>, 0x90, 0x0 <repeats 23 times>}, __align = 0x180000000} (gdb) print/x map[n-1] $5 = {__data = {__lock = 0x800052b0, __count = 0x1, __owner = 0x7fffffff, __nusers = 0x1, __kind = 0x90, __spins = 0x0, __list = { __prev = 0x7ffff7feca20, __next = 0x7ffff7ff8480}}, __size = {0xb0, 0x52, 0x0, 0x80, 0x1, 0x0, 0x0, 0x0, 0xff, 0xff, 0xff, 0x7f, 0x1, 0x0, 0x0, 0x0, 0x90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x20, 0xca, 0xfe, 0xf7, 0xff, 0x7f, 0x0, 0x0, 0x80, 0x84, 0xff, 0xf7, 0xff, 0x7f, 0x0, 0x0}, __align = 0x1800052b0} (gdb) print/x map[n+1] $7 = {__data = {__lock = 0x40000000, __count = 0x1, __owner = 0x52b3, __nusers = 0x1, __kind = 0x90, __spins = 0x0, __list = { __prev = 0x7ffff7ff8200, __next = 0x7ffff7ff8138}}, __size = {0x0, 0x0, 0x0, 0x40, 0x1, 0x0, 0x0, 0x0, 0xb3, 0x52, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x82, 0xff, 0xf7, 0xff, 0x7f, 0x0, 0x0, 0x38, 0x81, 0xff, 0xf7, 0xff, 0x7f, 0x0, 0x0}, __align = 0x140000000} (gdb) print/x map[n+2] $8 = {__data = {__lock = 0xc0000000, __count = 0x1, __owner = 0x52ba, __nusers = 0x1, __kind = 0x90, __spins = 0x0, __list = { __prev = 0x7ffff7ff8d18, __next = 0x7ffff7ff8250}}, __size = {0x0, 0x0, 0x0, 0xc0, 0x1, 0x0, 0x0, 0x0, 0xba, 0x52, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x18, 0x8d, 0xff, 0xf7, 0xff, 0x7f, 0x0, 0x0, 0x50, 0x82, 0xff, 0xf7, 0xff, 0x7f, 0x0, 0x0}, __align = 0x1c0000000} I don't know enough about futexes to debug this further.
Full backtrace: #1 0x00007ffff7bc4d7c in _L_robust_lock_205 () at pthread_mutex_lock.c:478 No symbol table info available. #2 0x00007ffff7bc4a05 in __pthread_mutex_lock_full (mutex=0x7ffff7ff84b0) at pthread_mutex_lock.c:225 result = <optimized out> oldval = <optimized out> id = 21168 #3 0x00007ffff7bc8c1a in __GI___pthread_mutex_lock (mutex=<optimized out>) at pthread_mutex_lock.c:55 type = <optimized out> id = <optimized out> #4 0x0000000000402349 in do_test () at tst-robust8.c:234 e = <optimized out> n = 30 p = 21169 status = 9 round = 1 ma = {__size = "\000\000\000\300", __align = -1073741824} #5 0x0000000000401a17 in main (argc=<optimized out>, argv=<optimized out>) at test-skeleton.c:412 direct = <optimized out> status = 0 opt = <optimized out> timeoutfactor = 1 termpid = <optimized out> envstr_timeoutfactor = <optimized out> envstr_direct = <optimized out>
I could reproduce this on a much smaller machine (with once i7-4810MQ CPU), with glibc-2.17-78.el7.x86_64. The hang looks identical.
Created attachment 8924 [details] robust_test.stp I believe I have a test case from an user that makes it easier to debug this condition. Attached is my WIP stap script to debug it. It traces the test case pretty much down to schedule calls in the kernel side, but still may need some extra tweaks in the user level trace (but should give enough hints to someone familiar enough with the glibc code).
Created attachment 8925 [details] robust_test2.c While it is very easy to reproduce in the command line, if using stap it may take quite some time to trigger the (apparent) race. Sample systemtap script output when a dead lock happens: [29352] > syscall.wait4(upid=0x7562 stat_addr=0x7fff9f59152c options=0x0 ru=0x0) [29352] > schedule() [30050] signal.send SIGCHLD to 29352 [30050] > schedule() [29352] < schedule() [29352] < syscall.wait4() return=0x7562 [29352] > pthread_mutex_unlock(mutex=0x7fbf4b3ae000) [29352] > pthread_mutex_unlock(decr=0x1 mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_unlock_full(mutex=0x7fbf4b3ae000 decr=0x1) [29352] < 0x400fed() return=0x0 [29352] < 0x400fed() return=0x0 [30051] > syscall.set_robust_list(head=0x7fbf4b38aa20 len=0x18) [30051] < syscall.set_robust_list() return=0x0 [29352] signal.send SIGTERM to 30051 [30051] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [30051] do_exit(code=0xf) [29352] > __pthread_mutex_lock_full(mutex=0x7fbf4b3ae000) [29352] < 0x400f98() return=0x0 [29352] < 0x400f98() return=0x0 [29352] > syscall.wait4(upid=0x7563 stat_addr=0x7fff9f59152c options=0x0 ru=0x0) [29352] > schedule() [30051] signal.send SIGCHLD to 29352 [30051] > schedule() [29352] < schedule() [29352] < syscall.wait4() return=0x7563 [29352] > pthread_mutex_unlock(mutex=0x7fbf4b3ae000) [29352] > pthread_mutex_unlock(decr=0x1 mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_unlock_full(mutex=0x7fbf4b3ae000 decr=0x1) [29352] < 0x400fed() return=0x0 [29352] < 0x400fed() return=0x0 [30052] > syscall.set_robust_list(head=0x7fbf4b38aa20 len=0x18) [30052] < syscall.set_robust_list() return=0x0 [29352] signal.send SIGTERM to 30052 [30052] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [30052] do_exit(code=0xf) [29352] > __pthread_mutex_lock_full(mutex=0x7fbf4b3ae000) [29352] < 0x400f98() return=0x0 [29352] < 0x400f98() return=0x0 [29352] > syscall.wait4(upid=0x7564 stat_addr=0x7fff9f59152c options=0x0 ru=0x0) [29352] > schedule() [30052] signal.send SIGCHLD to 29352 [30052] > schedule() [29352] < schedule() [29352] < syscall.wait4() return=0x7564 [29352] > pthread_mutex_unlock(mutex=0x7fbf4b3ae000) [29352] > pthread_mutex_unlock(decr=0x1 mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_unlock_full(mutex=0x7fbf4b3ae000 decr=0x1) [29352] < 0x400fed() return=0x0 [29352] < 0x400fed() return=0x0 [30053] > syscall.set_robust_list(head=0x7fbf4b38aa20 len=0x18) [30053] < syscall.set_robust_list() return=0x0 [29352] signal.send SIGTERM to 30053 [29352] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [30053] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_lock_full(mutex=0x7fbf4b3ae000) [29352] < 0x400f98() return=0x0 [29352] < 0x400f98() return=0x0 [30053] do_exit(code=0xf) [29352] > syscall.wait4(upid=0x7565 stat_addr=0x7fff9f59152c options=0x0 ru=0x0) [29352] > schedule() [30053] signal.send SIGCHLD to 29352 [30053] > schedule() [29352] < schedule() [29352] < syscall.wait4() return=0x7565 [29352] > pthread_mutex_unlock(mutex=0x7fbf4b3ae000) [29352] > pthread_mutex_unlock(decr=0x1 mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_unlock_full(mutex=0x7fbf4b3ae000 decr=0x1) [29352] < 0x400fed() return=0x0 [30054] > syscall.set_robust_list(head=0x7fbf4b38aa20 len=0x18) [30054] < syscall.set_robust_list() return=0x0 [29352] < 0x400fed() return=0x0 [30054] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [29352] signal.send SIGTERM to 30054 [30054] do_exit(code=0xf) [29352] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_lock_full(mutex=0x7fbf4b3ae000) [29352] < 0x400f98() return=0x0 [29352] < 0x400f98() return=0x0 [29352] > syscall.wait4(upid=0x7566 stat_addr=0x7fff9f59152c options=0x0 ru=0x0) [29352] > schedule() [30054] signal.send SIGCHLD to 29352 [30054] > schedule() [29352] < schedule() [29352] < syscall.wait4() return=0x7566 [29352] > pthread_mutex_unlock(mutex=0x7fbf4b3ae000) [29352] > pthread_mutex_unlock(decr=0x1 mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_unlock_full(mutex=0x7fbf4b3ae000 decr=0x1) [29352] < 0x400fed() return=0x0 [29352] < 0x400fed() return=0x0 [30055] > syscall.set_robust_list(head=0x7fbf4b38aa20 len=0x18) [30055] < syscall.set_robust_list() return=0x0 [30055] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [29352] signal.send SIGTERM to 30055 [30055] do_exit(code=0xf) [29352] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_lock_full(mutex=0x7fbf4b3ae000) [29352] < 0x400f98() return=0x0 [29352] < 0x400f98() return=0x0 [29352] > syscall.wait4(upid=0x7567 stat_addr=0x7fff9f59152c options=0x0 ru=0x0) [29352] > schedule() [30055] signal.send SIGCHLD to 29352 [30055] > schedule() [29352] < schedule() [29352] < syscall.wait4() return=0x7567 [29352] > pthread_mutex_unlock(mutex=0x7fbf4b3ae000) [29352] > pthread_mutex_unlock(decr=0x1 mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_unlock_full(mutex=0x7fbf4b3ae000 decr=0x1) [30056] > syscall.set_robust_list(head=0x7fbf4b38aa20 len=0x18) [30056] < syscall.set_robust_list() return=0x0 [29352] < 0x400fed() return=0x0 [29352] < 0x400fed() return=0x0 [30056] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [29352] signal.send SIGTERM to 30056 [30056] do_exit(code=0xf) [29352] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_lock_full(mutex=0x7fbf4b3ae000) [29352] < 0x400f98() return=0x0 [29352] < 0x400f98() return=0x0 [29352] > syscall.wait4(upid=0x7568 stat_addr=0x7fff9f59152c options=0x0 ru=0x0) [29352] > schedule() [30056] signal.send SIGCHLD to 29352 [30056] > schedule() [29352] < schedule() [29352] < syscall.wait4() return=0x7568 [29352] > pthread_mutex_unlock(mutex=0x7fbf4b3ae000) [29352] > pthread_mutex_unlock(decr=0x1 mutex=0x7fbf4b3ae000) [30057] > syscall.set_robust_list(head=0x7fbf4b38aa20 len=0x18) [30057] < syscall.set_robust_list() return=0x0 [29352] > __pthread_mutex_unlock_full(mutex=0x7fbf4b3ae000 decr=0x1) [29352] < 0x400fed() return=0x0 [29352] < 0x400fed() return=0x0 [30057] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [29352] signal.send SIGTERM to 30057 [30057] do_exit(code=0xf) [29352] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_lock_full(mutex=0x7fbf4b3ae000) [29352] < 0x400f98() return=0x0 [29352] < 0x400f98() return=0x0 [29352] > syscall.wait4(upid=0x7569 stat_addr=0x7fff9f59152c options=0x0 ru=0x0) [29352] > schedule() [30057] signal.send SIGCHLD to 29352 [30057] > schedule() [29352] < schedule() [29352] < syscall.wait4() return=0x7569 [29352] > pthread_mutex_unlock(mutex=0x7fbf4b3ae000) [29352] > pthread_mutex_unlock(decr=0x1 mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_unlock_full(mutex=0x7fbf4b3ae000 decr=0x1) [29352] < 0x400fed() return=0x0 [29352] < 0x400fed() return=0x0 [30058] > syscall.set_robust_list(head=0x7fbf4b38aa20 len=0x18) [30058] < syscall.set_robust_list() return=0x0 [30058] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [29352] signal.send SIGTERM to 30058 [30058] do_exit(code=0xf) [29352] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_lock_full(mutex=0x7fbf4b3ae000) [29352] < 0x400f98() return=0x0 [29352] < 0x400f98() return=0x0 [29352] > syscall.wait4(upid=0x756a stat_addr=0x7fff9f59152c options=0x0 ru=0x0) [29352] > schedule() [30058] signal.send SIGCHLD to 29352 [30058] > schedule() [29352] < schedule() [29352] < syscall.wait4() return=0x756a [30059] > syscall.set_robust_list(head=0x7fbf4b38aa20 len=0x18) [30059] < syscall.set_robust_list() return=0x0 [29352] > schedule() [30059] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [30059] > __pthread_mutex_lock_full(mutex=0x7fbf4b3ae000) [30059] > syscall.futex(uaddr=0x7fbf4b3ae000 op=0x0 val=0x800072a8 utime=0x0 uaddr2=0x80 val3=0x0) [30059] > futex_wait(uaddr=0x7fbf4b3ae000 flags=0x1 val=0x800072a8 abs_time=0x0 bitset=0xffffffff) [30059] > futex_wait_setup(uaddr=0x7fbf4b3ae000 val=0x800072a8 flags=0x1 q=0xffff88028616bdc8 hb=0xffff88028616bd58) [30059] < futex_wait() return=0x0 [30059] > futex_wait_queue_me(hb=0xffffc9000189c840 q=0xffff88028616bdc8 timeout=0x0) [30059] > schedule() Done 700 iterations [29352] > schedule() [29352] > pthread_mutex_unlock(mutex=0x7fbf4b3ae000) [29352] > pthread_mutex_unlock(decr=0x1 mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_unlock_full(mutex=0x7fbf4b3ae000 decr=0x1) [29352] > syscall.futex(uaddr=0x7fbf4b3ae000 op=0x1 val=0x1 utime=0x0 uaddr2=0x7fbf4b3ae000 val3=0x8) [29352] < syscall.futex() return=0x1 [30059] < schedule() [29352] < 0x400fed() return=0x0 [29352] < 0x400fed() return=0x0 [30059] < futex_wait() [30059] < do_futex() return=0x0 [30059] < syscall.futex() return=0x0 [29352] signal.send SIGTERM to 30059 [30059] do_exit(code=0xf) [29352] > __pthread_mutex_lock(mutex=0x7fbf4b3ae000) [29352] > __pthread_mutex_lock_full(mutex=0x7fbf4b3ae000) [29352] > syscall.futex(uaddr=0x7fbf4b3ae000 op=0x0 val=0x8000756b utime=0x0 uaddr2=0x80 val3=0x0) [29352] > futex_wait(uaddr=0x7fbf4b3ae000 flags=0x1 val=0x8000756b abs_time=0x0 bitset=0xffffffff) [29352] > futex_wait_setup(uaddr=0x7fbf4b3ae000 val=0x8000756b flags=0x1 q=0xffff88027e8cbdc8 hb=0xffff88027e8cbd58) [29352] < futex_wait() return=0x0 [29352] > futex_wait_queue_me(hb=0xffffc9000189c840 q=0xffff88027e8cbdc8 timeout=0x0) [29352] > schedule() [30059] signal.send SIGCHLD to 29352 [30059] > schedule() and it is dead locked at the end. 29352 is the main process and 30059 is the last forked process. (Note that the test case only forks one child process at a time, and tries to "race" with it to get the mutex lock).
Adding extra kernel probes to handle_futex_death, mm_release and exit_robust_list, it appears the robust list is in an inconsistent state in exit_robust_list(), passing an invalid "user" pointer to handle_futex_death(). This appears to happen when "pending" is not null in exit_robust_list. Extra sample log, of the previous working process, and the one locking: [1648] > syscall.set_robust_list(head=0x7ff55806fa20 len=0x18) [1648] < syscall.set_robust_list() return=0x0 [755] signal.send SIGTERM to 1648 [1648] > __pthread_mutex_lock(mutex=0x7ff558093000) [755] > __pthread_mutex_lock(mutex=0x7ff558093000) [1648] do_exit(code=0xf) [755] > __pthread_mutex_lock_full(mutex=0x7ff558093000) [1648] exit_mm(tsk=0xffff8800d252e780) [755] < 0x400f98() return=0x0 [1648] > mm_release(tsk=0xffff8800d252e780 mm=0xffff88019a64e400) [755] < 0x400f98() return=0x0 [755] > syscall.wait4(upid=0x670 stat_addr=0x7ffdb7e2e80c options=0x0 ru=0x0) [1648] > exit_robust_list(curr=0xffff8800d252e780) head = 0x7ff55806fa20 entry = 0x7ff558093020 futex_offset = -32 pending = 0x0 [755] > schedule() [1648] > handle_futex_death(uaddr=0x7ff558093000 curr=0xffff8800d252e780 pi=0x0) [1648] < handle_futex_death() return=0x0 [1648] exit_robust_list locals head=? entry=? next_entry=? pending=? limit=? pi=? pip=0x0 next_pi=? futex_offset=? rc=? RIP: ffffffff81078553 RSP: ffff880118a63cd0 EFLAGS: 00000282 RAX: 0000000000000000 RBX: ffff88019a64e400 RCX: ffff880118a63fd8 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800d252e780 RBP: ffff880118a63ce0 R08: 0000000000000000 R09: 000000018020001f R10: ffffea000c1e6400 R11: 0000000000000004 R12: ffff8800d252e780 R13: ffff8800d252e780 R14: ffff88011df60100 R15: ffff8800d252e780 FS: 00007ff55806f740(0000) GS:ffff88031e280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff557955290 CR3: 00000002881d4000 CR4: 00000000001407e0 [1648] < exit_robust_list() [1648] < mm_release() [1648] signal.send SIGCHLD to 755 [1648] > schedule() [755] < schedule() [755] < syscall.wait4() return=0x670 [755] > schedule() [1649] > syscall.set_robust_list(head=0x7ff55806fa20 len=0x18) [1649] < syscall.set_robust_list() return=0x0 [1649] > __pthread_mutex_lock(mutex=0x7ff558093000) [1649] > __pthread_mutex_lock_full(mutex=0x7ff558093000) [1649] > syscall.futex(uaddr=0x7ff558093000 op=0x0 val=0x800002f3 utime=0x0 uaddr2=0x80 val3=0x0) [1649] > futex_wait(uaddr=0x7ff558093000 flags=0x1 val=0x800002f3 abs_time=0x0 bitset=0xffffffff) [1649] > futex_wait_setup(uaddr=0x7ff558093000 val=0x800002f3 flags=0x1 q=0xffff8801a2b0bdc8 hb=0xffff8801a2b0bd58) [1649] < futex_wait() return=0x0 [1649] > futex_wait_queue_me(hb=0xffffc9000189fb00 q=0xffff8801a2b0bdc8 timeout=0x0) [1649] > schedule() [755] > pthread_mutex_unlock(mutex=0x7ff558093000) [755] > pthread_mutex_unlock(decr=0x1 mutex=0x7ff558093000) [755] > __pthread_mutex_unlock_full(mutex=0x7ff558093000 decr=0x1) [755] > syscall.futex(uaddr=0x7ff558093000 op=0x1 val=0x1 utime=0x0 uaddr2=0x7ff558093000 val3=0x8) [755] < syscall.futex() return=0x1 [1649] < schedule() [755] < 0x400fed() return=0x0 [755] < 0x400fed() return=0x0 [1649] < futex_wait() [1649] < do_futex() return=0x0 [1649] < syscall.futex() return=0x0 [755] signal.send SIGTERM to 1649 [1649] do_exit(code=0xf) [755] > __pthread_mutex_lock(mutex=0x7ff558093000) [755] > __pthread_mutex_lock_full(mutex=0x7ff558093000) [1649] exit_mm(tsk=0xffff8800d252a280) [755] > syscall.futex(uaddr=0x7ff558093000 op=0x0 val=0x80000671 utime=0x0 uaddr2=0x80 val3=0x0) [1649] > mm_release(tsk=0xffff8800d252a280 mm=0xffff88019a649f40) [755] > futex_wait(uaddr=0x7ff558093000 flags=0x1 val=0x80000671 abs_time=0x0 bitset=0xffffffff) [1649] > exit_robust_list(curr=0xffff8800d252a280) head = 0x7ff55806fa20 entry = 0x7ff558093020 futex_offset = -32 pending = 0x7ff558093020 [755] > futex_wait_setup(uaddr=0x7ff558093000 val=0x80000671 flags=0x1 q=0xffff880115593dc8 hb=0xffff880115593d58) [755] < futex_wait() return=0x0 [1649] > handle_futex_death(uaddr=0xffffffffffffffe0 curr=0xffff8800d252a280 pi=0x0) [755] > futex_wait_queue_me(hb=0xffffc9000189fb00 q=0xffff880115593dc8 timeout=0x0) [1649] < handle_futex_death() return=0xffffffffffffffff [755] > schedule() [1649] exit_robust_list locals head=? entry=? next_entry=? pending=? limit=? pi=? pip=0x0 next_pi=? futex_offset=? rc=? RIP: ffffffff81078553 RSP: ffff8801a2b0bcd0 EFLAGS: 00000282 RAX: 00000000ffffffff RBX: ffff88019a649f40 RCX: ffff8801a2b0bfd8 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffe0 RBP: ffff8801a2b0bce0 R08: 0000000000000000 R09: 000000018020001e R10: ffffea000c1e6400 R11: 0000000000000004 R12: ffff8800d252a280 R13: ffff8800d252a280 R14: ffff88011df60100 R15: ffff8800d252a280 FS: 00007ff55806f740(0000) GS:ffff88031e280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000002a337b000 CR4: 00000000001407e0 [1649] < exit_robust_list() [1649] < mm_release() [1649] signal.send SIGCHLD to 755 [1649] > schedule()
Created attachment 8927 [details] robust_test.stp Updated stap script. Was run, as root with the command: # while :; do stap --ldd robust_test.stp -c ./robust_test2; done sometimes it triggers quickly the issue, sometimes it takes up to 10 minutes...
Created attachment 8930 [details] linux-kernel-test.patch I was thinking it could be a kernel bug, because when adding a minor change to kernel/futex.c:exit_robust_list() I could no longer reproduce the problem when "instrumenting" the execution with systemtap; did not reproduce after 12+ hours, but if running from command line, would reproduce in 15 to 30 runs. The kernel patch, I added to my own computer, running rhel 7.2, was due to noticing these conditions in entry of exit_robust_list in one of the dead locks: head = 0x7fb7ce4afa20 &head.list = 0x7fb7ce4afa20 entry = 0x7fb7ce4d3020 entry->next = 0x0 futex_offset = -32 pending = 0x7fb7ce4d3020 // loop because 0x7fb7ce4d3020 != 0x7fb7ce4afa20 while (entry != &head->list) { // next_entry is set (again) to NULL rc = fetch_robust_entry(&next_entry, &entry->next, &next_pi); // not executed, because entry == pending if (entry != pending) if (handle_futex_death((void __user *)entry + futex_offset, curr, pi)) return; // entry is set to NULL entry = next_entry; *** restart loop // loop because NULL != 0x7fb7ce4afa20 while (entry != &head->list) { // executed, because NULL != pending if (entry != pending) if (handle_futex_death((void __user *)entry + futex_offset, curr, pi)) return; *** since it leaves with a return on error, it never executes: if (pending) handle_futex_death((void __user *)pending + futex_offset, curr, pip); Apparently this is a side effect of https://lkml.org/lkml/2007/9/30/55 But I wonder, could "head->list.next->next" really be NULL in normal conditions?
Created attachment 8933 [details] robust_test.c After the kernel test patch, I could only reproduce the issue with systemtap if using the alternate version that spawns 8 child processes, instead of just 1.
Created attachment 8934 [details] robust_test.stp Updated stap script, and s/robust_test2/robust_test/ Now I can easily trigger again the issue, instrumenting with systemtap: 0 S root 19653 19650 0 80 0 - 1584 kretpr 11:13 pts/1 00:00:00 ./robust_test 1 S root 19656 19653 0 80 0 - 1582 kretpr 11:13 pts/1 00:00:00 ./robust_test 1 Z root 19708 19653 0 80 0 - 0 kretpr 11:13 pts/1 00:00:00 [robust_test] <defunct> 1 Z root 19711 19653 0 80 0 - 0 kretpr 11:13 pts/1 00:00:00 [robust_test] <defunct> 1 S root 19712 19653 0 80 0 - 1584 kretpr 11:13 pts/1 00:00:00 ./robust_test 1 S root 19713 19653 0 80 0 - 1584 kretpr 11:13 pts/1 00:00:00 ./robust_test 1 Z root 19714 19653 0 80 0 - 0 kretpr 11:13 pts/1 00:00:00 [robust_test] <defunct> 1 S root 19715 19653 0 80 0 - 1584 kretpr 11:13 pts/1 00:00:00 ./robust_test 1 S root 19716 19653 0 80 0 - 1584 kretpr 11:13 pts/1 00:00:00 ./robust_test 1 Z root 19717 19653 0 80 0 - 0 kretpr 11:13 pts/1 00:00:00 [robust_test] <defunct> A few pages of the systemtap output: [19697] > handle_futex_death(uaddr=0x7f87da9cd000 curr=0xffff8800c1608000 pi=0x0) [19697] < handle_futex_death() return=0x0 [19697] < exit_robust_list() [19697] < mm_release() [19653] > schedule() [19653] < schedule() [19697] signal.send SIGCHLD to 19653 [19697] > schedule() [19653] > syscall.wait4(upid=0x4cf1 stat_addr=0x7ffc78b0308c options=0x0 ru=0x0) [19653] < syscall.wait4() return=0x4cf1 [19707] > syscall.set_robust_list(head=0x7f87da9a9a20 len=0x18) [19707] < syscall.set_robust_list() return=0x0 [19707] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19653] > pthread_mutex_unlock(mutex=0x7f87da9cd000) [19707] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19653] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9cd000) [19707] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cc5 utime=0x0 uaddr2=0x80 val3=0x0) [19707] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cc5 abs_time=0x0 bitset=0xffffffff) [19653] > __pthread_mutex_unlock_full(mutex=0x7f87da9cd000 decr=0x1) [19707] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cc5 flags=0x1 q=0xffff88020b5c3dc8 hb=0xffff88020b5c3d58) [19707] < futex_wait() return=0x0 [19707] > futex_wait_queue_me(hb=0xffffc90001893fc0 q=0xffff88020b5c3dc8 timeout=0x0) [19707] > schedule() [19653] > syscall.futex(uaddr=0x7f87da9cd000 op=0x1 val=0x1 utime=0x7f87da9a9a10 uaddr2=0x7f87da9cd000 val3=0x4cc5) [19653] < syscall.futex() return=0x1 [19702] < schedule() [19653] < 0x4011a7() return=0x0 [19702] < futex_wait() [19653] < 0x4011a7() return=0x0 [19702] < do_futex() return=0x0 [19708] > syscall.set_robust_list(head=0x7f87da9a9a20 len=0x18) [19702] < syscall.futex() return=0x0 [19708] < syscall.set_robust_list() return=0x0 [19653] signal.send SIGTERM to 19701 [19702] < 0x401122() return=0x0 [19702] < 0x401122() return=0x0 [19701] < schedule() [19653] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19701] < futex_wait() [19702] > pthread_mutex_unlock(mutex=0x7f87da9cd000) [19701] < do_futex() return=0xfffffffffffffe00 [19653] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19701] < syscall.futex() return=0xfffffffffffffe00 [19702] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9cd000) [19701] do_exit(code=0xf) [19702] > __pthread_mutex_unlock_full(mutex=0x7f87da9cd000 decr=0x1) [19701] exit_mm(tsk=0xffff88030a4d4500) [19653] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cf6 utime=0x0 uaddr2=0x80 val3=0x0) [19701] > mm_release(tsk=0xffff88030a4d4500 mm=0xffff8803046e12c0) [19653] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cf6 abs_time=0x0 bitset=0xffffffff) [19702] > syscall.futex(uaddr=0x7f87da9cd000 op=0x1 val=0x1 utime=0x7f87da9cd020 uaddr2=0x7f87da9cd000 val3=0x0) [19701] > exit_robust_list(curr=0xffff88030a4d4500) head = 0x7f87da9a9a20 &head.list = 0x7f87da9a9a20 entry = 0x7f87da9cd020 next = 0x0 futex_offset = -32 pending = 0x7f87da9cd020 [19653] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cf6 flags=0x1 q=0xffff8801df113dc8 hb=0xffff8801df113d58) [19702] < syscall.futex() return=0x1 [19701] > handle_futex_death(uaddr=0x7f87da9cd000 curr=0xffff88030a4d4500 pi=0x0) [19653] < futex_wait() return=0xfffffffffffffff5 [19702] < 0x4011a7() return=0x0 [19701] < handle_futex_death() return=0x0 [19701] < exit_robust_list() [19653] < do_futex() return=0xfffffffffffffff5 [19702] < 0x4011a7() return=0x0 [19701] < mm_release() [19653] < syscall.futex() return=0xfffffffffffffff5 [19653] < 0x401122() return=0x0 [19708] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19653] < 0x401122() return=0x0 [19653] > syscall.wait4(upid=0x4cf5 stat_addr=0x7ffc78b0308c options=0x0 ru=0x0) [19653] > schedule() [19702] > __pthread_mutex_lock(mutex=0x7f87da9d0908) [19708] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19702] < _dl_fini() return=0x0 [19708] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cc5 utime=0x0 uaddr2=0x80 val3=0x0) [19708] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cc5 abs_time=0x0 bitset=0xffffffff) [19708] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cc5 flags=0x1 q=0xffff8801de85bdc8 hb=0xffff8801de85bd58) [19708] < futex_wait() return=0x0 [19702] > pthread_mutex_unlock(mutex=0x7f87da9d0908) [19708] > futex_wait_queue_me(hb=0xffffc90001893fc0 q=0xffff8801de85bdc8 timeout=0x0) [19708] > schedule() [19702] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9d0908) [19702] < _dl_fini() return=0x0 [19702] do_exit(code=0x0) [19702] exit_mm(tsk=0xffff88030a4d0b80) [19701] signal.send SIGCHLD to 19653 [19702] > mm_release(tsk=0xffff88030a4d0b80 mm=0xffff8803046e3840) [19701] > schedule() [19653] < schedule() [19703] < schedule() [19702] > exit_robust_list(curr=0xffff88030a4d0b80) head = 0x7f87da9a9a20 &head.list = 0x7f87da9a9a20 entry = 0x7f87da9cd020 next = 0x7f87da9a9a20 futex_offset = -32 pending = 0x0 [19703] < futex_wait() [19702] > handle_futex_death(uaddr=0x7f87da9cd000 curr=0xffff88030a4d0b80 pi=0x0) [19653] < syscall.wait4() return=0x4cf5 [19702] < handle_futex_death() return=0x0 [19703] < do_futex() return=0x0 [19702] < exit_robust_list() [19703] < syscall.futex() return=0x0 [19702] < mm_release() [19703] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cc5 utime=0x0 uaddr2=0x80 val3=0x0) [19703] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cc5 abs_time=0x0 bitset=0xffffffff) [19703] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cc5 flags=0x1 q=0xffff88026a2fbdc8 hb=0xffff88026a2fbd58) [19703] < futex_wait() return=0x0 [19703] > futex_wait_queue_me(hb=0xffffc90001893fc0 q=0xffff88026a2fbdc8 timeout=0x0) [19703] > schedule() [19702] > schedule() [19702] < schedule() [19702] signal.send SIGCHLD to 19653 [19702] > schedule() [19653] > syscall.wait4(upid=0x4cf6 stat_addr=0x7ffc78b0308c options=0x0 ru=0x0) [19653] < syscall.wait4() return=0x4cf6 [19709] > syscall.set_robust_list(head=0x7f87da9a9a20 len=0x18) [19709] < syscall.set_robust_list() return=0x0 [19709] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19653] > pthread_mutex_unlock(mutex=0x7f87da9cd000) [19709] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19653] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9cd000) [19653] > __pthread_mutex_unlock_full(mutex=0x7f87da9cd000 decr=0x1) [19709] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cc5 utime=0x0 uaddr2=0x80 val3=0x0) [19709] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cc5 abs_time=0x0 bitset=0xffffffff) [19653] > syscall.futex(uaddr=0x7f87da9cd000 op=0x1 val=0x1 utime=0x7f87da9a9a10 uaddr2=0x7f87da9cd000 val3=0x4cc5) [19710] > syscall.set_robust_list(head=0x7f87da9a9a20 len=0x18) [19710] < syscall.set_robust_list() return=0x0 [19709] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cc5 flags=0x1 q=0xffff8801de85fdc8 hb=0xffff8801de85fd58) [19653] < syscall.futex() return=0x1 [19709] < futex_wait() return=0xfffffffffffffff5 [19653] < 0x4011a7() return=0x0 [19709] < do_futex() return=0xfffffffffffffff5 [19653] < 0x4011a7() return=0x0 [19709] < syscall.futex() return=0xfffffffffffffff5 [19709] < 0x401122() return=0x0 [19653] signal.send SIGTERM to 19703 [19709] < 0x401122() return=0x0 [19653] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19709] > pthread_mutex_unlock(mutex=0x7f87da9cd000) [19703] < schedule() [19703] < futex_wait() [19709] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9cd000) [19653] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19703] < do_futex() return=0xfffffffffffffe00 [19703] < syscall.futex() return=0xfffffffffffffe00 [19709] > __pthread_mutex_unlock_full(mutex=0x7f87da9cd000 decr=0x1) [19703] do_exit(code=0xf) [19653] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cfd utime=0x0 uaddr2=0x80 val3=0x0) [19703] exit_mm(tsk=0xffff88030a4d6780) [19709] > syscall.futex(uaddr=0x7f87da9cd000 op=0x1 val=0x1 utime=0x7f87da9cd020 uaddr2=0x7f87da9cd000 val3=0x0) [19653] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cfd abs_time=0x0 bitset=0xffffffff) [19703] > mm_release(tsk=0xffff88030a4d6780 mm=0xffff8803046e5dc0) [19709] < syscall.futex() return=0x1 [19653] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cfd flags=0x1 q=0xffff8801df113dc8 hb=0xffff8801df113d58) [19710] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19709] < 0x4011a7() return=0x0 [19653] < futex_wait() return=0xfffffffffffffff5 [19703] > exit_robust_list(curr=0xffff88030a4d6780) head = 0x7f87da9a9a20 &head.list = 0x7f87da9a9a20 entry = 0x7f87da9cd020 next = 0x0 futex_offset = -32 pending = 0x7f87da9cd020 [19709] < 0x4011a7() return=0x0 [19653] < do_futex() return=0xfffffffffffffff5 [19653] < syscall.futex() return=0xfffffffffffffff5 [19703] > handle_futex_death(uaddr=0x7f87da9cd000 curr=0xffff88030a4d6780 pi=0x0) [19703] < handle_futex_death() return=0x0 [19703] < exit_robust_list() [19653] < 0x401122() return=0x0 [19703] < mm_release() [19653] < 0x401122() return=0x0 [19710] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19653] > syscall.wait4(upid=0x4cfd stat_addr=0x7ffc78b0308c options=0x0 ru=0x0) [19709] > __pthread_mutex_lock(mutex=0x7f87da9d0908) [19710] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cc5 utime=0x0 uaddr2=0x80 val3=0x0) [19653] > schedule() [19709] < _dl_fini() return=0x0 [19710] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cc5 abs_time=0x0 bitset=0xffffffff) [19710] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cc5 flags=0x1 q=0xffff88020b5a3dc8 hb=0xffff88020b5a3d58) [19710] < futex_wait() return=0x0 [19709] > pthread_mutex_unlock(mutex=0x7f87da9d0908) [19709] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9d0908) [19710] > futex_wait_queue_me(hb=0xffffc90001893fc0 q=0xffff88020b5a3dc8 timeout=0x0) [19710] > schedule() [19709] < _dl_fini() return=0x0 [19709] do_exit(code=0x0) [19709] exit_mm(tsk=0xffff88030a4d2280) [19709] > mm_release(tsk=0xffff88030a4d2280 mm=0xffff8803046e3200) [19709] > exit_robust_list(curr=0xffff88030a4d2280) head = 0x7f87da9a9a20 &head.list = 0x7f87da9a9a20 entry = 0x7f87da9cd020 next = 0x7f87da9a9a20 futex_offset = -32 pending = 0x0 [19703] signal.send SIGCHLD to 19653 [19709] > handle_futex_death(uaddr=0x7f87da9cd000 curr=0xffff88030a4d2280 pi=0x0) [19703] > schedule() [19709] < handle_futex_death() return=0x0 [19709] < exit_robust_list() [19709] < mm_release() [19709] signal.send SIGCHLD to 19653 [19709] > schedule() [19704] < schedule() [19653] < schedule() [19704] < futex_wait() [19704] < do_futex() return=0x0 [19704] < syscall.futex() return=0x0 [19653] < syscall.wait4() return=0x4cfd [19704] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cc5 utime=0x0 uaddr2=0x80 val3=0x0) [19704] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cc5 abs_time=0x0 bitset=0xffffffff) [19704] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cc5 flags=0x1 q=0xffff88026a2ffdc8 hb=0xffff88026a2ffd58) [19704] < futex_wait() return=0x0 [19704] > futex_wait_queue_me(hb=0xffffc90001893fc0 q=0xffff88026a2ffdc8 timeout=0x0) [19704] > schedule() [19705] < schedule() [19705] < futex_wait() [19705] < do_futex() return=0x0 [19705] < syscall.futex() return=0x0 [19705] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cc5 utime=0x0 uaddr2=0x80 val3=0x0) [19705] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cc5 abs_time=0x0 bitset=0xffffffff) [19705] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cc5 flags=0x1 q=0xffff8801df15bdc8 hb=0xffff8801df15bd58) [19705] < futex_wait() return=0x0 [19705] > futex_wait_queue_me(hb=0xffffc90001893fc0 q=0xffff8801df15bdc8 timeout=0x0) [19705] > schedule() [19653] > syscall.wait4(upid=0x4cf7 stat_addr=0x7ffc78b0308c options=0x0 ru=0x0) [19653] < syscall.wait4() return=0x4cf7 [19711] > syscall.set_robust_list(head=0x7f87da9a9a20 len=0x18) [19711] < syscall.set_robust_list() return=0x0 [19711] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19711] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19711] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cc5 utime=0x0 uaddr2=0x80 val3=0x0) [19711] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cc5 abs_time=0x0 bitset=0xffffffff) [19711] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cc5 flags=0x1 q=0xffff88020b5a7dc8 hb=0xffff88020b5a7d58) [19711] < futex_wait() return=0x0 [19711] > futex_wait_queue_me(hb=0xffffc90001893fc0 q=0xffff88020b5a7dc8 timeout=0x0) [19653] > pthread_mutex_unlock(mutex=0x7f87da9cd000) [19711] > schedule() [19653] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9cd000) [19712] > syscall.set_robust_list(head=0x7f87da9a9a20 len=0x18) [19712] < syscall.set_robust_list() return=0x0 [19653] > __pthread_mutex_unlock_full(mutex=0x7f87da9cd000 decr=0x1) [19653] > syscall.futex(uaddr=0x7f87da9cd000 op=0x1 val=0x1 utime=0x0 uaddr2=0x7f87da9cd000 val3=0x4) [19653] < syscall.futex() return=0x1 [19706] < schedule() [19653] < 0x4011a7() return=0x0 [19706] < futex_wait() [19706] < do_futex() return=0x0 [19653] < 0x4011a7() return=0x0 [19706] < syscall.futex() return=0x0 [19706] < 0x401122() return=0x0 [19653] signal.send SIGTERM to 19707 [19706] < 0x401122() return=0x0 [19653] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19707] < schedule() [19706] > pthread_mutex_unlock(mutex=0x7f87da9cd000) [19707] < futex_wait() [19706] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9cd000) [19653] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19707] < do_futex() return=0xfffffffffffffe00 [19707] < syscall.futex() return=0xfffffffffffffe00 [19653] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cfa utime=0x0 uaddr2=0x80 val3=0x0) [19706] > __pthread_mutex_unlock_full(mutex=0x7f87da9cd000 decr=0x1) [19653] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cfa abs_time=0x0 bitset=0xffffffff) [19707] do_exit(code=0xf) [19706] > syscall.futex(uaddr=0x7f87da9cd000 op=0x1 val=0x1 utime=0x7f87da9cd020 uaddr2=0x7f87da9cd000 val3=0x0) [19712] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19707] exit_mm(tsk=0xffff88030a4d3980) [19653] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cfa flags=0x1 q=0xffff8801df113dc8 hb=0xffff8801df113d58) [19706] < syscall.futex() return=0x1 [19707] > mm_release(tsk=0xffff88030a4d3980 mm=0xffff8803046e2580) [19653] < futex_wait() return=0xfffffffffffffff5 [19706] < 0x4011a7() return=0x0 [19653] < do_futex() return=0xfffffffffffffff5 [19706] < 0x4011a7() return=0x0 [19653] < syscall.futex() return=0xfffffffffffffff5 [19707] > exit_robust_list(curr=0xffff88030a4d3980) head = 0x7f87da9a9a20 &head.list = 0x7f87da9a9a20 entry = 0x7f87da9cd020 next = 0x0 futex_offset = -32 pending = 0x7f87da9cd020 [19712] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19707] > handle_futex_death(uaddr=0x7f87da9cd000 curr=0xffff88030a4d3980 pi=0x0) [19653] < 0x401122() return=0x0 [19707] < handle_futex_death() return=0x0 [19653] < 0x401122() return=0x0 [19707] < exit_robust_list() [19712] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cc5 utime=0x0 uaddr2=0x80 val3=0x0) [19707] < mm_release() [19653] > syscall.wait4(upid=0x4cfb stat_addr=0x7ffc78b0308c options=0x0 ru=0x0) [19712] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cc5 abs_time=0x0 bitset=0xffffffff) [19653] > schedule() [19706] > __pthread_mutex_lock(mutex=0x7f87da9d0908) [19712] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cc5 flags=0x1 q=0xffff88020b5abdc8 hb=0xffff88020b5abd58) [19712] < futex_wait() return=0x0 [19706] < _dl_fini() return=0x0 [19712] > futex_wait_queue_me(hb=0xffffc90001893fc0 q=0xffff88020b5abdc8 timeout=0x0) [19712] > schedule() [19706] > pthread_mutex_unlock(mutex=0x7f87da9d0908) [19706] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9d0908) [19706] < _dl_fini() return=0x0 [19706] do_exit(code=0x0) [19706] exit_mm(tsk=0xffff88030a4d2e00) [19706] > mm_release(tsk=0xffff88030a4d2e00 mm=0xffff8803046e1900) [19707] signal.send SIGCHLD to 19653 [19706] > exit_robust_list(curr=0xffff88030a4d2e00) head = 0x7f87da9a9a20 &head.list = 0x7f87da9a9a20 entry = 0x7f87da9cd020 next = 0x7f87da9a9a20 futex_offset = -32 pending = 0x0 [19706] > handle_futex_death(uaddr=0x7f87da9cd000 curr=0xffff88030a4d2e00 pi=0x0) [19707] > schedule() [19706] < handle_futex_death() return=0x0 [19706] < exit_robust_list() [19706] < mm_release() [19653] < schedule() [19653] < syscall.wait4() return=0x4cfb [19706] signal.send SIGCHLD to 19653 [19706] > schedule() [19708] < schedule() [19708] < futex_wait() [19708] < do_futex() return=0x0 [19708] < syscall.futex() return=0x0 [19708] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cc5 utime=0x0 uaddr2=0x80 val3=0x0) [19708] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cc5 abs_time=0x0 bitset=0xffffffff) [19708] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cc5 flags=0x1 q=0xffff8801de85bdc8 hb=0xffff8801de85bd58) [19708] < futex_wait() return=0x0 [19708] > futex_wait_queue_me(hb=0xffffc90001893fc0 q=0xffff8801de85bdc8 timeout=0x0) [19708] > schedule() [19653] > syscall.wait4(upid=0x4cfa stat_addr=0x7ffc78b0308c options=0x0 ru=0x0) [19653] < syscall.wait4() return=0x4cfa [19653] > pthread_mutex_unlock(mutex=0x7f87da9cd000) [19653] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9cd000) [19653] > __pthread_mutex_unlock_full(mutex=0x7f87da9cd000 decr=0x1) [19653] > syscall.futex(uaddr=0x7f87da9cd000 op=0x1 val=0x1 utime=0x7f87da9a9a10 uaddr2=0x7f87da9cd000 val3=0x4cc5) [19653] < syscall.futex() return=0x1 [19653] < 0x4011a7() return=0x0 [19653] < 0x4011a7() return=0x0 [19653] signal.send SIGTERM to 19710 [19653] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19653] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19714] > syscall.set_robust_list(head=0x7f87da9a9a20 len=0x18) [19653] < 0x401122() return=0x0 [19714] < syscall.set_robust_list() return=0x0 [19653] < 0x401122() return=0x0 [19653] > syscall.wait4(upid=0x4cfe stat_addr=0x7ffc78b0308c options=0x0 ru=0x0) [19653] > schedule() [19713] > syscall.set_robust_list(head=0x7f87da9a9a20 len=0x18) [19713] < syscall.set_robust_list() return=0x0 [19714] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19714] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19714] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cc5 utime=0x0 uaddr2=0x80 val3=0x0) [19714] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cc5 abs_time=0x0 bitset=0xffffffff) [19714] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cc5 flags=0x1 q=0xffff88020c567dc8 hb=0xffff88020c567d58) [19714] < futex_wait() return=0x0 [19714] > futex_wait_queue_me(hb=0xffffc90001893fc0 q=0xffff88020c567dc8 timeout=0x0) [19714] > schedule() [19710] < schedule() [19710] < futex_wait() [19710] < do_futex() return=0x0 [19713] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19710] < syscall.futex() return=0x0 [19710] do_exit(code=0xf) [19710] exit_mm(tsk=0xffff8802bba2ae00) [19710] > mm_release(tsk=0xffff8802bba2ae00 mm=0xffff8803046e0000) [19713] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19710] > exit_robust_list(curr=0xffff8802bba2ae00) head = 0x7f87da9a9a20 &head.list = 0x7f87da9a9a20 entry = 0x7f87da9cd020 next = 0x7f87da9a9a20 futex_offset = -32 pending = 0x7f87da9cd020 [19710] > handle_futex_death(uaddr=0x7f87da9cd000 curr=0xffff8802bba2ae00 pi=0x0) [19713] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cc5 utime=0x0 uaddr2=0x80 val3=0x0) [19710] < handle_futex_death() return=0x0 [19710] < exit_robust_list() [19710] < mm_release() [19713] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cc5 abs_time=0x0 bitset=0xffffffff) [19713] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cc5 flags=0x1 q=0xffff8802bbae7dc8 hb=0xffff8802bbae7d58) [19713] < futex_wait() return=0x0 [19713] > futex_wait_queue_me(hb=0xffffc90001893fc0 q=0xffff8802bbae7dc8 timeout=0x0) [19713] > schedule() [19710] signal.send SIGCHLD to 19653 [19710] > schedule() [19653] < schedule() [19653] < syscall.wait4() return=0x4cfe [19653] > pthread_mutex_unlock(mutex=0x7f87da9cd000) [19653] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9cd000) [19653] > __pthread_mutex_unlock_full(mutex=0x7f87da9cd000 decr=0x1) [19653] > syscall.futex(uaddr=0x7f87da9cd000 op=0x1 val=0x1 utime=0x7f87da9a9a10 uaddr2=0x7f87da9cd000 val3=0x4cc5) [19653] < syscall.futex() return=0x1 [19715] > syscall.set_robust_list(head=0x7f87da9a9a20 len=0x18) [19715] < syscall.set_robust_list() return=0x0 [19653] < 0x4011a7() return=0x0 [19653] < 0x4011a7() return=0x0 [19704] < schedule() [19704] < futex_wait() [19653] signal.send SIGTERM to 19704 [19704] < do_futex() return=0x0 [19704] < syscall.futex() return=0x0 [19653] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19704] do_exit(code=0xf) [19653] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19653] < 0x401122() return=0x0 [19704] exit_mm(tsk=0xffff88030a4d5080) [19653] < 0x401122() return=0x0 [19704] > mm_release(tsk=0xffff88030a4d5080 mm=0xffff8803046e3e80) [19653] > syscall.wait4(upid=0x4cf8 stat_addr=0x7ffc78b0308c options=0x0 ru=0x0) [19653] > schedule() [19704] > exit_robust_list(curr=0xffff88030a4d5080) head = 0x7f87da9a9a20 &head.list = 0x7f87da9a9a20 entry = 0x7f87da9cd020 next = 0x7f87da9a9a20 futex_offset = -32 pending = 0x7f87da9cd020 [19715] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19704] > handle_futex_death(uaddr=0x7f87da9cd000 curr=0xffff88030a4d5080 pi=0x0) [19704] < handle_futex_death() return=0x0 [19704] < exit_robust_list() [19704] < mm_release() [19715] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19715] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cc5 utime=0x0 uaddr2=0x80 val3=0x0) [19715] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cc5 abs_time=0x0 bitset=0xffffffff) [19715] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cc5 flags=0x1 q=0xffff88020cb3bdc8 hb=0xffff88020cb3bd58) [19715] < futex_wait() return=0x0 [19715] > futex_wait_queue_me(hb=0xffffc90001893fc0 q=0xffff88020cb3bdc8 timeout=0x0) [19715] > schedule() [19704] signal.send SIGCHLD to 19653 [19704] > schedule() [19653] < schedule() [19653] < syscall.wait4() return=0x4cf8 [19653] > pthread_mutex_unlock(mutex=0x7f87da9cd000) [19653] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9cd000) [19653] > __pthread_mutex_unlock_full(mutex=0x7f87da9cd000 decr=0x1) [19653] > syscall.futex(uaddr=0x7f87da9cd000 op=0x1 val=0x1 utime=0x7f87da9a9a10 uaddr2=0x7f87da9cd000 val3=0x4cc5) [19653] < syscall.futex() return=0x1 [19716] > syscall.set_robust_list(head=0x7f87da9a9a20 len=0x18) [19653] < 0x4011a7() return=0x0 [19716] < syscall.set_robust_list() return=0x0 [19653] < 0x4011a7() return=0x0 [19705] < schedule() [19705] < futex_wait() [19653] signal.send SIGTERM to 19705 [19705] < do_futex() return=0x0 [19705] < syscall.futex() return=0x0 [19653] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19705] do_exit(code=0xf) [19653] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19705] exit_mm(tsk=0xffff88030a4d0000) [19653] < 0x401122() return=0x0 [19705] > mm_release(tsk=0xffff88030a4d0000 mm=0xffff8803046e7080) [19653] < 0x401122() return=0x0 [19653] > syscall.wait4(upid=0x4cf9 stat_addr=0x7ffc78b0308c options=0x0 ru=0x0) [19653] > schedule() [19705] > exit_robust_list(curr=0xffff88030a4d0000) head = 0x7f87da9a9a20 &head.list = 0x7f87da9a9a20 entry = 0x7f87da9cd020 next = 0x7f87da9a9a20 futex_offset = -32 pending = 0x7f87da9cd020 [19705] > handle_futex_death(uaddr=0x7f87da9cd000 curr=0xffff88030a4d0000 pi=0x0) [19716] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19705] < handle_futex_death() return=0x0 [19705] < exit_robust_list() [19705] < mm_release() [19716] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19716] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cc5 utime=0x0 uaddr2=0x80 val3=0x0) [19716] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cc5 abs_time=0x0 bitset=0xffffffff) [19716] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cc5 flags=0x1 q=0xffff88020cb3fdc8 hb=0xffff88020cb3fd58) [19716] < futex_wait() return=0x0 [19716] > futex_wait_queue_me(hb=0xffffc90001893fc0 q=0xffff88020cb3fdc8 timeout=0x0) [19716] > schedule() [19705] signal.send SIGCHLD to 19653 [19705] > schedule() [19653] < schedule() [19653] < syscall.wait4() return=0x4cf9 [19653] > pthread_mutex_unlock(mutex=0x7f87da9cd000) [19653] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9cd000) [19653] > __pthread_mutex_unlock_full(mutex=0x7f87da9cd000 decr=0x1) [19653] > syscall.futex(uaddr=0x7f87da9cd000 op=0x1 val=0x1 utime=0x7f87da9a9a10 uaddr2=0x7f87da9cd000 val3=0x4cc5) [19653] < syscall.futex() return=0x1 [19653] < 0x4011a7() return=0x0 [19717] > syscall.set_robust_list(head=0x7f87da9a9a20 len=0x18) [19653] < 0x4011a7() return=0x0 [19717] < syscall.set_robust_list() return=0x0 [19711] < schedule() [19653] signal.send SIGTERM to 19714 [19711] < futex_wait() [19653] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19711] < do_futex() return=0x0 [19711] < syscall.futex() return=0x0 [19653] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19711] < 0x401122() return=0x0 [19653] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004cff utime=0x0 uaddr2=0x80 val3=0x0) [19711] < 0x401122() return=0x0 [19653] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004cff abs_time=0x0 bitset=0xffffffff) [19711] > pthread_mutex_unlock(mutex=0x7f87da9cd000) [19653] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004cff flags=0x1 q=0xffff8801df113dc8 hb=0xffff8801df113d58) [19711] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9cd000) [19653] < futex_wait() return=0x0 [19653] > futex_wait_queue_me(hb=0xffffc90001893fc0 q=0xffff8801df113dc8 timeout=0x0) [19653] > schedule() [19711] > __pthread_mutex_unlock_full(mutex=0x7f87da9cd000 decr=0x1) [19711] > syscall.futex(uaddr=0x7f87da9cd000 op=0x1 val=0x1 utime=0x7f87da9cd020 uaddr2=0x7f87da9cd000 val3=0x0) [19717] > __pthread_mutex_lock(mutex=0x7f87da9cd000) [19711] < syscall.futex() return=0x1 [19711] < 0x4011a7() return=0x0 [19712] < schedule() [19711] < 0x4011a7() return=0x0 [19712] < futex_wait() [19712] < do_futex() return=0x0 [19717] > __pthread_mutex_lock_full(mutex=0x7f87da9cd000) [19712] < syscall.futex() return=0x0 [19717] < 0x401122() return=0x0 [19712] > syscall.futex(uaddr=0x7f87da9cd000 op=0x0 val=0x80004d05 utime=0x0 uaddr2=0x80 val3=0x0) [19717] < 0x401122() return=0x0 [19712] > futex_wait(uaddr=0x7f87da9cd000 flags=0x1 val=0x80004d05 abs_time=0x0 bitset=0xffffffff) [19717] > pthread_mutex_unlock(mutex=0x7f87da9cd000) [19712] > futex_wait_setup(uaddr=0x7f87da9cd000 val=0x80004d05 flags=0x1 q=0xffff88020b5abdc8 hb=0xffff88020b5abd58) [19711] > __pthread_mutex_lock(mutex=0x7f87da9d0908) [19712] < futex_wait() return=0x0 [19717] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9cd000) [19712] > futex_wait_queue_me(hb=0xffffc90001893fc0 q=0xffff88020b5abdc8 timeout=0x0) [19712] > schedule() [19711] < _dl_fini() return=0x0 [19717] > __pthread_mutex_unlock_full(mutex=0x7f87da9cd000 decr=0x1) [19717] > syscall.futex(uaddr=0x7f87da9cd000 op=0x1 val=0x1 utime=0x7f87da9cd020 uaddr2=0x7f87da9cd000 val3=0x0) [19711] > pthread_mutex_unlock(mutex=0x7f87da9d0908) [19717] < syscall.futex() return=0x1 [19711] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9d0908) [19708] < schedule() [19708] < futex_wait() [19717] < 0x4011a7() return=0x0 [19708] < do_futex() return=0x0 [19711] < _dl_fini() return=0x0 [19708] < syscall.futex() return=0x0 [19717] < 0x4011a7() return=0x0 [19708] < 0x401122() return=0x0 [19708] < 0x401122() return=0x0 [19708] > pthread_mutex_unlock(mutex=0x7f87da9cd000) [19708] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9cd000) [19708] > __pthread_mutex_unlock_full(mutex=0x7f87da9cd000 decr=0x1) [19717] > __pthread_mutex_lock(mutex=0x7f87da9d0908) [19708] > syscall.futex(uaddr=0x7f87da9cd000 op=0x1 val=0x1 utime=0x7f87da9cd020 uaddr2=0x7f87da9cd000 val3=0x0) [19711] do_exit(code=0x0) [19717] < _dl_fini() return=0x0 [19708] < syscall.futex() return=0x1 [19708] < 0x4011a7() return=0x0 [19711] exit_mm(tsk=0xffff8802bba2b980) [19708] < 0x4011a7() return=0x0 [19711] > mm_release(tsk=0xffff8802bba2b980 mm=0xffff8803046e76c0) [19717] > pthread_mutex_unlock(mutex=0x7f87da9d0908) [19711] > exit_robust_list(curr=0xffff8802bba2b980) head = 0x7f87da9a9a20 &head.list = 0x7f87da9a9a20 entry = 0x7f87da9cd020 next = 0x0 futex_offset = -32 pending = 0x0 [19717] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9d0908) [19711] > handle_futex_death(uaddr=0x7f87da9cd000 curr=0xffff8802bba2b980 pi=0x0) [19708] > __pthread_mutex_lock(mutex=0x7f87da9d0908) [19717] < _dl_fini() return=0x0 [19711] < handle_futex_death() return=0x0 [19711] < exit_robust_list() [19708] < _dl_fini() return=0x0 [19711] < mm_release() [19708] > pthread_mutex_unlock(mutex=0x7f87da9d0908) [19708] > pthread_mutex_unlock(decr=0x1 mutex=0x7f87da9d0908) [19717] do_exit(code=0x0) [19708] < _dl_fini() return=0x0 [19717] exit_mm(tsk=0xffff8802bba2dc00) [19717] > mm_release(tsk=0xffff8802bba2dc00 mm=0xffff8803046e1900) [19717] > exit_robust_list(curr=0xffff8802bba2dc00) head = 0x7f87da9a9a20 &head.list = 0x7f87da9a9a20 entry = 0x7f87da9cd020 next = 0x0 futex_offset = -32 pending = 0x0 [19708] do_exit(code=0x0) [19717] > handle_futex_death(uaddr=0x7f87da9cd000 curr=0xffff8802bba2dc00 pi=0x0) [19717] < handle_futex_death() return=0x0 [19708] exit_mm(tsk=0xffff88030a4d7300) [19717] < exit_robust_list() [19717] < mm_release() [19708] > mm_release(tsk=0xffff88030a4d7300 mm=0xffff8803046e5780) [19708] > exit_robust_list(curr=0xffff88030a4d7300) head = 0x7f87da9a9a20 &head.list = 0x7f87da9a9a20 entry = 0x7f87da9cd020 next = 0x0 futex_offset = -32 pending = 0x0 [19708] > handle_futex_death(uaddr=0x7f87da9cd000 curr=0xffff88030a4d7300 pi=0x0) [19708] < handle_futex_death() return=0x0 [19708] < exit_robust_list() [19708] < mm_release() [19711] signal.send SIGCHLD to 19653 [19711] > schedule() [19717] signal.send SIGCHLD to 19653 [19717] > schedule() [19714] < schedule() [19714] < futex_wait() [19714] < do_futex() return=0x0 [19714] < syscall.futex() return=0x0 [19708] signal.send SIGCHLD to 19653 [19708] > schedule() [19714] do_exit(code=0xf) [19714] exit_mm(tsk=0xffff8802bba2a280) [19714] > mm_release(tsk=0xffff8802bba2a280 mm=0xffff8803046e12c0) [19714] > exit_robust_list(curr=0xffff8802bba2a280) head = 0x7f87da9a9a20 &head.list = 0x7f87da9a9a20 entry = 0x7f87da9cd020 next = 0x0 futex_offset = -32 pending = 0x7f87da9cd020 [19714] > handle_futex_death(uaddr=0x7f87da9cd000 curr=0xffff8802bba2a280 pi=0x0) [19714] < handle_futex_death() return=0x0 [19714] < exit_robust_list() [19714] < mm_release() [19714] signal.send SIGCHLD to 19653 [19714] > schedule() So, I wonder if the below conditions can become true... 1. process A spawns process B and race for a robust mutex 2. process B has robust mutex locked or in process of locking it 3. process B dies 4. while cleaning up from process B death, process C is started 5. during fork, robust_list of A or C may become inconsistent because only one process would call pthread_mutex_consistent, so, while the mutex would be recovered, the robust list would not 6. C also dies with mutex locked causing corruption and dead lock
As an extra note, returning to test robust_test2, but changing MAX_CHILDREN to 5 I can again trigger a dead lock with systemtap. It enters an "infinite" loop in exit_robust_list due to a circular list (well, it leaves due to the kernel preventing it, looping '#define ROBUST_LIST_LIMIT 2048' times): [16445] > exit_robust_list(curr=0xffff880305c49700) head = 0x7f7b510f2a20 entry = 0x7f7b51116020 next = 0x7f7b51116020 pending = 0x0
[trying to get more information...] For robust_test2.c, with MAX_CHILDREN set to 1, but with this (pseudo) patch: CHECKED(pthread_mutexattr_setrobust, &attr, PTHREAD_MUTEX_ROBUST); + CHECKED(pthread_mutexattr_setprotocol, &attr, PTHREAD_PRIO_INHERIT); CHECKED(pthread_mutex_init, mutex, &attr); *and* the kernel patch I posted earlier, I cannot reproduce the problem in rhel 7.2. Below, testing with the robust_test2 patch, but no kernel patch on "real" hardware: --- Recent rawhide I get an assertion shortly: $ ./robust_test2 Done 0 iterations Done 100 iterations Done 200 iterations robust_test2: ../nptl/pthread_mutex_lock.c:352: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust' failed. Older rhel7 (> 7.0 < 7.2) kernel it dead locks very quickly.
Even with the fix for bug 20973 I still saw an intermittent hang in tst-robust8.