Summary: | Deadlock with robust shared mutex and asynchronous termination | ||
---|---|---|---|
Product: | glibc | Reporter: | raoulgough |
Component: | nptl | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | carlos, drepper.fsp, fweimer, raoulgough, triegel |
Priority: | P2 | Flags: | fweimer:
security-
|
Version: | unspecified | ||
Target Milestone: | 2.25 | ||
See Also: |
https://sourceware.org/bugzilla/show_bug.cgi?id=19004 https://sourceware.org/bugzilla/show_bug.cgi?id=20973 https://sourceware.org/bugzilla/show_bug.cgi?id=20985 https://sourceware.org/bugzilla/show_bug.cgi?id=14485 |
||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Attachments: | Test program for shared robust mutex with asynchronous termination |
Description
raoulgough
2015-12-24 09:15:17 UTC
I can reproduce this issue with a cleaned up testcase that uses C11 atomics to avoid the data races in the orignal test case. Torvald Riegel, Florian Weimer and I discussed this and the final analysis is that this is clearly a flaw in the glibc handling of the robust list. The sequence of events isn't even that complicated. P1 -> set_robust_list -> setup shared mapping, setup shared mutex -> lock mutex (eunqueues it onto the robust list) -> fork child -> unlock mutex (dequeues it from the list) Note: The deuque here removes the mutex from the robust list copy in the parent, but he child's copy of the robust list head is unchanged and pointing to the mutex. However, that mutex now has it's own ->next pointer set to null so it is no longer a valid list according to the kernel. P2 -> locks mutex (owner not dead, doesn't own it, so locks it) P1 -> Kills P2 Kernel -> Inspects P2's robust list, find it contains a head that points to a mutex that points to NULL. This is a dangling list and the cleanup is ignored. P1 -> locks mutex (and deadlocks). At this point no recovery is possible. The key issue is that the mutex itself, in shared memory, is part of the circular linked list itself. The robust list head itself is stored in the process/thread-local storage, but the rest of the list is shared, made up of the mutexes that are involved in the list. Therefore P1 and P2 share parts of their own robust lists via the shared mutex's afer the fork. This is a flaw really because the mutexes can only be owned by one thread at a time. When P2 is forked we need to do some kind of cleanup of the head of the list, probably clearing head (linking it back to itself to make an empty list) just after forking. This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, master has been updated via 8e31cafb268938729a1314806a924d73fb1991c5 (commit) via 65810f0ef05e8c9e333f17a44e77808b163ca298 (commit) from f32941d80c7f532031061f8dd4704fab9c275cfe (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8e31cafb268938729a1314806a924d73fb1991c5 commit 8e31cafb268938729a1314806a924d73fb1991c5 Author: Torvald Riegel <triegel@redhat.com> Date: Wed Dec 21 13:37:19 2016 +0100 Clear list of acquired robust mutexes in the child process after forking. Robust mutexes acquired at the time of a call to fork() do not remain acquired by the forked child process. We have to clear the list of acquired robust mutexes before registering this list with the kernel; otherwise, if some of the robust mutexes are process-shared, the parent process can alter the child's robust mutex list, which can lead to deadlocks or even modification of memory that may not be occupied by a mutex anymore. [BZ #19402] * sysdeps/nptl/fork.c (__libc_fork): Clear list of acquired robust mutexes. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=65810f0ef05e8c9e333f17a44e77808b163ca298 commit 65810f0ef05e8c9e333f17a44e77808b163ca298 Author: Torvald Riegel <triegel@redhat.com> Date: Thu Dec 22 10:20:43 2016 +0100 robust mutexes: Fix broken x86 assembly by removing it lll_robust_unlock on i386 and x86_64 first sets the futex word to FUTEX_WAITERS|0 before calling __lll_unlock_wake, which will set the futex word to 0. If the thread is killed between these steps, then the futex word will be FUTEX_WAITERS|0, and the kernel (at least current upstream) will not set it to FUTEX_OWNER_DIED|FUTEX_WAITERS because 0 is not equal to the TID of the crashed thread. The lll_robust_lock assembly code on i386 and x86_64 is not prepared to deal with this case because the fastpath tries to only CAS 0 to TID and not FUTEX_WAITERS|0 to TID; the slowpath simply waits until it can CAS 0 to TID or the futex_word has the FUTEX_OWNER_DIED bit set. This issue is fixed by removing the custom x86 assembly code and using the generic C code instead. However, instead of adding more duplicate code to the custom x86 lowlevellock.h, the code of the lll_robust* functions is inlined into the single call sites that exist for each of these functions in the pthread_mutex_* functions. The robust mutex paths in the latter have been slightly reorganized to make them simpler. This patch is meant to be easy to backport, so C11-style atomics are not used. [BZ #20985] * nptl/Makefile: Adapt. * nptl/pthread_mutex_cond_lock.c (LLL_ROBUST_MUTEX_LOCK): Remove. (LLL_ROBUST_MUTEX_LOCK_MODIFIER): New. * nptl/pthread_mutex_lock.c (LLL_ROBUST_MUTEX_LOCK): Remove. (LLL_ROBUST_MUTEX_LOCK_MODIFIER): New. (__pthread_mutex_lock_full): Inline lll_robust* functions and adapt. * nptl/pthread_mutex_timedlock.c (pthread_mutex_timedlock): Inline lll_robust* functions and adapt. * nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_full): Likewise. * sysdeps/nptl/lowlevellock.h (__lll_robust_lock_wait, __lll_robust_lock, lll_robust_cond_lock, __lll_robust_timedlock_wait, __lll_robust_timedlock, __lll_robust_unlock): Remove. * sysdeps/unix/sysv/linux/i386/lowlevellock.h (lll_robust_lock, lll_robust_cond_lock, lll_robust_timedlock, lll_robust_unlock): Remove. * sysdeps/unix/sysv/linux/x86_64/lowlevellock.h (lll_robust_lock, lll_robust_cond_lock, lll_robust_timedlock, lll_robust_unlock): Remove. * sysdeps/unix/sysv/linux/sparc/lowlevellock.h (__lll_robust_lock_wait, __lll_robust_lock, lll_robust_cond_lock, __lll_robust_timedlock_wait, __lll_robust_timedlock, __lll_robust_unlock): Remove. * nptl/lowlevelrobustlock.c: Remove file. * nptl/lowlevelrobustlock.sym: Likewise. * sysdeps/unix/sysv/linux/i386/lowlevelrobustlock.S: Likewise. * sysdeps/unix/sysv/linux/x86_64/lowlevelrobustlock.S: Likewise. ----------------------------------------------------------------------- Summary of changes: ChangeLog | 33 ++ nptl/Makefile | 4 +- nptl/lowlevelrobustlock.c | 136 --------- nptl/lowlevelrobustlock.sym | 6 - nptl/pthread_mutex_cond_lock.c | 6 +- nptl/pthread_mutex_lock.c | 79 ++++-- nptl/pthread_mutex_timedlock.c | 106 ++++++-- nptl/pthread_mutex_unlock.c | 18 +- sysdeps/nptl/fork.c | 20 +- sysdeps/nptl/lowlevellock.h | 68 ----- sysdeps/unix/sysv/linux/i386/lowlevellock.h | 60 ---- sysdeps/unix/sysv/linux/i386/lowlevelrobustlock.S | 232 --------------- sysdeps/unix/sysv/linux/sparc/lowlevellock.h | 40 --- sysdeps/unix/sysv/linux/x86_64/lowlevellock.h | 74 ----- .../unix/sysv/linux/x86_64/lowlevelrobustlock.S | 306 -------------------- 15 files changed, 205 insertions(+), 983 deletions(-) delete mode 100644 nptl/lowlevelrobustlock.c delete mode 100644 nptl/lowlevelrobustlock.sym delete mode 100644 sysdeps/unix/sysv/linux/i386/lowlevelrobustlock.S delete mode 100644 sysdeps/unix/sysv/linux/x86_64/lowlevelrobustlock.S Fixed in master. |