IMHO, if a thread that owns a mutex terminates, this should result in undefined behavior. The Austin Group thinks differently; see http://austingroupbugs.net/view.php?id=755 for their position. Essentially, what they want is that an imaginary thread (because the initial owner doesn't exist anymore) continues to own such a mutex. Our recursive and non-robust PI mutexes don't fulfill this requirement because we have an ABA issue on the thread ID that's used to represent ownership: once the owner has exited, its TID can be reused by a new thread, which would then be treated as the owner. My preferred solution would be to not change the implementation but to document that we deviate from POSIX in this respect. (The requirement is not stated explicitly but can be inferred from what's in the spec; I think that people are just as likely to interpret the spec in a way that's compatible with our implementation.) Alternatively, we should switch from the thread ID to an ID that each thread obtains from a global 64-bit counter (on its first acquisition of an affected mutex). That means one more indirection for the lock, and we need a TLS slot for it.
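The alternative proposed above (ownership recorded as an ID from a global 64-bit counter, cached in a TLS slot) could look roughly like the following sketch. It is not glibc code: every name in it (`owner_id`, `rec_mutex`, `rec_trylock`, and so on) is hypothetical, and waiters spin with `sched_yield()` instead of sleeping on a futex, to keep it short and portable. It shows the extra TLS slot and the one extra indirection (loading the calling thread's owner ID) that each lock and unlock would pay.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global counter; 0 is reserved to mean "unowned". A 64-bit
   counter is assumed never to wrap in practice, which is what removes the
   ABA problem that reused kernel TIDs have. */
static _Atomic uint64_t next_owner_id = 1;

/* Hypothetical TLS slot: 0 until this thread first needs an owner ID. */
static _Thread_local uint64_t self_owner_id;

static uint64_t owner_id(void)
{
    if (self_owner_id == 0)
        self_owner_id = atomic_fetch_add_explicit(&next_owner_id, 1,
                                                  memory_order_relaxed);
    return self_owner_id;
}

/* Hypothetical recursive mutex that records its owner by owner ID, not TID. */
typedef struct {
    _Atomic uint64_t owner; /* 0 = unlocked */
    unsigned count;         /* recursion depth; only touched by the owner */
} rec_mutex;

#define REC_MUTEX_INIT { 0, 0 }

int rec_trylock(rec_mutex *m)
{
    uint64_t self = owner_id();
    /* Only this thread ever stores its own ID, so a relaxed load suffices. */
    if (atomic_load_explicit(&m->owner, memory_order_relaxed) == self) {
        m->count++;
        return 0;
    }
    uint64_t expected = 0;
    if (!atomic_compare_exchange_strong_explicit(&m->owner, &expected, self,
                                                 memory_order_acquire,
                                                 memory_order_relaxed))
        return EBUSY;
    m->count = 1;
    return 0;
}

void rec_lock(rec_mutex *m)
{
    while (rec_trylock(m) == EBUSY)
        sched_yield(); /* stand-in for a futex wait */
}

int rec_unlock(rec_mutex *m)
{
    if (atomic_load_explicit(&m->owner, memory_order_relaxed) != owner_id())
        return EPERM;
    assert(m->count > 0);
    if (--m->count == 0)
        atomic_store_explicit(&m->owner, 0, memory_order_release);
    return 0;
}

/* Demonstration helper: a thread that exits while still owning the mutex. */
static void *lock_and_exit(void *arg)
{
    rec_lock(arg);
    return NULL;
}
```

Because the IDs are never handed out twice, a thread that touches the mutex after the owner has exited (even one that happens to reuse the owner's kernel TID) can never pass the ownership check, so the exited owner effectively keeps the mutex forever, as the Austin Group interpretation requires.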
*** Bug 18109 has been marked as a duplicate of this bug. ***
One relatively inexpensive way to achieve conformance here is to have threads keep (in the TCB/TLS) a count of the recursive or error-checking mutexes they currently own that are not robust. Then, at thread exit time, if the count is nonzero, instead of calling SYS_exit, the thread can zero out its TID, futex_wake it, and go into an infinite SYS_pause loop with all signals blocked. This in effect reserves the TID against reuse. Of course, it wastes resources in programs where threads exit with mutexes locked, but the impact on performance is minimal, and such programs are arguably buggy anyway (certainly so if they do it an unbounded number of times, since to do so they'd be creating an unbounded number of mutexes that cannot be destroyed).
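A rough user-space simulation of this scheme follows, under loudly stated assumptions: the counter `nonrobust_owned` and all helper functions are hypothetical, a plain `_Atomic int` stands in for the TID word in the thread descriptor (in glibc, the kernel clears that word and wakes joiners via CLONE_CHILD_CLEARTID when the thread calls SYS_exit), and the futex wake is omitted. It only illustrates the bookkeeping and the exit-time decision, not how it would be wired into glibc's thread exit path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical TLS counter of recursive or error-checking mutexes that are
   not robust and are currently owned by this thread. The mutex code would
   increment it on the outermost lock and decrement it on the final unlock. */
static _Thread_local unsigned long nonrobust_owned;

void note_nonrobust_acquired(void) { nonrobust_owned++; }

void note_nonrobust_released(void)
{
    assert(nonrobust_owned > 0);
    nonrobust_owned--;
}

/* At thread exit: must this thread keep its TID from being reused? */
bool must_reserve_tid(void) { return nonrobust_owned != 0; }

/* Publish "exited" to joiners, then sleep forever so the TID stays taken.
   tid_word is a stand-in for the descriptor's TID; a real implementation
   would also futex_wake it. */
static _Noreturn void park_forever(_Atomic int *tid_word)
{
    sigset_t all;
    sigfillset(&all);
    pthread_sigmask(SIG_BLOCK, &all, NULL); /* SIGKILL/SIGSTOP stay unblockable */
    atomic_store_explicit(tid_word, 0, memory_order_release);
    for (;;)
        pause(); /* no handler can run, so re-enter pause() if it ever returns */
}

/* Hypothetical exit path: the normal case (standing in for SYS_exit) just
   clears the TID word; a thread that still owns such a mutex parks instead. */
void thread_exit_path(_Atomic int *tid_word)
{
    if (must_reserve_tid())
        park_forever(tid_word);
    atomic_store_explicit(tid_word, 0, memory_order_release);
}

/* Demonstration: a worker that exits while owning one such mutex. */
static void *worker_exits_owning(void *arg)
{
    note_nonrobust_acquired();
    thread_exit_path(arg);
    return NULL;
}
```

Threads that never exit while owning such a mutex pay only the increment and decrement; a parked thread keeps its kernel task (and stack) for the rest of the process's lifetime, which is the resource cost mentioned above.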
(In reply to Rich Felker from comment #2)
> One relatively inexpensive way to resolve conformance here is to have
> threads keep (in the TCB/TLS) a count of the number of recursive or
> error-checking mutexes they currently own which are not robust. Then, at
> thread exit time, if the count is not zero, instead of SYS_exit, the thread
> can zero out its TID, futex_wake it, and go into an infinite SYS_pause loop
> with all signals blocked. This in effect reserves the TID against reuse.
I think this has global, system-wide impact, because TIDs are implemented in the kernel as task IDs, which are allocated per PID namespace, not per process; every parked thread would permanently hold an ID from the space that all processes in the namespace share. I expect this approach, while technically correct, would break quite a few workloads.