17463 – mutexes owned by a terminated thread are supposed to be owned by an imaginary thread

Status:	NEW

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	nptl
Version:	unspecified

Importance:	P2 minor
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Duplicates (1):	18109
Depends on:
Blocks:

Reported:	2014-10-07 12:50 UTC by Torvald Riegel
Modified:	2019-11-05 14:27 UTC
CC List:	4 users

See Also:	14485
Host:
Target:
Build:
Last reconfirmed:

Flags:	fweimer: security-

Description Torvald Riegel 2014-10-07 12:50:35 UTC

IMHO, if a thread that owns a mutex terminates, this should result in undefined behavior.  The Austin Group thinks differently:
http://austingroupbugs.net/view.php?id=755

Essentially, what they want is that an imaginary thread (because the initial owner doesn't exist anymore) continues to own such a mutex.  Our recursive and non-robust PI mutexes don't fulfill this requirement, because we have an ABA issue on the thread ID that's used to represent ownership: once the owner has terminated, its thread ID can be reused by a new thread, which is then indistinguishable from the original owner.
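To make the ABA issue concrete, here is a minimal single-threaded sketch. It is not glibc code: `toy_recursive_mutex` and `toy_lock` are hypothetical names, and thread IDs are passed in as plain integers so that TID reuse can be simulated by calling with the same number twice.

```c
#include <errno.h>

/* Hypothetical toy model of a recursive mutex whose ownership is
   represented by a thread ID (0 means unowned).  Not glibc's layout. */
struct toy_recursive_mutex {
    int owner;       /* TID of the owning thread, or 0 */
    unsigned count;  /* recursion depth */
};

/* Try to lock M on behalf of the thread whose TID is SELF_TID. */
int toy_lock(struct toy_recursive_mutex *m, int self_tid)
{
    if (m->owner == self_tid) {
        /* Looks like a recursive acquisition by the owner.  If the
           original owner has terminated and SELF_TID was reused by a
           new thread, this branch is taken by mistake. */
        m->count++;
        return 0;
    }
    if (m->owner != 0)
        return EBUSY;  /* owned by some other thread */
    m->owner = self_tid;
    m->count = 1;
    return 0;
}
```

If a thread with TID 42 locks the mutex and terminates, a later thread that happens to receive TID 42 passes the `owner == self_tid` check and "re-locks" a mutex that, per the Austin Group interpretation, should stay owned by the imaginary thread.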

My preferred solution would be to not change the implementation but document that we deviate from POSIX in this aspect.  (The requirement is not explicitly made, but may be determined based on what's in the spec; I think people are just as likely to interpret the spec in a way that's compatible with our implementation.)

Alternatively, we should change from thread ID to an ID obtained from a global 64-bit counter by each thread (on first acquisition of an affected mutex).  That means one more indirection for the lock, and we need a TLS slot for it.
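A rough sketch of what the counter-based alternative could look like, using C11 atomics and thread-local storage. The names `next_owner_id`, `my_owner_id` and `owner_id_self` are made up for illustration and do not correspond to anything in glibc; the value 0 is assumed to be reserved for "unowned".

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global counter; starts at 1 so that 0 can mean "unowned". */
static _Atomic uint64_t next_owner_id = 1;

/* Hypothetical TLS slot holding this thread's ID; 0 until first use. */
static _Thread_local uint64_t my_owner_id;

/* Return the calling thread's owner ID, allocating it lazily on the
   first acquisition of an affected mutex.  This is the extra
   indirection on the lock path. */
uint64_t owner_id_self(void)
{
    if (my_owner_id == 0)
        my_owner_id = atomic_fetch_add_explicit(&next_owner_id, 1,
                                                memory_order_relaxed);
    return my_owner_id;
}
```

Because the counter only ever increases, an ID is never handed out twice in practice, so the ID of a terminated owner cannot match any later thread. The price is the wider owner field and the TLS lookup.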

Comment 1 Torvald Riegel 2017-01-12 10:49:12 UTC

*** Bug 18109 has been marked as a duplicate of this bug. ***

Comment 2 Rich Felker 2017-10-24 22:47:00 UTC

One relatively inexpensive way to resolve conformance here is to have threads keep (in the TCB/TLS) a count of the number of recursive or error-checking mutexes they currently own which are not robust. Then, at thread exit time, if the count is not zero, instead of SYS_exit, the thread can zero out its TID, futex_wake it, and go into an infinite SYS_pause loop with all signals blocked. This in effect reserves the TID against reuse. Of course it wastes resources in programs where threads exit with mutexes locked, but the impact on performance is minimal, and such programs are arguably buggy anyway (certainly so if they do it an unbounded number of times, since to do so they'd be creating an unbounded number of mutexes which are not destroyable).

Comment 3 Florian Weimer 2019-11-05 14:27:47 UTC

(In reply to Rich Felker from comment #2)
> One relatively inexpensive way to resolve conformance here is to have
> threads keep (in the TCB/TLS) a count of the number of recursive or
> error-checking mutexes they currently own which are not robust. Then, at
> thread exit time, if the count is not zero, instead of SYS_exit, the thread
> can zero out its TID, futex_wake it, and go into an infinite SYS_pause loop
> with all signals blocked. This in effect reserves the TID against reuse.

I think this has global, system-wide impact, because TIDs are implemented in the kernel as task IDs, which are per-PID-namespace, not per process. I expect this approach, while technically correct, would break quite a few workloads.