This is the mail archive of the
mailing list for the glibc project.
Re: [RFC] mutex destruction (#13690): problem description and workarounds
- From: Torvald Riegel <triegel at redhat dot com>
- To: Rich Felker <dalias at libc dot org>
- Cc: GLIBC Devel <libc-alpha at sourceware dot org>
- Date: Mon, 01 Dec 2014 21:52:30 +0100
- References: <1396621230 dot 10643 dot 7191 dot camel at triegel dot csb> <20141201153802 dot GV29621 at brightrain dot aerifal dot cx> <1417452125 dot 1771 dot 503 dot camel at triegel dot csb> <20141201170542 dot GY29621 at brightrain dot aerifal dot cx>
On Mon, 2014-12-01 at 12:05 -0500, Rich Felker wrote:
> On Mon, Dec 01, 2014 at 05:42:05PM +0100, Torvald Riegel wrote:
> > On Mon, 2014-12-01 at 10:38 -0500, Rich Felker wrote:
> > > On Fri, Apr 04, 2014 at 04:20:30PM +0200, Torvald Riegel wrote:
> > > > === Workaround 1a: New FUTEX_WAKE_SPURIOUS operation that avoids the
> > > > specification change
> > > >
> > > > This is like Workaround 1, except that the kernel could add a new futex
> > > > op that works like FUTEX_WAKE except that:
> > > > * FUTEX_WAITs woken up by a FUTEX_WAKE_SPURIOUS will always return
> > > > EINTR. EINTR for spurious wakeups is already part of the spec, so
> > > > correct futex users are already handling this (e.g., glibc does).
> > > > * Make sure (and specify) that FUTEX_WAKE_SPURIOUS that hit other
> > > > futexes (e.g., PI) are ignored and don't cause wake-ups (or just benign
> > > > spurious wakeups already specified).
> > > >
> > > > Users of FUTEX_WAKE_SPURIOUS should have to do very little compared to
> > > > when using FUTEX_WAKE. The only thing that they don't have anymore is
> > > > the ability to distinguish between a real wakeup and a spurious one.
> > > > Single-use FUTEX_WAITs could be affected, but we don't have them in
> > > > glibc. The only other benefit from being able to distinguish between
> > > > real and spurious is in combination with a timeout: If the wake-up is
> > > > real on a single-use futex, there's no need to check timeouts again.
> > > > But will programs want to use this often, and will they need to
> > > > use FUTEX_WAKE_SPURIOUS in this case? I guess not.
> > > >
> > > > Pros:
> > > > * Correct futex uses will need no changes.
> > > > Cons:
> > > > * Needs a new futex operation.
> > >
> > > I'm fine with this except for the return value. EINTR should never
> > > mean anything but "interrupted by signal". Especially if we're going
> > > to be exposing futex() to applications as a public API, which should
> > > be done, applications should be able to rely on EINTR always being
> > > "interrupted by signal" in the sense that it's acceptable to assume it
> > > doesn't happen if you're not using (interrupting) signal handlers and
> > > that it's okay to use a standard EINTR retry loop if you want to. This
> > > would not be valid if EINTR were overloaded with the above meaning.
> > >
> > > There are plenty of other errno codes that could be used without
> > > creating this problem. EINPROGRESS has good precedent as a "non-error"
> > > error condition, and seems like a reasonable choice, but I'm fine with
> > > anything that doesn't overload EINTR or other existing errors in ways
> > > that would break existing handling.
> > Given that glibc hasn't exposed an API for it, what would it break?
> The kernel has exposed an API for it, and non-glibc software is using
> it via syscall() and/or asm. Answering that question would require
> surveying all such software. However I'm not sure that the proposal
> for a new FUTEX_WAKE_SPURIOUS would not already break such users. If
> they really want to count wakes and are relying on existing futex wait
> semantics, a new error condition that returns spuriously at seemingly
> random times is potentially going to break things (although not quite
> as badly as a spurious return of zero).
My proposal above reuses EINTR. The futex man page states:
"Signals (see signal(7)) or other spurious wakeups cause FUTEX_WAIT to
fail with the error EINTR."
The source of "other spurious wakeups" isn't defined, so I don't see how
a program could reliably prevent them, or reason that they won't ever
appear. Thus, it seems that correct futex uses would have to be
prepared to handle EINTR.
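
To illustrate what "prepared to handle EINTR" can look like, here is a
minimal sketch of a wait loop, assuming Linux and the raw syscall()
interface. The helper names futex_wait and wait_while_equal are made up
for this example and are not glibc code. The loop never interprets the
reason for a wakeup; it only re-checks the futex word.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Futex words are 32 bits wide; make that assumption explicit. */
static_assert(sizeof(atomic_int) == 4, "futex words are 32 bits");

/* Hypothetical thin wrapper: glibc exports no futex() function, so the
 * call goes through syscall(). Returns 0 or -1 with errno set. */
static long futex_wait(atomic_int *addr, int expected)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Hypothetical helper: block until *addr no longer holds `expected`.
 * Returns 0 once the value differs, -1 on an unexpected error. */
static int wait_while_equal(atomic_int *addr, int expected)
{
    while (atomic_load(addr) == expected) {
        if (futex_wait(addr, expected) == -1 &&
            errno != EINTR && errno != EAGAIN)
            return -1;
        /* A return of 0 (woken), EINTR (signal or other spurious
         * wakeup) and EAGAIN (value already changed) are all handled
         * the same way: go around and re-check the word. */
    }
    return 0;
}
```

A caller written this way behaves the same whether a wakeup is real or
spurious, which is why returning EINTR from a new wake operation would
not require changes to it.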
> > I agree that this also means we could try to use other error codes, but
> > this needs to work with what the kernel ultimately provides. We may not
> > be able to distinguish between two conditions based on the kernel's
> > return values, in which case it wouldn't help if we created a new error
> > code because there'd be no way we can use it.
> > Do you have use cases for actually needing to know that a signal was the
> > cause of the interruption compared to, say, a spurious wake-up?
> One major one internal to libc would be implementing cancellation. You
> would want to re-check the cancellation flag and act on cancellation
> if the syscall were interrupted by a signal (the cancellation signal
> or otherwise) but not necessarily in other cases.
Why wouldn't the canceling thread be able to reach consensus with the
cancelled thread about whether cancellation is to happen or not using
just shared-memory synchronization?
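
As a rough sketch of what I mean, and not a description of how glibc
implements cancellation: the cancellation request could be published in
the futex word itself, so the waiter learns about it from memory and
never has to know why it woke up. CANCEL_BIT, cancellable_wait and
request_cancel are hypothetical names chosen for this example, and the
sketch assumes that ordinary values of the word never have the top bit
set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

static_assert(sizeof(atomic_uint) == 4, "futex words are 32 bits");

/* Hypothetical encoding: the top bit of the word means "cancel requested". */
#define CANCEL_BIT 0x80000000u

enum wait_result { WAIT_CHANGED, WAIT_CANCELLED, WAIT_ERROR };

/* Wait until the word differs from `expected` or cancellation is requested.
 * `expected` must not have CANCEL_BIT set. */
static enum wait_result cancellable_wait(atomic_uint *word, unsigned expected)
{
    for (;;) {
        unsigned cur = atomic_load(word);
        if (cur & CANCEL_BIT)
            return WAIT_CANCELLED;
        if (cur != expected)
            return WAIT_CHANGED;
        /* If the canceller sets the bit between the load above and this
         * call, the kernel sees a changed word and fails with EAGAIN,
         * so the request cannot be missed. */
        long r = syscall(SYS_futex, word, FUTEX_WAIT, expected, NULL, NULL, 0);
        if (r == -1 && errno != EINTR && errno != EAGAIN)
            return WAIT_ERROR;
        /* Any other outcome, real or spurious: re-check the word. */
    }
}

/* Publish the request in shared memory first, then wake all waiters. */
static void request_cancel(atomic_uint *word)
{
    atomic_fetch_or(word, CANCEL_BIT);
    syscall(SYS_futex, word, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}
```

With this arrangement the decision to cancel is taken from the state of
the word alone, so it does not matter whether the waiter returned because
of a signal, a wake, or a spurious wakeup.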
> There are other ways to use interrupting signals similarly to
> cancellation where you actually want to know you were interrupted by a
> signal handler.
But how would you distinguish from "other spurious wakeups" that are