This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC] mutex destruction (#13690): problem description and workarounds
- From: Rich Felker <dalias at libc dot org>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: GLIBC Devel <libc-alpha at sourceware dot org>
- Date: Mon, 1 Dec 2014 12:05:42 -0500
- Subject: Re: [RFC] mutex destruction (#13690): problem description and workarounds
- References: <1396621230 dot 10643 dot 7191 dot camel at triegel dot csb> <20141201153802 dot GV29621 at brightrain dot aerifal dot cx> <1417452125 dot 1771 dot 503 dot camel at triegel dot csb>

On Mon, Dec 01, 2014 at 05:42:05PM +0100, Torvald Riegel wrote:
> On Mon, 2014-12-01 at 10:38 -0500, Rich Felker wrote:
> > On Fri, Apr 04, 2014 at 04:20:30PM +0200, Torvald Riegel wrote:
> > > === Workaround 1a: New FUTEX_WAKE_SPURIOUS operation that avoids the
> > > specification change
> > >
> > > This is like Workaround 1, except that the kernel could add a new futex
> > > op that works like FUTEX_WAKE except that:
> > > * FUTEX_WAITs woken up by a FUTEX_WAKE_SPURIOUS will always return
> > > EINTR. EINTR for spurious wakeups is already part of the spec, so
> > > correct futex users are already handling this (e.g., glibc does).
> > > * Make sure (and specify) that FUTEX_WAKE_SPURIOUS that hit other
> > > futexes (e.g., PI) are ignored and don't cause wake-ups (or just benign
> > > spurious wakeups already specified).
> > >
> > > Users of FUTEX_WAKE_SPURIOUS should have to do very little compared to
> > > when using FUTEX_WAKE. The only thing that they don't have anymore is
> > > the ability to distinguish between a real wakeup and a spurious one.
> > > Single-use FUTEX_WAITs could be affected, but we don't have them in
> > > glibc. The only other benefit from being able to distinguish between
> > > real and spurious is in combination with a timeout: If the wake-up is
> > > real on a single-use futex, there's no need to check timeouts again.
> > > But will programs want to use this often, and will they need to have to
> > > use FUTEX_WAKE_SPURIOUS in this case? I guess not.
> > >
> > > Pros:
> > > * Correct futex uses will need no changes.
> > > Cons:
> > > * Needs a new futex operation.
> >
> > I'm fine with this except for the return value. EINTR should never
> > mean anything but "interrupted by signal". Especially if we're going
> > to be exposing futex() to applications as a public API, which should
> > be done, applications should be able to rely on EINTR always being
> > "interrupted by signal" in the sense that it's acceptable to assume it
> > doesn't happen if you're not using (interrupting) signal handlers and
> > that it's okay to use a standard EINTR retry loop if you want to. This
> > would not be valid if EINTR were overloaded with the above meaning.
> >
> > There are plenty of other errno codes that could be used without
> > creating this problem. EINPROGRESS has good precedent as a "non-error"
> > error condition, and seems like a reasonable choice, but I'm fine with
> > anything that doesn't overload EINTR or other existing errors in ways
> > that would break existing handling.
>
> Given that glibc hasn't exposed an API for it, what would it break?

The kernel has exposed an API for it, and non-glibc software is using
it via syscall() and/or asm. Answering that question would require
surveying all such software. However I'm not sure that the proposal
for a new FUTEX_WAKE_SPURIOUS would not already break such users. If
they really want to count wakes and are relying on existing futex wait
semantics, a new error condition that returns spuriously at seemingly
random times is potentially going to break things (although not quite
as badly as a spurious return of zero).
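For concreteness, here is a minimal sketch of the kind of caller in question: software that reaches the kernel futex interface through syscall() and wraps the wait in a standard EINTR retry loop. The helper names futex_wait and futex_wait_retry are hypothetical and chosen for illustration only; the sketch assumes Linux and waits without a timeout.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: glibc exports no futex() function, so callers
   go through syscall(). The kernel blocks only while *addr == expected. */
static long futex_wait(uint32_t *addr, uint32_t expected)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Standard EINTR retry loop: assumes EINTR means "a signal handler ran"
   and nothing else, so the call is simply reissued.
   Returns 0 after a wake-up, or -1 with errno set (for example EAGAIN
   when *addr does not equal expected at the time of the call). */
static long futex_wait_retry(uint32_t *addr, uint32_t expected)
{
    long r;
    do {
        r = futex_wait(addr, expected);
    } while (r == -1 && errno == EINTR);
    return r;
}
```

A caller written this way never sees EINTR at all, and a caller that installs no interrupting signal handlers may omit the loop entirely. Both choices rest on EINTR keeping its single meaning.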

> I agree that this also means we could try to use other error codes, but
> this needs to work with what the kernel ultimately provides. We may not
> be able to distinguish between two conditions based on the kernel's
> return values, in which case it wouldn't help if we created a new error
> code because there'd be no way we can use it.
>
> Do you have use cases for actually needing to know that a signal was the
> cause of the interruption compared to, say, a spurious wake-up?

One major one internal to libc would be implementing cancellation. You
would want to re-check the cancellation flag and act on cancellation
if the syscall were interrupted by a signal (the cancellation signal
or otherwise) but not necessarily in other cases.
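A rough sketch of that pattern, not taken from any libc: the flag cancel_requested and the function cancellable_futex_wait are hypothetical names, and the flag is assumed to be set by a signal handler installed without SA_RESTART. The flag is consulted only on EINTR, which is exactly the step that stops working if EINTR can also mean something else.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical flag, assumed to be set from a signal handler that was
   installed without SA_RESTART so that the wait returns EINTR. */
static volatile sig_atomic_t cancel_requested;

/* Returns 0 after a wake-up, ECANCELED if a signal interrupted the wait
   and cancellation had been requested, or another errno value on failure. */
static int cancellable_futex_wait(uint32_t *addr, uint32_t expected)
{
    for (;;) {
        if (syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0) == 0)
            return 0;
        if (errno != EINTR)
            return errno;          /* e.g. EAGAIN: *addr != expected */
        /* EINTR: a signal handler ran, so re-check the flag here. */
        if (cancel_requested)
            return ECANCELED;
        /* Interrupted by some other signal: go back to waiting. */
    }
}
```

This deliberately ignores the race where the signal arrives just before the syscall is entered; a real cancellation implementation has to close that window by other means.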

There are other ways to use interrupting signals similarly to
cancellation where you actually want to know you were interrupted by a
signal handler.

Also, if there's ever any interest in having the futex API (or a
subset of it) proposed for inclusion in standards (POSIX) or other
operating systems, this kind of overloading of EINTR would be a very
negative property that would stand in the way of adoption, I think.

Rich