This is the mail archive of the libc-alpha at sourceware dot org
mailing list for the glibc project.
Re: [RFC] mutex destruction (#13690): problem description and workarounds
- From: Rich Felker <dalias at libc dot org>
- To: Carlos O'Donell <carlos at redhat dot com>
- Cc: Torvald Riegel <triegel at redhat dot com>, GLIBC Devel <libc-alpha at sourceware dot org>
- Date: Wed, 3 Dec 2014 09:33:57 -0500
On Wed, Dec 03, 2014 at 08:59:16AM -0500, Carlos O'Donell wrote:
> On 12/02/2014 03:45 PM, Torvald Riegel wrote:
> >> I think this is incorrect documentation. I cannot find any hint at
> >> what other sort of "other spurious wake-ups" could cause EINTR.
> > But that's no reason to not have it. I think it makes perfect sense to
> > allow for spurious wake-ups, especially for futexes. Even if currently
> > there's no case in which there would be a spurious wake-up, it's safer
> > to have an error code that allows it so that if you need to have a
> > spurious wake-up later on, you have a way to delegate the issue to the
> > caller -- which, for futexes, is perfectly fine due to how they are
> > designed.
> What is the reason we want spurious wake-up allowed? Simply to make
> the mutex destruction issue simpler to fix? I'd like to see a new thread
> started with a clear description of what spurious wake-up buys the
See your own reply to Torvald's options where you preferred 1a. If you
want to avoid spurious wakeups, you need something like option 2 where
the kernel is responsible for the atomic operation (store) that
releases the lock and performs it after getting the futex key/hash and
locking the associated futex bin. This increases the latency to
unlock (in the contended case only, of course) by the cost of syscall
entry and futex hashing.
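
To make the trade-off concrete, here is a minimal sketch of a
futex-based lock. It is not glibc's actual code: sketch_mutex,
sketch_lock, sketch_unlock and the futex() wrapper are hypothetical
names, and the three-state lock word is an assumption borrowed from the
usual textbook design. The part that matters is the gap in
sketch_unlock between the user-space store that releases the lock and
the FUTEX_WAKE call. In that gap another thread can take the lock,
release it, destroy the mutex and reuse the memory, so the late wake
can reach an unrelated futex at the same address and show up there as a
spurious wake-up.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock word: 0 = free, 1 = locked, 2 = locked with waiters. */
typedef struct {
    atomic_int state;
} sketch_mutex;

/* Thin wrapper around the raw futex syscall (Linux only). */
static long futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void sketch_lock(sketch_mutex *m)
{
    int c = 0;

    /* Uncontended fast path: 0 -> 1 without entering the kernel. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;

    /* Contended: mark the word as "has waiters" and sleep until free. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        /* May return on a wake, on a value mismatch, or on a signal.
           The loop re-checks the word, so any extra return is harmless. */
        futex(&m->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

static void sketch_unlock(sketch_mutex *m)
{
    /* The release happens here, in user space. */
    int prev = atomic_exchange(&m->state, 0);
    assert(prev != 0); /* unlocking an unlocked mutex is a caller bug */

    /* Window: from this point the mutex may already be destroyed and
       its memory reused, yet its address is still passed to the kernel. */
    if (prev == 2)
        futex(&m->state, FUTEX_WAKE_PRIVATE, 1);
}
```

With option 2 the releasing store would instead be done by the kernel
after it has hashed the key and locked the bin, which closes the window
at the cost described above.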
> >>>> There are other ways to use interrupting signals similarly to
> >>>> cancellation where you actually want to know you were interrupted by a
> >>>> signal handler.
> >>> But how would you distinguish from "other spurious wakeups" that are
> >>> currently allowed?
> >> As far as I can tell that's just ***-covering by the man page with no
> >> basis in what the kernel actually does. If there are actually current
> >> situations under which it can produce EINTR without a signal, that's
> >> very bad. For instance sem_wait must return EINTR when actually
> >> interrupted by a non-SA_RESTART signal, but it's forbidden from
> >> returning EINTR if that didn't happen.
> > EINTR is a 'may fail'. POSIX states that sem_wait is interruptible, but
> > I read this as allowing interruption, not requiring it.
> Careful. The 'may fail' are implementation optional parts, and at present
> glibc AFAIK fails only in the intended case which is for non-SA_RESTART
> signals interrupting the futex and returning EINTR. Allowing a futex
> to spuriously fail without a signal would break conforming uses of sem_wait.
I think Torvald's point was that we could avoid the issue by taking
the liberty not to fail on EINTR at all, since it's an optional error.
However, I don't like changing outward behavior as a workaround for bad
design, especially when the bad design can be avoided. If it's really
preferable not to fail on EINTR (which it might be), this should be a
change considered independently.
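
For reference, this is roughly what a caller does today when it does
not care about interruption and only wants the semaphore. It is a
sketch only: sem_wait_retry is a hypothetical helper name, not a glibc
or POSIX function.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical helper: wait on sem, retrying whenever sem_wait
   reports EINTR. Returns 0 on success, -1 with errno set otherwise. */
static int sem_wait_retry(sem_t *sem)
{
    int r;

    do {
        r = sem_wait(sem);
    } while (r == -1 && errno == EINTR);

    return r;
}
```

Callers written this way would notice neither a spurious EINTR nor an
implementation that never reports EINTR. The callers affected are the
ones that call sem_wait directly and use EINTR to learn that a signal
handler ran.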