This is the mail archive of the libc-alpha
mailing list for the glibc project.
Re: [RFC] mutex destruction (#13690): problem description and workarounds
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: Rich Felker <dalias at libc dot org>
- Cc: Torvald Riegel <triegel at redhat dot com>, GLIBC Devel <libc-alpha at sourceware dot org>
- Date: Wed, 03 Dec 2014 14:38:11 -0500
- Subject: Re: [RFC] mutex destruction (#13690): problem description and workarounds
- Authentication-results: sourceware.org; auth=none
- References: <1396621230 dot 10643 dot 7191 dot camel at triegel dot csb> <20141201153802 dot GV29621 at brightrain dot aerifal dot cx> <1417452125 dot 1771 dot 503 dot camel at triegel dot csb> <20141201170542 dot GY29621 at brightrain dot aerifal dot cx> <1417467150 dot 1771 dot 581 dot camel at triegel dot csb> <20141201212223 dot GZ29621 at brightrain dot aerifal dot cx> <1417553118 dot 3930 dot 14 dot camel at triegel dot csb> <547F1734 dot 40903 at redhat dot com> <20141203143357 dot GH4574 at brightrain dot aerifal dot cx>
On 12/03/2014 09:33 AM, Rich Felker wrote:
>>> EINTR is a 'may fail'. POSIX states that sem_wait is interruptible, but
>>> I read this as allowing interruption, not requiring it.
>> Careful. The 'may fail' errors are implementation-optional, and at present
>> glibc AFAIK fails only in the intended case, which is non-SA_RESTART
>> signals interrupting the futex and returning EINTR. Allowing a futex
>> to spuriously fail without a signal would break conforming uses of sem_wait.
> I think Torvald's point was that we could avoid the issue by taking
> the liberty not to fail on EINTR at all, since it's an optional error.
> However I don't like changing outward behavior as a workaround for bad
> design, especially when the bad design can be avoided. If it's really
> preferable not to fail on EINTR (which it might be), this should be a
> change considered independently.
I don't see any immediate problem in changing the implementation to
never return EINTR, i.e. sem_wait would be uninterruptible except by a
sem_post from the signal handler. The argument Torvald uses here is that
there is simply no guarantee that sem_wait has been entered, and thus any
signal may be seen to arrive before sem_wait. The only counter-example is
a bad one: a user might use the timed wait to create a happens-after
relationship, under the expectation that in 99.9999% of the cases
sem_timedwait has been entered and the absolute timeout has not been
exceeded, and so expect a signal to cancel the wait. The fix is always to
call sem_post from the interrupting signal handler.
However, I have a tendency to agree with you here, that the real fix is
to disambiguate the kernel signal interrupt and spurious wakeup somehow.