This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC] mutex destruction (#13690): problem description and workarounds
- From: Torvald Riegel <triegel at redhat dot com>
- To: Rich Felker <dalias at libc dot org>
- Cc: "Carlos O'Donell" <carlos at redhat dot com>, GLIBC Devel <libc-alpha at sourceware dot org>
- Date: Thu, 04 Dec 2014 19:54:06 +0100
- Subject: Re: [RFC] mutex destruction (#13690): problem description and workarounds
- References: <20141201153802 dot GV29621 at brightrain dot aerifal dot cx> <1417452125 dot 1771 dot 503 dot camel at triegel dot csb> <20141201170542 dot GY29621 at brightrain dot aerifal dot cx> <1417467150 dot 1771 dot 581 dot camel at triegel dot csb> <20141201212223 dot GZ29621 at brightrain dot aerifal dot cx> <1417553118 dot 3930 dot 14 dot camel at triegel dot csb> <20141202210316 dot GI29621 at brightrain dot aerifal dot cx> <547F17E3 dot 9060901 at redhat dot com> <1417703533 dot 22797 dot 16 dot camel at triegel dot csb> <5480807D dot 3040309 at redhat dot com> <20141204173402 dot GQ4574 at brightrain dot aerifal dot cx>
On Thu, 2014-12-04 at 12:34 -0500, Rich Felker wrote:
> On Thu, Dec 04, 2014 at 10:40:45AM -0500, Carlos O'Donell wrote:
> > On 12/04/2014 09:32 AM, Torvald Riegel wrote:
> > >> I agree. The conflation of EINTR for non-signal use is IMO going to be
> > >> a design decision we regret in the future.
> > >
> > > I'd rather see the fault in POSIX semantics, and it not making it clear
> > > that signal handlers should do sem_post if they need to reliably
> > > interrupt a sem_wait.
> >
> > If we are going to disallow a signal to interrupt sem_post we should just
> > change the semantics, version the interface, and document that glibc no
> > longer ever returns EINTR for sem_wait, and that the right way to interrupt
> > it is with a signal handler that does sem_post.
> >
> > This prevents users from complaining that what they observe with strace
> > and gdb is a signal arriving after the sem_wait, but not interrupting it.
> > We can claim the user is looking under the hood, but that's what they do,
> > and if we can possibly avoid those arguments we win. We know we're right,
> > we know we don't want to allow timing to imply ordering, but we need time
> > to educate developers (and that looking under the hood leads to non-obvious
> > observations).
> >
> > I really wish the kernel returned some other error code for woken up
> > vs. signal. Is it not possible to get the kernel to distinguish these
> > two? Am I forgetting something?
>
> It *DOES*. It returns 0 for woken-up, and EINTR for
> interrupted-by-signal.

No.  See man 2 futex, return values of FUTEX_WAIT:

  "Signals (see signal(7)) or other spurious wakeups cause FUTEX_WAIT
  to fail with the error EINTR."

The LKML message that expanded on other error codes states that existing
wording for FUTEX_WAIT "seems ok": https://lkml.org/lkml/2014/5/15/356

So, EINTR is currently documented as happening *either* due to a signal
or spuriously.

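
To illustrate what that documented behavior means for a caller, here is a
minimal sketch in C. It assumes Linux and the raw futex syscall; the helper
names futex_wait_raw and wait_until_changed are hypothetical and are not glibc
internals. Because EINTR may mean either a signal or a spurious wakeup, the
loop never interprets it and simply re-checks the futex word:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical thin wrapper around the raw futex syscall with
   FUTEX_WAIT_PRIVATE and no timeout.  Returns 0 on wakeup, or -1 with
   errno set (EAGAIN if *addr != expected, EINTR on a signal or a
   spurious wakeup, per futex(2)). */
static long futex_wait_raw(_Atomic uint32_t *addr, uint32_t expected)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected,
                   NULL, NULL, 0);
}

/* Hypothetical helper: block until *addr differs from expected.
   A return of 0, EAGAIN, or EINTR all lead back to re-checking the
   word, so the cause of an EINTR never has to be known.
   Returns 0 once the word has changed, -1 on any other error. */
static int wait_until_changed(_Atomic uint32_t *addr, uint32_t expected)
{
    while (atomic_load(addr) == expected) {
        if (futex_wait_raw(addr, expected) == -1 &&
            errno != EINTR && errno != EAGAIN)
            return -1;
    }
    return 0;
}
```

A caller written this way behaves the same whether EINTR was caused by a
signal or was spurious, which is why it cannot use EINTR alone to report
"interrupted by a signal" to its own callers.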
> Torvald is proposing breaking the kernel by
> adding non-signal situations under which it can return EINTR.

No.  My proposal does *not* break anything but just uses an error
condition that is documented today.  If it turns out the documentation
is wrong, this would change.

> The
> kernel folks went to a lot of trouble to FIX all of the wrong
> conditions under which EINTR could be returned (see man 7 signal) and
> it would be a huge shame to undo that now.

Then maybe you should talk to the kernel folks to decide whether the
current futex documentation is wrong in this regard or not.