This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] New condvar implementation that provides stronger ordering guarantees.
- From: Rich Felker <dalias at libc dot org>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: GLIBC Devel <libc-alpha at sourceware dot org>, Marcus Shawcroft <marcus dot shawcroft at gmail dot com>, "Joseph S. Myers" <joseph at codesourcery dot com>, Richard Henderson <rth at redhat dot com>, Carlos O'Donell <codonell at redhat dot com>, Mike Frysinger <vapier at gentoo dot org>, David Holsgrove <david dot holsgrove at xilinx dot com>, Chung-Lin Tang <chunglin_tang at mentor dot com>, Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>, Andreas Krebbel <krebbel at linux dot ibm dot com>, Kaz Kojima <kkojima at rr dot iij4u dot or dot jp>, Chris Metcalf <cmetcalf at tilera dot com>, David Miller <davem at davemloft dot net>, Darren Hart <dvhart at infradead dot org>
- Date: Mon, 23 Feb 2015 12:59:39 -0500
- Subject: Re: [PATCH] New condvar implementation that provides stronger ordering guarantees.
- Authentication-results: sourceware.org; auth=none
- References: <1424456307 dot 20941 dot 122 dot camel at triegel dot csb> <20150222223722 dot GA23507 at brightrain dot aerifal dot cx> <1424690809 dot 22790 dot 32 dot camel at triegel dot csb>
On Mon, Feb 23, 2015 at 12:26:49PM +0100, Torvald Riegel wrote:
> On Sun, 2015-02-22 at 17:37 -0500, Rich Felker wrote:
> > On Fri, Feb 20, 2015 at 07:18:27PM +0100, Torvald Riegel wrote:
> > > + Limitations:
> > > + * This condvar isn't designed to allow for more than
> > > + WSEQ_THRESHOLD * (1 << (sizeof(GENERATION) * 8 - 1)) calls to
> > > + __pthread_cond_wait. It probably only suffers from potential ABA issues
> > > + afterwards, but this hasn't been checked nor tested.
> > > + * More than (1 << (sizeof(QUIESCENCE_WAITERS) * 8) -1 concurrent waiters
> > > + are not supported.
> > > + * Beyond what is allowed as errors by POSIX or documented, we can also
> > > + return the following errors:
> > > + * EPERM if MUTEX is a recursive mutex and the caller doesn't own it.
> > This is not beyond POSIX; it's explicitly specified as a "shall fail".
> The mutex type is PTHREAD_MUTEX_ERRORCHECK or the mutex is a
> robust mutex, and the current thread does not own the mutex.
> POSIX does not seem to allow EPERM for *recursive mutexes*. Is there an
> update that I'm missing?
Well it doesn't specifically require it for recursive (I missed that)
but it also doesn't disallow it.
> > > + * EOWNERDEAD or ENOTRECOVERABLE when using robust mutexes. Unlike
> > > + for other errors, this can happen when we re-acquire the mutex; this
> > > + isn't allowed by POSIX (which requires all errors to virtually happen
> > > + before we release the mutex or change the condvar state), but there's
> > > + nothing we can do really.
> > Likewise these are "shall fail" errors specified by POSIX, and while
> > it's not clearly written in the specification, it's clear that they
> > only happen on re-locking.
> Yes, they are "shall fail". I also agree that POSIX *should* make it
> clear that they can happen after releasing and when acquiring the mutex
> again -- but that's not what the spec says:
> "Except in the case of [ETIMEDOUT], all these error checks shall act as
> if they were performed immediately at the beginning of processing for
> the function and shall cause an error return, in effect, prior to
> modifying the state of the mutex[...]"
OK, then I think that text is a bug. There's no way that mutex locking
errors could meaningful before the mutex is unlocked.
> Until these two get clarified in the spec, I consider the comments
> correct. We can certainly extend them and document why we thing this
> behavior is The Right Thing To Do. But we need to document where we
> deviate from what the spec states literally.
Yes. Would you like to submit the bug report or should I?
> > One ugly case I don't think you're prepared to handle is when
> > EOWNERDEAD happens in the cancellation cleanup path. There's no way
> > for the caller to know this error happened and thereby no way for it
> > to recover the state and make it consistent before unlocking the
> > mutex.
> That's true. There's "XXX" comments on that in the cancellation
> handlers in the patch; we need to decide whether we ignore such errors
> or abort. I'm kind of undecided right now what's the most useful
> behavior in practice.
The most useful behavior in practice is to opt for a spurious wake
that's formally "just before the cancellation request arrived" (note:
there's no ordering relationship possible to preclude this) and leave
the cancellation request pending. However to do that you need a
mechanism for getting cancellation to throw you out of the wait
without acting on it, or a way to stop acting on cancellation once
you've started -- which could be done with sjlj hacks (but those kill
performance in the common case), or possibly with unwinding hacks, or
with the following...
> > The new cancellation state I'm proposing (PTHREAD_CANCEL_MASKED; see
> > http://www.openwall.com/lists/musl/2015/02/22/1 for details) makes it
> > possible to solve this problem by refusing to act on cancellation
> > (leaving it pending), and instead returning as if by a spurious wake
> > occurring just before cancellation, so that the error can be reported.
And if you don't want to add or depend on an API like this internally
in glibc, it would also be possible just to make a special variant of
the futex syscall for internal glibc use that returns an error rather
than acting on cancellation when cancellation happens. This (or adding
the proposed new masked mode) could be done along with the pending
cancellation race overhaul.
> > > + Other notes:
> > > + * Instead of the normal mutex unlock / lock functions, we use
> > > + __pthread_mutex_unlock_usercnt(m, 0) / __pthread_mutex_cond_lock(m)
> > > + because those will not change the mutex-internal users count, so that it
> > > + can be detected when a condvar is still associated with a particular
> > > + mutex because there is a waiter blocked on this condvar using this mutex.
> > I don't follow. What is this users count? Is it the recursive lock
> > count or something else? Why do you want to be able to detect that
> > it's associated with a cv? Note that it could be associated with more
> > than one cv.
> I've maintained what the previous code did. There's this check in
> if ((mutex->__data.__kind & PTHREAD_MUTEX_ROBUST_NORMAL_NP) == 0
> && mutex->__data.__nusers != 0)
> return EBUSY;
> That's not required (but undefined behavior is allowed by POSIX for
> destroying a mutex that is referenced by a concurrent condvar
> It might be better to remove this check, mostly to avoid unnecessary
> writes to the mutex.
I see. I suspect checks like this are going to lead to bad performance
and/or bugs (e.g. interaction with lock elision, etc.) so my
preference would be to remove them.