This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] New condvar implementation that provides stronger ordering guarantees.
- From: Rich Felker <dalias at libc dot org>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: GLIBC Devel <libc-alpha at sourceware dot org>, Marcus Shawcroft <marcus dot shawcroft at gmail dot com>, "Joseph S. Myers" <joseph at codesourcery dot com>, Richard Henderson <rth at redhat dot com>, Carlos O'Donell <codonell at redhat dot com>, Mike Frysinger <vapier at gentoo dot org>, David Holsgrove <david dot holsgrove at xilinx dot com>, Chung-Lin Tang <chunglin_tang at mentor dot com>, Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>, Andreas Krebbel <krebbel at linux dot ibm dot com>, Kaz Kojima <kkojima at rr dot iij4u dot or dot jp>, Chris Metcalf <cmetcalf at tilera dot com>, David Miller <davem at davemloft dot net>, Darren Hart <dvhart at infradead dot org>
- Date: Fri, 20 Feb 2015 14:20:23 -0500
- Subject: Re: [PATCH] New condvar implementation that provides stronger ordering guarantees.
- Authentication-results: sourceware.org; auth=none
- References: <1424456307 dot 20941 dot 122 dot camel at triegel dot csb>
On Fri, Feb 20, 2015 at 07:18:27PM +0100, Torvald Riegel wrote:
> TLDR: This is a new implementation for condition variables, required
> after http://austingroupbugs.net/view.php?id=609 to fix bug 13165. In
> essence, we need to be stricter in which waiters a signal or broadcast
> is required to wake up; this couldn't be solved using the old algorithm.
> ISO C++ made a similar clarification, so this also fixes a bug in
> current libstdc++, for example.
> This doesn't contain the existing requeue optimization (see below), and
> PI support is not worse (except no requeue).
First of all, I liked your detailed description of the problem and
your solution enough that I tweeted it. Thanks for putting in the
effort to document all this.
> There's another issue specific to condvars: ABA issues on the underlying
> futexes. Unlike mutexes that have just three states, or semaphores that
> have no tokens or a limited number of them, the state of a condvar is
> the *order* of the waiters. With a semaphore, we can grab whenever
> there is a token, and block in the futex_wait when there is none. With
> condvars, a waiter must not block if there had been more
> signals/broadcasts than waiters before it (ie, ordering matters).
I don't quite follow your claim that the order is part of the state.
My interpretation is that the state is merely the _set_ of waiters at
the instant of signal/broadcast. But perhaps thinking of order as
state makes sense if you're looking at this from the standpoint of a
weakly ordered memory model where such an "instant" does not make
sense without either forced serialization or order-as-data. Is this
what you mean?
> This algorithm satisfies the equivalent of the strong mutex destruction
> guarantee. However, unlike mutexes, because of spurious wake-ups being
> allowed a correct program has to effectively ensure that destruction
> happens after signals/broadcast calls return. Thus, the destruction
> requirement in POSIX is not as restrictive as with semaphores, but we
> still need to take care.
If the mutex is held when changing the state on which the predicate
depends and signaling, it's possible for a woken waiter to determine
whether a subsequent signal is possible simply from observing the
predicate. So I think there are some situations where destruction
after pthread_cond_wait returns makes sense, but I don't think they're
If the signal is made while the mutex is not held, there is absolutely
no way for the woken thread to determine whether the call to signal
has already been made or not, so I agree that it's invalid to destroy
in this case.
> If you want to dive into the code, it's best to start with the comments
> on top of __pthread_cond_wait_common in nptl/pthread_cond_wait.c.
Thanks. I haven't read it yet but I look forward to seeing how you did
> This condvar doesn't yet use a requeue optimization (ie, on a broadcast,
> waking just one thread and requeueing all others on the futex of the
> mutex supplied by the program). I don't think doing the requeue is
> necessarily the right approach (but I haven't done real measurements
> * If a program expects to wake many threads at the same time and make
> that scalable, a condvar isn't great anyway because of how it requires
> waiters to operate mutually exclusive (due to the mutex usage). Thus, a
> thundering herd problem is a scalability problem with or without the
> optimization. Using something like a semaphore might be more
> appropriate in such a case.
> * The scalability problem is actually at the mutex side; the condvar
> could help (and it tries to with the requeue optimization), but it
> should be the mutex who decides how that is done, and whether it is done
> at all.
> * Forcing all but one waiter into the kernel-side wait queue of the
> mutex prevents/avoids the use of lock elision on the mutex. Thus, it
> prevents the only cure against the underlying scalability problem
> inherent to condvars.
> * If condvars use short critical sections (ie, hold the mutex just to
> check a binary flag or such), which they should do ideally, then forcing
> all those waiter to proceed serially with kernel-based hand-off (ie,
> futex ops in the mutex' contended state, via the futex wait queues) will
> be less efficient than just letting a scalable mutex implementation take
> care of it. Our current mutex impl doesn't employ spinning at all, but
> if critical sections are short, spinning can be much better.
> * Doing the requeue stuff requires all waiters to always drive the mutex
> into the contended state. This leads to each waiter having to call
> futex_wake after lock release, even if this wouldn't be necessary.
This is very interesting.
Unless I'm misremembering, my experience has been that requeue is
helpful not for scaling to massive numbers of threads but when the
number of threads is slightly to moderately over the number of
physical cores. This should probably be tested.