This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] New condvar implementation that provides stronger ordering guarantees.

From: Torvald Riegel <triegel at redhat dot com>
To: OndÅej BÃlka <neleai at seznam dot cz>
Cc: GLIBC Devel <libc-alpha at sourceware dot org>, Marcus Shawcroft <marcus dot shawcroft at gmail dot com>, "Joseph S. Myers" <joseph at codesourcery dot com>, Richard Henderson <rth at redhat dot com>, "Carlos O'Donell" <codonell at redhat dot com>, Mike Frysinger <vapier at gentoo dot org>, David Holsgrove <david dot holsgrove at xilinx dot com>, Chung-Lin Tang <chunglin_tang at mentor dot com>, Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>, Andreas Krebbel <krebbel at linux dot ibm dot com>, Kaz Kojima <kkojima at rr dot iij4u dot or dot jp>, Chris Metcalf <cmetcalf at tilera dot com>, David Miller <davem at davemloft dot net>, Darren Hart <dvhart at infradead dot org>
Date: Thu, 02 Jul 2015 15:25:02 +0200
Subject: Re: [PATCH] New condvar implementation that provides stronger ordering guarantees.
Authentication-results: sourceware.org; auth=none
References: <1424456307 dot 20941 dot 122 dot camel at triegel dot csb> <1431713889 dot 25070 dot 17 dot camel at triegel dot csb> <20150701221513 dot GA29236 at domone>

On Thu, 2015-07-02 at 00:15 +0200, OndÅej BÃlka wrote:
> On Fri, May 15, 2015 at 08:18:09PM +0200, Torvald Riegel wrote:
> > Ping, 3 months later.
> > 
> > I believe Siddhesh has been testing this in rawhide, and at least I
> > haven't heard of any breakage.  Nonetheless, the new implementation is
> > complex, so getting an actual review before this gets checked in would
> > be good.  At the very least, it would be good if someone could confirm
> > that the comments are sufficient and easy to follow; alternatively,
> > please complain if you think more detail is needed.
> > 
> > On Fri, 2015-02-20 at 19:18 +0100, Torvald Riegel wrote:
> > > TLDR: This is a new implementation for condition variables, required
> > > after http://austingroupbugs.net/view.php?id=609 to fix bug 13165.  In
> > > essence, we need to be stricter in which waiters a signal or broadcast
> > > is required to wake up; this couldn't be solved using the old algorithm.
> > > ISO C++ made a similar clarification, so this also fixes a bug in
> > > current libstdc++, for example.
> > > This doesn't contain the existing requeue optimization (see below), and
> > > PI support is not worse (except no requeue).
> > > 
> > > 
> > > Arch maintainers: This adapts each arch's definition of pthread_cond_t.
> > > Only x86, x86_64, and hppa have significant arch-specific changes.  I'd
> > > appreciate review considering we want to stay compatible with old
> > > initializers, and we don't want existing padding to alias with bytes
> > > that are now used for real fields (see text on pthread_cond_t below for
> > > details).
> > > 
> > > 
> > > And here's the detailed version :)
> > > 
> > > We can't use the old algorithm, which tries to avoid spurious wake-ups,
> > > anymore because there's no way (AFAICT) to wake in FIFO order from a
> > > futex (the current Linux implementation may do today, but it's not
> > > guaranteed). 
> 
> And what is performance difference between old algorithm and proposed
> one?

First of all, this is a bug fix, so any non-catastrophic performance
concerns are secondary.

Regarding your question, which performance aspect are you interested in?
Roughly, I'd say that the new algorithm should be faster and more
scalable in the common use cases (e.g., compare the critical sections
vs. just operating on wseq and ssent).

> If there is noticable difference wouldn't be better to write patch
> into kernel to for example AT_MONOTONE_FUTEX auxval set to 1 and when
> kernel developers decide for nonmonotone wakeup they will set that to 0.

That doesn't fix the problem on existing kernels.
 
> > > * The scalability problem is actually at the mutex side; the condvar
> > > could help (and it tries to with the requeue optimization), but it
> > > should be the mutex who decides how that is done, and whether it is done
> > > at all.
> > > * Forcing all but one waiter into the kernel-side wait queue of the
> > > mutex prevents/avoids the use of lock elision on the mutex.  Thus, it
> > > prevents the only cure against the underlying scalability problem
> > > inherent to condvars.
> 
> How exactly that harms lock elision? 

You serialize the woken-up waiters, and lock elision tries to run
non-conflicting critical sections in parallel.  Thus, serializing
up-front prevents lock elision from trying to run them in parallel.

> > > * If condvars use short critical sections (ie, hold the mutex just to
> > > check a binary flag or such), which they should do ideally, then forcing
> > > all those waiter to proceed serially with kernel-based hand-off (ie,
> > > futex ops in the mutex' contended state, via the futex wait queues) will
> > > be less efficient than just letting a scalable mutex implementation take
> > > care of it.  Our current mutex impl doesn't employ spinning at all, but
> > > if critical sections are short, spinning can be much better.
> 
> That looks too complicate code much, how do you want to pass information
> do differentiate between signal/broadcast?

I don't understand your question.  How does it relate to my paragraph
you replied to?

> > > * Doing the requeue stuff requires all waiters to always drive the mutex
> > > into the contended state.  This leads to each waiter having to call
> > > futex_wake after lock release, even if this wouldn't be necessary.
> > >
> 
> That is most important point. My hypotesis is that mutex will almost
> always be unlocked for threads after wake.

Well, that depends on how both the mutex and the condvar are used,
actually.

> Here question is how does
> wake and scheduler interact. Here bad case would be effective wake that
> could simultaneously schedule threads to free cores and all would
> collide. A better deal would be if there would be 1000 cycle delay
> between different cores as when next thread tried to lock previous
> thread would be already done.

I don't understand those sentences.

> So how that behaves in practice?

I don't understand this question.

> > > PI support is "kind of" included.  There is no internal lock anymore, so
> > > the thing that Darren proposed the fix for is gone.  So far so good;
> > > however, we don't requeue, and Darren's paper states that requeue would
> > > yield better latency in the PI scenario (is this still the case?).
> > >
> You have problem that when kernel keeps FIFO api requeue gives you fairness while
> with waking everything a important thread could be buried by lower
> priority threads that after each broadcast do something small and wait
> for broadcast again. 

If you wake several threads, then the those with highest priority will
run.

Follow-Ups:
- Re: [PATCH] New condvar implementation that provides stronger ordering guarantees.
  - From: OndÅej BÃlka

References:
- Re: [PATCH] New condvar implementation that provides stronger ordering guarantees.
  - From: OndÅej BÃlka

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]