This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Is this pthread_cond_wait() race possible


On 2017-06-08 18:50:58 [+0200], Torvald Riegel wrote:
> On Thu, 2017-06-08 at 17:45 +0200, Sebastian Andrzej Siewior wrote:
> > On 2017-06-08 14:28:09 [+0200], Torvald Riegel wrote:
> > > > The counterpart (pthread_cond_signal()) uses __condvar_acquire_lock(),
> > > > which is a lock handcrafted from atomics and two bits. Do you have an
> > > > integer field left?
> > > 
> > > No, otherwise I would have avoided the complexity of embedding the lock
> > > bits :)  One could rearrange the bits in some other way to get a full
> > > 32b for the PI lock -- but I don't think that's the fundamental problem.
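For reference, here is a rough sketch of such an embedded lock: the two
low bits of a 32-bit field hold the lock state while the upper bits stay
available for other data. This is my own simplified version, not glibc's
actual __condvar_acquire_lock() code, and the bit assignments are my
assumption:

/* Sketch only, not glibc's code.  Bit 0: locked.  Bit 1: there may be
   futex waiters.  The upper 30 bits belong to whatever else is stored
   in the field.  */
#include <stdatomic.h>
#include <limits.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static void
embedded_lock_acquire (atomic_uint *word)
{
  unsigned int v = atomic_load_explicit (word, memory_order_relaxed);
  for (;;)
    {
      while ((v & 3) == 0)
        {
          /* Unlocked: try to grab it, preserving the upper bits.  */
          if (atomic_compare_exchange_weak_explicit
                (word, &v, v | 1, memory_order_acquire,
                 memory_order_relaxed))
            return;
        }
      /* Locked: announce that we block, then wait on the word.  */
      if ((v & 3) != 3
          && !atomic_compare_exchange_weak_explicit
                (word, &v, v | 3, memory_order_relaxed,
                 memory_order_relaxed))
        continue;
      syscall (SYS_futex, word, FUTEX_WAIT_PRIVATE, v | 3, NULL, NULL, 0);
      v = atomic_load_explicit (word, memory_order_relaxed);
    }
}

static void
embedded_lock_release (atomic_uint *word)
{
  /* Clear both lock bits; wake everyone if someone announced waiting.  */
  unsigned int v = atomic_fetch_and_explicit (word, ~3u,
                                              memory_order_release);
  if ((v & 2) != 0)
    syscall (SYS_futex, word, FUTEX_WAKE_PRIVATE, INT_MAX, NULL, NULL, 0);
}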
> > > 
> > > > This seems to be the only part that would break PI: a low-priority
> > > > thread that owns the lock (and is scheduled out) will make every
> > > > subsequent thread end up in futex_wait(). An integer with
> > > > futex_lock_pi() should fix that.
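For reference, the userspace side of such a futex_lock_pi() integer
looks roughly like this; it is the standard PI-futex protocol from
futex(2), with error handling omitted:

#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static void
pi_lock (atomic_uint *futex_word)
{
  unsigned int expected = 0;
  unsigned int tid = syscall (SYS_gettid);
  /* Uncontended fast path: 0 -> TID, all in userspace.  */
  if (atomic_compare_exchange_strong_explicit
        (futex_word, &expected, tid, memory_order_acquire,
         memory_order_relaxed))
    return;
  /* Contended: the kernel queues us on a PI chain and boosts the
     current owner's priority until it releases the lock.  */
  syscall (SYS_futex, futex_word, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

static void
pi_unlock (atomic_uint *futex_word)
{
  unsigned int tid = syscall (SYS_gettid);
  /* Uncontended fast path: TID -> 0.  If the kernel set FUTEX_WAITERS
     in the word, let it hand the lock to the top waiter instead.  */
  if (!atomic_compare_exchange_strong_explicit
        (futex_word, &tid, 0, memory_order_release,
         memory_order_relaxed))
    syscall (SYS_futex, futex_word, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
}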
> > > 
> > > Have you looked at the presentation I gave at Real-Time Summit last
> > > year?  I think the problem we're facing re PI support is more
> > > fundamental than getting a PI lock in there.
> > 
> > You are referring to
> >   https://wiki.linuxfoundation.org/_media/realtime/events/rt-summit2016/pthread-condvars-posix-compliance-and-the-pi-gap_darren-hart_torvald-riegel.pdf
> > 
> > specifically the last two slides?
> 
> Yes.  Do these bring across what the underlying algorithmic problem is?

Unfortunately not.

> > FUTEX_WAIT_REQUEUE_PI / FUTEX_CMP_REQUEUE_PI handled the userspace
> > mutex. This is gone now. I don't know how much trouble that will cause
> > us. Let's see…
> > In the old implementation
> 
> The old implementation did not handle all cases correctly.

I do not question that. I wanted to point out that the old code had the
PI part.

> > if you had a high-prio task in
> > pthread_cond_wait(), it was already blocked on the mutex on its return
> > path and would boost the waker if needed. We no longer have this "fast"
> > kernel boost; boosting happens only after the task has been woken up
> > and goes for the userland mutex (which should be owned by the
> > pthread_cond_signal() caller). So this context switch is not optimal.
> > *But* the old signal path did not hold any locks that could make you
> > lose PI while trying to obtain the internal condvar lock via
> > futex_wait().
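To recap how that worked: FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI
pair up so that a waiter blocks on the condvar futex and is moved by the
signaler onto the PI mutex futex without a round trip through userland.
Roughly, with error handling omitted and the wrapper names made up:

#include <limits.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Waiter side: atomically wait on the condvar futex and, once requeued
   and woken, return as a contender on the PI mutex.  While blocked on
   the PI mutex, the kernel boosts the mutex owner, i.e. the
   pthread_cond_signal() caller.  */
static long
wait_requeue_pi (unsigned int *cond_futex, unsigned int seen_val,
                 unsigned int *mutex_futex)
{
  return syscall (SYS_futex, cond_futex, FUTEX_WAIT_REQUEUE_PI,
                  seen_val, NULL, mutex_futex, 0);
}

/* Signaler side: wake exactly one waiter and requeue the remaining
   ones onto the PI mutex, so they immediately block with PI
   semantics.  */
static long
cmp_requeue_pi (unsigned int *cond_futex, unsigned int expected_val,
                unsigned int *mutex_futex)
{
  return syscall (SYS_futex, cond_futex, FUTEX_CMP_REQUEUE_PI,
                  1 /* wake */, (void *) (long) INT_MAX /* requeue */,
                  mutex_futex, expected_val);
}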
> 
> I can't follow you.  What are you trying to say?

Say you have three threads with low, medium, and high priority.
Thread L takes the lock (the handcrafted one with the two bits).
Thread M pushes L off the CPU and does something else.
Thread H wants the lock L is holding. Instead of boosting L's priority
(as we would with a PI mutex), it just goes into FUTEX_WAIT and waits,
and M stays on the CPU. This means H waits until L is runnable again.
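
This is easy to reproduce outside of glibc. A minimal sketch, under my
own assumptions (everything pinned to one CPU, SCHED_FIFO, which needs
root or CAP_SYS_NICE, and the non-PI embedded_lock_*() sketch from
above standing in for the two-bit lock):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_uint lock_word;   /* stand-in for the embedded lock */

static void
pin_to_cpu0_with_prio (int prio)
{
  cpu_set_t set;
  CPU_ZERO (&set);
  CPU_SET (0, &set);            /* force all three threads onto one CPU */
  pthread_setaffinity_np (pthread_self (), sizeof (set), &set);
  struct sched_param sp = { .sched_priority = prio };
  pthread_setschedparam (pthread_self (), SCHED_FIFO, &sp);
}

static void *
thread_low (void *unused)
{
  pin_to_cpu0_with_prio (1);
  embedded_lock_acquire (&lock_word);   /* L owns the lock...  */
  sleep (1);                            /* ...and loses the CPU here */
  embedded_lock_release (&lock_word);
  return NULL;
}

static void *
thread_high (void *unused)
{
  pin_to_cpu0_with_prio (3);
  /* H ends up in FUTEX_WAIT; the kernel does not know that L is the
     thread it should boost, so H sits here until M lets L run.  */
  embedded_lock_acquire (&lock_word);
  puts ("H finally got the lock");
  embedded_lock_release (&lock_word);
  return NULL;
}

static void *
thread_medium (void *unused)
{
  pin_to_cpu0_with_prio (2);
  /* Hog the CPU: while M runs, L (prio 1) cannot run to release the
     lock, so H (prio 3) stays blocked.  */
  for (volatile unsigned long i = 0; i < 2000000000UL; i++)
    ;
  return NULL;
}

int
main (void)
{
  pthread_t l, m, h;
  pthread_create (&l, NULL, thread_low, NULL);
  usleep (100000);              /* let L take the lock */
  pthread_create (&h, NULL, thread_high, NULL);
  usleep (100000);              /* let H block on the lock */
  pthread_create (&m, NULL, thread_medium, NULL);
  pthread_join (h, NULL);       /* delayed by M's whole busy loop */
  pthread_join (m, NULL);
  pthread_join (l, NULL);
  return 0;
}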

> > In the broadcast case all threads will compete for the one lock, so
> > here we have a thundering herd. But ideally the thread with the
> > highest priority should get the lock first.
> > So based on that, the only thing I see is that you end up in
> > futex_wait() via the one lock in pthread_cond_signal() (or the
> > pthread_cond_timedwait() ETIMEDOUT path).
> 
> I can't follow.  What is the "thing" you see, a problem, or specific
> execution, or something else?

Based on my review so far: the two-bit lock.

> I think it would make sense to focus on correctness first.

What is still missing, correctness-wise?

Sebastian

