This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Is this pthread_cond_wait() race possible
On 2017-06-08 18:50:58 [+0200], Torvald Riegel wrote:
> On Thu, 2017-06-08 at 17:45 +0200, Sebastian Andrzej Siewior wrote:
> > On 2017-06-08 14:28:09 [+0200], Torvald Riegel wrote:
> > > > The counterpart (pthread_cond_signal()) uses __condvar_acquire_lock()
> > > > which is a handcrafted lock with atomics and two bits. Do you have an
> > > > integer field left?
> > >
> > > No, otherwise I would have avoided the complexity of embedding the lock
> > > bits :) One could rearrange the bits in some other way to get a full
> > > 32b for the PI lock -- but I don't think that's the fundamental problem.
> > >
> > > > This seems to be only part that would break PI. That
> > > > is a low priority thread that owns the lock (and is scheduled out) will
> > > > make every subsequent thread end up in futex_wait(). An integer with
> > > > futex_lock_pi() should fix that.
> > >
> > > Have you looked at the presentation I gave at Real-Time Summit last
> > > year? I think the problem we're facing re PI support is more
> > > fundamental than getting a PI lock in there.
> >
> > You refer to
> > https://wiki.linuxfoundation.org/_media/realtime/events/rt-summit2016/pthread-condvars-posix-compliance-and-the-pi-gap_darren-hart_torvald-riegel.pdf
> >
> > the last two slides?
>
> Yes. Do these bring across what the underlying algorithmic problem is?
Unfortunately not.
> > FUTEX_WAIT_REQUEUE_PI / FUTEX_CMP_REQUEUE_PI handled the user-mutex.
> > This is gone now. I don't know how much trouble it will cost us. Let's
> > see…
> > In the old implementation
>
> The old implementation did not handle all cases correctly.
I do not question this. I wanted to point out that in the old code we
had the PI part.
> > if you had a high-prio task in
> > pthread_cond_wait() then it already had the mutex on its return path and
> > it would boost the waker if needed. We don't have this "fast" kernel
> > boost, only after the task has been woken up and is going for the
> > userland mutex (which should be owned by the pthread_cond_signal()
> > caller). So this context switch is not optimal. *But* the signal path
> > does not hold any locks which made you lose PI while trying to obtain
> > the internal condvar mutex via futex_wait().
>
> I can't follow you. What are you trying to say?
Say you have a threads with low, medium and high priority.
Thread L gets the lock (the self made with two bits).
Thread M pushes L away and does something else.
Thread H wants the lock L is holding. Instead of boosting L's priority
(as we would have with a PI mutex) it just goes into FUTEX_WAIT und
waits and M gets on the CPU. This means the H waits until L is runnable
again.
> > In the broadcast case all threads will compete for the one lock so
> > thundering herd here we are. But ideally the thread with highest
> > priority should get the lock first.
> > So based on that, the only thing I see is, that you end up in
> > futex_wait() via the one lock in pthread_cond_signal() (or
> > pthread_cond_timed_wait() ETIMEDOUT path).
>
> I can't follow. What is the "thing" you see, a problem, or specific
> execution, or something else?
based on review currently the two bit lock.
> I think it would make sense to focus on correctness first.
What is there still missing correctness wise?
Sebastian