This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Is this pthread_cond_wait() race possible
On Thu, 2017-06-08 at 17:45 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-08 14:28:09 [+0200], Torvald Riegel wrote:
> > > The counterpart (pthread_cond_signal()) uses __condvar_acquire_lock()
> > > which is a handcrafted lock with atomics and two bits. Do you have an
> > > integer field left?
> >
> > No, otherwise I would have avoided the complexity of embedding the lock
> > bits :) One could rearrange the bits in some other way to get a full
> > 32b for the PI lock -- but I don't think that's the fundamental problem.
> >
> > > This seems to be the only part that would break PI. That is, a
> > > low-priority thread that owns the lock (and is scheduled out) will
> > > make every subsequent thread end up in futex_wait(). An integer with
> > > futex_lock_pi() should fix that.
> >
> > Have you looked at the presentation I gave at Real-Time Summit last
> > year? I think the problem we're facing re PI support is more
> > fundamental than getting a PI lock in there.
>
> You refer to
> https://wiki.linuxfoundation.org/_media/realtime/events/rt-summit2016/pthread-condvars-posix-compliance-and-the-pi-gap_darren-hart_torvald-riegel.pdf
>
> the last two slides?
Yes. Do these get across what the underlying algorithmic problem is?
> FUTEX_WAIT_REQUEUE_PI / FUTEX_CMP_REQUEUE_PI handled the user-mutex.
> This is gone now. I don't know how much trouble it will cost us. Let's
> see…
> In the old implementation
The old implementation did not handle all cases correctly.
> if you had a high-prio task in
> pthread_cond_wait() then it already had the mutex on its return path and
> it would boost the waker if needed. We don't have this "fast" kernel
> boost, only after the task has been woken up and is going for the
> userland mutex (which should be owned by the pthread_cond_signal()
> caller). So this context switch is not optimal. *But* the signal path
> does not hold any locks which made you lose PI while trying to obtain
> the internal condvar mutex via futex_wait().
I can't follow you. What are you trying to say?
> In the broadcast case all threads will compete for the one lock, so
> we get a thundering herd. But ideally the thread with the highest
> priority should get the lock first.
> So based on that, the only thing I see is that you end up in
> futex_wait() via the one lock in pthread_cond_signal() (or the
> pthread_cond_timedwait() ETIMEDOUT path).
I can't follow. What is the "thing" you see, a problem, or specific
execution, or something else?
I think it would make sense to focus on correctness first.
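
For readers unfamiliar with the "handcrafted lock with atomics and two bits"
mentioned at the top of the thread, the following is a minimal sketch of the
general technique, not glibc's actual code. It assumes the low two bits of a
32-bit word hold the lock state (0 = free, 1 = held, 2 = held with possible
waiters) and the remaining bits hold unrelated data. All names
(packed_lock_t, packed_lock_acquire, and so on) are hypothetical, and
wait_for_change() is a portable stand-in for a futex_wait() call.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define LOCK_MASK 3u /* low two bits: lock state; upper 30 bits: payload */
enum { UNLOCKED = 0, LOCKED = 1, LOCKED_WAITERS = 2 };

/* Hypothetical type: one word shared by a small lock and other data. */
typedef struct { _Atomic uint32_t word; } packed_lock_t;

/* Stand-in for futex_wait(&l->word, seen): returns once the word changes.
   Note that nothing here tells the kernel WHO holds the lock. */
static void wait_for_change(packed_lock_t *l, uint32_t seen)
{
    while (atomic_load_explicit(&l->word, memory_order_relaxed) == seen)
        sched_yield();
}

static void packed_lock_acquire(packed_lock_t *l)
{
    uint32_t w = atomic_load_explicit(&l->word, memory_order_relaxed);
    uint32_t take = LOCKED; /* state to install once the lock is free */

    for (;;) {
        if ((w & LOCK_MASK) == UNLOCKED) {
            if (atomic_compare_exchange_weak_explicit(
                    &l->word, &w, w | take,
                    memory_order_acquire, memory_order_relaxed))
                return;
            continue; /* the failed CAS reloaded w; retry */
        }
        /* Lock is held: record that there may be waiters, then block. */
        uint32_t contended = (w & ~LOCK_MASK) | LOCKED_WAITERS;
        if (w != contended &&
            !atomic_compare_exchange_weak_explicit(
                &l->word, &w, contended,
                memory_order_relaxed, memory_order_relaxed))
            continue;
        wait_for_change(l, contended);
        take = LOCKED_WAITERS; /* other threads may still be waiting */
        w = atomic_load_explicit(&l->word, memory_order_relaxed);
    }
}

/* Clears the lock bits and leaves the payload intact. Returns nonzero if
   waiters may exist (a futex-based version would issue a wake here). */
static int packed_lock_release(packed_lock_t *l)
{
    uint32_t old = atomic_fetch_and_explicit(&l->word, ~LOCK_MASK,
                                             memory_order_release);
    assert((old & LOCK_MASK) != UNLOCKED);
    return (old & LOCK_MASK) == LOCKED_WAITERS;
}

static uint32_t packed_payload(packed_lock_t *l)
{
    return atomic_load_explicit(&l->word, memory_order_relaxed) >> 2;
}
```

The relevant point for the PI discussion is that the two lock bits never
record the owner. A PI futex, by contrast, keeps the owner's TID in the futex
word itself, which is why it needs the word to itself and why a waiter
blocked as above cannot boost a low-priority holder.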