This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH][BZ #20973] Robust mutexes: Fix lost wake-up.
- From: Carlos O'Donell <carlos at redhat dot com>
- To: Torvald Riegel <triegel at redhat dot com>, GLIBC Devel <libc-alpha at sourceware dot org>
- Cc: vl at samba dot org, Michael Adam <madam at redhat dot com>, Rich Felker <dalias at libc dot org>
- Date: Mon, 19 Dec 2016 13:20:02 -0500
- Subject: Re: [PATCH][BZ #20973] Robust mutexes: Fix lost wake-up.
- Authentication-results: sourceware.org; auth=none
- References: <1481840825.14990.586.camel@redhat.com> <1481840946.14990.588.camel@redhat.com>
On 12/15/2016 05:29 PM, Torvald Riegel wrote:
> On Thu, 2016-12-15 at 23:27 +0100, Torvald Riegel wrote:
>> See patch for a description.
>>
>> Tested on x86_64-linux with our tests and the test case from the
>> original bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1401665
>>
>> OK?
This looks good to me.
Detailed review below.
> Now with an attached patch...
>
>
> robust-mutex-lwu.patch
>
>
> commit 74be3b28d962a5a6d2eeff51b93d61ddf91e2d49
> Author: Torvald Riegel <triegel@redhat.com>
> Date: Thu Dec 15 16:06:28 2016 +0100
>
> Robust mutexes: Fix lost wake-up.
>
> Assume that Thread 1 waits to acquire a robust mutex using futexes to
> block (and thus sets the FUTEX_WAITERS flag), and is unblocked when this
> mutex is released. If Thread 2 concurrently acquires the lock and is
> killed, Thread 1 can recover from the died owner but fail to restore the
> FUTEX_WAITERS flag. This can lead to a Thread 3 that also blocked using
> futexes at the same time as Thread 1 to not get woken up because
> FUTEX_WAITERS is not set anymore.
Just to reiterate:
T0       T1       T2       T3
|        |        |        |
LOCK     |        |        |
|        WAIT     |        |
|        s        |        WAIT
UNLOCK   s        |        s
WAKE     AWAKE    |        s
|        |        LOCK     s
|        |        DIES     s
|        LOCK              s
|        UNLOCK            s
|        NOWAKE            *
At the point T0 unlocks and wakes T1, we have both T1 and T2 attempting to
acquire the same lock, which has a value of 0 (no waiters, since the shared
waiter flag was cleared by the unlock).
If T2 takes the lock (no FUTEX_WAITERS set) and dies, the futex value is going
to be reset to FUTEX_OWNER_DIED by the kernel (note that the kernel preserves
FUTEX_WAITERS, but it was not set in this case because T0's unlock cleared it).
In the case of T1, the following loop is executing
27 int
28 __lll_robust_lock_wait (int *futex, int private)
29 {
37 do
38 {
39 /* If the owner died, return the present value of the futex. */
40 if (__glibc_unlikely (oldval & FUTEX_OWNER_DIED))
41 return oldval;
42
43 /* Try to put the lock into state 'acquired, possibly with waiters'. */
44 int newval = oldval | FUTEX_WAITERS;
45 if (oldval != newval
46 && atomic_compare_and_exchange_bool_acq (futex, newval, oldval))
47 continue;
48
49 /* If *futex == 2, wait until woken. */
50 lll_futex_wait (futex, newval, private);
We have just been woken.
T2 acquires the lock, dies, and the lock is cleaned up.
Nobody is waiting on the lock.
52 try:
53 ;
54 }
55 while ((oldval = atomic_compare_and_exchange_val_acq (futex,
56 tid | FUTEX_WAITERS,
57 0)) != 0);
This fails because the value is not 0; it is FUTEX_OWNER_DIED.
We loop back to the top and check FUTEX_OWNER_DIED, which matches, and then we
return to the higher-level code, which has now lost FUTEX_WAITERS.
At this point T3 has lost the wakeup it needs to continue with the lock because
T1 has "forgotten" there were waiters.
> The fix for this is to ensure that we continue to preserve the
> FUTEX_WAITERS flag whenever we may have set it or shared it with another
> thread. This is the same requirement as in the algorithm for normal
> mutexes, only that the robust mutexes need additional handling for died
> owners and thus preserving the FUTEX_WAITERS flag cannot be done just in
> the futex slowpath code.
Agreed.
Just highlighting from above:
39 /* If the owner died, return the present value of the futex. */
40 if (__glibc_unlikely (oldval & FUTEX_OWNER_DIED))
41 return oldval;
Here we return the shared oldval with FUTEX_WAITERS lost.
I struggled with the possibility of a simpler fix that just changed the returned
oldval, but you correctly pointed out to me that such a fix breaks the abstraction
and the algorithms that consider oldval to be the value previously read from memory.
Therefore I agree that this needs to be handled at a higher level, which also has
to handle re-inserting FUTEX_WAITERS as required, as is done here.
With your patch the only two lock paths we have are now covered and correctly
re-add FUTEX_WAITERS if we had a slow-path wait that might potentially have lost
the FUTEX_WAITERS flag due to a concurrent lock/death sequence.
I don't immediately see any other broken paths where FUTEX_WAITERS can be dropped
and the wakeup lost.
> [BZ #20973]
> * nptl/pthread_mutex_lock.c (__pthread_mutex_lock_full): Fix lost
> wake-up in robust mutexes.
> * nptl/pthread_mutex_timedlock.c (pthread_mutex_timedlock): Likewise.
>
> diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
> index bdfa529..01ac75e 100644
> --- a/nptl/pthread_mutex_lock.c
> +++ b/nptl/pthread_mutex_lock.c
> @@ -182,6 +182,11 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex)
> &mutex->__data.__list.__next);
>
> oldval = mutex->__data.__lock;
> + /* This is set to FUTEX_WAITERS iff we might have shared the
> + FUTEX_WAITERS flag with other threads, and therefore need to keep it
> + set to avoid lost wake-ups. We have the same requirement in the
> + simple mutex algorithm. */
> + unsigned int assume_other_futex_waiters = 0;
OK.
> do
> {
> again:
> @@ -190,9 +195,9 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex)
> /* The previous owner died. Try locking the mutex. */
> int newval = id;
> #ifdef NO_INCR
> - newval |= FUTEX_WAITERS;
> + newval |= FUTEX_WAITERS | assume_other_futex_waiters;
> #else
> - newval |= (oldval & FUTEX_WAITERS);
> + newval |= (oldval & FUTEX_WAITERS) | assume_other_futex_waiters;
OK.
> #endif
>
> newval
> @@ -253,7 +258,11 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex)
> }
> }
>
> - oldval = LLL_ROBUST_MUTEX_LOCK (mutex, id);
> + oldval = LLL_ROBUST_MUTEX_LOCK (mutex,
> + id | assume_other_futex_waiters);
> + /* See above. We set FUTEX_WAITERS and might have shared this flag
> + with other threads; thus, we need to preserve it. */
> + assume_other_futex_waiters = FUTEX_WAITERS;
OK.
>
> if (__builtin_expect (mutex->__data.__owner
> == PTHREAD_MUTEX_NOTRECOVERABLE, 0))
> diff --git a/nptl/pthread_mutex_timedlock.c b/nptl/pthread_mutex_timedlock.c
> index 07f0901..21e9eea 100644
> --- a/nptl/pthread_mutex_timedlock.c
> +++ b/nptl/pthread_mutex_timedlock.c
> @@ -142,13 +142,19 @@ pthread_mutex_timedlock (pthread_mutex_t *mutex,
> &mutex->__data.__list.__next);
>
> oldval = mutex->__data.__lock;
> + /* This is set to FUTEX_WAITERS iff we might have shared the
> + FUTEX_WAITERS flag with other threads, and therefore need to keep it
> + set to avoid lost wake-ups. We have the same requirement in the
> + simple mutex algorithm. */
> + unsigned int assume_other_futex_waiters = 0;
OK.
> do
> {
> again:
> if ((oldval & FUTEX_OWNER_DIED) != 0)
> {
> /* The previous owner died. Try locking the mutex. */
> - int newval = id | (oldval & FUTEX_WAITERS);
> + int newval = id | (oldval & FUTEX_WAITERS)
> + | assume_other_futex_waiters;
OK.
>
> newval
> = atomic_compare_and_exchange_val_acq (&mutex->__data.__lock,
> @@ -203,8 +209,12 @@ pthread_mutex_timedlock (pthread_mutex_t *mutex,
> }
> }
>
> - result = lll_robust_timedlock (mutex->__data.__lock, abstime, id,
> + result = lll_robust_timedlock (mutex->__data.__lock, abstime,
> + id | assume_other_futex_waiters,
> PTHREAD_ROBUST_MUTEX_PSHARED (mutex));
> + /* See above. We set FUTEX_WAITERS and might have shared this flag
> + with other threads; thus, we need to preserve it. */
> + assume_other_futex_waiters = FUTEX_WAITERS;
OK.
>
> if (__builtin_expect (mutex->__data.__owner
> == PTHREAD_MUTEX_NOTRECOVERABLE, 0))
--
Cheers,
Carlos.