This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH][BZ #20973] Robust mutexes: Fix lost wake-up.
- From: Carlos O'Donell <carlos at redhat dot com>
- To: Torvald Riegel <triegel at redhat dot com>, GLIBC Devel <libc-alpha at sourceware dot org>
- Cc: vl at samba dot org, Michael Adam <madam at redhat dot com>, Rich Felker <dalias at libc dot org>
- Date: Mon, 19 Dec 2016 13:20:02 -0500
- Subject: Re: [PATCH][BZ #20973] Robust mutexes: Fix lost wake-up.
- Authentication-results: sourceware.org; auth=none
- References: <1481840825.14990.586.camel@redhat.com> <1481840946.14990.588.camel@redhat.com>
On 12/15/2016 05:29 PM, Torvald Riegel wrote:
> On Thu, 2016-12-15 at 23:27 +0100, Torvald Riegel wrote:
>> See patch for a description.
>>
>> Tested on x86_64-linux with our tests and the test case from the
>> original bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1401665
>>
>> OK?
This looks good to me.
Detailed review below.
> Now with an attached patch...
>
>
> robust-mutex-lwu.patch
>
>
> commit 74be3b28d962a5a6d2eeff51b93d61ddf91e2d49
> Author: Torvald Riegel <triegel@redhat.com>
> Date: Thu Dec 15 16:06:28 2016 +0100
>
> Robust mutexes: Fix lost wake-up.
>
> Assume that Thread 1 waits to acquire a robust mutex using futexes to
> block (and thus sets the FUTEX_WAITERS flag), and is unblocked when this
> mutex is released. If Thread 2 concurrently acquires the lock and is
> killed, Thread 1 can recover from the died owner but fail to restore the
> FUTEX_WAITERS flag. This can lead to a Thread 3 that also blocked using
> futexes at the same time as Thread 1 to not get woken up because
> FUTEX_WAITERS is not set anymore.
Just to reiterate:
T0       T1       T2       T3
|        |        |        |
LOCK     |        |        |
|        WAIT     |        |
|        s        |        WAIT
UNLOCK   s        |        s
WAKE     AWAKE    |        s
|        |        LOCK     s
|        |        DIES     s
|        LOCK              s
|        UNLOCK            s
|        NOWAKE            *
At the point T0 unlocks and wakes T1, we have both T1 and T2 attempting to
acquire the same lock, which has a value of 0 (no waiters, since the shared
waiter flag was cleared by the unlock).
If T2 takes the lock (no FUTEX_WAITERS set) and dies, the futex value is going
to be reset to FUTEX_OWNER_DIED by the kernel (note that the kernel preserves
FUTEX_WAITERS, but it was not set in this case because T0's unlock cleared it).
In the case of T1, the following loop is executing
27 int
28 __lll_robust_lock_wait (int *futex, int private)
29 {
37 do
38 {
39 /* If the owner died, return the present value of the futex. */
40 if (__glibc_unlikely (oldval & FUTEX_OWNER_DIED))
41 return oldval;
42
43 /* Try to put the lock into state 'acquired, possibly with waiters'. */
44 int newval = oldval | FUTEX_WAITERS;
45 if (oldval != newval
46 && atomic_compare_and_exchange_bool_acq (futex, newval, oldval))
47 continue;
48
49 /* If *futex == 2, wait until woken. */
50 lll_futex_wait (futex, newval, private);
We have just been woken.
T2 acquires the lock, dies, and the lock is cleaned up.
Nobody is waiting on the lock.
52 try:
53 ;
54 }
55 while ((oldval = atomic_compare_and_exchange_val_acq (futex,
56 tid | FUTEX_WAITERS,
57 0)) != 0);
This fails because the value is not 0; it is FUTEX_OWNER_DIED.
We loop back to the top and check FUTEX_OWNER_DIED, which matches, and then we
return to the higher-level code, which has now lost FUTEX_WAITERS.
At this point T3 has lost the wakeup it needs to continue with the lock because
T1 has "forgotten" there were waiters.
> The fix for this is to ensure that we continue to preserve the
> FUTEX_WAITERS flag whenever we may have set it or shared it with another
> thread. This is the same requirement as in the algorithm for normal
> mutexes, only that the robust mutexes need additional handling for died
> owners and thus preserving the FUTEX_WAITERS flag cannot be done just in
> the futex slowpath code.
Agreed.
Just highlighting from above:
39 /* If the owner died, return the present value of the futex. */
40 if (__glibc_unlikely (oldval & FUTEX_OWNER_DIED))
41 return oldval;
Here we return the shared oldval with FUTEX_WAITERS lost.
I struggled with the possibility of a simpler fix that just changed the returned
oldval, but you correctly pointed out to me that such a fix breaks the abstraction
and the algorithms that consider oldval to be the value previously read from memory.
Therefore I agree that this needs to be handled at a higher level, which also has
to handle re-inserting FUTEX_WAITERS as required, as is done here.
With your patch the only two lock paths we have are now covered and correctly
re-add FUTEX_WAITERS if we had a slow-path wait that might potentially have lost
the FUTEX_WAITERS flag due to a concurrent lock/death sequence.
I don't immediately see any other broken paths where FUTEX_WAITERS can be dropped
and the wakeup lost.
> [BZ #20973]
> * nptl/pthread_mutex_lock.c (__pthread_mutex_lock_full): Fix lost
> wake-up in robust mutexes.
> * nptl/pthread_mutex_timedlock.c (pthread_mutex_timedlock): Likewise.
>
> diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
> index bdfa529..01ac75e 100644
> --- a/nptl/pthread_mutex_lock.c
> +++ b/nptl/pthread_mutex_lock.c
> @@ -182,6 +182,11 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex)
> &mutex->__data.__list.__next);
>
> oldval = mutex->__data.__lock;
> + /* This is set to FUTEX_WAITERS iff we might have shared the
> + FUTEX_WAITERS flag with other threads, and therefore need to keep it
> + set to avoid lost wake-ups. We have the same requirement in the
> + simple mutex algorithm. */
> + unsigned int assume_other_futex_waiters = 0;
OK.
> do
> {
> again:
> @@ -190,9 +195,9 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex)
> /* The previous owner died. Try locking the mutex. */
> int newval = id;
> #ifdef NO_INCR
> - newval |= FUTEX_WAITERS;
> + newval |= FUTEX_WAITERS | assume_other_futex_waiters;
> #else
> - newval |= (oldval & FUTEX_WAITERS);
> + newval |= (oldval & FUTEX_WAITERS) | assume_other_futex_waiters;
OK.
> #endif
>
> newval
> @@ -253,7 +258,11 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex)
> }
> }
>
> - oldval = LLL_ROBUST_MUTEX_LOCK (mutex, id);
> + oldval = LLL_ROBUST_MUTEX_LOCK (mutex,
> + id | assume_other_futex_waiters);
> + /* See above. We set FUTEX_WAITERS and might have shared this flag
> + with other threads; thus, we need to preserve it. */
> + assume_other_futex_waiters = FUTEX_WAITERS;
OK.
>
> if (__builtin_expect (mutex->__data.__owner
> == PTHREAD_MUTEX_NOTRECOVERABLE, 0))
> diff --git a/nptl/pthread_mutex_timedlock.c b/nptl/pthread_mutex_timedlock.c
> index 07f0901..21e9eea 100644
> --- a/nptl/pthread_mutex_timedlock.c
> +++ b/nptl/pthread_mutex_timedlock.c
> @@ -142,13 +142,19 @@ pthread_mutex_timedlock (pthread_mutex_t *mutex,
> &mutex->__data.__list.__next);
>
> oldval = mutex->__data.__lock;
> + /* This is set to FUTEX_WAITERS iff we might have shared the
> + FUTEX_WAITERS flag with other threads, and therefore need to keep it
> + set to avoid lost wake-ups. We have the same requirement in the
> + simple mutex algorithm. */
> + unsigned int assume_other_futex_waiters = 0;
OK.
> do
> {
> again:
> if ((oldval & FUTEX_OWNER_DIED) != 0)
> {
> /* The previous owner died. Try locking the mutex. */
> - int newval = id | (oldval & FUTEX_WAITERS);
> + int newval = id | (oldval & FUTEX_WAITERS)
> + | assume_other_futex_waiters;
OK.
>
> newval
> = atomic_compare_and_exchange_val_acq (&mutex->__data.__lock,
> @@ -203,8 +209,12 @@ pthread_mutex_timedlock (pthread_mutex_t *mutex,
> }
> }
>
> - result = lll_robust_timedlock (mutex->__data.__lock, abstime, id,
> + result = lll_robust_timedlock (mutex->__data.__lock, abstime,
> + id | assume_other_futex_waiters,
> PTHREAD_ROBUST_MUTEX_PSHARED (mutex));
> + /* See above. We set FUTEX_WAITERS and might have shared this flag
> + with other threads; thus, we need to preserve it. */
> + assume_other_futex_waiters = FUTEX_WAITERS;
OK.
>
> if (__builtin_expect (mutex->__data.__owner
> == PTHREAD_MUTEX_NOTRECOVERABLE, 0))
--
Cheers,
Carlos.