This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Fix race between sem_post and semaphore destruction [BZ #12674]


On Wed, 2014-05-21 at 16:37 +0530, Siddhesh Poyarekar wrote:
> Summary of the race:
> 
> T1: enter sem_post and increment value
> T2: enter sem_wait and decrement value
> T2: return from sem_wait and destroy the semaphore
> T1: Check value of semaphore->nwaiters and find an invalid value
> 
> The idea for the fix is adapted from Rich Felker's fix for the same
> race in musl.  The fix is to prevent sem_post from accessing nwaiters
> after it has incremented the value since the state of the semaphore is
> not known beyond this point.  This is fairly easy to achieve using
> Rich's idea.  One may set the value to a special value of -1 to
> indicate waiters.  That way, sem_post can inspect the old value and
> call futex_wake if it is necessary.

I haven't looked at musl, so I'll take it for what it seems to be:
adding a state bit that's set whenever there are threads waiting on a
variable that is also used by a futex.  (That's similar to the value 2
used in the mutex impl.)

> Rich used this condition as a primary check and the waiter count as a
> secondary check, but I think the secondary check is not required in
> glibc.  The only time the secondary check would theoretically be
> required is when the old value came up as zero *and* there were
> waiters to wake up.  This happens only momentarily when an exiting
> waiter decrements nwaiters and resets the semaphore value if it is -1
> and that operation races with a new waiter entering and losing the
> race, thus keeping the value as 0.  This is momentary because the
> futex call on such a value will fail with EWOULDBLOCK since it expects
> the value to be -1 and not 0.  After this failure, the waiter fixes up
> the value to -1 and goes back to wait.

That sounds okay, but I don't think it's sufficient to show that your
changes still work.  To do that, I find it useful to think about (and
document!) what the intent behind each piece of the algorithm is.  IOW,
what does it do on an abstract level?  What's the overall state flow?
Which pieces of the algorithm prevent certain states (so that the thing
becomes manageable, overall)?  What patterns do we follow?  So, discuss
more of the why, not just the how.  If you can't explain it elegantly
and intuitively, there's a good chance something isn't well
understood :)

For example, one good way to fix this would be to start by documenting
the existing algorithm: Write an (informal) proof of why it works, because
this will give you an understanding of how it works, and is a good outline
for the documentation of the algorithm.
Try to break down all possible executions into sub-executions that can
happen, and show which sub-executions can't happen at all.  Keep track
of which sub-executions (in particular, memory writes / CAS) are
"pending" in the sense of being possible at arbitrary times; this will
help cut down the possible execution+state space.

Maybe start without blocking first, assuming all futex waits are
busy-waiting instead.  Then add the blocking.

> This requires two other changes:
> 
> - The type of value is now int instead of unsigned int.  This should
>   not break the ABI since we don't expose the sign of the value.  In
>   fact, the only place the value is seen is with sem_getvalue, where
>   it is int.  And speaking of sem_getvalue...

That's fine I think.

> - sem_getvalue is patched to lie about the actual value if it sees the
>   -1 and return 0.

OT: We should add a note there about having to clarify the ordering
guarantees that this gives.  This is effectively an mo_relaxed load, so
very weak ordering guarantees; OTOH, POSIX seems to want very strong
ordering guarantees for most of its sync operations.  So, I think we
either need to clarify in POSIX or make this at least an acquire load or
such.

> Siddhesh
> 
> 	[BZ #12674]
> 	* nptl/sem_getvalue.c (__new_sem_getvalue): Return 0 for
> 	negative semaphore value.
> 	* nptl/sysdeps/unix/sysv/linux/internaltypes.h (struct
> 	new_sem): Change type of VALUE to int.
> 	* nptl/sysdeps/unix/sysv/linux/sem_post.c (__new_sem_post):
> 	Avoid accessing semaphore after incrementing it.

See below.

> 	* sysdeps/unix/sysv/linux/i386/i486/sem_post.S
> 	(__new_sem_post): Likewise.
> 	* sysdeps/unix/sysv/linux/x86_64/sem_post.S (__new_sem_post):
> 	Likewise.
> 	* nptl/sysdeps/unix/sysv/linux/sem_timedwait.c
> 	(do_futex_timed_wait): Set expected value of futex to -1.
> 	(sem_timedwait): Set expected value of semaphore to -1.
> 	* sysdeps/unix/sysv/linux/i386/i486/sem_timedwait.S
> 	(sem_wait_cleanup): Reset semaphore value when there are no
> 	waiters.
> 	(sem_timedwait): Set expected value of semaphore to -1.
> 	* sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S
> 	(sem_wait_cleanup): Reset semaphore value when there are no
> 	waiters.
> 	(sem_wait_cleanup2): Likewise.
> 	(sem_timedwait): Set expected value of semaphore to -1.
> 	* nptl/sysdeps/unix/sysv/linux/sem_wait.c
> 	(__sem_wait_cleanup): Reset semaphore value when there are no
> 	waiters.
> 	(do_futex_wait): Set expected value of futex to -1.
> 	(__new_sem_wait): Set expected value of semaphore to -1.
> 	* sysdeps/unix/sysv/linux/i386/i486/sem_wait.S
> 	(sem_wait_cleanup): Reset semaphore value when there are no
> 	waiters.
> 	(__new_sem_wait): Set expected value of semaphore to -1.
> 	* sysdeps/unix/sysv/linux/x86_64/sem_wait.S
> 	(sem_wait_cleanup): Reset semaphore value when there are no
> 	waiters.
> 	(__new_sem_wait): Set expected value of semaphore to -1.

I think that any change in the generic C versions is a good time to
review whether we still need the custom assembler versions for
performance.  We'd need a multi-threaded benchtest in this case,
probably (to measure round-trip time and contention), but that might be
easier than fixing up all the assembler versions too :)

> 
> 
> diff --git a/nptl/sem_getvalue.c b/nptl/sem_getvalue.c
> index a4ab41f..d9b70fd 100644
> --- a/nptl/sem_getvalue.c
> +++ b/nptl/sem_getvalue.c
> @@ -22,15 +22,15 @@
>  
> 
>  int
> -__new_sem_getvalue (sem, sval)
> -     sem_t *sem;
> -     int *sval;
> +__new_sem_getvalue (sem_t *sem, int *sval)
>  {
>    struct new_sem *isem = (struct new_sem *) sem;
>  
>    /* XXX Check for valid SEM parameter.  */
>  
>    *sval = isem->value;
> +  if (*sval < 0)
> +    *sval = 0;
>  
>    return 0;
>  }
> diff --git a/nptl/sysdeps/unix/sysv/linux/internaltypes.h b/nptl/sysdeps/unix/sysv/linux/internaltypes.h
> index d127f68..5eea097 100644
> --- a/nptl/sysdeps/unix/sysv/linux/internaltypes.h
> +++ b/nptl/sysdeps/unix/sysv/linux/internaltypes.h
> @@ -141,7 +141,7 @@ struct pthread_key_struct
>  /* Semaphore variable structure.  */
>  struct new_sem
>  {
> -  unsigned int value;
> +  int value;
>    int private;
>    unsigned long int nwaiters;
>  };
> diff --git a/nptl/sysdeps/unix/sysv/linux/sem_post.c b/nptl/sysdeps/unix/sysv/linux/sem_post.c
> index 4906adf..0ff1699 100644
> --- a/nptl/sysdeps/unix/sysv/linux/sem_post.c
> +++ b/nptl/sysdeps/unix/sysv/linux/sem_post.c
> @@ -30,24 +30,35 @@ int
>  __new_sem_post (sem_t *sem)
>  {
>    struct new_sem *isem = (struct new_sem *) sem;
> +  int incr, is_private = isem->private;
>  
>    __typeof (isem->value) cur;
>    do
>      {
>        cur = isem->value;
> +      incr = 1 + (cur < 0);
>        if (isem->value == SEM_VALUE_MAX)
>  	{
>  	  __set_errno (EOVERFLOW);
>  	  return -1;
>  	}
>      }
> -  while (atomic_compare_and_exchange_bool_rel (&isem->value, cur + 1, cur));
> +  while (atomic_compare_and_exchange_bool_rel (&isem->value, cur + incr, cur));
>  
>    atomic_full_barrier ();

Why do you still need the full barrier?  AFAICS, it was necessary before
because of the Dekker-like synchronization between nwaiters and value --
but you've changed that (or not?).

> -  if (isem->nwaiters > 0)
> +  /* This is always a sufficient condition to detect waiters.  This is because
> +     once there is either a waiter or a poster, the value is always non-zero at
> +     this point, either because sem_post set it to a positive value or sem_wait
> +     set it to a negative value.  There is a transient condition when it could
> +     be 0 with a waiter.  This happens when a waiter is cancelled and another
> +     waiter arrives, where a race condition causes the value to be 0 before the
> +     futex_wait is called.  That is fixed immediately since the futex_wait will
> +     return immediately with EWOULDBLOCK, fix the value and go back to
> +     sleep in futex_wait.  */

Why would this condition be the *only* case where this can happen?  I
think this should be documented.  And it will get easier if you document
the state flow for the whole algorithm :)

The only thing you test below is that *this thread* saw the flag for the
waiter-present state.  However, this is the only place where a wake-up
for one thread happens -- so it should actually see and act on *all*
necessary wake-ups, which I believe isn't the case with your changes.

> +  if (cur < 0)
>      {
>        int err = lll_futex_wake (&isem->value, 1,
> -				isem->private ^ FUTEX_PRIVATE_FLAG);
> +				is_private ^ FUTEX_PRIVATE_FLAG);
>        if (__builtin_expect (err, 0) < 0)
>  	{
>  	  __set_errno (-err);
> diff --git a/nptl/sysdeps/unix/sysv/linux/sem_timedwait.c b/nptl/sysdeps/unix/sysv/linux/sem_timedwait.c
> index 7dfe51d..1e74c40 100644
> --- a/nptl/sysdeps/unix/sysv/linux/sem_timedwait.c
> +++ b/nptl/sysdeps/unix/sysv/linux/sem_timedwait.c
> @@ -38,7 +38,7 @@ do_futex_timed_wait (struct new_sem *isem, struct timespec *rt)
>  {
>    int err, oldtype = __pthread_enable_asynccancel ();
>  
> -  err = lll_futex_timed_wait (&isem->value, 0, rt,
> +  err = lll_futex_timed_wait (&isem->value, -1, rt,
>  			      isem->private ^ FUTEX_PRIVATE_FLAG);
>  
>    __pthread_disable_asynccancel (oldtype);
> @@ -60,6 +60,8 @@ sem_timedwait (sem_t *sem, const struct timespec *abstime)
>        return -1;
>      }
>  
> +  /* If the value is zero, set it to -1 to indicate waiters.  */
> +  atomic_compare_and_exchange_val_acq (&isem->value, -1, 0);

Please document why the 0 -> -1 transition is always ok.  (This CAS can
occur anywhere in an execution, assuming that an incoming waiter is
allowed.)  The overview documentation of the algorithm would be a good
place to explain that, given that this can happen in a few places.

>    atomic_increment (&isem->nwaiters);

What about the memory orders here?  Why is the _acq above
sufficient/required/insufficient?

>  
>    pthread_cleanup_push (__sem_wait_cleanup, isem);
> @@ -106,11 +108,17 @@ sem_timedwait (sem_t *sem, const struct timespec *abstime)
>  	  err = 0;
>  	  break;
>  	}
> +
> +      /* Still not time to wake up.  Go back to sleep.  */
> +      atomic_compare_and_exchange_val_acq (&isem->value, -1, 0);
>      }
>  
>    pthread_cleanup_pop (0);
>  
> -  atomic_decrement (&isem->nwaiters);
> +  /* Reset semaphore value to zero if we are the last waiter.  The reset will

This should say "if we _were_ the last waiter" because ...

> +     actually happen only when we exit due to an error.  */
> +  if (atomic_decrement_and_test (&isem->nwaiters))
> +    atomic_compare_and_exchange_val_acq (&isem->value, 0, -1);

... like above, the CAS can happen at any time; the waiter has already
exited (fetch_dec on nwaiters), so it can be suspended, and some time
later do the -1 -> 0 transition.  Why is that okay?

>  
>    return err;
>  }
> diff --git a/nptl/sysdeps/unix/sysv/linux/sem_wait.c b/nptl/sysdeps/unix/sysv/linux/sem_wait.c
> index b12babb..4d3f91b 100644
> --- a/nptl/sysdeps/unix/sysv/linux/sem_wait.c
> +++ b/nptl/sysdeps/unix/sysv/linux/sem_wait.c
> @@ -34,7 +34,12 @@ __sem_wait_cleanup (void *arg)
>  {
>    struct new_sem *isem = (struct new_sem *) arg;
>  
> -  atomic_decrement (&isem->nwaiters);
> +  /* Decrement nwaiters and reset value if there are no other waiters.  This
> +     could race with the futex_wait call in another waiter and cause it to wake
> +     up when it shouldn't, but that is OK since it will go right back to sleep
> +     when it sees that the semaphore value is not what it wants.  */
> +  if (atomic_decrement_and_test (&isem->nwaiters))
> +    atomic_compare_and_exchange_val_acq (&isem->value, 0, -1);

See above.

>  }
>  
>  /* This is in a separate function in order to make sure gcc
> @@ -46,7 +51,7 @@ do_futex_wait (struct new_sem *isem)
>  {
>    int err, oldtype = __pthread_enable_asynccancel ();
>  
> -  err = lll_futex_wait (&isem->value, 0, isem->private ^ FUTEX_PRIVATE_FLAG);
> +  err = lll_futex_wait (&isem->value, -1, isem->private ^ FUTEX_PRIVATE_FLAG);
>  
>    __pthread_disable_asynccancel (oldtype);
>    return err;
> @@ -61,6 +66,12 @@ __new_sem_wait (sem_t *sem)
>    if (atomic_decrement_if_positive (&isem->value) > 0)
>      return 0;
>  
> +  /* If we are the only known waiter right now, indicate that by setting the

How do you know that's the case?

> +     value to -1.  This is useful to avoid access to nwaiters in sem_post when
> +     the sole waiter exits and destroys the semaphore before sem_post has a
> +     chance to test the value of nwaiters.  */
> +  atomic_compare_and_exchange_val_acq (&isem->value, -1, 0);
> +

See above.

>    atomic_increment (&isem->nwaiters);
>  
>    pthread_cleanup_push (__sem_wait_cleanup, isem);
> @@ -80,11 +91,17 @@ __new_sem_wait (sem_t *sem)
>  	  err = 0;
>  	  break;
>  	}
> +
> +      /* Still not time to wake up.  Go back to sleep.  */
> +      atomic_compare_and_exchange_val_acq (&isem->value, -1, 0);
>      }
>  
>    pthread_cleanup_pop (0);
>  
> -  atomic_decrement (&isem->nwaiters);
> +  /* Reset semaphore value to zero if we are the last waiter.  The reset will
> +     actually happen only when we exit due to an error.  */
> +  if (atomic_decrement_and_test (&isem->nwaiters))
> +    atomic_compare_and_exchange_val_acq (&isem->value, -1, 0);

Doesn't this reset to -1?

HTH.

