Re: [PATCH v2] PowerPC: Fix a race condition when eliding a lock


On 08/20/2015 11:20 AM, Tulio Magno Quites Machado Filho wrote:
> Changes since v1:
>  - Improved commit message.
>  - Added new comments in the source code alerting to the concurrency details
>    intrinsic to this code.
>  - Removed compiler barriers from this patch, which will be treated in another
>    patch and will be synchronized with the GCC implementation.

The architecture and idea of this change are correct, but there are still
a few details we need to get right. It's almost there; probably v3 and
we'll be done. See my comments below.

> 8<----------
> 
> The previous code evaluated the preprocessor token is_lock_free into
> a variable before starting a transaction.  This behavior can cause an
> error if another thread acquired the lock (without using a transaction)
> between the evaluation of the token and the beginning of the transaction.
> 
> This patch delays the evaluation of is_lock_free to inside a transaction
> by moving this part of the code to the macro ELIDE_LOCK.

This is looking better, but the comments talk only about the error
you are fixing, and not the overall concurrency scheme. The latter
is really what we want to document, and from that documentation it
should flow naturally that is_lock_free must, as a requirement, be
read within the transaction for the scheme to be safe.

> 2015-08-20  Tulio Magno Quites Machado Filho  <tuliom@linux.vnet.ibm.com>
> 
> 	[BZ #18743]
> 	* sysdeps/powerpc/nptl/elide.h (__elide_lock): Move most of this
> 	code to...
> 	(ELIDE_LOCK): ...here.
> 	(__get_new_count): New function with part of the code from
> 	__elide_lock that updates the value of adapt_count after a
> 	transaction abort.
> 	(__elide_trylock): Move this code to...
> 	(ELIDE_TRYLOCK): ...here.
> ---
>  NEWS                         |   3 +-
>  sysdeps/powerpc/nptl/elide.h | 120 ++++++++++++++++++++++++-------------------
>  2 files changed, 70 insertions(+), 53 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index b75a202..49aba2d 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -11,7 +11,8 @@ Version 2.23
>  
>    14341, 16517, 16519, 16520, 16734, 16973, 17787, 17905, 18084, 18086,
>    18265, 18370, 18421, 18480, 18525, 18618, 18647, 18661, 18674, 18681,
> -  18778, 18781, 18787, 18789, 18790, 18795, 18796, 18820, 18823, 18824.
> +  18778, 18781, 18787, 18789, 18790, 18795, 18796, 18820, 18823, 18824,
> +  18743.
>  
>  * The obsolete header <regexp.h> has been removed.  Programs that require
>    this header must be updated to use <regex.h> instead.
> diff --git a/sysdeps/powerpc/nptl/elide.h b/sysdeps/powerpc/nptl/elide.h
> index 389f5a5..261efd5 100644
> --- a/sysdeps/powerpc/nptl/elide.h
> +++ b/sysdeps/powerpc/nptl/elide.h
> @@ -23,67 +23,83 @@
>  # include <htm.h>
>  # include <elision-conf.h>
>  
> -/* Returns true if the lock defined by is_lock_free as elided.
> -   ADAPT_COUNT is a pointer to per-lock state variable. */
> -
> +/* Get the new value of adapt_count according to the elision
> +   configurations.  Returns true if the system should retry again or false
> +   otherwise.  */
>  static inline bool
> -__elide_lock (uint8_t *adapt_count, int is_lock_free)
> +__get_new_count (uint8_t *adapt_count)
>  {
> -  if (*adapt_count > 0)
> +  /* A persistent failure indicates that a retry will probably
> +     result in another failure.  Use normal locking now and
> +     for the next couple of calls.  */
> +  if (_TEXASRU_FAILURE_PERSISTENT (__builtin_get_texasru ()))
>      {
> -      (*adapt_count)--;
> +      if (__elision_aconf.skip_lock_internal_abort > 0)
> +	*adapt_count = __elision_aconf.skip_lock_internal_abort;
>        return false;
>      }
> -
> -  for (int i = __elision_aconf.try_tbegin; i > 0; i--)
> -    {
> -      if (__builtin_tbegin (0))
> -	{
> -	  if (is_lock_free)
> -	    return true;
> -	  /* Lock was busy.  */
> -	  __builtin_tabort (_ABORT_LOCK_BUSY);
> -	}
> -      else
> -	{
> -	  /* A persistent failure indicates that a retry will probably
> -	     result in another failure.  Use normal locking now and
> -	     for the next couple of calls.  */
> -	  if (_TEXASRU_FAILURE_PERSISTENT (__builtin_get_texasru ()))
> -	    {
> -	      if (__elision_aconf.skip_lock_internal_abort > 0)
> -		*adapt_count = __elision_aconf.skip_lock_internal_abort;
> -	      break;
> -	    }
> -	  /* Same logic as above, but for a number of temporary failures in a
> -	     a row.  */
> -	  else if (__elision_aconf.skip_lock_out_of_tbegin_retries > 0
> -		   && __elision_aconf.try_tbegin > 0)
> -	    *adapt_count = __elision_aconf.skip_lock_out_of_tbegin_retries;
> -	}
> -     }
> -
> -  return false;
> +  /* Same logic as above, but for a number of temporary failures in a
> +     a row.  */
> +  else if (__elision_aconf.skip_lock_out_of_tbegin_retries > 0
> +	   && __elision_aconf.try_tbegin > 0)
> +    *adapt_count = __elision_aconf.skip_lock_out_of_tbegin_retries;
> +  return true;

OK.

>  }
>  
> -# define ELIDE_LOCK(adapt_count, is_lock_free) \
> -  __elide_lock (&(adapt_count), is_lock_free)
> -
> -
> -static inline bool
> -__elide_trylock (uint8_t *adapt_count, int is_lock_free, int write)
> -{
> -  if (__elision_aconf.try_tbegin > 0)
> -    {
> -      if (write)
> -	__builtin_tabort (_ABORT_NESTED_TRYLOCK);
> -      return __elide_lock (adapt_count, is_lock_free);
> -    }
> -  return false;
> -}
> +/* CONCURRENCY NOTES:
> +
> +   Always evaluate is_lock_free from inside a transaction in macros
> +   ELIDE_LOCK and ELIDE_TRYLOCK.
> +   Failing to do so, creates a race condition to the memory access specified
> +   in is_lock_free which can be triggered with the following order of
> +   events:
> +   1. The lock accessed by is_lock_free is free.
> +   2. Thread T1 evaluates is_lock_free and stores into register R1 that the
> +      lock is free.
> +   3. Thread T2 acquires the same lock used in is_lock_free.
> +   4. T1 begins the transaction, creating a memory barrier where is_lock_free
> +      is false, but R1 is true.
> +   5. T1 reads R1 and doesn't abort the transaction.
> +   6. T1 calls ELIDE_UNLOCK, which reads false from is_lock_free and decides
> +      to unlock a lock acquired by T2, leading to undefined behavior.  */

This talks about this particular problem, but should instead explain
why reading is_lock_free needs no locks or explicitly ordered atomic
accesses.

e.g.

/* CONCURRENCY NOTES:

   The value of is_lock_free is read and possibly written to by multiple
   threads, either during lock acquisition or during elision. In order to
   keep this evaluation from becoming a data race, the access of
   is_lock_free is placed *inside* the transaction. Within the transaction
   is_lock_free can be accessed with relaxed ordering. We are assured by
   the transaction that the value of is_lock_free is consistent with the
   state of the lock; otherwise the hardware will abort the transaction.

   ... */

Feel free to add back the notes on the invalid case if you want, but the
above is the minimum I'm expecting. It's a definition of the intent of the
current code.

Does that make sense?

Perhaps Torvald can review this too.

I apologize in advance for making this take so long, but we all need
to calibrate our expectations regarding the level of detail we need
here and the language we use to describe it.

> +
> +/* Returns true if the lock defined by is_lock_free was elided.
> +   ADAPT_COUNT is a per-lock state variable.  */
> +# define ELIDE_LOCK(adapt_count, is_lock_free)				\
> +  ({									\
> +    int ret = 0;							\
> +    if (adapt_count > 0)						\
> +      (adapt_count)--;							\
> +    else								\
> +      for (int i = __elision_aconf.try_tbegin; i > 0; i--)		\
> +	{								\
> +	  if (__builtin_tbegin (0))					\
> +	    {								\
> +	      if (is_lock_free)						\

This should use a relaxed MO atomic access to highlight that this data *is*
shared between threads, but that there are no ordering guarantees. It should
result in a normal load, but it makes it clear in the code that some other
thread is also accessing this data, and that the transaction as a barrier
is all that is protecting this from being a data-race (undefined behaviour).

Torvald made the same comment in his review, and I'm just repeating it here
because I think it bears repeating.
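
Concretely, I'd expect that line to end up looking something like this.
Just a sketch, not a drop-in change: it assumes the callers are adjusted
so the macro receives a pointer to the lock word (the hypothetical
lock_word below) instead of a precomputed expression, and it uses our
internal atomic_load_relaxed from include/atomic.h, which should still
compile to a plain load:

	      /* The transaction, not the memory model, keeps this
		 relaxed load consistent with the lock state: any
		 conflicting write to the lock word by another thread
		 aborts the transaction.  */
	      if (atomic_load_relaxed (lock_word) == 0)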

> +		{							\
> +		  ret = 1;						\
> +		  break;						\
> +		}							\
> +	      __builtin_tabort (_ABORT_LOCK_BUSY);			\
> +	    }								\
> +	  else								\
> +	    if (!__get_new_count(&adapt_count))				\
> +	      break;							\
> +	}								\
> +    ret;								\
> +  })
>  
>  # define ELIDE_TRYLOCK(adapt_count, is_lock_free, write)	\
> -  __elide_trylock (&(adapt_count), is_lock_free, write)
> +  ({								\
> +    int ret = 0;						\
> +    if (__elision_aconf.try_tbegin > 0)				\
> +      {								\
> +	if (write)						\
> +	  __builtin_tabort (_ABORT_NESTED_TRYLOCK);		\
> +	ret = ELIDE_LOCK (adapt_count, is_lock_free);		\
> +      }								\
> +    ret;							\
> +  })
>  
>  
>  static inline bool
> 


Cheers,
Carlos.

