[PATCH v2] nptl: Add backoff mechanism to spinlock loop

Noah Goldstein goldstein.w.n@gmail.com
Mon Mar 28 16:41:21 GMT 2022


On Mon, Mar 28, 2022 at 3:47 AM Wangyang Guo <wangyang.guo@intel.com> wrote:
>
> When multiple threads are waiting for the lock at the same time, once the
> lock owner releases the lock, the waiters all see the lock as available and
> try to acquire it, which may cause an expensive CAS storm.
>
> Binary exponential backoff with random jitter is introduced.  As the number
> of try-lock attempts increases, it becomes more likely that many threads are
> competing for the adaptive mutex lock, so the wait time is increased
> exponentially.  Random jitter is also added to avoid synchronized try-lock
> attempts from other threads.
>
> v2: Remove read-check before try-lock for performance.
>
> Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
> ---
>  nptl/pthread_mutex_lock.c | 25 ++++++++++++++++---------
>  1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
> index d2e652d151..7e75ec1cba 100644
> --- a/nptl/pthread_mutex_lock.c
> +++ b/nptl/pthread_mutex_lock.c
> @@ -26,6 +26,7 @@
>  #include <futex-internal.h>
>  #include <stap-probe.h>
>  #include <shlib-compat.h>
> +#include <random-bits.h>
>
>  /* Some of the following definitions differ when pthread_mutex_cond_lock.c
>     includes this file.  */
> @@ -64,11 +65,6 @@ lll_mutex_lock_optimized (pthread_mutex_t *mutex)
>  # define PTHREAD_MUTEX_VERSIONS 1
>  #endif
>
> -#ifndef LLL_MUTEX_READ_LOCK
> -# define LLL_MUTEX_READ_LOCK(mutex) \
> -  atomic_load_relaxed (&(mutex)->__data.__lock)
> -#endif
> -
>  static int __pthread_mutex_lock_full (pthread_mutex_t *mutex)
>       __attribute_noinline__;
>
> @@ -138,17 +134,28 @@ PTHREAD_MUTEX_LOCK (pthread_mutex_t *mutex)
>           int cnt = 0;
>           int max_cnt = MIN (max_adaptive_count (),
>                              mutex->__data.__spins * 2 + 10);
> +         int spin_count, exp_backoff = 1;
> +         unsigned int jitter = random_bits ();
>           do
>             {
> -             if (cnt++ >= max_cnt)
> +             /* In each loop, spin count is exponential backoff plus
> +                random jitter, random range is [0, exp_backoff-1].  */
> +             spin_count = exp_backoff + (jitter & (exp_backoff - 1));
> +             cnt += spin_count;
> +             if (cnt >= max_cnt)
>                 {
> +                 /* If cnt exceeds max spin count, just go to wait
> +                    queue.  */
>                   LLL_MUTEX_LOCK (mutex);
>                   break;
>                 }
> -             atomic_spin_nop ();
> +             do
> +                 atomic_spin_nop ();
> +             while (--spin_count > 0);
> +             /* Binary exponential backoff, prepare for next loop.  */
> +             exp_backoff <<= 1;
>             }
> -         while (LLL_MUTEX_READ_LOCK (mutex) != 0
> -                || LLL_MUTEX_TRYLOCK (mutex) != 0);
> +         while (LLL_MUTEX_TRYLOCK (mutex) != 0);
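If I'm reading the new loop right, stripped of the glibc internals it boils
down to something like this (a simplified, self-contained sketch using plain
C11 atomics; spin_pause (), lock_with_backoff (), MAX_SPINS/JITTER are
stand-ins for atomic_spin_nop (), the adaptive path, max_cnt and
random_bits (), and the futex-blocking LLL_MUTEX_LOCK fallback is only
approximated):

#include <stdatomic.h>

/* Stand-in for atomic_spin_nop ().  */
static inline void
spin_pause (void)
{
#if defined __x86_64__ || defined __i386__
  __builtin_ia32_pause ();
#endif
}

/* Acquire LOCK (0 = free, 1 = held) with exponential backoff plus jitter,
   mirroring the patch.  Like the patch, this assumes an initial try-lock
   already failed, so it spins before the first retry.  Once the spin budget
   is used up the patch calls LLL_MUTEX_LOCK (futex wait); here we just keep
   retrying.  */
static void
lock_with_backoff (atomic_int *lock, int max_spins, unsigned int jitter)
{
  int cnt = 0, exp_backoff = 1;
  do
    {
      /* Spin exp_backoff times plus jitter in [0, exp_backoff - 1].  */
      int spin_count = exp_backoff + (jitter & (exp_backoff - 1));
      cnt += spin_count;
      if (cnt >= max_spins)
        {
          while (atomic_exchange (lock, 1) != 0)
            spin_pause ();
          return;
        }
      while (spin_count-- > 0)
        spin_pause ();
      /* Double the spin window for the next round.  */
      exp_backoff <<= 1;
    }
  while (atomic_exchange (lock, 1) != 0);   /* try-lock */
}

i.e. every failed try-lock doubles the spin window before the next attempt.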

Does it perform better without the read?

In general, can you post some benchmarks varying the number of threads and the
size of the critical section?  It would also be nice if you could collect some
stats on the average number of failed CAS attempts before/after the change.
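Something along these lines would do as a starting point (a very rough sketch,
not an actual glibc benchtest; thread count and critical-section length come
from argv, and actually counting failed CAS attempts would need a counter
inside pthread_mutex_lock.c itself, e.g. bumped where LLL_MUTEX_TRYLOCK
fails):

/* Build: gcc -O2 -pthread bench-adaptive.c  */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static pthread_mutex_t m;
static volatile int stop;
static long crit_iters;

static void *
worker (void *arg)
{
  long acquisitions = 0;
  while (!stop)
    {
      pthread_mutex_lock (&m);
      /* Simulated critical section of tunable length.  */
      for (volatile long i = 0; i < crit_iters; i++)
        ;
      pthread_mutex_unlock (&m);
      acquisitions++;
    }
  return (void *) acquisitions;
}

int
main (int argc, char **argv)
{
  int nthreads = argc > 1 ? atoi (argv[1]) : 4;
  crit_iters = argc > 2 ? atol (argv[2]) : 100;

  pthread_mutexattr_t attr;
  pthread_mutexattr_init (&attr);
  pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
  pthread_mutex_init (&m, &attr);

  pthread_t tids[nthreads];
  for (int i = 0; i < nthreads; i++)
    pthread_create (&tids[i], NULL, worker, NULL);

  /* Run for ~2 seconds, then ask the workers to stop.  */
  struct timespec ts = { 2, 0 };
  nanosleep (&ts, NULL);
  stop = 1;

  long total = 0;
  for (int i = 0; i < nthreads; i++)
    {
      void *ret;
      pthread_join (tids[i], &ret);
      total += (long) ret;
    }
  printf ("%d threads, crit=%ld: %ld acquisitions\n",
          nthreads, crit_iters, total);
  return 0;
}

Sweeping the thread count (1 .. ncores and beyond) and the critical-section
length (short vs. long) should show where the backoff helps or hurts.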
>
>           mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8;
>         }
> --
> 2.35.1
>

