[PATCH v2] nptl: Add backoff mechanism to spinlock loop
Noah Goldstein
goldstein.w.n@gmail.com
Mon Mar 28 16:41:21 GMT 2022
On Mon, Mar 28, 2022 at 3:47 AM Wangyang Guo <wangyang.guo@intel.com> wrote:
>
> When multiple threads are waiting for the lock at the same time, once the
> lock owner releases it, all waiters see the lock as available and try to
> acquire it, which may cause an expensive CAS storm.
>
> Binary exponential backoff with random jitter is introduced. As the number
> of try-lock attempts increases, it becomes more likely that a larger number
> of threads are competing for the adaptive mutex lock, so the wait time is
> increased exponentially. A random jitter is also added to keep try-lock
> attempts from different threads from synchronizing.
>
> v2: Remove the read check before try-lock for performance.
>
> Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
> ---
> nptl/pthread_mutex_lock.c | 25 ++++++++++++++++---------
> 1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
> index d2e652d151..7e75ec1cba 100644
> --- a/nptl/pthread_mutex_lock.c
> +++ b/nptl/pthread_mutex_lock.c
> @@ -26,6 +26,7 @@
> #include <futex-internal.h>
> #include <stap-probe.h>
> #include <shlib-compat.h>
> +#include <random-bits.h>
>
> /* Some of the following definitions differ when pthread_mutex_cond_lock.c
> includes this file. */
> @@ -64,11 +65,6 @@ lll_mutex_lock_optimized (pthread_mutex_t *mutex)
> # define PTHREAD_MUTEX_VERSIONS 1
> #endif
>
> -#ifndef LLL_MUTEX_READ_LOCK
> -# define LLL_MUTEX_READ_LOCK(mutex) \
> - atomic_load_relaxed (&(mutex)->__data.__lock)
> -#endif
> -
> static int __pthread_mutex_lock_full (pthread_mutex_t *mutex)
> __attribute_noinline__;
>
> @@ -138,17 +134,28 @@ PTHREAD_MUTEX_LOCK (pthread_mutex_t *mutex)
> int cnt = 0;
> int max_cnt = MIN (max_adaptive_count (),
> mutex->__data.__spins * 2 + 10);
> + int spin_count, exp_backoff = 1;
> + unsigned int jitter = random_bits ();
> do
> {
> - if (cnt++ >= max_cnt)
> + /* In each loop, spin count is exponential backoff plus
> + random jitter, random range is [0, exp_backoff-1]. */
> + spin_count = exp_backoff + (jitter & (exp_backoff - 1));
> + cnt += spin_count;
> + if (cnt >= max_cnt)
> {
> + /* If cnt exceeds max spin count, just go to wait
> + queue. */
> LLL_MUTEX_LOCK (mutex);
> break;
> }
> - atomic_spin_nop ();
> + do
> + atomic_spin_nop ();
> + while (--spin_count > 0);
> + /* Binary exponential backoff, prepare for next loop. */
> + exp_backoff <<= 1;
> }
> - while (LLL_MUTEX_READ_LOCK (mutex) != 0
> - || LLL_MUTEX_TRYLOCK (mutex) != 0);
> + while (LLL_MUTEX_TRYLOCK (mutex) != 0);
Does it perform better without the read?
In general, can you post some benchmarks varying the number of threads and the
size of the critical section? It would also be nice if you could collect some
stats on the average number of failed CAS attempts before and after the change.
>
> mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8;
> }
> --
> 2.35.1
>
More information about the Libc-alpha mailing list