Re: PowerPC: libc single-thread lock optimization
- From: Torvald Riegel <triegel@redhat.com>
- To: Adhemerval Zanella <azanella@linux.vnet.ibm.com>
- Cc: "GNU C. Library" <libc-alpha@sourceware.org>
- Date: Fri, 02 May 2014 15:11:27 +0200
- Subject: Re: PowerPC: libc single-thread lock optimization
- References: <5343F8F1.4000400@linux.vnet.ibm.com>
On Tue, 2014-04-08 at 10:26 -0300, Adhemerval Zanella wrote:
> This patch adds a single-thread optimization for libc.so locks used
> within the shared objects. For each lock operation it checks whether
> the process has already spawned one thread and, if not, uses
> non-atomic operations. Other libraries (libpthread.so for instance)
> are unaffected by this change.
>
> This is similar to the x86_64 optimization on locks and atomics,
> which uses the __libc_multiple_threads variable.
>
> Tested on powerpc32, powerpc64, and powerpc64le.
>
> Note: for the macro code changes I tried to change the current
> syntax as little as possible.
>
> --
>
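As an aside, here is a minimal sketch of the mechanism being
described, using the __libc_multiple_threads variable named above.
The exact SINGLE_THREAD_P definition in glibc varies by architecture
and version, so treat the details as illustrative only:

/* Illustrative sketch, not the exact glibc definitions.  libpthread
   flips __libc_multiple_threads once a second thread is created;
   until then only one thread can touch libc-internal locks.  */
extern int __libc_multiple_threads;
#define SINGLE_THREAD_P (__libc_multiple_threads == 0)

static inline void
libc_counter_increment (int *counter)
{
  if (SINGLE_THREAD_P)
    /* No other thread exists, so a torn read-modify-write cannot be
       observed; a plain increment suffices.  */
    ++*counter;
  else
    /* Full atomic sequence (lwarx/stwcx. on powerpc).  */
    atomic_increment (counter);
}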
> * nptl/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h
> (__lll_robust_trylock): Add single-thread lock optimization for calls
> within libc.so.
See comment below.
> * sysdeps/powerpc/bits/atomic.h
> (__arch_compare_and_exchange_val_32_acq): Likewise.
> (__arch_compare_and_exchange_val_32_rel): Likewise.
> (__arch_atomic_exchange_32_acq): Likewise.
> (__arch_atomic_exchange_32_rel): Likewise.
> (__arch_atomic_exchange_and_add_32): Likewise.
> (__arch_atomic_increment_val_32): Likewise.
> (__arch_atomic_decrement_val_32): Likewise.
> (__arch_atomic_decrement_if_positive_32): Likewise.
> * sysdeps/powerpc/powerpc32/bits/atomic.h
> (__arch_compare_and_exchange_bool_32_acq): Likewise.
> (__arch_compare_and_exchange_bool_32_rel): Likewise.
> * sysdeps/powerpc/powerpc64/bits/atomic.h
> (__arch_compare_and_exchange_bool_32_acq): Likewise.
> (__arch_compare_and_exchange_bool_32_rel): Likewise.
> (__arch_compare_and_exchange_bool_64_acq): Likewise.
> (__arch_compare_and_exchange_bool_64_rel): Likewise.
> (__arch_compare_and_exchange_val_64_acq): Likewise.
> (__arch_compare_and_exchange_val_64_rel): Likewise.
> (__arch_atomic_exchange_64_acq): Likewise.
> (__arch_atomic_exchange_64_rel): Likewise.
> (__arch_atomic_exchange_and_add_64): Likewise.
> (__arch_atomic_increment_val_64): Likewise.
> (__arch_atomic_decrement_val_64): Likewise.
> (__arch_atomic_decrement_if_positive_64): Likewise.
I think the optimization you attempt here does not belong in the
low-level atomics, as I've argued elsewhere in the thread.
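For illustration only (and not necessarily the argument made
elsewhere in the thread): the generic atomics can also synchronize
across processes, where the thread count of one process proves
nothing. A hypothetical example, with error checks omitted:

#include <pthread.h>
#include <sys/mman.h>

/* Hypothetical illustration: a PTHREAD_PROCESS_SHARED mutex in a
   shared mapping is contended by several processes, each of which
   may itself be single-threaded.  An atomic that silently degrades
   to plain loads/stores under a thread-count check would break such
   callers, which is one reason to keep the check out of the
   __arch_* primitives.  */
static pthread_mutex_t *
make_shared_mutex (void)
{
  pthread_mutex_t *m = mmap (NULL, sizeof (pthread_mutex_t),
                             PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  pthread_mutexattr_t attr;
  pthread_mutexattr_init (&attr);
  pthread_mutexattr_setpshared (&attr, PTHREAD_PROCESS_SHARED);
  pthread_mutex_init (m, &attr);
  pthread_mutexattr_destroy (&attr);
  return m;  /* after fork (), parent and child contend on *m */
}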
> ---
>
> diff --git a/nptl/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h b/nptl/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h
> index ab92c3f..419ee2f 100644
> --- a/nptl/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h
> +++ b/nptl/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h
> @@ -205,7 +205,9 @@
> /* Set *futex to ID if it is 0, atomically. Returns the old value */
> #define __lll_robust_trylock(futex, id) \
> ({ int __val; \
> - __asm __volatile ("1: lwarx %0,0,%2" MUTEX_HINT_ACQ "\n" \
> + if (!SINGLE_THREAD_P) \
> + __asm __volatile ( \
> + "1: lwarx %0,0,%2" MUTEX_HINT_ACQ "\n" \
> " cmpwi 0,%0,0\n" \
> " bne 2f\n" \
> " stwcx. %3,0,%2\n" \
> @@ -214,6 +216,12 @@
> : "=&r" (__val), "=m" (*futex) \
> : "r" (futex), "r" (id), "m" (*futex) \
> : "cr0", "memory"); \
> + else \
> + { \
> + __val = *futex; \
> + if (__val == 0) \
> + *futex = id; \
> + } \
> __val; \
> })
>
The single client of this is in pthread_mutex_trylock.c. If there's
actually a need to do this, it belongs there, not in each arch's
lowlevellock.h.
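A minimal sketch of what hoisting the check to the call site could
look like (the helper name is invented; SINGLE_THREAD_P and
__lll_robust_trylock are as in the patch, which keeps the per-arch
primitive unconditionally atomic):

/* Hypothetical sketch, not actual glibc code: the single-thread fast
   path lives at the one call site instead of in the arch macro.
   Like __lll_robust_trylock, it returns the old value of *futex.  */
static int
robust_trylock (int *futex, int id)
{
  if (SINGLE_THREAD_P)
    {
      /* Only one thread exists, so a plain load/store cannot race
	 with another thread.  */
      int oldval = *futex;
      if (oldval == 0)
	*futex = id;
      return oldval;
    }
  return __lll_robust_trylock (futex, id);
}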