Re: PowerPC: libc single-thread lock optimization
- From: Torvald Riegel <triegel@redhat.com>
- To: Adhemerval Zanella <azanella@linux.vnet.ibm.com>
- Cc: "GNU C. Library" <libc-alpha@sourceware.org>
- Date: Fri, 02 May 2014 15:11:27 +0200
- Subject: Re: PowerPC: libc single-thread lock optimization
- References: <5343F8F1.4000400@linux.vnet.ibm.com>
On Tue, 2014-04-08 at 10:26 -0300, Adhemerval Zanella wrote:
> This patch adds a single-thread optimization for libc.so locks used
> within the shared objects. For each lock operation it checks whether
> the process has already spawned one thread and, if not, uses
> non-atomic operations. Other libraries (libpthread.so for instance)
> are unaffected by this change.
>
> This is similar to the x86_64 optimization on locks and atomics,
> which uses the __libc_multiple_threads variable.
>
> Tested on powerpc32, powerpc64, and powerpc64le.
>
> Note: for the macro code changes I tried to change the current
> syntax as little as possible.
>
> --
>
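As an aside, here is a minimal sketch of the mechanism being
described, using the __libc_multiple_threads variable named above.
The exact SINGLE_THREAD_P definition in glibc varies by architecture
and version, so treat the details as illustrative only:

/* Illustrative sketch, not the exact glibc definitions.  libpthread
   flips __libc_multiple_threads once a second thread is created;
   until then only one thread can touch libc-internal locks.  */
extern int __libc_multiple_threads;
#define SINGLE_THREAD_P (__libc_multiple_threads == 0)

static inline void
libc_counter_increment (int *counter)
{
  if (SINGLE_THREAD_P)
    /* No other thread exists, so a torn read-modify-write cannot be
       observed; a plain increment suffices.  */
    ++*counter;
  else
    /* Full atomic sequence (lwarx/stwcx. on powerpc).  */
    atomic_increment (counter);
}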
> * nptl/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h
> (__lll_robust_trylock): Add single-thread lock optimization for calls
> within libc.so.
See comment below.
> * sysdeps/powerpc/bits/atomic.h
> (__arch_compare_and_exchange_val_32_acq): Likewise.
> (__arch_compare_and_exchange_val_32_rel): Likewise.
> (__arch_atomic_exchange_32_acq): Likewise.
> (__arch_atomic_exchange_32_rel): Likewise.
> (__arch_atomic_exchange_and_add_32): Likewise.
> (__arch_atomic_increment_val_32): Likewise.
> (__arch_atomic_decrement_val_32): Likewise.
> (__arch_atomic_decrement_if_positive_32): Likewise.
> * sysdeps/powerpc/powerpc32/bits/atomic.h
> (__arch_compare_and_exchange_bool_32_acq): Likewise.
> (__arch_compare_and_exchange_bool_32_rel): Likewise.
> * sysdeps/powerpc/powerpc64/bits/atomic.h
> (__arch_compare_and_exchange_bool_32_acq): Likewise.
> (__arch_compare_and_exchange_bool_32_rel): Likewise.
> (__arch_compare_and_exchange_bool_64_acq): Likewise.
> (__arch_compare_and_exchange_bool_64_rel): Likewise.
> (__arch_compare_and_exchange_val_64_acq): Likewise.
> (__arch_compare_and_exchange_val_64_rel): Likewise.
> (__arch_atomic_exchange_64_acq): Likewise.
> (__arch_atomic_exchange_64_rel): Likewise.
> (__arch_atomic_exchange_and_add_64): Likewise.
> (__arch_atomic_increment_val_64): Likewise.
> (__arch_atomic_decrement_val_64): Likewise.
> (__arch_atomic_decrement_if_positive_64): Likewise.
I think the optimization you attempt here does not belong in the
low-level atomics, as I've argued elsewhere in the thread.
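For illustration only (and not necessarily the argument made
elsewhere in the thread): the generic atomics can also synchronize
across processes, where the thread count of one process proves
nothing. A hypothetical example, with error checks omitted:

#include <pthread.h>
#include <sys/mman.h>

/* Hypothetical illustration: a PTHREAD_PROCESS_SHARED mutex in a
   shared mapping is contended by several processes, each of which
   may itself be single-threaded.  An atomic that silently degrades
   to plain loads/stores under a thread-count check would break such
   callers, which is one reason to keep the check out of the
   __arch_* primitives.  */
static pthread_mutex_t *
make_shared_mutex (void)
{
  pthread_mutex_t *m = mmap (NULL, sizeof (pthread_mutex_t),
                             PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  pthread_mutexattr_t attr;
  pthread_mutexattr_init (&attr);
  pthread_mutexattr_setpshared (&attr, PTHREAD_PROCESS_SHARED);
  pthread_mutex_init (m, &attr);
  pthread_mutexattr_destroy (&attr);
  return m;  /* after fork (), parent and child contend on *m */
}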
> ---
>
> diff --git a/nptl/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h b/nptl/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h
> index ab92c3f..419ee2f 100644
> --- a/nptl/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h
> +++ b/nptl/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h
> @@ -205,7 +205,9 @@
> /* Set *futex to ID if it is 0, atomically. Returns the old value */
> #define __lll_robust_trylock(futex, id) \
> ({ int __val; \
> - __asm __volatile ("1: lwarx %0,0,%2" MUTEX_HINT_ACQ "\n" \
> + if (!SINGLE_THREAD_P) \
> + __asm __volatile ( \
> + "1: lwarx %0,0,%2" MUTEX_HINT_ACQ "\n" \
> " cmpwi 0,%0,0\n" \
> " bne 2f\n" \
> " stwcx. %3,0,%2\n" \
> @@ -214,6 +216,12 @@
> : "=&r" (__val), "=m" (*futex) \
> : "r" (futex), "r" (id), "m" (*futex) \
> : "cr0", "memory"); \
> + else \
> + { \
> + __val = *futex; \
> + if (__val == 0) \
> + *futex = id; \
> + } \
> __val; \
> })
>
The single client of this is in pthread_mutex_trylock.c. If there's
actually a need to do this, it belongs there, not in each arch's
lowlevellock.h.
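A minimal sketch of what hoisting the check to the call site could
look like (the helper name is invented; SINGLE_THREAD_P and
__lll_robust_trylock are as in the patch, which keeps the per-arch
primitive unconditionally atomic):

/* Hypothetical sketch, not actual glibc code: the single-thread fast
   path lives at the one call site instead of in the arch macro.
   Like __lll_robust_trylock, it returns the old value of *futex.  */
static int
robust_trylock (int *futex, int id)
{
  if (SINGLE_THREAD_P)
    {
      /* Only one thread exists, so a plain load/store cannot race
	 with another thread.  */
      int oldval = *futex;
      if (oldval == 0)
	*futex = id;
      return oldval;
    }
  return __lll_robust_trylock (futex, id);
}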