Re: [PATCHv2] powerpc: Spinlock optimization and cleanup
- From: "Tulio Magno Quites Machado Filho" <tuliom@linux.vnet.ibm.com>
- To: "Paul E. Murphy" <murphyp@linux.vnet.ibm.com>
- Cc: libc-alpha@sourceware.org, triegel@redhat.com, rth@twiddle.net, Adhemerval Zanella <adhemerval.zanella@linaro.org>, Steve Munroe <sjmunroe@us.ibm.com>
- Date: Wed, 30 Sep 2015 17:31:00 -0300
- Subject: Re: [PATCHv2] powerpc: Spinlock optimization and cleanup
- References: <560C0DA6.5060409@linux.vnet.ibm.com>
"Paul E. Murphy" <murphyp@linux.vnet.ibm.com> writes:
> Changes from V1:
>
> * Use C macro for atomic_store_release as suggested in
> comments.
>
> * Run benchmarks to quantify the performance changes and
> note them in the commit message.
Thanks for resurrecting this patch!
> This patch optimizes powerpc spinlock implementation by:
>
> * The current algorithm spins on a lwzx, but only after first issuing a
> lwarx. The optimization is to avoid that first lwarx in the contended
> case (it is much more costly than normal loads).
>
> * Use the correct EH hint bit on the larx for supported ISA. For lock
s/larx/lwarx/
> acquisition, the thread that acquired the lock with a successful stcx
s/stcx/stwcx/
> does not want to give away the write ownership on the cacheline. The
> idea is to make the load reservation "sticky" about retaining write
> authority to the line. That way, the store that must inevitably come
> to release the lock can succeed quickly and not contend with other
> threads issuing lwarx. If another thread does a store to the line
> (false sharing), the winning thread must give up write authority to
Is something missing between the previous line and the next?
> The proper value of EH for the larx for a lock acquisition is 1.
Could you move part of this explanation to pthread_spin_lock.c as a comment,
please?
I'm referring only to the part where you explain the logic behind this
optimization.
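
Something along these lines, distilled from the commit message above, could serve as that comment (the wording is only a suggestion):

/* The EH (exclusive hint) bit on the lwarx asks the processor to
   retain write authority over the cache line after a successful
   stwcx., making the reservation "sticky".  That way the store
   that inevitably comes to release the lock can succeed quickly
   instead of contending with other threads issuing lwarx.  If
   another thread stores to the line (false sharing), the winning
   thread must give up write authority.  The proper EH value for a
   lock acquisition larx is 1.  */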
> * Increase contended lock performance by up to 40%, with no measurable
> impact on uncontended locks on P8.
>
> It also cleans up the code to use the defined acquire-semantics
> instructions and a default C-style function prototype.
>
> Thanks to Adhemerval Zanella who did most of the work. I've run some
> tests, and addressed some minor feedback.
>
> 2015-09-25 Paul E. Murphy <murphyp@linux.vnet.ibm.com>
As this patch is based on the work of Adhemerval, it's important to mention
him in the ChangeLog. Here's an example:
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=d3573f61a#patch1
> ---
> sysdeps/powerpc/nptl/pthread_spin_lock.c | 22 ++++++++-------
> sysdeps/powerpc/nptl/pthread_spin_trylock.c | 3 +-
> sysdeps/powerpc/nptl/pthread_spin_unlock.c | 27 +++++++++++++++++++
> .../unix/sysv/linux/powerpc/pthread_spin_unlock.c | 28 --------------------
> 4 files changed, 40 insertions(+), 40 deletions(-)
> create mode 100644 sysdeps/powerpc/nptl/pthread_spin_unlock.c
> delete mode 100644 sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
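
For reference, the new sysdeps/powerpc/nptl/pthread_spin_unlock.c presumably reduces to the atomic_store_release form suggested in the V1 feedback; its hunk is not quoted below, so this is only a sketch of the expected shape:

#include "pthreadP.h"

int
pthread_spin_unlock (pthread_spinlock_t *lock)
{
  /* atomic_store_release emits the release barrier (lwsync on
     powerpc) followed by a plain store, replacing hand-written
     asm.  */
  atomic_store_release (lock, 0);
  return 0;
}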
>
> diff --git a/sysdeps/powerpc/nptl/pthread_spin_lock.c b/sysdeps/powerpc/nptl/pthread_spin_lock.c
> index cc081f8..8a39da9 100644
> --- a/sysdeps/powerpc/nptl/pthread_spin_lock.c
> +++ b/sysdeps/powerpc/nptl/pthread_spin_lock.c
> @@ -19,24 +19,26 @@
> #include "pthreadP.h"
>
> int
> -pthread_spin_lock (lock)
> - pthread_spinlock_t *lock;
> +pthread_spin_lock (pthread_spinlock_t *lock)
> {
> unsigned int __tmp;
>
> asm volatile (
> - "1: lwarx %0,0,%1\n"
> + "0: lwzx %0,0,%1\n"
> + " cmpwi 0,%0,0\n"
> + " bne 0b\n"
> + "1: lwarx %0,0,%1" MUTEX_HINT_ACQ "\n"
> " cmpwi 0,%0,0\n"
> " bne- 2f\n"
> " stwcx. %2,0,%1\n"
> " bne- 2f\n"
> - " isync\n"
> - " .subsection 1\n"
> - "2: lwzx %0,0,%1\n"
> - " cmpwi 0,%0,0\n"
> - " bne 2b\n"
> - " b 1b\n"
> - " .previous"
> + __ARCH_ACQ_INSTR "\n"
> + " .subsection 1\n"
> + "2: lwzx %0,0,%1\n"
> + " cmpwi 0,%0,0\n"
> + " bne 2b\n"
> + " b 1b\n"
> + " .previous"
Some of the asm instructions are indented with spaces, while most of them
use tabs.
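
For anyone following the asm, the new sequence is roughly equivalent to the following C logic (a sketch using GCC __atomic builtins; the patch itself keeps the inline asm):

/* Spin on plain loads (lwzx) while the lock is held, and only
   attempt the reserving lwarx/stwcx. sequence once the lock looks
   free.  Illustrative only.  */
static int
spin_lock_sketch (volatile unsigned int *lock)
{
  for (;;)
    {
      /* Corresponds to the 0:/2: loops: contending threads spin on
         cheap plain loads and never issue lwarx.  */
      while (__atomic_load_n (lock, __ATOMIC_RELAXED) != 0)
        ;
      /* Corresponds to 1:: the lock looks free, so try to take it.
         The acquire CAS maps to lwarx/stwcx. followed by
         __ARCH_ACQ_INSTR on powerpc.  */
      unsigned int expected = 0;
      if (__atomic_compare_exchange_n (lock, &expected, 1, 0,
                                       __ATOMIC_ACQUIRE,
                                       __ATOMIC_RELAXED))
        return 0;
    }
}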
--
Tulio Magno