This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCHv2] powerpc: Spinlock optimization and cleanup


On Thu, 2015-10-01 at 21:27 +0200, Torvald Riegel wrote:
> On Wed, 2015-09-30 at 11:28 -0500, Paul E. Murphy wrote:
> > Changes from V1:
> > 
> > * Use C macro for atomic_store_release as suggested in
> > comments.
> > 
> > * Run benchmarks to quantize the performance changes and
> > note in commit message.
> > 
> > ---8<---
> > This patch optimizes powerpc spinlock implementation by:
> > 
> > * The current algorithm spins on a lwzx, but only after issuing a
> >   lwarx first.  The optimization is to avoid that initial lwarx in
> >   the contended case, since it is much more costly than a normal load.
> > 
> > * Use the correct EH hint bit on the larx for supported ISAs.  For
> >   lock acquisition, the thread that acquired the lock with a
> >   successful stcx does not want to give away write ownership of the
> >   cacheline.  The idea is to make the load reservation "sticky" about
> >   retaining write authority to the line.  That way, the store that
> >   must inevitably come to release the lock can succeed quickly and
> >   not contend with other threads issuing lwarx.  If another thread
> >   does a store to the line (false sharing), the winning thread must
> >   give up write authority to it.  The proper value of EH for the
> >   larx for a lock acquisition is 1.
> > 
> > * Increase contended lock performance by up to 40%, with no
> >   measurable impact on uncontended locks on P8.
> 
> Could you add the tests you used to the glibc microbenchmarks (or
> whatever works best for them)?  We do want to be able to track
> performance, and having benchmarks is the first step towards that.
> 
> > It also adds some cleanup to use the defined acquire-semantics
> > instructions and to convert the function prototypes to default C
> > style.
> > 
> > Thanks to Adhemerval Zanella who did most of the work.  I've run some
> > tests, and addressed some minor feedback.
> > 
> > 2015-09-25  Paul E. Murphy  <murphyp@linux.vnet.ibm.com>
> > 
> > 	* sysdeps/powerpc/nptl/pthread_spin_lock.c (pthread_spin_lock):
> > 	Optimize first check for contention case and add lwarx hint.
> > 	* sysdeps/powerpc/nptl/pthread_spin_trylock.c (pthread_spin_trylock):
> > 	Use ANSI prototype.
> > 	* sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c: Move to ...
> > 	* sysdeps/powerpc/nptl/pthread_spin_unlock.c: ... here, and
> > 	update to new atomic macros.
> > ---
> >  sysdeps/powerpc/nptl/pthread_spin_lock.c           |   22 ++++++++-------
> >  sysdeps/powerpc/nptl/pthread_spin_trylock.c        |    3 +-
> >  sysdeps/powerpc/nptl/pthread_spin_unlock.c         |   27 +++++++++++++++++++
> >  .../unix/sysv/linux/powerpc/pthread_spin_unlock.c  |   28 --------------------
> >  4 files changed, 40 insertions(+), 40 deletions(-)
> >  create mode 100644 sysdeps/powerpc/nptl/pthread_spin_unlock.c
> >  delete mode 100644 sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
> > 
> > diff --git a/sysdeps/powerpc/nptl/pthread_spin_lock.c b/sysdeps/powerpc/nptl/pthread_spin_lock.c
> > index cc081f8..8a39da9 100644
> > --- a/sysdeps/powerpc/nptl/pthread_spin_lock.c
> > +++ b/sysdeps/powerpc/nptl/pthread_spin_lock.c
> > @@ -19,24 +19,26 @@
> >  #include "pthreadP.h"
> >  
> >  int
> > -pthread_spin_lock (lock)
> > -     pthread_spinlock_t *lock;
> > +pthread_spin_lock (pthread_spinlock_t *lock)
> >  {
> >    unsigned int __tmp;
> >  
> >    asm volatile (
> > -       "1:	lwarx	%0,0,%1\n"
> > +       "0:	lwzx    %0,0,%1\n"
> > +       "	cmpwi   0,%0,0\n"
> > +       "	bne	0b\n"
> > +       "1:	lwarx	%0,0,%1" MUTEX_HINT_ACQ "\n"
> >         "	cmpwi	0,%0,0\n"
> >         "	bne-	2f\n"
> >         "	stwcx.	%2,0,%1\n"
> >         "	bne-	2f\n"
> > -       "	isync\n"
> > -       "	.subsection 1\n"
> > -       "2:	lwzx	%0,0,%1\n"
> > -       "	cmpwi	0,%0,0\n"
> > -       "	bne	2b\n"
> > -       "	b	1b\n"
> > -       "	.previous"
> > +                __ARCH_ACQ_INSTR "\n"
> > +       "        .subsection 1\n"
> > +       "2:	lwzx    %0,0,%1\n"
> > +       "        cmpwi   0,%0,0\n"
> > +       "        bne     2b\n"
> > +       "        b       1b\n"
> > +       "        .previous"
> >         : "=&r" (__tmp)
> >         : "r" (lock), "r" (1)
> >         : "cr0", "memory");
> 
> Is this essentially a test-and-test-and-set implementation of a lock,
> with the addition of the MUTEX_HINT_ACQ hint?  If so, have you
> considered adding a variant of atomic_compare_exchange_weak_acquire or
> atomic_exchange_acquire that sets the hint, and then writing a C
> function that uses an atomic load in the outer test loop and the new
> cmpxchg/xchg variant in the inner loop?  That would make it easier to
> eventually merge this with the generic version of spinlocks.  Also,
> once we add spinning and randomized exponential backoff to the generic
> locks, powerpc would be able to benefit from those generic changes too
> (perhaps with some additional tuning).
> 
Paul is traveling today so I will answer.

It may be functionally possible to use

__atomic_compare_exchange_n ((mem), (expected), (desired),
	weak, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)

in this construct.

But the cmpxchg is a clumsy way to generate the PowerISA Acquire Import
Barrier, which has a very specific form.

Also it is not obvious how the MUTEX_HINT_ACQ applies to
__atomic_compare_exchange_n in general or even
__atomic_compare_exchange_n (*, *, *, *, __ATOMIC_ACQUIRE,
__ATOMIC_RELAXED).

In the PowerISA, MUTEX_HINT_ACQ means:

 The Load and Reserve instructions include an Exclusive
 Access hint (EH), which can be used to indicate
 that the instruction sequence being executed is implementing
 one of two types of algorithms:

 Atomic Update (EH=0)
  This hint indicates that the program is using a fetch and
  operate (e.g., fetch and add) or some similar algorithm
  and that all programs accessing the shared variable are
  likely to use a similar operation to access the shared
  variable for some time.
 Exclusive Access (EH=1)
  This hint indicates that the program is attempting to
  acquire a lock and if it succeeds, will perform another
  store to the lock variable (releasing the lock) before
  another program attempts to modify the lock variable.

It is not clear that C11 __atomic_compare_exchange ever implies
exclusive access in the PowerISA sense.

Finally, trying to combine atomic_compare_exchange_weak_acquire with
the relaxed-load spin is likely to generate awkward sequences with
unnecessary code.

I would prefer to proceed with this implementation and revisit when we
are further along with C11 integration.

> > diff --git a/sysdeps/powerpc/nptl/pthread_spin_unlock.c b/sysdeps/powerpc/nptl/pthread_spin_unlock.c
> > new file mode 100644
> > index 0000000..f830ad2
> > --- /dev/null
> > +++ b/sysdeps/powerpc/nptl/pthread_spin_unlock.c
> > @@ -0,0 +1,27 @@
> > +/* pthread_spin_unlock -- unlock a spin lock.  PowerPC version.
> > +   Copyright (C) 2007-2015 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <http://www.gnu.org/licenses/>.  */
> > +
> > +#include "pthreadP.h"
> > +#include <lowlevellock.h>
> 
> Do you really need to include lowlevellock.h?
> 
> > +int
> > +pthread_spin_unlock (pthread_spinlock_t *lock)
> > +{
> > +  atomic_store_release (lock, 0);
> > +  return 0;
> > +}
> 
> 


