This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCHv2] powerpc: Spinlock optimization and cleanup


On Thu, 2015-10-01 at 21:27 +0200, Torvald Riegel wrote:
> On Wed, 2015-09-30 at 11:28 -0500, Paul E. Murphy wrote:
> > Changes from V1:
> > 
> > * Use C macro for atomic_store_release as suggested in
> > comments.
> > 
> > * Run benchmarks to quantize the performance changes and
> > note in commit message.
> > 
> > ---8<---
> > This patch optimizes powerpc spinlock implementation by:
> > 
> > * The current algorithm spins on a lwzx, but only after issuing a
> >   lwarx first.  The optimization is to avoid that initial lwarx in
> >   the contended case, since it is much more costly than a normal load.
> > 
> > * Use the correct EH hint bit on the larx for supported ISAs.  For
> >   lock acquisition, the thread that acquired the lock with a
> >   successful stcx does not want to give away write ownership of the
> >   cacheline.  The idea is to make the load reservation "sticky" about
> >   retaining write authority to the line.  That way, the store that
> >   must inevitably come to release the lock can succeed quickly and
> >   not contend with other threads issuing lwarx.  If another thread
> >   does a store to the line (false sharing), the winning thread must
> >   give up write authority to it.  The proper value of EH for the
> >   larx for a lock acquisition is 1.
> > 
> > * Increase contended lock performance by up to 40%, with no
> >   measurable impact on uncontended locks on P8.
> 
> Could you add the tests you used to the glibc microbenchmarks (or
> whatever works best for them)?  We do want to be able to track
> performance, and having benchmarks is the first step towards that.
> 
> > It also adds some cleanup to use the defined acquire-semantics
> > instructions and to convert the function prototypes to default C
> > style.
> > 
> > Thanks to Adhemerval Zanella who did most of the work.  I've run some
> > tests, and addressed some minor feedback.
> > 
> > 2015-09-25  Paul E. Murphy  <murphyp@linux.vnet.ibm.com>
> > 
> > 	* sysdeps/powerpc/nptl/pthread_spin_lock.c (pthread_spin_lock):
> > 	Optimize first check for contention case and add lwarx hint.
> > 	* sysdeps/powerpc/nptl/pthread_spin_trylock.c (pthread_spin_trylock):
> > 	Use ANSI prototype.
> > 	* sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c: Move to ...
> > 	* sysdeps/powerpc/nptl/pthread_spin_unlock.c: ... here, and
> > 	update to new atomic macros.
> > ---
> >  sysdeps/powerpc/nptl/pthread_spin_lock.c           |   22 ++++++++-------
> >  sysdeps/powerpc/nptl/pthread_spin_trylock.c        |    3 +-
> >  sysdeps/powerpc/nptl/pthread_spin_unlock.c         |   27 +++++++++++++++++++
> >  .../unix/sysv/linux/powerpc/pthread_spin_unlock.c  |   28 --------------------
> >  4 files changed, 40 insertions(+), 40 deletions(-)
> >  create mode 100644 sysdeps/powerpc/nptl/pthread_spin_unlock.c
> >  delete mode 100644 sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
> > 
> > diff --git a/sysdeps/powerpc/nptl/pthread_spin_lock.c b/sysdeps/powerpc/nptl/pthread_spin_lock.c
> > index cc081f8..8a39da9 100644
> > --- a/sysdeps/powerpc/nptl/pthread_spin_lock.c
> > +++ b/sysdeps/powerpc/nptl/pthread_spin_lock.c
> > @@ -19,24 +19,26 @@
> >  #include "pthreadP.h"
> >  
> >  int
> > -pthread_spin_lock (lock)
> > -     pthread_spinlock_t *lock;
> > +pthread_spin_lock (pthread_spinlock_t *lock)
> >  {
> >    unsigned int __tmp;
> >  
> >    asm volatile (
> > -       "1:	lwarx	%0,0,%1\n"
> > +       "0:	lwzx    %0,0,%1\n"
> > +       "	cmpwi   0,%0,0\n"
> > +       "	bne	0b\n"
> > +       "1:	lwarx	%0,0,%1" MUTEX_HINT_ACQ "\n"
> >         "	cmpwi	0,%0,0\n"
> >         "	bne-	2f\n"
> >         "	stwcx.	%2,0,%1\n"
> >         "	bne-	2f\n"
> > -       "	isync\n"
> > -       "	.subsection 1\n"
> > -       "2:	lwzx	%0,0,%1\n"
> > -       "	cmpwi	0,%0,0\n"
> > -       "	bne	2b\n"
> > -       "	b	1b\n"
> > -       "	.previous"
> > +                __ARCH_ACQ_INSTR "\n"
> > +       "        .subsection 1\n"
> > +       "2:	lwzx    %0,0,%1\n"
> > +       "        cmpwi   0,%0,0\n"
> > +       "        bne     2b\n"
> > +       "        b       1b\n"
> > +       "        .previous"
> >         : "=&r" (__tmp)
> >         : "r" (lock), "r" (1)
> >         : "cr0", "memory");
> 
> Is this essentially a test-and-test-and-set implementation of a lock,
> with the addition of the MUTEX_HINT_ACQ hint?  If so, have you
> considered adding a variant of atomic_compare_exchange_weak_acquire or
> atomic_exchange_acquire that sets the hint, and then writing a C
> function that uses an atomic load in the outer test loop and the new
> cmpxchg/xchg variant in the inner loop?  That would make it easier to
> eventually merge this with the generic version of spinlocks.  Also,
> once we add spinning and randomized exponential backoff to the generic
> locks, powerpc would be able to benefit from those generic changes too
> (perhaps with some additional tuning).
> 
Paul is traveling today so I will answer.

It may be functionally possible to use

__atomic_compare_exchange_n ((mem), (expected), (desired),
	weak, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)

in this construct.

But the cmpxchg is a clumsy way to generate the PowerISA Acquire Import
Barrier, which has a very specific form.

Also it is not obvious how the MUTEX_HINT_ACQ applies to
__atomic_compare_exchange_n in general or even
__atomic_compare_exchange_n (*, *, *, *, __ATOMIC_ACQUIRE,
__ATOMIC_RELAXED).

In the PowerISA, MUTEX_HINT_ACQ means:

 The Load and Reserve instructions include an Exclusive
 Access hint (EH), which can be used to indicate
 that the instruction sequence being executed is implementing
 one of two types of algorithms:

 Atomic Update (EH=0)
  This hint indicates that the program is using a fetch and
  operate (e.g., fetch and add) or some similar algorithm
  and that all programs accessing the shared variable are
  likely to use a similar operation to access the shared
  variable for some time.
 Exclusive Access (EH=1)
  This hint indicates that the program is attempting to
  acquire a lock and if it succeeds, will perform another
  store to the lock variable (releasing the lock) before
  another program attempts to modify the lock variable.

It is not clear that C11 __atomic_compare_exchange ever implies
exclusive access in the PowerISA sense.

Finally, trying to combine atomic_compare_exchange_weak_acquire with
the relaxed-load spin is likely to generate awkward sequences with
unnecessary code.

I would prefer to proceed with this implementation and revisit when we
are further along with C11 integration.

> > diff --git a/sysdeps/powerpc/nptl/pthread_spin_unlock.c b/sysdeps/powerpc/nptl/pthread_spin_unlock.c
> > new file mode 100644
> > index 0000000..f830ad2
> > --- /dev/null
> > +++ b/sysdeps/powerpc/nptl/pthread_spin_unlock.c
> > @@ -0,0 +1,27 @@
> > +/* pthread_spin_unlock -- unlock a spin lock.  PowerPC version.
> > +   Copyright (C) 2007-2015 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <http://www.gnu.org/licenses/>.  */
> > +
> > +#include "pthreadP.h"
> > +#include <lowlevellock.h>
> 
> Do you really need to include lowlevellock.h?
> 
> > +int
> > +pthread_spin_unlock (pthread_spinlock_t *lock)
> > +{
> > +  atomic_store_release (lock, 0);
> > +  return 0;
> > +}
> 
> 


