Re: [PATCHv2] powerpc: Spinlock optimization and cleanup
- From: "Tulio Magno Quites Machado Filho" <tuliom@linux.vnet.ibm.com>
- To: "Paul E. Murphy" <murphyp@linux.vnet.ibm.com>
- Cc: libc-alpha@sourceware.org, triegel@redhat.com, rth@twiddle.net, Adhemerval Zanella <adhemerval.zanella@linaro.org>, Steve Munroe <sjmunroe@us.ibm.com>
- Date: Wed, 30 Sep 2015 17:31:00 -0300
- Subject: Re: [PATCHv2] powerpc: Spinlock optimization and cleanup
- References: <560C0DA6.5060409@linux.vnet.ibm.com>
"Paul E. Murphy" <murphyp@linux.vnet.ibm.com> writes:
> Changes from V1:
>
> * Use C macro for atomic_store_release as suggested in
> comments.
>
> * Run benchmarks to quantify the performance changes and
> note them in the commit message.
Thanks for resurrecting this patch!
> This patch optimizes powerpc spinlock implementation by:
>
> * The current algorithm spins on a lwzx, but only after first issuing a
> lwarx. The optimization is to avoid that first lwarx in the contended
> case (it is much more costly than normal loads).
>
> * Use the correct EH hint bit on the larx for supported ISA. For lock
s/larx/lwarx/
> acquisition, the thread that acquired the lock with a successful stcx
s/stcx/stwcx/
> does not want to give away the write ownership on the cacheline. The
> idea is to make the load reservation "sticky" about retaining write
> authority to the line. That way, the store that must inevitably come
> to release the lock can succeed quickly and not contend with other
> threads issuing lwarx. If another thread does a store to the line
> (false sharing), the winning thread must give up write authority to
Is something missing between the previous line and the next?
> The proper value of EH for the larx for a lock acquisition is 1.
Could you move part of this explanation to pthread_spin_lock.c as a comment,
please?
I'm referring only to the part where you explain the logic behind this
optimization.
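
Something along these lines, distilled from the commit message above, could serve as that comment (the wording is only a suggestion):

/* The EH (exclusive hint) bit on the lwarx asks the processor to
   retain write authority over the cache line after a successful
   stwcx., making the reservation "sticky".  That way the store
   that inevitably comes to release the lock can succeed quickly
   instead of contending with other threads issuing lwarx.  If
   another thread stores to the line (false sharing), the winning
   thread must give up write authority.  The proper EH value for a
   lock acquisition larx is 1.  */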
> * Increase contended lock performance by up to 40%, with no measurable
> impact on uncontended locks on P8.
>
> It also cleans up the code to use the defined acquire-semantics
> instructions and a default C-style function prototype.
>
> Thanks to Adhemerval Zanella who did most of the work. I've run some
> tests, and addressed some minor feedback.
>
> 2015-09-25 Paul E. Murphy <murphyp@linux.vnet.ibm.com>
As this patch is based on the work of Adhemerval, it's important to mention
him in the ChangeLog. Here's an example:
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=d3573f61a#patch1
> ---
> sysdeps/powerpc/nptl/pthread_spin_lock.c | 22 ++++++++-------
> sysdeps/powerpc/nptl/pthread_spin_trylock.c | 3 +-
> sysdeps/powerpc/nptl/pthread_spin_unlock.c | 27 +++++++++++++++++++
> .../unix/sysv/linux/powerpc/pthread_spin_unlock.c | 28 --------------------
> 4 files changed, 40 insertions(+), 40 deletions(-)
> create mode 100644 sysdeps/powerpc/nptl/pthread_spin_unlock.c
> delete mode 100644 sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
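
For reference, the new sysdeps/powerpc/nptl/pthread_spin_unlock.c presumably reduces to the atomic_store_release form suggested in the V1 feedback; its hunk is not quoted below, so this is only a sketch of the expected shape:

#include "pthreadP.h"

int
pthread_spin_unlock (pthread_spinlock_t *lock)
{
  /* atomic_store_release emits the release barrier (lwsync on
     powerpc) followed by a plain store, replacing hand-written
     asm.  */
  atomic_store_release (lock, 0);
  return 0;
}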
>
> diff --git a/sysdeps/powerpc/nptl/pthread_spin_lock.c b/sysdeps/powerpc/nptl/pthread_spin_lock.c
> index cc081f8..8a39da9 100644
> --- a/sysdeps/powerpc/nptl/pthread_spin_lock.c
> +++ b/sysdeps/powerpc/nptl/pthread_spin_lock.c
> @@ -19,24 +19,26 @@
> #include "pthreadP.h"
>
> int
> -pthread_spin_lock (lock)
> - pthread_spinlock_t *lock;
> +pthread_spin_lock (pthread_spinlock_t *lock)
> {
> unsigned int __tmp;
>
> asm volatile (
> - "1: lwarx %0,0,%1\n"
> + "0: lwzx %0,0,%1\n"
> + " cmpwi 0,%0,0\n"
> + " bne 0b\n"
> + "1: lwarx %0,0,%1" MUTEX_HINT_ACQ "\n"
> " cmpwi 0,%0,0\n"
> " bne- 2f\n"
> " stwcx. %2,0,%1\n"
> " bne- 2f\n"
> - " isync\n"
> - " .subsection 1\n"
> - "2: lwzx %0,0,%1\n"
> - " cmpwi 0,%0,0\n"
> - " bne 2b\n"
> - " b 1b\n"
> - " .previous"
> + __ARCH_ACQ_INSTR "\n"
> + " .subsection 1\n"
> + "2: lwzx %0,0,%1\n"
> + " cmpwi 0,%0,0\n"
> + " bne 2b\n"
> + " b 1b\n"
> + " .previous"
Some of the asm instructions are indented with spaces, while most of them
use tabs.
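
For anyone following the asm, the new sequence is roughly equivalent to the following C logic (a sketch using GCC __atomic builtins; the patch itself keeps the inline asm):

/* Spin on plain loads (lwzx) while the lock is held, and only
   attempt the reserving lwarx/stwcx. sequence once the lock looks
   free.  Illustrative only.  */
static int
spin_lock_sketch (volatile unsigned int *lock)
{
  for (;;)
    {
      /* Corresponds to the 0:/2: loops: contending threads spin on
         cheap plain loads and never issue lwarx.  */
      while (__atomic_load_n (lock, __ATOMIC_RELAXED) != 0)
        ;
      /* Corresponds to 1:: the lock looks free, so try to take it.
         The acquire CAS maps to lwarx/stwcx. followed by
         __ARCH_ACQ_INSTR on powerpc.  */
      unsigned int expected = 0;
      if (__atomic_compare_exchange_n (lock, &expected, 1, 0,
                                       __ATOMIC_ACQUIRE,
                                       __ATOMIC_RELAXED))
        return 0;
    }
}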
--
Tulio Magno