This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] powerpc: Spinlock optimization and cleanup
- From: Torvald Riegel <triegel at redhat dot com>
- To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- Cc: "GNU C. Library" <libc-alpha at sourceware dot org>
- Date: Fri, 27 Feb 2015 13:20:49 +0100
- Subject: Re: [PATCH] powerpc: Spinlock optimization and cleanup
- Authentication-results: sourceware.org; auth=none
- References: <54EF91B2 dot 3020704 at linux dot vnet dot ibm dot com>
On Thu, 2015-02-26 at 18:35 -0300, Adhemerval Zanella wrote:
> This patch optimizes powerpc spinlock implementation by:
> * Current algorithm spin over a lwzx, but only after issuing a lwarx.
> first. The optimization for such case is to avoid the first lwarx
> in contention case (which is much more costly than normal loads).
IOW, the previous code was TAS-and-test-and-TAS loop, and you're
changing it into a test-and-TAS loop?
Are you sure that this gives enough benefit for the contended case to
offset potential disadvantages for the uncontended case in which the
lock is not yet in the cache? That is, whether the unnecessary
transition to write ownership when the lock is already acquired is more
important to avoid than the potential transition to shared first
followed by a transition to ownership?
(I'm not familiar with powerpc enough to answer any of that, I just want
to make sure the reason for our decisions are clear and documented.
I've done a few measurements on other architectures in the past, and
there the test-and-TAS wasn't necessarily better...)
Do you have benchmarks to show the effects? In this particular case,
I'd trust the machine maintainers with keeping the decisions we make
up-to-date wrt current hardware (ie, maintain the code so that we always
make the best trade-offs in the future); but it would be even better if
we had benchmarks.
Have you investigated with adding back-off for the contended case?
What are your assumptions about the workload and the relative importance
of individual cases (e.g., likelihood of contended vs. uncontended)? We
should agree on those globally (ie, across archs), and document them.
> * Use the correct EH hint bit on the larx for supported ISA. For lock
> acquisition, the thread that acquired the lock with a successful stcx
> do not want to give away the write ownership on the cacheline. The
> idea is to make the load reservation "sticky" about retaining write
> authority to the line. That way, the store that must inevitably come
> to release the lock can succeed quickly and not contend with other
> threads issuing lwarx. If another thread does a store to the line
> (false sharing), the winning thread must give up write authority to
> The proper value of EH for the larx for a lock acquisition is 1.
I would strongly prefer if we could use C code for our locks, including
Ignoring the cacheline ownership hint, is there any reason to use
assembly here (with either the current assembly implementation of
atomics that we have, or the new GCC __atomic builtins)? If so, what
could / would we need to fix in these implementations of atomics? Is
there a measurable difference?
Regarding the ownership hint, what about extending the atomics that we
have with support for that? This hint could be useful in other
concurrent code too, and we don't want to have to clone the, say, whole
rwlock implementation for powerpc just to add a few such hints.
> diff --git a/sysdeps/powerpc/nptl/pthread_spin_unlock.c b/sysdeps/powerpc/nptl/pthread_spin_unlock.c
> new file mode 100644
> index 0000000..7af694f
> --- /dev/null
> +++ b/sysdeps/powerpc/nptl/pthread_spin_unlock.c
> @@ -0,0 +1,28 @@
> +/* pthread_spin_unlock -- unlock a spin lock. PowerPC version.
> + Copyright (C) 2007-2015 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +#include "pthreadP.h"
> +#include <lowlevellock.h>
> +pthread_spin_unlock (pthread_spinlock_t *lock)
> + __asm __volatile (__ARCH_REL_INSTR ::: "memory");
> + *lock = 0;
> + return 0;
Is there any reason to not use "atomic_store_release (lock)" here? Or
adapt the generic pthread_spin_unlock to use just a release barrier
(which I'd be open to doing too), and get rid of the powerpc-specific