This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCHv2] powerpc: Spinlock optimization and cleanup
- From: "Paul E. Murphy" <murphyp at linux dot vnet dot ibm dot com>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, rth at twiddle dot net, Tulio Magno Quites Machado Filho <tuliom at linux dot vnet dot ibm dot com>, Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, Steve Munroe <sjmunroe at us dot ibm dot com>
- Date: Wed, 7 Oct 2015 14:00:09 -0500
- Subject: Re: [PATCHv2] powerpc: Spinlock optimization and cleanup
- Authentication-results: sourceware.org; auth=none
- References: <560C0DA6 dot 5060409 at linux dot vnet dot ibm dot com> <1443727671 dot 30828 dot 420 dot camel at localhost dot localdomain>
On 10/01/2015 02:27 PM, Torvald Riegel wrote:
> On Wed, 2015-09-30 at 11:28 -0500, Paul E. Murphy wrote:
>> Changes from V1:
>>
>> * Use C macro for atomic_store_release as suggested in
>> comments.
>>
>> * Run benchmarks to quantize the performance changes and
>> note in commit message.
>>
>> ---8<---
>> This patch optimizes powerpc spinlock implementation by:
>>
>> * Current algorithm spin over a lwzx, but only after issuing a lwarx.
>> first. The optimization for such case is to avoid the first lwarx
>> in contention case (which is much more costly than normal loads).
>>
>> * Use the correct EH hint bit on the larx for supported ISA. For lock
>> acquisition, the thread that acquired the lock with a successful stcx
>> do not want to give away the write ownership on the cacheline. The
>> idea is to make the load reservation "sticky" about retaining write
>> authority to the line. That way, the store that must inevitably come
>> to release the lock can succeed quickly and not contend with other
>> threads issuing lwarx. If another thread does a store to the line
>> (false sharing), the winning thread must give up write authority to
>> The proper value of EH for the larx for a lock acquisition is 1.
>>
>> * Increase contented lock performance by up to 40%, and no measurable
>> impact on uncontended locks on P8.
>
> Could you add the tests you used to the glibc microbenchmarks (or
> whatever works best for them)? We do want to be able to track
> performance, and having benchmarks is the first step towards that.
Would you object to that being a separate patch?
As I sketch out the work to integrate it into our benchtest, it seems
like it could be trivially extended to mutexes aswell.
>> diff --git a/sysdeps/powerpc/nptl/pthread_spin_unlock.c b/sysdeps/powerpc/nptl/pthread_spin_unlock.c
>> new file mode 100644
>> index 0000000..f830ad2
>> --- /dev/null
>> +++ b/sysdeps/powerpc/nptl/pthread_spin_unlock.c
>> @@ -0,0 +1,27 @@
>> +/* pthread_spin_unlock -- unlock a spin lock. PowerPC version.
>> + Copyright (C) 2007-2015 Free Software Foundation, Inc.
>> + This file is part of the GNU C Library.
>> +
>> + The GNU C Library is free software; you can redistribute it and/or
>> + modify it under the terms of the GNU Lesser General Public
>> + License as published by the Free Software Foundation; either
>> + version 2.1 of the License, or (at your option) any later version.
>> +
>> + The GNU C Library is distributed in the hope that it will be useful,
>> + but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + Lesser General Public License for more details.
>> +
>> + You should have received a copy of the GNU Lesser General Public
>> + License along with the GNU C Library; if not, see
>> + <http://www.gnu.org/licenses/>. */
>> +
>> +#include "pthreadP.h"
>> +#include <lowlevellock.h>
>
> Do you really need to include lowlevellock.h?
No, atomic.h is needed at minimum.
Thanks,
Paul