This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] [RFC] nptl: use compare and exchange for lll_cond_lock
- From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- To: libc-alpha at sourceware dot org, Torvald Riegel <triegel at redhat dot com>
- Date: Fri, 10 Oct 2014 08:43:43 -0300
- Subject: Re: [PATCH] [RFC] nptl: use compare and exchange for lll_cond_lock
- Authentication-results: sourceware.org; auth=none
- References: <5421B1C2 dot 9020509 at linux dot vnet dot ibm dot com> <1411668008 dot 22112 dot 67 dot camel at triegel dot csb> <5424B160 dot 3000306 at linux dot vnet dot ibm dot com>
On 25-09-2014 21:20, Adhemerval Zanella wrote:
> On 25-09-2014 15:00, Torvald Riegel wrote:
>> On Tue, 2014-09-23 at 14:45 -0300, Adhemerval Zanella wrote:
>>> While checking the generated code and macros used in generic lowlevellock.h,
>>> I noted powerpc and other archs use a compare and swap instead of a plain
>>> exchange on lll_cond_lock.
>>> I am not really sure which behavior would be desirable, since as far as I could
>>> see they both have the same side effects (since lll_cond_lock, different
>>> from lll_lock, does not hold value of '1').
>> What do you mean by "[the function] does not hold value of '1'"?
> Bad wording in fact, I mean the 'futex' used in lll_cond_lock.
>
>>> So I am proposing this patch to sync the default implementation with what most
>>> architectures (ia64, ppc, s390, sparc, x86, hppa) use for lll_cond_lock. I see
>>> that only microblaze and sh (I am not sure about this one, I am not well versed in
>>> its assembly and I'm being guided by its comment) use atomic_exchange_acq
>>> instead.
>> I think both versions work from a correctness POV, but doing an
>> unconditional exchange should be the right thing to do.
>>
>> The default implementation of __lll_lock_wait will test if the futex
>> variable equals 2, and if not, do an exchange right away before running
>> the FUTEX_WAIT syscall. So if the CAS that you propose fails, the next
>> thing that will happen is an exchange. Thus, it seems that we should do
>> the exchange right away.
>>
>> Thoughts?
> The only 'advantage' I see in using the compare and exchange version is that it might be
> an optimization on architectures that use LL/SC instead of a CAS instruction. For
> instance on POWER, the exchange version is translated to:
>
> li r9,2
> 1: lwarx r10,r0,r3,1
> stwcx. r9,r0,r3
> bne- 1b
> isync
>
> And for compare and exchange:
>
> li r10,2
> li r9,0
> 1: lwarx r8,r0,r3,1
> cmpw r8,r9
> bne 2f
> stwcx. r10,r0,r3
> bne- 1b
> 2: isync
>
> So for contended cases, if the lock is already taken it avoids the store (which on
> POWER8 costs at least 10 cycles).
Does this analysis make sense? Also, I'm not sure which is better for x86_64 or other
architectures.
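
To compare the two approaches on x86_64 or any other architecture, one
option is to compile a small portable sketch and inspect the generated
code. The version below uses C11 <stdatomic.h> rather than glibc's internal
atomic_exchange_acq / atomic_compare_and_exchange macros, so all function
names in it are hypothetical and it is only an approximation of
lll_cond_lock. The slow path follows Torvald's description of
__lll_lock_wait above, and fake_futex_wait is a stand-in for the FUTEX_WAIT
syscall that simply pretends the owner released the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Futex word states as discussed in this thread:
   0 = unlocked, 1 = locked with no waiters, 2 = locked, maybe waiters. */

/* Variant A: unconditional exchange.  Always stores 2, even when the
   lock is already held.  Returns true if the lock was free. */
static bool cond_lock_xchg(atomic_int *futex)
{
    return atomic_exchange_explicit(futex, 2, memory_order_acquire) == 0;
}

/* Variant B: compare-and-exchange 0 -> 2.  Stores only when the lock
   is free, so a held lock is left untouched on failure. */
static bool cond_lock_cas(atomic_int *futex)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        futex, &expected, 2, memory_order_acquire, memory_order_relaxed);
}

/* Hypothetical stand-in for FUTEX_WAIT: instead of sleeping, pretend
   the current owner unlocked while we were waiting. */
static void fake_futex_wait(atomic_int *futex, int val)
{
    (void) val;
    atomic_store_explicit(futex, 0, memory_order_release);
}

/* Slow-path sketch: wait only if the word is already 2, otherwise do
   the exchange right away and wait while the old value is non-zero. */
static void lock_wait_sketch(atomic_int *futex)
{
    if (atomic_load_explicit(futex, memory_order_relaxed) == 2)
        fake_futex_wait(futex, 2);
    while (atomic_exchange_explicit(futex, 2, memory_order_acquire) != 0)
        fake_futex_wait(futex, 2);
}
```

With the futex word at 1, cond_lock_cas fails and leaves it at 1, but the
slow path then stores 2 with an exchange anyway, which is the argument for
doing the exchange up front. On x86 the exchange is normally emitted as
xchg and the CAS as lock cmpxchg, so the LL/SC store-avoidance argument
would need to be checked separately there.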