This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] [RFC] nptl: use compare and exchange for lll_cond_lock


On 25-09-2014 21:20, Adhemerval Zanella wrote:
> On 25-09-2014 15:00, Torvald Riegel wrote:
>> On Tue, 2014-09-23 at 14:45 -0300, Adhemerval Zanella wrote:
>>> While checking the generated code and macros used in generic lowlevellock.h,
>>> I noted powerpc and other archs use a compare and swap instead of a plain
>>> exchange on lll_cond_lock.
>>> I am not really sure which behavior would be desirable, since as far as I could
>>> tell they both have the same side effects (since lll_cond_lock, different
>>> from lll_lock, does not hold value of '1').
>> What do you mean by "[the function] does not hold value of '1'"?
> Bad wording in fact, I mean the 'futex' used in lll_cond_lock.
>
>>> So I am proposing this patch to sync the default implementation with what most
>>> architectures (ia64, ppc, s390, sparc, x86, hppa) use for lll_cond_lock.  I see
>>> that only microblaze and sh (I am not sure about this one; I am not well versed in
>>> its assembly and I'm being guided by its comment) use atomic_exchange_acq
>>> instead.
>> I think both versions work from a correctness POV, but doing an
>> unconditional exchange should be the right thing to do.
>>
>> The default implementation of __lll_lock_wait will test if the futex
>> variable equals 2, and if not, do an exchange right away before running
>> the FUTEX_WAIT syscall.  So if the CAS that you propose fails, the next
>> thing that will happen is an exchange.  Thus, it seems that we should do
>> the exchange right away.
>>
>> Thoughts?
> The only 'advantage' I see in using the compare and exchange version is that it might be
> an optimization on architectures that use LL/SC instead of a CAS instruction.  For
> instance on POWER, the exchange version is translated to:
>
>         li      r9,2
> 1:      lwarx   r10,r0,r3,1
>         stwcx.  r9,r0,r3
>         bne-    1b
>         isync
>
> And for compare and exchange:
>
>         li      r10,2
>         li      r9,0
> 1:      lwarx   r8,r0,r3,1
>         cmpw    r8,r9
>         bne     2f
>         stwcx.  r10,r0,r3
>         bne-    1b
> 2:      isync
>
> So in contended cases, if the lock is already taken, it avoids the store (which on
> POWER8 costs at least 10 cycles).

Does this analysis make sense? Also, I'm not sure which is better for x86_64 or other
architectures.
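
For anyone not used to reading the POWER sequences quoted above, the two fast
paths can be sketched in portable C11 atomics. This is only an illustration
under my own assumptions, not the glibc macros: the names cond_lock_xchg,
cond_lock_cas and compare_variants are hypothetical, the futex word is modeled
as a plain atomic_int (0 = unlocked, 1 = locked, 2 = locked with possible
waiters), and the slow path (__lll_lock_wait and the FUTEX_WAIT syscall) is
left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical fast path, variant A: unconditional exchange.
   Always stores 2; the lock was acquired only if the old value was 0. */
int cond_lock_xchg(atomic_int *futex)
{
    return atomic_exchange_explicit(futex, 2, memory_order_acquire) == 0;
}

/* Hypothetical fast path, variant B: compare and exchange 0 -> 2.
   When the lock is already held the comparison fails and nothing is stored. */
int cond_lock_cas(atomic_int *futex)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        futex, &expected, 2, memory_order_acquire, memory_order_relaxed);
}

/* Self-check showing where the two variants differ and where they agree. */
void compare_variants(void)
{
    atomic_int a = 1, b = 1;   /* lock held, no waiters recorded */

    assert(!cond_lock_xchg(&a));
    assert(atomic_load(&a) == 2);   /* the exchange already marked waiters */

    assert(!cond_lock_cas(&b));
    assert(atomic_load(&b) == 1);   /* the failed CAS left the word alone */

    atomic_int c = 0, d = 0;   /* lock free */
    assert(cond_lock_xchg(&c) && atomic_load(&c) == 2);
    assert(cond_lock_cas(&d) && atomic_load(&d) == 2);
}
```

In the uncontended case both variants end with the word at 2. They differ only
when the lock is held: the exchange writes 2 immediately, while the failed CAS
leaves that store to whatever the slow path does next, which is the trade-off
discussed above.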

[1] Intel 64 and IA-32 Architectures Optimization Reference Manual

