[PATCH 2/2] S390: Test if lock is free before using atomic instruction in spin-lock.

Stefan Liebler stli@linux.vnet.ibm.com
Fri Dec 16 17:08:00 GMT 2016


On 12/09/2016 04:16 PM, Stefan Liebler wrote:
> On 11/24/2016 02:04 PM, Torvald Riegel wrote:
>> On Wed, 2016-11-23 at 16:09 +0100, Stefan Liebler wrote:
>>> This patch changes the behaviour of pthread_spin_lock on s390:
>>> Instead of spinning on the lock with the compare-and-swap instruction,
>>> the atomic_compare_and_exchange_bool_acq macro is used.
>>> The s390 atomic_compare_and_exchange_bool_acq macro, when called with
>>> constant zero as oldval, first compares the lock with normal
>>> instructions.  If it is free, the compare-and-swap instruction is used
>>> to acquire the lock.  While the lock is held by one CPU, other CPUs
>>> will not exclusively lock the lock's memory in a loop.  When the lock
>>> is released by another CPU, this is observed via the normal load, and
>>> the lock is then acquired with a compare-and-swap instruction.
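>>>
>>> As a rough illustration (a hand-written sketch of the effective lock
>>> loop, not the actual s390 macro; my_spin_lock is a hypothetical name):
>>>
>>> int
>>> my_spin_lock (pthread_spinlock_t *lock)
>>> {
>>>   while (1)
>>>     {
>>>       /* Plain load first; only if the lock looks free, attempt the
>>>          exclusive compare-and-swap (which returns 0 on success).  */
>>>       if (*lock == 0
>>>           && !atomic_compare_and_exchange_bool_acq (lock, 1, 0))
>>>         return 0;
>>>     }
>>> }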
>>
>> I don't want to have more arch-specific code than we really need.
>> Please compare what you have against the generic spinlock code.  If you
>> find the generic spinlock code lacking, then please propose a change for
>> it in a way that does not make things worse for other architectures.
>>
>> If there are arch-specific properties that significantly affect the
>> approach the generic spinlock takes (eg, assumptions about CAS vs
>> atomic-exchange runtime overheads), then please split them out.
>>
>> In the long term, this will also benefit s390.  For example, the
>> spinlock code you have has no backoff, and introduces spinning with
>> loads in a pretty ugly way through the hack you have added to the CAS
>> (first loading the lock's value and comparing it manually if the
>> supplied argument is a constant).
>> While the generic spinlock code we have is also not very great,
>> improving it is what will make life easier for the whole glibc project.
>> Also, if someone else improves the generic spinlock code, s390 would
>> miss out on this if you have added your custom spinlock code.
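>>
>> For illustration, spinning with loads plus simple exponential backoff
>> could look like the following sketch (it uses glibc's atomic macros,
>> but is not existing glibc code):
>>
>> while (atomic_exchange_acq (lock, 1) != 0)
>>   {
>>     int delay = 1;
>>     /* Spin read-only until the lock looks free; back off between
>>        reads to reduce cache-line ping-pong.  */
>>     while (atomic_load_relaxed (lock) != 0)
>>       {
>>         for (int i = 0; i < delay; i++)
>>           atomic_spin_nop ();
>>         if (delay < 64)
>>           delay *= 2;
>>       }
>>   }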
>>
>
> I had a look at the assembler output of the generic spinlock code, e.g.:
> int
> pthread_spin_trylock (pthread_spinlock_t *lock)
> {
>   return atomic_exchange_acq (lock, 1) ? EBUSY : 0;
> }
>
> In the s390x assembler output, a new stack frame is generated; the lock
> value is loaded, stored to the stack, loaded back from the stack, and
> then passed to the cs instruction.
> After the cs instruction, the "old value" is stored to the stack, loaded
> back from the stack, and then compared against zero.
>
> I also had a look at the aarch64 pthread_spin_trylock.os compiled with
> the build-many script and GCC 6.2.
> aarch64 uses the __atomic builtins for atomic_exchange_acq,
> atomic_compare_and_exchange_val_acq, etc.
> atomic_exchange_acq results in the expected exclusive load/store pair,
> but the old value is then also stored to the stack and immediately
> loaded back in order to compare it against zero.
>
> The type pthread_spinlock_t is a volatile int!
> If lock is cast to (int *) before being passed to the atomic macros,
> the assembler output looks fine.
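>
> For example, in the trylock (a sketch of the workaround, not a
> committed change):
>
> int
> pthread_spin_trylock (pthread_spinlock_t *lock)
> {
>   /* Cast away the volatile qualifier so that the macro's __typeof
>      temporary is a plain int rather than a volatile int.  */
>   return atomic_exchange_acq ((int *) lock, 1) ? EBUSY : 0;
> }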
>
> If I change the current generic spinlock code, do I have to rewrite it
> to use the C11-like macros?
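>
> (If so, a call like atomic_compare_and_exchange_bool_acq (lock, 1, 0)
> would, as far as I understand, become something like:
>
> int expected = 0;
> atomic_compare_exchange_weak_acquire (lock, &expected, 1);
>
> i.e. the C11-like macros take a pointer to the expected value and
> return whether the exchange succeeded.)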
>
> I've tested the following function in advance:
> int
> foo (pthread_spinlock_t *lock)
> {
>   return atomic_load_acquire (lock);
> }
>
> With the __atomic_load_n version of atomic_load_acquire, the assembler
> output contains a single load whose result is returned, as expected.
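>
> (That version boils down to something like
> __atomic_load_n (lock, __ATOMIC_ACQUIRE), which the compiler can return
> directly without a temporary on the stack.)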
>
> The non-C11 macro results in the following:
>    0:   58 10 20 00             l       %r1,0(%r2)
>    4:   b3 c1 00 0f             ldgr    %f0,%r15
>    8:   e3 f0 ff 58 ff 71       lay     %r15,-168(%r15)
>    e:   50 10 f0 a4             st      %r1,164(%r15)
>   12:   58 10 f0 a4             l       %r1,164(%r15)
>   16:   50 10 f0 a0             st      %r1,160(%r15)
>   1a:   58 20 f0 a0             l       %r2,160(%r15)
>   1e:   b3 cd 00 f0             lgdr    %r15,%f0
>   22:   b9 14 00 22             lgfr    %r2,%r2
>   26:   07 fe                   br      %r14
> The extra stores to and loads from the stack result from the
> "__typeof (*(mem)) __atg101_val" temporaries in the atomic_load_*
> macros in conjunction with the volatile qualifier.
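>
> A reduced sketch of the issue (a hypothetical macro mirroring the glibc
> pattern, not the real one): with mem of type volatile int *,
> __typeof (*(mem)) is volatile int, so the temporary itself is volatile
> and the compiler has to keep it in memory:
>
> #define load_acquire(mem)                        \
>   ({ __typeof (*(mem)) __val = *(mem);           \
>      __atomic_thread_fence (__ATOMIC_ACQUIRE);   \
>      __val; })  /* volatile __val forces the stack store/load */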
>
> As this behaviour is not s390-specific, I've grepped to see which
> architectures use the generic spinlock code and the C11-like atomic
> macros:
> sysdeps/hppa/nptl/pthread_spin_lock.c:18:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
> sysdeps/unix/sysv/linux/hppa/atomic-machine.h:40:#define USE_ATOMIC_COMPILER_BUILTINS 0
>
> sysdeps/microblaze/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
> sysdeps/microblaze/atomic-machine.h:39:#define USE_ATOMIC_COMPILER_BUILTINS 0
>
> sysdeps/aarch64/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
> sysdeps/aarch64/atomic-machine.h:40:#define USE_ATOMIC_COMPILER_BUILTINS 1
>
> sysdeps/mips/nptl/pthread_spin_lock.c:18:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
> sysdeps/mips/atomic-machine.h:95:#define USE_ATOMIC_COMPILER_BUILTINS 1
> sysdeps/mips/atomic-machine.h:216:#define USE_ATOMIC_COMPILER_BUILTINS 0
>
> sysdeps/nios2/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
> sysdeps/unix/sysv/linux/nios2/atomic-machine.h:35:#define USE_ATOMIC_COMPILER_BUILTINS 0
>
> sysdeps/arm/nptl/pthread_spin_lock.c:18:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
> sysdeps/arm/atomic-machine.h:37:#define USE_ATOMIC_COMPILER_BUILTINS 0
>
> sysdeps/m68k/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
> sysdeps/m68k/m680x0/m68020/atomic-machine.h:48:#define USE_ATOMIC_COMPILER_BUILTINS 0
> sysdeps/m68k/coldfire/atomic-machine.h:54:#define USE_ATOMIC_COMPILER_BUILTINS 0
> sysdeps/unix/sysv/linux/m68k/coldfire/atomic-machine.h:40:#define USE_ATOMIC_COMPILER_BUILTINS 0
>
> What is your suggestion on how to proceed with the volatile int type in
> conjunction with the atomic macros?
>
> Bye Stefan
>
This patch is no longer needed, as I've posted adjusted generic
spinlock code:
[PATCH 1/2] Optimize generic spinlock code and use C11 like atomic macros.
https://www.sourceware.org/ml/libc-alpha/2016-12/msg00617.html

[PATCH 2/2] S390: Use generic spinlock code.
https://www.sourceware.org/ml/libc-alpha/2016-12/msg00618.html

Please have a look.

Bye.
Stefan


