This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: PowerPC: libc single-thread lock optimization
- From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- To: libc-alpha at sourceware dot org
- Date: Fri, 02 May 2014 10:27:27 -0300
- Subject: Re: PowerPC: libc single-thread lock optimization
- Authentication-results: sourceware.org; auth=none
- References: <5343F8F1 dot 4000400 at linux dot vnet dot ibm dot com> <1399036287 dot 32485 dot 6041 dot camel at triegel dot csb>
Hi Torvald, thanks for the review. I have dropped this patch and sent another one that focuses
the optimization only on lowlevellock.h instead of changing atomic.h:
https://sourceware.org/ml/libc-alpha/2014-04/msg00671.html
https://sourceware.org/ml/libc-alpha/2014-04/msg00672.html
On 02-05-2014 10:11, Torvald Riegel wrote:
> On Tue, 2014-04-08 at 10:26 -0300, Adhemerval Zanella wrote:
>> This patch adds a single-thread optimization for libc.so locks used
>> within the shared object. For each lock operation it checks whether
>> the process has already spawned a thread and, if not, uses non-atomic
>> operations. Other libraries (libpthread.so for instance) are unaffected
>> by this change.
>>
>> This is similar to x86_64 optimization on locks and atomics by using the
>> __libc_multiple_threads variable.
>>
>> Tested on powerpc32, powerpc64, and powerpc64le.
>>
>> Note: for the macro code changes I tried to alter the current syntax
>> as little as possible.
>>
>> --
>>
>> * nptl/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h
>> (__lll_robust_trylock): Add single-thread lock optimization for calls
>> within libc.so.
> See comment below.
>
>> * sysdeps/powerpc/bits/atomic.h
>> (__arch_compare_and_exchange_val_32_acq): Likewise.
>> (__arch_compare_and_exchange_val_32_rel): Likewise.
>> (__arch_atomic_exchange_32_acq): Likewise.
>> (__arch_atomic_exchange_32_rel): Likewise.
>> (__arch_atomic_exchange_and_add_32): Likewise.
>> (__arch_atomic_increment_val_32): Likewise.
>> (__arch_atomic_decrement_val_32): Likewise.
>> (__arch_atomic_decrement_if_positive_32): Likewise.
>> * sysdeps/powerpc/powerpc32/bits/atomic.h
>> (__arch_compare_and_exchange_bool_32_acq): Likewise.
>> (__arch_compare_and_exchange_bool_32_rel): Likewise.
>> * sysdeps/powerpc/powerpc64/bits/atomic.h
>> (__arch_compare_and_exchange_bool_32_acq): Likewise.
>> (__arch_compare_and_exchange_bool_32_rel): Likewise.
>> (__arch_compare_and_exchange_bool_64_acq): Likewise.
>> (__arch_compare_and_exchange_bool_64_rel): Likewise.
>> (__arch_compare_and_exchange_val_64_acq): Likewise.
>> (__arch_compare_and_exchange_val_64_rel): Likewise.
>> (__arch_atomic_exchange_64_acq): Likewise.
>> (__arch_atomic_exchange_64_rel): Likewise.
>> (__arch_atomic_exchange_and_add_64): Likewise.
>> (__arch_atomic_increment_val_64): Likewise.
>> (__arch_atomic_decrement_val_64): Likewise.
>> (__arch_atomic_decrement_if_positive_64): Likewise.
> I think the optimization you attempt here does not belong into the
> low-level atomics, as I've argued elsewhere in the thread.
>
>> ---
>>
>> diff --git a/nptl/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h b/nptl/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h
>> index ab92c3f..419ee2f 100644
>> --- a/nptl/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h
>> +++ b/nptl/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h
>> @@ -205,7 +205,9 @@
>> /* Set *futex to ID if it is 0, atomically. Returns the old value */
>> #define __lll_robust_trylock(futex, id) \
>> ({ int __val; \
>> - __asm __volatile ("1: lwarx %0,0,%2" MUTEX_HINT_ACQ "\n" \
>> + if (!SINGLE_THREAD_P) \
>> + __asm __volatile ( \
>> + "1: lwarx %0,0,%2" MUTEX_HINT_ACQ "\n" \
>> " cmpwi 0,%0,0\n" \
>> " bne 2f\n" \
>> " stwcx. %3,0,%2\n" \
>> @@ -214,6 +216,12 @@
>> : "=&r" (__val), "=m" (*futex) \
>> : "r" (futex), "r" (id), "m" (*futex) \
>> : "cr0", "memory"); \
>> + else \
>> + { \
>> + __val = *futex; \
>> + if (__val == 0) \
>> + *futex = id; \
>> + } \
>> __val; \
>> })
>>
> The single client of this is in pthread_mutex_trylock.c. If there's
> actually a need to do this, it belongs here, not in each arch's
> lowlevellock.h.
>
>