This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] powerpc: remove linux lowlevellock.h
- From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- To: libc-alpha at sourceware dot org
- Date: Thu, 25 Sep 2014 18:19:57 -0300
- Subject: Re: [PATCH] powerpc: remove linux lowlevellock.h
- Authentication-results: sourceware.org; auth=none
- References: <5421AF5E dot 8030402 at linux dot vnet dot ibm dot com> <1411554406 dot 27838 dot 93 dot camel at triegel dot csb> <5422B122 dot 8040803 at linux dot vnet dot ibm dot com> <1411565365 dot 27838 dot 101 dot camel at triegel dot csb> <5422CE1C dot 9030201 at linux dot vnet dot ibm dot com> <1411573079 dot 22112 dot 7 dot camel at triegel dot csb>
On 24-09-2014 12:37, Torvald Riegel wrote:
> On Wed, 2014-09-24 at 10:58 -0300, Adhemerval Zanella wrote:
>> On 24-09-2014 10:29, Torvald Riegel wrote:
>>> On Wed, 2014-09-24 at 08:55 -0300, Adhemerval Zanella wrote:
>>>> On 24-09-2014 07:26, Torvald Riegel wrote:
>>>>> On Tue, 2014-09-23 at 14:35 -0300, Adhemerval Zanella wrote:
>>>>>> diff --git a/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c b/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
>>>>>> index 2b8c84d..4906205 100644
>>>>>> --- a/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
>>>>>> +++ b/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
>>>>>> @@ -22,7 +22,7 @@
>>>>>> pthread_spin_unlock (pthread_spinlock_t *lock)
>>>>>> - __asm __volatile (__lll_rel_instr ::: "memory");
>>>>>> + __asm __volatile (__ARCH_REL_INSTR ::: "memory");
>>>>>> *lock = 0;
>>>>>> return 0;
>>>>> I'd prefer if this would be a memory_order_release barrier (which we
>>>>> don't have yet, I know, so bear with me). However, I see that the
>>>>> current situation is a bit messy:
>>>>> * It seems that atomic_write_barrier is intended to be a C11
>>>>> memory_order_release barrier/fence; at least it's used like that in all
>>>>> what I saw in glibc. So atomic_write_barrier would be our current way
>>>>> to express a release barrier.
>>>>> * However, PowerPC atomic_write_barrier is "eieio" and is not using
>>>>> __ARCH_REL_INSTR. Can you clarify the difference?
>>>> These two pages   have detailed information about C11/C++11 memory order
>>>> mappings to PowerPC instructions.
>>> I'm aware of these ...
>>>> As you can see the store release is indeed 'lwsync; st'. First link also
>>>> gives you an example of a spinlock implementation, however it is using a
>>>> full memory barrier (sync) instead of a write one (lwsync). It is better
>>>> explained in second link:
>>>> "If the critical section contains only normal cached memory operations,
>>>> the lwsync barrier may be substituted for the heavier-weight sync instruction
>>>> Non-cached memory are handled by kernel and exported to application only in very
>>>> specific situations (and afaik C11/C++11 specifications don't even handle it).
>>>> The idea is a process won't try to run a spinlock in a DMA area.
>>>> Also the 'Synchronizes With' gives some explanation about 'eieio' and 'lwsync'
>>>> relation. My understanding is 'lwsync' is the correct memory order instruction
>>>> to use in such cases.
>>>>  http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
>>>>  http://www.rdrop.com/users/paulmck/scalability/paper/N2745r.2011.03.04a.html
>>> ... but not of the difference between eieio and lwsync. AFAICT, N2745
>>> states that eieio really applies only to stores.
>>> What are your thoughts about changing atomic_write_barrier to eieio?
>>> Are you aware of any prior reasoning about that (e.g., whether
>>> atomic_write_barrier really only needs to prevent reordering of writes)?
>>> Would there be a significant performance degradation?
>> Reading again my last message I think I wasn't clear: my understanding of
>> 'eieio' is to be a safe atomic_write_barrier for cacheable and cache inhibited
>> memory. And it is preferred over 'sync' for performance reasons.
>> Now, for cacheable memory (default memory allocated from kernel), a 'lwsync'
>> is a safe atomic_write_barrier and it also performance better than 'eieio'
>> (and AFAIK GCC won't emit 'eieio' for C11/C++11 atomic operations). That's why
>> I see no need to change it to 'eieio' in the pthread_spin_unlock.
> Sorry, when I wrote "changing atomic_write_barrier to eieio" I actually
> meant changing it to using lwsync :)
> So, what are your thoughts about that?
Right, I thought you were solely referring to pthread_spin_unlock. About the
spinlock itself, now I understand the reasoning and I do agree that about
replace the asm with a macro to emit a release barrier (which for POWER
would be the same anyway). However, I would like to address it in another
Now for atomic_write_barrier, as I understand, was added in GLIBC as
memory_order_seq_cst semantics. However, as you have noted it is acting in
practice as release semantic; so am not sure about changing to 'lwsync'
I am checking with Steven Monroe and Paul E. McKenney if I overlooked
something here and why exactly they chose 'eieio'.
>>>>> I'm interested in having a release barrier there so that this won't get
>>>>> overlooked once we do the actual transition to C11 atomics.
>>>>> Also, related to that, it would be ideal if we could use the generic
>>>>> implementation (nptl/pthread_spin_unlock.c) on Power as well. However,
>>>>> this one uses atomic_full_barrier, which is stronger than what's used on
>>>>> the major archs (x86, Power, ARM).
>>> BTW, I've noticed that ARM uses the generic spin_lock, so ARM uses a
>>> full barrier effectively.
>> A full barrier do work on PowerPC, the specific optimization to use the atomic
>> write 'lwsync' was added later.
> Of course the full barrier would work as well. My concern is about what
> glibc should provide: current practice differs on major archs (x86 and
> POWER) is weaker than what POSIX requires -- yet the right semantics in
> my opinion. To me, this indicates that this would work as default in
> practice, and that the default implementation could use a release
> barrier too.
I do agree.
>>>>> I strongly believe this should use a release barrier, despite what POSIX
>>>>> is claiming (see #16432), but we'll have to deal with that separately.
>>>> I believe this is exactly why we have a specific POWER implementation: atomic
>>>> full barrier in powerpc are 'sync' and for this spinlock cases 'lwsync' is suffice.
>>> But it shouldn't be POWER-specific, IMO. Instead, the generic
>>> implementation should have a release barrier only (so, translating to
>>> lwsync on power), and all archs negatively affected by this should
>>> update their release barriers to be proper release barriers.
>> I agree with this definition. Do you have any updated on discussion you raise in
>> comment #1 in BZ#16432?
> No I haven't. I'll think about reviving the discussion.