This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] powerpc: remove linux lowlevellock.h

From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
To: libc-alpha at sourceware dot org
Date: Mon, 29 Sep 2014 16:13:13 -0300
Subject: Re: [PATCH] powerpc: remove linux lowlevellock.h
Authentication-results: sourceware.org; auth=none
References: <5421AF5E dot 8030402 at linux dot vnet dot ibm dot com> <1411554406 dot 27838 dot 93 dot camel at triegel dot csb> <5422B122 dot 8040803 at linux dot vnet dot ibm dot com> <1411565365 dot 27838 dot 101 dot camel at triegel dot csb> <5422CE1C dot 9030201 at linux dot vnet dot ibm dot com> <1411573079 dot 22112 dot 7 dot camel at triegel dot csb> <542486FD dot 9080009 at linux dot vnet dot ibm dot com>

On 25-09-2014 18:19, Adhemerval Zanella wrote:
> On 24-09-2014 12:37, Torvald Riegel wrote:
>> On Wed, 2014-09-24 at 10:58 -0300, Adhemerval Zanella wrote:
>>> On 24-09-2014 10:29, Torvald Riegel wrote:
>>>> On Wed, 2014-09-24 at 08:55 -0300, Adhemerval Zanella wrote:
>>>>> On 24-09-2014 07:26, Torvald Riegel wrote:
>>>>>> On Tue, 2014-09-23 at 14:35 -0300, Adhemerval Zanella wrote:
>>>>>>> diff --git a/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c b/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
>>>>>>> index 2b8c84d..4906205 100644
>>>>>>> --- a/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
>>>>>>> +++ b/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
>>>>>>> @@ -22,7 +22,7 @@
>>>>>>>  int
>>>>>>>  pthread_spin_unlock (pthread_spinlock_t *lock)
>>>>>>>  {
>>>>>>> -  __asm __volatile (__lll_rel_instr ::: "memory");
>>>>>>> +  __asm __volatile (__ARCH_REL_INSTR ::: "memory");
>>>>>>>    *lock = 0;
>>>>>>>    return 0;
>>>>>>>  }
>>>>>> I'd prefer if this would be a memory_order_release barrier (which we
>>>>>> don't have yet, I know, so bear with me).  However, I see that the
>>>>>> current situation is a bit messy:
>>>>>> * It seems that atomic_write_barrier is intended to be a C11
>>>>>> memory_order_release barrier/fence; at least it's used like that in all
>>>>>> what I saw in glibc.  So atomic_write_barrier would be our current way
>>>>>> to express a release barrier.
>>>>>> * However, PowerPC atomic_write_barrier is "eieio" and is not using
>>>>>> __ARCH_REL_INSTR.  Can you clarify the difference?
>>>>> These two pages [1] [2] have detailed information about C11/C++11 memory order
>>>>> mappings to PowerPC instructions.
>>>> I'm aware of these ...
>>>>
>>>>> As you can see the store release is indeed 'lwsync; st'.  First link also
>>>>> gives you an example of a spinlock implementation, however it is using a 
>>>>> full memory barrier (sync) instead of a write one (lwsync). It is better
>>>>> explained in second link:
>>>>>
>>>>> "If the critical section contains only normal cached memory operations, 
>>>>> the lwsync barrier may be substituted for the heavier-weight sync instruction
>>>>> above."
>>>>>
>>>>> Non-cached memory are handled by kernel and exported to application only in very
>>>>> specific situations (and afaik C11/C++11 specifications don't even handle it).
>>>>> The idea is a process won't try to run a spinlock in a DMA area. 
>>>>>
>>>>> Also the 'Synchronizes With' gives some explanation about 'eieio' and 'lwsync'
>>>>> relation. My understanding is 'lwsync' is the correct memory order instruction
>>>>> to use in such cases.
>>>>>
>>>>> [1] http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
>>>>> [2] http://www.rdrop.com/users/paulmck/scalability/paper/N2745r.2011.03.04a.html
>>>> ... but not of the difference between eieio and lwsync.  AFAICT, N2745
>>>> states that eieio really applies only to stores.
>>>>
>>>> What are your thoughts about changing atomic_write_barrier to eieio?
>>>> Are you aware of any prior reasoning about that (e.g., whether
>>>> atomic_write_barrier really only needs to prevent reordering of writes)?
>>>> Would there be a significant performance degradation?
>>> Reading again my last message I think I wasn't clear: my understanding of
>>> 'eieio' is to be a safe atomic_write_barrier for cacheable and cache inhibited
>>> memory. And it is preferred over 'sync' for performance reasons.
>>>
>>> Now, for cacheable memory (default memory allocated from kernel), a 'lwsync'
>>> is a safe atomic_write_barrier and it also performance better than 'eieio'
>>> (and AFAIK GCC won't emit 'eieio' for C11/C++11 atomic operations). That's why 
>>> I see no need to change it to 'eieio' in the pthread_spin_unlock.
>> Sorry, when I wrote "changing atomic_write_barrier to eieio" I actually
>> meant changing it to using lwsync :)
>> So, what are your thoughts about that?
> Right, I thought you were solely referring to pthread_spin_unlock.  About the
> spinlock itself, now I understand the reasoning and I do agree that about 
> replace the asm with a macro to emit a release barrier (which for POWER
> would be the same anyway).  However, I would like to address it in another
> patch.
>
> Now for atomic_write_barrier, as I understand, was added in GLIBC as 
> memory_order_seq_cst semantics.  However, as you have noted it is acting in 
> practice as release semantic; so am not sure about changing to 'lwsync' 
> I am checking with Steven Monroe and  Paul E. McKenney if I overlooked 
> something here and why exactly they chose 'eieio'.
>
>>>>>> I'm interested in having a release barrier there so that this won't get
>>>>>> overlooked once we do the actual transition to C11 atomics.
>>>>>>
>>>>>> Also, related to that, it would be ideal if we could use the generic
>>>>>> implementation (nptl/pthread_spin_unlock.c) on Power as well.  However,
>>>>>> this one uses atomic_full_barrier, which is stronger than what's used on
>>>>>> the major archs (x86, Power, ARM).
>>>> BTW, I've noticed that ARM uses the generic spin_lock, so ARM uses a
>>>> full barrier effectively.
>>> A full barrier do work on PowerPC, the specific optimization to use the atomic
>>> write 'lwsync' was added later.
>> Of course the full barrier would work as well.  My concern is about what
>> glibc should provide: current practice differs on major archs (x86 and
>> POWER) is weaker than what POSIX requires -- yet the right semantics in
>> my opinion.  To me, this indicates that this would work as default in
>> practice, and that the default implementation could use a release
>> barrier too.
> I do agree.

Torvald,

Besides the memory barrier discussion (which I think should be moved to another thread)
do you have any pending issues related to patch itself?

References:
- [PATCH] powerpc: remove linux lowlevellock.h
  - From: Adhemerval Zanella
- Re: [PATCH] powerpc: remove linux lowlevellock.h
  - From: Torvald Riegel
- Re: [PATCH] powerpc: remove linux lowlevellock.h
  - From: Adhemerval Zanella
- Re: [PATCH] powerpc: remove linux lowlevellock.h
  - From: Torvald Riegel
- Re: [PATCH] powerpc: remove linux lowlevellock.h
  - From: Adhemerval Zanella
- Re: [PATCH] powerpc: remove linux lowlevellock.h
  - From: Torvald Riegel
- Re: [PATCH] powerpc: remove linux lowlevellock.h
  - From: Adhemerval Zanella

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]