This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Future atomic operation cleanup

From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
To: libc-alpha at sourceware dot org
Date: Fri, 29 Aug 2014 10:32:15 -0300
Subject: Re: Future atomic operation cleanup
Authentication-results: sourceware.org; auth=none
References: <53F74A93 dot 30508 at linux dot vnet dot ibm dot com> <53F74CE4 dot 5070809 at twiddle dot net> <53FF3551 dot 8020503 at linux dot vnet dot ibm dot com> <1409240266 dot 6045 dot 62 dot camel at triegel dot csb> <53FF5889 dot 5080006 at linux dot vnet dot ibm dot com> <1409306115 dot 6045 dot 71 dot camel at triegel dot csb> <54005FAA dot 9000802 at linux dot vnet dot ibm dot com> <1409318874 dot 6045 dot 73 dot camel at triegel dot csb>

On 29-08-2014 10:27, Torvald Riegel wrote:
> On Fri, 2014-08-29 at 08:10 -0300, Adhemerval Zanella wrote:
>> On 29-08-2014 06:55, Torvald Riegel wrote:
>>> On Thu, 2014-08-28 at 13:27 -0300, Adhemerval Zanella wrote:
>>>> On 28-08-2014 12:37, Torvald Riegel wrote:
>>>>> On Thu, 2014-08-28 at 10:57 -0300, Adhemerval Zanella wrote:
>>>>>> On 22-08-2014 11:00, Richard Henderson wrote:
>>>>>>> On 08/22/2014 06:50 AM, Adhemerval Zanella wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Following comments from my first patch to optimize single-thread internal
>>>>>>>> glibc locking/atomics [1], I have changed the implementation to use now
>>>>>>>> relaxed atomics instead.  Addresing the concerns raised in last discussion, 
>>>>>>>> the primitives are still signal-safe (although not thread-safe), so if future
>>>>>>>> malloc implementation is changed to be async-safe, it won't require to a
>>>>>>>> adjust powerpc atomics.
>>>>>>>>
>>>>>>>> For catomic_and and catomic_or I follow the definition at 'include/atomic.h'
>>>>>>>> (which powerpc is currently using) and implemented the atomics with acquire
>>>>>>>> semantics.  The new implementation also is simpler.
>>>>>>>>
>>>>>>>> On synthetic benchmarks it shows an improvement of 5-10% for malloc
>>>>>>>> calls and an performance increase of 7-8% in 483.xalancbmk from
>>>>>>>> speccpu2006 (number from an POWER8 machine).
>>>>>>>>
>>>>>>>> Checked on powerpc64, powerpc32 and powerpc64le.
>>>>>>> Wow, that's a lot of boiler-plate.
>>>>>>>
>>>>>>> When development opens again, can we simplify all of these atomic operations by
>>>>>>> assuming compiler primitives?  That is, use the __atomic builtins for gcc 4.8
>>>>>>> and later, fall back to the __sync builtins for earlier gcc, and completely
>>>>>>> drop support for truly ancient compilers that support neither.
>>>>>>>
>>>>>>> As a bonus we'd get to unify the implementations across all of the targets.
>>>>>>>
>>>>>>>
>>>>>>> r~
>>>>>>>
>>>>>> I also agree we should move to more a unified implementation (in fact, I
>>>>>> plan to get rid of powerpc lowlevellock.h when devel opens again).  However
>>>>>> I really don't want to either wait or reimplement all the custom atomic to push
>>>>>> this optimization... 
>>>>> I believe that, unless the caveat Joseph mentioned actually applies to
>>>>> the archs you're concerned about, using the compiler builtins and doing
>>>>> the unification would simplify your patch considerably.  It would also
>>>>> avoid having to iterate over the changes of your patch again once we do
>>>>> all of the unification.
>>>>>
>>>>>> I think such change will require a lot of iteration and testing, which is not
>>>>>> the intend of this patch.
>>>>> If we move to C11-like atomics, then those will certainly go in
>>>>> incrementally, and exist in parallel with the current atomics for a
>>>>> while (until we reviewed all existing code using the old atomics and
>>>>> moved it over to the new atomics).
>>>>>
>>>>> If we use the compiler builtins, we'd also have to test less because we
>>>>> have no additional custom atomics implementation we need to maintain.
>>>>>
>>>> I do agree moving to a compiler specific is the best way, although for current
>>>> status where we support GCC 4.4 we still need hand crafted assembly for atomics.
>>>> Also, the GCC atomics are only support for 4.7+ afaik [1] and C11 for 4.9.
>>> Note that I'm not arguing for using C11 at this point, but using atomics
>>> that are very much like the atomic_*_explicit operations.  The function
>>> names can be different, but functionality, parameter ordering, etc.,
>>> would be similar.
>>>
>>>> We can make what Joseph suggested and move minimum compiler required for GCC 4.7.
>>>>
>>>> But if we intend to actually add mininum GCC 4.7, I can work on changing all the
>>>> atomics to GCC builtins.
>>> Another option would be to only enable the new atomics that we don't
>>> have right now (e.g., memory_order_relaxed) when building with 4.7 or
>>> higher, and falling back to some of the custom assembly ones otherwise.
>>> This would mean no performance regressions but the optimizations would
>>> only be effective for 4.7 or higher.  memory_order_relaxed loads/stores
>>> could be an exception, because I think we need to make them explicity,
>>> and we don't want to slow down code that now uses plain nonatomic
>>> loads/stores in cases where it should actually be memory_order_relaxed
>>> accesses.
>>>
>>>
>> I don't like this conditional enable because it will require more logic
>> to check for compiler version and add different code generation depending
>> of this. If the idea is to simplify and cleanup code I think it is better
>> to either add custom arch-specific code that will build regardless of
>> compiler or use compiler builtins and get rid off hand-craft assembly.
>> I see no point in enabling both.
> *If* we can make 4.7 a requirement, then I'm certainly all for using
> just the builtins on architectures where they are available and get
> inlined.
>
>
And that is exactly what I am asking: are we willing to raise GCC requirements
for 2.21?  If we will, I see no issue on rework the patch to use GCC builtins.
If we won't, I see to no point in trying to adjust this patch to use conditionally
the GCC builtins for GCC 4.7+.

Follow-Ups:
- Re: Future atomic operation cleanup
  - From: Chris Metcalf
- Re: Future atomic operation cleanup
  - From: Roland McGrath

References:
- [PATCH v2] PowerPC: libc single-thread lock optimization
  - From: Adhemerval Zanella
- Future atomic operation cleanup
  - From: Richard Henderson
- Re: Future atomic operation cleanup
  - From: Adhemerval Zanella
- Re: Future atomic operation cleanup
  - From: Torvald Riegel
- Re: Future atomic operation cleanup
  - From: Adhemerval Zanella
- Re: Future atomic operation cleanup
  - From: Torvald Riegel
- Re: Future atomic operation cleanup
  - From: Adhemerval Zanella
- Re: Future atomic operation cleanup
  - From: Torvald Riegel

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]