This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 0/3] Improve ARM atomic performance for malloc
- From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- To: libc-alpha at sourceware dot org
- Date: Tue, 07 Oct 2014 13:28:53 -0300
- Subject: Re: [PATCH 0/3] Improve ARM atomic performance for malloc
On 07-10-2014 11:18, Torvald Riegel wrote:
> On Mon, 2014-10-06 at 14:55 +0100, Will Newton wrote:
>> On 6 October 2014 14:31, Torvald Riegel <triegel@redhat.com> wrote:
>>> On Fri, 2014-10-03 at 16:27 +0000, Joseph S. Myers wrote:
>>>> On Fri, 3 Oct 2014, Will Newton wrote:
>>>>
>>>>> The resulting atomic.h is hopefully somewhere close to a generic
>>>>> implementation based on the gcc intrinsics so could potentially
>>>>> be used as a base for a generic header.
>>>> That suggests to me that the starting point should be setting up a generic
>>>> header that can be used for multiple architectures and making the ARM
>>>> header inherit from it in the case where the relevant compiler support is
>>>> available, rather than putting all this generic code in an ARM header.
>>>> (And in turn, the starting point in the generic header could be the
>>>> particular operations for which more or less generic code already exists
>>>> in the ARM header, with other operations added to it later.)
>>> I agree.
>>>
>>> In addition, I think that the best step to do this would be when we
>>> switch to C11-like atomics because with this switch, this falls out kind
>>> of naturally.
>>>
>>> Will, have you looked at my suggestion and the POC patch I posted for
>>> how C11-like atomics could look like? I won't get to continue to work
>>> on this topic this week, but it's still on my agenda.
>> It's interesting, and long term seems like the best way of doing
>> things. However I do not see any viable chance of that work being
>> completed for 2.21. Do you have a timescale in mind? It seems we would
>> need to convert all uses of the atomic API and all the architecture
>> ports.
> I think 2.21 may be fully doable for at least a subset of this. As I
> outlined in my other email where I proposed the transition, we indeed do
> have a big first step in that we need for all architectures to provide
> C11-like atomics. I've already scanned through existing code, and I
> haven't seen any big issues wrt. that: x86 is clear, ARM already uses
> GCC builtins, for PowerPC we have a clear mapping from C11 to HW
> instructions. Many of the "smaller" archs just have simple ops, so
> there's less specific stuff to do.
I am in favor of supporting this transition, since I also want to push the very
same single-thread optimization for PPC. What do you have in mind for the subset,
beyond the initial approach you posted some weeks ago?
I will try to summarize the topics raised in your thread "Transition to C11
atomics and memory model" in a wiki entry for 2.21 [1].
[1] https://sourceware.org/glibc/wiki/Release/2.21
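To make the "C11-like atomics on top of GCC builtins" idea concrete, here is a
minimal sketch of what a generic layer could look like. It assumes a compiler
that provides the GCC __atomic builtins. Every sketch_* name is hypothetical and
is not taken from any posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C11-like wrappers; each one names its memory order
   explicitly and maps directly onto a GCC __atomic builtin.  */
#define sketch_atomic_load_relaxed(mem) \
  __atomic_load_n ((mem), __ATOMIC_RELAXED)
#define sketch_atomic_load_acquire(mem) \
  __atomic_load_n ((mem), __ATOMIC_ACQUIRE)
#define sketch_atomic_store_release(mem, val) \
  __atomic_store_n ((mem), (val), __ATOMIC_RELEASE)
#define sketch_atomic_fetch_add_relaxed(mem, val) \
  __atomic_fetch_add ((mem), (val), __ATOMIC_RELAXED)
/* Strong compare-and-exchange; on failure *expected receives the
   value currently stored in *mem.  */
#define sketch_atomic_cas_acquire(mem, expected, desired) \
  __atomic_compare_exchange_n ((mem), (expected), (desired), 0, \
                               __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)

/* Example user: a counter bumped with a relaxed fetch-add.
   Returns the value after the increment.  */
static int sketch_counter;

int
sketch_counter_bump (void)
{
  return sketch_atomic_fetch_add_relaxed (&sketch_counter, 1) + 1;
}

/* Example user: move *flag from 0 to 1 exactly once.  Returns true
   only for the caller that performed the transition.  */
bool
sketch_claim_once (int *flag)
{
  assert (flag != NULL);
  int expected = 0;
  return sketch_atomic_cas_acquire (flag, &expected, 1);
}
```

An architecture header would only need to override the pieces where the builtins
are unavailable or generate poor code; everything else would come from the
generic definitions.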
>
> The C11-like atomics would then coexist for a while with the old-style
> atomics. We can then move one piece of concurrent code (ie, a cluster
> of functions that's complete in terms of including all functions that
> another function in the cluster synchronizes with) to C11-style atomics
> at a time. There's no hurry to do this before 2.21, although I already
> spotted a few things that are likely bugs (and they do affect ARM and
> Power).
>
> What I do need though is consensus from the community that the move
> towards C11 is fine, and feedback on any patches for that.
I think we can actually use the malloc code to carry out this experiment, since
both ARM and PPC now want to add the same single-thread optimization.
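For readers new to the thread, the single-thread optimization amounts to
skipping the atomic instruction while the process has only one thread. A rough
sketch, assuming the GCC __atomic builtins, follows. The flag and function names
are hypothetical, and the real check for "are there multiple threads" is
whatever each port already uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for the library's own record that a
   second thread has been created.  */
static bool sketch_multiple_threads;

void
sketch_mark_multithreaded (void)
{
  __atomic_store_n (&sketch_multiple_threads, true, __ATOMIC_RELAXED);
}

/* Compare-and-swap that returns the old value of *mem.  While the
   process is single-threaded it uses plain loads and stores; once a
   second thread exists it falls back to a real atomic operation.  */
long
sketch_catomic_cas_val (long *mem, long newval, long oldval)
{
  assert (mem != NULL);

  if (!__atomic_load_n (&sketch_multiple_threads, __ATOMIC_RELAXED))
    {
      /* Fast path: no other thread can observe or modify *mem.  */
      long cur = *mem;
      if (cur == oldval)
        *mem = newval;
      return cur;
    }

  /* Slow path: on failure the builtin writes the current value of
     *mem into expected; on success expected still equals oldval.  */
  long expected = oldval;
  __atomic_compare_exchange_n (mem, &expected, newval, 0,
                               __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  return expected;
}
```

The fast path is only safe if the flag is set before the second thread can run,
which is why it makes a good test case for getting the memory model right.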