This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 0/3] Improve ARM atomic performance for malloc
- From: Torvald Riegel <triegel at redhat dot com>
- To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Wed, 08 Oct 2014 14:28:13 +0200
- Subject: Re: [PATCH 0/3] Improve ARM atomic performance for malloc
On Tue, 2014-10-07 at 13:28 -0300, Adhemerval Zanella wrote:
> On 07-10-2014 11:18, Torvald Riegel wrote:
> > On Mon, 2014-10-06 at 14:55 +0100, Will Newton wrote:
> >> On 6 October 2014 14:31, Torvald Riegel <triegel@redhat.com> wrote:
> >>> On Fri, 2014-10-03 at 16:27 +0000, Joseph S. Myers wrote:
> >>>> On Fri, 3 Oct 2014, Will Newton wrote:
> >>>>
> >>>>> The resulting atomic.h is hopefully somewhere close to a generic
> >>>>> implementation based on the gcc intrinsics so could potentially
> >>>>> be used as a base for a generic header.
> >>>> That suggests to me that the starting point should be setting up a generic
> >>>> header that can be used for multiple architectures and making the ARM
> >>>> header inherit from it in the case where the relevant compiler support is
> >>>> available, rather than putting all this generic code in an ARM header.
> >>>> (And in turn, the starting point in the generic header could be the
> >>>> particular operations for which more or less generic code already exists
> >>>> in the ARM header, with other operations added to it later.)
> >>> I agree.
> >>>
> >>> In addition, I think that the best step to do this would be when we
> >>> switch to C11-like atomics because with this switch, this falls out kind
> >>> of naturally.
> >>>
> >>> Will, have you looked at my suggestion and the POC patch I posted for
> >>> what C11-like atomics could look like? I won't get to continue to work
> >>> on this topic this week, but it's still on my agenda.
> >> It's interesting, and long term seems like the best way of doing
> >> things. However I do not see any viable chance of that work being
> >> completed for 2.21. Do you have a timescale in mind? It seems we would
> >> need to convert all uses of the atomic API and all the architecture
> >> ports.
> > I think 2.21 may be fully doable for at least a subset of this. As I
> > outlined in my other email where I proposed the transition, we indeed do
> > have a big first step in that we need for all architectures to provide
> > C11-like atomics. I've already scanned through existing code, and I
> > haven't seen any big issues wrt. that: x86 is clear, ARM already uses
> > GCC builtins, for PowerPC we have a clear mapping from C11 to HW
> > instructions. Many of the "smaller" archs just have simple ops, so
> > there's less specific stuff to do.
>
> I am in favor of supporting this transition, since I also want to push the very
> same single-thread optimization for PPC. What do you have in mind for the subset
> of this, besides your initial approach from some weeks ago?
The initial step that provides C11-like atomics is of course part of the
subset I mentioned. Depending on how much time we have until 2.21, we
could also do:
* Move over pthread_once implementation. I've reviewed (and changed)
this code already, so I'm confident about what it would look like with
C11-like atomics.
* While browsing through the code bases for atomic* uses, I saw a couple
of cases that looked like broken implementations of pthread_once-like
functionality. Acquire barriers were missing in a few cases. Moving
these over would also be a good step.
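As a rough illustration of the acquire/release pairing that a
pthread_once-like fast path needs, here is a minimal sketch using C11
<stdatomic.h>. All names (once_state, call_once_sketch, init_shared,
and so on) are hypothetical; this is not glibc's pthread_once, and it
spins where a real implementation would block.

```c
#include <stdatomic.h>

/* Hypothetical names throughout; a sketch, not glibc's pthread_once. */
static atomic_int once_state;   /* 0 = not started, 1 = in progress, 2 = done */
static int shared_value;        /* plain data published by the initializer */
static int init_calls;          /* counts how often the initializer ran */

static void init_shared(void)
{
    shared_value = 42;
    init_calls++;
}

static void call_once_sketch(void (*init)(void))
{
    /* Fast path: this acquire load pairs with the release store below,
       so a thread that observes "done" also observes shared_value.
       Dropping the acquire here is the kind of bug described above. */
    if (atomic_load_explicit(&once_state, memory_order_acquire) == 2)
        return;

    int expected = 0;
    if (atomic_compare_exchange_strong_explicit(&once_state, &expected, 1,
                                                memory_order_acquire,
                                                memory_order_acquire)) {
        init();
        /* Release: publish everything init() wrote before flagging "done". */
        atomic_store_explicit(&once_state, 2, memory_order_release);
    } else {
        /* Another thread is (or was) initializing. Spin until done; a real
           implementation would block instead of busy-waiting. */
        while (atomic_load_explicit(&once_state, memory_order_acquire) != 2)
            ;
    }
}
```

Calling call_once_sketch(init_shared) any number of times, from any
number of threads, should run init_shared exactly once, and every caller
that returns should see the value it wrote.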
Other pieces of concurrent code might take more time, so I can't say
yet. If the *intended* synchronization isn't rather obvious or
documented (which is, sadly, often the case), then review will take more
time.
If any of you are familiar with a piece of code (i.e., know about the
intended synchronization), I can also help you move that code over to
C11-like synchronization.
> I will try to summarize the topics raised in your thread "Transition to C11
> atomics and memory model" on a wiki entry for 2.21 [1].
>
> [1] https://sourceware.org/glibc/wiki/Release/2.21
Thanks! I don't see it there yet, but let me know when you have
something for review.
>
> >
> > The C11-like atomics would then coexist for a while with the old-style
> > atomics. We can then move one piece of concurrent code (ie, a cluster
> > of functions that's complete in terms of including all functions that
> > another function in the cluster synchronizes with) to C11-style atomics
> > at a time. There's no hurry to do this before 2.21, although I already
> > spotted a few things that are likely bugs (and they do affect ARM and
> > Power).
> >
> > What I do need though is consensus from the community that the move
> > towards C11 is fine, and feedback on any patches for that.
>
>
> I think we can actually use the malloc code to carry out this experiment, since
> now both ARM and PPC want to add the same single-thread optimization.
Okay. Then I'll leave it to you two to get some more insight into what we
could do about catomic*.
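
For readers following along, the general shape of a single-thread
optimization can be sketched as below with C11 <stdatomic.h>. This is
only an assumption-laden illustration: the flag
process_is_multi_threaded and the function counter_add are made-up
names, and glibc's actual catomic* macros and its single-thread check
are not reproduced here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: assumed to be set before a second thread can exist. */
static bool process_is_multi_threaded;

/* Adds delta to *ctr and returns the previous value. */
static inline int counter_add(atomic_int *ctr, int delta)
{
    if (!process_is_multi_threaded) {
        /* Single-threaded: a relaxed load and store suffice, avoiding the
           cost of an atomic read-modify-write and its barriers. */
        int old = atomic_load_explicit(ctr, memory_order_relaxed);
        atomic_store_explicit(ctr, old + delta, memory_order_relaxed);
        return old;
    }
    /* Multi-threaded: fall back to a real atomic read-modify-write. */
    return atomic_fetch_add_explicit(ctr, delta, memory_order_acq_rel);
}
```

The open question for any such scheme is how the flag itself is
maintained and read safely around thread creation, which the sketch
deliberately leaves out.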