This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 3/3] sysdeps/arm/bits/atomic.h: Use relaxed atomics for catomic_*
- From: Will Newton <will dot newton at linaro dot org>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: libc-alpha <libc-alpha at sourceware dot org>
- Date: Mon, 6 Oct 2014 15:13:36 +0100
- Subject: Re: [PATCH 3/3] sysdeps/arm/bits/atomic.h: Use relaxed atomics for catomic_*
- Authentication-results: sourceware.org; auth=none
- References: <1412349086-11473-1-git-send-email-will dot newton at linaro dot org> <1412349086-11473-4-git-send-email-will dot newton at linaro dot org> <1412603012 dot 30642 dot 55 dot camel at triegel dot csb>
On 6 October 2014 14:43, Torvald Riegel <triegel@redhat.com> wrote:
> On Fri, 2014-10-03 at 16:11 +0100, Will Newton wrote:
>> Using the relaxed memory model for atomics when single-threaded allows
>> a reduction in the number of barriers (dmb) executed and an improvement in
>> single thread performance on the malloc benchtest:
>
> I'm aware that we do have catomic* functions and they are being used.
> However, I'm wondering whether they are the right tool for what we want
> to achieve.
They are kind of ugly and rather undocumented as to their precise
semantics, so I share your general concerns about these functions.
> They simply allow us to avoid some of the overhead of atomics (but not
> necessarily all). Wouldn't it be better to change the calling code to
> contain optimized paths for single-threaded execution?
How would you suggest implementing that? My first instinct is that the
result would look a lot like what we have now, i.e. some kind of
wrapper round atomic functions that special-cases the single-threaded
case.
malloc is the main performance-critical subsystem using catomic, so it
may be possible to do more of this work there and reduce the
complexity of the generic atomic implementation (although I believe an
earlier patch did do this to some extent but was rejected).
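
As a rough illustration of the wrapper idea above, here is a minimal sketch in C11 of an atomic add that special-cases the single-threaded case. The names `single_threaded` and `sketch_catomic_fetch_add` are hypothetical and are not glibc's; the stronger ordering used on the multi-threaded path is also an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for however the library learns that only
   one thread exists.  This is an assumption, not glibc's mechanism.  */
static bool single_threaded = true;

/* Sketch of a catomic-style add.  The read-modify-write stays atomic on
   both paths, so it remains atomic with respect to a signal handler.
   Only the requested memory ordering differs.  */
static inline int
sketch_catomic_fetch_add (atomic_int *p, int v)
{
  assert (p != NULL);
  if (single_threaded)
    /* Relaxed ordering: no barrier is required for this operation.  */
    return atomic_fetch_add_explicit (p, v, memory_order_relaxed);
  /* Assumed stronger ordering for the multi-threaded case.  */
  return atomic_fetch_add_explicit (p, v, memory_order_seq_cst);
}
```

Both paths return the previous value, so callers do not need to know which one was taken. The cost is one conditional branch per operation.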
> Also, calling code could either be reentrant or not. For the former,
> you could even avoid actual atomic accesses instead of just avoiding the
> barriers. Also, the compiler could perhaps generate more efficient code
> if it doesn't have to deal with (relaxed) atomics.
Yes, that would be ideal if we had that option. It's not clear to me
what catomic_ actually means; from the previous discussions it seems
that it has to be atomic wrt. signal handlers, which is why the atomic
operations remain (but the barriers can be dropped). malloc is
generally not re-entrant or AS-safe, so optimizing away the atomic
instructions would be a big win here...
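
To make the difference concrete, here is a minimal sketch in C11 of a caller-side fast path that avoids the atomic read-modify-write entirely when the code is single-threaded and not re-entrant. The `struct node`, `head` and `push` names are hypothetical and only loosely modelled on a free list; this is not the malloc code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical singly linked free list, for illustration only.  */
struct node { struct node *next; };
static _Atomic (struct node *) head;

static inline void
push (struct node *n, bool single_threaded)
{
  assert (n != NULL);
  if (single_threaded)
    {
      /* Assumes the caller is not re-entrant (no signal handler calls
         push).  A relaxed load and a relaxed store replace the atomic
         read-modify-write, and no barrier is requested.  */
      n->next = atomic_load_explicit (&head, memory_order_relaxed);
      atomic_store_explicit (&head, n, memory_order_relaxed);
      return;
    }
  /* Multi-threaded path: compare-and-swap loop with release ordering.
     On failure, old is updated to the current head and we retry.  */
  struct node *old = atomic_load_explicit (&head, memory_order_relaxed);
  do
    n->next = old;
  while (!atomic_compare_exchange_weak_explicit (&head, &old, n,
                                                 memory_order_release,
                                                 memory_order_relaxed));
}
```

The single-threaded path is only valid under the non-re-entrancy assumption stated in the comment. If a signal handler could call `push`, the separate load and store could lose an update.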
>> Before: 259.073
>> After: 246.749
>
> What is the performance number for a program that does have several
> threads but runs with your patch (i.e., has conditional execution but
> can't avoid the barriers)?
I don't have them, but I will look at that.
> Do you have numbers for a hacked malloc that uses no atomics?
No, I'll see what I can do.
--
Will Newton
Toolchain Working Group, Linaro