This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 2/4] Add atomic operations similar to those provided by C11.
- From: Torvald Riegel <triegel at redhat dot com>
- To: "Joseph S. Myers" <joseph at codesourcery dot com>
- Cc: GLIBC Devel <libc-alpha at sourceware dot org>
- Date: Wed, 29 Oct 2014 23:45:34 +0100
- References: <1414617613 dot 10085 dot 23 dot camel at triegel dot csb> <1414619416 dot 10085 dot 46 dot camel at triegel dot csb> <Pine dot LNX dot 4 dot 64 dot 1410292156440 dot 15119 at digraph dot polyomino dot org dot uk>
On Wed, 2014-10-29 at 22:00 +0000, Joseph S. Myers wrote:
> On Wed, 29 Oct 2014, Torvald Riegel wrote:
>
> > This patch adds atomic operations similar to C11.
> >
> > The function naming is essentially the C11 names, but with the memory
> > order argument removed and added as a suffix. For example, C11's
> > atomic_store_explicit(&foo, 23, memory_order_relaxed)
> > becomes
> > atomic_store_relaxed (&foo, 23);
>
> As previously discussed, I'm concerned about the explicit relaxed loads
> and stores being defined in terms of __atomic_* (see
> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63273>). Unless and until
> __atomic_* implements relaxed atomics as plain loads/stores not inhibiting
> optimization (as far as compatible with standard semantics), as evidenced
> by the change not significantly changing code generated for uses in glibc,
> I think the glibc implementation should be using plain loads and stores
> rather than __atomic_*.
Let me reply in some more detail.
First, do you agree that we need to make the compiler aware of
concurrency? For example, it would be bad if the compiler assumes that
it can safely reload from an atomic variable just because it was able to
prove that the loading thread didn't change it in the meantime.
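
To illustrate the reload problem described above, here is a minimal sketch. It is not taken from the patch; `shared_index`, `table`, and both function names are hypothetical. With a plain load the compiler is allowed to read the variable a second time instead of keeping the first value, whereas the GCC builtin `__atomic_load_n` with `__ATOMIC_RELAXED` yields one load whose result is used throughout.

```c
#include <stddef.h>

#define TABLE_SIZE 4
static const int table[TABLE_SIZE] = { 10, 20, 30, 40 };

/* Hypothetical index that other threads may update concurrently.  */
static size_t shared_index;

/* Plain load: nothing tells the compiler that shared_index can change
   behind its back, so it may legally read it twice (once for the bounds
   check, once for the array access).  A concurrent store between the
   two reads would defeat the check.  */
static int
lookup_plain (void)
{
  size_t i = shared_index;
  return i < TABLE_SIZE ? table[i] : -1;
}

/* Relaxed atomic load: a single load is performed and the value that
   was checked is the value that is used.  No ordering is implied.  */
static int
lookup_relaxed (void)
{
  size_t i = __atomic_load_n (&shared_index, __ATOMIC_RELAXED);
  return i < TABLE_SIZE ? table[i] : -1;
}
```

In single-threaded use both functions behave identically; the difference only matters once another thread stores to `shared_index`.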
If we agree on that, we can either (1) use __atomic_* and check all the
generated code, (2) use inline asm, or (3) use volatile inline asm.
Are there any other options? Plain loads will not reliably make the
compiler aware that it has to take concurrent accesses into account.
That might also mean that atomic_store_relaxed should actually use
inline asm (see the comment in the patch). Thoughts?
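
For reference, option (1) could look roughly like the sketch below. The macro bodies are assumptions for illustration and are not copied from the patch; only the suffix-style naming follows the patch description, and `store_then_load` is a hypothetical helper. The inline asm variants (2) and (3) are architecture-specific and are not shown.

```c
/* Sketch of option (1): C11-like names with the memory order as a
   suffix, mapped onto the GCC __atomic builtins.  These definitions
   are assumed, not the ones from the patch.  */
#define atomic_load_relaxed(mem) \
  __atomic_load_n ((mem), __ATOMIC_RELAXED)
#define atomic_store_relaxed(mem, val) \
  __atomic_store_n ((mem), (val), __ATOMIC_RELAXED)
#define atomic_load_acquire(mem) \
  __atomic_load_n ((mem), __ATOMIC_ACQUIRE)
#define atomic_store_release(mem, val) \
  __atomic_store_n ((mem), (val), __ATOMIC_RELEASE)

/* Hypothetical shared variable.  */
static int foo;

/* Stores VAL with relaxed ordering and reads it back, also relaxed.  */
static int
store_then_load (int val)
{
  atomic_store_relaxed (&foo, val);
  return atomic_load_relaxed (&foo);
}
```

Whether such a wrapper generates code as good as a plain load or store is exactly what would have to be checked in the generated code.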
However, I would guess that we won't really be affected by 63273 anyway.
The triggering usage there was very special in that the sanitizer
generates large numbers of relaxed atomic accesses, and little else.
That's not what we have in typical glibc code. If we use a relaxed access, it's
either (1) in front of a CAS, so we'll have an optimization-constraining
operation close-by anyway, or (2) it's in combination with an explicit
fence next to it (Dekker sync, relaxed load + acquire fence, etc.), so
it's likely that it can't be optimized as freely anyway.
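
The two usage patterns mentioned above can be sketched as follows. This is an illustration only, not glibc code; all variable and function names are hypothetical, and the GCC builtins are used directly. In both cases the relaxed load sits right next to an operation (a CAS or a fence) that already constrains the optimizer.

```c
#include <stdbool.h>

static int counter;   /* hypothetical shared counter */
static int flag;      /* hypothetical "data is ready" flag */
static int data;      /* payload published before flag is set */

/* Pattern (1): a relaxed load in front of a CAS loop.  Increments
   counter unless it has reached LIMIT; returns the resulting value.  */
static int
increment_if_below (int limit)
{
  int old = __atomic_load_n (&counter, __ATOMIC_RELAXED);
  do
    {
      if (old >= limit)
        return old;
    }
  /* On failure the builtin updates OLD with the current value.  */
  while (!__atomic_compare_exchange_n (&counter, &old, old + 1,
                                       true /* weak */,
                                       __ATOMIC_ACQ_REL,
                                       __ATOMIC_RELAXED));
  return old + 1;
}

/* Writer side for pattern (2): publish data, then set the flag with
   release ordering.  */
static void
publish (int value)
{
  data = value;
  __atomic_store_n (&flag, 1, __ATOMIC_RELEASE);
}

/* Pattern (2): relaxed load followed by an acquire fence.  The fence
   orders the later read of data after the load of flag.  */
static bool
try_read (int *out)
{
  if (!__atomic_load_n (&flag, __ATOMIC_RELAXED))
    return false;
  __atomic_thread_fence (__ATOMIC_ACQUIRE);
  *out = data;
  return true;
}
```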
Are there any other examples where the lack of optimization of relaxed
accesses in typical concurrent code really decreased performance
(i.e., ignoring the sanitizer case and non-optimized code, such as what
comes out of templates that are *expected* to be optimized)?