This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 2/4] Add atomic operations similar to those provided by C11.


On Wed, 2014-10-29 at 22:00 +0000, Joseph S. Myers wrote:
> On Wed, 29 Oct 2014, Torvald Riegel wrote:
> 
> > This patch adds atomic operations similar to C11.
> > 
> > The function naming is essentially the C11 names, but with the memory
> > order argument removed and added as a suffix.  For example, C11's
> >   atomic_store_explicit(&foo, 23, memory_order_release)
> > becomes
> >   atomic_store_release (&foo, 23);
> 
> As previously discussed, I'm concerned about the explicit relaxed loads 
> and stores being defined in terms of __atomic_* (see 
> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63273>).  Unless and until 
> __atomic_* implements relaxed atomics as plain loads/stores not inhibiting 
> optimization (as far as compatible with standard semantics), as evidenced 
> by the change not significantly changing code generated for uses in glibc, 
> I think the glibc implementation should be using plain loads and stores 
> rather than __atomic_*.

Let me reply in some more detail.

First, do you agree that we need to make the compiler aware of
concurrency?  For example, it would be bad if the compiler assumes that
it can safely reload from an atomic variable just because it was able to
prove that the loading thread didn't change it in the meantime.
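
To make the reload concern concrete, here is a minimal sketch in GNU C.
The flag stop_requested and both function names are hypothetical and
not part of the patch; the limit parameter exists only so that the
loops terminate when run in a single thread.

```c
/* Hypothetical flag that another thread would set to request a stop.  */
static int stop_requested;

/* Plain load: nothing tells the compiler that another thread may write
   stop_requested, so it is allowed to load the flag once, keep the
   value in a register, and never look at memory again.  */
int
spin_plain (int limit)
{
  int spins = 0;
  while (!stop_requested && spins < limit)
    spins++;
  return spins;
}

/* Relaxed atomic load: the access is marked as one that may race with
   other threads, so the compiler cannot assume that the value is
   unchanged just because this thread did not store to it.  No ordering
   with respect to other memory accesses is requested.  */
int
spin_relaxed (int limit)
{
  int spins = 0;
  while (!__atomic_load_n (&stop_requested, __ATOMIC_RELAXED)
         && spins < limit)
    spins++;
  return spins;
}
```

In a single thread both functions behave identically; the difference
only matters once another thread stores to the flag concurrently.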

If we agree on that, we can either (1) use __atomic_* and check all the
generated code, or (2) use inline asm, or (3) use volatile inline asm.
Any other options?  Plain loads will not reliably make the compiler
aware that it has to take concurrent accesses into account.
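
As a rough sketch of what option (1) could look like: the macro names
below follow the suffix naming scheme under discussion, but the
definitions are my assumption for illustration and are not copied from
the patch.  The variable foo and the function publish_and_read are
hypothetical as well.

```c
/* Option (1): relaxed and release/acquire accesses expressed through
   GCC's __atomic builtins.  Assumed definitions, for illustration.  */
#define atomic_load_relaxed(mem) \
  __atomic_load_n ((mem), __ATOMIC_RELAXED)
#define atomic_store_relaxed(mem, val) \
  __atomic_store_n ((mem), (val), __ATOMIC_RELAXED)
#define atomic_load_acquire(mem) \
  __atomic_load_n ((mem), __ATOMIC_ACQUIRE)
#define atomic_store_release(mem, val) \
  __atomic_store_n ((mem), (val), __ATOMIC_RELEASE)

/* Hypothetical shared variable.  */
static int foo;

/* Store a value and read it back, both with relaxed memory order.  */
int
publish_and_read (int value)
{
  atomic_store_relaxed (&foo, value);
  return atomic_load_relaxed (&foo);
}
```

With definitions like these, checking the generated code means
comparing what the compiler emits for each call site against the
equivalent plain load or store.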

That might also mean that atomic_store_relaxed should actually use
inline asm (see the comment in the patch).  Thoughts?


However, I would guess that we won't be really affected by 63273 anyway.
The triggering usage there was very special in that the sanitizer
generates loads of relaxed atomic accesses, and just that.  That's not
what we have in typical glibc code.  If we use a relaxed access, it's
either (1) in front of a CAS, so we'll have an optimization-constraining
operation close-by anyway, or (2) it's in combination with an explicit
fence next to it (Dekker sync, relaxed load + acquire fence, etc.), so
it's likely that it can't be optimized as freely anyway.
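
The two patterns could be sketched as follows, using the __atomic
builtins directly.  All names here (increment_if_below, ready, payload,
produce, try_consume) are hypothetical and only meant to illustrate the
shape of the code, not any particular glibc function.

```c
#include <stdbool.h>

/* Pattern (1): a relaxed load in front of a CAS.  The CAS that follows
   constrains the compiler regardless of how the initial load is
   expressed.  */
bool
increment_if_below (int *counter, int max)
{
  int old = __atomic_load_n (counter, __ATOMIC_RELAXED);
  do
    {
      if (old >= max)
        return false;
      /* On failure the CAS writes the current value into old, and the
         check above is repeated before retrying.  */
    }
  while (!__atomic_compare_exchange_n (counter, &old, old + 1,
                                       true /* weak */,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
  return true;
}

/* Pattern (2): a relaxed load combined with an explicit acquire fence.  */
static int ready;
static int payload;

void
produce (int value)
{
  payload = value;
  __atomic_store_n (&ready, 1, __ATOMIC_RELEASE);
}

bool
try_consume (int *out)
{
  if (__atomic_load_n (&ready, __ATOMIC_RELAXED) == 0)
    return false;
  /* The fence orders the load of ready before the read of payload and
     synchronizes with the release store in produce.  */
  __atomic_thread_fence (__ATOMIC_ACQUIRE);
  *out = payload;
  return true;
}
```

In both functions the relaxed load sits directly next to an operation
that already limits what the compiler may reorder or merge.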

Are there any other examples where the lack of optimization of relaxed
accesses in typical concurrent code really decreased performance (i.e.,
ignoring the sanitizer case and unoptimized code, such as what comes
out of templates that are *expected* to be optimized)?

