This is the mail archive of the
mailing list for the glibc project.
Re: Transition to C11 atomics and memory model
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: Torvald Riegel <triegel at redhat dot com>, GLIBC Devel <libc-alpha at sourceware dot org>, David Miller <davem at davemloft dot net>
- Date: Mon, 15 Sep 2014 11:59:08 -0400
- Subject: Re: Transition to C11 atomics and memory model
- References: <1410719669 dot 4967 dot 160 dot camel at triegel dot csb>
Thanks for the email, very good questions.
SPARC pre-v9 question at the bottom for you.
On 09/14/2014 02:34 PM, Torvald Riegel wrote:
> I think we should transition to using the C11 memory model and atomics
> instead of the current custom implementation. There are two reasons for
Architecturally I think that glibc transitioning to the C11 memory model
is the *only* way forward.
> I propose that our phase in transitioning to C11 is to focus on uses of
> the atomic operations. In particular, the rules are:
> * All accesses to atomic vars need to use atomic_* functions. IOW, all
> non-atomic accesses are not subject to data races. The only exception
> is initialization (ie, when the variable is not visible to any other
> thread); nonetheless, initialization accesses must not result in data
> races with other accesses. (This exception isn't allowed by C11, but
> eases the transition to C11 atomics and likely works fine in current
> implementations; as alternative, we could require MO-relaxed stores for
> initialization as well.)
At present we rely on small word-length writes to complete atomically.
Would you suggest we have to wrap those in true atomic operations?
Won't this hurt performance? What correctness issue exists?
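
A minimal sketch of what wrapping such a word-sized write could look
like, assuming GCC's __atomic builtins are available on the target. The
names shared_flag, flag_init, flag_set and flag_get are hypothetical. On
common targets a relaxed store to a naturally aligned word is typically
compiled to an ordinary store instruction, so the wrapper mainly keeps
the compiler from tearing, duplicating, or eliding the access.

```c
#include <assert.h>

/* Plain base type, naturally aligned; no _Atomic annotation,
   matching the rule quoted above.  Hypothetical variable.  */
static int shared_flag;

/* Initialization while the variable is not yet visible to any other
   thread: a plain store, per the proposed exception.  */
void
flag_init (void)
{
  shared_flag = 0;
}

/* Once the variable is shared, every access goes through an atomic
   operation.  Relaxed ordering gives atomicity only, no ordering
   with respect to other memory accesses.  */
void
flag_set (int value)
{
  __atomic_store_n (&shared_flag, value, __ATOMIC_RELAXED);
}

int
flag_get (void)
{
  return __atomic_load_n (&shared_flag, __ATOMIC_RELAXED);
}
```

Whether relaxed ordering is enough depends on the algorithm using the
variable; the sketch only shows the shape of the access.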
> * Atomic vars aren't explicitly annotated with atomic types, but just
> use the base types. They need to be naturally aligned. This makes the
> transition easier because we don't get any dependencies on C11 atomic
> * On a certain architecture, we typically only use the atomic_* ops if
> the HW actually supports these; we expect to have pointer-sized atomics
> at most. If the arch has no native support for atomics, it can either
> use modified algorithms or emulate atomics differently.
I strongly suggest all such machines should emulate atomics in the kernel
using kernel-level locks. The downside of this is that all atomic vars
must use atomic_* functions because otherwise the release of the lock
word by a normal store won't order correctly. This already happened on
hppa with userspace spinlocks.
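
A rough sketch of why every access has to go through the emulation,
assuming a single global lock. Here a userspace spinlock built from
GCC's __atomic_test_and_set stands in for the kernel-level lock; a real
port would enter the kernel instead. All emu_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel-level lock (assumption: a real port would
   serialize inside the kernel, not with this spinlock).  */
static bool emu_lock;

static void
emu_enter (void)
{
  while (__atomic_test_and_set (&emu_lock, __ATOMIC_ACQUIRE))
    ; /* Spin until the lock is free.  */
}

static void
emu_leave (void)
{
  __atomic_clear (&emu_lock, __ATOMIC_RELEASE);
}

/* Emulated compare-and-exchange: compare and write happen under the
   lock, so they look like one step to other emu_* callers only.  */
bool
emu_compare_exchange (int *mem, int *expected, int desired)
{
  emu_enter ();
  bool ok = (*mem == *expected);
  if (ok)
    *mem = desired;
  else
    *expected = *mem;
  emu_leave ();
  return ok;
}

/* Stores must take the same lock.  A plain "*mem = val" could land
   between the compare and the write above and then be overwritten.  */
void
emu_store (int *mem, int val)
{
  emu_enter ();
  *mem = val;
  emu_leave ();
}

int
emu_load (int *mem)
{
  emu_enter ();
  int val = *mem;
  emu_leave ();
  return val;
}
```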
> * The atomic ops are similar to the _explicit variation of C11's
> functions, except that _explicit is replaced with the last part of the
> MO argument (ie, acquire, release, acq_rel, relaxed, seq_cst). All
> arguments (except the MO, which is dropped) are the same as for C11.
> That avoids using the same names yet should make the names easy to
> understand for people familiar with C11.
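
To illustrate the naming scheme quoted above, here is a sketch under the
assumption that the operations map directly onto GCC's __atomic
builtins. The macro names follow the described pattern but are
hypothetical, as are counter and demo_counter.

```c
#include <assert.h>

/* C11: atomic_load_explicit (mem, memory_order_acquire)
   here: atomic_load_acquire (mem)
   The memory order moves from the argument list into the name.  */
#define atomic_load_relaxed(mem) \
  __atomic_load_n ((mem), __ATOMIC_RELAXED)
#define atomic_load_acquire(mem) \
  __atomic_load_n ((mem), __ATOMIC_ACQUIRE)
#define atomic_store_relaxed(mem, val) \
  __atomic_store_n ((mem), (val), __ATOMIC_RELAXED)
#define atomic_store_release(mem, val) \
  __atomic_store_n ((mem), (val), __ATOMIC_RELEASE)
#define atomic_fetch_add_acq_rel(mem, val) \
  __atomic_fetch_add ((mem), (val), __ATOMIC_ACQ_REL)

/* Plain int rather than an _Atomic type, as the proposal suggests.  */
static int counter;

int
demo_counter (void)
{
  atomic_store_relaxed (&counter, 40);
  /* fetch_add returns the previous value, which is ignored here.  */
  (void) atomic_fetch_add_acq_rel (&counter, 2);
  return atomic_load_acquire (&counter);
}
```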
> I also propose an incremental transition. In particular, the steps are:
> 1) Add new C11-like atomics. If GCC supports them on this architecture,
> use GCC's atomic builtins. Make them fall back to the existing atomics
> otherwise. Attached is a small patch that illustrates this.
> 2) Refactor one use (ie, all the accesses belonging to one algorithm or
> group of functions that synchronize with each other) at a time. This
> involves reviewing the code and basically reimplementing the
> synchronization bits on top of the C11 memory model. We also should
> take this opportunity to add any documentation of concurrent code that's
> missing (which is often the case).
Not OK until we talk about it more.
> 3) For non-standard atomic ops (eg, atomic_add_negative()), have a look
> at all uses and decide whether we really need to keep them.
Agreed, e.g. rewrite them.
> 4) Once all of glibc uses the new atomics, remove the old ones for a
> particular arch if the oldest compiler required has support for the
> respective builtins.
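
Step 1 above could be sketched roughly as follows. This is not the
attached patch. USE_COMPILER_ATOMIC_BUILTINS and legacy_full_barrier are
hypothetical names, and the fallback uses a full barrier as a
conservative stand-in for the existing read/write barriers, since
whether those match acquire/release is still an open question.

```c
#include <assert.h>

/* Hypothetical switch: an architecture would set this to 0 if GCC's
   __atomic builtins are not usable there.  */
#ifndef USE_COMPILER_ATOMIC_BUILTINS
# ifdef __ATOMIC_ACQUIRE
#  define USE_COMPILER_ATOMIC_BUILTINS 1
# else
#  define USE_COMPILER_ATOMIC_BUILTINS 0
# endif
#endif

#if USE_COMPILER_ATOMIC_BUILTINS
# define atomic_load_acquire(mem) \
  __atomic_load_n ((mem), __ATOMIC_ACQUIRE)
# define atomic_store_release(mem, val) \
  __atomic_store_n ((mem), (val), __ATOMIC_RELEASE)
#else
/* Fallback built from older primitives.  A full barrier is stronger
   than needed but safe as a placeholder.  */
# define legacy_full_barrier() __sync_synchronize ()
# define atomic_load_acquire(mem) \
  ({ __typeof (*(mem)) __val = *(volatile __typeof (*(mem)) *) (mem); \
     legacy_full_barrier ();                                          \
     __val; })
# define atomic_store_release(mem, val) \
  ({ legacy_full_barrier ();                                          \
     *(volatile __typeof (*(mem)) *) (mem) = (val);                   \
     (void) 0; })
#endif

/* Hypothetical shared variable and helper, for illustration only.  */
static int ready;

int
publish_and_check (int value)
{
  atomic_store_release (&ready, value);
  return atomic_load_acquire (&ready);
}
```

Callers see the same atomic_* names either way, so algorithms can be
converted one at a time regardless of which branch an arch uses.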
> Open questions:
> * Are the current read/write memory barriers equivalent to C11
> acquire/release fences? I guess that's the case (did I mention lack of
> documentation? ;) ), but we should check whether this is true on every
> architecture (ie, whether the HW instructions used for read/write
> membars are the same as what the compiler would use for
> acquire/release). If not, we can't implement acquire/release based on
> read/write membars but need something else for this arch. I'd
> appreciate help from the machine maintainers for this one.
Create an internals manual? Add a new chapter on atomics? :-)
> * How do we deal with archs such as older SPARC that don't have CAS and
> other archs without HW support for atomics? Using modified algorithms
> should be the best-performing option (eg, if we can use one critical
> section instead of a complicated alternative that uses lots of atomic
> operations). However, that means we'll have to maintain more algorithms
> (even if they might be simpler).
No. Stop. One algorithm. All arches that lack HW support for atomics
must enter the kernel and do the work there. This is just what hppa
and ARM do. They use a light-weight syscall mechanism and serialize in the
kernel.
> Furthermore, do all uses of atomics work well with blocking atomics that
> might also not be indivisible steps? For example, the cancellation code
> might be affected because a blocking emulation of atomics won't be
It will be safe because the kernel emulation should not deliver a signal
during the emulation.
> * Which of the catomic_ variants do we really need? Similarly to the
> non-native atomics case, we often might be better off with running a
> slightly different nonatomic (or just nonsynchronizing) algorithm in the
> first place. We'll have to review all the uses to be able to tell.
> Thoughts? Any feedback and help is welcome!