Concurrency in glibc

Contents

Concurrency in glibc

Memory model and atomic operations

We basically use the C11 memory model, with the following differences to pure C11 code:

We do not use C11 as language standard at the moment, but rely on either custom implementations of atomic operations or compiler builtins.
The atomic operations we use have the same semantics as their C11 equivalents but use a different naming scheme to avoid collisions (see below).
C11 atomic types ensure that macro-less direct accesses use sequentially consistent memory order (memory_order_seq_cst). With the glibc emulation, direct accesses are closer to relaxed memory order (but not quite).
We (currently) do not use special types for the variables accessed by atomic operations, but the underlying non-atomic types with suitable alignment on each architecture.
The memory order is always explicit in atomic operations; with the current naming scheme, it is a suffix of the function implementing the atomic operation.
We do not provide all atomic operations specified by C11 but only those we currently have a need for. We can add further operations if/when there is demand.

Data-race-freedom, as defined by C11, is the default, and a requirement for new code. Old code may not satisfy this requirement (yet). We currently allow non-atomic initialization, as long as it happens-before all atomic accesses; however, new code should avoid non-atomic initialization if possible.

Naming scheme and currently provided functions

For a C11 function with the name atomic_OP_explicit(args, memory_order_MO), the equivalent glibc function has the name atomic_OP_MO(args). There are two small exceptions. First, we assume atomic_compare_exchange operations to only use memory_order_relaxed on the failure path, and thus only the requested memory order on the success path is part of the function name. Second, C11's atomic_thread_fence function does not have _explicit as a suffix in its name, so we just add the memory order without removing the suffix (see below).

Examples:

C11 code	glibc code
`atomic_load_explicit(&x, memory_order_relaxed)`	`atomic_load_relaxed(&x)`
`atomic_store_explicit(&x, 1, memory_order_release)`	`atomic_store_release(&x, 1)`
`atomic_thread_fence(memory_order_acquire)`	`atomic_thread_fence_acquire()`
`atomic_compare_exchange_weak_explicit(&x, &expected, desired, memory_order_release, memory_order_relaxed)`	`atomic_compare_exchange_weak_release(&x, &expected, desired)` (the failure path on CAS is always `memory_order_relaxed`)
`atomic_fetch_and_explicit(&x, operand, memory_order_relaxed)`	`atomic_fetch_and_relaxed(&x, operand)`

Across all architectures, we only provide atomic operations for naturally aligned unsigned/int and naturally-aligned pointers.

To see which atomic operations we currently define, open include/atomic.h and look for #if USE_ATOMIC_COMPILER_BUILTINS; the declarations that follow until the matching #else are what we currently provide.

Documentation guidelines

Concurrent code is often complex, and how it works can usually not be understood simply by going through it sequentially (unless you state space explosion is not an issue for your brain). We want to optimize for the readers of code, not for the authors; therefore, we want verbose documentation of the synchronization scheme. While this creates redundancy, we assume this helps in that it clarifies the intent of the author, and makes it easier for readers to see the intent and the high-level concurrent algorithm and cross-check between the intent and the actual code.

The high-level synchronization scheme of groups of functions that synchronize or coordinate with each other in some way should get documented. This will often include the following information:

The set of high-level states that can occur, and transitions between these states.
Assumptions about allowed uses of the functions in programs, especially if this prevents certain states or interleavings the code does not handle.
Parts of the execution that are atomic, and how that reduces the allowed interleavings and possible states.

This information should be on one of the functions in each group; the other functions in the group should explicitly refer to it in their documentation.

Code with locks

TODO

Code with atomics

If you use atomics, please make sure you are familiar with the C11 memory model.

Use the terms from the formalization of the C11 memory model by Batty et al.:

reads-from in statements about which store operation a load operations is reading from in the executions being discussed.
synchronizes-with in cases where release/acquire MO operations are in a reads-from relation
happens-before whenever you need to say that, so when it includes sequenced-before, or it's just the general ordering that matters.
modification order for the total order of stores per each memory location.

Document at least the following, unless this is trivial:

Why a certain memory order (MO) is sufficient. This especially applies to relaxed MO: why isn't a stronger MO required? If a relaxed MO operation is meant to work together with a fence, document the details as if the relaxed MO operation were of the stronger MO as ensured by the fence.
acquire MO, release MO: document reads-from and the resulting synchronizes-with relations (e.g., say which release MO store an acquire MO load is supposed to synchronize-with, and why).
seq_cst MO: document which other seq_cst operations are need to be ordered with this one, and why.
Fences: which, for example, relaxed MO atomic operations they are required for

Terminology:

CAS: a compare-and-exchange operation, also called compare-and-set ((e.g., atomic_compare_exchange_weak_relaxed) .
MO: memory order as defined by C11 (e.g., "relaxed MO").

Things to look out for:

ABA issues. For example, a CAS will compare the raw value, but not the logical state you associated with it; so, if you see a value A, it doesn't mean that there has been no state or value 'B' in the meantime. Document why ABA issues cannot happen or are benign if this is hard to understand.
"Pending" operations: It is often instructive to examine interleavings in which one thread is about to make an action (e.g., about to perform an atomic store). It will have observed a previous state and made a choice what to do next; but it might not get to doing this next step until a arbitrarily long time later. Handling such situations is often critical to make synchronization code work.
All accesses to an object should use atomics if there is at least one atomic access. This is for clarity and compatibility with C11 atomics, where non-macro accesses would needlessly use sequentially consistent memory order (memory_order_seq_cst).

Related information

Development TODO

Below is a list of further steps towards transitioning existing glibc code to a state that fulfills the requirements laid out above. This is not a complete list of tasks but rather reminders and steps that have already been identified:

nscd: Need to document and check synchronization on the daemon side. Need to likely revise daemon/client synchronization. Torvald has started work on this.
All futex words need to be accessed atomically. While doing that, make the futex words of type unsigned int if possible ("grep futex_ | grep '((unsigned int'" is a good indication).
allocatestack.c wait_lookup_done: acquire barriers missing?
pthread_create.c START_THREAD_DEFN: odd sync around futex_wait: loop condition different from futex condition
New glibc-internal futex API:
- lowlevellock and mutexes have not been transformed yet. Once that is done, we can remove the lll_futex functions and replace their uses in futex-internal.h with syscalls or calls to the NaCl runtime.
- All tls.h files don't use the new API yet. Siddhesh wants to look at this area in more detail.
- The old semaphore code still uses the lll_futex_ functions. Moving this over should be simple, but needs testing.
- Once Adhemerval has revised cancellation, we need to update the futex_ functions that are cancellation points.
rwlock: Torvald is working on a new algorithm/implementation.