This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



[RFC] atomics vs. uniprocessor builds


One thing my recent C11 atomics patch has not addressed is that it does
not add a uniprocessor specialization of atomic operations when UP is
defined.  For the custom implementation of atomics, such a
specialization exists for three architectures:
* x86 and x86_64 drop the "lock;" prefix on atomic read-modify-write
operations if UP is defined,
* powerpc (32 and 64) drops most memory barriers if UP is defined,
* alpha drops most memory barriers if UP is defined.

My recent C11 atomics patches only affect x86_64 on GCC >= 4.7 because
this will use the compiler builtins for atomics, which don't have a
uniprocessor specialization.  powerpc and alpha are not affected
currently, but would be once we start using builtins on these
architectures too.

On powerpc and alpha, the current behavior could be achieved with the
builtins by passing just memory_order_relaxed as the memory order
argument(s) of each builtin call and surrounding the call with custom
compiler barriers chosen according to the original MO.  Examples:

* __atomic_store_n ((mem), (val), __ATOMIC_RELEASE) becomes:
  __asm ("" ::: "memory");
  __atomic_store_n ((mem), (val), __ATOMIC_RELAXED);

* __atomic_load_n ((mem), __ATOMIC_ACQUIRE) becomes:
  __atomic_load_n ((mem), __ATOMIC_RELAXED);
  __asm ("" ::: "memory");

* __atomic_fetch_add ((mem), (operand), __ATOMIC_SEQ_CST) becomes:
  __asm ("" ::: "memory");
  __atomic_fetch_add ((mem), (operand), __ATOMIC_RELAXED);
  __asm ("" ::: "memory");

We need the additional compiler barriers because the memory order
argument is both a request for HW barriers (as necessary) and an
indication to the compiler which optimizations are allowed (e.g.,
reordering across a __ATOMIC_RELAXED atomic is possible in many cases).
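
As a rough sketch of the three rewrites above, the following wrappers
use only __ATOMIC_RELAXED builtins plus explicit compiler barriers.  The
names up_compiler_barrier, up_store_release, up_load_acquire,
up_fetch_add_seq_cst and up_demo are made up for illustration and are
not existing glibc macros.  The sketch assumes GCC >= 4.7 (or a
compatible compiler) and a uniprocessor target where dropping the HW
barriers is the intent:

```c
#include <assert.h>

/* Hypothetical helper: stops the compiler from moving memory accesses
   across this point, but emits no HW barrier instruction.  */
#define up_compiler_barrier() __asm__ __volatile__ ("" ::: "memory")

/* Release store: earlier accesses must not sink below the store.  */
static inline void
up_store_release (int *mem, int val)
{
  up_compiler_barrier ();
  __atomic_store_n (mem, val, __ATOMIC_RELAXED);
}

/* Acquire load: later accesses must not hoist above the load.  */
static inline int
up_load_acquire (int *mem)
{
  int val = __atomic_load_n (mem, __ATOMIC_RELAXED);
  up_compiler_barrier ();
  return val;
}

/* Seq-cst read-modify-write: fence the compiler on both sides.  */
static inline int
up_fetch_add_seq_cst (int *mem, int operand)
{
  up_compiler_barrier ();
  int old = __atomic_fetch_add (mem, operand, __ATOMIC_RELAXED);
  up_compiler_barrier ();
  return old;
}

/* Hypothetical usage example.  */
static void
up_demo (void)
{
  int flag = 0;
  up_store_release (&flag, 1);
  assert (up_load_acquire (&flag) == 1);
  assert (up_fetch_add_seq_cst (&flag, 2) == 1);
  assert (flag == 3);
}
```

The atomicity of each operation still comes from the builtin; only the
ordering guarantees are downgraded to compiler-level ones.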

The same approach wouldn't completely work for x86.  We could use it to
avoid __ATOMIC_SEQ_CST barrier overhead (i.e., store/load barriers), but
we can't get rid of the lock prefix.  We could add an extra memory order
argument to GCC, similar to what's been done for HLE, but this would be
either a flag that modifies the normal memory orders (so that the
compiler can drop the lock prefix but can still prevent reordering as
required), or it would be a no-lock-prefix __ATOMIC_RELAXED and we'd
have to manually add the compiler barriers.
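
For illustration only, this is roughly what a no-lock-prefix
fetch-and-add looks like when written by hand today, which is the kind
of custom assembly such a GCC extension would replace.
up_fetch_add_nolock and up_nolock_demo are hypothetical names.  Without
the lock prefix the instruction is atomic only with respect to
interrupts on the same CPU, so the sketch is valid only on a
uniprocessor; on non-x86 targets it simply falls back to the relaxed
builtin so that it still compiles:

```c
#include <assert.h>

/* Hypothetical UP-only fetch-and-add.  Returns the old value of *mem.  */
static inline int
up_fetch_add_nolock (int *mem, int operand)
{
#if defined (__x86_64__) || defined (__i386__)
  /* xadd without "lock;": exchanges the register with *mem and stores
     the sum in *mem.  The "memory" clobber doubles as the compiler
     barrier before and after the operation.  */
  __asm__ __volatile__ ("xaddl %0, %1"
                        : "+r" (operand), "+m" (*mem)
                        :
                        : "memory", "cc");
  return operand;
#else
  /* Fallback for other architectures: relaxed builtin plus compiler
     barriers, as in the powerpc/alpha scheme.  */
  __asm__ __volatile__ ("" ::: "memory");
  int old = __atomic_fetch_add (mem, operand, __ATOMIC_RELAXED);
  __asm__ __volatile__ ("" ::: "memory");
  return old;
#endif
}

/* Hypothetical usage example.  */
static void
up_nolock_demo (void)
{
  int counter = 5;
  assert (up_fetch_add_nolock (&counter, 3) == 5);
  assert (counter == 8);
}
```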

Note that the above does not affect the single-thread optimizations in
lowlevellock.h, which check at runtime for the presence of more than one
thread and avoid the lock prefix if this isn't the case.
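
A minimal sketch of that kind of runtime check, not the actual
lowlevellock.h code: the flag st_multiple_threads and the functions
st_fetch_add_acquire and st_demo are hypothetical, and a plain
non-atomic update stands in here for the lock-prefix-free instruction
used in the real assembly.

```c
#include <assert.h>

/* Hypothetical flag; assumed to be set to nonzero before a second
   thread starts running.  */
static int st_multiple_threads;

/* Fetch-and-add that skips the atomic RMW while only one thread exists.
   Returns the old value of *mem.  */
static inline int
st_fetch_add_acquire (int *mem, int operand)
{
  if (!__atomic_load_n (&st_multiple_threads, __ATOMIC_RELAXED))
    {
      /* Single-threaded: no other thread can observe or race with
         this update, so a plain read and write suffice.  */
      int old = *mem;
      *mem = old + operand;
      return old;
    }
  /* Multi-threaded: use the real atomic operation.  */
  return __atomic_fetch_add (mem, operand, __ATOMIC_ACQUIRE);
}

/* Hypothetical usage example covering both paths.  */
static void
st_demo (void)
{
  int word = 1;
  assert (st_fetch_add_acquire (&word, 2) == 1);   /* fast path */
  st_multiple_threads = 1;
  assert (st_fetch_add_acquire (&word, 2) == 3);   /* atomic path */
  assert (word == 5);
  st_multiple_threads = 0;
}
```

Unlike the UP build-time specialization, this costs a load and a branch
on every operation but works with a single binary.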

Suggestions?  Requests?

My guess would be that uniprocessor builds are rarely used, so it might
be fine to just drop the UP optimization on the atomics.  Keeping the
single-thread optimization on the locks would probably cover the
majority of cases where this actually improves performance, partly
because locks are still more widely used in glibc as a synchronization
mechanism than custom sync implementations based on atomics.

OTOH, if we want to eventually use atomics to implement lowlevellock in
a generic way, we'd need a way to drop the lock prefix from the x86
atomics.
We could maintain custom uniprocessor variants of just the atomic
operations that yield most of the performance benefits (e.g., UP acquire
CAS+exchange and release exchange to cover lowlevellock), and ensure
that algorithms use them explicitly where necessary (i.e., #ifdef UP).  I
would guess that we'd end up with fewer operations that need to be
maintained as custom assembly.

