This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Transition to C11 atomics and memory model

From: Torvald Riegel <triegel at redhat dot com>
To: GLIBC Devel <libc-alpha at sourceware dot org>
Date: Tue, 16 Sep 2014 23:48:00 +0200
Subject: Re: Transition to C11 atomics and memory model
Authentication-results: sourceware.org; auth=none
References: <1410719669 dot 4967 dot 160 dot camel at triegel dot csb>

On Sun, 2014-09-14 at 20:34 +0200, Torvald Riegel wrote:
> 1) Add new C11-like atomics.  If GCC supports them on this architecture,
> use GCC's atomic builtins.  Make them fall back to the existing atomics
> otherwise.  Attached is a small patch that illustrates this.

I made a scan through the current atomics implementation on different
archs.  It seems as if there's some disagreement about which operations
have which barrier semantics, and which ones are worthwhile to have
specific implementations of vs. falling back to a CAS loop.
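As a rough illustration of the "fall back to a CAS loop" option, here is a minimal sketch of a fetch-and-or built from nothing but a compare-and-swap.  The name sketch_fetch_or is hypothetical and not part of glibc; the only real interface used is GCC's __sync_val_compare_and_swap builtin, which acts as a full barrier.

```c
/* Hypothetical helper: atomically OR MASK into *MEM using only CAS.
   Returns the value *MEM held before the update.  */
static inline unsigned int
sketch_fetch_or (unsigned int *mem, unsigned int mask)
{
  unsigned int old;
  unsigned int seen = *(volatile unsigned int *) mem;

  do
    {
      old = seen;
      /* The builtin returns the value found in *MEM; the swap happened
         only if that value equals OLD.  Otherwise retry with it.  */
      seen = __sync_val_compare_and_swap (mem, old, old | mask);
    }
  while (seen != old);

  return old;
}
```

An arch-specific implementation is only worthwhile where it beats a loop like this, for example with a single locked instruction.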

Things that stood out for me (some may be bugs):
* Membars are not defined on i486 and x86_64 (unless I missed
something), so these archs get the default, which is only a compiler
barrier.  The HW memory model for these two is TSO, which still allows a
store to be reordered after a later load, so algorithms that rely on
atomic_full_barrier won't work.  I'm not sure whether we have these in
current code, but atomic_full_barrier is used in code I'm working on.
* mips use of __sync_lock_test_and_set for both _acq and _rel is
surprising.  Is this __sync builtin different on mips, or is the _rel
variant broken and needs a stronger barrier?
* powerpc has memory_order_relaxed semantics on some operations for
which other archs use _acq, yet has acquire semantics for
decrement_if_positive.  Is this an inconsistency?  Is it just a
performance problem or a correctness problem?
* What's the real semantics of atomic_write_barrier?  The SPARC code
looks as if this wouldn't be equivalent to a memory_order_release fence.
But for Power I'd guess it is?
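To make the TSO concern concrete, here is a minimal sketch of the store-then-load pattern that needs a real hardware fence.  It uses C11 <stdatomic.h> instead of the glibc-internal macros, and the names flag_a, flag_b, sketch_announce_a and sketch_announce_b are hypothetical.

```c
#include <stdatomic.h>

static atomic_int flag_a;
static atomic_int flag_b;

/* Each side announces itself, then checks the other side's flag.
   Returns 1 if the other side had not announced yet.  */
static int
sketch_announce_a (void)
{
  atomic_store_explicit (&flag_a, 1, memory_order_relaxed);
  /* Under TSO the load below may otherwise be satisfied before the
     store above leaves the store buffer.  A compiler-only barrier
     does not prevent that; a full fence does.  */
  atomic_thread_fence (memory_order_seq_cst);
  return atomic_load_explicit (&flag_b, memory_order_relaxed) == 0;
}

static int
sketch_announce_b (void)
{
  atomic_store_explicit (&flag_b, 1, memory_order_relaxed);
  atomic_thread_fence (memory_order_seq_cst);
  return atomic_load_explicit (&flag_a, memory_order_relaxed) == 0;
}
```

With the fences in place, two threads running these concurrently cannot both return 1.  If the fence were only a compiler barrier, both could.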

Archs marked with ??? are those I know very little about.
Read/write/full membar are the current ones (not C11).  _acq and _rel
refer to what the current code understands as that, not C11
memory_order_acquire and memory_order_release.

Default:
* If _rel variant not defined by arch, _acq is used.
* All atomic ops without a _acq or _rel suffix (eg,
atomic_exchange_and_add) default to have _acq semantics.
* atomic_decrement_if_positive has _acq semantics on the success path
(ie, if the decrement is performed) but relaxed semantics on the failure
path (ie, if the result of the decrement would be negative).
* Full membars are just compiler barriers (ie, __asm ("" ::: "memory")).
* Read/write membars default to full membars.
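The decrement_if_positive default described above could be sketched with GCC's __atomic builtins roughly as follows.  This is an illustration under my own naming (sketch_decrement_if_positive), not the existing glibc macro.

```c
/* Hypothetical helper: decrement *MEM only if it is positive.
   Returns the value observed before the attempted decrement.  */
static inline int
sketch_decrement_if_positive (int *mem)
{
  int old = __atomic_load_n (mem, __ATOMIC_RELAXED);

  do
    {
      if (old <= 0)
        return old;   /* Failure path: nothing stronger than relaxed.  */
    }
  /* Success is an acquire operation; a failed CAS is relaxed and
     stores the freshly observed value into OLD for the retry.  */
  while (!__atomic_compare_exchange_n (mem, &old, old - 1, 1 /* weak */,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));

  return old;
}
```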

aarch64:
* Uses the C11 GCC builtins. 8-64b CAS + exchange.
* _acq and _rel implemented as memory_order_acquire /
memory_order_release.
* CAS failure path has memory_order_relaxed.
* Only full membar defined.
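For reference, a value-returning CAS with _acq or _rel on the success path and memory_order_relaxed on the failure path can be written with the GCC builtins along these lines.  The function names are hypothetical; the argument order (mem, newval, oldval) follows the existing compare-and-exchange macros.

```c
/* Hypothetical helpers: return the value *MEM held before the call.  */
static inline int
sketch_cas_val_acq (int *mem, int newval, int oldval)
{
  int expected = oldval;
  /* Strong CAS: acquire on success, relaxed on failure.  */
  __atomic_compare_exchange_n (mem, &expected, newval, 0 /* strong */,
                               __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  /* EXPECTED is unchanged on success and updated on failure, so it
     is the previous value in both cases.  */
  return expected;
}

static inline int
sketch_cas_val_rel (int *mem, int newval, int oldval)
{
  int expected = oldval;
  __atomic_compare_exchange_n (mem, &expected, newval, 0 /* strong */,
                               __ATOMIC_RELEASE, __ATOMIC_RELAXED);
  return expected;
}
```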

alpha ???:
* Custom asm for CAS + exchange. 8-64b.
* Full membar defined and equal to read membar.  Write membar is
different.
* CAS failure path seems to have memory_order_relaxed.
* atomic_exchange_and_add uses full membar.

arm:
* Uses either C11 GCC builtins, the older __sync* builtins, or CAS
provided by the OS (_arm_assisted...).  32b only CAS + exchange.
* CAS failure path has memory_order_relaxed.
* _acq and _rel implemented as mo_acquire / mo_release, except when
using _arm_assisted*, in which case only _acq is used.
* Only full membar defined.

hppa ???:
* Uses kernel for CAS. ??? How is the kernel aware of the size of the
CAS?
* Only _acq CAS defined.
* Membars are not defined.

i486:
* Custom asm for CAS, exchange, exchange_and_add, add, add_negative,
add_zero, increment, increment_and_test, decrement, decrement_and_test,
bit_set, and, or.  All for 8-32b.
* atomic_delay has custom asm.
* atomic_exchange_and_add uses the __sync builtin, so is a full membar.
* Membars are not defined.

ia64 ???:
* CAS based on __sync builtins; exchange uses __sync_lock_test_and_set
with proper release membar added for _rel.  32-64b.
* atomic_exchange_and_add is a full membar.
* Full barrier defined as __sync_synchronize.
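The distinction matters because GCC documents __sync_lock_test_and_set as an acquire barrier only.  A minimal sketch of the two exchange flavours, with hypothetical names, looks like this:

```c
/* Hypothetical helpers built on the legacy __sync builtins.  */
static inline int
sketch_exchange_acq (int *mem, int newval)
{
  /* Acquire barrier only, per the GCC documentation.  */
  return __sync_lock_test_and_set (mem, newval);
}

static inline int
sketch_exchange_rel (int *mem, int newval)
{
  /* The __sync family has no release-only fence, so this sketch uses
     a full barrier, which is stronger than strictly required.  */
  __sync_synchronize ();
  return __sync_lock_test_and_set (mem, newval);
}
```

Using the bare builtin for both _acq and _rel, as noted for mips, would leave the _rel variant without any ordering for preceding accesses unless the builtin is stronger on that target.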

m68k coldfire ???:
* Uses kernel for CAS. ??? 32b only it seems?
* Uses kernel for full membar.

m68k 68020 ???:
* Custom asm for CAS, exchange, exchange_and_add, add,
increment_and_test, decrement_and_test, bit_set, bit_test_set.  8-64b.
* Membars are not defined.

microblaze ???:
* Custom asm for CAS, exchange, exchange_and_add, increment, decrement.
32b.
* Membars are not defined.

mips ???:
* Uses C11 GCC builtins, __sync (see Joseph's notes on GCC version
requirements), or custom asm for CAS, exchange.  32b and 64b unless
ABIO32.
* If __sync used, then more operations use the __sync builtins.
* If custom asm used, then exchange_and_add is defined too.
* CAS failure path is memory_order_relaxed (in case of C11 builtins and
custom asm).
* atomic_exchange uses __sync_lock_test_and_set for both _acq and _rel,
but has different barriers for _acq and _rel elsewhere.
* Full membar is defined.

powerpc32:
* Custom asm for CAS, exchange.  32b.
* Custom asm also exists for exchange_and_add, increment, decrement,
decrement_if_positive.  The first three of these don't use any membars,
so are likely memory_order_relaxed; however, decrement_if_positive uses
an acquire barrier.
* CAS failure path has acquire membar on _acq, memory_order_relaxed
semantics on _rel.
* Read membar defined (lwsync or sync).  Full and write membars defined.

powerpc64:
* Like powerpc32, but custom asm also provided for 64b (same ops).
* CAS failure path has acquire membar on _acq, memory_order_relaxed
semantics on _rel.
* Read membar defined (lwsync or sync).  Full and write membars defined.

s390 ???:
* Custom asm for CAS. 32-64b.
* No membars defined.

sh ???:
* Kernel used for CAS, exchange, exchange_and_add, add_negative,
add_zero, increment_and_test, decrement_and_test, bit_set, bit_test_set.
8-32b.
* No membars defined.

sparc32 v9 ???:
* Custom asm for CAS, exchange.  32b.
* Full membar defined.
* Read membar defined.  I guess this would work as a
memory_order_acquire barrier (or C11 fence).
* Write membar defined.  If I interpret the SPARC asm correctly, this
does not seem to implement a memory_order_release barrier, but is rather
a more traditional "don't reorder the preceding write with anything
else" barrier: it issues a StoreLoad and StoreStore barrier.  If used as
memory_order_release, however, it will be followed by an atomic store,
so I guess it should be LoadStore and StoreStore?  I haven't checked
what GCC generates, so I'm not sure.
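For comparison, this is the usage pattern a memory_order_release fence has to support, sketched with C11 <stdatomic.h>.  The names payload, ready, sketch_publish and sketch_try_consume are hypothetical.  The LoadStore + StoreStore reading in the comment is the usual informal description of a release fence, not something I have verified against the SPARC code.

```c
#include <stdatomic.h>

static int payload;        /* Plain, non-atomic data.  */
static atomic_int ready;   /* Publication flag.  */

static void
sketch_publish (int value)
{
  payload = value;
  /* Release fence: earlier loads and stores must not be reordered
     after the flag store below (informally LoadStore + StoreStore).
     Ordering the flag store against later loads (StoreLoad) is not
     what this pattern needs.  */
  atomic_thread_fence (memory_order_release);
  atomic_store_explicit (&ready, 1, memory_order_relaxed);
}

/* Returns 1 and fills *OUT if the payload has been published.  */
static int
sketch_try_consume (int *out)
{
  if (atomic_load_explicit (&ready, memory_order_relaxed) == 0)
    return 0;
  /* Matching acquire fence on the reader side.  */
  atomic_thread_fence (memory_order_acquire);
  *out = payload;
  return 1;
}
```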

sparc32 pre v9 ???:
* Either uses custom asm for a lock-based implementation of atomics or
uses v9 code.
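A lock-based implementation of this kind can be pictured roughly as below.  This is only a sketch with a single hypothetical global spin lock (sketch_lock), written with GCC builtins and not the actual SPARC asm.  Every access to the protected word has to go through the same lock for this to be atomic.

```c
static _Bool sketch_lock;   /* Hypothetical global lock byte.  */

static inline void
sketch_lock_acquire (void)
{
  /* Spin until the previous value of the lock byte was clear.  */
  while (__atomic_test_and_set (&sketch_lock, __ATOMIC_ACQUIRE))
    ;
}

static inline void
sketch_lock_release (void)
{
  __atomic_clear (&sketch_lock, __ATOMIC_RELEASE);
}

/* CAS emulated under the lock; returns the previous value of *MEM.  */
static int
sketch_locked_cas_val (int *mem, int newval, int oldval)
{
  int seen;

  sketch_lock_acquire ();
  seen = *mem;
  if (seen == oldval)
    *mem = newval;
  sketch_lock_release ();

  return seen;
}
```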

sparc64 ???:
* Custom asm for CAS, exchange.  32-64b.
* Membars are same as sparc32 v9.

x86_64:
* Uses __sync builtins for CAS, custom asm for exchange,
exchange_and_add, add, add_zero, increment, increment_and_test,
decrement, decrement_and_test, bit_set, bit_test_set, and, or. 8-64b.
* Custom asm for atomic ops that emit lock prefix conditionally (e.g.,
catomic_add).
* Membars are not defined.
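The idea behind the conditionally locked operations can be sketched in portable C as follows.  The flag sketch_multiple_threads and the function name are hypothetical stand-ins; the real code decides inside the asm whether to emit the lock prefix, and I am assuming the condition is whether the process is multi-threaded.

```c
/* Hypothetical flag: nonzero once a second thread exists.  */
static int sketch_multiple_threads;

static inline void
sketch_catomic_add (int *mem, int value)
{
  if (sketch_multiple_threads)
    /* Other threads may race with us: use a real atomic RMW.  */
    __atomic_fetch_add (mem, value, __ATOMIC_ACQUIRE);
  else
    /* Single-threaded: a plain add avoids the cost of the locked
       instruction.  */
    *mem += value;
}
```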

Follow-Ups:
- Re: Transition to C11 atomics and memory model
  - From: Joseph S. Myers
- Re: Transition to C11 atomics and memory model
  - From: Torvald Riegel
- Re: Transition to C11 atomics and memory model
  - From: Torvald Riegel

References:
- Transition to C11 atomics and memory model
  - From: Torvald Riegel
