This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Fix atomic_full_barrier on x86 and x86_64.
- From: Torvald Riegel <triegel@redhat.com>
- To: Rich Felker <dalias@libc.org>
- Cc: GLIBC Devel <libc-alpha@sourceware.org>
- Date: Wed, 29 Oct 2014 21:55:00 +0100
- Subject: Re: [PATCH] Fix atomic_full_barrier on x86 and x86_64.
- References: <1414606736.10085.1.camel@triegel.csb> <20141029203158.GU22465@brightrain.aerifal.cx>
On Wed, 2014-10-29 at 16:31 -0400, Rich Felker wrote:
> On Wed, Oct 29, 2014 at 07:18:56PM +0100, Torvald Riegel wrote:
> > Currently, x86 and x86_64 do not define any of the barriers. The
> > generic default for the barriers is that full barriers are just compiler
> > barriers and read and write barriers default to using the full barrier
> > definition. On x86 and x86_64, however, only read/write barriers (ie,
> > acquire and release fences in how glibc uses them) are just compiler
> > barriers -- a full barrier needs an mfence instruction. This patch
> > defines atomic_full_barrier accordingly without making read/write
> > barriers stronger than compiler barriers.
>
> Can you explain why you claim these are needed?
Because TSO is not sequential consistency. Consider Dekker
synchronization -- you need a full membar there to prevent store/load
reordering. Also see
http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
> The architecture
> inherently has strong memory ordering that prevents all reorderings
> except for moving loads before stores when the addresses differ.
And a full barrier (in the sense of memory_order_seq_cst) prevents that.
> Are
> you concerned about creating a barrier for the loosely-ordered "nt"
> sse operations or something? Or something else?
No. The focus is on barriers/fences used in combination with
memory_order_relaxed atomics.
> > The only use of atomic_full_barrier on x86/x86_64 is in
> > sysdeps/nptl/fork.c, but there, ISTM that this actually only needs an
> > acquire barrier. So, this patch doesn't fix existing bugs in *current*
> > code, and only makes fork a tiny tiny bit slower.
> >
> > Nonetheless, there are other uses of full barriers in generic code
> > (e.g., semaphores). Those don't apply for x86/x86_64 because we
> > currently have arch-specific code for those, but would if we start using
> > the generic code on x86 too (and we couldn't do until we fix the x86
> > full barrier).
>
> Shouldn't the atomics for these already provide sufficient barriers?
Synchronization code similar to Dekker synchronization is more
efficiently expressed with seq_cst fences and relaxed loads/stores than
with seq_cst loads/stores.
> > commit be14e33b607609462130a0264c7a9460596da3f8
> > Author: Torvald Riegel <triegel@redhat.com>
> > Date: Wed Oct 29 10:34:36 2014 +0100
> >
> > Fix atomic_full_barrier on x86 and x86_64.
> >
> > * sysdeps/x86_64/bits/atomic.h (atomic_full_barrier,
> > atomic_read_barrier, atomic_write_barrier): Define.
> > * sysdeps/i386/i486/bits/atomic.h (atomic_full_barrier,
> > atomic_read_barrier, atomic_write_barrier): Define.
> >
> > diff --git a/sysdeps/i386/i486/bits/atomic.h b/sysdeps/i386/i486/bits/atomic.h
> > index 76e0e8e..7c432f9 100644
> > --- a/sysdeps/i386/i486/bits/atomic.h
> > +++ b/sysdeps/i386/i486/bits/atomic.h
> > @@ -532,3 +532,7 @@ typedef uintmax_t uatomic_max_t;
> > #define atomic_or(mem, mask) __arch_or_body (LOCK_PREFIX, mem, mask)
> >
> > #define catomic_or(mem, mask) __arch_or_body (__arch_cprefix, mem, mask)
> > +
> > +#define atomic_full_barrier() __asm ("mfence" ::: "memory")
> > +#define atomic_read_barrier() __asm ("" ::: "memory")
> > +#define atomic_write_barrier() __asm ("" ::: "memory")
>
> Does glibc now require i686+sse? AFAIK mfence is SIGILL on all
> previous x86 generations. An alternate portable (to earlier chips) but
> slower approach is a useless xchg. It also _might_ work to perform a
> store/load pair on the same address (I'm not clear on whether this
> method is valid).
Good point.
What's the include order on x86, counting down from the most recent
supported revision (ie, for an i686 machine, include sysdeps/i386/i686
first, then i586 and i486)?