This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Linux: Implement membarrier function
- From: Alan Stern <stern at rowland dot harvard dot edu>
- To: "Paul E. McKenney" <paulmck at linux dot ibm dot com>
- Cc: David Goldblatt <davidtgoldblatt at gmail dot com>, <mathieu dot desnoyers at efficios dot com>, Florian Weimer <fweimer at redhat dot com>, <triegel at redhat dot com>, <libc-alpha at sourceware dot org>, <andrea dot parri at amarulasolutions dot com>, <will dot deacon at arm dot com>, <peterz at infradead dot org>, <boqun dot feng at gmail dot com>, <npiggin at gmail dot com>, <dhowells at redhat dot com>, <j dot alglave at ucl dot ac dot uk>, <luc dot maranget at inria dot fr>, <akiyks at gmail dot com>, <dlustig at nvidia dot com>, <linux-arch at vger dot kernel dot org>, <linux-kernel at vger dot kernel dot org>
- Date: Fri, 14 Dec 2018 16:39:34 -0500 (EST)
- Subject: Re: [PATCH] Linux: Implement membarrier function
On Fri, 14 Dec 2018, Paul E. McKenney wrote:
> I would say that sys_membarrier() has zero-sized read-side critical
> sections, either comprising a single instruction (as is the case for
> synchronize_sched(), actually), preempt-disable regions of code
> (which are irrelevant to userspace execution), or the spaces between
> consecutive pairs of instructions (as is the case for the newer
> IPI-based implementation).
>
> The model picks the single-instruction option, and I haven't yet found
> a problem with this -- which is no surprise given that, as you say,
> an actual implementation makes this same choice.
I believe that for RCU tests the LKMM gives the same results for
length-zero critical sections interspersed between all the instructions
and length-one critical sections surrounding all instructions (except
synchronize_rcu). But the proof is tricky and I haven't checked it
carefully.
> > > The other thing that took some time to get used to is the possibility
> > > of long delays during sys_membarrier() execution, allowing significant
> > > execution and reordering between different CPUs' IPIs. This was key
> > > to my understanding of the six-process example, and probably needs to
> > > be clearly called out, including in an example or two.
> >
> > In all the examples I'm aware of, no more than one of the IPIs
> > generated by each sys_membarrier call really matters. (Of course,
> > there's no way to know in advance which one it will be, so you have to
> > send an IPI to every CPU.) The execution delays and reordering
> > between different CPUs' IPIs don't appear to be significant.
>
> Well, there are litmus tests that are allowed in which the allowed
> execution is more easily explained in terms of delays between different
> CPUs' IPIs, so it seems worth keeping track of.
>
> There might be a litmus test that can tell the difference between
> simultaneous and non-simultaneous IPIs, but I cannot immediately think of
> one that matters. Might be a failure of imagination on my part, though.

	P0	P1	P2
	Wc=1	[mb01]	Rb=1
	memb	Wa=1	Rc=0
	Ra=0	Wb=1	[mb02]

The IPIs have to appear in the positions shown: mb01 comes before P1's
write to b, and P2 reads that write before mb02, so mb01 must precede
mb02 and the two IPIs cannot be simultaneous. The test is allowed
because P2's reads can be reordered.
Alan
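
[Editorial sketch, not part of the original message.] The argument about
the three-process test can be checked mechanically under a deliberately
simple timing model. This is only a toy, not the LKMM: it assumes every
event happens at a point in global time, that a read returns the latest
write before it, that each IPI and the memb call act as full barriers,
and that accesses not separated by a barrier may execute in either order
(so P1's two writes are left unordered, as are P2's two reads). The
helper names has_cycle and edges_for are made up for this sketch. An
outcome is possible in this model exactly when its "must happen before"
edges contain no cycle.

```python
def has_cycle(edges):
    """Return True if the 'happens before' edges admit no total order."""
    nodes = {n for edge in edges for n in edge}
    succ = {n: set() for n in nodes}
    indeg = {n: 0 for n in nodes}
    for a, b in edges:
        if b not in succ[a]:
            succ[a].add(b)
            indeg[b] += 1
    # Kahn's algorithm: repeatedly remove events with no predecessor left.
    ready = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return removed != len(nodes)


def edges_for(ipi1="mb01", ipi2="mb02"):
    """Ordering constraints for the outcome Ra=0, Rb=1, Rc=0."""
    return [
        # P0: Wc=1 precedes the memb call, Ra follows it.
        ("Wc", "memb_start"), ("memb_start", "memb_end"), ("memb_end", "Ra"),
        # Both IPIs land while the memb call is in progress.
        ("memb_start", ipi1), (ipi1, "memb_end"),
        ("memb_start", ipi2), (ipi2, "memb_end"),
        # P1: both writes follow its IPI (assumed unordered with each other).
        (ipi1, "Wa"), (ipi1, "Wb"),
        # P2: both reads precede its IPI (assumed unordered with each other).
        ("Rb", ipi2), ("Rc", ipi2),
        # The outcome itself: Ra=0 and Rc=0 read before the writes, Rb=1 after.
        ("Ra", "Wa"), ("Wb", "Rb"), ("Rc", "Wc"),
    ]


if __name__ == "__main__":
    print("allowed with IPIs as shown:", not has_cycle(edges_for()))
    print("allowed with simultaneous IPIs:",
          not has_cycle(edges_for("ipi", "ipi")))
    print("allowed if mb02 precedes mb01:",
          not has_cycle(edges_for() + [("mb02", "mb01")]))
    print("allowed if P2 reads in program order:",
          not has_cycle(edges_for() + [("Rb", "Rc")]))
```

Treating the two IPIs as a single event (passing the same name twice)
is how the sketch models "simultaneous"; adding the edge Rb -> Rc is how
it models P2 performing its reads in program order.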