This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH][BZ 18743] PowerPC: Fix a race condition when eliding a lock
- From: Torvald Riegel <triegel at redhat dot com>
- To: Peter Bergner <bergner at vnet dot ibm dot com>
- Cc: Tulio Magno Quites Machado Filho <tuliom at linux dot vnet dot ibm dot com>, Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, libc-alpha at sourceware dot org
- Date: Sun, 23 Aug 2015 14:16:16 +0200
- Subject: Re: [PATCH][BZ 18743] PowerPC: Fix a race condition when eliding a lock
- Authentication-results: sourceware.org; auth=none
- References: <1438274936-26493-1-git-send-email-tuliom at linux dot vnet dot ibm dot com> <55BA703D dot 7010303 at linaro dot org> <874mkl3wtq dot fsf at totoro dot lan> <1440103701 dot 5188 dot 46 dot camel at otta>
On Thu, 2015-08-20 at 15:48 -0500, Peter Bergner wrote:
> On Thu, 2015-07-30 at 23:06 -0300, Tulio Magno Quites Machado Filho wrote:
> > Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> > >> + else \
> > >> + for (int i = __elision_aconf.try_tbegin; i > 0; i--) \
> > >> + { \
> > >> + asm volatile("" ::: "memory"); \
> > >
> > > I can't really understand the requirement of this compiler barrier here. If
> > > compiler is moving around the 'is_lock_free' *before* __builtin_tbegin IMHO
> > > this is a compiler bug.
> >
> > This is the second problem and I agree with you again. IMHO, __builtin_tbegin
> > starts a critical path and the compiler should not be moving a memory access out
> > of the critical path or into the critical path.
> > However, as the current implementations of GCC have this "issue" (some people
> > may not agree with us), I believe we could carry this compiler barrier at least
> > until all GCC versions supporting __builtin_tbegin are fixed.
>
> I completely agree we need a barrier here, so I will fix that. The question
> I'm starting to wonder, is do we need more than just a memory barrier or
> do we need a complete optimization barrier, so no code (even scalar code)
> will move past the __builtin_tbegin() (and other HTM builtins)?
Given that the compiler is involved, I believe we need to specify
semantics on the level of language-level memory models. (Otherwise, for
example if we consider compiler implementation internals such as
reordering, we would have to consider a lot more possibilities -- which
should instead result from the specification.)
First, let's just look at the concurrency aspect of this.
If we consider just transactions with semantics as in the ISO C++
Transactional Memory TS, then we have similar semantics as for a
critical section. Specifically, tbegin is like a lock acquisition, and
tend is a lock release (with the lock being a single global
implementation-internal lock).
The builtins may want to give additional guarantees regarding mixing
non-transactional and transactional synchronization, to get "closer" to
the strong isolation property HTMs typically guarantee. However, I
think to be practical, all such semantics would require that any
nontransactional access that is meant to be concurrent with a
transactional access to that same location would have to use atomics for
the former (or a transaction).
If it is required to use atomics for the access within the transaction,
I don't see anything different to doing that in a critical section.
This would be the case for C++ anyway because of the atomic types
separating atomic from nonatomic accesses, currently.
With the atomic builtins, we can access nonatomic types, and can also
use plain accesses (ie, in the transaction). But even if this is what
we want to allow, I don't yet see a case that would be different from
the case of a critical section.
Second, beyond concurrency, we have to consider whether we get
additional constraints on code generation because of txn aborts. I
don't think we do because tbegin's result is unknown at compile time, as
well as what it does internally. From the perspective of code
generation, it either will execute the txn atomically, or will execute
the txn abort/failure path. Thus, even if at runtime, a call to tabort
decides about the outcome of a previous tbegin, this isn't
distinguishable at compile time because of the atomic execution of txns.
Therefore, I don't think aborts result in constraints that are not yet
there anyway due to tbegin returning a value that is unknown at compile
time.
> Let's look at the following test case:
>
> long
> foo (long dest, long src0, long src1, long tries)
> {
> long i;
> for (i = 0; i < tries; i++)
> {
> if (__builtin_tbegin (0))
> {
> dest = src0 + src1;
> if (dest == 13)
> __builtin_tabort(0);
> __builtin_tend (0);
> }
> }
> return dest;
> }
>
> If I compile this with -O3 -mcpu=power8 -S, I see the "src0 + src1"
> expression hoisted out of the loop, which I can accept as "ok", since
> both src0 and src1 are constant for the whole function.
Yes. src0 and src1 are not observable to other threads nor the tbegin,
so we're back at sequential code for these. So, no matter what tbegin
returns, the result for these will be the same.
> However, I
> am also seeing the assignment of dest as well as the compare of dest
> with 13 also being moved outside the tbegin/tend pair. This doesn't
> seem correct to me, since the update to dest should only ever be
> observable if one or more of the transactions succeeds.
I think doing the compare outside is fine. The assignment to dest could
be done outside the txn if this does not mess with the other code path
for when tbegin returns false. For example, the following would be a
correct code transformation because we speculate the assignment to dest
and correct any misspeculation on the txn-failure path:
tmp = dest;
dest = src0 + src1;
flag = dest == 13;
if (__builtin_tbegin (0))
{
if (flag)
__builtin_tabort(0);
__builtin_tend (0);
}
else
dest = tmp;
In this case, there isn't any txn-specific about tbegin. If the
compiler is making the assignment to dest visible on the tbegin-failure
path too, and returning this value, then maybe the compiler believes
that the tbegin return value is known or constant?
To summarize, I think tbegin needs to have lock acquisition semantics
and an unknown return value, and tend needs to have lock release
semantics.
Does everyone agree with this reasoning?
Peter (or someone else), could you take care of documenting this
reasoning (or whatever reasoning we agree on) in the GCC sources for the
HTM builtins (ie, Power, s390, and Intel)? I can review such a patch,
and can also help with the wording if necessary. Thanks!