This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Fix elision in nested trylock.
- From: Dominik Vogt <vogt@linux.vnet.ibm.com>
- To: libc-alpha@sourceware.org
- Date: Fri, 28 Jun 2013 08:05:56 +0200
- Subject: Re: [PATCH] Fix elision in nested trylock.
- References: <1372288203.22198.1113.camel@triegel.csb> <20130627004146.GK6123@two.firstfloor.org> <1372323284.22198.2309.camel@triegel.csb>
- Reply-to: vogt@linux.vnet.ibm.com
On Thu, Jun 27, 2013 at 10:54:44AM +0200, Torvald Riegel wrote:
> On Thu, 2013-06-27 at 02:41 +0200, Andi Kleen wrote:
> > On Thu, Jun 27, 2013 at 01:10:03AM +0200, Torvald Riegel wrote:
>
> > Elision enforces the invariant that either all parallel users of
> > a lock are eliding or all users are locking. This is done by putting
> > the lock value into the read set, so any locker will abort any parallel
> > transactions.
>
> That's not the case if you consider just a single thread. Here's an
> example execution for a thread with some interference by another thread:
>
> Assumptions: mutex->__data.__elision is zero.
>
> 1) Thread A tries to acquire the lock (via __lll_lock_elision or
> __lll_trylock_elision). It reads value zero for *try_lock (ie,
> mutex->__data.__elision). Gets suspended. No transactions so far.
>
> 2) Another thread does something with the lock, fails to use elision,
> elision adapts, and mutex->__data.__elision ends up being greater than
> zero. This doesn't abort thread A. The other thread is suspended,
> leaving the lock unacquired.
>
> 3) We (thread A) resume execution. Start elision for the lock. We're
> in a transaction now, but we haven't written to the lock. Return to the
> program, with lock acquired.
Remark: The code either elides the lock _or_ acquires it, not
both at the same time. In case of elision the lock will appear to
have been acquired only after the transaction completes.
> 4) We (thread A) call trylock for the same lock, and we end up in
> __lll_trylock_elision. We read *try_lock (mutex->__data.__elision), and
> get a value greater than zero.
Remark: This would only happen if the write has been flushed to
the point of unification of the two CPUs' caches. If the write is
still pending, thread A aborts because of a read-after-write
conflict on mutex->__data.__elision (at least on z; I assume it is
the same on Intel).
> We don't use the elision code path, but
> call lll_trylock, which will acquire the lock because nobody has
> acquired it visibly. This is incorrect; POSIX requires us to return
> that the lock is already acquired.
>
> So, the problem is that someone else might race with our load of
> mutex->__data.__elision in the outermost transaction and our decision
> based on this load.
Actually, I think your patch is not really complete. There is a
similar effect that plays a role only when elision for recursive
mutexes is implemented:
* __elision starts with a value > 0
* the mutex is locked recursively several times
* each time one of the inner locks is released, __elision is
decreased
* if that happens often enough, the code will switch to using
elision while the lock has been acquired in the outermost
invocation.
==> When the outermost pthread_mutex_...lock() is called, it
should really freeze the value of __elision so that all
nested calls use the same strategy as the outermost call.
> Similarly, the __elision field can be on a
> different cacheline than __data.__lock, so any pending write by another
> thread to __elision (with a value > 0) can trigger the same bug also in
> other executions.
This can be fixed by just reading __elision right after the
outermost transaction starts, thus putting the cacheline with
__elision into the read set. But that helps only if the
transaction has already begun, not for the race condition you
described.
P.S.: Please do not cc me, I read the list.
Ciao
Dominik ^_^ ^_^
--
Dominik Vogt
IBM Germany