This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Lock elision problems in glibc-2.18
- From: Dominik Vogt <vogt at linux dot vnet dot ibm dot com>
- To: libc-alpha at sourceware dot org
- Date: Fri, 23 Aug 2013 10:49:16 +0200
- Subject: Lock elision problems in glibc-2.18
- Reply-to: vogt at linux dot vnet dot ibm dot com
The current implementation of pthread_mutex lock elision is
problematic if software uses a glibc with elision and other pieces
of code that use transactions (other libraries for example). The
root cause of the problem is that transactions may be used in
several independent contexts, but the cpu automatically nests
these into the same transaction.
> /* elision-lock.c: Elided pthread mutex lock.
...
> int
> __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private)
> {
> if (*adapt_count <= 0)
> {
> unsigned status;
> int try_xbegin;
>
> for (try_xbegin = aconf.retry_try_xbegin;
> try_xbegin > 0;
> try_xbegin--)
> {
> if ((status = _xbegin()) == _XBEGIN_STARTED)
> {
> if (*futex == 0)
> return 0;
>
> /* Lock was busy. Fall back to normal locking.
> Could also _xend here but xabort with 0xff code
> is more visible in the profiler. */
> _xabort (_ABORT_LOCK_BUSY);
Let's assume we have this user code:
XBEGIN (abort_handler)
pthread_mutex_lock(&mutex);
/* do something */
pthread_mutex_unlock(&mutex);
XEND
...
abort_handler:
...
If the "_xabort (_ABORT_LOCK_BUSY)" is executed, this will abort
the outermost transaction, i.e. the XBEGIN in the first line, and
jump to abort_handler and _not_ use the abort handling code in
elision-lock.c.
1) This is inefficient because the _xabort terminates more
(nested) transactions than necessary; it would be sufficient to
use _xend here and just do the normal retries or spin a while.
2) The user's abort_handler is probably not able to deal with the
abort code _ABORT_LOCK_BUSY because it does not know about it.
3) Even if the outermost transaction was started from glibc:
pthread_mutex_lock(&mutex1);
pthread_mutex_lock(&mutex2);
/* do something */
pthread_mutex_unlock(&mutex2);
pthread_mutex_unlock(&mutex1);
if mutex2 _xaborts, the abort handler is called for mutex*1*.
I.e., although mutex2 was busy, mutex1 is penalized for it.
4) Vice versa, the abort handling code in elision-lock.c might be
triggered by a third party abort that uses abort codes unknown
to glibc, or even the same codes with a different meaning.
> /* elision-trylock.c: Lock eliding trylock for pthreads.
...
> int
> __lll_trylock_elision (int *futex, short *adapt_count)
> {
> /* Implement POSIX semantics by forbiding nesting
> trylock. Sorry. After the abort the code is re-executed
> non transactional and if the lock was already locked
> return an error. */
> _xabort (_ABORT_NESTED_TRYLOCK);
See (2) above; _ABORT_NESTED_TRYLOCK may be caught and
misinterpreted by a third party abort handler.
> /* Only try a transaction if it's worth it. */
> if (*adapt_count <= 0)
> {
> unsigned status;
>
> if ((status = _xbegin()) == _XBEGIN_STARTED)
> {
> if (*futex == 0)
> return 0;
>
> /* Lock was busy. Fall back to normal locking.
> Could also _xend here but xabort with 0xff code
> is more visible in the profiler. */
> _xabort (_ABORT_LOCK_BUSY);
Same problem as in elision-lock.c.
> /* elision-unlock.c: Commit an elided pthread lock.
...
> int
> __lll_unlock_elision(int *lock, int private)
> {
> /* When the lock was free we're in a transaction.
> When you crash here you unlocked a free lock. */
> if (*lock == 0)
> _xend();
> else
> lll_unlock ((*lock), private);
> return 0;
> }
5) We cannot assume that we are unlocking a free lock if no
transaction is open at this point. This is because even if the
transaction was created by pthread_mutex_lock(), third party code
may have closed it before calling pthread_mutex_unlock():
The third party code might assume, that if a transaction is open
at some point it must have been created by that third party code
earlier (very much like __lll_unlock_elision):
/* 3rd party code */
my_unlock()
{
if (_xtest())
{
/* We have an open transaction, close it. */
_xend();
}
else
{
/* Nothing to do, we've already closed the transaction, or
it has aborted earlier. */
}
}
While it is certainly the fault of the code in my_unlock() that
the program won't work as expected (closing transactions
prematurely), the general protection fault cause by the following
code is definitely cause by the code in __lll_unlock_elision():
pthread_mutex_lock();
my_unlock()
pthread_mutex_lock(); <-- crashes
(This crashes because an XEND outside of transactional mode causes
a general protection fault.)
Note that HTM on z/Architecture suffers from similar problems.
However, using TEND outside a transaction is safe while TABORT
outside a transaction causes a special-operation exception.
Summary
-------
Different pieces of code that use transactions that are logically
independent share a single (nested) transaction in the cpu. A
user (e.g. a library) of transactions must not assume that
* the outermost transactions is always created by the user,
* the innermost transaction has been created by the user,
* the abort code caught by its abort handler has been set by the
user,
* the abort code is passed to the user's abort handler, if the
user aborts a transaction,
* aborts are ever handled by the user's own abort handler.
Failing to do so might break Posix semantics of the elision
code(!)
Rules for transactional coding (draft)
--------------------------------------
The following rules help to reduce potential problems if each
piece of software that uses transactions sticks to the rules:
A) Abort handlers should not interpret abort codes beyond what is
documented in the cpu specification (i.e. on Intel, it should
ignore the user defined bits 31:24 of the abort status; on z,
it should ignore the abort code completely and look only at the
condition code). Abort codes are thus only useful for
debugging and not as a means of controlling program flow.
If it is necessary to use abort codes to pass information from
the transaction to the user, the abort codes _must_ be globally
unique, at least if used in libraries. It might be necessary
to register them through Icann or so.
B) Because of (A), user defined abort codes should not be used to
control program flow. If they are, it is the responsibility of
the programmer to make sure that all software components that
deal with transactions agree on the interpretation of the abort
codes and can deal with codes set by third party software.
C) If a transaction body calls any functions, it cannot be assumed
that a transaction is still open when an XEND or XABORT
instruction is reached. It is necessary to check whether a
transaction is still open and skip the XABORT or XEND if not.
D) Explicitly aborting transactions should be avoided except for
debugging purposes.
E) The innermost transactions should only be closed (with XEND,
TEND etc.) if it was created by the same user.
F) Make sure that control of program flow works as expected even
if your abort handler is never called when transactions abort
(because it's not the outermost transaction).
Applying the rules to the current elision code
----------------------------------------------
(A) and (B) are easy to implement. Instead of aborting with
_ABORT_LOCK_BUSY we can use XEND here and handle the (logical)
transaction failure directly.
(C) can be implemented with "if (_xtest()) ..." where necessary.
(D) can be implemented for lock(), but trylock() currently depends
on the external abort because of Posix requirements.
(E) would require to write down the current nesting depth each
time a transaction is started and only use XEND, TEND, etc. if the
nesting depth is still the same. It's unclear though how to
behave if the nesting depth has changed, though. Furthermore,
while it is possible to query the current nesting depth on z, I
think there is no way to do that on Intel.
I have no solution for (F) yet; if pthread mutexes are only used
from inside third party transactions, the adapt_count would never
be modified in the abort path, because the abort path is never
executed. This completely breaks the adaption logic.
Example code for z that tries to implement these rules
------------------------------------------------------
int
__lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private)
{
if (*adapt_count <= 0)
{
unsigned status;
int try_xbegin;
for (try_xbegin = aconf.retry_try_xbegin;
try_xbegin > 0;
try_xbegin--)
{
if (__builtin_expect
((status = __builtin_tbegin((void *)0)) == _TBEGIN_OK, 1))
{
if (*futex == 0)
return 0;
/* Lock was busy. Fall back to normal locking. It's faster to
simply end the transaction here and fall back to using the lock
than aborting. Furthermore, if we abort here we might abort an
outermost transaction that might have been created by someone
else and who cannot handle out abort code. */
__builtin_tend ();
/* Right now we skip here. Better would be to wait a bit
and retry. This likely needs some spinning. */
if (*adapt_count != aconf.skip_lock_busy)
*adapt_count = aconf.skip_lock_busy;
break;
}
else
{
asm volatile ("" ::: "f0", "f1", "f2", "f3", "f4", "f5", "f6",
"f7", "f8", "f9", "f10", "f11", "f12", "f13",
"f14", "f15", "cc", "memory");
if (status != _TABORT_RETRY)
{
/* The condition code indicates that a retry is probably
futile. Use the normal locking and next time use lock.
Be careful to avoid writing to the lock. */
if (*adapt_count != aconf.skip_lock_internal_abort)
*adapt_count = aconf.skip_lock_internal_abort;
break;
}
}
}
}
else
{
/* Use a normal lock until the threshold counter runs out.
Lost updates possible. */
(*adapt_count)--;
}
/* Use a normal lock here. */
return LLL_LOCK ((*futex), private);
}
int
__lll_unlock_elision(int *lock, int private)
{
/* When the lock was free, we elided the lock earlier. This does not
necessarily mean that we are in a transaction, because the user code may
have closed the transaction. Although the program might then fail, the
unlock function needs to handle this situation gracefully. */
if (*lock == 0)
{
/* Although on zEC12 the instruction ETND is slower than TEND, we cannot
use TEND here because if no elided lock is pending, TEND could close a
transaction that was started outside glibc. */
if (__builtin_tx_nesting_depth () > 0)
{
__builtin_tend();
}
}
else
lll_unlock ((*lock), private);
return 0;
}
Ciao
Dominik ^_^ ^_^
--
Dominik Vogt
IBM Germany