This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Lock elision problems in glibc-2.18


On Wed, 2013-09-11 at 14:03 +0200, Torvald Riegel wrote:
> On Fri, 2013-08-23 at 10:49 +0200, Dominik Vogt wrote: 
> > F) Make sure that control of program flow works as expected even
> >    if your abort handler is never called when transactions abort 
> >    (because it's not the outermost transaction).

[...]

> > I have no solution for (F) yet; if pthread mutexes are only used
> > from inside third party transactions, the adapt_count would never
> > be modified in the abort path, because the abort path is never
> > executed.  This completely breaks the adaption logic.
> 
> The robustness of the adaptation is indeed a problem.  In the worst case
> a forward progress problem (ie, correctness).  A strict ABI for the
> semantics of certain abort codes could be a solution, or perhaps we can
> fake this effectively.  I'll think more about it.

The only potentially problematic scenario I could come up with so far is
the following:

* An application (or any other non-glibc piece) uses transactions, and
interprets abort codes differently than glibc does.  In particular,
there is some abort code X that it understands as meaning to retry the
transaction.
* In glibc HLE, we use code X to denote a permanent failure.  The app
tries in its transaction to acquire a lock, and elision is enabled.

The worst case is that the application tries to deduce meaning from a
certain abort code, and does something bad.  I think that a sane
application shouldn't do that because it must know that other things
it calls might use transactions.

The second worst case is that the app always retries the transaction
after X.  This also requires that glibc always aborts with code X.  We
shouldn't ever keep aborting due to a busy lock forever; if that
happens, we do have a deadlock situation (it could be starvation too,
but then the end result of starvation is starvation, which is fine).
More problematic might be that we have to abort on nested trylocks
(_ABORT_NESTED_TRYLOCK, 0xfd).  We need to do this, and if the app
misinterprets and *always* keeps retrying, it will hang.

The first way out of this would be for the application to eventually
fall back to not using transactions (same applies for any other code
starting an outermost transaction).  That would follow from abort codes
being hints, but it doesn't make the tuning of the app easier.  If the
app retries many times before falling back, we could still have a
significant performance problem.
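
To make the first option concrete, here is a minimal sketch in C of an
outermost retry loop that treats abort codes as hints and gives up after a
bounded number of attempts.  It assumes the RTM status word layout (explicit
flag in bit 0, retry hint in bit 1, xabort code in bits 24-31).  TXN_COMMITTED,
MAX_ATTEMPTS, APP_RETRY_CODE, fake_txn and run_with_fallback are hypothetical
names for illustration, and the transaction attempt is a callback so that the
sketch runs without HTM hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RTM-style status word layout (mirrors the _XABORT_* macros). */
#define ST_EXPLICIT  (1u << 0)          /* aborted by an explicit xabort */
#define ST_RETRY     (1u << 1)          /* hardware hint: retry may succeed */
#define ST_CODE(s)   (((s) >> 24) & 0xffu)

#define TXN_COMMITTED        (~0u)      /* hypothetical "no abort" sentinel */
#define ABORT_NESTED_TRYLOCK 0xfdu      /* value quoted in this thread */
#define APP_RETRY_CODE       0xfdu      /* hypothetical: the app's "X" */
#define MAX_ATTEMPTS         3          /* hypothetical tuning knob */

/* Hypothetical callback: runs one attempt and returns TXN_COMMITTED or an
   abort status word. */
typedef uint32_t (*try_txn_fn)(void *arg);

static uint32_t explicit_abort(unsigned code)
{
    return ST_EXPLICIT | ((uint32_t)code << 24);
}

/* The app's (mis)interpretation: code X means "retry". */
static bool app_thinks_retryable(uint32_t st)
{
    if (st & ST_RETRY)
        return true;
    return (st & ST_EXPLICIT) && ST_CODE(st) == APP_RETRY_CODE;
}

/* Returns true if an attempt committed, false if the caller must take the
   non-transactional fallback path.  The bound on attempts is what prevents
   a hang when every attempt aborts with a code the app considers retryable. */
static bool run_with_fallback(try_txn_fn attempt, void *arg, int *attempts_out)
{
    assert(attempt != NULL && attempts_out != NULL);
    int n = 0;
    while (n < MAX_ATTEMPTS) {
        uint32_t st = attempt(arg);
        n++;
        if (st == TXN_COMMITTED) {
            *attempts_out = n;
            return true;
        }
        if (!app_thinks_retryable(st))
            break;                      /* give up early on other aborts */
    }
    *attempts_out = n;
    return false;
}

/* Stand-in for a transaction that aborts a fixed number of times. */
struct fake_txn {
    int aborts_left;
    uint32_t abort_status;
};

static uint32_t fake_attempt(void *arg)
{
    struct fake_txn *t = arg;
    if (t->aborts_left > 0) {
        t->aborts_left--;
        return t->abort_status;
    }
    return TXN_COMMITTED;
}
```

With a fake transaction that always aborts with the nested-trylock code, the
loop stops after MAX_ATTEMPTS and reports that the fallback is needed; the
cost of those wasted attempts is the tuning problem mentioned above.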

Second, we could rely on abort codes being ABI.  But then we need to pay
more attention to which codes we reserve for what, etc.  The 16 that
Andi seems to have reserved (IIRC) might run out rather quickly.

Third, we could fake a permanent abort reason for nested trylock, for
example by doing a no-op system call.  This would show up in the caller
as a non-retryable abort.  However, we probably don't want to rely on
particular HTM implementation properties (such as aborting on a
syscall), so we might need to have an additional xabort after the
syscall anyway just for safety.
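
As a rough model of what the caller would see under this third option, the
sketch below compares two hypothetical caller policies against the two
possible outcomes.  It is only a model, not glibc code: it assumes that an
abort caused by the syscall reports neither the retry hint nor an explicit
code (exactly the kind of implementation property we should not depend on),
and it omits the other status bits.  All function names are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RTM-style abort status bits (same layout as the _XABORT_* macros). */
#define ST_EXPLICIT  (1u << 0)        /* abort came from an explicit xabort */
#define ST_RETRY     (1u << 1)        /* hardware hint: a retry may succeed */
#define ST_CODE(s)   (((s) >> 24) & 0xffu)

#define ABORT_NESTED_TRYLOCK 0xfdu    /* value quoted in this thread */

/* Status observed by the outermost transaction when a nested trylock does
   "no-op syscall, then xabort".
   ASSUMPTION: if the HTM aborts on the syscall, the status carries no hint
   and no code; otherwise the trailing xabort produces an explicit abort. */
static uint32_t nested_trylock_status(bool htm_aborts_on_syscall)
{
    if (htm_aborts_on_syscall)
        return 0u;
    return ST_EXPLICIT | ((uint32_t)ABORT_NESTED_TRYLOCK << 24);
}

/* A caller that only trusts the hardware retry hint. */
static bool careful_caller_retries(uint32_t st)
{
    return (st & ST_RETRY) != 0;
}

/* A caller that (mis)reads our code as "please retry". */
static bool confused_caller_retries(uint32_t st)
{
    return (st & ST_RETRY)
           || ((st & ST_EXPLICIT) && ST_CODE(st) == ABORT_NESTED_TRYLOCK);
}

/* Walks through the four combinations. */
static void demo(void)
{
    uint32_t via_syscall = nested_trylock_status(true);
    uint32_t via_xabort  = nested_trylock_status(false);

    assert(!careful_caller_retries(via_syscall));
    assert(!careful_caller_retries(via_xabort));
    assert(!confused_caller_retries(via_syscall)); /* looks permanent */
    assert(confused_caller_retries(via_xabort));   /* still retries */
}
```

In this model the faked abort only stops a confused caller when the hardware
really does abort on the syscall; the trailing xabort keeps us correct on
other implementations but brings back the interpretation problem.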

I would prefer the ABI solution, but if that turns out to not be doable,
I think the third one that fakes a permanent abort could be practical
too.  Thus, I think we're set, at least for glibc.

If the above makes sense for everyone, I'll add it to the guidelines.

(Note that we already discussed the counterpart of this case, when glibc
HLE is the outermost transaction.  Dominik pointed out that we don't
have a tuning parameter for this case yet, only a fixed number of
retries; nonetheless, we'll run without transactions eventually, and
with the current parameters after just a few failures.)

