This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Lock elision problems in glibc-2.18


On Fri, 2013-09-20 at 11:00 +0200, Dominik Vogt wrote:
> On Tue, Sep 17, 2013 at 03:59:54PM +0200, Torvald Riegel wrote:
> > In the second phase, the caller simply chooses to use other callees, and
> > then it doesn't matter whether glibc's elision tuning is used or not.
> > 
> > The situation that we want to avoid in this case is aborting due to
> > bad tuning (the only possible cause for this should be having to
> > abort on trylock(); anything else shouldn't make things worse, given
> > that we're executing transactionally already).  But the only way out
> > of this is for the caller to not use transactional execution and to
> > let callees tune.  Thus, not misinterpreting abort codes, or being
> > conservative on explicit aborts, would be the right thing to do.
> 
> Then maybe we should take a different approach to tuning.  At the
> moment it's very liberal; elision is tried optimistically and
> using locks is the "bad" fallback mode.  Perhaps using locks
> should be the default, and a certain mutex has to prove that it's
> worth trying transactions.  (With TSX there still remains the
> problem of communicating this from inside nested transactions.)

Nested transactions are indeed the problem here.  This is not just an
issue on TSX, though: even if you could communicate nontransactionally,
doing so shouldn't increase the likelihood of aborts.  So you'd at least
need a separate cacheline somewhere for the tuning data (e.g., the
equivalent of adapt_count); you can't put this in the lock itself because
that would cause conflicts with threads monitoring the lock's futex
field transactionally.  The separate cacheline would increase overheads,
and potentially also HTM capacity requirements, depending on the HTM.

> > > Perhaps we should start with writing down the requirements for
> > > the tuning algorithm.  Just some bullets that come to my mind:
> > > 
> > >  * The ultimate goal is to use elision where it helps and to not
> > >    use it where it harms.
> > >  * A (possibly recurring) training phase where the algorithm
> > >    performs badly is acceptable.
> > >  * After a training phase, a certain minimum or average
> > >    performance must be guaranteed in all cases.  If not, what are
> > >    the exceptions, and how do we deal with them (e.g., argue that
> > >    they are irrelevant, easy to avoid, etc.)?
> > 
> > Those sound good.
> > 
> > >  * If lock elision is ever to be enabled by default, the tuning
> > >    algorithm must make sure that no (relevant?) software shows
> > >    a significant (unacceptable) performance loss.
> > 
> > Possibly.  Though this is certainly a trade-off between average
> > performance gains and losses in particular cases.  We do want to
> > avoid cases where performance really crashes.
> 
> If you're not comfortable with this proposed requirement, then
> what is the requirement you have in mind?  I think we really need
> to pin this down.  Otherwise we'll always have cases where one of
> us thinks a certain scenario is unacceptable and someone else
> thinks it is acceptable.

The requirement as written above seemed to be too black-and-white for my
taste.  I agree that different applications (and thus workloads) will be
of different importance to different people.  Personally, I'd be fine
with losing a bit of performance in some workloads if in return we get
good gains in most others.  That's obviously a very fuzzy position.  But
I'm not quite sure how we can pin that down further without having to
rely on too strict requirements.

> I've made a quick test on z.  There is only one thread that does
> 
>   pthread_mutex_lock(&mutex)
>   if (in_transaction())
>     TABORT(temporary)

This TABORT is meant to simulate a conflict?  If so, we should also
compare it against a case in which at least one other thread does the
same; this would model the case where we had a conflict due to what the
critical sections do.

If you just compare against disabled elision, this is essentially
comparing against a case in which the conflict would be due to other
synchronization besides the lock used for the critical section (e.g.,
another thread constantly incrementing an atomic variable accessed in
the critical section).

>   else
>     /* do nothing */
>   pthread_mutex_unlock(&mutex)
>  
> This is maybe the shortest possible transaction body that aborts
> every transaction with a temporary abort code, thus triggering
> the out of retries situation.
> 
> relative
> performance  setup
> -----------  ----------------------------------------------------
>        100%  elision disabled through configure
>         ~7%  elision enabled, without the out of retries patch
>        ~81%  elision enabled, with the out of retries patch,
>              skip_lock_out_of_retries set to a very conservative
>              value (32000)

Why is there still a 20% loss, given that this should only very rarely
try to use elision?  Is this the normal transaction (setup) overhead?

> 
> Okay, that scenario _is_ artificial, but it shows roughly the
> worst that can happen if software has extreme data contention
> and uses short-lived locks.  The numbers may be different for
> TSX.
> 
> With the out of retries patch the worst case is a performance loss
> of about 20%, without the patch performance can really drop to
> abysmal values.  Of course all this says nothing about average
> performance.

Which (retry_)try_xbegin value did you use?


