This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [RFC] Lock elision implementation guidelines

From: Torvald Riegel <triegel at redhat dot com>
To: Rich Felker <dalias at aerifal dot cx>
Cc: GLIBC Devel <libc-alpha at sourceware dot org>, andi <andi at firstfloor dot org>, Andi Kleen <ak at linux dot intel dot com>
Date: Fri, 15 Feb 2013 12:46:31 +0100
Subject: Re: [RFC] Lock elision implementation guidelines
References: <1360527652.3065.11521.camel@triegel.csb> <20130213001224.GZ20323@brightrain.aerifal.cx>
On Tue, 2013-02-12 at 19:12 -0500, Rich Felker wrote:
> On Sun, Feb 10, 2013 at 09:20:52PM +0100, Torvald Riegel wrote:
> > --- Compability of lock()/unlock() with pthreads mutex types
> > 
> > This scheme allows a thread to acquire a lock that it has already
> > acquired, and there is no error checking when releasing a lock.  Thus,
> > for which pthreads mutex types is it a suitable implementation?:
> > 
> > PTHREAD_MUTEX_DEFAULT: Yes.  Recursive locking or releasing an unlocked
> > mutex result in undefined behavior, so the implementation is free to do
> > anything in such cases.
> > 
> > PTHREAD_MUTEX_NORMAL: Strictly speaking, no.  Allows undefined behavior
> > on releasing an unlocked mutex, but requires deadlock when trying to
> > lock an acquired lock.  Deadlock is not the same as undefined behavior:
> > even though there's no progress on deadlock, nothing bad will happen
> > either.
> > ??? PTHREAD_MUTEX_NORMAL is the same type as PTHREAD_MUTEX_DEFAULT in
> > current glibc.  This would mean that we'd have to assume the program
> > wanted NORMAL, so we couldn't use elision for the default mutex type.
> > However, it's probably rare for correct programs to rely on the deadlock
> > guarantee in a meaningful way -- should we ignore the deadlock
> > requirement?
> 
> No. You cannot ignore any requirements. That's why they're called
> requirements and not recommendations.

Usually, I wouldn't reconsider any requirements.  But this case is
special enough to warrant thinking about this.

First, the real-world use cases that are enabled by guaranteeing
deadlock instead of undefined behavior are limited:
- The mutexes aren't required to be cancellation points.  Only rwlock
might contain cancellation points.  Thus, AFAICT, there is no way to
recover from a deadlock using POSIX facilities.
- One can't just kill threads in other ways and still expect the program
to continue to work.
- To find out whether there was a deadlock, the program would have to
explicitly track which mutexes were acquired and are about to be
acquired.  POSIX doesn't let you query whether a mutex is part of a
deadlock.  If the program is tracking this anyway, why would it not use
lock acquisitions with timeouts to prevent any deadlocks right away?
- If we can't recover from a deadlock, then we can't make progress after
a deadlock.  Making no progress is a fault in the program.  Thus, the
deadlock requirement doesn't enable us to fix a faulty program.

Thus, the only benefit that the deadlock requirement provides is to give
a program fail-stop behavior when a deadlock fault in the program
triggers the deadlock.  It's just that: ensuring that some bugs in a
buggy programs make it fail in a potentially safer way.  If you assume
your program to not be prone to deadlocks, there's no change in
behavior.

Second, we have to consider all the use cases.  For example, for
experimentation (use case E in my prior email), I think it would be fine
to have an explicit switch (e.g., env var) that allows glibc to ignore
the deadlock requirement.  The same could apply in the PT use case.

In the PNT case where elision is enabled by default, I'm not quite sure
whether I'd want to preserve the deadlock requirement or not.  I
probably would.  That's why this paragraph in my write-up started with
"???", BTW.

> If it would be beneficial to
> make DEFAULT and NORMAL different, a new constant should be added for
> DEFAULT with the relaxed semantics. Unfortunately this may require new
> symbol versions since the type is encoded statically into the mutex
> structure with the mutex initializers.

Right.

> > --- trylock() and alternative LE implementations
> > 
> > With the monitor-the-lock implementation scheme described previously,
> > there is no way to reliably detect whether a lock has already been
> > acquired by a thread.  This makes it unsuitable for most mutex types,
> > and also has implications for trylock():  For recursive mutexes,
> > trylock() will just acquire the lock, as expected; for nonrecursive
> > mutexes, trylock() must return an error if the mutex is already
> > acquired.  This doesn't work in all cases:
> > - If acquired without elision, then trylock() can see this.
> > - If acquired with elision by another thread, then trylock() will just
> > try to acquire, which is fine. (If with elision, this is like plain
> > elision; if without elision, this will abort the transaction of the
> > other thread that used elision.)
> > - If acquired with elision by the same thread, trylock() must return an
> > error but it can't because it cannot check whether the lock has been
> > acquired already.
> > 
> > There are a few potential solutions:
> > 
> > 1) Track which mutexes have been acquired (in software)
> > - Can't track this using bits in the lock itself or in shared data, or
> > the write will prevent elision due to data conflicts with other
> > transactions.  Could track using TLS.
> > - Con: larger runtime overheads and increases the amount of data
> > transactions touch (important because an HTM's access tracking capacity
> > is limited).  Could still increase scalability of contended CS, but
> > overheads for uncontended.
> > - Pro: allows elision to be used for all mutex types.
> 
> I would track with a single TLS pointer to the mutex locked. If more
> than one mutex is being held at the same time, this is almost surely
> an indication that elision should not be used.

No, I disagree, this isn't such an indication.  Why would it be?

Whether elision is useful or not doesn't depend on the number of locks
you acquire, except that you might hit limitations in the hardware
(e.g., if considering Intel HLE, how many lock elisions an
implementation can track).  For example, with HTM-based implementations,
I don't see a reason why you wouldn't want to use elision for
hand-over-hand locking with fine-granular locks (e.g., when navigating a
linked list or similar).  Just because elision might allow for having a
simple coarse-granular lock as fallback if elision is frequently
successful, doesn't mean that you should use it only for that.  Elision
both allows you to improve scalability by executing critical sections
concurrently, as well as to reduce cache misses on locks.

> This would make for a
> much simpler implementation than the current proposal and would be
> much more useful -- it could actually be used for all lock types,
> including the current DEFAULT which aliases NORMAL.
> 
> > 2) trylock() on non-recursive mutexes aborts transactions/elision
> > - Con: bad for custom spinlocks in programs built on top of trylock()
> 
> This is an idiotic idiom and not worth making design decisions based
> on it.

Sorry, but this statement is unclear.  Do you dislike trylock()-based
spinlocks or spin-then-block locks?  Those can be useful, especially
given that our current adaptive mutexes aren't really performing that
well.  Second, you need trylock() to build deadlock-avoiding lock
acquisitions schemes (e.g., see the C++11's generic locking algorithms,
30.4.3).

Or do you dislike aborting trylock()?

> > - Pro: Simple to implement. 
> 
> I don't think it's simpler than my variant of option 1 above.

I do think so, because there's no tracking of which locks have been
elided, you just abort in the trylock if you're executing
transactionally.  You could add some tweaks like just aborting if the
lock is being elided, which needs some more care in the implementation.

> > 3) Add a new kind of trylock() with different semantics; use option 2
> > for the original trylock():
> > A) possibly_recursive_trylock(): would behave like a recursive mutex if
> > used with elision and like a nonrecursive mutex' trylock() if used
> > without elision on a nonrecursive mutex.
> > - Con: Adds a new kind of trylock().  Unclear whether it's a useful
> > addition in the long term because it's just used to support a certain
> > kind of HW support.
> > - Con: The semantics are cumbersome.  The program can't detect whether
> > elision is currently being used in a clean way, and it can't force it
> > either.
> > B) nonrecursive_trylock(): has undefined behavior when used to acquire a
> > mutex recursively -- the program must never do that.
> > - Con: Adds a new kind of trylock().
> > - Con: Adds more undefined behavior.
> > - Pro: Semantics are clearer, and might be practical in the context of
> > typical uses of nonrecursive mutex types.
> 
> This option is just hideous, as it's not at all transparent, but
> requires applications to use highly nonstandard interfaces designed
> around the lock-elision implementation rather than designed to be
> naturally idiomatic.

I dislike option A) too, but nonetheless we need to explain why we
didn't pick it in any design document/discussion that we expect to be
helpful to future glibc developers.

I don't really dislike option B) except that it adds nonstandard
interfaces.  However, as I wrote, the semantics can be clearly
explained, in particular when you show the contrast to the original
trylock's semantics.

What the original trylock's semantics allow you to do (in contrast to
nonrecursive_trylock()) is to detect deadlock on nonrecursive mutexes
caused by *relocking* a mutex you already have acquired.  Which natural
idiom does this serve?  Why not use a recursive mutex right away if
relocking is okay in your program?  If it's not okay, then we again have
a buggy program.  So this is either for buggy programs or motivated by
assumptions about overheads of recursive mutexes (i.e., related to mutex
implementations).

Note that I find the C++11 mutex definition to be really intuitive, yet
it specifies different mutex operation preconditions.  I don't have data
that many other people agree, but nor do you that people would disagree.
Thus, it at least serves as indication that the POSIX mutex specs aren't
the only ones that make sense.

Another advantage of option B) is that something like
nonrecursive_trylock() would satisfy the C++11 trylock requirements, so
we would have users.  I believe that C++11 will have much more users in
the future, and it would be bad for us too if the C++11 locks cannot be
based on glibc mutexes because this would mean we split our users and
development efforts.

Of course, all of our discussion here kind of depends on what kind of HW
support we get.  While the kind of this support might change, having
just pure HTM's isn't too unlikely, I would guess.  So if this is what
we'll likely get, so be it.  It's okay to try to push HW into a
direction that's easier for SW, but we also have to be realistic.

> > 4) New nonrecursive mutex types with different trylock() semantics.
> 
> Again, same as above...
> 
> > 5) Advise programmers to switch from nonrecursive to recursive mutexes
> 
> Any solution that starts with "advise programmers" is outside the
> scope of a solution to this problem.

I strongly disagree.  First, we have options that preserve semantics, so
if we pick them, this becomes just a performance problem.  Which is part
of the problem -- if we wouldn't want to benefit from performance gains
due to elision, we wouldn't be having this discussion *at all*.

So if performance is part of the problem, then we have to find good
trade-offs that satisfy the performance needs of our users.  If we can't
fit good performance into the current framework of semantics that are
required for mutexes by POSIX, then why couldn't we instruct programmers
to use recursive mutexes?

If we wouldn't be concerned about performance, we wouldn't need nor have
adaptive mutexes, we wouldn't need separate spinlocks, and we could use
recursive mutexes to implement PTHREAD_MUTEX_DEFAULT.  Nonetheless,
those different mutex types are provided.

> > ??? Which of these options do we prefer? Option 2) is a safe approach,
> > might still allow using LE often in most programs (if we ignore the
> > conservative deadlock requirement of default==normal mutexes), and is
> > easy to do.  Option 1) has runtime overheads and currently no
> > implementation, but would allow for using LE on all mutex types.  Option
> > 3) adds another glibc-only extension; not sure whether this is worth it.
> > Option 5) might be a useful workaround, but still faces the error
> > checking issue of recursive mutexes.
> 
> I think option 1 is by far the preferred option.

I read this as you saying that *you* prefer option 1 strongly.

I don't agree, but I haven't seen any performance data for option 1 --
it's hard to estimate how big the overhead would be because we don't
have any data on how likely it is that a thread has more than one mutex
acquired at the same time, in cases where we could potentially use
elision.  I doubt that this is unlikely.
Second, any data we touch additionally can decrease the likelihood of
elision being successful because it adds to the capacity required by
transactions.

Andi, have you looked at option 1?  I know you can't discuss performance
in detail, but perhaps you can add something to this discussion.

What do others think?  Are there any other issues that you'd like to be
discussed, investigated, or better explained to have an opinion?

> > ??? Does the Austin Group have an opinion about how to solve this?
> 
> The Austin Group's "opinion" is the text of the standard

Look, the text of the standard predates the availability of any HW
support for lock elision, and the spec was probably written when people
weren't aware of the concept of lock elision.  Or perhaps they were
aware, but considered it too far off in the future to be of importance
back then.

Thus, the people in the Austin Group might have a opinion about elision
and how to marry it to the mutex semantics as they are now.  I doubt
they're some kind of droids that just have the standard text as their
only thought :)

So, I don't know about you, but I think that it often can't hurt to ask
others for feedback.  This includes discussing specifications after
they've been written.

> which
> specifies certain behavior for the standard mutex types that conflicts
> with all of the options above except option 1 or putting all the
> lock-elision in new nonstandard mutex types.

It does not conflict with option 2 at all.  Think about what happens
when we abort a transaction -- we just don't use elision for the
affected lock acquisitions.


Torvald
Follow-Ups:
- Re: [RFC] Lock elision implementation guidelines
  - From: Andi Kleen
- Re: [RFC] Lock elision implementation guidelines
  - From: Rich Felker
References:
- [RFC] Lock elision implementation guidelines
  - From: Torvald Riegel
- Re: [RFC] Lock elision implementation guidelines
  - From: Rich Felker
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]