This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Lock elision test results


On Wed, 2013-07-03 at 10:11 +0200, Dominik Vogt wrote:
> On Tue, Jul 02, 2013 at 02:18:04PM +0200, Torvald Riegel wrote:
> > On Fri, 2013-06-14 at 12:26 +0200, Dominik Vogt wrote:
> > > Test 2 (nested locks)
> > > ======
> > > 
> > > Setup
> > > -----
> > > 
> > > Three concurrent threads using pthread mutexes (m1, ..., m10) and
> > > counters c1, ..., c10.  All static data structures are allocated
> > > in separate cache lines.
> > > 
> > > all threads:
> > > 
> > >   barrier
> > >   take start timestamp (only thread 1)
> > >   repeat <n> times
> > >     lock m1, increment c1
> > >     lock m2, increment c2
> > >     ...
> > >     lock m10, increment c10
> > >     unlock m10
> > >     unlock m9
> > >     ...
> > >     unlock m1
> > >   barrier
> > >   take end timestamp (only thread 1)
> > > 
> > > Performance is measured as the inverse of the time taken on
> > > thread 1.
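
(For reference, the per-thread loop described above would look roughly
like the sketch below.  This is a minimal illustration only, not the
actual test code: NLOCKS, the padding size and the way <n> is passed in
are placeholders, and the barrier and timestamping are omitted.)

    #include <pthread.h>

    #define NLOCKS 10

    /* m1 .. m10; initialised with pthread_mutex_init() before the
       threads start (not shown).  */
    static pthread_mutex_t m[NLOCKS];

    /* c1 .. c10, padded so each counter sits in its own cache line.  */
    static struct { long c; char pad[120]; } ctr[NLOCKS];

    static void *worker(void *arg)
    {
        long n = (long) arg;                     /* iteration count <n> */

        for (long i = 0; i < n; i++) {
            for (int k = 0; k < NLOCKS; k++) {   /* lock m1..m10, increment c1..c10 */
                pthread_mutex_lock(&m[k]);
                ctr[k].c++;
            }
            for (int k = NLOCKS - 1; k >= 0; k--)   /* unlock m10 .. m1 */
                pthread_mutex_unlock(&m[k]);
        }
        return NULL;
    }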
> > 
> > See above, throughput vs. fairness.
> 
> The throughput varies by a factor of two or more across multiple
> test runs, except for the test with elision enabled, where the
> results vary by only about 10%.

So with elision, we seem to get more fairness?  That could very well be
part of why elision is slower, because less fairness sometimes means
more throughput (e.g., less cacheline ping-pong).

> I agree that in this case it is also
> interesting to find out why the performance of the test can vary
> that much.  It must have something to do with the internal timing of
> the threads and possibly starvation of some thread(s).
> 
> > > Test execution
> > > --------------
> > > 
> > > Identical to test 1.
> > > 
> > > Result
> > > ------
> > > 
> > > (1) unpatched  : 100.00%
> > > (2) old glibc  : 134.35%
> > 
> > What causes this difference between (1) and (2)?  If we get a 34%
> > difference just for stock glibc, this seems to be a bigger problem for
> > me than similar overheads if we turn on elision, which is still an
> > experimental feature.
> 
> As interesting as that may be, it's really not a question on my
> agenda, as it does not involve transactional memory.

And I understand this, but at the same time it does affect the
baseline for any TM / elision measurements.  If we see a variance like
this 30% irrespective of whether elision is used, that doesn't exactly
improve confidence in any conclusions we might draw from the
measurements with elision.

> 
> > > (3) elision off:  56.45%
> > > (4) elision on :  31.31%
> > 
> > We need more data to understand these results.  See above.
> 
> The result of (3) needs to be explained.  I suggest looking more
> closely at this effect with a more recent version of the elision
> patches.  I'll do that after the 2.18 freeze, when the elision
> patches change less frequently.

Thanks.

> The result of (4) does not really surprise me, because the test is
> designed for high contention.

Maybe.  OTOH, a good elision adaptation algorithm should settle
somewhere close to the no-elision case, if the latter performs better.
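
One common shape for such an adaptation heuristic is to back off from
elision for a number of acquisitions after an abort, so a lock that
keeps aborting converges towards plain lock acquisition.  A rough,
self-contained sketch (written with Intel RTM intrinsics purely for
illustration -- the tests in this thread ran on a different HTM, and
the real glibc code and its tuning constants differ):

    #include <immintrin.h>     /* _xbegin/_xend/_xabort; compile with -mrtm */
    #include <stdatomic.h>

    struct elided_lock {
        atomic_int lock_word;  /* 0 = free, 1 = held non-transactionally */
        int adapt_count;       /* > 0: skip elision for this many acquisitions */
    };

    static void el_lock(struct elided_lock *l)
    {
        if (l->adapt_count == 0 && _xbegin() == _XBEGIN_STARTED) {
            /* Elided path: subscribe to the lock word and abort if the
               lock is actually held.  */
            if (atomic_load_explicit(&l->lock_word, memory_order_relaxed) != 0)
                _xabort(0xff);
            return;            /* critical section runs inside the transaction */
        }
        if (l->adapt_count > 0)
            l->adapt_count--;  /* still backing off: use the real lock */
        else
            l->adapt_count = 3;  /* just aborted: stay off elision for a while */
        while (atomic_exchange_explicit(&l->lock_word, 1, memory_order_acquire))
            ;                  /* simplified spinlock as the fallback */
    }

    static void el_unlock(struct elided_lock *l)
    {
        if (atomic_load_explicit(&l->lock_word, memory_order_relaxed) == 0)
            _xend();           /* lock word still free: we were elided, commit */
        else
            atomic_store_explicit(&l->lock_word, 0, memory_order_release);
    }

A lock that aborts on most attempts spends nearly all acquisitions in
the fallback path, i.e. close to the no-elision behaviour.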

> > > The abort ratio in (4) in all threads is between 5% and 10%.
> > 
> > That's interesting because all threads seem to conflict with each
> > other (i.e., they increment the same counters).
> 
> When the abort ratio is calculated as (100 * aborts / num_tbegins),
> you get lower values with nested locks because one abort can cancel
> up to ten nested transactions.

I assumed you counted aborts compared to each outermost transaction
started.  A better way to report the abort rate might be (100 *
aborts / num_commits_outermost_txns), where in your tests the commits
equal the number of iterations that committed with elision turned on.
This way, it's easier to see how many aborts were needed until a
transaction was able to commit.
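
A made-up example of how far the two metrics can diverge with 10-deep
lock nesting (each iteration issues roughly ten tbegins but only one
outermost commit); all numbers here are purely hypothetical:

    #include <stdio.h>

    int main(void)
    {
        long iterations = 100000;                /* committed outermost transactions */
        long nesting    = 10;                    /* locks taken per iteration */
        long aborts     = 50000;                 /* hypothetical abort count */
        long tbegins    = iterations * nesting;  /* roughly, ignoring retries */

        /* The per-tbegin ratio dilutes each abort across the whole nest...  */
        printf("aborts / tbegins           = %4.1f%%\n",
               100.0 * aborts / tbegins);        /* prints  5.0% */
        /* ...while the per-outermost-commit ratio shows how many aborts were
           paid for each iteration that eventually committed.  */
        printf("aborts / outermost commits = %4.1f%%\n",
               100.0 * aborts / iterations);     /* prints 50.0% */
        return 0;
    }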

> Also, as soon as one thread X has acquired m1, the other threads
> cannot get through to the code that uses the later locks, so X can
> use elision with no risk of being disturbed by the other threads.

But if they use elision, they could.  What's stopping them is the
conflicts on c1, c2, ... (unless your HTM does something special for
increments).  So my guess would be that elision is unlikely to be
used -- but this doesn't have to be the case: the critical section is
still short, so depending on how cache misses play out, elision could
be used, although it wouldn't do much more than reduce contention on
the locks (there would still be contention on c1, c2, ...).  That's
why I asked how often elision is actually used in this test.
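
To make the conflict concrete: even with no lock involved at all, two
threads transactionally incrementing the same counter collide on its
cache line, so at most one of two overlapping transactions can commit.
A small self-contained illustration (again using Intel RTM intrinsics
only because they are widely available; transactions can of course
also abort for reasons other than data conflicts):

    #include <immintrin.h>   /* _xbegin/_xend; compile with -mrtm -pthread */
    #include <pthread.h>
    #include <stdio.h>

    static long c1;          /* shared counter, as in the test */
    static long fallbacks;   /* how often a thread gave up on the transaction */

    static void *worker(void *arg)
    {
        (void) arg;
        for (int i = 0; i < 100000; i++) {
            if (_xbegin() == _XBEGIN_STARTED) {
                c1++;        /* transactional write: conflicts with the other
                                thread's write to the same cache line */
                _xend();
            } else {
                /* Aborted (conflict or otherwise): fall back to a plain
                   atomic increment instead of retrying.  */
                __atomic_fetch_add(&c1, 1, __ATOMIC_RELAXED);
                __atomic_fetch_add(&fallbacks, 1, __ATOMIC_RELAXED);
            }
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("c1 = %ld, fallbacks = %ld\n", c1, fallbacks);
        return 0;
    }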


Torvald

