This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Lock elision: Give PTHREAD_MUTEX_NORMAL-like mutexes a new internal type.


On Thu, 2013-06-27 at 07:38 +0200, Dominik Vogt wrote:
> On Tue, Jun 25, 2013 at 02:39:58PM +0200, Torvald Riegel wrote:
> > On Tue, 2013-06-25 at 08:49 +0200, Dominik Vogt wrote:
> > > On Sat, Jun 22, 2013 at 01:04:58AM +0200, Torvald Riegel wrote:
> > > Actually, I don't understand this pressure to get _something_ into
> > > 2.18 when it's clear that there will be no *well tested* elision
> > > patches
> > 
> > We can't get a lot of real-world testing done if we don't expose it to
> > real users.  I understand Andi says he has done a decent amount of
> > testing himself; I agree with him that wider-scale testing would be good
> > to have as the next step.
> 
> If testing has already been done, why not look at the test results
> first?  At the moment, the whole fuss about transactional memory
> is just driven by politics, not by data.

Andi, could you please post a summary of your data at some point?

> > > > but this gives us the 90% that
> > > > we're interested in (ie, enabling elision for most mutexes including in
> > > > existing binaries),
> > > 
> > > How can we know what we're interested in with zero performance
> > > testing up to now?
> > 
> > Sorry if it wasn't clear who the "we" in this sentence referred to.  I
> > meant the group of people that seemed to have interest in having elision
> > in 2.18 so far.  So, Andi, Carlos, myself, and potentially others as
> > well.
> 
> Then let me rephrase my question:  How can you know what you're
> interested in with zero performance testing up to now?

What does zero performance testing refer to?  Zero wide-spread testing
by exposing it to real users?

> As it is now, you're just assuming or hoping for certain
> properties of transactional memory without any evidence that they
> exist in reality.  I _have_ data on transactional memory that
> suggests that your hopes will not come true.

Then post this data.  I assume that you have data on how Haswell's
transactional memory (TM) performs, because that's what Andi's patches
are about.

> > > Really, I'm not questioning this just for fun
> > > but because I've been testing transactional memory and lock
> > > elision on z/Architecture for several months, and because of the
> > > test results I'm much less optimistic that any _existing_
> > > application will get any relevant benefit from elision for free.
> > 
> > Well, this certainly depends a lot on the hardware.  I haven't done the
> > tests that you have done, but I trust Andi that the testing that he has
> > done shows that we can get a performance benefit.
> 
> As far as I know, nobody has ever done real application tests with
> transactional memory.

There's published work for STMs on real applications like memcached.
Sun has done tests on real code back when they worked on the Rock TM.
No published papers on Haswell TM performance AFAIK, but that's no
surprise given the hardware is new.

> All there is are micro benchmarks and runs
> of that funny "Stamp" "benchmark suite" (funny because it totally
> ignores cache effects of transactional memory which are the key to
> implementing it in hardware; and some of the tests do a different
> workload depending on timing and the parameters of execution).

Agreed, STAMP isn't a great set of benchmarks.

> I'll never believe someone has done real world tests unless he
> documents the precise test setup so that everybody can repeat the
> tests.  This is because I tried to do these real world tests
> myself and was unable to find a suitable application that could
> substantially benefit from lock elision

Lock elision isn't equal to TM.  TM is the general programming
abstraction.  HTM and STM are hardware/software implementations of TM.
Lock elision is something that you can implement with an HTM or STM, but
STM will be slower of course.
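
To make the distinction concrete, here is a minimal sketch of the control
flow behind lock elision.  It is an illustration only, not the code from
any of the patches discussed in this thread.  The sim_tx_* helpers are
hypothetical stand-ins that simulate a transaction with a flag so the
example runs on any machine; a real implementation would use the
hardware's transaction begin/commit/abort instructions, and an abort
would roll execution back to the begin point.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for HTM begin/commit/abort.  They only track a
   flag; nothing is actually executed transactionally.  */
static bool sim_htm_available = false;
static bool sim_in_tx = false;
static int sim_commits = 0;

static bool sim_tx_begin (void)
{
  sim_in_tx = sim_htm_available;
  return sim_in_tx;
}

static void sim_tx_commit (void)
{
  assert (sim_in_tx);
  sim_in_tx = false;
  sim_commits++;
}

static void sim_tx_abort (void)
{
  sim_in_tx = false;
}

typedef struct { atomic_int locked; } elide_lock_t;  /* 0 = free, 1 = held */

static void elide_acquire (elide_lock_t *l)
{
  if (sim_tx_begin ())
    {
      /* Reading the lock word puts it into the transaction's read set,
         so a real HTM would abort the transaction if another thread
         acquired the lock in the meantime.  */
      if (atomic_load (&l->locked) == 0)
        return;                 /* Elided: the lock word stays untouched.  */
      sim_tx_abort ();          /* Lock is held: give up on elision.  */
    }
  /* Fallback: acquire the lock for real.  */
  int expected = 0;
  while (!atomic_compare_exchange_weak (&l->locked, &expected, 1))
    expected = 0;
}

static void elide_release (elide_lock_t *l)
{
  if (atomic_load (&l->locked) == 0)
    sim_tx_commit ();           /* Lock is free, so it was elided.  */
  else
    atomic_store (&l->locked, 0);
}
```

The point of the sketch is that elision is a use of TM inside the lock
implementation: callers still write lock/unlock, and the fallback path
keeps the usual mutual exclusion when no transaction can be started.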

> > > I rather expect that some applications can be (and have to be)
> > > carefully reworked to yield these benefits, while other
> > > applications will never get them.  Please keep in mind that lock
> > > elision is not a magical wand that fixes every lock related
> > > performance problem without ever understanding what the problem
> > > was in the first place.
> > 
> > There are no silver bullets, of course.  But I also don't see a reason
> > to not let people experiment with it, provided we don't build up any
> > baggage that we can't get rid of.
> 
> As far as I understand up to now you're talking about putting only
> some preparations for lock elision into 2.18, and that does not
> help anybody ...

Why not?  We can experiment with it, measure stuff, and, provided the
tuning env vars are accepted, experiment with the auto tuning too.
What's not to like about this?

> 
> > With the approach that I have
> > outlined and that this patch here is part of, in the worst case we have
> > to deprecate the configure-time switch to enable elision; there is no
> > external interface change, and even ignoring a --enable-elision=on
> > setting is fine because it's all just about performance.
> 
> I posted test results some days ago.  Does the 22 to 45 percent
> performance loss even with elision disabled not count?

As far as I remember this thread, it wasn't quite clear at that time
whether those results were correct.

> So far
> there has not been a plan to write the patches in a way that the
> existing compilation result does not change if the configure
> switch is not set.

In the patch that I sent
(http://sourceware.org/ml/libc-alpha/2013-06/msg00842.html), we do add
unlikely code paths for mutexes that have been explicitly initialized to
NORMAL.  There might be a tiny performance overhead for that (but we
already have lots of branches there...).  If you're concerned about
that, we can add #ifdefs to avoid these extra paths when we don't have
elision-supporting hardware or don't want to use elision.  Please try
this patch, and post performance results on your architectures.
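
As a rough illustration of the #ifdef idea (a sketch under stated
assumptions, not the code from the patch linked above): the macro
SKETCH_ENABLE_ELISION, the sketch_* names, and the kind values are all
hypothetical, and the "locking" is a single-threaded placeholder.  When
the macro is not defined, the explicitly-NORMAL kind collapses into the
default kind and the extra branch is not compiled at all.

```c
#include <assert.h>

/* Hypothetical build-time switch; not the name used by glibc's configure.
   Left undefined here, so the extra path below is compiled out.  */
/* #define SKETCH_ENABLE_ELISION 1 */

#if defined (__GNUC__)
# define sketch_unlikely(x) __builtin_expect (!!(x), 0)
#else
# define sketch_unlikely(x) (x)
#endif

enum sketch_kind
{
  SKETCH_KIND_DEFAULT = 0,
#ifdef SKETCH_ENABLE_ELISION
  /* Separate internal kind for mutexes explicitly initialized to NORMAL.  */
  SKETCH_KIND_NORMAL_EXPLICIT = 1
#else
  /* Without elision support there is nothing to distinguish.  */
  SKETCH_KIND_NORMAL_EXPLICIT = SKETCH_KIND_DEFAULT
#endif
};

struct sketch_mutex
{
  int kind;
  int locked;
};

/* Returns 1 if the common path was taken, 2 if the extra path was.  */
static int sketch_lock (struct sketch_mutex *m)
{
#ifdef SKETCH_ENABLE_ELISION
  if (sketch_unlikely (m->kind == SKETCH_KIND_NORMAL_EXPLICIT))
    {
      m->locked = 1;    /* Placeholder for the explicitly-NORMAL path.  */
      return 2;
    }
#endif
  assert (m->kind == SKETCH_KIND_DEFAULT);
  m->locked = 1;        /* Placeholder for the common path.  */
  return 1;
}
```

Building the same file with and without -DSKETCH_ENABLE_ELISION and
comparing the generated code for sketch_lock is one way to check that
the disabled configuration carries no extra branch.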

> I.e. at the moment the mere presence of any
> patches for Intel seems to harm performance for _all_
> architectures, even the ones that do not implement lock elision,
> and even if lock elision is configured out.

See above.  Which code are you referring to?  Andi's patch set, or what
I posted?

Torvald

