This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] benchtests: Add pthread_once common-case test.
- From: Torvald Riegel <triegel at redhat dot com>
- To: "Carlos O'Donell" <carlos at redhat dot com>
- Cc: GLIBC Devel <libc-alpha at sourceware dot org>
- Date: Thu, 10 Apr 2014 10:13:33 +0200
- Subject: Re: [PATCH] benchtests: Add pthread_once common-case test.
- Authentication-results: sourceware.org; auth=none
- References: <1381266586 dot 18547 dot 1130 dot camel at triegel dot csb> <5335FAD2 dot 9090902 at redhat dot com>
On Fri, 2014-03-28 at 18:42 -0400, Carlos O'Donell wrote:
> On 10/08/2013 05:09 PM, Torvald Riegel wrote:
> > This adds a benchtest for the common-case scenario for pthread_once.
> Siddhesh noted he wanted one more reviewer here, and that's me.
> This is orthogonal to the pthread_once unification, so I'm treating
> it first since it's easier to review.
> > We have a single thread that runs a no-op initialization once and then
> > repeatedly runs checks of the initialization (i.e., an acquire load and
> > conditional jump) in a tight loop. This gives us, on average, the
> > best-case latency of pthread_once (the initialization is the
> > exactly-once slow path, and we're not looking at initialization-related
> > synchronization overheads in this case). I'm adding this to investigate
> > whether we still need the x86-custom pthread_once version written in
> > assembler, or whether we can use the generic version without a
> > performance loss (both use the same algorithm).
> > This needs the other patch that adds the include-sources directive to
> > scripts/bench.pl.
> > OK?
> Yes, fix the one nit below (descriptive line in bench source) and
> Slightly off topic.
> Two questions.
> Do we ever want to measure two things here?
> 1. Latency of a call to pthread_once when initializing.
> 2. Latency of a call to pthread_once after initialization is done.
> Here we measure #2.
> Are we saying #1 is not important because it happens only once?
Yes, basically. More specifically, it's best-case latency, where we
initialize once and then just read and read from it, even from the same
thread.
For #1, we could distinguish between:
* Initialization by a single thread, no further use: Probably less
likely to occur in practice, unless pthread_once is used conservatively
and without real reason.
* Concurrent init/use, no use afterwards: Multiple threads using
pthread_once to agree on who does something.
We could measure the latter, but this will be mostly an exercise in
measuring consensus/CAS/lock performance on the particular machine --
there's little we can do differently, except for everything we could
also do to decrease memory contention in locks and the like.
> Second question.
> What results do you get for x86-64 with and without assembly? ;-)