This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH 4/6][BZ #11588] benchtests: Add benchmarks for pthread_cond_* functions
- From: Ben Shelton <ben dot shelton at ni dot com>
- To: triegel at redhat dot com
- Cc: libc-alpha at sourceware dot org, gratian dot crisan at ni dot com
- Date: Thu, 18 Dec 2014 16:55:47 -0600
- Subject: Re: [PATCH 4/6][BZ #11588] benchtests: Add benchmarks for pthread_cond_* functions
I'm working with Gratian on improving the benchmark for comparing the C
implementation of the pthread_cond_* functions with the existing x86
assembly implementation. I had a few questions about your feedback
concerning the benchmarks:
> > From: Gratian Crisan <email@example.com>
> > Add a benchmark set that measures the average execution time, min, max,
> > running variance and standard deviation for:
> Probably quite a bit of this should be factored out into a general
> facility to run multi-threaded benchtests.
> I think the mean may not be very useful here. If we're looking at
> latencies, I suppose it would be best if we could show the 90th
> percentile or so of the latencies we measured. Your benchmark
> measurements show that there are very large outliers, which seem to
> disturb the mean significantly.
Agreed -- that would be nice to have, and I'll try to implement that.
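As a rough idea of what I have in mind, here is a minimal sketch that assumes the per-iteration latencies are collected into an array of uint64_t nanosecond values. The names cmp_u64 and latency_percentile are hypothetical and not part of the existing benchtests code, and the nearest-rank method is just one of several reasonable percentile definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Order two uint64_t samples for qsort without risking overflow.  */
static int
cmp_u64 (const void *a, const void *b)
{
  uint64_t x = *(const uint64_t *) a;
  uint64_t y = *(const uint64_t *) b;
  return (x > y) - (x < y);
}

/* Return the PCT-th percentile (nearest-rank method) of the N latency
   samples in SAMPLES.  Sorts SAMPLES in place.  Hypothetical helper,
   not part of benchtests.  */
static uint64_t
latency_percentile (uint64_t *samples, size_t n, unsigned int pct)
{
  assert (n > 0 && pct >= 1 && pct <= 100);
  qsort (samples, n, sizeof samples[0], cmp_u64);
  /* Nearest rank: ceil (pct * n / 100), used as a 1-based position.  */
  size_t rank = (pct * n + 99) / 100;
  return samples[rank - 1];
}
```

A single large outlier then shifts the mean but leaves the 90th percentile unchanged, which is the property you were asking for.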
> > - N threads calling pthread_cond_signal/pthread_cond_broadcast w/o any
> > waiters consuming the signal.
> But those threads all use their own separate mutex and condvar -- so
> you're not actually testing scalability of the condvar. You do test
> single-thread latency in some way, but wouldn't it be better to do that
> with just one thread to remove all the interference from oversubscribing
> the system (ie, more threads than cores/CPUs available)?
> If you were testing scalability of a single mutex/condvar instance,
> running 100 threads seems too large. For most hardware that means
> oversubscription, so you're not really testing how much contention the
> data structure creates but rather how your scheduler deals with
> oversubscription, and whether you need to schedule particular threads.
> Oversubscription is a scenario that should be tested, but I think you'd
> rather want to test 1, 2, 4, 8, 16, 32, and then 100 threads.
> > - time it takes to execute pthread_cond_signal/pthread_cond_broadcast in
> > the presence of a waiter.
> Makes sense -- but I believe you still want to also test scalability.
> So, test with N waiters and 1 signaler, for example. 1 waiter and N
> signalers or N waiters and N signalers could also be interesting, but
> less so than N waiters and 1 signaler.
Sure, that makes sense -- I agree that the "multiple threads waiting on
a single condvar" case is one worth testing and one that the current
benchmark does not cover.
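To check that I understand the shape of that test, here is a minimal sketch of N waiters and 1 signaler sharing one mutex and one condvar. The struct and function names are hypothetical, and the timing is left out: a real benchmark would presumably wrap the pthread_cond_signal call in its timer. The tokens counter is my assumption for the predicate, so that a signal sent before a waiter blocks is not lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* State shared by all waiters and the signaler (hypothetical layout).  */
struct shared
{
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  int tokens;     /* Signals sent but not yet consumed.  */
  int consumed;   /* Number of waiters that consumed a token.  */
};

static void *
waiter (void *arg)
{
  struct shared *s = arg;
  pthread_mutex_lock (&s->mutex);
  /* Re-check the predicate to cope with spurious wake-ups.  */
  while (s->tokens == 0)
    pthread_cond_wait (&s->cond, &s->mutex);
  s->tokens--;
  s->consumed++;
  pthread_mutex_unlock (&s->mutex);
  return NULL;
}

/* Start NWAITERS waiters on one condvar, signal once per waiter, and
   return how many waiters consumed a signal.  */
static int
run_waiters (int nwaiters)
{
  struct shared s;
  pthread_t *threads = malloc (nwaiters * sizeof *threads);
  assert (threads != NULL);

  pthread_mutex_init (&s.mutex, NULL);
  pthread_cond_init (&s.cond, NULL);
  s.tokens = 0;
  s.consumed = 0;

  for (int i = 0; i < nwaiters; i++)
    {
      int rc = pthread_create (&threads[i], NULL, waiter, &s);
      assert (rc == 0);
    }

  for (int i = 0; i < nwaiters; i++)
    {
      pthread_mutex_lock (&s.mutex);
      s.tokens++;
      pthread_mutex_unlock (&s.mutex);
      /* A benchmark would time this call.  */
      pthread_cond_signal (&s.cond);
    }

  for (int i = 0; i < nwaiters; i++)
    pthread_join (threads[i], NULL);

  int consumed = s.consumed;
  pthread_cond_destroy (&s.cond);
  pthread_mutex_destroy (&s.mutex);
  free (threads);
  return consumed;
}
```

Calling run_waiters with 1, 2, 4, 8, 16, 32 and 100 would cover the thread counts you suggested.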
> > - round trip time from the pthread_cond_signal call to pthread_cond_wait or
> > pthread_cond_timedwait return for N threads.
> Again, your threads all seem to use their own condvar. What this should
> test, I believe, is if you have N threads, which all wait for a signal
> and after they got one they send out another signal. A waiter should
> test that the signal was really for it, so a signal would set some
> number or pointer that indicates which thread it intended to wake up.
I'm a little confused about what you mean here.
- For each of the N waiter threads, after they get the signal they're
waiting for, who do they send out another signal to? Back to the
signaler thread on a separate condvar (ping-pong)? To another waiter
- Let's assume that there are 8 threads waiting and the signaler thread
wants to wake up thread 3 of 8. It signals on the condition variable,
but thread 7 of 8 wakes up instead, checks the pointer, and determines
that it wasn't intended to be woken up. What's the expected behavior
in this case? Does the signaler thread signal? Does the waiter
- What is the time interval we're trying to measure here? If a thread
gets woken up that wasn't the one we were intending to wake up, how
does that affect the timing?
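To make the first question concrete, here is a minimal sketch of the ping-pong reading only, with a single waiter and a second condvar for the reply. This is my assumption about one possible interpretation, not necessarily what you intended, and all of the names are hypothetical. The interval to measure would be from just before the signal on ping to the return from the wait on pong.

```c
#include <assert.h>
#include <pthread.h>

/* State for one signaler/waiter pair (hypothetical layout).  */
struct pingpong
{
  pthread_mutex_t mutex;
  pthread_cond_t ping;   /* Signaler -> waiter.  */
  pthread_cond_t pong;   /* Waiter -> signaler.  */
  int turn;              /* 1: waiter's turn, 0: signaler's turn.  */
  int rounds;            /* Round trips the waiter should answer.  */
};

static void *
pong_thread (void *arg)
{
  struct pingpong *p = arg;
  pthread_mutex_lock (&p->mutex);
  for (int i = 0; i < p->rounds; i++)
    {
      while (p->turn != 1)
        pthread_cond_wait (&p->ping, &p->mutex);
      p->turn = 0;
      pthread_cond_signal (&p->pong);
    }
  pthread_mutex_unlock (&p->mutex);
  return NULL;
}

/* Run ROUNDS signal/reply round trips and return how many completed.  */
static int
run_pingpong (int rounds)
{
  struct pingpong p;
  pthread_t thread;
  int completed = 0;

  pthread_mutex_init (&p.mutex, NULL);
  pthread_cond_init (&p.ping, NULL);
  pthread_cond_init (&p.pong, NULL);
  p.turn = 0;
  p.rounds = rounds;

  int rc = pthread_create (&thread, NULL, pong_thread, &p);
  assert (rc == 0);

  pthread_mutex_lock (&p.mutex);
  for (int i = 0; i < rounds; i++)
    {
      /* A benchmark would start its timer here ...  */
      p.turn = 1;
      pthread_cond_signal (&p.ping);
      while (p.turn != 0)
        pthread_cond_wait (&p.pong, &p.mutex);
      /* ... and stop it here.  */
      completed++;
    }
  pthread_mutex_unlock (&p.mutex);

  pthread_join (thread, NULL);
  pthread_cond_destroy (&p.pong);
  pthread_cond_destroy (&p.ping);
  pthread_mutex_destroy (&p.mutex);
  return completed;
}
```

This sketch does not address the mis-delivered wake-up case from the second question, since with a single waiter per condvar there is nobody else to wake.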
> > - round trip time from the pthread_cond_broadcast call to pthread_cond_wait
> > or pthread_cond_timedwait return for N threads.
> Likewise, but we don't need to check for spurious wake-ups I believe.
> I believe this benchmark is large enough to warrant a comment at the top
> of the file about what it actually tests. Or the output needs to be
> more self-explanatory.