Hi Ulrich,
First off, I wanted to share the testcase we use both to demonstrate the
hang that is possible without PI-aware condvars and to test the patch
itself. It uses dlsym() to determine whether the new np API exists and
uses it if it can:
http://dvhart.com/darren/linux/pthread_cond_hang.c
Now, regarding the added costs: I felt the changes to the common case
(when PI is not involved) were minimal, and any significant regression
there would certainly be unacceptable. To measure this patch's impact on
the common case, I prepared this testcase:
http://dvhart.com/darren/linux/condvar_perf.c
It performs N iterations of pthread_cond_wait/pthread_cond_signal and
reports the elapsed time and the resulting cycles/second. I got the
highest cycles/second by running under SCHED_FIFO and by not taking the
mutex in the signaling thread. I understand that isn't good practice,
but POSIX permits it, and it seemed appropriate for a throughput test.
I then built three versions of glibc 2.11.1 from git:
1) git: unmodified git sources
2) c_only: pthread_cond*.S files deleted
3) pi_condvar: same as c_only with the pi_condvar patches applied
Comparing #3 against #2 eliminates any advantage #1 gets solely from the
hand-written asm. #3 will eventually contain hand-written asm as well,
but in my opinion it doesn't make sense to spend the effort writing it
until the non-POSIX API is agreed upon.
I then ran 10 runs of 10M iterations each at SCHED_FIFO priority 1 on
each of the three glibcs. The results below suggest no significant
change in non-PI condvar performance: each glibc averages roughly 270k
cycles/second.
build-x86_64-2.11.1-git
Cycles/Second: 279831.187500
Cycles/Second: 261911.421875
Cycles/Second: 277664.125000
Cycles/Second: 284847.718750
Cycles/Second: 285067.281250
Cycles/Second: 267918.718750
Cycles/Second: 284785.656250
Cycles/Second: 277402.843750
Cycles/Second: 202379.703125
Cycles/Second: 266421.718750
Min: 202379.703125 cycles/sec
Max: 285067.28125 cycles/sec
Avg: 268823.0375 cycles/sec
build-x86_64-2.11.1-c_only
Cycles/Second: 277931.781250
Cycles/Second: 275614.093750
Cycles/Second: 271194.125000
Cycles/Second: 280155.093750
Cycles/Second: 284708.156250
Cycles/Second: 190936.031250
Cycles/Second: 264253.468750
Cycles/Second: 281354.281250
Cycles/Second: 290366.218750
Cycles/Second: 279962.000000
Min: 190936.03125 cycles/sec
Max: 290366.21875 cycles/sec
Avg: 269647.525 cycles/sec
build-x86_64-2.11.1-pi_condvar
Cycles/Second: 263975.937500
Cycles/Second: 279577.281250
Cycles/Second: 276504.531250
Cycles/Second: 266163.562500
Cycles/Second: 262115.796875
Cycles/Second: 279219.406250
Cycles/Second: 265263.812500
Cycles/Second: 262226.468750
Cycles/Second: 284592.687500
Cycles/Second: 278975.875000
Min: 262115.796875 cycles/sec
Max: 284592.6875 cycles/sec
Avg: 271861.535938 cycles/sec
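The Min/Max/Avg summaries above are throughput figures in cycles/second, and they can be recomputed from the per-run numbers with something like the small helper below. The helper names are hypothetical and are not part of condvar_perf.c. Applied to the ten runs per build, it reproduces the reported figures. It also puts pi_condvar about 0.8% above c_only on average, which is well inside the run-to-run spread: c_only alone ranges from 190936 to 290366 cycles/second.

```c
#include <stddef.h>

/* Summary of per-run throughput figures, in cycles/second. */
struct run_stats {
    double min, max, avg;
};

static struct run_stats summarize(const double *runs, size_t n)
{
    struct run_stats s = { 0.0, 0.0, 0.0 };
    double sum = 0.0;

    if (n == 0)
        return s;                      /* nothing to summarize */
    s.min = s.max = runs[0];
    for (size_t i = 0; i < n; i++) {
        if (runs[i] < s.min) s.min = runs[i];
        if (runs[i] > s.max) s.max = runs[i];
        sum += runs[i];
    }
    s.avg = sum / (double)n;
    return s;
}

/* Relative change of `test` against `base`, in percent. */
static double percent_change(double base, double test)
{
    return (test - base) / base * 100.0;
}
```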
This covers only the cond_signal case and doesn't account for
cond_timedwait or cond_broadcast, but I wouldn't expect those to see any
additional impact from this patch. Can you think of scenarios not
covered by this rather simple testcase that are likely to suffer a
greater impact?
Thanks,