Bug 13234 - Non-pshared condition variables are ~2-2.5x slower than pshared ones at broadcast
Status: RESOLVED WORKSFORME
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-09-28 23:34 UTC by Rich Felker
Modified: 2017-01-11 14:14 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Test program which exhibits performance difference pshared/non-pshared. (501 bytes, text/x-csrc)
2011-09-28 23:34 UTC, Rich Felker

Description Rich Felker 2011-09-28 23:34:47 UTC
Created attachment 5950
Test program which exhibits performance difference pshared/non-pshared.

The attached program shows NPTL's non-process-shared condition variables (which use futex requeue) performing significantly worse than process-shared ones (which simply use a broadcast futex wake). On my machine (Atom N280, dual core), it takes ~11.7 seconds with a non-pshared cond var and ~5.3 seconds with a pshared one (comment/uncomment the pthread_cond_init line to switch between them).
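
The 501-byte attachment itself is not reproduced here. As a rough illustration of the kind of measurement being described, the following is a minimal, hypothetical harness (bench_broadcast, waiter and struct bench are illustrative names, not from the attachment): 5 threads wait on one cond var, and the main thread broadcasts once per round and waits until every waiter has acknowledged it. The only difference between the two runs is whether the cond var under test is initialized with PTHREAD_PROCESS_SHARED, which mirrors the report's "toggle the pthread_cond_init line" setup. It times the whole round trip, not the broadcast call alone, and makes no claim about what numbers any given libc will produce.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Shared state for a hypothetical broadcast benchmark. */
struct bench {
    pthread_mutex_t lock;
    pthread_cond_t go;     /* cond var under test: broadcast once per round */
    pthread_cond_t done;   /* waiters report back to the main thread here */
    unsigned long gen;     /* round counter, bumped before each broadcast */
    int acked;             /* waiters that have seen the current round */
    int stop;              /* set together with the final gen bump */
};

static void *waiter(void *arg)
{
    struct bench *b = arg;
    unsigned long seen = 0;

    pthread_mutex_lock(&b->lock);
    for (;;) {
        while (b->gen == seen)          /* loop guards against spurious wakeups */
            pthread_cond_wait(&b->go, &b->lock);
        seen = b->gen;
        if (b->stop)
            break;
        b->acked++;
        pthread_cond_signal(&b->done);
    }
    pthread_mutex_unlock(&b->lock);
    return NULL;
}

/* Runs `rounds` broadcasts to `nwaiters` threads. A nonzero `pshared`
   initializes the cond var under test with PTHREAD_PROCESS_SHARED.
   Returns elapsed wall-clock seconds, or -1.0 if setup failed. */
double bench_broadcast(int pshared, int nwaiters, int rounds)
{
    struct bench b = { .gen = 0, .acked = 0, .stop = 0 };
    pthread_condattr_t ca;
    pthread_t *tids;
    struct timespec t0, t1;
    int created, r, i;

    assert(nwaiters > 0 && rounds >= 0);
    tids = calloc((size_t)nwaiters, sizeof *tids);
    if (tids == NULL)
        return -1.0;

    pthread_mutex_init(&b.lock, NULL);
    pthread_condattr_init(&ca);
    if (pshared)
        pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&b.go, &ca);
    pthread_condattr_destroy(&ca);
    pthread_cond_init(&b.done, NULL);

    for (created = 0; created < nwaiters; created++)
        if (pthread_create(&tids[created], NULL, waiter, &b) != 0)
            break;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_mutex_lock(&b.lock);
    for (r = 0; created == nwaiters && r < rounds; r++) {
        b.acked = 0;
        b.gen++;
        pthread_cond_broadcast(&b.go);  /* the operation being compared */
        while (b.acked < nwaiters)
            pthread_cond_wait(&b.done, &b.lock);
    }
    b.stop = 1;                         /* final round releases every waiter */
    b.gen++;
    pthread_cond_broadcast(&b.go);
    pthread_mutex_unlock(&b.lock);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    for (i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&b.go);
    pthread_cond_destroy(&b.done);
    pthread_mutex_destroy(&b.lock);
    free(tids);

    if (created != nwaiters)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

A caller would compare bench_broadcast(0, 5, N) against bench_broadcast(1, 5, N) for some large N. The mutex is deliberately left process-private in both cases, since only the cond var's attribute selects between the requeue and broadcast-wake code paths described above.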

Of course, requeue-based broadcast should scale better to huge numbers of waiters, and this test program only has 5 waiters. Still, the performance should not be this bad. With musl libc, I get comparable performance with pshared and non-pshared cond vars, and both outperform NPTL, with run times of around 2.5-3 seconds.

If you're unwilling to properly fix whatever is making it slow, perhaps just using a broadcast futex wake instead of the requeue code whenever there are fewer than ~10 waiters would be an easy "fix"...
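
The suggested workaround can be sketched with raw Linux futex calls. This is a toy cond var and mutex written only to show the two broadcast strategies side by side. It is not NPTL's code, and every name in it (mtx, cnd, cnd_broadcast, requeue_threshold, demo_broadcast) is hypothetical. Assumptions: Linux only; the mutex is the three-state (0 free, 1 locked, 2 contended) design from Ulrich Drepper's "Futexes Are Tricky"; and the waiter count is only a heuristic for picking a branch, since both branches wake everyone. The threshold of 10 simply echoes the "~10" above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <pthread.h>
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical cutoff, taken from the "~10 waiters" suggestion. */
static const int requeue_threshold = 10;
struct mtx { int v; };                  /* 0 free, 1 locked, 2 locked and contended */
struct cnd { int seq; int nwaiters; };  /* nwaiters is only a heuristic */

static long futex(int *uaddr, int op, int val, long val2, int *uaddr2, int val3)
{
    return syscall(SYS_futex, uaddr, op, val, val2, uaddr2, val3);
}

/* Always marks the lock contended (2): requeued cond var waiters sleep on m->v
   without having set it, so unlock must keep issuing wakes. */
static void mtx_lock_contended(struct mtx *m)
{
    while (__atomic_exchange_n(&m->v, 2, __ATOMIC_ACQUIRE) != 0)
        futex(&m->v, FUTEX_WAIT_PRIVATE, 2, 0, NULL, 0);
}
static void mtx_lock(struct mtx *m)
{
    int expected = 0;   /* fast path: 0 -> 1 with no syscall */
    if (!__atomic_compare_exchange_n(&m->v, &expected, 1, 0,
                                     __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        mtx_lock_contended(m);
}
static void mtx_unlock(struct mtx *m)
{
    if (__atomic_fetch_sub(&m->v, 1, __ATOMIC_RELEASE) != 1) {  /* was 2 */
        __atomic_store_n(&m->v, 0, __ATOMIC_RELEASE);
        futex(&m->v, FUTEX_WAKE_PRIVATE, 1, 0, NULL, 0);
    }
}

/* Caller holds m. May return spuriously, so callers loop on a predicate. */
static void cnd_wait(struct cnd *c, struct mtx *m)
{
    int s = __atomic_load_n(&c->seq, __ATOMIC_SEQ_CST);
    __atomic_add_fetch(&c->nwaiters, 1, __ATOMIC_SEQ_CST);
    mtx_unlock(m);
    futex(&c->seq, FUTEX_WAIT_PRIVATE, s, 0, NULL, 0);  /* returns at once if seq moved */
    mtx_lock_contended(m);
    __atomic_sub_fetch(&c->nwaiters, 1, __ATOMIC_SEQ_CST);
}

/* m must be the waiters' mutex; the waiter count only picks the strategy. */
static void cnd_broadcast(struct cnd *c, struct mtx *m)
{
    int s = __atomic_add_fetch(&c->seq, 1, __ATOMIC_SEQ_CST);
    /* Requeue: wake one, move the rest onto m->v; fails (EAGAIN) if seq moved. */
    if (__atomic_load_n(&c->nwaiters, __ATOMIC_SEQ_CST) >= requeue_threshold &&
        futex(&c->seq, FUTEX_CMP_REQUEUE_PRIVATE, 1, INT_MAX, &m->v, s) >= 0)
        return;
    futex(&c->seq, FUTEX_WAKE_PRIVATE, INT_MAX, 0, NULL, 0);  /* plain broadcast wake */
}

/* Hypothetical demo harness used only to exercise the sketch. */
struct demo { struct mtx m; struct cnd c; int go, woke; };
static void *demo_waiter(void *arg)
{
    struct demo *d = arg;
    mtx_lock(&d->m);
    while (!d->go)
        cnd_wait(&d->c, &d->m);
    d->woke++;
    mtx_unlock(&d->m);
    return NULL;
}

/* Parks n (<= 64) threads, broadcasts once, returns how many woke up. */
int demo_broadcast(int n)
{
    struct demo d = {{0}, {0, 0}, 0, 0};
    pthread_t t[64];
    int i;
    assert(n >= 0 && n <= 64);
    for (i = 0; i < n; i++)
        if (pthread_create(&t[i], NULL, demo_waiter, &d) != 0)
            break;
    n = i;
    mtx_lock(&d.m);
    while (__atomic_load_n(&d.c.nwaiters, __ATOMIC_SEQ_CST) < n) {
        mtx_unlock(&d.m);   /* let the remaining threads register */
        sched_yield();
        mtx_lock(&d.m);
    }
    d.go = 1;
    cnd_broadcast(&d.c, &d.m);
    mtx_unlock(&d.m);
    for (i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return d.woke;
}
```

Both branches wake every waiter. They differ in what happens next. With FUTEX_WAKE, all waiters return at once and race for the mutex. With FUTEX_CMP_REQUEUE, one is woken and the rest are moved onto the mutex word, then released one unlock at a time. That handoff is why mtx_lock_contended must leave the lock marked as contended. Calling demo_broadcast(3) takes the wake-all branch, and demo_broadcast(16) takes the requeue branch.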

BTW, I suspect that the overly complex sequencing code aimed at minimizing spurious wakes, which also seems responsible for bugs 12875 and 13165, is part of the problem...
Comment 1 Torvald Riegel 2017-01-11 14:14:37 UTC
I can't reproduce this on x86_64 RHEL7 (old condvar algorithm).  The new condvar algorithm doesn't use requeue, so it should also not be affected.  Therefore, I'll close this bug.