Summary: | Non-pshared condition variables are ~2-2.5x slower than pshared ones at broadcast | ||
---|---|---|---|
Product: | glibc | Reporter: | Rich Felker <bugdal> |
Component: | nptl | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | CC: | fweimer, triegel |
Priority: | P2 | Flags: | fweimer: security- |
Version: | unspecified | ||
Target Milestone: | --- | ||
Host: | | Target: | |
Build: | | Last reconfirmed: | |
Attachments: | Test program which exhibits performance difference pshared/non-pshared. |
I can't reproduce this on x86_64 RHEL7 (old condvar algorithm). The new condvar algorithm doesn't use requeue, so it should not be affected either. Therefore, I'll close this bug.
Created attachment 5950: Test program which exhibits performance difference pshared/non-pshared.

The attached program shows NPTL's non-process-shared condition variables (which use futex requeue) performing significantly worse than process-shared ones (which simply use a broadcast futex wake). On my machine (Atom N280 dual core) it takes ~11.7 seconds with a non-pshared cond var and ~5.3 seconds with a pshared cond var (comment/uncomment the pthread_cond_init line to change which is used).

Of course, requeue-based broadcast should scale better to huge numbers of waiters, and this test program only has 5 waiters. Still, the performance should not be this bad. With musl libc, I get comparable performance with pshared and non-pshared cond vars (and both outperform NPTL, with run times around 2.5-3 seconds).

If you're unwilling to properly fix whatever is making it slow, just using a broadcast futex wake rather than the requeue code whenever the number of waiters is less than ~10 would be an easy "fix"...

BTW, I suspect the overly complex sequencing code aimed at minimizing spurious wakes, which also seems responsible for bugs 12875 and 13165, is part of the problem...
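Attachment 5950 itself is not reproduced here. The following is a hypothetical sketch of a benchmark in the same spirit, not the original test program: 5 waiter threads block on one condition variable while a main thread repeatedly calls pthread_cond_broadcast with the mutex held, and the only difference between the two runs is whether that condvar is created with PTHREAD_PROCESS_SHARED via pthread_condattr_setpshared. The names run_broadcast_bench, struct bench, and MAX_WAITERS are made up for illustration. The main thread waits for every waiter to acknowledge a round before broadcasting again, so each broadcast should find all waiters inside pthread_cond_wait. Any timings it produces depend on the libc and, per the comment above, on whether the condvar implementation uses requeue at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define MAX_WAITERS 64 /* hypothetical upper bound for this sketch */

/* Shared state for one run (hypothetical layout, not the attachment's). */
struct bench {
    pthread_mutex_t lock;
    pthread_cond_t go;        /* waiters block here; main thread broadcasts */
    pthread_cond_t done;      /* main thread waits here for all acks */
    unsigned long generation; /* bumped once per broadcast round */
    int acks;                 /* waiters that have seen the current round */
    int nwaiters;
    int rounds;
    long total_wakeups;
};

static void *waiter(void *arg)
{
    struct bench *b = arg;
    unsigned long seen = 0;

    pthread_mutex_lock(&b->lock);
    for (int r = 0; r < b->rounds; r++) {
        /* Re-check the predicate: pthread_cond_wait may wake spuriously. */
        while (b->generation == seen)
            pthread_cond_wait(&b->go, &b->lock);
        seen = b->generation;
        b->total_wakeups++;
        if (++b->acks == b->nwaiters)
            pthread_cond_signal(&b->done);
    }
    pthread_mutex_unlock(&b->lock);
    return NULL;
}

/* Runs `rounds` broadcasts to `nwaiters` threads. If pshared != 0, the
 * `go` condvar is PTHREAD_PROCESS_SHARED. Returns the number of waiter
 * wakeups (rounds * nwaiters), or -1 on bad arguments. */
long run_broadcast_bench(int pshared, int rounds, int nwaiters,
                         double *elapsed_s)
{
    struct bench b = { .nwaiters = nwaiters, .rounds = rounds };
    pthread_condattr_t ca;
    pthread_t tids[MAX_WAITERS];
    struct timespec t0, t1;

    if (nwaiters < 1 || nwaiters > MAX_WAITERS || rounds < 0)
        return -1;

    pthread_mutex_init(&b.lock, NULL);
    pthread_condattr_init(&ca);
    if (pshared)
        pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&b.go, &ca);
    pthread_condattr_destroy(&ca);
    pthread_cond_init(&b.done, NULL);

    for (int i = 0; i < nwaiters; i++)
        if (pthread_create(&tids[i], NULL, waiter, &b) != 0)
            abort(); /* benchmark sketch: give up if a thread can't start */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_mutex_lock(&b.lock);
    for (int r = 0; r < rounds; r++) {
        b.acks = 0;
        b.generation++;
        pthread_cond_broadcast(&b.go); /* the operation being measured */
        while (b.acks < nwaiters)
            pthread_cond_wait(&b.done, &b.lock);
    }
    pthread_mutex_unlock(&b.lock);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    for (int i = 0; i < nwaiters; i++)
        pthread_join(tids[i], NULL);

    if (elapsed_s)
        *elapsed_s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;

    pthread_cond_destroy(&b.go);
    pthread_cond_destroy(&b.done);
    pthread_mutex_destroy(&b.lock);
    return b.total_wakeups;
}
```

To compare the two variants, build with something like `cc -O2 -pthread` and call `run_broadcast_bench(0, N, 5, &t)` and `run_broadcast_bench(1, N, 5, &t)` with the same N, comparing the reported `t`. Only the `go` condvar is toggled, mirroring the report's "comment/uncomment the pthread_cond_init line"; the mutex stays a default, non-pshared one.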