Bug 13234

Summary: Non-pshared condition variables are ~2-2.5x slower than pshared ones at broadcast
Product: glibc
Reporter: Rich Felker <bugdal>
Component: nptl
Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED WORKSFORME
Severity: normal
CC: fweimer, triegel
Priority: P2
Flags: fweimer: security-
Version: unspecified
Target Milestone: ---
Host:
Target:
Build:
Last reconfirmed:
Attachments: Test program which exhibits performance difference pshared/non-pshared.

Description Rich Felker 2011-09-28 23:34:47 UTC
Created attachment 5950
Test program which exhibits performance difference pshared/non-pshared.

The attached program shows NPTL's non-process-shared condition variables (which use futex requeue) performing significantly worse than process-shared ones (which simply use a broadcast futex wake). On my machine (Atom N280, dual core), it takes ~11.7 seconds with a non-pshared cond var and ~5.3 seconds with a pshared one (comment/uncomment the pthread_cond_init line to switch between them).
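
Attachment 5950 is not included in this text. A minimal sketch of a similar measurement follows, assuming 5 waiter threads that each wake once per `pthread_cond_broadcast` round and a main thread that waits for all of them before starting the next round. The names `run_bench`, `waiter`, `struct bench`, and the round counts are hypothetical. Only the broadcast condition variable's `pshared` attribute differs between runs, mirroring the toggled pthread_cond_init line.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

#define NWAITERS 5 /* waiter count from the report; everything else is assumed */

struct bench {
    pthread_mutex_t lock;
    pthread_cond_t go;    /* waiters block here; the main thread broadcasts on it */
    pthread_cond_t done;  /* main thread waits here until every waiter has woken */
    unsigned long gen;    /* round number, bumped by the main thread */
    unsigned long rounds; /* total number of broadcast rounds */
    int acked;            /* waiters that have observed the current round */
};

static void *waiter(void *arg) {
    struct bench *b = arg;
    unsigned long seen = 0;

    pthread_mutex_lock(&b->lock);
    while (seen < b->rounds) {
        /* The predicate loop also absorbs spurious wakeups. */
        while (b->gen == seen)
            pthread_cond_wait(&b->go, &b->lock);
        seen = b->gen;
        if (++b->acked == NWAITERS)
            pthread_cond_signal(&b->done);
    }
    pthread_mutex_unlock(&b->lock);
    return NULL;
}

/* Runs `rounds` broadcasts; returns elapsed seconds, or -1.0 if the
 * pshared attribute is unsupported. `pshared` affects only `go`. */
static double run_bench(int pshared, unsigned long rounds) {
    struct bench b = { .gen = 0, .rounds = rounds, .acked = 0 };
    pthread_condattr_t ca;
    pthread_t tid[NWAITERS];
    struct timespec t0, t1;

    pthread_condattr_init(&ca);
    if (pshared && pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED) != 0) {
        pthread_condattr_destroy(&ca);
        return -1.0;
    }
    pthread_mutex_init(&b.lock, NULL);
    pthread_cond_init(&b.go, &ca);
    pthread_condattr_destroy(&ca);
    pthread_cond_init(&b.done, NULL);

    for (int i = 0; i < NWAITERS; i++)
        if (pthread_create(&tid[i], NULL, waiter, &b) != 0)
            abort(); /* running threads reference `b`, so bail out hard */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_mutex_lock(&b.lock);
    for (unsigned long r = 0; r < rounds; r++) {
        b.acked = 0;
        b.gen++;
        pthread_cond_broadcast(&b.go);
        while (b.acked < NWAITERS)
            pthread_cond_wait(&b.done, &b.lock);
    }
    pthread_mutex_unlock(&b.lock);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    for (int i = 0; i < NWAITERS; i++)
        pthread_join(tid[i], NULL);
    pthread_cond_destroy(&b.go);
    pthread_cond_destroy(&b.done);
    pthread_mutex_destroy(&b.lock);
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

A driver could compare `run_bench(0, 1000000)` against `run_bench(1, 1000000)` (compile with `-pthread`). The sketch makes no claim about which variant is faster on any particular glibc version or machine.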

Of course, requeue-based broadcast should scale better to huge numbers of waiters, and this test program only has 5 waiters. Still, the performance should not be this bad. With musl libc, I get comparable performance with pshared and non-pshared cond vars, and both outperform NPTL, with run times around 2.5-3 seconds.

If you're unwilling to properly fix whatever is making it slow, an easy "fix" might be to use a broadcast futex wake instead of the requeue code whenever there are fewer than ~10 waiters.
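
The suggested workaround can be sketched with raw Linux futex calls. This is an illustration under stated assumptions, not NPTL's code: `cond_broadcast_wake`, its helpers, and `SMALL_WAITER_THRESHOLD` are hypothetical names, and the cutoff of 10 simply follows the "~10" figure in the suggestion. `FUTEX_WAKE` with `INT_MAX` wakes every waiter at once, while `FUTEX_CMP_REQUEUE` wakes one waiter and moves the rest onto the mutex's futex, the intent being that they are woken one at a time as the mutex is released.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical cutoff, following the "~10 waiters" suggestion. */
#define SMALL_WAITER_THRESHOLD 10

/* Wake every thread blocked on the condvar futex. */
static long futex_wake_all(int *cond_futex) {
    return syscall(SYS_futex, cond_futex, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}

/* Wake one waiter and move the rest onto the mutex futex. The kernel fails
 * with EAGAIN if *cond_futex no longer equals `expected`. */
static long futex_requeue_all(int *cond_futex, int *mutex_futex, int expected) {
    return syscall(SYS_futex, cond_futex, FUTEX_CMP_REQUEUE, 1,
                   (void *)(uintptr_t)INT_MAX, /* val2: max waiters to requeue */
                   mutex_futex, expected);
}

/* Broadcast step: plain wake for small waiter counts, requeue otherwise.
 * `expected` is the value the caller just stored into *cond_futex.
 * Returns woken + requeued waiters, or -1 with errno set. */
static long cond_broadcast_wake(int *cond_futex, int *mutex_futex,
                                int nwaiters, int expected) {
    if (nwaiters < SMALL_WAITER_THRESHOLD)
        return futex_wake_all(cond_futex);
    return futex_requeue_all(cond_futex, mutex_futex, expected);
}
```

Picking the plain wake for small counts trades a possible thundering herd on the mutex for a single, simpler kernel operation, which is the trade-off the suggestion is betting on for a handful of waiters.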

BTW, I suspect that the overly complex sequencing code aimed at minimizing spurious wakeups, which also seems responsible for bugs 12875 and 13165, is part of the problem.

Comment 1 Torvald Riegel 2017-01-11 14:14:37 UTC
I can't reproduce this on x86_64 RHEL7 (old condvar algorithm).  The new condvar algorithm doesn't use requeue, so it should also not be affected.  Therefore, I'll close this bug.