When using a PTHREAD_PRIO_INHERIT mutex with a condvar, the pthread_cond* calls can still cause an unbounded priority inversion via the internal condvar lock. The POSIX specification doesn't provide a mechanism to specify the protocol of the condvar. We would like to do this at runtime, but unfortunately it is legal to call pthread_cond_signal() or pthread_cond_broadcast() without first waiting on the lock, so the mutex type may not be known the first time the condvar is used. A new API, pthread_condattr_setprotocol_np() and pthread_condattr_getprotocol_np(), would allow the user to create a PTHREAD_PRIO_INHERIT condvar. This would use a PTHREAD_PRIO_INHERIT mutex for the internal condvar lock, eliminating the potential for hitting an unbounded priority inversion on that lock. This topic was initially discussed here: http://sources.redhat.com/ml/libc-alpha/2010-01/msg00011.html And again here: http://sources.redhat.com/ml/libc-alpha/2010-02/msg00089.html Test cases and patches for evaluation will be attached.
Created attachment 4779 [details] Unbounded priority inversion testcase pthread_cond_hang.c Using vanilla cond vars results in a pthread_cond_wait priority inversion deadlock. With PI cond vars it works fine. This test uses dlsym to use the new API if it exists.
Created attachment 4780 [details] pi condvar: pthread_condattr_setprotocol_np 2.12 port
Created attachment 4781 [details] pi condvar: add tst-condpi1.c basic API test
Created attachment 4782 [details] pi condvar patch impact on normal path testcase This test is designed to test the efficiency of the pthread_cond_wait() and pthread_cond_signal() code paths within glibc. Ulrich has expressed concern about these patches negatively impacting the non-pi condvar paths. I felt the changes to the common case (when PI is not involved) were pretty minimal. Certainly any significant regression there would be unacceptable. I built three versions of glibc 2.11.1 from git: 1) git: unmodified git sources 2) c_only: pthread_cond*.S files deleted 3) pi_condvar: same as c_only with the pi_condvar patches applied Comparing #3 against #2 allows us to eliminate any gains #1 would have solely from the hand written asm. 3 will eventually contain hand written asm, but until the non-posix API is agreed upon, it doesn't make sense to expend the effort of writing the asm code in my opinion. I then ran 10 runs of 10M iterations each at SCHED_FIFO 1 priority on each of the three glibcs, the results (following) suggest no significant change in the non PI condvar performance, sitting right at ~270k (avg) cycles/sec for each glibc. build-x86_64-2.11.1-git Cycles/Second: 279831.187500 Cycles/Second: 261911.421875 Cycles/Second: 277664.125000 Cycles/Second: 284847.718750 Cycles/Second: 285067.281250 Cycles/Second: 267918.718750 Cycles/Second: 284785.656250 Cycles/Second: 277402.843750 Cycles/Second: 202379.703125 Cycles/Second: 266421.718750 Min: 202379.703125 us Max: 285067.28125 us Avg: 268823.0375 us build-x86_64-2.11.1-c_only Cycles/Second: 277931.781250 Cycles/Second: 275614.093750 Cycles/Second: 271194.125000 Cycles/Second: 280155.093750 Cycles/Second: 284708.156250 Cycles/Second: 190936.031250 Cycles/Second: 264253.468750 Cycles/Second: 281354.281250 Cycles/Second: 290366.218750 Cycles/Second: 279962.000000 Min: 190936.03125 us Max: 290366.21875 us Avg: 269647.525 us build-x86_64-2.11.1-pi_condvar Cycles/Second: 263975.937500 Cycles/Second: 279577.281250 Cycles/Second: 276504.531250 Cycles/Second: 266163.562500 Cycles/Second: 262115.796875 Cycles/Second: 279219.406250 Cycles/Second: 265263.812500 Cycles/Second: 262226.468750 Cycles/Second: 284592.687500 Cycles/Second: 278975.875000 Min: 262115.796875 us Max: 284592.6875 us Avg: 271861.535938 us This is only the cond_signal case, and doesn't account for cond_timedwait or cond_broadcast, but I wouldn't expect those to experience any additional impact from this patch. Are there scenarios you can think of that are likely to suffer greater impact that are not covered by this rather simple test case?
While working up tst-condpi2.c (make check version of pthread_cond_hang.c), I found something went wrong with my port attached as "0001-pi-condvar- pthread_condattr_setprotocol_np-2.12-port.patch" and the test case still hangs when affined to CPU 0. I'll look into why and get an updated patch posted as soon as possible.
Turns out the trouble in comment 5 was due to my forgetting to remove the nptl/pthread_cond*.S files before building the library this time around. The original 0001 patch did indeed function correctly. I have reworked the patches, improved the patch headers and naming as well as added a new tst-condpi2.c testcase which tests for priority inversion avoidance. A follow-on patch detects an unbounded inversion and exits gracefully. Without the last patch, the glibc test framework will kill the test after the 2 second timeout - which may be the preferred approach. I'll attach the updated patches now.
Created attachment 4785 [details] Unbounded priority inversion test case The first version (attachment 4779 [details]) was missing the "_np" in the dl lookup. This fixes that.
Created attachment 4786 [details] [PATCH 1/4 V2] pi-condvars: add protocol support to pthread_condattr_t Updated patch header. Functionally equivalent to attachment 4780 [details].
Created attachment 4787 [details] [PATCH 2/4 V2] pi-condvars: add tst-condpi1.c, basic API test Updated patch header.
Created attachment 4788 [details] [PATCH 3/4 V2] pi-condvars: add tst-condpi2.c, priority inversion avoidance test New "make check" test, tst-condpi2.c, to test priority inversion avoidance.
Created attachment 4789 [details] [PATCH 4/4 V2] pi-condvars: tst-condpi2.c, detect priority inversion Optional patch. This goes beyond relying on the glibc 2 second timeout to abort the test. Instead, it detects the unbounded priority inversion and exits gracefully. Depending on the preferred approach, this patch can be ignored without a problem.
It was brought to my attention that the Signed-off-by's in the patches attached and sent to libc-alpha are not appropriate for glibc development. This was just an artifact of my git usage, and I will correct in subsequent versions. May we discuss these as is, as we our foremost concern is agreeing on the proposed new condattr protocol APIs?
I have posted V3 of the patches to libc-alpha. Rather than bog bugzilla down with additional attachments, I'll wait until an agreement has been made on libc-alpha. These patches have proper ChangeLogs and have removed the requeue-pi C implementation which the previous versions were carrying along.
Created attachment 4821 [details] pi condvar patch impact on normal path testcase Updated condvar_perf testcase which runs the previous test multiple times and calculates various statistics to aid in determining the impact of the pi_condvar patches.
Ulrich, before I spend more time on the test cases and possibly trying to optimize the current implementation, can you comment on if you have any objection to the proposed API changes themselves?
(In reply to comment #15) > Ulrich, before I spend more time on the test cases and possibly trying to > optimize the current implementation, can you comment on if you have any objection > to the proposed API changes themselves? Is there any chance that this issue is going to make any progress ? It affects real users and we have a working solution for it. Are there any issues with the approach itself ? If not, what needs to be done to get this resolved ? Thanks, tglx
(In reply to comment #16) > Is there any chance that this issue is going to make any progress ? It affects > real users and we have a working solution for it. I can only confirm this. We just saw that unfortunate deadlock again in real application (something to be shipped in products). We ran into it before, but this time we finally realized that it's not due to an outdated glibc but rather a still unfixed bug. Our applications consist of dozens of real-time threads at different prio levels and practically can't be reworked to avoid cond variables in RT threads. So we really need a solution, and that preferably without having to carry our own patched glibc versions. Thanks, Jan
Ping on this unfixed bug. Is there a chance to reconsider the extension Darren proposed? We are stuck with unhandy and inefficient PRIO_PROTECT here due to this issue. And I'm sure we aren't alone...
Note that the condvar implementation will have to change due to a recent POSIX change (see bug 13165). That means that how to solve this bug might also depend on how we fix bug 13165. It would be good to fix both in one go.
Created attachment 7238 [details] [PATCH 1/3 V3] pi-condvars: add protocol support to pthread_condattr_t Changes since V2: * Ported to glibc 2.19 * Dropped the changes related to making the external mutex associated with the condvar PI-aware since they have been already committed. * Moved lll_pi_lock()/lll_pi_unlock() to lowlevelpilock.c * Changed the value bitmap in pthread_condattr to use only 1 bit for clock ID This matches the equivalent bitmap in the __nwaiters field. * Used the new condattr masks and shifts where appropriate
Created attachment 7239 [details] [PATCH 2/3 V3] pi-condvars: add tst-condpi1.c, basic API test Changes since V2: * updated copyright to 2013 and tested on glibc 2.19 * functional equivalent to V2
Created attachment 7240 [details] [PATCH 3/3 V3] pi-condvars: add tst-condpi2.c, priority inversion avoidance test Changes since V2: * updated copyright to 2013 and tested on glibc 2.19 * functional equivalent to V2
Created attachment 7282 [details] [PATCH 1/3 V3] pi-condvars: add protocol support to pthread_condattr_t Updated with lates changes suggested on libc-alpha: https://sourceware.org/ml/libc-alpha/2013-10/msg00731.html
GLibC v2.19 was just released and no fix for this issue. Why is this? What's involved in getting this done (seems like there's a patch available for v2.19 that works)? -David
It looks like the updated patches were posted to the mailing list were never pinged once the copyright assignment was confirmed. Also, it looks like there were comments to be addressed.
I don't see the copyright assignments yet on file according to copyright.list, though both contributors said they were working on them. If sent in, assign@gnu.org should be chased up about getting copyright.list updated, and once the assignments are ready, the contributors should keep pinging the patches weekly on libc-alpha for as long as needed as per the contribution checklist. We still need someone to take charge of working through glibc's open nptl bugs and reviewing / fixing them (there are a few people with significant expertise in the area, but not so much systematic work through all the bugs).
I'm also working on a new condvar implementation, motivated by a POSIX clarification that requires stronger signaler--waiter guarantees than what's guaranteed by the current implementation (IOW, the current implementation does not fulfill POSIX requirements). The new implementation also fixes an ABA issue in the current implementation. The PI support in there is incomplete yet, however, and it might require futex changes/additions.
(In reply to joseph@codesourcery.com from comment #26) > I don't see the copyright assignments yet on file according to > copyright.list, though both contributors said they were working on them. > If sent in, assign@gnu.org should be chased up about getting > copyright.list updated, and once the assignments are ready, the > contributors should keep pinging the patches weekly on libc-alpha for as > long as needed as per the contribution checklist. > > We still need someone to take charge of working through glibc's open nptl > bugs and reviewing / fixing them (there are a few people with significant > expertise in the area, but not so much systematic work through all the > bugs). Last I have heard the right people signed the required papers on our side and they were sent to assign@gnu.org sometime mid-December. I haven't heard anything back. Let me see if I can get a status update. Once the copyright assignment is confirmed I can provide updated patches.
Gratian, right, as far as I understand it, Paul has provided all the necessary information for the IBM assignment for my and Dinakar's contributions. If there is anything I can do to help here, please let me know. Torvalds, I believe in our discussions we agreed that the work you are doing is orthogonal to this, and while it may impact the same code paths, it was best to deal with them as separate efforts and not block one on the other. I would be very interested in continuing to follow along on that effort as well though.
(In reply to Darren Hart from comment #29) > Torvalds, I believe in our discussions we agreed that the work you are doing > is orthogonal to this, and while it may impact the same code paths, it was > best to deal with them as separate efforts and not block one on the other. I > would be very interested in continuing to follow along on that effort as > well though. Yes, it is orthogonal. It's just that the current implementation, with or without your fixes, does not fulfill the stronger waiter-signaler requirements due to the POSIX clarification. Thus, we'll need to do something about the latter; but it's no reason to not include your fixes while the latter isn't ready.
*** Bug 260998 has been marked as a duplicate of this bug. *** Seen from the domain http://volichat.com Page where seen: http://volichat.com/adult-chat-rooms Marked for reference. Resolved as fixed @bugzilla.
I believe there is an error in [PATCH 1/3 V3] where COND_PRIO_PROTECT will clobber the CLOCKID in cond->__data.__nwaiters. This is because the COND_PROTOCOL values are not shifted by COND_PROTOCOL_SHIFT before reading and writing to __nwaiters. I believe the switch cases in pthread_cond_init.c:__pthread_cond_init() should look like this: case PTHREAD_PRIO_INHERIT: cond->__data.__nwaiters |= COND_PRIO_INHERIT << COND_PROTOCOL_SHIFT; break; case PTHREAD_PRIO_PROTECT: cond->__data.__nwaiters |= COND_PRIO_PROTECT << COND_PROTOCOL_SHIFT; break; And the cond_lock() and cond_unlock() in cond-lock.h should look like this: static inline void cond_lock(pthread_cond_t *cond, int pshared) { if (pshared == LLL_PRIVATE && (((cond->__data.__nwaiters & COND_PROTOCOL_MASK) >> COND_PROTOCOL_SHIFT) == COND_PRIO_INHERIT)) lll_pi_lock (cond->__data.__lock, pshared); else lll_lock (cond->__data.__lock, pshared); } static inline void cond_unlock(pthread_cond_t *cond, int pshared) { if (pshared == LLL_PRIVATE && (((cond->__data.__nwaiters & COND_PROTOCOL_MASK) >> COND_PROTOCOL_SHIFT) == COND_PRIO_INHERIT)) lll_pi_unlock (cond->__data.__lock, pshared); else lll_unlock (cond->__data.__lock, pshared); }
I looked through this, and you appear to be correct. We didn't catch it in testing as the test case only depended on reading and writing the COND_PROTOCOL_INHERIT, which works as the shift was missing from both locations. Oddly, I had observed what I thought was a similar error with the CLOCK_ID, but after going through it realized it was just an optimization as the CLOCK_ID is bit 0. Sadly, I didn't catch this while staring at the same code :-( Nice catch John! Who wants to respin the patches?
(In reply to Darren Hart from comment #33) > I looked through this, and you appear to be correct. We didn't catch it in > testing as the test case only depended on reading and writing the > COND_PROTOCOL_INHERIT, which works as the shift was missing from both > locations. > > Oddly, I had observed what I thought was a similar error with the CLOCK_ID, > but after going through it realized it was just an optimization as the > CLOCK_ID is bit 0. Sadly, I didn't catch this while staring at the same code > :-( > > Nice catch John! > > Who wants to respin the patches? I can do it but it's going to take me a couple of days due to current workload.
@Gratian: Maybe while you are at it, you can change the protocol enum in internaltypes.h to: enum { COND_PRIO_INHERIT = 1, COND_PRIO_PROTECT }; Although technically irrelevant, I found it confusing why it begins with 2. This was probably the reason for Darren's oversight in the protocol shift bug.
Created attachment 7442 [details] [PATCH 1/3 V4] pi-condvars: add protocol support to pthread_condattr_t Incorporated suggestions made by John Ogness and updated the version to 2.20 now that 2.19 is released.
I just checked libc git master and do not see these changes merged. Is there some pending review or process yet to be completed? Anything needed that I can help with? Is the copyright assignment issue closed?
(In reply to Darren Hart from comment #37) > Is the copyright assignment issue closed? I apologize for the (lawyer induced) delay. I finally got confirmation that the assignment was signed on the FSF side. I will dust off the patches in the next few days, add benchmarks for x86 (to hopefully prove that an assembly implementation is not necessary) and resubmit them.
On 4/17/14, 7:53, "gratian.crisan at ni dot com" <sourceware-bugzilla@sourceware.org> wrote: >https://sourceware.org/bugzilla/show_bug.cgi?id=11588 > >--- Comment #38 from Gratian Crisan <gratian.crisan at ni dot com> --- >(In reply to Darren Hart from comment #37) > >> Is the copyright assignment issue closed? > >I apologize for the (lawyer induced) delay. I finally got confirmation >that the >assignment was signed on the FSF side. >I will dust off the patches in the next few days, add benchmarks for x86 >(to >hopefully prove that an assembly implementation is not necessary) and >resubmit >them. Awesome!
Sorry, this is taking me longer than I originally thought. It turns out that this commit: https://sourceware.org/git/?p=glibc.git;a=commit;h=47c5adebd2c864a098c3af66e61e1147dc3cf0b4 is unconditionally disabling PI futexes on ARM because there is a kernel configuration (CONFIG_CPU_USE_DOMAINS && CONFIG_SMP) that doesn't support them. So I need to find a solution for that first before I can finish testing on ARM.
Created attachment 7684 [details] [PATCH 4/6] benchtests: Add benchmarks for pthread_cond_* functions
Created attachment 7685 [details] [PATCH 5/6] x86_64: Remove assembly implementations for pthread_cond_*
Created attachment 7686 [details] [PATCH 6/6] arm: Re-enable PI futex support for ARM kernels >= 3.14.3
Created attachment 7687 [details] [PATCH 1/6 V5] pi-condvars: add protocol support to pthread_condattr_t
Created attachment 7688 [details] [PATCH 2/6] pi-condvars: add tst-condpi1.c, basic API test
Created attachment 7689 [details] [PATCH 3/6] pi-condvars: add tst-condpi2.c, priority inversion avoidance test
Looks like this again missed the last glibc release (2.20). Any chance this might finally get into v2.21 (which looks to be scheduled for 01/2015)?
The problem here is that the Austin Group has clarified that POSIX requires stronger ordering guarantees wrt. waiters vs. signals than what the current implementation offers. We can implement them, but we're not aware of an (efficient) implementation that fulfills both these clarified POSIX requirements and can do PI under all circumstances: https://sourceware.org/ml/libc-alpha/2014-08/msg00287.html If you have further input on these choices, please comment on libc-alpha.
I have constructed another condvar implementation that I'm testing right now. However, the outlook for PI support is still not great. Essentially, we need to boost the priority of waiters that have to wake up from a futex that is used to send signals, and have to confirm that they have woken up. This seems different from what the futex PI support can do today. Two things that could help are support for 64b futexes, restricting PI support to non-process-shared condvars, and perhaps other futex operations.
Torvald, I'm reading up (again :-) ) on the issues you've raised from the Austin Group and building context on the use of the sequence locks, etc. Would you share your implementation with me so I can see what kinds of changes are being considered and help building support for the PI cases? Regarding boosting waiters for sending signals. Current futex PI boosting is based on the Linux kernel rtmutex. Can you elaborate on the what/why in your statement: > Essentially, we need to boost the priority of waiters that have to wake up from a > futex that is used to send signals, and have to confirm that they have woken up. Thanks.
Torvald, I am interested in the proposed implementation too if you don't mind sharing. PI is critical for our real-time applications. We have been carrying the patches attached to this bugzilla entry for multiple years now in order to avoid deadlocks. I would like to help if possible.
Torvald, can you provide a specific citation, preferably to the accessible SUS. I'm looking to see if I have access to POSIX. I'd like to read the specific requirement regarding stronger ordering of waiters vs signals.
(In reply to Darren Hart from comment #50) > Torvald, I'm reading up (again :-) ) on the issues you've raised from the > Austin Group and building context on the use of the sequence locks, etc. > Would you share your implementation with me so I can see what kinds of > changes are being considered and help building support for the PI cases? I'm in the process of cleaning up the current (second) implementation, and will share it on libc-alpha once that is done. > Regarding boosting waiters for sending signals. Current futex PI boosting is > based on the Linux kernel rtmutex. Can you elaborate on the what/why in your > statement: > > > Essentially, we need to boost the priority of waiters that have to wake up from a > > futex that is used to send signals, and have to confirm that they have woken up. > In a nutshell, the problem goes like this: * Due to having to wake up with ordering constraints, the condition that a futex needs to cover is not an on/off thing like a mutex, but a sequence, in an abstract sense. If you need to wake at position N, all 0..N waiters can consume. * Futexes provide no guarantees regarding the order in which waiters enqueue and ware woken (eg, no FIFO). * Futexes are 32b, so if one represents the ordering sequence with counting, there is some chance of a 32b wrap around and an ABA issue on the futex comparison. That may be unlikely to hit, but I'm not really comfortable putting such an ABA issue into released code knowingly. * My previous attempt of a condvar suffered a performance problem because one can't know whether a futex wake-up was spurious or not; if one doesn't know, one needs to conservatively add a fake futex wake-up. * In this implementation, one can deal with the ABA issue by quiescing the condvar. But for this to be possible with PI, one needs to boost the priority of waiters when sending a signal. I don't know how to do that with the current futex PI ops. * My current attempt builds two groups that waiters wait on, complete with separate futexes per group. That avoids performance issues of the former simpler algorithm, but now we need to quiesce each group before we can reuse its memory (which we need to do because of process-shared). So similar PI problem as before. Specifically, the problem for PI is thus that we have waiters blocked on a futex that is not an exclusive lock. We need to be able to send them signals and cause them to confirm that they have woken up. Thus, their priorities must be boosted at least for as long as they need to confirm the wake-up. If we could have something like a shared lock (ie, rwlock) with PI support, all waiters that are about to block on the signal/condvar futex could acquire a readlock and be boosted by a high-prio wrlock writer. But I don't know whether we can build that with the current futex PI ops. The problem is easier for the non-process-shared case because then we can just allocate additional memory and, for example, build our own userspace wait queues, in which boosting can be done with an exclusive lock.
(In reply to Darren Hart from comment #52) > Torvald, can you provide a specific citation, preferably to the accessible > SUS. I'm looking to see if I have access to POSIX. I'd like to read the > specific requirement regarding stronger ordering of waiters vs signals. http://austingroupbugs.net/view.php?id=609 (probably best to read from comment 1403 downwards). I think the C++ specification for condvars is the cleanest one (see N4567, paragraph 30.5p3-4). AFAIU, POSIX and ISO C have the same semantics in mind.
I've posted the new algorithm upstream: https://sourceware.org/ml/libc-alpha/2016-05/msg00605.html Have a look if you're interested. I'd appreciate any reviews and comments on the PI support problem.
The new condition variable implementation is now committed upstream. It should be the base for any improvement suggestions from now on. How to support PI for condvars has also been discussed at the Linux Real-Time Summit 2016: https://wiki.linuxfoundation.org/realtime/events/rt-summit2016/schedule So far, there is no known solution for how to achieve PI support given the current constraints we have (eg, available futex operations, POSIX requirements, ...).
Also fixed on 2.35 branch. https://jigsawpuzzle.io/