Bug 11588

Summary: pthread condvars are not priority inheritance aware
Product: glibc Reporter: Darren Hart <darren.hart>
Component: nptlAssignee: Not yet assigned to anyone <unassigned>
Status: NEW ---    
Severity: normal CC: allan, darren.hart, davidh, dino, drepper.fsp, glibc-bugs, gratian.crisan, jan.kiszka, john.ogness, mtk.manpages, pasky, scot4spam, stano, triegel
Priority: P2 Flags: fweimer: security-
Version: 2.12   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Attachments: Unbounded priority inversion testcase pthread_cond_hang.c
pi condvar: pthread_condattr_setprotocol_np 2.12 port
pi condvar: add tst-condpi1.c basic API test
pi condvar patch impact on normal path testcase
Unbounded priority inversion test case
[PATCH 1/4 V2] pi-condvars: add protocol support to pthread_condattr_t
[PATCH 2/4 V2] pi-condvars: add tst-condpi1.c, basic API test
[PATCH 3/4 V2] pi-condvars: add tst-condpi2.c, priority inversion avoidance test
[PATCH 4/4 V2] pi-condvars: tst-condpi2.c, detect priority inversion
pi condvar patch impact on normal path testcase
[PATCH 1/3 V3] pi-condvars: add protocol support to pthread_condattr_t
[PATCH 2/3 V3] pi-condvars: add tst-condpi1.c, basic API test
[PATCH 3/3 V3] pi-condvars: add tst-condpi2.c, priority inversion avoidance test
[PATCH 1/3 V3] pi-condvars: add protocol support to pthread_condattr_t
[PATCH 1/3 V4] pi-condvars: add protocol support to pthread_condattr_t
[PATCH 4/6] benchtests: Add benchmarks for pthread_cond_* functions
[PATCH 5/6] x86_64: Remove assembly implementations for pthread_cond_*
[PATCH 6/6] arm: Re-enable PI futex support for ARM kernels >= 3.14.3
[PATCH 1/6 V5] pi-condvars: add protocol support to pthread_condattr_t
[PATCH 2/6] pi-condvars: add tst-condpi1.c, basic API test
[PATCH 3/6] pi-condvars: add tst-condpi2.c, priority inversion avoidance test

Description Darren Hart 2010-05-11 18:45:58 UTC
When using a PTHREAD_PRIO_INHERIT mutex with a condvar, the pthread_cond* calls 
can still cause an unbounded priority inversion via the internal condvar lock. 
The POSIX specification doesn't provide a mechanism to specify the protocol of 
the condvar. We would like to do this at runtime, but unfortunately it is legal 
to call pthread_cond_signal() or pthread_cond_broadcast() without first waiting 
on the lock, so the mutex type may not be known the first time the condvar is 
used. A new API, pthread_condattr_setprotocol_np() and 
pthread_condattr_getprotocol_np(), would allow the user to create a 
PTHREAD_PRIO_INHERIT condvar. This would use a PTHREAD_PRIO_INHERIT mutex for 
the internal condvar lock, eliminating the potential for hitting an unbounded 
priority inversion on that lock.

This topic was initially discussed here:
http://sources.redhat.com/ml/libc-alpha/2010-01/msg00011.html

And again here:
http://sources.redhat.com/ml/libc-alpha/2010-02/msg00089.html

Test cases and patches for evaluation will be attached.
Comment 1 Darren Hart 2010-05-11 18:48:14 UTC
Created attachment 4779 [details]
Unbounded priority inversion testcase pthread_cond_hang.c

Using vanilla cond vars results in a pthread_cond_wait priority inversion
deadlock. With PI cond vars it works fine. This test uses dlsym to use the new
API if it exists.
Comment 2 Darren Hart 2010-05-11 18:50:02 UTC
Created attachment 4780 [details]
pi condvar: pthread_condattr_setprotocol_np 2.12 port
Comment 3 Darren Hart 2010-05-11 18:50:55 UTC
Created attachment 4781 [details]
pi condvar: add tst-condpi1.c basic API test
Comment 4 Darren Hart 2010-05-11 18:55:41 UTC
Created attachment 4782 [details]
pi condvar patch impact on normal path testcase

This test is designed to test the efficiency of the pthread_cond_wait() and
pthread_cond_signal() code paths within glibc. Ulrich has expressed concern
about these patches negatively impacting the non-pi condvar paths. I felt the
changes to the common case (when PI is not involved) were pretty minimal.
Certainly any significant regression there would be unacceptable.

I built three versions of glibc 2.11.1 from git:
1)	  git: unmodified git sources
2)     c_only: pthread_cond*.S files deleted
3) pi_condvar: same as c_only with the pi_condvar patches applied

Comparing #3 against #2 allows us to eliminate any gains #1 would have solely
from the hand written asm. 3 will eventually contain hand written asm, but
until the non-posix API is agreed upon, it doesn't make sense to expend the
effort of writing the asm code in my opinion.

I then ran 10 runs of 10M iterations each at SCHED_FIFO 1 priority on each of
the three glibcs, the results (following) suggest no significant change in the
non PI condvar performance, sitting right at ~270k (avg) cycles/sec for each
glibc.

build-x86_64-2.11.1-git
Cycles/Second: 279831.187500
Cycles/Second: 261911.421875
Cycles/Second: 277664.125000
Cycles/Second: 284847.718750
Cycles/Second: 285067.281250
Cycles/Second: 267918.718750
Cycles/Second: 284785.656250
Cycles/Second: 277402.843750
Cycles/Second: 202379.703125
Cycles/Second: 266421.718750

Min:  202379.703125  us
Max:  285067.28125  us
Avg:  268823.0375  us


build-x86_64-2.11.1-c_only
Cycles/Second: 277931.781250
Cycles/Second: 275614.093750
Cycles/Second: 271194.125000
Cycles/Second: 280155.093750
Cycles/Second: 284708.156250
Cycles/Second: 190936.031250
Cycles/Second: 264253.468750
Cycles/Second: 281354.281250
Cycles/Second: 290366.218750
Cycles/Second: 279962.000000

Min:  190936.03125  us
Max:  290366.21875  us
Avg:  269647.525  us


build-x86_64-2.11.1-pi_condvar
Cycles/Second: 263975.937500
Cycles/Second: 279577.281250
Cycles/Second: 276504.531250
Cycles/Second: 266163.562500
Cycles/Second: 262115.796875
Cycles/Second: 279219.406250
Cycles/Second: 265263.812500
Cycles/Second: 262226.468750
Cycles/Second: 284592.687500
Cycles/Second: 278975.875000

Min:  262115.796875  us
Max:  284592.6875  us
Avg:  271861.535938  us


This is only the cond_signal case, and doesn't account for cond_timedwait or
cond_broadcast, but I wouldn't expect those to experience any additional impact
from this patch. Are there scenarios you can think of that are likely to suffer
greater impact that are not covered by this rather simple test case?
Comment 5 Darren Hart 2010-05-13 01:55:07 UTC
While working up tst-condpi2.c (make check version of pthread_cond_hang.c), I 
found something went wrong with my port attached as "0001-pi-condvar-
pthread_condattr_setprotocol_np-2.12-port.patch" and the test case still hangs 
when affined to CPU 0. I'll look into why and get an updated patch posted as soon 
as possible.
Comment 6 Darren Hart 2010-05-13 08:02:54 UTC
Turns out the trouble in comment 5 was due to my forgetting to remove the 
nptl/pthread_cond*.S files before building the library this time around. The 
original 0001 patch did indeed function correctly. I have reworked the patches, 
improved the patch headers and naming as well as added a new tst-condpi2.c 
testcase which tests for priority inversion avoidance. A follow-on patch detects 
an unbounded inversion and exits gracefully. Without the last patch, the glibc 
test framework will kill the test after the 2 second timeout - which may be the 
preferred approach. I'll attach the updated patches now.
Comment 7 Darren Hart 2010-05-13 08:05:03 UTC
Created attachment 4785 [details]
Unbounded priority inversion test case

The first version (attachment 4779 [details]) was missing the "_np" in the dl lookup.
This fixes that.
Comment 8 Darren Hart 2010-05-13 08:07:07 UTC
Created attachment 4786 [details]
[PATCH 1/4 V2] pi-condvars: add protocol support to pthread_condattr_t 

Updated patch header. Functionally equivalent to attachment 4780 [details].
Comment 9 Darren Hart 2010-05-13 08:08:18 UTC
Created attachment 4787 [details]
[PATCH 2/4 V2] pi-condvars: add tst-condpi1.c, basic API test 

Updated patch header.
Comment 10 Darren Hart 2010-05-13 08:10:40 UTC
Created attachment 4788 [details]
[PATCH 3/4 V2] pi-condvars: add tst-condpi2.c, priority inversion avoidance test 

New "make check" test, tst-condpi2.c, to test priority inversion avoidance.
Comment 11 Darren Hart 2010-05-13 08:13:15 UTC
Created attachment 4789 [details]
[PATCH 4/4 V2] pi-condvars: tst-condpi2.c, detect priority inversion 

Optional patch. This goes beyond relying on the glibc 2 second timeout to abort
the test. Instead, it detects the unbounded priority inversion and exits
gracefully. Depending on the preferred approach, this patch can be ignored
without a problem.
Comment 12 Darren Hart 2010-05-13 18:44:41 UTC
It was brought to my attention that the Signed-off-by's in the patches attached 
and sent to libc-alpha are not appropriate for glibc development. This was just 
an artifact of my git usage, and I will correct in subsequent versions. May we 
discuss these as is, as we our foremost concern is agreeing on the proposed new 
condattr protocol APIs?
Comment 13 Darren Hart 2010-05-27 23:18:55 UTC
I have posted V3 of the patches to libc-alpha. Rather than bog bugzilla down with 
additional attachments, I'll wait until an agreement has been made on libc-alpha. 
These patches have proper ChangeLogs and have removed the requeue-pi C 
implementation which the previous versions were carrying along.
Comment 14 Darren Hart 2010-05-28 20:34:43 UTC
Created attachment 4821 [details]
pi condvar patch impact on normal path testcase

Updated condvar_perf testcase which runs the previous test multiple times and
calculates various statistics to aid in determining the impact of the
pi_condvar patches.
Comment 15 Darren Hart 2010-06-17 20:59:45 UTC
Ulrich, before I spend more time on the test cases and possibly trying to 
optimize the current implementation, can you comment on if you have any objection 
to the proposed API changes themselves?
Comment 16 Thomas Gleixner 2010-06-17 22:17:01 UTC
(In reply to comment #15)
> Ulrich, before I spend more time on the test cases and possibly trying to 
> optimize the current implementation, can you comment on if you have any objection 
> to the proposed API changes themselves?

Is there any chance that this issue is going to make any progress ? It affects
real users and we have a working solution for it.

Are there any issues with the approach itself ? If not, what needs to be done to
get this resolved ?

Thanks,

      tglx

Comment 17 Jan Kiszka 2011-02-14 17:28:26 UTC
(In reply to comment #16)
> Is there any chance that this issue is going to make any progress ? It affects
> real users and we have a working solution for it.

I can only confirm this. We just saw that unfortunate deadlock again in real application (something to be shipped in products). We ran into it before, but this time we finally realized that it's not due to an outdated glibc but rather a still unfixed bug.

Our applications consist of dozens of real-time threads at different prio levels and practically can't be reworked to avoid cond variables in RT threads. So we really need a solution, and that preferably without having to carry our own patched glibc versions.

Thanks,
Jan
Comment 18 Jan Kiszka 2012-06-05 16:36:45 UTC
Ping on this unfixed bug. Is there a chance to reconsider the extension Darren proposed? We are stuck with unhandy and inefficient PRIO_PROTECT here due to this issue. And I'm sure we aren't alone...
Comment 19 Torvald Riegel 2012-10-25 11:03:02 UTC
Note that the condvar implementation will have to change due to a recent POSIX change (see bug 13165).  That means that how to solve this bug might also depend on how we fix bug 13165.  It would be good to fix both in one go.
Comment 20 Gratian Crisan 2013-10-17 21:16:18 UTC
Created attachment 7238 [details]
[PATCH 1/3 V3] pi-condvars: add protocol support to pthread_condattr_t

Changes since V2:
* Ported to glibc 2.19
* Dropped the changes related to making the external mutex associated with the
condvar PI-aware since they have been already committed.
* Moved lll_pi_lock()/lll_pi_unlock() to lowlevelpilock.c
* Changed the value bitmap in pthread_condattr to use only 1 bit for clock ID
This matches the equivalent bitmap in the __nwaiters field.
* Used the new condattr masks and shifts where appropriate
Comment 21 Gratian Crisan 2013-10-17 21:22:33 UTC
Created attachment 7239 [details]
[PATCH 2/3 V3] pi-condvars: add tst-condpi1.c, basic API test

Changes since V2:
  * updated copyright to 2013 and tested on glibc 2.19
  * functional equivalent to V2
Comment 22 Gratian Crisan 2013-10-17 21:24:37 UTC
Created attachment 7240 [details]
[PATCH 3/3 V3] pi-condvars: add tst-condpi2.c, priority inversion avoidance test

Changes since V2:
  * updated copyright to 2013 and tested on glibc 2.19
  * functional equivalent to V2
Comment 23 Gratian Crisan 2013-11-15 17:36:24 UTC
Created attachment 7282 [details]
[PATCH 1/3 V3] pi-condvars: add protocol support to pthread_condattr_t

Updated with lates changes suggested on libc-alpha: https://sourceware.org/ml/libc-alpha/2013-10/msg00731.html
Comment 24 davidh 2014-02-08 00:47:45 UTC
GLibC v2.19 was just released and no fix for this issue. Why is this? What's involved in getting this done (seems like there's a patch available for v2.19 that works)?
-David
Comment 25 Allan McRae 2014-02-08 00:58:21 UTC
It looks like the updated patches were posted to the mailing list were never pinged once the copyright assignment was confirmed.  Also, it looks like there were comments to be addressed.
Comment 26 joseph@codesourcery.com 2014-02-08 01:06:01 UTC
I don't see the copyright assignments yet on file according to 
copyright.list, though both contributors said they were working on them.  
If sent in, assign@gnu.org should be chased up about getting 
copyright.list updated, and once the assignments are ready, the 
contributors should keep pinging the patches weekly on libc-alpha for as 
long as needed as per the contribution checklist.

We still need someone to take charge of working through glibc's open nptl 
bugs and reviewing / fixing them (there are a few people with significant 
expertise in the area, but not so much systematic work through all the 
bugs).
Comment 27 Torvald Riegel 2014-02-08 01:20:32 UTC
I'm also working on a new condvar implementation, motivated by a POSIX clarification that requires stronger signaler--waiter guarantees than what's guaranteed by the current implementation (IOW, the current implementation does not fulfill POSIX requirements).  The new implementation also fixes an ABA issue in the current implementation. The PI support in there is incomplete yet, however, and it might require futex changes/additions.
Comment 28 Gratian Crisan 2014-02-08 03:52:41 UTC
(In reply to joseph@codesourcery.com from comment #26)
> I don't see the copyright assignments yet on file according to 
> copyright.list, though both contributors said they were working on them.  
> If sent in, assign@gnu.org should be chased up about getting 
> copyright.list updated, and once the assignments are ready, the 
> contributors should keep pinging the patches weekly on libc-alpha for as 
> long as needed as per the contribution checklist.
> 
> We still need someone to take charge of working through glibc's open nptl 
> bugs and reviewing / fixing them (there are a few people with significant 
> expertise in the area, but not so much systematic work through all the 
> bugs).

Last I have heard the right people signed the required papers on our side and they were sent to assign@gnu.org sometime mid-December. I haven't heard anything back. Let me see if I can get a status update. Once the copyright assignment is confirmed I can provide updated patches.
Comment 29 Darren Hart 2014-02-10 23:00:34 UTC
Gratian, right, as far as I understand it, Paul has provided all the necessary information for the IBM assignment for my and Dinakar's contributions. If there is anything I can do to help here, please let me know.

Torvalds, I believe in our discussions we agreed that the work you are doing is orthogonal to this, and while it may impact the same code paths, it was best to deal with them as separate efforts and not block one on the other. I would be very interested in continuing to follow along on that effort as well though.
Comment 30 Torvald Riegel 2014-02-11 02:32:20 UTC
(In reply to Darren Hart from comment #29)
> Torvalds, I believe in our discussions we agreed that the work you are doing
> is orthogonal to this, and while it may impact the same code paths, it was
> best to deal with them as separate efforts and not block one on the other. I
> would be very interested in continuing to follow along on that effort as
> well though.

Yes, it is orthogonal.  It's just that the current implementation, with or without your fixes, does not fulfill the stronger waiter-signaler requirements due to the POSIX clarification.  Thus, we'll need to do something about the latter; but it's no reason to not include your fixes while the latter isn't ready.
Comment 31 Jackie Rosen 2014-02-16 16:55:56 UTC
*** Bug 260998 has been marked as a duplicate of this bug. ***
Seen from the domain http://volichat.com
Page where seen: http://volichat.com/adult-chat-rooms
Marked for reference. Resolved as fixed @bugzilla.
Comment 32 John Ogness 2014-02-20 17:43:50 UTC
I believe there is an error in [PATCH 1/3 V3] where COND_PRIO_PROTECT will clobber the CLOCKID in cond->__data.__nwaiters. This is because the COND_PROTOCOL values are not shifted by COND_PROTOCOL_SHIFT before reading and writing to __nwaiters.

I believe the switch cases in pthread_cond_init.c:__pthread_cond_init() should look like this:

    case PTHREAD_PRIO_INHERIT:
      cond->__data.__nwaiters |= COND_PRIO_INHERIT << COND_PROTOCOL_SHIFT;
      break;

    case PTHREAD_PRIO_PROTECT:
      cond->__data.__nwaiters |= COND_PRIO_PROTECT << COND_PROTOCOL_SHIFT;
      break;


And the cond_lock() and cond_unlock() in cond-lock.h should look like this:

static inline void
cond_lock(pthread_cond_t *cond, int pshared)
{
  if (pshared == LLL_PRIVATE
      && (((cond->__data.__nwaiters & COND_PROTOCOL_MASK) >> COND_PROTOCOL_SHIFT)
         == COND_PRIO_INHERIT))
    lll_pi_lock (cond->__data.__lock, pshared);
  else
    lll_lock (cond->__data.__lock, pshared);
}

static inline void
cond_unlock(pthread_cond_t *cond, int pshared)
{
  if (pshared == LLL_PRIVATE
      && (((cond->__data.__nwaiters & COND_PROTOCOL_MASK) >> COND_PROTOCOL_SHIFT)
         == COND_PRIO_INHERIT))
    lll_pi_unlock (cond->__data.__lock, pshared);
  else
    lll_unlock (cond->__data.__lock, pshared);
}
Comment 33 Darren Hart 2014-02-20 18:40:59 UTC
I looked through this, and you appear to be correct. We didn't catch it in testing as the test case only depended on reading and writing the COND_PROTOCOL_INHERIT, which works as the shift was missing from both locations.

Oddly, I had observed what I thought was a similar error with the CLOCK_ID, but after going through it realized it was just an optimization as the CLOCK_ID is bit 0. Sadly, I didn't catch this while staring at the same code :-(

Nice catch John!

Who wants to respin the patches?
Comment 34 Gratian Crisan 2014-02-20 19:22:21 UTC
(In reply to Darren Hart from comment #33)
> I looked through this, and you appear to be correct. We didn't catch it in
> testing as the test case only depended on reading and writing the
> COND_PROTOCOL_INHERIT, which works as the shift was missing from both
> locations.
> 
> Oddly, I had observed what I thought was a similar error with the CLOCK_ID,
> but after going through it realized it was just an optimization as the
> CLOCK_ID is bit 0. Sadly, I didn't catch this while staring at the same code
> :-(
> 
> Nice catch John!
> 
> Who wants to respin the patches?

I can do it but it's going to take me a couple of days due to current workload.
Comment 35 John Ogness 2014-02-21 08:44:07 UTC
@Gratian: Maybe while you are at it, you can change the protocol enum in internaltypes.h to:

enum {
  COND_PRIO_INHERIT = 1,
  COND_PRIO_PROTECT
};

Although technically irrelevant, I found it confusing why it begins with 2. This was probably the reason for Darren's oversight in the protocol shift bug.
Comment 36 Gratian Crisan 2014-02-26 17:11:33 UTC
Created attachment 7442 [details]
[PATCH 1/3 V4] pi-condvars: add protocol support to pthread_condattr_t

Incorporated suggestions made by John Ogness and updated the version to 2.20 now that 2.19 is released.
Comment 37 Darren Hart 2014-03-31 19:31:16 UTC
I just checked libc git master and do not see these changes merged. Is there some pending review or process yet to be completed? Anything needed that I can help with?

Is the copyright assignment issue closed?
Comment 38 Gratian Crisan 2014-04-17 14:53:41 UTC
(In reply to Darren Hart from comment #37)

> Is the copyright assignment issue closed?

I apologize for the (lawyer induced) delay. I finally got confirmation that the assignment was signed on the FSF side.
I will dust off the patches in the next few days, add benchmarks for x86 (to hopefully prove that an assembly implementation is not necessary) and resubmit them.
Comment 39 Darren Hart 2014-04-17 15:15:46 UTC
On 4/17/14, 7:53, "gratian.crisan at ni dot com"
<sourceware-bugzilla@sourceware.org> wrote:

>https://sourceware.org/bugzilla/show_bug.cgi?id=11588
>
>--- Comment #38 from Gratian Crisan <gratian.crisan at ni dot com> ---
>(In reply to Darren Hart from comment #37)
>
>> Is the copyright assignment issue closed?
>
>I apologize for the (lawyer induced) delay. I finally got confirmation
>that the
>assignment was signed on the FSF side.
>I will dust off the patches in the next few days, add benchmarks for x86
>(to
>hopefully prove that an assembly implementation is not necessary) and
>resubmit
>them.


Awesome!
Comment 40 Gratian Crisan 2014-04-30 19:53:13 UTC
Sorry, this is taking me longer than I originally thought. It turns out that this commit: https://sourceware.org/git/?p=glibc.git;a=commit;h=47c5adebd2c864a098c3af66e61e1147dc3cf0b4 is unconditionally disabling PI futexes on ARM because there is a kernel configuration (CONFIG_CPU_USE_DOMAINS && CONFIG_SMP) that doesn't support them. So I need to find a solution for that first before I can finish testing on ARM.
Comment 41 Gratian Crisan 2014-07-07 20:39:44 UTC
Created attachment 7684 [details]
[PATCH 4/6] benchtests: Add benchmarks for pthread_cond_* functions
Comment 42 Gratian Crisan 2014-07-07 20:40:36 UTC
Created attachment 7685 [details]
[PATCH 5/6] x86_64: Remove assembly implementations for pthread_cond_*
Comment 43 Gratian Crisan 2014-07-07 20:41:29 UTC
Created attachment 7686 [details]
[PATCH 6/6] arm: Re-enable PI futex support for ARM kernels >= 3.14.3
Comment 44 Gratian Crisan 2014-07-07 20:43:27 UTC
Created attachment 7687 [details]
[PATCH 1/6 V5] pi-condvars: add protocol support to pthread_condattr_t
Comment 45 Gratian Crisan 2014-07-07 20:44:45 UTC
Created attachment 7688 [details]
[PATCH 2/6] pi-condvars: add tst-condpi1.c, basic API test
Comment 46 Gratian Crisan 2014-07-07 20:45:53 UTC
Created attachment 7689 [details]
[PATCH 3/6] pi-condvars: add tst-condpi2.c, priority inversion avoidance test
Comment 47 davidh 2014-10-02 21:57:26 UTC
Looks like this again missed the last glibc release (2.20). Any chance this might finally get into v2.21 (which looks to be scheduled for 01/2015)?
Comment 48 Torvald Riegel 2014-10-06 07:57:49 UTC
The problem here is that the Austin Group has clarified that POSIX requires stronger ordering guarantees wrt. waiters vs. signals than what the current implementation offers.  We can implement them, but we're not aware of an (efficient) implementation that fulfills both these clarified POSIX requirements and can do PI under all circumstances:

https://sourceware.org/ml/libc-alpha/2014-08/msg00287.html

If you have further input on these choices, please comment on libc-alpha.
Comment 49 Torvald Riegel 2016-05-17 07:52:32 UTC
I have constructed another condvar implementation that I'm testing right now.  However, the outlook for PI support is still not great.  Essentially, we need to boost the priority of waiters that have to wake up from a futex that is used to send signals, and have to confirm that they have woken up.  This seems different from what the futex PI support can do today.

Two things that could help are support for 64b futexes, restricting PI support to non-process-shared condvars, and perhaps other futex operations.
Comment 50 Darren Hart 2016-05-17 16:01:00 UTC
Torvald, I'm reading up (again :-) ) on the issues you've raised from the Austin Group and building context on the use of the sequence locks, etc. Would you share your implementation with me so I can see what kinds of changes are being considered and help building support for the PI cases?

Regarding boosting waiters for sending signals. Current futex PI boosting is based on the Linux kernel rtmutex. Can you elaborate on the what/why in your statement:

> Essentially, we need to boost the priority of waiters that have to wake up from a 
> futex that is used to send signals, and have to confirm that they have woken up.

Thanks.
Comment 51 Gratian Crisan 2016-05-17 16:23:19 UTC
Torvald, I am interested in the proposed implementation too if you don't mind sharing. PI is critical for our real-time applications. We have been carrying the patches attached to this bugzilla entry for multiple years now in order to avoid deadlocks. I would like to help if possible.
Comment 52 Darren Hart 2016-05-17 17:21:29 UTC
Torvald, can you provide a specific citation, preferably to the accessible SUS. I'm looking to see if I have access to POSIX. I'd like to read the specific requirement regarding stronger ordering of waiters vs signals.
Comment 53 Torvald Riegel 2016-05-17 18:09:11 UTC
(In reply to Darren Hart from comment #50)
> Torvald, I'm reading up (again :-) ) on the issues you've raised from the
> Austin Group and building context on the use of the sequence locks, etc.
> Would you share your implementation with me so I can see what kinds of
> changes are being considered and help building support for the PI cases?

I'm in the process of cleaning up the current (second) implementation, and will share it on libc-alpha once that is done.

> Regarding boosting waiters for sending signals. Current futex PI boosting is
> based on the Linux kernel rtmutex. Can you elaborate on the what/why in your
> statement:
> 
> > Essentially, we need to boost the priority of waiters that have to wake up from a 
> > futex that is used to send signals, and have to confirm that they have woken up.
> 

In a nutshell, the problem goes like this:
* Due to having to wake up with ordering constraints, the condition that a futex needs to cover is not an on/off thing like a mutex, but a sequence, in an abstract sense.  If you need to wake at position N, all 0..N waiters can consume.
* Futexes provide no guarantees regarding the order in which waiters enqueue and ware woken (eg, no FIFO).
* Futexes are 32b, so if one represents the ordering sequence with counting, there is some chance of a 32b wrap around and an ABA issue on the futex comparison.  That may be unlikely to hit, but I'm not really comfortable putting such an ABA issue into released code knowingly.
* My previous attempt of a condvar suffered a performance problem because one can't know whether a futex wake-up was spurious or not; if one doesn't know, one needs to conservatively add a fake futex wake-up.
* In this implementation, one can deal with the ABA issue by quiescing the condvar.  But for this to be possible with PI, one needs to boost the priority of waiters when sending a signal.  I don't know how to do that with the current futex PI ops.
* My current attempt builds two groups that waiters wait on, complete with separate futexes per group.  That avoids performance issues of the former simpler algorithm, but now we need to quiesce each group before we can reuse its memory (which we need to do because of process-shared).  So similar PI problem as before.

Specifically, the problem for PI is thus that we have waiters blocked on a futex that is not an exclusive lock.  We need to be able to send them signals and cause them to confirm that they have woken up.  Thus, their priorities must be boosted at least for as long as they need to confirm the wake-up.  If we could have something like a shared lock (ie, rwlock) with PI support, all waiters that are about to block on the signal/condvar futex could acquire a readlock and be boosted by a high-prio wrlock writer.  But I don't know whether we can build that with the current futex PI ops.

The problem is easier for the non-process-shared case because then we can just allocate additional memory and, for example, build our own userspace wait queues, in which boosting can be done with an exclusive lock.
Comment 54 Torvald Riegel 2016-05-17 18:13:49 UTC
(In reply to Darren Hart from comment #52)
> Torvald, can you provide a specific citation, preferably to the accessible
> SUS. I'm looking to see if I have access to POSIX. I'd like to read the
> specific requirement regarding stronger ordering of waiters vs signals.

http://austingroupbugs.net/view.php?id=609
(probably best to read from comment 1403 downwards).  I think the C++ specification for condvars is the cleanest one (see N4567, paragraph 30.5p3-4).  AFAIU, POSIX and ISO C have the same semantics in mind.
Comment 55 Torvald Riegel 2016-05-30 14:16:45 UTC
I've posted the new algorithm upstream:
https://sourceware.org/ml/libc-alpha/2016-05/msg00605.html
Have a look if you're interested.  I'd appreciate any reviews and comments on the PI support problem.