Most public API operations on a pthread_t acquire the per-thread lock
(struct pthread.lock) to keep the kernel and userspace data structures
consistent.
This can cause a problem when a thread (t1) lowers its own priority and some
other thread (t2, high priority) then immediately becomes runnable as a result
of the priority shift. The scenario would look like this:
* There are three threads: T1 (low prio), T2 (mid prio), T3 (high prio)
* T1 initially runs at some priority higher than its permanent priority, to
do some startup work
* T2 is executing some CPU-bound job that is always runnable
* T1 finishes initialization, sets itself to its lower (permanent) priority.
This requires locking its own per-thread futex ("lock" in struct pthread).
The syscall that alters the scheduling parameters immediately results in T2
being put on the CPU, so the lock has not yet been dropped.
* T3 eventually needs to do some adjustment of T1's scheduling options. So it
tries to grab T1's per-thread lock, but can't since T1 still holds it because
its scheduling syscall hasn't returned to userspace yet.
* Priority inversion. T2 continues to run unchallenged.
Can the pthread.lock be treated as a PI futex instead of a standard futex, in
order to get priority inheritance and work around this inversion?
I'll attach an example program shortly.
Created attachment 2048 [details]
Example to illustrate priority inversion in NPTL pthread internals
Example test case (priority-inversion.c) confirmed. A priority inversion occurs,
causing the high-priority third thread to wait on the low-priority first thread.
I have a few points worth noting:
(1) I was unable to find anywhere in the POSIX specification that states the
per-thread mutex must not cause a priority inversion.
(2) Changing the pthread implementation so that every thread has a
priority-inheritance mutex, instead of a standard mutex, would cause some
performance loss due to the extra overhead associated with using
priority-inheritance futexes.
(3) The priority inversion situation you have described does not cause any of
the threads to hold onto a resource indefinitely, thereby preventing some other
thread from ever making forward progress. All threads eventually make forward
progress; this is therefore more of a performance issue than a correctness
issue, so I am marking this bug as an enhancement.
BZ flagged for Ulrich's attention: