The PRIO_INHERIT cases of pthread_mutex_lock/unlock/trylock/timedlock call futex() directly with FUTEX_(UN)LOCK_PI, regardless of whether the mutex is process-shared. The other cases call futex() indirectly through the __lll_*() interfaces, which set FUTEX_PRIVATE_FLAG as appropriate. I observed this to cause contention on mm->mmap_sem, leading to large, random latencies on otherwise quick lock operations (apparently because brk() was holding mmap_sem).

I patched nptl to set the private flag in the PRIO_INHERIT cases, based on the pshared-ness of the mutex, and tested the change against the regression tests and my application. The tests pass, the application works, and the performance problem is solved.

However, I am working from a fairly superficial understanding of this code, and I have not received replies to my inquiries to experts in this code. I will try to attach my patch after I submit this report. I think it illustrates the desired behavior, but it may not be exactly what the glibc maintainers would want.
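To make the requested behavior concrete, here is a minimal sketch of the op selection only. It is not the attached patch and not glibc code: pi_futex_op() is a hypothetical helper name, and the sketch assumes a Linux system whose <linux/futex.h> provides FUTEX_PRIVATE_FLAG. The idea is simply to OR the private flag into the PI futex operation when the mutex is not process-shared.

```c
#include <assert.h>
#include <linux/futex.h>  /* FUTEX_LOCK_PI, FUTEX_UNLOCK_PI, FUTEX_PRIVATE_FLAG */
#include <pthread.h>      /* PTHREAD_PROCESS_PRIVATE, PTHREAD_PROCESS_SHARED */

/* Hypothetical helper (not part of glibc): pick the futex operation for a
 * PRIO_INHERIT mutex from the base PI operation and the mutex's pshared
 * attribute. Process-private mutexes get FUTEX_PRIVATE_FLAG; process-shared
 * mutexes keep the plain operation, which is what the PI paths always
 * passed before the change described in this report. */
int pi_futex_op(int base_op, int pshared)
{
    assert(base_op == FUTEX_LOCK_PI || base_op == FUTEX_UNLOCK_PI);
    assert(pshared == PTHREAD_PROCESS_PRIVATE ||
           pshared == PTHREAD_PROCESS_SHARED);

    if (pshared == PTHREAD_PROCESS_PRIVATE)
        return base_op | FUTEX_PRIVATE_FLAG;  /* e.g. FUTEX_LOCK_PI_PRIVATE */

    return base_op;                           /* shared: no private flag */
}
```

The returned value would be passed as the operation argument of the futex system call in place of the bare FUTEX_LOCK_PI or FUTEX_UNLOCK_PI. How the pshared-ness is actually recovered from the mutex inside nptl is not shown here and may differ from this sketch.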
Created attachment 2988
Add FUTEX_PRIVATE_FLAG support for PRIO_INHERIT mutexes, as described in the original bug report
I've checked in a slightly different, more efficient patch.