Bug 14477 - Cancelling pthread_cond_wait() hangs with PRIO_INHERIT mutex
Summary: Cancelling pthread_cond_wait() hangs with PRIO_INHERIT mutex
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl
Version: 2.14
Importance: P2 normal
Target Milestone: ---
Assignee: Siddhesh Poyarekar
Depends on:
Reported: 2012-08-16 06:47 UTC by Simon Falsig
Modified: 2014-06-25 10:49 UTC
CC List: 3 users

See Also:
Last reconfirmed:
fweimer: security-

Test case (468 bytes, application/octet-stream)
2012-08-16 06:47 UTC, Simon Falsig

Description Simon Falsig 2012-08-16 06:47:35 UTC
Created attachment 6581
Test case

I have a problem cancelling threads that are waiting in pthread_cond_wait() on a mutex with the `PTHREAD_PRIO_INHERIT` attribute set. This only happens on certain platforms, though. I've also posted this on Stack Overflow (http://stackoverflow.com/questions/11878445/cancelling-pthread-cond-wait-hangs-with-prio-inherit-mutex ) and to the libc-help mailing list, where I was told to file a bug report here.

I haven't yet had the chance to try the newest versions (I've been using 2.13 and 2.14), but if anyone could give a few hints as to whether this might be fixed in the current versions (or isn't broken in an older version), I'd be happy to hear about it!

The attached minimal example demonstrates my problem (compile with `g++ pthread_cond_wait.cpp -lpthread`).

Every time I run it, main() hangs on pthread_join(). A gdb backtrace shows the following:

    Thread 2 (Thread 0xb7d15b70 (LWP 257)):
    #0  0xb7fde430 in __kernel_vsyscall ()
    #1  0xb7fcf362 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
    #2  0xb7fcc9f9 in __condvar_w_cleanup () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:434
    #3  0x08048fbe in threadFunc (arg=0x0) at /home/pthread_cond_wait.cpp:22
    #4  0xb7fc8ca0 in start_thread (arg=0xb7d15b70) at pthread_create.c:301
    #5  0xb7de73ae in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

    Thread 1 (Thread 0xb7d166d0 (LWP 254)):
    #0  0xb7fde430 in __kernel_vsyscall ()
    #1  0xb7fc9d64 in pthread_join (threadid=3083950960, thread_return=0x0) at pthread_join.c:89
    #2  0x0804914a in main () at /home/pthread_cond_wait.cpp:41

If PTHREAD_PRIO_INHERIT isn't set on the mutex, everything works as it should, and the program exits cleanly.
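
For readers without access to the attachment, the scenario described above can be sketched roughly as follows. This is not the attached test case: the names `g_mutex`, `g_cond`, `unlock_mutex`, `waiter` and `run_cancel_scenario` are hypothetical, and the short `usleep()` delay is only an assumed way of letting the thread reach the wait before it is cancelled.

```cpp
#include <pthread.h>
#include <unistd.h>

// Hypothetical names for illustration; not taken from the attachment.
static pthread_mutex_t g_mutex;
static pthread_cond_t g_cond;

// A cancelled pthread_cond_wait() re-acquires the mutex before the cleanup
// handlers run, so the handler has to release it.
static void unlock_mutex(void *arg) {
    pthread_mutex_unlock(static_cast<pthread_mutex_t *>(arg));
}

static void *waiter(void *) {
    if (pthread_mutex_lock(&g_mutex) != 0) {
        return nullptr;  // could not take the mutex; exit without waiting
    }
    pthread_cleanup_push(unlock_mutex, &g_mutex);
    for (;;) {
        // Nobody signals the condition; this is the cancellation point.
        pthread_cond_wait(&g_cond, &g_mutex);
    }
    pthread_cleanup_pop(1);
    return nullptr;
}

// Creates the mutex with the given protocol (PTHREAD_PRIO_NONE or
// PTHREAD_PRIO_INHERIT), starts a waiter, cancels it and joins it.
// Returns 0 if the thread was cancelled and joined, non-zero on error.
int run_cancel_scenario(int protocol) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_setprotocol(&attr, protocol);
    if (rc == 0) rc = pthread_mutex_init(&g_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) return rc;

    rc = pthread_cond_init(&g_cond, nullptr);
    if (rc != 0) {
        pthread_mutex_destroy(&g_mutex);
        return rc;
    }

    pthread_t tid;
    rc = pthread_create(&tid, nullptr, waiter, nullptr);
    if (rc == 0) {
        usleep(100 * 1000);  // crude: let the waiter block in pthread_cond_wait()
        rc = pthread_cancel(tid);
        void *result = nullptr;
        int join_rc = pthread_join(tid, &result);  // the reported hang is here
        if (rc == 0) rc = join_rc;
        if (rc == 0 && result != PTHREAD_CANCELED) rc = -1;
    }

    pthread_cond_destroy(&g_cond);
    pthread_mutex_destroy(&g_mutex);
    return rc;
}
```

Calling `run_cancel_scenario(PTHREAD_PRIO_NONE)` from `main()` corresponds to the working case. With `PTHREAD_PRIO_INHERIT` on an affected glibc, the call would be expected not to return from `pthread_join()`, matching the backtrace above.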

Platforms with problems:

 - Embedded AMD Fusion board, running a [PTXDist][1] based 32-bit Linux 3.2.9-rt16 (with [RTpatch][2] 16). We are using the newest [OSELAS][3] i686 cross toolchain (2011.11.1), using gcc 4.6.2, glibc 2.14.1, binutils 2.21.1a, kernel 2.6.39.
 - The same board with the 2011.03.1 toolchain (gcc 4.5.2 / glibc 2.13 / binutils 2.18 / kernel 2.6.36).

Platforms with no problems:

 - Our own ARM board, also running a PTXDist Linux (32-bit), using the OSELAS arm-v4t cross toolchain (1.99.3) with gcc 4.3.2 / glibc 2.8 / binutils 2.18 / kernel 2.6.27.
 - My laptop (Intel Core i7), running 64-bit Ubuntu 11.04 (virtualized / kernel, gcc 4.5.2 / eglibc 2.13-0ubuntu13.1 / binutils

I have been looking around the net for solutions, and have come across a few patches that I've tried, but without any effect:

 - [Making the condition variables priority inheritance aware.][4]
 - [Handling EAGAIN from FUTEX_WAIT_REQUEUE_PI][5]

  [1]: http://www.ptxdist.org/software/ptxdist/index_en.html
  [2]: https://rt.wiki.kernel.org/index.php/Main_Page
  [3]: http://www.ptxdist.de/oselas/toolchain/index_en.html
  [4]: http://sourceware.org/bugzilla/show_bug.cgi?id=11588
  [5]: http://sourceware.org/git/?p=glibc.git;a=commit;h=c5a0802a682dba23f92d47f0

Best regards,
Simon Falsig
Comment 1 Siddhesh Poyarekar 2012-09-28 13:37:05 UTC
I've submitted a patch for review:

Comment 2 Simon Falsig 2012-09-28 14:43:10 UTC
Thanks! That sounds very interesting - I'll try to test the patch on my own system on Monday - will report back my findings.
Comment 3 Siddhesh Poyarekar 2012-10-01 18:19:23 UTC
I have committed the fix to master.  Marking this as fixed:

Comment 4 Simon Falsig 2012-10-04 09:35:15 UTC
Sorry for the delay (been hung up with other tasks at work), but I've finally gotten around to testing it on my system - and I'm happy to say that it works, patched onto my glibc-2.14.1 :)

Thanks a lot! - this just made my day;)