Summary: | Pthread hang where there are still waiters when mutex is in "unlocked" state. | ||
---|---|---|---|
Product: | glibc | Reporter: | Ryan S. Arnold <rsa> |
Component: | nptl | Assignee: | Ulrich Drepper <drepper.fsp> |
Status: | RESOLVED FIXED | ||
Severity: | critical | CC: | glibc-bugs, rsa |
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Host: | powerpc-linux | Target: | powerpc-linux |
Build: | powerpc-linux | Last reconfirmed: | |
Attachments: |
pthread hang
patch to avoid the hang by awakening waiters before returning TIMEOUT.
Simplified testcase with cleaner termination path. |
Description
Ryan S. Arnold
2007-11-01 17:03:11 UTC
Created attachment 2070 [details]
pthread hang
Associated test-case to reproduce on PowerPC hardware.
Created attachment 2071 [details]
patch to avoid the hang by awakening waiters before returning TIMEOUT.
The following patch ensures that waiters will be awoken before returning the
timeout. This patch avoids an unnecessary system call in the usual timeout
case.
A simpler solution if we don't care about the system call cost would be to
unconditionally invoke lll_futex_wake before returning.
I've verified that this patch does indeed prevent the hang scenario described.
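For readers without the attachments, here is a rough single-threaded model of the kind of lost wakeup at issue and of the "simpler solution" mentioned above (always wake before returning the timeout). It is only a sketch and not glibc code: toy_mutex, toy_futex_wake, toy_unlock, toy_timed_waiter_gives_up and toy_scenario_hangs are hypothetical names, the 0/1/2 lock-word encoding is an assumption made for illustration, and the interleaving (the single wakeup lands on the waiter that is about to time out) is assumed rather than taken from the test-case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only; none of this is glibc code. */
struct toy_mutex {
    int word;      /* assumed encoding: 0 unlocked, 1 locked, 2 locked with waiters */
    int sleepers;  /* threads blocked in the simulated futex wait */
};

/* Simulated futex wake: releases at most one sleeper, returns how many woke. */
static int toy_futex_wake(struct toy_mutex *m)
{
    assert(m->sleepers >= 0);
    if (m->sleepers == 0)
        return 0;
    m->sleepers--;
    return 1;
}

/* The owner releases the lock and wakes one sleeper if waiters were advertised. */
static int toy_unlock(struct toy_mutex *m)
{
    int old = m->word;
    m->word = 0;
    return old == 2 ? toy_futex_wake(m) : 0;
}

/* A timed waiter that was just woken finds its deadline has passed.
   With wake_before_return set, it hands the wakeup to another sleeper first. */
static int toy_timed_waiter_gives_up(struct toy_mutex *m, bool wake_before_return)
{
    if (wake_before_return)
        toy_futex_wake(m);
    return ETIMEDOUT;
}

/* Returns true if the scenario ends with the mutex unlocked while a thread
   is still asleep, i.e. the state named in the bug summary. */
bool toy_scenario_hangs(bool wake_before_return)
{
    /* One timed and one untimed waiter are asleep; the lock is held. */
    struct toy_mutex m = { .word = 2, .sleepers = 2 };

    /* Assumption: the single wakeup from unlock reaches the timed waiter. */
    toy_unlock(&m);

    int rc = toy_timed_waiter_gives_up(&m, wake_before_return);
    assert(rc == ETIMEDOUT);

    return m.word == 0 && m.sleepers > 0;
}
```

In this model toy_scenario_hangs(false) leaves one sleeper behind with the lock word at 0, while toy_scenario_hangs(true) does not. The extra wake is what costs a system call on every timeout, which is the cost the attached patch tries to avoid.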
The analysis is correct but the patch is less than optimal. I've checked in something different and also fixed x86 and x86-64.
Thanks Ulrich, for future reference:
"(__lll_timedlock_wait): If we time out, try one last time to lock the futex to avoid losing a wakeup signal."
lowlevellock.c
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/nptl/sysdeps/unix/sysv/linux/lowlevellock.c.diff?cvsroot=glibc&r1=1.17&r2=1.18
i386/lowlevellock.S
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/nptl/sysdeps/unix/sysv/linux/i386/i486/lowlevellock.S.diff?cvsroot=glibc&r1=1.19&r2=1.20
x86_64/lowlevellock.S
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S.diff?cvsroot=glibc&r1=1.21&r2=1.22
I tested the patch on a Power5 machine and I'm still encountering the hang. Others indicate that they're getting the hang as well on different classes of PowerPC hardware. Is there any information you'd like me to gather to determine why it's still happening?
I've made some more changes (and some optimizations). The current code should work.
Created attachment 2112 [details]
Simplified testcase with cleaner termination path.
The fix worked perfectly on POWER6. On POWER5 I kept running into a
segmentation fault in the exit() path of the test-case.
The test-case itself is problematic: the exit() call in the child thread's
thread_exit() function triggers process termination, which ends up sending two
threads down the glibc exit() path at the same time. Both walk the linked list
of exit handlers, and one of them ends up dereferencing a pointer which has
already been zeroed.
I've modified the test case to demonstrate a more appropriate exit strategy
(which also ends up simplifying the testcase).
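The attachment is not reproduced here, so the following is only a generic sketch of one exit strategy that keeps a second thread out of exit(): workers return from their start routine and the main thread joins them before the process terminates. The names toy_worker, toy_run_and_join and TOY_NTHREADS are hypothetical and are not taken from the test-case.

```c
#include <pthread.h>
#include <stddef.h>

#define TOY_NTHREADS 4

/* Worker: record completion and return, instead of calling exit(). */
static void *toy_worker(void *arg)
{
    int *done = arg;
    *done = 1;
    return NULL;
}

/* Starts the workers and joins them all.
   Returns the number that finished, or -1 if a thread could not be created. */
int toy_run_and_join(void)
{
    pthread_t tids[TOY_NTHREADS];
    int done[TOY_NTHREADS] = { 0 };
    int started = 0;

    for (int i = 0; i < TOY_NTHREADS; i++) {
        if (pthread_create(&tids[i], NULL, toy_worker, &done[i]) != 0)
            break;
        started++;
    }

    /* Join every thread that was started, so only the caller is left
       running by the time it returns from main() or calls exit(). */
    int finished = 0;
    for (int i = 0; i < started; i++) {
        if (pthread_join(tids[i], NULL) == 0)
            finished += done[i];
    }

    return started == TOY_NTHREADS ? finished : -1;
}
```

With this shape only one thread ever runs the exit handlers, so the handler list is not walked concurrently.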
I think this bug is resolved.
Thanks for the fix Ulrich.