Bug 417 - pthread_mutex_timedlock can sometimes consume a futex_wake signal and return a timeout, leading to a hang.
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Ulrich Drepper
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-09-30 09:47 UTC by Sebastien Decugis
Modified: 2004-10-01 10:34 UTC (History)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Sample test to show the bug and more detailed explanation (1.79 KB, application/x-tar)
2004-09-30 09:48 UTC, Sebastien Decugis
patch attempt to correct the bug for i486 architecture (297 bytes, patch)
2004-09-30 09:51 UTC, Sebastien Decugis

Description Sebastien Decugis 2004-09-30 09:47:11 UTC
In some circumstances, a thread blocked in pthread_mutex_timedlock can be woken
by a call to pthread_mutex_unlock, but its timeout expires before it is
scheduled. If the futex value is 1 when the woken thread resumes, it returns the
timeout status and leaves the futex in state "1". If another thread is waiting
for the same mutex, it will never be woken.
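
The interleaving described above can be replayed in a small single-threaded
model. This is only a sketch under stated assumptions, not the nptl code: it
assumes the usual three-state futex convention (0 = unlocked, 1 = locked with
no waiters recorded, 2 = locked with possible waiters), and it assumes that the
"1" seen by the woken thread was written by a third thread taking the
uncontended fast path. The names mutex_model, model_trylock, model_unlock and
replay_lost_wakeup are hypothetical.

```c
#include <stdbool.h>

/* Illustrative model only; this is NOT glibc's implementation.
 * Assumed futex word states: 0 = unlocked, 1 = locked (no waiters recorded),
 * 2 = locked (waiters may be sleeping). */
struct mutex_model {
    int futex;     /* the mutex word */
    int sleeping;  /* threads still blocked in futex_wait */
};

/* Uncontended fast path: succeeds only when the word is 0. */
static bool model_trylock(struct mutex_model *m)
{
    if (m->futex != 0)
        return false;
    m->futex = 1;
    return true;
}

/* Unlock: clear the word and wake one sleeper only if the word was 2. */
static void model_unlock(struct mutex_model *m)
{
    int old = m->futex;
    m->futex = 0;
    if (old == 2 && m->sleeping > 0)
        m->sleeping--;  /* stands in for futex_wake(..., 1) */
}

/* Replays the reported interleaving and returns how many threads are
 * still asleep at the end, although the mutex is free. */
static int replay_lost_wakeup(void)
{
    /* Owner O holds the lock; A (timedlock) and B (plain lock) sleep. */
    struct mutex_model m = { .futex = 2, .sleeping = 2 };

    model_unlock(&m);         /* O unlocks: 2 -> 0, the wake goes to A */
    (void)model_trylock(&m);  /* C takes the fast path before A runs: 0 -> 1 */

    /* A finally runs. Its timeout has expired and the word is 1, so it
     * returns ETIMEDOUT without touching the word: the wake is consumed. */

    model_unlock(&m);         /* C unlocks: 1 -> 0, so nobody is woken */
    return m.sleeping;        /* B is still blocked */
}
```

In this model replay_lost_wakeup() leaves one sleeper (B) behind with the futex
word at 0, which corresponds to the hang reported here.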

This bug was first reported in October 2003 on comp.programming.threads:
http://groups.google.com/groups?selm=x74qxsrde4.fsf%40bolo.xenadyne.com

I've checked: the faulty behavior is still present in the current nptl code.
Comment 1 Sebastien Decugis 2004-09-30 09:48:53 UTC
Created attachment 211
Sample test to show the bug and more detailed explanation

This archive contains sample code that demonstrates the error (it reproduces on
a single-CPU machine) and a file describing the problem in more detail.
Comment 2 Sebastien Decugis 2004-09-30 09:51:29 UTC
Created attachment 212
patch attempt to correct the bug for i486 architecture

Here is an attempt I made to correct the problem. I'm not sure whether it has
undesirable side effects. The basic idea is that if the futex was in state "1"
when the timeout is detected, we change it to "2". The only problem I can see is
that we may enter the kernel on a mutex_unlock where it would not be necessary.
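
The idea and its cost can be sketched in a small model. This is not the
attached patch, which is i486-specific and is not reproduced here; it only
mirrors the description in this comment. The names patched_model,
timeout_path_patched and unlock_enters_kernel are hypothetical, and the futex
states are assumed to be 0 = unlocked, 1 = locked with no waiters recorded,
2 = locked with possible waiters.

```c
#include <stdbool.h>

/* Illustrative model only; this is NOT the attached patch or glibc code. */
struct patched_model {
    int futex;     /* 0 = unlocked, 1 = locked, 2 = locked, maybe waiters */
    int sleeping;  /* threads still blocked in futex_wait */
};

/* Timeout path as described in the comment: before returning the timeout
 * status, upgrade "1" to "2" so that the next unlock issues a wake. */
static void timeout_path_patched(struct patched_model *m)
{
    if (m->futex == 1)
        m->futex = 2;
}

/* Unlock: returns true when it had to enter the kernel (word was 2). */
static bool unlock_enters_kernel(struct patched_model *m)
{
    int old = m->futex;
    m->futex = 0;
    if (old != 2)
        return false;
    if (m->sleeping > 0)
        m->sleeping--;  /* stands in for futex_wake(..., 1) */
    return true;
}
```

With one other thread asleep, the upgrade makes the next unlock wake it. With
no thread asleep, the same upgrade still makes the next unlock report a kernel
entry, which is the unnecessary cost mentioned above.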
Comment 3 Ulrich Drepper 2004-10-01 10:34:37 UTC
I've checked in some changes which should fix the problem.  The test program is
bogus, though.  At least the assert is, and it is the only reason I ever saw the
program abort.

Somebody might want to tell the original poster about the change and ask for
retesting.