In some circumstances, a thread blocked in pthread_mutex_timedlock can be woken by a call to pthread_mutex_unlock, but its timeout expires before it is scheduled. If the futex value is 1 when the woken thread resumes, it returns the timeout status and leaves the futex in state "1". If another thread is waiting on the same mutex, it will never be woken. This bug was first reported in 10/2003 on comp.programming.threads: http://groups.google.com/groups?selm=x74qxsrde4.fsf%40bolo.xenadyne.com I've checked, and the faulty behavior is still present in the current nptl code.
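To make the interleaving above easier to follow, here is a minimal single-threaded model of it. This is only a sketch and not the nptl code: the struct, the `model_*` helpers and the `sleepers` counter are hypothetical names invented for illustration, and the model assumes the usual three-state futex word (0 = unlocked, 1 = locked with no waiters, 2 = locked with possible waiters).

```c
/* Simplified model of the futex word (assumption, not nptl source):
   0 = unlocked, 1 = locked with no waiters, 2 = locked, maybe waiters. */
struct model_mutex {
    int futex;    /* the user-space futex word */
    int sleepers; /* threads blocked in the kernel on it (model only) */
};

/* Fast-path lock: succeeds only when the word is 0. */
static int model_trylock(struct model_mutex *m)
{
    if (m->futex != 0)
        return 0;
    m->futex = 1;
    return 1;
}

/* A contended locker marks the word 2 and goes to sleep. */
static void model_block(struct model_mutex *m)
{
    m->futex = 2;
    m->sleepers++;
}

/* Unlock: reset the word and wake one sleeper only if the old value
   was 2. Returns 1 if a sleeper was woken, 0 otherwise. */
static int model_unlock(struct model_mutex *m)
{
    int old = m->futex;
    m->futex = 0;
    if (old == 2 && m->sleepers > 0) {
        m->sleepers--;
        return 1;
    }
    return 0;
}

/* Replays the reported interleaving with threads A, B and C and
   returns how many threads are still asleep on an unlocked mutex. */
static int lost_wakeup_scenario(void)
{
    struct model_mutex m = { 0, 0 };

    model_trylock(&m);   /* A takes the lock: 0 -> 1 */
    model_block(&m);     /* B calls timedlock and sleeps: word = 2 */
    model_block(&m);     /* C calls lock and sleeps: word = 2 */

    model_unlock(&m);    /* A unlocks: word = 0, B is woken */
    model_trylock(&m);   /* A relocks before B is scheduled: 0 -> 1 */

    /* B finally runs. Its timeout has expired and the word is 1, so it
       returns the timeout status without touching the word. */

    model_unlock(&m);    /* A unlocks: old value is 1, so no wake */

    return m.sleepers;   /* C is still asleep */
}
```

In this model the last unlock sees the value 1 and skips the wake, so one sleeper (C) is left behind even though the mutex is free.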
Created attachment 211 [details] Sample test to show the bug and more detailed explanation This archive contains sample code that demonstrates the error (it reproduces on a single-CPU machine) and a file describing the problem in more detail.
Created attachment 212 [details] patch attempt to correct the bug for i486 architecture Here is my attempt at correcting the problem. I'm not sure it is free of undesirable side effects. The basic idea is that if the futex was in state "1", we change it to "2". The only problem I can see is that we will sometimes enter the kernel on a mutex_unlock where it would not be necessary.
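The idea behind the patch can be sketched in portable C. This is not the attached patch, which targets the i486 code; it only illustrates the 1 -> 2 transition on the timeout path, using C11 atomics. The function name `mark_waiters_on_timeout` is hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical helper (not glibc code): called by a timed locker just
   before it returns the timeout status. If the word says "locked, no
   waiters" (1), other threads may still be asleep on it, so it is
   promoted to "locked, maybe waiters" (2). Values 0 and 2 are left
   untouched. */
static void mark_waiters_on_timeout(atomic_int *futex)
{
    int expected = 1;
    /* Atomically replace 1 with 2; does nothing for any other value. */
    atomic_compare_exchange_strong(futex, &expected, 2);
}
```

With the word forced to 2, the next unlock takes the slow path and issues a wake, which is the extra kernel entry mentioned above when no thread is actually waiting.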
I've checked in some changes which should fix the problem. The test program is bogus, though. At least the assert is, and it is the only reason I ever saw the program abort. Somebody might want to tell the original poster about the change and ask for retesting.