Bug 9691 - pthread race condition causes tst-mutex8 to fail
Status: RESOLVED INVALID
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl
Version: 2.8
Importance: P2 normal
Target Milestone: ---
Assignee: Ulrich Drepper
Reported: 2008-12-29 12:24 UTC by Bill Mason
Modified: 2014-07-02 07:35 UTC
Host: i686-pc-linux-gnu
Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu
fweimer: security-


Description Bill Mason 2008-12-29 12:24:44 UTC
tst-mutex8 usually runs fine but occasionally (~10% of the time) fails with this
error:
cl: mutex_unlocked failed

So I tried adding print statements to tst-mutex8.c, pthread_mutex_lock.c, and
pthread_mutex_unlock.c to trace them, and I'm pretty sure there is a race
condition.  It only happens for recursive locks.

check_type creates the thread and calls pthread_barrier_wait.  tf (the thread)
locks the mutex and calls pthread_barrier_wait.  check_type then does an extra
lock and unlock, then cancels the thread.  Meanwhile, tf has pushed cl as its
cleanup handler and is in pthread_cond_wait, which unlocks the mutex at the
beginning and relocks it at the end.  But the thread is canceled before the
relock happens, so cl is called and unlocks the mutex again when nusers is
already 0.  That unlock returns EPERM and tst-mutex8 fails.

The way to replicate the problem is to run tst-mutex8, but I presume plenty of
developers have done this already.  It's a race condition, which makes it very
difficult to replicate.  Hopefully it can be replicated by running it a few
hundred times in a loop.  Otherwise, I hope you can check the code to see if I'm
correct that there is a race condition.  I might be missing something, as I'm
not familiar with this code, but it looks like a race condition from the few
hours of debugging I did.

Here are the details of the system that has produced this problem:
glibc-2.8-20081222
i686-pc-linux-gnu
Options to configure: --prefix=/usr --disable-profile --enable-add-ons --enable-kernel=2.6.0 --libexecdir=/usr/lib/glibc
configparms: CFLAGS += -march=i486 -mtune=native
I'm experiencing this while trying to build Linux From Scratch 6.4:
http://www.linuxfromscratch.org/lfs/view/stable/chapter06/glibc.html
Host system: Debian etch
Kernel: 2.6.18-6-686 #1 SMP Fri Dec 12 16:48:28 UTC 2008 i686 GNU/Linux
GCC version: 4.3.2
GNU ld version: 2.18
Comment 1 Ulrich Drepper 2009-01-25 17:53:41 UTC
This is a gcc bug.  It doesn't generate correct unwind information for
cancellation.c.

https://bugzilla.redhat.com/show_bug.cgi?id=481498