This is the mail archive of the
libc-help@sourceware.org
mailing list for the glibc project.
pthread_mutex_destroy returns EBUSY, but the mutex isn't locked
- From: "Adrian Ludwin" <adrian dot ludwin at gmail dot com>
- To: libc-help at sourceware dot org
- Date: Wed, 4 Jun 2008 08:48:16 -0400
- Subject: pthread_mutex_destroy returns EBUSY, but the mutex isn't locked
- Reply-to: adrian dot ludwin at gmail dot com
Hi all, I've got a problem with a mutex that's been bugging me for a
while. On a very specific platform (an Opteron running CentOS with
kernel 2.6.9-55.ELsmp with glibc version 2.3.4-2.36) fewer than
1% of my pthread_mutex_destroy calls fail with the error "EBUSY." The
errors are not reliably repeatable on the same machine, and not
repeatable at all on another platform (a Core 2 Duo running the same
OS but with glibc version 2.3.4-2.25). I have not been able to
recreate the error in a simple testcase. In the real program, the
error affects about four mutexes in completely separate parts of the
program.
I am certain the mutex is not actually locked, since I've started
printing out each mutex's contents just before destroying it. One
example is as follows:
Mutex 0x920df0: lock=0, count=0, owner=0, nusers=2, kind=2
(The "kind=2" field means this is an error checking mutex, but the
problem occurs for normal mutexes as well; this ensures that I'm not
unlocking a mutex I don't own). As you can see, the futex ("lock") is
zero, as are both the count and the owning thread. Only nusers is non-zero;
by far the most common value I see is 1, though I've also seen 2, 5, 6
and 10. I've never seen very large or negative numbers that would
strongly suggest memory corruption, though of course I can't rule this
out. The mutexes are allocated off the heap using plain-vanilla malloc
and free, which I believe should be legal.
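For reference, the allocation pattern described above might look roughly like the following. This is only a minimal sketch, not the code from the real program: the helper names checked_mutex_new and checked_mutex_free are hypothetical, and the only calls used are standard POSIX ones.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* exposes PTHREAD_MUTEX_ERRORCHECK */
#endif
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an error-checking mutex on the heap.
   Returns NULL if allocation or initialization fails. */
static pthread_mutex_t *checked_mutex_new(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t *m = malloc(sizeof *m);

    if (m == NULL)
        return NULL;
    if (pthread_mutexattr_init(&attr) != 0) {
        free(m);
        return NULL;
    }
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK) != 0 ||
        pthread_mutex_init(m, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        free(m);
        return NULL;
    }
    pthread_mutexattr_destroy(&attr);
    return m;
}

/* Hypothetical helper: destroy the mutex and release its memory.
   Returns the result of pthread_mutex_destroy (0 or an error such as
   EBUSY); the memory is freed only when the destroy succeeds. */
static int checked_mutex_free(pthread_mutex_t *m)
{
    int rc = pthread_mutex_destroy(m);

    if (rc == 0)
        free(m);
    return rc;
}
```

With an error-checking mutex, relocking from the owning thread or unlocking without ownership is reported through the return code, so every call's result is worth checking.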
I added an assertion that nusers must be zero every time we release a
mutex (using another mutex as a wrapper to ensure that another thread
doesn't grab it suddenly). It always passed, but by the time the same
mutex was destroyed, nusers had mysteriously changed.
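The check described above can be sketched as follows. This is one possible reading of it, not the actual code: struct audited_mutex and the audited_* and mutex_dump names are hypothetical, and the __data member is a glibc/NPTL implementation detail rather than a public interface, so its layout is an assumption that may not hold on other versions or C libraries.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical wrapper: 'guard' is held for as long as 'inner' is held,
   so no other thread can take 'inner' between the unlock and the check. */
struct audited_mutex {
    pthread_mutex_t guard;
    pthread_mutex_t inner;
};

/* Print the internal fields in the same format as the dump shown earlier.
   Assumption: glibc/NPTL layout; a no-op on other C libraries. */
static void mutex_dump(const pthread_mutex_t *m)
{
#ifdef __GLIBC__
    printf("Mutex %p: lock=%d, count=%u, owner=%d, nusers=%u, kind=%d\n",
           (const void *)m, m->__data.__lock, m->__data.__count,
           m->__data.__owner, m->__data.__nusers, m->__data.__kind);
#else
    (void)m;
#endif
}

static int audited_init(struct audited_mutex *a)
{
    int rc = pthread_mutex_init(&a->guard, NULL);

    if (rc != 0)
        return rc;
    rc = pthread_mutex_init(&a->inner, NULL);
    if (rc != 0)
        pthread_mutex_destroy(&a->guard);
    return rc;
}

/* Always take the guard first, then the inner mutex. */
static int audited_lock(struct audited_mutex *a)
{
    int rc = pthread_mutex_lock(&a->guard);

    if (rc != 0)
        return rc;
    rc = pthread_mutex_lock(&a->inner);
    if (rc != 0)
        pthread_mutex_unlock(&a->guard);
    return rc;
}

/* Release the inner mutex, check nusers, then release the guard. */
static int audited_unlock(struct audited_mutex *a)
{
    int rc = pthread_mutex_unlock(&a->inner);

#ifdef __GLIBC__
    /* After a successful unlock, nobody should be counted as a user. */
    assert(rc != 0 || a->inner.__data.__nusers == 0);
#endif
    pthread_mutex_unlock(&a->guard);
    return rc;
}
```

Because the guard is always taken before the inner mutex and released after it, the lock order is consistent and the wrapper itself should not introduce a deadlock.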
If I ignore the error, the program runs to its natural conclusion
(which can take several hours), and always operates correctly. This,
again, does not rule out memory corruption, but it does seem to reduce
the likelihood of it as one might expect the corruption to affect more
than just one aspect of the program.
Barring memory corruption (valgrind and helgrind haven't found
anything), has anyone ever heard of an issue like this before? Or
should I just be looking harder for corruption?
Many thanks, Adrian