dead-lock in glibc

Carlos O'Donell carlos@systemhalted.org
Thu Mar 16 01:54:00 GMT 2017


On Wed, Mar 15, 2017 at 4:35 PM, Joël Krähemann <jkraehemann@gmail.com> wrote:
> * libc6 2.24-9

> Might be I was trying to do a recursive lock on a non-recursive mutex?
> I was playing 64 beats with the notation editor of GSequencer in an infinite
> loop. Suddenly it aborted after some playback, approximately 3 to 4 minutes.

No. The asserts are intended to indicate that internal consistency has
been violated.

Recursively locking a non-recursive mutex should lead to the thread
getting stuck forever, but not an assert.
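
As a rough illustration of that behaviour (a sketch, not code from glibc
or GSequencer), the helper below relocks a PTHREAD_MUTEX_NORMAL mutex
from the owning thread, but uses pthread_mutex_timedlock so that the
"stuck forever" case shows up as a timeout instead of a hang. The
function name and the 100 ms deadline are assumptions made for the
example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: lock a PTHREAD_MUTEX_NORMAL mutex, then try to lock
   it again from the same thread with a 100 ms deadline.  Returns the result
   of the second lock attempt, or a setup error. */
int relock_normal_mutex_with_timeout(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    struct timespec deadline;
    int rc;

    if ((rc = pthread_mutexattr_init(&attr)) != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL);
    if (rc == 0)
        rc = pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutex_lock(&m);            /* first acquisition */
    if (rc != 0) {
        pthread_mutex_destroy(&m);
        return rc;
    }

    /* pthread_mutex_timedlock takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 100L * 1000L * 1000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    assert(deadline.tv_nsec >= 0 && deadline.tv_nsec < 1000000000L);

    /* Second acquisition by the owner: a plain lock would never return. */
    rc = pthread_mutex_timedlock(&m, &deadline);
    if (rc == 0)
        pthread_mutex_unlock(&m);           /* unexpected success: undo it */

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

With a normal mutex the second attempt should come back as ETIMEDOUT;
no assert is involved at any point.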

>>> gsequencer: ../nptl/pthread_mutex_lock.c:349:
>>> __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e,
>>> __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind !=
>>> PTHREAD_MUTEX_RECURSIVE_NP)' failed.
>>> Aborted

We've had a failure in the futex syscall, but that should not by
itself trigger an assert.

The failure was either "no thread found" (ESRCH) or "deadlock" (EDEADLK).

The assert triggers when we get "deadlock" from the kernel but the
mutex was error-checking or recursive. Internally we don't ever expect
to get "deadlock" from the kernel for these kinds of mutexes, so
getting it indicates an algorithmic problem.
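
The condition in the quoted assertion can be modelled as a small
predicate. This is only a sketch of the boolean logic: the function
names are hypothetical, `kind` is assumed to be the plain mutex type,
and the POSIX constants are used in place of the `_NP` spellings from
the assertion message.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the asserted condition: it fails only when the
   kernel reported EDEADLK and the mutex is error-checking or recursive. */
bool lock_full_assertion_holds(int futex_errno, int kind)
{
    return futex_errno != EDEADLK
           || (kind != PTHREAD_MUTEX_ERRORCHECK
               && kind != PTHREAD_MUTEX_RECURSIVE);
}

/* Hypothetical wrapper that aborts, as in the report, when the condition
   is violated. */
void check_lock_full_result(int futex_errno, int kind)
{
    assert(lock_full_assertion_holds(futex_errno, kind));
}
```

Under this model, EDEADLK with a normal mutex, or ESRCH with any kind,
passes the check; EDEADLK with an error-checking or recursive mutex is
the one combination that aborts.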

It's an algorithmic problem because earlier code should have detected
we owned the mutex in the recursive case, bumped the ownership
counter, and returned zero.

It's an algorithmic problem because earlier code should have detected
we owned the mutex in the error-checking case, and should have
returned EDEADLK without making any futex syscalls.
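
The two expected outcomes can be observed from user code. The sketch
below is an illustration written for this message, not glibc code, and
the helper name is an assumption: it locks a mutex of the given type
twice from one thread and returns the result of the second lock.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: lock a mutex of the given type twice from the same
   thread and return the result of the second pthread_mutex_lock call. */
int relock_same_thread(int type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    /* Only these two types detect self-ownership; a normal mutex would
       block forever on the second lock. */
    assert(type == PTHREAD_MUTEX_RECURSIVE
           || type == PTHREAD_MUTEX_ERRORCHECK);

    if ((rc = pthread_mutexattr_init(&attr)) != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0)
        rc = pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutex_lock(&m);        /* first acquisition */
    if (rc != 0) {
        pthread_mutex_destroy(&m);
        return rc;
    }

    rc = pthread_mutex_lock(&m);        /* second lock by the owner */
    if (rc == 0)
        pthread_mutex_unlock(&m);       /* undo the recursive acquisition */

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

The recursive type should return 0 and the error-checking type should
return EDEADLK, in both cases without the thread ever waiting.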

So we didn't own the mutex and an attempt to acquire it determined it
was locked by someone else (not us), and then the kernel returned
EDEADLK, which doesn't make sense because we didn't own it to begin
with!

This points to a kernel or glibc issue with PI (priority-inheritance)
mutexes.
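
For reference, a PI mutex is requested through the protocol attribute.
The sketch below is a hypothetical helper, not a reproducer for the
abort: it creates a PI mutex of the given type and relocks it from the
owning thread, which is the case the earlier ownership checks are
expected to handle. It assumes the platform supports
PTHREAD_PRIO_INHERIT; where it does not, the setup error is returned.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: create a priority-inheritance mutex of the given
   type, lock it twice from the same thread, and return the result of the
   second lock (or the setup error, e.g. ENOTSUP without PI support). */
int relock_pi_mutex(int type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    assert(type == PTHREAD_MUTEX_RECURSIVE
           || type == PTHREAD_MUTEX_ERRORCHECK);

    if ((rc = pthread_mutexattr_init(&attr)) != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0)
        rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutex_lock(&m);        /* first acquisition */
    if (rc != 0) {
        pthread_mutex_destroy(&m);
        return rc;
    }

    rc = pthread_mutex_lock(&m);        /* second lock by the owner */
    if (rc == 0)
        pthread_mutex_unlock(&m);       /* undo the recursive acquisition */

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

In the healthy case this gives the same answers as for non-PI mutexes:
0 for recursive and EDEADLK for error-checking.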

Cheers,
Carlos.


