dead-lock in glibc

Joël Krähemann jkraehemann@gmail.com
Thu Mar 16 06:30:00 GMT 2017


Hi Carlos

Thank you for the hints. If you need additional information, please let me know.

regards,
Joël


On Thu, Mar 16, 2017 at 2:54 AM, Carlos O'Donell
<carlos@systemhalted.org> wrote:
> On Wed, Mar 15, 2017 at 4:35 PM, Joël Krähemann <jkraehemann@gmail.com> wrote:
>> * libc6 2.24-9
>
>> Maybe I was trying to do a recursive lock on a non-recursive mutex?
>> I was playing 64 beats with the notation editor of GSequencer in an infinite
>> loop. Suddenly it aborted after some playback, approximately 3 to 4 minutes.
>
> No. The asserts are intended to indicate internal consistency is violated.
>
> Recursively locking a non-recursive mutex should lead to the thread
> getting stuck forever, but not an assert.
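
The "stuck forever" behaviour can be observed without actually hanging by
giving the second lock attempt a deadline. This is only a minimal sketch:
the helper name normal_relock_with_timeout and the 100 ms deadline are
made up for illustration and are not taken from GSequencer or glibc. On
glibc I would expect the second attempt to come back with ETIMEDOUT.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: lock a PTHREAD_MUTEX_NORMAL mutex, then try to lock
   it again from the same thread with a 100 ms deadline. Returns the result
   of the second attempt. */
int normal_relock_with_timeout(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    struct timespec deadline;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);

    /* pthread_mutex_timedlock takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 100L * 1000L * 1000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* The owner blocks on its own mutex until the deadline passes. */
    rc = pthread_mutex_timedlock(&m, &deadline);

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

With a plain pthread_mutex_lock() in place of the timed variant, the second
call would simply never return, and no assertion would be involved.
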
>
>>>> gsequencer: ../nptl/pthread_mutex_lock.c:349:
>>>> __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e,
>>>> __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind !=
>>>> PTHREAD_MUTEX_RECURSIVE_NP)' failed.
>>>> Aborted
>
> We've had a failure in the futex syscall, but that should not by
> itself trigger an assert.
>
> The failure was either "no thread found" (ESRCH) or "deadlock" (EDEADLK).
>
> The assert triggers when we get "deadlock" from the kernel but the
> mutex was error-checking or recursive. Internally we don't ever expect
> to get "deadlock" from the kernel for these kinds of mutexes, so this
> indicates an algorithmic problem.
>
> It's an algorithmic problem because earlier code should have detected
> we owned the mutex in the recursive case, bumped the ownership
> counter, and returned zero.
>
> It's an algorithmic problem because earlier code should have detected
> we owned the mutex in the error checking case, and should have
> returned EDEADLK without making any futex syscalls.
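
The two expected outcomes described above can be checked from user code.
This is a rough sketch, assuming ordinary (non-PI) mutexes; the helper name
relock_result is hypothetical. It locks a mutex of the given type twice
from the same thread and returns the result of the second attempt.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: returns what the second pthread_mutex_lock() by the
   owning thread yields. Only call it with PTHREAD_MUTEX_ERRORCHECK or
   PTHREAD_MUTEX_RECURSIVE; with PTHREAD_MUTEX_NORMAL it would block. */
int relock_result(int type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, type);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);
    rc = pthread_mutex_lock(&m);    /* second lock by the owner */

    /* Unlock once per successful lock. */
    if (rc == 0)
        pthread_mutex_unlock(&m);
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

relock_result(PTHREAD_MUTEX_RECURSIVE) should give 0 (the ownership counter
is bumped) and relock_result(PTHREAD_MUTEX_ERRORCHECK) should give EDEADLK.
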
>
> So we didn't own the mutex and an attempt to acquire it determined it
> was locked by someone else (not us), and then the kernel returned
> EDEADLK, which doesn't make sense because we didn't own it to begin
> with!
>
> It points to a kernel or glibc issue with PI mutexes.
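
For reference, a priority-inheritance mutex is one whose attribute has the
protocol set to PTHREAD_PRIO_INHERIT. The following is only a sketch of how
such a mutex is set up and relocked by its owner; it is not a reproducer
for the assertion, and the helper name pi_relock_result is made up. If the
setup fails (for example ENOTSUP where PI futexes are unavailable), the
helper returns that error instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: create a PI mutex of the given type, lock it twice
   from the same thread, and return the result of the second attempt.
   A non-zero setup error is returned as-is. Only call it with
   PTHREAD_MUTEX_ERRORCHECK or PTHREAD_MUTEX_RECURSIVE. */
int pi_relock_result(int type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    pthread_mutexattr_init(&attr);
    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0)
        rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return rc;                  /* PI mutex could not be set up */

    rc = pthread_mutex_lock(&m);
    if (rc != 0) {
        pthread_mutex_destroy(&m);
        return rc;
    }

    rc = pthread_mutex_lock(&m);    /* second lock by the owner */

    /* Unlock once per successful lock. */
    if (rc == 0)
        pthread_mutex_unlock(&m);
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

In the uncontended case I would expect the same results as for non-PI
mutexes: 0 for a recursive mutex and EDEADLK for an error-checking one.
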
>
> Cheers,
> Carlos.


