dead-lock in glibc

Fri Mar 31 21:07:00 GMT 2017

Hi

The mutex was being handled incorrectly here: unlock() was called, and then unlock() was called again.
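
A double unlock of this kind can be made visible with an error-checking mutex. The following is only a minimal sketch, not the GSequencer code: the function name second_unlock_result is hypothetical, and it assumes a plain (non-PI) mutex of type PTHREAD_MUTEX_ERRORCHECK, for which the stray second unlock is reported as EPERM instead of silently corrupting the mutex state.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: locks an error-checking mutex once, unlocks it
 * twice, and returns the result of the second (stray) unlock. */
int second_unlock_result(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    /* Error-checking type: misuse is reported as an error code. */
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&mutex, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&mutex);
    assert(rc == 0);

    rc = pthread_mutex_unlock(&mutex);          /* matching unlock */
    assert(rc == 0);
    int second = pthread_mutex_unlock(&mutex);  /* stray second unlock */

    pthread_mutex_destroy(&mutex);
    return second;
}
```

Calling second_unlock_result() from a small main() built with -pthread should give EPERM. With a default mutex the same sequence is undefined behaviour, so nothing is guaranteed to be reported.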

Bests,
Joël

On Fri, Mar 31, 2017 at 10:35 PM, Joël Krähemann <jkraehemann@gmail.com> wrote:
> Hi
>
> I just ran the test again; this time it hung at a different point.
>
> Bests,
> Joël
>
>
> On Thu, Mar 16, 2017 at 7:30 AM, Joël Krähemann <jkraehemann@gmail.com> wrote:
>> Hi Carlos
>>
>> Thank you for the hints. If you need additional information please let me know.
>>
>> regards,
>> Joël
>>
>>
>> On Thu, Mar 16, 2017 at 2:54 AM, Carlos O'Donell
>> <carlos@systemhalted.org> wrote:
>>> On Wed, Mar 15, 2017 at 4:35 PM, Joël Krähemann <jkraehemann@gmail.com> wrote:
>>>> * libc6 2.24-9
>>>
>>>> Could it be that I was trying to take a recursive lock on a non-recursive mutex?
>>>> I was playing 64 beats with the notation editor of GSequencer in an infinite
>>>> loop. It suddenly aborted after approximately 3 to 4 minutes of playback.
>>>
>>> No. The asserts are intended to indicate internal consistency is violated.
>>>
>>> Recursively locking a non-recursive mutex should lead to the thread
>>> getting stuck forever, but not an assert.
>>>
>>>>>> gsequencer: ../nptl/pthread_mutex_lock.c:349:
>>>>>> __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e,
>>>>>> __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind !=
>>>>>> PTHREAD_MUTEX_RECURSIVE_NP)' failed.
>>>>>> Aborted
>>>
>>> We've had a failure in the futex syscall, but that should not by
>>> itself trigger an assert.
>>>
>>> The failure was either "no thread found" or "deadlock".
>>>
>>> The assert triggers when we get "deadlock" from the kernel but the
>>> mutex was error-checking or recursive. Internally we don't ever expect
>>> to get "deadlock" from the kernel for these kinds of mutexes, and
>>> getting it indicates an algorithmic problem.
>>>
>>> It's an algorithmic problem because earlier code should have detected
>>> we owned the mutex in the recursive case, bumped the ownership
>>> counter, and returned zero.
>>>
>>> It's an algorithmic problem because earlier code should have detected
>>> we owned the mutex in the error checking case, and should have
>>> returned EDEADLK without making any futex syscalls.
>>>
>>> So we didn't own the mutex and an attempt to acquire it determined it
>>> was locked by someone else (not us), and then the kernel returned
>>> EDEADLK, which doesn't make sense because we didn't own it to begin
>>> with!
>>>
>>> It points to a kernel or glibc issue with PI mutexes.
>>>
>>> Cheers,
>>> Carlos.
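
The expected behaviour Carlos describes for the recursive and error-checking cases can be checked from user space. This is a rough sketch under stated assumptions: relock_result is a hypothetical helper, and it uses ordinary (non-PI) mutexes, so it only illustrates what the caller should see, not the glibc-internal PI code path that hit the assert.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: locks a fresh mutex of the given type twice from
 * the same thread and returns the result of the second lock call.
 * Do not pass PTHREAD_MUTEX_NORMAL: the second lock would never return. */
int relock_result(int type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, type);
    assert(rc == 0);
    rc = pthread_mutex_init(&mutex, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&mutex);
    assert(rc == 0);

    /* Second lock by the thread that already owns the mutex. */
    int second = pthread_mutex_lock(&mutex);

    /* A successful relock (recursive case) needs its own unlock. */
    if (second == 0)
        pthread_mutex_unlock(&mutex);
    pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return second;
}
```

relock_result(PTHREAD_MUTEX_RECURSIVE) should return 0, because the ownership counter is bumped, and relock_result(PTHREAD_MUTEX_ERRORCHECK) should return EDEADLK. Build with -pthread.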