This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.

Re: Weird behavior observed with NPTL semaphores

From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
To: libc-help at sourceware dot org
Date: Wed, 12 Nov 2014 22:43:30 -0200
Subject: Re: Weird behavior observed with NPTL semaphores
Authentication-results: sourceware.org; auth=none
References: <86064E213FDE854885686AF0C4584029A76CB4A1A8 at ONWVEXCHMB02 dot ciena dot com>

Hi

On 30-10-2014 18:37, Tetreault, Francois wrote:
> Hello,
>
> We have questions about the glibc Native POSIX Thread Library (NPTL).
>
> We have an application which has a few threads, where mutexes are used to arbitrate access to data.
> The Mutex object content is as shown below.
>     mMutex = {
>       __data = {
>         __lock = -2147473878, 
>         __count = 0, 
>         __owner = 0, 
>         __kind = 33, 
>         __nusers = 0, 
>         {
>           __spins = 0, 
>           __list = {
>             __next = 0x0
>           }
>         }
>       }, 
>       __size = "\200\000&*", '\000' <repeats 11 times>, "!\000\000\000\000\000\000\000", 
>       __align = -2147473878
>     }
>
> Where 33 translates to:
> #define PTHREAD_MUTEX_TYPE(m) ((m)->__data.__kind & 127)
>
> PTHREAD_MUTEX_PRIO_INHERIT_NP = 32
> PTHREAD_MUTEX_RECURSIVE_NP = 1
> PTHREAD_MUTEX_PI_RECURSIVE_NP = PTHREAD_MUTEX_PRIO_INHERIT_NP | PTHREAD_MUTEX_RECURSIVE_NP
>
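
To make the decoding of 33 explicit, here is a minimal sketch in C. The macro and function names are hypothetical; the values 127, 32 and 1 are the ones quoted above, and the mask of 3 for the basic mutex type is my assumption about glibc's internal layout.

```c
#include <stdbool.h>

/* Hypothetical names; the values are the glibc-internal ones quoted
   in this thread. */
#define KIND_TYPE_MASK    127  /* mask used by PTHREAD_MUTEX_TYPE() */
#define KIND_BASIC_MASK   3    /* assumption: low two bits hold the basic type */
#define KIND_RECURSIVE    1    /* recursive mutex */
#define KIND_PRIO_INHERIT 32   /* priority-inheritance protocol bit */

/* Strip the bits above the type mask, as PTHREAD_MUTEX_TYPE() does. */
static int kind_type(int kind)
{
    return kind & KIND_TYPE_MASK;
}

/* True when the priority-inheritance bit is set. */
static bool kind_is_pi(int kind)
{
    return (kind_type(kind) & KIND_PRIO_INHERIT) != 0;
}

/* True when the basic type is "recursive". */
static bool kind_is_recursive(int kind)
{
    return (kind_type(kind) & KIND_BASIC_MASK) == KIND_RECURSIVE;
}
```

With __kind = 33, both kind_is_pi() and kind_is_recursive() return true, which agrees with your reading of the mutex as PI and recursive.
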
> A problem occurs, only once in a blue moon, where the code fails to release the semaphore. It complains that the semaphore is not owned by any thread when it tries to give it away.
> We have added our own instrumentation, to hopefully understand what is going on. See our trace below. 
> Caution: our tracing is not perfect because it is not reentrant; we could easily be preempted while capturing the data.
> Also note that, in our trace:
> . "pre" is the value of the fields prior to the mutex operation, and "post" is afterwards.
> . MUTEX_GIVE is a call to pthread_mutex_unlock(), and
> . MUTEX_TAKE is a call to pthread_mutex_lock().
>
> { [trace 1]
>       calling_task = 3659, 
>       action = MUTEX_GIVE, 
>       pre_count = 1, 
>       pre_owner = 3659, 
>       post_count = 0, 
>       post_owner = 0
>     },  { [trace 2]
>       calling_task = 4690, 
>       action = MUTEX_TAKE, 
>       pre_count = 0, 
>       pre_owner = 0, 
>       post_count = 1, 
>       post_owner = 4690
>     }, { [trace 3]
>       calling_task = 3659, 
>       action = MUTEX_TAKE, 
>       pre_count = 1, 
>       pre_owner = 4690, 
>       post_count = 1, 
>       post_owner = 3659
>     }, { [trace 4]
>       calling_task = 4690, 
>       action = MUTEX_GIVE, 
>       pre_count = 1, 
>       pre_owner = 4690, 
>       post_count = 0, 
>       post_owner = 0
>     }, { [trace 5]
>       calling_task = 3659, 
>       action = MUTEX_GIVE, 
>       pre_count = 0, 
>       pre_owner = 0, 
>       post_count = 0, 
>       post_owner = 0
>     }, { [trace 6]
>       calling_task = 4690, 
>       action = MUTEX_TAKE, 
>       pre_count = 0, 
>       pre_owner = 0, 
>       post_count = 0, 
>       post_owner = 0
>     }, { [trace 7]
>       calling_task = 3659, 
>       action = MUTEX_TAKE, 
>       pre_count = 0, 
>       pre_owner = 0, 
>       post_count = 1, 
>       post_owner = 0
>     }, { [trace 8]
>       calling_task = 3659, 
>       action = MUTEX_GIVE, 
>       pre_count = 1, 
>       pre_owner = 0, 
>       post_count = 1, 
>       post_owner = 0
>     }
>
> In the end [trace 8], the Mutex content is as follows:
>     mMutex = {
>       __data = {
>         __lock = -2147479989, 
>         __count = 1, 
>         __owner = 0, 
>         __kind = 33, 
>         __nusers = 0, 
>         {
>           __spins = 0, 
>           __list = {
>             __next = 0x0
>           }
>         }
>       }, 
>       __size = "\200\000\016K\000\000\000\001\000\000\000\000\000\000\000!\000\000\000\000\000\000\000", 
>       __align = -2147479989
>     }
>   }
>
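
The __lock values in the two dumps can be decoded as well. For a PI mutex the lock word is the futex word shared with the kernel, which keeps the owner TID in the low 30 bits and uses the top two bits as "waiters" and "owner died" flags. The sketch below is only an illustration: the struct and function names are hypothetical, and the constants are redefined locally to mirror FUTEX_WAITERS, FUTEX_OWNER_DIED and FUTEX_TID_MASK from <linux/futex.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the <linux/futex.h> values, so this builds anywhere. */
#define LOCK_WAITERS    0x80000000u  /* FUTEX_WAITERS */
#define LOCK_OWNER_DIED 0x40000000u  /* FUTEX_OWNER_DIED */
#define LOCK_TID_MASK   0x3fffffffu  /* FUTEX_TID_MASK */

/* Hypothetical helper type: the decoded fields of a PI futex word. */
struct pi_lock_word {
    uint32_t tid;        /* TID recorded as owner, 0 if unlocked */
    bool     waiters;    /* other threads are blocked on the lock */
    bool     owner_died; /* owner exited while holding the lock */
};

/* Split the signed __lock value printed by gdb into its fields. */
static struct pi_lock_word decode_pi_lock(int32_t lock)
{
    uint32_t raw = (uint32_t)lock;
    struct pi_lock_word w;

    w.tid = raw & LOCK_TID_MASK;
    w.waiters = (raw & LOCK_WAITERS) != 0;
    w.owner_died = (raw & LOCK_OWNER_DIED) != 0;
    return w;
}
```

Applied to your dumps, -2147473878 decodes to TID 9770 with the waiters bit set, and -2147479989 decodes to TID 3659 with the waiters bit set. Assuming calling_task in your trace is a kernel TID, the final lock word names 3659 as the owner while __owner is 0, so the lock word and glibc's bookkeeping fields appear to disagree at that point.
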
> The trace data actually raised more questions than it answered.
>
> 1. Is it ever a valid state to have a count greater than 0 while the value of owner is 0?
> 2. Note that our code asserts if any non-successful code is returned from calling either pthread_mutex_unlock() or pthread_mutex_lock().
> 3. In [trace 5], coming in (pre) we expected the mutex to be owned by 3659, but both count and owner are set to 0. 
> 4. Starting from this point on, the content of the trace seems to be falling apart. Yet our code only asserts when it gets to [trace 8]!
> 5. Also notice that the owner field is always 0 from [trace 5] onwards.
> 6. Are there any known bugs that could lead to this weird behavior?
>
> Info about the system.
> . Linux Kernel version: 3.4.36
> . Glibc version: 2.9 "stable"
> . GCC version: powerpc-e500-linux-gnuspe-gcc (GCC) 4.6.3
> . Processor: Freescale MPC8572
> . Mode of operation: Symmetric Multi-Processing (SMP)
>
> Thank you,
> Francois
>
>
Your GLIBC version is quite old compared to both the kernel and GCC. Have you
tried a newer GLIBC? I am not aware of any powerpc bugs related to pthreads,
but given the GLIBC version I cannot rule one out. I also think there were some
fixes for PTHREAD_MUTEX_PI_RECURSIVE_NP in more recent versions.
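
When testing a newer GLIBC, it may help to confirm which version the application was built against and which one it actually loads at run time. A minimal sketch follows; the helper name describe_glibc is hypothetical, while gnu_get_libc_version() and the __GLIBC__ / __GLIBC_MINOR__ macros are provided by glibc itself.

```c
#include <stdio.h>
#include <gnu/libc-version.h>

/* Hypothetical helper: write the build-time and run-time glibc versions
   into buf and return snprintf's result. */
static int describe_glibc(char *buf, size_t len)
{
    return snprintf(buf, len,
                    "built against glibc %d.%d, running on glibc %s",
                    __GLIBC__, __GLIBC_MINOR__, gnu_get_libc_version());
}
```

Logging this string once at startup should show whether the target is really running the library you expect.
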
