This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: shared data protection failed in pthread_cond_timedwait
- From: Torvald Riegel <triegel at redhat dot com>
- To: Yang Yingliang <yangyingliang at huawei dot com>
- Cc: Will Newton <will dot newton at linaro dot org>, libc-help at sourceware dot org, libc-alpha <libc-alpha at sourceware dot org>
- Date: Mon, 28 Apr 2014 20:46:03 +0200
- Subject: Re: shared data protection failed in pthread_cond_timedwait
- Authentication-results: sourceware.org; auth=none
- References: <535A078F dot 3050003 at huawei dot com> <CANu=DmiUPkfwnL1gUundA2VvYy5oG_M=PpNnSHtpoj6w7TCKbA at mail dot gmail dot com> <535B5616 dot 7060202 at huawei dot com> <535B80F8 dot 5020309 at huawei dot com>
On Sat, 2014-04-26 at 17:48 +0800, Yang Yingliang wrote:
> On 2014/4/26 14:45, Yang Yingliang wrote:
> > On 2014/4/25 17:43, Will Newton wrote:
> >> On 25 April 2014 07:58, Yang Yingliang <yangyingliang@huawei.com> wrote:
> >>> Hi,
> >>>
> >>>
> >>> I have 22 threads wait in pthread_cond_timedwait. When they are all woke up, I found
> >>> there are more than one threads can access shared data in pthread_cond_timedwait.
> >>>
> >>> I added print messages as follow code:
> >>>
> >>> --- libc/nptl/pthread_cond_timedwait.c
> >>> +++ libc/nptl/pthread_cond_timedwait.c
> >>> @@ -34,6 +34,7 @@
> >>> #else
> >>> # include <bits/libc-vdso.h>
> >>> #endif
> >>> +#include <stdio.h>
> >>>
> >>> /* Cleanup handler, defined in pthread_cond_wait.c. */
> >>> extern void __condvar_cleanup (void *arg)
> >>> @@ -235,7 +239,9 @@
> >>>
> >>> bc_out:
> >>>
> >>> +printf("start do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self());
> >>> cond->__data.__nwaiters -= 1 << COND_NWAITERS_SHIFT;
> >>> +printf("end do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self());
> >>>
> >>> /* If pthread_cond_destroy was called on this variable already,
> >>> notify the pthread_cond_destroy caller all waiters have left
> >>>
> >>>
> >>> I tested on Linux arma15el 3.10.37+ #2 SMP Fri Apr 25 11:23:25 CST 2014 armv7l GNU/Linux.
> >>> Here is the result:
> >>>
> >>> start do sub :45, lock:1 0xb6d9a460
> >>> end do sub :43, lock:1 0xb6d9a460
> >>> start do sub :43, lock:1 0xb6d9e460
> >>> end do sub :41, lock:2 0xb6d9e460
> >>> start do sub :43, lock:2 0xb6dbe460 //two threads both access the shared data
> >>> start do sub :41, lock:1 0xb6daa460
> >>> end do sub :39, lock:2 0xb6daa460
> >>> start do sub :39, lock:2 0xb6de6460
> >>> end do sub :37, lock:2 0xb6de6460
> >>> start do sub :37, lock:2 0xb6db6460
> >>> end do sub :35, lock:2 0xb6db6460
> >>> start do sub :35, lock:2 0xb6dc2460
> >>> end do sub :33, lock:2 0xb6dc2460
> >>> end do sub :37, lock:2 0xb6dbe460
> >>> start do sub :33, lock:2 0xb6dc6460
> >>> end do sub :31, lock:0 0xb6dc6460
> >>> start do sub :31, lock:2 0xb6dae460
> >>> end do sub :29, lock:2 0xb6dae460
> >>> start do sub :29, lock:2 0xb6db2460
> >>> end do sub :27, lock:2 0xb6db2460
> >>> start do sub :27, lock:2 0xb6dba460
> >>> end do sub :25, lock:2 0xb6dba460
> >>> start do sub :25, lock:2 0xb6da2460
> >>> end do sub :23, lock:2 0xb6da2460
> >>>
> >>> Is lll_lock (cond->__data.__lock, pshared) failed?
> >>>
> >>> pshared is LLL_SHARED.
> >>
> >> I have had a quick look at this and there is no obvious reason I can
> >> see for this behaviour, unless there is some way that IO buffering
> >> could cause the messages to be strangely interleaved. The other
> >> alternative that may be worth investigating is whether or not
> >> ldrex/strex is working correctly in your SMP system.
> >>
> >
> > After doing some investigation, it looks like atomic_compare_and_exchange_val_acq
> > not doing atomic. So two threads can both acquire lock when futex is 0. Is there
> > something wrong in atomic_compare_and_exchange_val_acq ?
> >
> > #define __lll_lock(futex, private) \
> > ((void) ({ \
> > int *__futex = (futex); \
> > if (__builtin_expect (atomic_compare_and_exchange_val_acq (__futex, \
> > 1, 0), 0)) \
> > { \
> > if (__builtin_constant_p (private) && (private) == LLL_PRIVATE) \
> > __lll_lock_wait_private (__futex); \
> > else \
> > __lll_lock_wait (__futex, private); \
> > } \
> > }))
> >
>
> I noticed that atomic_compare_and_exchange_val_acq are not atomic for ARM in glibc-2.18.
> I will try glibc-2.19.
Could you share the full test case for this, please? Did you just test
whether some mutex can be acquired several times?