This is the mail archive of the
libc-help@sourceware.org
mailing list for the glibc project.
Re: 答复: Help, any one ever meet hanging on _IO_lock_lock(list_all_lock) issue ?
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: Wuqixuan <wuqixuan at huawei dot com>, "Carlos O'Donell" <carlos at systemhalted dot org>
- Cc: "libc-help at sourceware dot org" <libc-help at sourceware dot org>, "schwab at redhat dot com" <schwab at redhat dot com>, Jiazhenghua <jiazhenghua at huawei dot com>, "Liuyong (John)" <john dot liuyong at huawei dot com>
- Date: Fri, 15 Nov 2013 21:53:50 -0500
- Subject: Re: 答复: Help, any one ever meet hanging on _IO_lock_lock(list_all_lock) issue ?
- Authentication-results: sourceware.org; auth=none
- References: <BB7C62C2B0732E4DA93834A501E846456C8D8003 at szxema505-mbx dot china dot huawei dot com> <CAE2sS1ishHhT+LEqHkcadXyP4wBeWJFGRMroLmVQGrMEBMD9tg at mail dot gmail dot com> <BB7C62C2B0732E4DA93834A501E846456C8D8023 at szxema505-mbx dot china dot huawei dot com> <BB7C62C2B0732E4DA93834A501E846456C8D80D2 at szxema505-mbx dot china dot huawei dot com>,<CAE2sS1hQG7m3fKsqyqTXi-izB5cWM0ruqTw5Z2RofQH64-M+VQ at mail dot gmail dot com> <BB7C62C2B0732E4DA93834A501E846456C8DA38F at szxema505-mbs dot china dot huawei dot com>
On 11/15/2013 08:18 PM, Wuqixuan wrote:
>> Andreas can best comment on that.
>
> We found a discussion of a problem that could cause this; Andreas took part in it, so we guess that
> discussion led Andreas to write the patch (can Andreas confirm?).
> http://sourceware-org.1504.n7.nabble.com/PATCH-Fix-possible-deadlock-in-stdio-locking-code-td6853.html
>
> Also, from our analysis of the code, if thread A calls pthread_cancel and thread B calls fcloseall at the
> same time, the issue can be reproduced.
>
> Thread A              Thread B
>                       fcloseall
>                       _IO_cleanup
> pthread_cancel(B)     _IO_flush_all_lockp(0)
>                       _IO_cleanup_region_start_noarg (flush_cleanup)
>                       (here register flush_cleanup, if we did the patch, flush_cleanup will not be registered)
>                       write (here is a cancel point, so it check somebody cancel me)
>                       call flush_cleanup, (list_all_lock->cnt reduce to -1, but occur)
>                       thread exit.
>
> So we guess our problem is similar to it.
Just to be clear: Does Andreas' patch fix this issue?
Note that we can try to force this issue to appear by
doing some white-box testing and using systemtap to
delay or arbitrarily synchronize the operation of certain
threads (I've done this before to expose threading issues
by delaying pthread calls via the ld audit interface to
widen small race conditions).
Cheers,
Carlos.