This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Multi Threaded programs deadlock doing simple I/O operations


On Friday, June 10, 2005 at 3:44 PM, Mark Pizzolato wrote:
> On Thursday, June 09, 2005 at 6:12 PM, Mark Pizzolato wrote:
>> On Thursday, June 09, 2005 at 3:35 PM, Christopher Faylor wrote:
>> > On Wed, Jun 08, 2005 at 05:43:59PM -0700, Mark Pizzolato wrote:
>> > >There is a serious problem for multi threaded programs doing simple I/O
>> > >operations in cygwin (open, dup, fdopen, fclose, and close).
>> > >
>> > >The attached 81 line test program clearly demonstrates the issue (by
>> > >hanging and no longer consuming CPU or performing any I/O operations).
>> >
>> > Thanks for the relatively small test case. That was enough to track the
>> > problem down. I'm generating a new snapshot with a fix for this
>> > problem.
>>
>> The snapshot looks good!
>>
>> This fixes the stability problems with clamav's clamd that I've been chasing
>> for a long time.
>
> Some more follow up here...I'm running with the 20050609 snapshot dll.
>
> clamav's clamd now runs better than it has ever for me on cygwin.....
>
> until "it doesn't",
>
> once it starts to run poorly it won't run cleanly again until I reboot the system
> (I haven't actually tried after merely exiting all processes ..)

Well, I spoke too soon here. There may be some interaction with many recently closed TCP sessions sitting in TIME_WAIT. I'm not sure, but after some time, I can restart and experience apparently good behavior, and then things get "poor" as described.


If I run with the 20050607 snapshot, the new "poor" behavior doesn't happen, while the test program I provided earlier in this thread hangs as described. So, the fix to the original problem and the new "poor" behavior are clearly related to changes between the 20050607 and the 20050609 snapshots.

> To be more specific about the "poor" behavior:
>
>
> - pthread_mutex_unlock fails leaving errno with a value of 90. This is
>   in a place where there is only one path through about a dozen lines of
>   code and the mutex is definitely locked. There may have been a call to
>   pthread_create, and a definite call to pthread_cond_signal.
> - once the above error happens, calls (by the same thread) to accept()
>   fail using a file descriptor which we've been successfully using all
>   along and only close when the program exits.
>
> so some change introduced recently (since 1.5.17-1), and possibly in
> 20050609 fixes the dup() issue but now mutex operations are failing in
> strange ways.
>
> Sorry not to have a simple isolated test case for this. The good news
> is that once it breaks it won't run correctly again until a reboot.

I'm working on a test program to recreate this behavior.
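
One detail that may matter when isolating the mutex unlock failure described above: POSIX specifies that the pthread functions report errors through their return value and are not required to set errno, so the value 90 seen in errno could be left over from an earlier call. A minimal sketch of checking the return code directly is shown below. The helper names `checked_unlock` and `lock_signal_unlock` are hypothetical and are not taken from clamd or from any test program in this thread.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: unlock the mutex and report the error number that
 * pthread_mutex_unlock() returns, rather than inspecting errno. */
int checked_unlock(pthread_mutex_t *m, const char *where)
{
    int rc = pthread_mutex_unlock(m);
    if (rc != 0)
        fprintf(stderr, "%s: pthread_mutex_unlock failed: %d (%s)\n",
                where, rc, strerror(rc));
    return rc;
}

/* Hypothetical single-path sequence: lock, signal, unlock.
 * Returns 0 on success, or the first nonzero error number seen. */
int lock_signal_unlock(pthread_mutex_t *m, pthread_cond_t *c)
{
    int rc = pthread_mutex_lock(m);
    if (rc != 0)
        return rc;

    rc = pthread_cond_signal(c);

    /* Always attempt the unlock, even if the signal call failed. */
    int unlock_rc = checked_unlock(m, "lock_signal_unlock");
    return rc != 0 ? rc : unlock_rc;
}
```

Logging the returned code at the point of failure would show whether the unlock itself is reporting an error or whether errno was simply stale from something else.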


- Mark Pizzolato



