This is the mail archive of the cygwin mailing list for the Cygwin project.
Re: Snapshot 20040225: make hangs/errors out
On Thu, Mar 04, 2004 at 10:59:48AM -0500, Christopher Faylor wrote:
>On Wed, Mar 03, 2004 at 09:14:28PM -0500, Christopher Faylor wrote:
>>On Wed, Mar 03, 2004 at 06:16:55PM -0500, Rolf Campbell wrote:
>>>Christopher Faylor wrote:
>>>>>>No, but I'll try to catch one. (I removed the strace from my script.)
>>>>>
>>>>>Ok, caught two already. (Produced with attached script + Makefile)
>>>>
>>>>Not much there, unfortunately.
>>>>
>>>>Out of curiosity, can you duplicate this problem with the snapshot? I
>>>>see that this is your own build, probably built with
>>>>--enable-debugging.
>>>>
>>>>I've been diligently testing things with the snapshot rather than my
>>>>own build because I was trying to debug what was in the subject.
>>>>Snapshots aren't built with --enable-debugging. If this is just an
>>>>artifact from building with --enable-debugging, then I'm not too
>>>>worried.
>>>
>>>Ok, I've been running the script with the '25 snapshot all day, with 44
>>>failures. All the same type of failures I was seeing with the cvs
>>>(with --enable-debugging). Unfortunately, the ethernet card on my home
>>>machine broke so for now I'll upload one of the strace files to a
>>>geocities site. Nothing looks suspicious to me in the strace; maybe
>>>it's a bug in make? http://www.geocities.com/endlisnis/Temp/freeze.zip
>>
>>Thanks. Unfortunately, I don't see anything more here than in the other
>>strace output.
>>
>>I did manage to duplicate this after 1437 repetitions or so. My strace
>>didn't show anything either, unfortunately, but now maybe I can slowly
>>get to the bottom of the problem.
>
>Weird. Now that I've managed to duplicate it, I can do so at will. I
>guess that's good news.
>
>I see what is causing the symptom but not what is causing the problem.
>I spent a sleepless night modelling multi-threaded signal interrupts
>in my head but I'm still not any closer to understanding the problem.
>
>The problem is that malloc allocates some memory, puts the address of
>the memory in the eax register, and then returns. In the meantime, two
>signals have come in, so rather than return immediately, malloc returns
>to the signal handler and then the signal handler is called again. In
>some cases, this causes the eax register to become zero and so make
>(rightly) complains. In theory, this shouldn't happen since the eax
>register should have been saved on the stack.
>
>Nope. Typing an explanation doesn't help me figure this out. Bummer.
I think I may have figured this out.
It wasn't the eax register being zeroed. It was actually the test for
zero returning improper values due to being interrupted by a signal.
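For illustration only: the message does not say what the fix was, so the
following is not the change that went into the snapshot. It is a minimal
Python sketch of one generic way to keep a check from being interrupted by a
signal: block the signal around the check and let it be delivered afterwards.
The names `check_without_interruption`, `on_signal` and `delivered` are
hypothetical, and `signal.pthread_sigmask` and `signal.pthread_kill` are only
available on Unix-like platforms.

```python
import signal
import threading

delivered = []  # signals the handler has seen so far (hypothetical name)


def on_signal(signum, frame):
    """Hypothetical handler: just record that the signal arrived."""
    delivered.append(signum)


def check_without_interruption(value, blocked=(signal.SIGUSR1,)):
    """Test `value` for zero while the given signals are blocked.

    A signal sent during the check stays pending and is only delivered
    once the previous signal mask has been restored.
    Returns (is_zero, number_of_signals_handled_during_the_check).
    """
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, blocked)
    try:
        # Simulate a signal arriving in the middle of the check by
        # sending SIGUSR1 to the current thread.
        signal.pthread_kill(threading.get_ident(), signal.SIGUSR1)
        seen_during_check = len(delivered)
        is_zero = (value == 0)
    finally:
        # Restoring the old mask lets the pending signal through.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return is_zero, seen_during_check


# Install the handler (must be done from the main thread).
signal.signal(signal.SIGUSR1, on_signal)
```

In this sketch the handler only runs after the check has finished, so the
result of the test for zero cannot be disturbed by it. Whether the real fix
works along these lines is an assumption.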
I made a fix last night that allowed me to run this for 2500+
iterations. Of course, I have managed to do that before without error,
so that doesn't mean much, I guess. Backing the change out resulted in
a 'virtual memory exhausted' error in less than a hundred iterations,
however. Odd that I can duplicate it so readily now. I think my
computer was previously trying to shield me from the pain of debugging
this problem.
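The test script attached earlier in the thread is not reproduced here. As a
rough stand-in, a repetition harness of the kind described above could look
like the following Python sketch. The function name `run_until_failure`, the
default iteration count and the timeout are assumptions chosen for
illustration; the timeout is there so that a hang is reported instead of
blocking the loop forever.

```python
import subprocess


def run_until_failure(cmd, max_iterations=2500, timeout=60):
    """Run `cmd` repeatedly and report the first failure.

    Returns (iteration, reason) for the first run that hangs or exits
    with a non-zero status, or None if every run succeeded.
    """
    for i in range(1, max_iterations + 1):
        try:
            result = subprocess.run(cmd, capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            # The command did not finish in time: treat it as a hang.
            return i, "hang"
        if result.returncode != 0:
            return i, "exit status %d" % result.returncode
    return None
```

A call such as `run_until_failure(["make"], max_iterations=100)` would then
return the iteration at which make first hung or errored out, or `None` if
all runs passed. As noted above, a clean run does not prove much on its own,
so comparing against a run with the change backed out is the more telling
test.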
There is a new snapshot up now with my fix in it. Please try it.
cgf