showstopper bugs (boring technical details -- run away! run away!)

Scott Carter scarter@emware.com
Tue Nov 7 13:13:00 GMT 2000


On Monday, November 06, 2000 09:33 AM, Chris Faylor wrote:
>By the way, I now need to do some laundry unless someone 
>else gets around to it (hah).

That I can handle -- I've got considerable experience with dirty laundry (in
fact, my main task Saturday was doing bed spreads/sheets, without child
interrupts). Zip things up and email the batch to me and I'll get write() on
it. But if you've tarred anything or spawned a lot of guts, I can't
guarantee that I can rm the stains.

Please answer these FAQs:
 * What's the sizeof your shorts: long, double, or wide?
 * What kind of threads: cotton, poly, wool? (no satin/silk/lace please).
 * Any special fabric char warning tags/labels?
 * Fabric softener or pesticide (to kill the bugs)?
 * Should I de-lint it?
 * Would you like long strings truncated?
 * How would you like your socks stacked?
 * Starch or no on your callers?
 * Do you need any patches on that?

All-tmp-a-Cheers,
Scott Carter
Soft wear Engineer


-----Original Message-----
From: Christopher Faylor [mailto:cgf@redhat.com]
Sent: Monday, November 06, 2000 09:33 AM
To: 'cygwin@sources.redhat.com'
Subject: Re: showstopper bugs (boring technical details -- run away! run
away!)


On Mon, Nov 06, 2000 at 09:55:30AM -0500, Town, Brad wrote:
>Chris Faylor wrote:
>>I've had a couple of show stopper bugs reported to me which, of course,
>>I can't duplicate, so I've held off on the release until I can either
>>duplicate and fix them or someone else can fix them (hah).
>
>Arrgh! There's that "hah" again! :)
>
>Would it be possible for you to briefly recap the show-stopper bugs?
>I'll help if I can.

Wow.  I've really stumbled onto something with the (hah).

The showstopper bugs were (I'm using the past tense because I am such an
incurable optimist) random errors from wait_subproc when logging in via
ssh.  Corinna reported them and since they were indicative of a serious
problem in cygwin, I've been trying to track them down "in my spare
time" (I'm supposed to be doing more managing and less programming).

I duplicated the problems last night at around 9PM and checked in a fix
at around 1AM.  As I was triumphantly drifting off to sleep, I realized
that some of my fix was questionable, so I have to redo it today.

The problem was due to the way cygwin handles the 'exec' call.  Since
Windows has nothing that says "start a new process and give it the same
pid", we have to kludge around this.  So, when a program exec's, a stub
sticks around waiting for an event from the newly "execed" process.  When
it gets the event, the stub opens the parent process with OpenProcess,
duplicates a handle to the newly execed process into its parent, and then
exits.  The parent notices the exit, discovers that there is a new handle
for its child, does some bookkeeping, and goes back to waiting for children
to exit.
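
For anyone who wants to see the moving parts, here is a toy model of that
handoff.  It is only a sketch, not cygwin code: Process, Parent, and
exec_via_stub are made-up names, the "handles" are plain Python references
rather than Windows HANDLEs, and the event wait and the OpenProcess and
handle-duplication steps are collapsed into a single assignment.

```python
# Toy model of the exec handoff. Every name here is hypothetical; this is
# not how sigproc.cc or spawn.cc are actually written.

class Process:
    def __init__(self, pid, name):
        self.pid = pid
        self.name = name
        self.exited = False


class Parent:
    def __init__(self):
        self.children = {}      # pid -> process the parent is waiting on
        self.new_handles = {}   # pid -> handle left behind by an exiting stub

    def spawn(self, pid, name):
        child = Process(pid, name)
        self.children[pid] = child
        return child

    def on_child_exit(self, pid):
        """React to the exit of the process currently tracked under pid.

        Returns True if the child is really gone, or False if the exit came
        from an exec stub that left a replacement handle behind.
        """
        replacement = self.new_handles.pop(pid, None)
        if replacement is not None:
            # Bookkeeping: same pid, new underlying process; keep waiting.
            self.children[pid] = replacement
            return False
        del self.children[pid]
        return True


def exec_via_stub(parent, stub, new_name):
    """Model the stub: start the new image, hand it to the parent, exit."""
    execed = Process(stub.pid, new_name)    # new process keeps the old pid
    # In the scheme described above the stub would first wait for an event
    # from the execed process; that wait is omitted here.
    parent.new_handles[stub.pid] = execed   # stands in for duplicating a handle
    stub.exited = True
    return execed


parent = Parent()
shell = parent.spawn(100, "sh")
ls = exec_via_stub(parent, shell, "ls")
still_gone = parent.on_child_exit(100)      # stub exit, not a real child exit
```

After the stub exits, the parent still tracks pid 100, now pointing at the
execed process.  Only a later exit with no replacement handle waiting is
treated as the child actually going away.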

The problem was that the process of contacting the parent was not 100%
reliable.  I don't know why this is now the case, but I worked around the
problem by always passing a handle to the parent process to all of the
children.  This is something that I've wanted to do for a while anyway.

In the process of fixing this bug, I stumbled across several other *#$!
signal races which I worked around.  Today, after a fresh night's sleep,
I believe that I know how to fix them.

Anyway, thanks for the offer.  If you want to look at the code in question,
it's in sigproc.cc (wait_subproc) and spawn.cc (spawn_guts).  This is not
for the faint of heart.  I keep meaning to add more comments and document
the whole sorry mess but I've never gotten around to it.

By the way, I now need to do some laundry unless someone else gets around
to it (hah).

cgf

--
Want to unsubscribe from this list?
Send a message to cygwin-unsubscribe@sourceware.cygnus.com


