showstopper bugs (boring technical details -- run away! run away!)
Mon Nov 6 08:33:00 GMT 2000
On Mon, Nov 06, 2000 at 09:55:30AM -0500, Town, Brad wrote:
>Chris Faylor wrote:
>>I've had a couple of show stopper bugs reported to me which, of course,
>>I can't duplicate, so I've held off on the release until I can either
>>duplicate and fix them or someone else can fix them (hah).
>Arrgh! There's that "hah" again! :)
>Would it be possible for you to briefly recap the show-stopper bugs?
>I'll help if I can.
Wow. I've really stumbled onto something with the (hah).
The showstopper bugs were (I'm using the past tense because I am such an
incurable optimist) random errors from wait_subproc when logging in via
ssh. Corinna reported them and since they were indicative of a serious
problem in cygwin, I've been trying to track them down "in my spare
time" (I'm supposed to be doing more managing and less programming).
I duplicated the problems last night at around 9PM and checked in a fix
at around 1AM. As I was triumphantly drifting off to sleep, I realized
that some of my fix was questionable, so I have to redo it today.
The problem was due to the way cygwin handles the 'exec' call. Since
Windows has nothing that says "start a new process and give it the same
pid", we have to kludge around this. So, when a program exec's, a stub
sticks around waiting for an event from the newly "execed" process. When
it gets the event, the stub opens the parent process with OpenProcess,
duplicates a handle to the newly execed process into its parent, and then
exits. The parent notices the exit, discovers that there is a new handle
for its child, does some bookkeeping, and goes back to waiting for children.
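As a rough illustration of the handoff described above, here is a toy
model in Python. It is not Cygwin's code: the class and method names
(Parent, Stub, pending, on_exec_event) are hypothetical, plain strings
stand in for Win32 handles, and the real logic in sigproc.cc and spawn.cc
is considerably more involved. It only shows the bookkeeping idea, namely
that the pid stays the same while the handle the parent waits on changes.

```python
# Toy model of the exec handoff. All names are hypothetical; strings
# stand in for Win32 process handles.

class Parent:
    def __init__(self):
        # pid -> handle the parent is currently waiting on
        self.children = {}
        # pid -> handle that an exiting stub duplicated into the parent
        self.pending = {}

    def spawn(self, pid, handle):
        self.children[pid] = handle

    def on_child_exit(self, pid):
        """Called when the handle being waited on for `pid` signals exit."""
        new_handle = self.pending.pop(pid, None)
        if new_handle is None:
            # Nothing was handed over: the child really exited.
            del self.children[pid]
            return "exited"
        # The stub exited after an exec: same pid, new handle.
        self.children[pid] = new_handle
        return "execed"


class Stub:
    """What is left of a process after it calls exec."""

    def __init__(self, pid, parent):
        self.pid = pid
        self.parent = parent

    def on_exec_event(self, new_process_handle):
        # Stands in for opening the parent and duplicating the new
        # process's handle into it. The stub then exits, which the
        # parent notices.
        self.parent.pending[self.pid] = new_process_handle
        return self.parent.on_child_exit(self.pid)
```

In this model, spawning pid 1000, creating Stub(1000, parent) and calling
on_exec_event with a new handle leaves pid 1000 in parent.children, now
mapped to the new handle. A later on_child_exit(1000) with nothing pending
removes the entry.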
The problem was that the process of contacting the parent was not 100%
reliable. I don't know why this is now the case, but I worked around the
problem by always passing a handle to the parent process to all of the
children. This is something that I've wanted to do for a while anyway.
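The difference between the two approaches can be sketched with another
toy model in Python. Everything here is an assumption made for
illustration: the Process class, the function names, and especially the
`fail` flag, which simply injects a lookup failure. The cause of the
real failure is unknown, so the model does not try to explain it.

```python
# Toy contrast: find the parent by pid at exec time, versus use a
# parent handle that was passed down at creation. Names are hypothetical.

class Process:
    def __init__(self, pid, parent=None):
        self.pid = pid
        # Handle to the parent, handed over at creation (the workaround).
        self.parent_handle = parent
        # Handles that children have duplicated into this process.
        self.received = []


def open_process(table, pid, fail=False):
    # Stands in for a by-pid lookup such as OpenProcess. `fail` injects
    # an unexplained failure.
    if fail:
        return None
    return table.get(pid)


def report_by_lookup(table, ppid, new_handle, fail=False):
    # Old approach: the lookup is a step that can go wrong.
    parent = open_process(table, ppid, fail)
    if parent is None:
        return False
    parent.received.append(new_handle)
    return True


def report_by_inherited_handle(child, new_handle):
    # Workaround: no lookup step is left to fail.
    child.parent_handle.received.append(new_handle)
    return True
```

With the failure injected, report_by_lookup returns False and the parent
never sees the new handle, while report_by_inherited_handle still
delivers it.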
In the process of fixing this bug, I stumbled across several other *#$!
signal races which I worked around. Today, after a fresh night's sleep,
I believe that I know how to fix them.
Anyway, thanks for the offer. If you want to look at the code in question,
it's in sigproc.cc (wait_subproc) and spawn.cc (spawn_guts). This is not
for the faint of heart. I keep meaning to add more comments and document
the whole sorry mess but I've never gotten around to it.
By the way, I now need to do some laundry unless someone else gets around
to it (hah).