How to make child of failed fork exit cleanly?

Ryan Johnson ryan.johnson@cs.utoronto.ca
Tue May 3 15:46:00 GMT 2011


Hi all,

I'm working on some changes to fork() which would detect early the case 
where a parent-child pair have unresolvable differences in address space 
layout (e.g. thread stacks, heaps, or statically-linked dlls which moved).

Detecting the problem turned out to be pretty easy, but making the child 
exit cleanly is not. This leads to two questions, followed by what I 
have figured out so far while attempting to answer them myself.

1. What's the best way to make a child process notify the parent that 
the fork() cannot succeed, and exit cleanly?

2. When the child does exit, how to prevent finalizers from running for 
dlls which did not load properly?

Context for the first question: Existing fork failure code calls 
api_fatal(), but that sends messages to the terminal and generates a 
stack trace, in addition to the desired result of making the parent's 
fork() call return an error. Further, Windows 7 treats such an 
exit as grounds for an automatic process restart, and respawns the 
failed child up to five more times before giving up. The result is a 
screen full of error messages and stack traces even if the fork 
eventually succeeds. It's especially annoying under terminal apps like 
emacs or screen, where the messages clutter up the display pretty badly.

Given that the cause of the fork failure is known (rather than some 
surprise or bug), I propose that the messages go to some strace channel 
(a new one for fork, perhaps?) and that the child exit without 
attempting to generate a dump file (especially since dump generation 
itself has a tendency to cause crashes). It would also be good, in cases 
where the parent is the reason for fork failures, to prevent Windows 
from respawning the process so many times (though it is admittedly handy 
when the child was the problem and the fork succeeds on the nth try). 
All of this still leaves the question of how to exit the child process 
"properly", though. Is it necessary to wait for dll initialization to 
finish first, for example?
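
For illustration only, the behaviour asked about in the first question 
(child reports failure through its exit status and leaves without 
running exit-time handlers) can be sketched at the plain POSIX level. 
This is not how cygwin1.dll implements fork; FORK_LAYOUT_MISMATCH, 
layout_matches_parent() and spawn_and_check() are hypothetical names 
made up for the example, and the layout check is a stub that always 
fails.

```cpp
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical exit code reserved for "fork cannot succeed".
// This is an assumption for the sketch, not a Cygwin constant.
static const int FORK_LAYOUT_MISMATCH = 113;

// Stub for the real address-space comparison (assumption: always fails).
static bool layout_matches_parent ()
{
  return false;
}

// Forks, lets the child detect the mismatch, and returns the child's
// exit code. Returns -1 if fork/waitpid failed or the child did not
// exit normally.
int spawn_and_check ()
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      // _exit() terminates immediately: no atexit handlers, no static
      // destructors, no terminal output, no stack trace.
      if (!layout_matches_parent ())
        _exit (FORK_LAYOUT_MISMATCH);
      _exit (0);
    }

  int status = 0;
  if (waitpid (pid, &status, 0) != pid)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

The parent can then map the reserved code to a failed fork() instead of 
relying on anything the child printed. Whether the Windows-level 
equivalent of such an early exit is safe before dll initialization 
finishes is exactly the open question.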

Context for the second question: exiting the child tends to trigger 
access violations, often in a pthread_mutex destructor call (la-la 
land). Some of these can be avoided by disabling stack dumping from 
api_fatal (see separate email about alloca and stack walking), but the 
others continue to mystify.

Overall, AFAICT, the cygwin dll design assumes that all dlls have loaded 
properly, and a failed fork breaks that invariant. I worry that some 
"properly-loaded" dll accesses state of a "not-properly-loaded" 
dependency, but haven't yet been able to fully eliminate two simpler 
explanations:

(a) A statically-linked dll maps to a different address in the child 
than the parent, and because copied-over dll state references addresses 
which are valid in the parent but not the child, dll initialization 
crashes. For example, this was probably responsible for the access 
violations I reported earlier [1]. I've verified that this can be 
avoided by checking for handle mismatches in dll_list::alloc and forcing 
an early exit, but this leads to...

(b) Finalizers run for a dynamically-linked dll which never loaded 
(and/or a statically-linked dll which loaded to the wrong location -- I 
can't tell).  I've tried inserting checks in a few places to not run 
finalizers unless the after-fork initialization completed, by extending 
dll_list entries to say whether a given dll initialized properly, but 
I've clearly not isolated the cause because the access violations 
continue. Part of the challenge is that the dll_list copied over from 
the parent process will always say that every dll initialized properly. 
It also doesn't help that many dll initializers run before cygwin1.dll 
(sometimes even other cygwin dlls, if they've been dynamically rebased), 
so the value of in_forkee is not reliable.
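
To make the stale-flag problem in (b) concrete, here is a minimal 
sketch of the kind of check described: a per-dll "initialized" flag 
that is cleared after the list is copied from the parent and set again 
only once the child's own initialization completes. dll_entry, 
reset_after_fork() and run_finalizers() are hypothetical stand-ins, not 
the actual dll_list code, and as noted above this kind of check alone 
has not stopped the access violations.

```cpp
#include <vector>

// Hypothetical stand-in for a dll_list entry; not the real Cygwin struct.
struct dll_entry
{
  bool initialized;   // true only if THIS process ran the initializer
  void (*fini) ();    // finalizer to call at exit, may be null
};

// Counter used by the example finalizer below.
static int finalizers_run = 0;
static void count_fini ()
{
  ++finalizers_run;
}

// The list copied from the parent claims every dll initialized, so the
// child must clear the flags before trusting them.
void reset_after_fork (std::vector<dll_entry> &dlls)
{
  for (dll_entry &d : dlls)
    d.initialized = false;
}

// Run finalizers only for entries that initialized in this process.
// Returns how many finalizers were called.
int run_finalizers (const std::vector<dll_entry> &dlls)
{
  int ran = 0;
  for (const dll_entry &d : dlls)
    if (d.initialized && d.fini)
      {
        d.fini ();
        ++ran;
      }
  return ran;
}
```

In the child, reset_after_fork() would run first, each flag would be 
set as the matching dll finishes its after-fork initialization, and an 
early exit would call run_finalizers() so that dlls which never got 
that far are skipped.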

[1] http://cygwin.com/ml/cygwin-developers/2011-04/msg00006.html

Thoughts?
Ryan


