fast/native fork?

Jay K
Sun Jan 21 08:32:00 GMT 2018

I have some desire to discuss fork.
I know it is an old and difficult topic.

I found this:

 "Cygwin fork and RtlCloneUserProcess"

NT has had fork since v1.
The Posix subsystem used it.

You didn't need Vista's introduction of RtlCloneUserProcess.

This from 2005 alludes to how to make it work:

But I have difficult questions for you -- for anyone, including Corinna.

What do you expect it to do?

I mean, consider that there is no pthread_atfork or an analog in Win32.
Dlls at all levels of the Win32 stack might have
process-specific state that needs to be reinitialized.
Maybe they were holding locks in a worker thread.
Maybe they had the pid cached. Dumb, but it works w/o fork.
The usual problems that pthread_atfork is meant to solve.
Which memory do you expect to be inherited copy on write, and which
memory do you expect to revert to whatever it is when a process starts (or a
dll is loaded)?
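
For concreteness, here is a minimal POSIX sketch of the kind of repair
pthread_atfork allows and that Win32 dlls have no hook for. The "library"
state (cached_pid, lib_lock) and the function names lib_init,
lib_pid_is_fresh and check_child_sees_fresh_pid are hypothetical, made up
only for illustration; only the pthread, fork and waitpid calls are real APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical process-specific library state. */
static pid_t cached_pid;
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the parent before fork: take the lock so that no other
   thread holds it at the moment the address space is copied. */
static void before_fork(void)
{
    int rc = pthread_mutex_lock(&lib_lock);
    assert(rc == 0);
    (void)rc;
}

/* Runs in the parent after fork: release the lock again. */
static void parent_after(void)
{
    pthread_mutex_unlock(&lib_lock);
}

/* Runs in the child after fork: release the inherited lock and
   refresh the state that is no longer valid in the new process. */
static void child_after(void)
{
    pthread_mutex_unlock(&lib_lock);
    cached_pid = getpid();
}

/* Library initializer: cache the pid and register the fork handlers.
   Returns 0 on success, like pthread_atfork itself. */
int lib_init(void)
{
    cached_pid = getpid();
    return pthread_atfork(before_fork, parent_after, child_after);
}

/* Returns 1 if the cached pid matches the calling process, else 0. */
int lib_pid_is_fresh(void)
{
    pthread_mutex_lock(&lib_lock);
    int ok = (cached_pid == getpid());
    pthread_mutex_unlock(&lib_lock);
    return ok;
}

/* Forks, checks the cache in the child, and reports the result back
   through the exit status. Returns 1 if fresh, 0 if stale, -1 on error. */
int check_child_sees_fresh_pid(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(lib_pid_is_fresh() ? 0 : 1);
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Without the child_after handler the child would keep the parent's pid in
the cache. A Win32 dll has no equivalent place to put that repair.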

You could hope that something like calling DllMain(process attach) in all the
dlls would help, but it can't, if only because dlls depend on the
default data in the image; they don't write the defaults to memory
at every load.

ntdll.dll is special. It somehow knows fork occurred and can reinitialize itself.
It is used by the Posix subsystems (SFU/SUA/Interix/etc.) and always had to work with fork.
  (Up until Windows 10 and changes for WSL, ntdll.dll was loaded into
  all usermode processes: Win32, Posix, OS/2. This was specifically
  changed for WSL.)

But no other dll expects this.

Now, I have some weird ideas. Let's brainstorm a little?

Can you somehow leave the child process in limbo waiting for exec, know
when you have waited too long (because *anything* else happened) and only
then do the expensive copying?

Or is the real problem getting to the exec call and having the exec parameters?

I mean, what if you actually knew 100% that exec would come very soon after fork?
 What would/could you do then?

And when exec does not follow, what do you do?
How much of the child process is inherited from the parent, vs. how much
is reinitialized such as for a new process?

If there is a solution that optimizes the guaranteed-to-exec case,
can you almost just assume it, breaking the rare (?) program that does not exec?
You could even omit fork.
You'd have fork_slow and fork_before_exec.
People would have to ifdef Cygwin and choose what they want.
Or the default could be fork_before_exec, breaking a small number
of programs, that could be easily ported.
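
For comparison, POSIX already has an interface for the case where exec is
known to follow immediately: posix_spawn. The sketch below is not the
fork_before_exec proposed here, only the closest standard analog that I am
sure exists. The helper names spawn_program and wait_exit_code are
hypothetical.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: start `path` with `argv` in one step, with no
   user code running between "fork" and "exec".
   Returns the child's pid, or -1 if the spawn failed. */
pid_t spawn_program(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);
    pid_t pid;
    int rc = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    return rc == 0 ? pid : -1;
}

/* Hypothetical helper: wait for `pid` and return its exit code,
   or -1 if it did not exit normally. */
int wait_exit_code(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A program ported this way never observes a half-initialized child, which is
the property the guaranteed-to-exec case relies on.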

Ok, how about this?
Can you implement exec using only ntdll.dll, avoiding kernel32.dll?
 And the small/zero number of other things valid/used between fork and exec?
And assume exec follows fork?
If so then that is a solution:
 learn how to use native fork
 and have exec only use ntdll.dll
 That will give you a fast fork + exec sequence.
 Or, can you in the new process, just reinitialize kernel32.dll and kernelbase.dll,
 and only use them for exec?
It doesn't do anything for fork without exec but I still don't understand
how that is supposed to work in Win32.

Or how about this:
Again, if you assume exec is coming, and you just need fork
to do the minimum -- basically to get a pid.
 fork calls CreateProcess, with a helper .exe, suspended
 fork calls CreateThread, passing it the register context of the creating thread
 the helper thread suspends the creating thread, and takes over its
  register context (including rip and rsp, approximately)
 shortly thereafter the helper thread in the creating process reaches exec
 At this point, somehow, it adjusts everything... hand waving.

How do you implement exec today? Does fork actually get the pid
of the new child, and exec in the child somehow "replaces" the executable, or
does exec create a second child, with another pid, and the original
child just waits for it, and returns its exit code as its own?
Or do you have indirection on pids, and cygwin pids are not win32 pids?

Oh, that's right, setjmp/longjmp.
fork calls setjmp in the parent.
The first return continues until exec in the parent and then
returns the new pid the second time?
  Given an arrival at exec, in the parent instead of the child,
  the usual child part of fork need never run at all.
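
The nearest existing analog I know of on Unix-like systems is vfork: the
caller is suspended, the "child" runs on borrowed state only until it
reaches exec, and then the caller resumes with the new pid. A minimal
sketch follows. It says nothing about how Cygwin implements anything, and
run_with_vfork is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE /* for vfork on glibc under strict -std modes */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv` via vfork + execv.
   Returns the program's exit code, or -1 on error. */
int run_with_vfork(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: only exec or _exit is safe here, since the parent's
           memory is borrowed until one of them happens. */
        execv(path, argv);
        _exit(127); /* exec failed */
    }

    /* Parent: resumes only after the child has called exec or _exit. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The restriction in the child branch is the point: code that does anything
else between fork and exec is exactly the code that would need porting.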

To repeat: To what extent, if any, can we assume exec follows fork?
And what can be done with this idea?
I understand the more general model, where exec does not follow fork.
But how common is it? How would Cygwin fare if by default fork+exec
was fast, fork w/o exec didn't work, and people ported those somehow?

 - Jay
