NtCreateProcess redux

Daniel Colascione dan.colascione@gmail.com
Mon Apr 25 22:05:00 GMT 2011

On 4/25/2011 12:33 PM, Ryan Johnson wrote:
> I know that folks have looked before into NtCreateProcess as a way of
> doing a real fork() in cygwin, but it's very unclear from the various
> list archives why it's still a bad idea today, other than its being
> undocumented.

It's a bad idea because it doesn't work.  You can certainly create a 
forked child with NtCreateProcess, but without being able to connect it 
to csrss and the rest of the Win32 subsystem, this new process is 
useless.  A fork built on NtCreateProcess works for Interix because Interix 
has its own NT subsystem, but Cygwin has to live within Win32, and I don't 
think creating a new subsystem is feasible for anyone without access to the NT 

> If there's no interest in revisiting NtCreateProcess, I have some really
> crazy ideas to offer, but they would still leave us copying whole
> address spaces and trying to outsmart Windows along the way.

As (I think) cgf once said, Cygwin has been around for a long time, and 
most of the crazy ideas that didn't make it into the code were weighed, 
judged, and found wanting for one reason or another.  I have some crazy 
ideas of my own, mostly involving using shared sections instead of 
NtCopyVirtualMemory to duplicate memory, but I haven't had time to 
implement them[1].

As far as the address space issue goes: when NT creates a new process, 
the loader, in ntdll, gains control before the entry point is ever 
called, and this loader is what's responsible for the initial VM layout. 
Because ntdll is a "known dll", you can't replace it with a friendlier 
implementation.  After the loader completes its work, the kernel does 
some black magic and resets the initial thread's stack so that it begins 
executing in the ntdll thread startup routine, so you never actually 
_see_ the loader executing.

The only thing that might have a chance of working is to unload 
everything except user32, kernel32, and a few other components, then 
start fresh with a more constrained module loading strategy.

[1] If process A has section S, the contents of which we'd like to 
duplicate in child-process B as S', and B inherits a handle to S, it's 
slower to remap S in B and memcpy it to S' than it is to just initialize 
S' from A's address space with NtCopyVirtualMemory.  But that's the 
single-threaded case.  It turns out that if we have the child map S 
somewhere and have one thread touch S[0], S'[0], S[4096], S'[4096], etc. 
while another thread does a memcpy from S to S', we handily beat the 
NtCopyVirtualMemory approach.
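A portable sketch of the two-thread trick in [1] might look like the 
following.  It is not Cygwin code and uses no Windows APIs: plain malloc'd 
buffers stand in for S and S', POSIX threads stand in for the child's 
threads, and 4 KiB pages are assumed to match the 4096 stride above.  The 
names copy_job, touch_pages and prefaulted_copy are hypothetical.  One 
deliberate change: so the sketch stays free of C11 data races, the copier 
only memcpy's pages the toucher has already faulted in (a pipelined 
variant), rather than running a single unsynchronized memcpy as described.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Assumption: 4 KiB pages, matching the S[0], S[4096], ... stride. */
#define TOUCH_STRIDE 4096

/* Hypothetical job descriptor: src plays the role of S, dst of S'. */
struct copy_job {
    const unsigned char *src;
    unsigned char *dst;
    size_t len;
    atomic_size_t ready; /* dst bytes below this offset are already touched */
};

/* Toucher thread: fault in S[off] and S'[off] one page at a time, then
   publish its progress so the copier only copies pages it has touched. */
static void *touch_pages(void *arg)
{
    struct copy_job *job = arg;
    volatile const unsigned char *s = job->src;
    unsigned char sink = 0;

    for (size_t off = 0; off < job->len; off += TOUCH_STRIDE) {
        sink ^= s[off];    /* read fault on the source page */
        job->dst[off] = 0; /* write fault on the destination page */
        size_t end = job->len - off > TOUCH_STRIDE ? off + TOUCH_STRIDE : job->len;
        atomic_store_explicit(&job->ready, end, memory_order_release);
    }
    (void)sink;
    return NULL;
}

/* Copy len bytes from src to dst while a second thread takes the page
   faults.  Returns 0 on success. */
static int prefaulted_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
    struct copy_job job = { .src = src, .dst = dst, .len = len };
    atomic_init(&job.ready, 0);

    pthread_t toucher;
    if (pthread_create(&toucher, NULL, touch_pages, &job) != 0) {
        memcpy(dst, src, len); /* fall back to a plain single-threaded copy */
        return 0;
    }

    size_t done = 0;
    while (done < len) {
        size_t ready = atomic_load_explicit(&job.ready, memory_order_acquire);
        if (ready > done) { /* copy everything the toucher has faulted in */
            memcpy(dst + done, src + done, ready - done);
            done = ready;
        }                   /* otherwise spin until the toucher gets ahead */
    }
    return pthread_join(toucher, NULL);
}
```

Whether this beats a single NtCopyVirtualMemory call on Windows is the 
measurement reported in [1]; the sketch only shows the shape of the 
technique, not that result.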

More information about the Cygwin-developers mailing list