This is the mail archive of the
mailing list for the Cygwin project.
Re: NtCreateProcess redux
On 25/04/2011 6:05 PM, Daniel Colascione wrote:
On 4/25/2011 12:33 PM, Ryan Johnson wrote:
That would definitely go in the Bad Things category... I didn't realize
you had to manually deal with subsystems. Looking at the NT internals
book, I see now that their fork example is a thoroughly scary hack
(casting arbitrary hex numbers to function pointers and trying to call
I know that folks have looked before into NtCreateProcess as a way of
doing a real fork() in cygwin, but it's very unclear from the various
list archives why it's still a bad idea today, other than its being
It's a bad idea because it doesn't work. You can certainly create a
forked child with NtCreateProcess, but without being able to connect
it to csrss and the rest of the win32 subsystem, this new process is
useless. NtCreateProcess-fork works for Interix because it has its
own NT subsystem, but Cygwin has to live within win32, and I don't
think creating a new subsystem is feasible for anyone without access
to the NT source.
As far as the address space issue goes: when NT creates a new process,
the loader, in ntdll, gains control before the entry point is ever
called, and this loader is what's responsible for the initial VM
layout. Because ntdll is a "known dll", you can't replace it with a
friendlier implementation. After the loader completes its work, the
kernel does some black magic and resets the initial thread's stack so
that it begins executing in the ntdll thread startup routine, so you
never actually _see_ the loader executing.
Yes, I've noticed that. Windbg can actually trace the load process,
though I don't have any debug symbols to know what's going on. There are
even several nameless dlls which get loaded and unloaded before WOW64
hands over control.
Unfortunately, AFAICT it's impossible to unload statically-linked DLLs:
you can call FreeModule() on their handle, and it returns success, but
the image remains loaded in memory.
The only thing that might have a chance of working is to unload
everything except user32, kernel32, and a few other components, then
start fresh with a more constrained module loading strategy.
However, the main crazy idea I've been toying with uses the same basic
premise: make the .exe a minimal stub (maybe not even linking
cygwin1.dll directly) which dynamically loads a .dll containing all the
application's code and link-time dependencies. Doing so would minimize
the number of address space changes the NT loader could impose during
process startup. Most fork failures I see right now are due to
statically-linked dlls moving around, which we can't really do anything
to avoid or fix, other than calling rebaseall with crossed fingers. At
least with dynamically-loaded dlls we have a semblance of control.
Not necessarily what you want to do all the time, but for these
problematic dll-heavy apps which also like to fork... I'll send a
separate email soon with more details.
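
A minimal sketch of the stub idea, in C with the POSIX dlopen()/dlsym() interface that Cygwin provides. The names app_impl.dll, app_main, load_entry and stub_main are hypothetical and not from this thread. The only point is that the stub has no link-time dependency on the application code, so that code is mapped after process startup rather than as part of it.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical entry point exported by the application library. */
typedef int (*app_main_fn)(int, char **);

/* Load `path` at run time and look up `symbol`.
 * Returns the symbol's address, or NULL if either step fails. */
static void *load_entry(const char *path, const char *symbol)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
        return NULL;
    }

    void *entry = dlsym(handle, symbol);
    if (entry == NULL)
        fprintf(stderr, "dlsym(%s): %s\n", symbol, dlerror());

    /* The handle is deliberately kept open for the life of the process. */
    return entry;
}

/* Body of the stub: everything interesting lives in the loaded library.
 * "app_impl.dll" and "app_main" are assumed names for illustration. */
static int stub_main(int argc, char **argv)
{
    app_main_fn app_main = (app_main_fn)load_entry("app_impl.dll", "app_main");
    if (app_main == NULL)
        return 127;
    return app_main(argc, argv);
}
```

The stub's real main() would just return stub_main(argc, argv). Whether this actually reduces fork failures is exactly the open question above; the sketch shows only the loading mechanism.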
A trade-off between the cost of traps to fault in pages vs. the cost of
syscalls to do inter-process memory transfers? It seems like the latter
would win if you copied enough pages at a time (the actual memcpy cost
should be about the same either way). What happens if both threads just
call NtCopyVirtualMemory in parallel?
If process A has section S, the contents of which we'd like to
duplicate in child-process B as S', and B inherits a handle to S, it's
slower to remap S in B and memcpy it to S' than it is to just
initialize S' from A's address space with NtCopyVirtualMemory. But
that's the single-threaded case. It turns out that if we have the
child map S somewhere and have one thread touch S, S', S,
S', etc. while another thread does a memcpy from S to S', we
handily beat the NtCopyVirtualMemory approach.
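
A rough sketch of the two-thread shape described above, using pthreads so it builds under Cygwin. All names here (prefault_source, copy_with_prefault, ASSUMED_PAGE_SIZE) are made up for illustration, and the 4096-byte page size is an assumption. Unlike the scheme described, the helper in this sketch touches only the source S, so it never reads memory the copying thread is writing; touching S' as well is left out to keep the example free of data races. It makes no claim about timings.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define ASSUMED_PAGE_SIZE 4096u /* assumption; query the real value in practice */

struct prefault_job {
    const unsigned char *src; /* start of the mapped source, S */
    size_t len;               /* number of bytes to be copied */
    unsigned sink;            /* keeps the reads from being optimized away */
};

/* Helper thread: read one byte per page of S so the page faults are
 * taken here rather than in the thread doing the memcpy. */
static void *prefault_source(void *arg)
{
    struct prefault_job *job = arg;
    const volatile unsigned char *p = job->src;
    unsigned sum = 0;

    for (size_t off = 0; off < job->len; off += ASSUMED_PAGE_SIZE)
        sum += p[off];

    job->sink = sum;
    return NULL;
}

/* Copy len bytes from src (S) to dst (S') while a helper thread touches S.
 * Returns 1 if the helper ran, 0 if the copy was done without it. */
static int copy_with_prefault(void *dst, const void *src, size_t len)
{
    struct prefault_job job = { src, len, 0 };
    pthread_t helper;
    int have_helper = (pthread_create(&helper, NULL, prefault_source, &job) == 0);

    memcpy(dst, src, len); /* the copy itself is an ordinary memcpy */

    if (have_helper)
        pthread_join(helper, NULL);
    return have_helper;
}
```

The helper and the copier are not synchronized, so the helper may run ahead of or behind the memcpy; any benefit depends on it staying ahead, which would have to be measured.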