semi-solved: fork-related access violations on win7-x64

Ryan Johnson
Sat Apr 16 06:03:00 GMT 2011

Hi all,

I've isolated one source of access violations on my win7-x64 machine, 
and it's nasty.

The offending series of events is:
1. Two linked-in dlls share the same base address
2. The process forks
3. Windows assigns the child's dll a different base address than it 
chose for the parent
4. This code from dll::init () runs in the child, with p holding 
addresses from the parent:
> 1.75         (07-May-10):   /* This should be a no-op.  Why didn't we 
> just import this variable? */
> 1.78         (27-Mar-11):   if (!p.envptr)
> 1.78         (27-Mar-11):     p.envptr = &__cygwin_environ;
> 1.79         (06-Apr-11):   else if (*(p.envptr) != __cygwin_environ)
> 1.78         (27-Mar-11):     *(p.envptr) = __cygwin_environ;

It was only recently that "somebody" noticed that the envptr could be 
wrong and added code to "fix" it, but that leaves all the other members 
of p just as wrong as before. If we're lucky, p points to unmapped 
memory, causing one access violation; otherwise, we jump off into la-la 
land and do who-knows-what with bad addresses.
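
To make the stale-address problem concrete, here is a minimal sketch of 
how an address recorded in the parent relates to the matching address in 
the child once the dll has moved. The helper name rebase_address and the 
example offset are hypothetical; nothing like this exists in dll::init (), 
and the point is only that every member of p is off by the same 
(child base - parent base) delta.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only: map an address that was valid
// inside a dll in the parent to the matching address in the child, given the
// base address the dll received in each process.
std::uintptr_t rebase_address(std::uintptr_t addr_in_parent,
                              std::uintptr_t parent_base,
                              std::uintptr_t child_base)
{
  // The address must lie at or above the parent's load address,
  // otherwise it never belonged to this dll in the first place.
  assert(addr_in_parent >= parent_base);

  // The offset inside the image is the same in both processes;
  // only the base differs.
  const std::uintptr_t offset = addr_in_parent - parent_base;
  return child_base + offset;
}
```

With bases of 0x3A0000 (parent) and 0x320000 (child), a parent address of 
0x3A4078 corresponds to 0x324078 in the child. Using the parent value 
unchanged in the child touches whatever happens to live at 0x3A4078 
there, if anything.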

It was trivial to make dll_list::alloc() call api_fatal() when it 
detects a parent/child handle mismatch; whatever spawns the child 
process is apparently willing to try as many as six times before giving 
up. Six retries give around an 85% success rate (8/60) for my toy 
benchmark, suggesting that Windows 7 has a ~25% probability of resolving 
a conflicting dll base address the same way twice in a row. This varies 
all over the map, tho: sometimes fork() succeeds in one try quite a few 
times in a row; or it may fail completely as many times in a row, with 
2-3 failures being the most common.
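
The arithmetic behind that estimate can be sketched as follows, under 
the assumption that each attempt succeeds independently with the same 
probability p (the streaks described above suggest this is only roughly 
true). The function names are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Probability that at least one of `tries` independent attempts succeeds,
// when each attempt succeeds with probability p.
double overall_success(double p, int tries)
{
  assert(tries > 0 && p >= 0.0 && p <= 1.0);
  return 1.0 - std::pow(1.0 - p, tries);
}

// Inverse of the above: the per-attempt success probability implied by an
// observed overall success rate after `tries` attempts.
double per_try_success(double overall, int tries)
{
  assert(tries > 0 && overall >= 0.0 && overall <= 1.0);
  return 1.0 - std::pow(1.0 - overall, 1.0 / tries);
}
```

per_try_success(0.85, 6) comes out at about 0.27, and 
overall_success(0.25, 6) at about 0.82, so an 85% success rate over six 
tries is consistent with a per-try probability of roughly 25%.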

Unfortunately, the failed forks don't quite go away cleanly, since a 
static destructor from one of my two conflicting dlls tries to run (and 
fails), as does some cygwin-related finalization:
>   14428 [main] fork 6148 
> C:\cygwin\home\Ryan\experiments\fork-tests\fork.exe: *** fatal error - 
> Location of C:\cygwin\home\Ryan\experiments\fork-tests\cygfoo.dll 
> changed from 0x3A0000 (parent) to 0x320000 (child)
> Stack trace:
> Frame     Function  Args
> 0027B45C  610294DB  (0027B45C, 00000000, 00000000, 00000000)
> 0027B74C  610294DB  (00000001, 00008000, 00000000, 61184ADA)
> 0027C77C  61005E37  (611AC5E8, 0027C7A4, 003A0000, 00320000)
> 0028C7AC  61022626  (611E2440, 00320000, 00324078, 00000002)
> 0028C7EC  61022814  (0028F9F0, 0028C828, 6102271D, 00000000)
> End of stack trace
> * * * (null) fini
> CloseHandle(win32_obj_id<0x104>) failed virtual 
> pthread_mutex::~pthread_mutex(): 1585, Win32 error 6
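
For reference, the kind of check that produces the fatal error above can 
be sketched like this. It is not the actual dll_list::alloc() change; 
base_mismatch_message is a hypothetical name, and the real code calls 
api_fatal() rather than returning a string.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical check, not the real dll_list::alloc(): returns an empty
// string when the child loaded the dll at the parent's address, otherwise a
// message shaped like the fatal error in the log above.
std::string base_mismatch_message(const std::string& dll_name,
                                  std::uintptr_t parent_base,
                                  std::uintptr_t child_base)
{
  if (parent_base == child_base)
    return std::string();

  char buf[512];
  std::snprintf(buf, sizeof buf,
                "Location of %s changed from 0x%llX (parent) to 0x%llX (child)",
                dll_name.c_str(),
                static_cast<unsigned long long>(parent_base),
                static_cast<unsigned long long>(child_base));
  return std::string(buf);
}
```

A caller would treat a non-empty result as a reason to abandon this 
child and let the parent retry the fork.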

So... is there any way to unload a DLL_LINK and "encourage" it to go to 
the right place? Alternatively, is there a quieter way to kill off 
failed child processes? I would imagine that only a rebase can make a 
DLL_LINK always go where it belongs on the first try, much as I despise 
the temporary band-aid that is rebasing.
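
As a rough illustration of what a rebase buys, the sketch below hands 
each dll its own non-overlapping base address, rounded to the 64 KiB 
Windows allocation granularity. The Module struct, the function names, 
and the bottom-up layout are assumptions made for this example; the 
real rebase tool may lay images out differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one dll image, for illustration only.
struct Module {
  const char* name;
  std::uintptr_t size;  // image size in bytes
  std::uintptr_t base;  // preferred base address, filled in by assign_bases
};

// Windows reserves address space in 64 KiB units.
constexpr std::uintptr_t kGranularity = 0x10000;

std::uintptr_t align_up(std::uintptr_t value)
{
  return (value + kGranularity - 1) & ~(kGranularity - 1);
}

// Give every module a distinct base, packing upward from `start`.
void assign_bases(std::vector<Module>& modules, std::uintptr_t start)
{
  std::uintptr_t next = align_up(start);
  for (Module& m : modules) {
    assert(m.size > 0);
    m.base = next;
    next = align_up(next + m.size);
  }
}

// True when the two images would occupy at least one common address.
bool overlaps(const Module& a, const Module& b)
{
  return a.base < b.base + b.size && b.base < a.base + a.size;
}
```

Once no two images claim the same range, step 1 of the sequence above 
cannot happen, so the loader has no conflict to resolve differently in 
the child.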


More information about the Cygwin-developers mailing list