ipc, sockets and windows sp2

Vincent Dedun kraken@smousseland.com
Fri Apr 1 08:11:00 GMT 2005


There seem to be odd problems with Windows SP2 (and some SP1 systems with 
undetermined updates).

I work on the Windows version of drqueue, an open-source distributed 
rendering manager (for use with Maya rendering, for example). It was 
designed for Unix, so it uses IPC and sockets.

The port mostly works, except for the server itself (the master program).
The Unix version has no problem with any of this; it works on Linux, the 
BSDs, IRIX...

Please take a look at the main loop (the main function) of this short file:
http://www.drqueue.org/svn/trunk/drqueue/master.c

Basically, the program does this:
- init config
- load the saved database of jobs
- set signal handlers
- get shared memory (IPC shared memory and semaphores)
- fork a consistency-checking task (it is not involved in the problem, I 
  tested)
- bind a port (it's a server!)
- then enter the usual main loop, which forks child processes to accept 
  connections.
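
For readers who do not want to open master.c, the bind-then-fork part of that sequence can be sketched as below. This is a minimal illustration, not drqueue's code: the names listen_loopback and prefork_roundtrip are hypothetical, the socket is bound to the loopback address on a kernel-chosen port, and the parent plays the client itself so that the sketch is self-checking.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a TCP socket bound to 127.0.0.1 and put it in listening state.
 * Passing port 0 lets the kernel choose; the real port goes to *out_port. */
int listen_loopback(uint16_t port, uint16_t *out_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *out_port = ntohs(addr.sin_port);
    return fd;
}

/* The parent binds once, then forks a child that accepts on the
 * inherited descriptor and sends one byte back.
 * Returns that byte, or -1 on failure. */
int prefork_roundtrip(void)
{
    uint16_t port;
    int result = -1;
    int lfd = listen_loopback(0, &port);
    if (lfd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(lfd);
        return -1;
    }
    if (pid == 0) {
        /* Child: use the socket that was created in the parent. */
        char reply = 'k';
        int c = accept(lfd, NULL, NULL);
        if (c < 0 || write(c, &reply, 1) != 1)
            _exit(1);
        close(c);
        _exit(0);
    }

    /* Parent: act as a client to exercise the child. */
    struct sockaddr_in addr;
    char byte;
    int s = socket(AF_INET, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (s >= 0 &&
        connect(s, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        read(s, &byte, 1) == 1)
        result = byte;
    else
        kill(pid, SIGKILL);   /* do not leave the child blocked in accept */
    if (s >= 0)
        close(s);
    close(lfd);
    waitpid(pid, NULL, 0);
    return result;
}
```

Calling prefork_roundtrip() from a small main and checking that it returns 'k' exercises the inherited-descriptor path that the master relies on.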

On Windows SP2 (and some SP1 systems with updates), the master keeps 
printing a strange error:

 *** MapViewOfFileEx (0xF10000), Win32 error 487.  Terminating.

Error 487 (ERROR_INVALID_ADDRESS) means 
"Attempt to access invalid address."


So the listening child process dies immediately, and the error is written 
again and again: new child processes are launched as others exit, to 
keep at least MASTERNCHILDREN processes ready to listen (this is to 
support high network load; with MASTERNCHILDREN=1 the same thing 
happens).
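
The respawn behaviour can be sketched as follows. This is an illustration only, not the loop from master.c: keep_children_alive, its max_forks cap and the dies_at_once worker are hypothetical names, and the cap exists only so that the sketch terminates instead of looping forever the way the real master does.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Keep `nchildren` workers alive, forking a replacement each time one
 * exits. `max_forks` caps the total number of forks so the sketch ends.
 * Returns the number of successful forks. */
int keep_children_alive(int nchildren, int max_forks, void (*worker)(void))
{
    int alive = 0;
    int forks = 0;

    for (;;) {
        /* Top the pool back up to nchildren. */
        while (alive < nchildren && forks < max_forks) {
            pid_t pid = fork();
            if (pid < 0) {
                max_forks = forks;   /* stop forking, just reap the rest */
                break;
            }
            if (pid == 0) {
                worker();
                _exit(0);
            }
            alive++;
            forks++;
        }
        if (alive == 0)
            break;
        /* Block until any child exits, then loop to replace it. */
        if (wait(NULL) < 0)
            break;
        alive--;
    }
    return forks;
}

/* A worker that fails right away, like the child hitting error 487. */
void dies_at_once(void)
{
    _exit(1);
}
```

With a worker that exits immediately, keep_children_alive(1, 5, dies_at_once) uses up all 5 forks at once, which is why the error message repeats without end when there is no cap.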

After debugging, I saw that the child process fails with this error 
(from cygwin1.dll) as soon as it forks.

I took a look at the Cygwin sources and found that MapViewOfFileEx is 
used in the shared memory and mmap code.
So I tested with the shared memory code disabled, and the problem 
disappeared!

Moreover, I tested several combinations and found that calling the 
get_socket function AFTER the fork fixed the problem!
get_socket simply creates a socket and binds it the usual way. It is 
defined here if you want to take a look:
http://www.drqueue.org/svn/trunk/drqueue/communications.c

So in the current master.c you'll see a short #ifdef __CYGWIN that calls 
get_socket after the forks instead of before.

The problem is that every child process then binds, which is not the 
correct way to manage the socket: the bind has to be done once, at the 
beginning. So my Windows version of the master with this trick will 
accept one connection but will die (at least on SP2) with communication 
errors when several clients connect, even with only 2 clients.
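
To show why binding in every child is fragile, here is a small sketch of what a second bind on the same address and port normally returns on a POSIX system. The function names bind_and_listen and second_bind_errno are hypothetical. I have not confirmed that this is the exact failure behind the communication errors on SP2, so treat EADDRINUSE as an assumption about the likely cause; Windows sockets can also behave differently when address-reuse options are set.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a new TCP socket to 127.0.0.1:port (host byte order) and listen.
 * Returns the descriptor, or -1 with errno describing the failure. */
int bind_and_listen(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 8) < 0) {
        int saved = errno;   /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Bind once on a kernel-chosen port, then try to bind the same port a
 * second time, as a second child would. Returns the errno of the second
 * attempt, 0 if it unexpectedly succeeded, or -1 if setup failed. */
int second_bind_errno(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int first = bind_and_listen(0);
    if (first < 0)
        return -1;
    if (getsockname(first, (struct sockaddr *)&addr, &len) < 0) {
        close(first);
        return -1;
    }
    int second = bind_and_listen(ntohs(addr.sin_port));
    int err = (second < 0) ? errno : 0;
    if (second >= 0)
        close(second);
    close(first);
    return err;
}
```

On Linux the second attempt fails with EADDRINUSE, which is the reason the bind normally happens once in the parent and the children only share the resulting descriptor.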

After googling a bit, I found that SP2 changed the behaviour of some 
undocumented Windows system functions, which caused problems in some 
parts of Cygwin. I think this is related (I can't say more about it).

So, there are bugs in the Cygwin DLL when you combine shared memory 
(attached!), forks and sockets.
More precisely, the problem is where the fork happens: if you fork BEFORE 
the bind or AFTER closing the socket, there is no problem, but if you 
fork SOMEWHERE during the socket's life (bind, listen, accept, read on 
the socket), you get this error.

To summarize: I think that using a socket file descriptor created in a 
parent process for any socket operation is just not possible if you use 
shared memory (under some circumstances) with those Windows versions.
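
A reduced test case for this combination might look like the sketch below. It is not taken from drqueue: shm_socket_fork_check is a hypothetical name, the 4096-byte segment and the marker value 42 are arbitrary, and I am assuming that attached System V shared memory plus a bound socket plus a fork is enough to trigger the error, which I have only seen with the full master.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Attach shared memory, bind a socket, then fork. The child must be able
 * to use both the inherited socket and the inherited segment.
 * Returns 0 on success, nonzero on any failure. */
int shm_socket_fork_check(void)
{
    int result = 1;
    struct sockaddr_in addr;

    /* Step 1: get and attach a private shared memory segment. */
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (shmid < 0)
        return 1;
    int *shared = shmat(shmid, NULL, 0);
    if (shared == (void *)-1) {
        shmctl(shmid, IPC_RMID, NULL);
        return 1;
    }
    *shared = 0;

    /* Step 2: create, bind and listen BEFORE forking. */
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;   /* let the kernel choose a port */
    if (lfd >= 0 &&
        bind(lfd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(lfd, 8) == 0) {
        /* Step 3: fork in the middle of the socket's life. */
        pid_t pid = fork();
        if (pid == 0) {
            socklen_t len = sizeof addr;
            /* Any operation on the inherited socket will do. */
            if (getsockname(lfd, (struct sockaddr *)&addr, &len) != 0)
                _exit(2);
            *shared = 42;   /* prove the segment is still mapped */
            _exit(0);
        }
        if (pid > 0) {
            int status;
            if (waitpid(pid, &status, 0) == pid &&
                WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
                *shared == 42)
                result = 0;
        }
    }

    if (lfd >= 0)
        close(lfd);
    shmdt(shared);
    shmctl(shmid, IPC_RMID, NULL);
    return result;
}
```

On Linux this returns 0. If the same function returns nonzero, or the child is terminated with the MapViewOfFileEx message, on an affected Windows system, that would make a compact test case to attach to a bug report.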

Does anyone know this problem? Does anyone have a workaround to keep my 
master program running on the latest Windows versions while waiting for 
a Cygwin fix?

Thanks,

Kraken


