This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: ipc, sockets and windows sp2


Corinna Vinschen wrote:

So I hope you don't mind that I attached a short test program you can easily compile with gcc to reproduce the bug.



Cool, that's exactly what I was asking for. I was immediately able to reproduce the problem, and it turned out that on fork() the duplication of the sockets from the parent to the child process, for some reason, occupied address space in the child that, in the parent, is occupied by the shared memory returned by shmat().

Consequently the duplication of the shared memory couldn't occupy the
same address as in the parent.  That's a fatal error, so the forked child
terminated itself with error 487, which basically means "Invalid address".
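For reference, the guarantee the child relies on can be checked in isolation: after fork(), every segment attached with shmat() must still be attached in the child at the same address, so a pointer inherited from the parent keeps working. The following is a minimal sketch, assuming a system with SysV shared memory; check_shm_survives_fork() is a hypothetical helper, not part of the test case, and it is not guaranteed to trigger the original Cygwin bug, which depended on where the duplicated socket happened to land.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: attach a private segment, open a socket, fork, and
 * let the child read and write the segment through the pointer it inherited.
 * Returns 0 if parent and child see each other's data, -1 otherwise. */
int check_shm_survives_fork(void)
{
	int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
	if (shmid == -1)
		return -1;

	int *shared = shmat(shmid, NULL, 0);
	shmctl(shmid, IPC_RMID, NULL);  /* freed once the last process detaches */
	if (shared == (void *) -1)
		return -1;
	*shared = 42;

	/* Create the socket after shmat(), as the test case does. */
	int sfd = socket(AF_INET, SOCK_STREAM, 0);

	pid_t pid = fork();
	if (pid == 0) {
		/* 'shared' holds the parent's address; dereferencing it only works
		 * if fork() attached the segment at that same address here. */
		int ok = (*shared == 42);
		*shared = 43;               /* the parent should see this write */
		_exit(ok ? 0 : 1);
	}

	int status, rc = -1;
	if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status)
	    && WEXITSTATUS(status) == 0 && *shared == 43)
		rc = 0;

	if (sfd != -1)
		close(sfd);
	shmdt(shared);
	return rc;
}
```

On a correct implementation check_shm_survives_fork() returns 0; a child that cannot map the segment at the parent's address either fails to start or crashes, and the helper returns -1.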

I've changed fork() so that the shared memory is duplicated before the sockets
are, which is OK because sockets don't have special requirements for memory
addresses.  That works fine for me, but it would be good if you could
nevertheless test the next snapshot, which I just uploaded.

It's just incredible that nobody found this problem before.



Yes, I find this incredible too, as any Unix server that uses IPC (instead of threads, for example) will want to support multiple connections at a time and so will use these mechanisms.
I doubt that we're the only ones using shared memory, sockets and multiple processes!


Anyway, BIG THANKS for resolving the problem so quickly.
I rebuilt Cygwin from CVS, and it solved my problem: my master now runs well.


However, there is still a problem, sorry ;)

This time with semaphores (also part of IPC). It's less important for me, as the master can run without them, but it's better to have them.
So I updated the test case to see what happens.


I added semaphore lock/release functions that I call in the child process, so each child locks the semaphore before accepting a connection and releases it when the connection is finished.
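The property this scheme should guarantee (at most one child between lock and release) can be exercised without sockets. A minimal sketch, assuming SysV semaphores and shared memory are available: the hypothetical helper max_concurrent_holders() forks n children that each lock with SEM_UNDO, sleep 10 ms as a stand-in for accept() and the connection handling, and release, while counters in a shared segment (a hypothetical layout, not the test case's struct database) record how many children were inside at once.

```c
#define _XOPEN_SOURCE 700
#include <time.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* POSIX leaves it to the caller to declare the semctl() argument union. */
union semctl_arg { int val; struct semid_ds *buf; unsigned short *array; };

/* Counters kept in shared memory (hypothetical layout for this sketch). */
struct section_stats {
	int inside;      /* children currently holding the semaphore */
	int max_inside;  /* highest value 'inside' ever reached */
	int entries;     /* completed lock/release rounds */
};

static int sem_change(int semid, short delta)
{
	struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = SEM_UNDO };
	return semop(semid, &op, 1);
}

/* Returns the largest number of children ever inside the critical section
 * at once (1 if locking works), or -1 on error. */
int max_concurrent_holders(int n)
{
	int result = -1;
	struct section_stats *st = (void *) -1;
	int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	int shmid = shmget(IPC_PRIVATE, sizeof *st, IPC_CREAT | 0600);
	union semctl_arg arg = { .val = 1 };

	if (semid == -1 || shmid == -1 || semctl(semid, 0, SETVAL, arg) == -1)
		goto out;
	if ((st = shmat(shmid, NULL, 0)) == (void *) -1)
		goto out;
	st->inside = st->max_inside = st->entries = 0;

	for (int i = 0; i < n; i++) {
		pid_t pid = fork();
		if (pid == -1)
			break;
		if (pid == 0) {
			struct timespec work = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
			if (sem_change(semid, -1) == -1)       /* lock */
				_exit(1);
			if (++st->inside > st->max_inside)
				st->max_inside = st->inside;
			nanosleep(&work, NULL);                /* stand-in for accept() */
			st->inside--;
			st->entries++;
			sem_change(semid, +1);                 /* release */
			_exit(0);
		}
	}
	while (wait(NULL) > 0)
		;                                          /* reap every child */

	if (st->entries == n && semctl(semid, 0, GETVAL) == 1)
		result = st->max_inside;
out:
	if (st != (void *) -1)
		shmdt(st);
	if (shmid != -1)
		shmctl(shmid, IPC_RMID, NULL);
	if (semid != -1)
		semctl(semid, 0, IPC_RMID);
	return result;
}
```

With working semaphores, max_concurrent_holders(5) should return 1 and leave semval back at 1; a larger value means two children held the "lock" at the same time.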

With one child it is OK, but when the second child starts, the semaphore lock operation (semop() with sem_flg=SEM_UNDO and sem_op=-1) makes cygserver hang!
Then I get "lost connection to cygserver" errors from my process, plus some "error getting signal_arrived to server(6)" errors from the cygserver process.
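SEM_UNDO, used in the failing semop() call, asks the system to record a per-process adjustment and apply it when the process exits, so a child that dies while holding the lock does not leave the semaphore stuck at 0. On Cygwin this semaphore state is kept by cygserver, which implements the SysV IPC calls. A minimal sketch of the expected semantics, assuming a POSIX system; semval_after_child_exits() is a hypothetical helper:

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* POSIX leaves it to the caller to declare the semctl() argument union. */
union semctl_arg { int val; struct semid_ds *buf; unsigned short *array; };

/* Hypothetical helper: create a semaphore with value 1, let a child
 * decrement it and exit without releasing, and return the value the
 * parent sees afterwards (-1 on error). */
int semval_after_child_exits(int use_undo)
{
	int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	if (semid == -1)
		return -1;

	int value = -1, status;
	union semctl_arg arg = { .val = 1 };
	if (semctl(semid, 0, SETVAL, arg) == 0) {
		pid_t pid = fork();
		if (pid == 0) {
			/* Child: take the lock, then exit without releasing it. */
			struct sembuf op = { .sem_num = 0, .sem_op = -1,
			                     .sem_flg = use_undo ? SEM_UNDO : 0 };
			_exit(semop(semid, &op, 1) == 0 ? 0 : 1);
		}
		/* Once the child has been reaped, any undo has been applied. */
		if (pid > 0 && waitpid(pid, &status, 0) == pid
		    && WIFEXITED(status) && WEXITSTATUS(status) == 0)
			value = semctl(semid, 0, GETVAL);
	}
	semctl(semid, 0, IPC_RMID);
	return value;
}
```

semval_after_child_exits(1) should return 1 (the exit undid the lock), while semval_after_child_exits(0) should return 0 (the lock is leaked).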


So, instead of waiting for the semaphore to be released (semval going back from 0 to 1), semop() returns even though the semaphore is locked, and the program continues as if the semaphore were unlocked, although it is still locked.

Moreover, semval is decremented on every semaphore_lock() call, so it reaches -1 (see "Locked !!! semval: -1" in the output below), whereas it should only ever be 0 (locked) or 1 (unlocked). Then everything stops there because cygserver hangs, and there is no further lock output from the remaining children (the example forks 10 children).

Under OS X, for example, you see the first child lock the semaphore, then all the other children wait for it to be released (semop() blocks until it is), and the semaphore value is 1, then 0.
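The OS X behaviour is what POSIX specifies for semop(): if sem_op is negative and semval is smaller than its absolute value, the caller is suspended (or gets EAGAIN when IPC_NOWAIT is set), so semval never goes negative. A minimal sketch that checks this, assuming a POSIX system; second_lock_blocks() is a hypothetical helper that polls GETNCNT (the number of processes waiting for semval to increase) to observe the second locker waiting:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <time.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* POSIX leaves it to the caller to declare the semctl() argument union. */
union semctl_arg { int val; struct semid_ds *buf; unsigned short *array; };

static int sem_add(int semid, short delta, short flags)
{
	struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = flags };
	return semop(semid, &op, 1);
}

/* Returns 0 if, while one process holds the semaphore, a second locker
 * blocks instead of driving semval to -1 and only proceeds after release. */
int second_lock_blocks(void)
{
	int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	if (semid == -1)
		return -1;

	int ok = 0, status;
	pid_t pid;
	struct timespec tick = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	union semctl_arg arg = { .val = 1 };

	/* Take the lock in the parent: semval goes from 1 to 0. */
	if (semctl(semid, 0, SETVAL, arg) == -1 || sem_add(semid, -1, 0) == -1)
		goto out;

	/* A non-blocking attempt must fail with EAGAIN and leave semval at 0. */
	if (sem_add(semid, -1, IPC_NOWAIT) != -1 || errno != EAGAIN
	    || semctl(semid, 0, GETVAL) != 0)
		goto out;

	if ((pid = fork()) == -1)
		goto out;
	if (pid == 0)                        /* child: blocks until released */
		_exit(sem_add(semid, -1, 0) == 0 ? 0 : 1);

	/* Wait up to about 2 s for the child to be counted as a waiter. */
	for (int i = 0; i < 200 && semctl(semid, 0, GETNCNT) != 1; i++)
		nanosleep(&tick, NULL);

	ok = semctl(semid, 0, GETNCNT) == 1          /* the child is waiting, */
	     && semctl(semid, 0, GETVAL) == 0        /* semval is still 0, */
	     && waitpid(pid, &status, WNOHANG) == 0; /* and it has not exited */

	sem_add(semid, +1, 0);                       /* release: child proceeds */
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)
	    || WEXITSTATUS(status) != 0)
		ok = 0;
out:
	semctl(semid, 0, IPC_RMID);
	return ok ? 0 : -1;
}
```

On a conforming system second_lock_blocks() returns 0; the Cygwin output below, where a second lock reports semval -1, is exactly the case this helper would flag with -1.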

I hope this will help,
thank you again for your fix.

Vincent

PS: The same conditions as before apply to this test (same Windows version; the Cygwin DLL includes your fix_shm_after_fork update).

------------------------
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/sem.h>
#include <signal.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>

#define USE_IPC
#define USE_SEM
// #define BIND_AFTER_FORK	// uncomment to create the socket in each child after fork()

#define BUFFERLEN 256

struct	database
{
	int		shmid;
	int 	semid;
	int 	test1;
	int 	test2;
}
*wdb;

int			get_shared_memory(char *path_key)
{
	key_t 	key;
	int		shmid;
	int		shmflg;
	char	file[BUFFERLEN];

	snprintf(file, BUFFERLEN-1, "%s.exe", path_key);
	if ((key = ftok(file, 'Z')) == -1)
	{
		perror("Getting key for shared memory");
		exit(1);
	}
	shmflg = IPC_CREAT|0600;
	if ((shmid = shmget(key, sizeof(struct database), shmflg)) == -1)
	{
		perror ("Getting shared memory");
		exit(1);
	}
	fprintf(stderr,"shmid: %i\n", shmid);
	return (shmid);
}

int					get_semaphores(char *path_key)
{
	key_t			key;
	int				semid;
	struct sembuf	op;
	int				semflg;
	char			file[BUFFERLEN];

	snprintf(file, BUFFERLEN-1, "%s.exe", path_key);
	if ((key = ftok(file, 'Z')) == -1)
	{
		perror("Getting key for semaphores");
		exit(1);
	}
	semflg = IPC_CREAT|0600;
	if ((semid = semget(key, 1, semflg)) == -1)
	{
		perror("Getting semaphores");
		exit(1);
	}
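	union { int val; struct semid_ds *buf; unsigned short *array; } arg;

	arg.val = 1;	/* POSIX: semctl()'s 4th argument is a union semun */
	if (semctl(semid, 0, SETVAL, arg) == -1)
	{
		perror("semctl SETVAL -> 1");
		exit(1);
	}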
	if (semctl(semid, 0, GETVAL) == 0)
	{
		op.sem_num = 0;
		op.sem_op = 1;
		op.sem_flg = 0;
		if (semop(semid, &op, 1) == -1)
		{
			perror("semaphore_release");
			exit(1);
		}
	}
	fprintf(stderr,"semval: %i semid: %i\n", semctl (semid, 0, GETVAL), semid);
	return (semid);
}

void		*attach_shared_memory(int shmid)
{
	void	*rv; // return value

	if ((rv = shmat(shmid, 0, 0)) == (void *) -1)
	{
		perror("shmat");
		return ((void *) -1);
	}

	return (rv);
}

int		detach_shared_memory(void *shmaddr)
{
	int	rv; // return value

	if ((rv = shmdt(shmaddr)) == -1)
	{
		perror("shmdt");
		return (-1);
	}

	return (rv);
}

void					set_signal_handlers (void)
{
	struct sigaction	ignore;

	ignore.sa_handler = SIG_IGN;
	sigemptyset(&ignore.sa_mask);
	ignore.sa_flags = 0;
	sigaction(SIGHUP, &ignore, NULL); // So we keep running as a daemon
}

int						get_socket(short port)
{
	int					sfd; //socket file descriptor
	struct sockaddr_in	addr;
	int					opt;

	opt = 1;
	sfd = socket(PF_INET, SOCK_STREAM, 0);
	if (sfd == -1)
	{
		perror("socket");
		exit(1);
	}
	else
	{
		if (setsockopt(sfd, SOL_SOCKET, SO_REUSEADDR, (int *) &opt, sizeof(opt)) == -1)
			perror ("setsockopt");
		addr.sin_family = AF_INET;
		addr.sin_port = htons(port);
		addr.sin_addr.s_addr = htonl(INADDR_ANY);
		if (bind(sfd, (struct sockaddr *) &addr, sizeof (addr)) == -1)
		{
			perror("bind");
			sfd = -1;
		} else {
			listen (sfd, 5);
		}
	}
	return (sfd);
}

int		accept_socket	(int sfd, struct sockaddr_in *addr)
{
	int			fd;
	socklen_t	len = sizeof(struct sockaddr_in);	/* accept() takes a socklen_t * */

	if ((fd = accept(sfd, (struct sockaddr *) addr, &len)) == -1)
	{
		perror("Accepting connection");	/* perror() adds its own newline */
		exit(1);
	}
	return (fd);
}

void 			semaphore_lock(int semid)
{
	struct sembuf	op;

	op.sem_num = 0;
	op.sem_op = -1;
	op.sem_flg = SEM_UNDO;

	fprintf(stderr,"Locking... semval: %i semid: %i\n",semctl (semid,0,GETVAL),semid);
	if (semop(semid, &op, 1) == -1)
	{
		int	err = errno;	/* save errno before perror() can touch it */

		perror("semaphore_lock");
		fprintf(stderr, "errno: %i\n", err);
		exit(1);
	}
	fprintf(stderr,"Locked !!! semval: %i semid: %i\n",semctl (semid,0,GETVAL),semid);
}

void			semaphore_release(int semid)
{
	struct sembuf	op;

	fprintf(stderr,"Unlocking... semval: %i semid: %i\n",semctl (semid,0,GETVAL),semid);
	op.sem_num = 0;
	op.sem_op = 1;
	op.sem_flg = SEM_UNDO;
	if (semop(semid, &op, 1) == -1)
	{
		int	err = errno;	/* save errno before perror() can touch it */

		perror("semaphore_release");
		fprintf(stderr, "errno: %i\n", err);
		exit(1);
	}
	fprintf(stderr,"Unlocked !!! semval: %i semid: %i\n",semctl (semid,0,GETVAL),semid);
}

int						main(int argc, char *argv[])
{
	int					sfd; // socket file descriptor
	int					csfd; // child sfd, the socket once accepted
	int					shmid; // shared memory id
	int					semid; // semaphore id
	struct sockaddr_in	addr; // Address of the remote host
	pid_t				child;
	pid_t				child_wait;
	int					n_children;
	int					rc; // Return code
	int					i; // For loops

	n_children = 0;
	set_signal_handlers();
	
#ifdef USE_IPC
	shmid = get_shared_memory(argv[0]);
	semid = get_semaphores(argv[0]);
	if ((wdb = attach_shared_memory(shmid)) == (void *) -1)
		exit (1);
	wdb->shmid = shmid;
	wdb->semid = semid;
#endif

#ifndef BIND_AFTER_FORK
	if ((sfd = get_socket(1234)) == -1)
		exit(1);
#endif

	printf ("Waiting for connections...\n");
	while (1)
	{
		if (n_children < 10)
		{
			if ((child = fork()) == 0)
			{
#ifdef BIND_AFTER_FORK
				if ((sfd = get_socket(1234)) == -1)
					exit(1);
#endif
#ifdef USE_SEM
				semaphore_lock(wdb->semid);
#endif
				if ((csfd = accept_socket(sfd, &addr)) != -1)
				{
					close(sfd);
					// handle connection here
					close(csfd);
				}
				else
					perror("Accepting connection");
#ifdef USE_SEM
				semaphore_release(wdb->semid);
#endif
				exit(0);
			}
			else if (child != -1)
				n_children++;
			else
				perror("Forking");
		}
		else
		{
			if ((child_wait = wait (&rc)) != -1)
				n_children--;
		}
	}
	exit(0);
}

------------------------
Output under Cygwin:
shmid: 65536
semval: 1 semid: 65536
Waiting for connections...
Locking... semval: 1 semid: 65536
Locked !!! semval: 0 semid: 65536
Locking... semval: 0 semid: 65536
     13 [main] a 2468 transport_layer_pipes::connect: lost connection to cygserver, error = 2
Locked !!! semval: -1 semid: 65536
     10 [main] a 4120 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      7 [main] a 1092 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4616 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      8 [main] a 4844 transport_layer_pipes::connect: lost connection to cygserver, error = 2
     11 [main] a 4024 transport_layer_pipes::connect: lost connection to cygserver, error = 2
     15 [main] a 4596 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      8 [main] a 4368 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4448 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 3800 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 2212 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5192 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 588 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5876 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4940 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      7 [main] a 2304 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      4 [main] a 6080 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 1488 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4076 transport_layer_pipes::connect: lost connection to cygserver, error = 2
     10 [main] a 2980 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4152 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      6 [main] a 1836 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      6 [main] a 3660 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      7 [main] a 5408 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4720 transport_layer_pipes::connect: lost connection to cygserver, error = 2
     10 [main] a 460 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5444 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 1752 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      4 [main] a 1944 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      8 [main] a 5796 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 2928 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5068 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 1096 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4156 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 3720 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5992 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      9 [main] a 5052 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 3424 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 364 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4360 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4440 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5548 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 3832 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 2756 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5148 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      9 [main] a 3880 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4356 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      8 [main] a 5836 transport_layer_pipes::connect: lost connection to cygserver, error = 2

