This is the mail archive of the libc-hacker@sourceware.org mailing list for the glibc project.

Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.



Timing window in NPTL fork.c causes hangs.


One of our larger applications is experiencing hangs, and we have tracked
this down to an interaction between fork/atfork and the malloc
implementation. We have a simplified test case (attached) that
illustrates the problem.

Basically, the NPTL fork is not atomic with respect to signals, because of
the atfork handling that must run before (atfork prepare) and after (atfork
parent and child) the fork syscall. The GLIBC runtime uses atfork processing
internally to ensure correct behaviour for the parent and child after the
fork. This includes IO and malloc. For example, calloc contains the
following code sequence:

    /* Suspend the thread until the `atfork' handlers have completed.
       By that time, the hooks will have been reset as well, so that
       mALLOc() can be used again. */
    (void)mutex_lock(&list_lock);
    (void)mutex_unlock(&list_lock);
    return public_mALLOc(sz);

This is no problem as long as fork processing continues and calls the
malloc atfork parent/child handlers, which release list_lock.
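
The lock/unlock barrier above can be imitated at application level with
pthread_atfork. The following is only a minimal sketch of the pattern, not
the glibc code: demo_lock is a hypothetical stand-in for malloc's list_lock,
and all demo_* names are made up for illustration. It assumes a default
(non-error-checking) mutex, which the child may unlock because the child's
only thread is a copy of the forking thread.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for malloc's list_lock; not the glibc object. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Prepare handler: runs in the forking thread before the fork syscall. */
void demo_prepare(void)
{
    int rc = pthread_mutex_lock(&demo_lock);
    assert(rc == 0);
    (void)rc;
}

/* Parent and child handlers: run after the fork syscall and release the lock. */
void demo_release(void)
{
    int rc = pthread_mutex_unlock(&demo_lock);
    assert(rc == 0);
    (void)rc;
}

/* Register the handlers once. Returns 0 on success. */
int demo_install_handlers(void)
{
    return pthread_atfork(demo_prepare, demo_release, demo_release);
}

/* What any other thread does: wait until the atfork handlers have finished.
   If the forking thread stalls between prepare and parent, this blocks. */
void demo_wait_for_fork(void)
{
    pthread_mutex_lock(&demo_lock);
    pthread_mutex_unlock(&demo_lock);
}

/* Fork once, reap the child, and check that the parent handler released
   the lock. Returns 0 on success, -1 otherwise. */
int demo_fork_once(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);
    if (waitpid(pid, NULL, 0) != pid)
        return -1;
    if (pthread_mutex_trylock(&demo_lock) != 0)
        return -1;
    pthread_mutex_unlock(&demo_lock);
    return 0;
}
```

As long as the forking thread reaches the parent handler, demo_wait_for_fork
returns promptly. The window of interest is the time between demo_prepare
and demo_release.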

However, the code in sysdeps/unix/sysv/linux/fork.c is exposed to signals
interrupting its operation. If the thread calling fork is interrupted by
a signal after it has run the atfork prepare handlers but before it
has run the atfork parent handlers, and the signal handler blocks
for any reason (sigsuspend or attempted IO), the process can hang. For
example, any other thread attempting to call malloc will wait for the
atfork handlers to release the "list_lock", but the thread processing the
fork is now blocked and cannot proceed. If the forking thread
depends on one of the other threads to wake it (via a signal), that
thread may block on the list_lock first, and now we have a deadlock.

So is it OK for NPTL's fork implementation to not be atomic relative to
signals?

From the POSIX spec we see statements like:

13089 ... Since the fork ( ) call can be considered as atomic
13090 from the application's perspective, the set would be initialized as empty and such signals would
13091 have arrived after the fork ( ); see also <signal.h>.

In this case fork is definitely not atomic.

So what should we do about this? One possible solution is to use the
signal mask to disable async signals for the duration of __libc_fork(),
or at least from just before atfork prepare processing until after atfork
parent/child processing.

We have experimented with this in our application (masking signals
before the fork call and restoring them afterwards in the parent and
child), and this does seem to eliminate the hang.

But should we change the libc NPTL fork implementation to use signal masks
to give the application the appearance that fork is atomic?

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <signal.h>
#include <sys/wait.h>

#define CALLOC_NMEMB 10000
/* 
* VERSION 1.1
* This testcase has 4 threads. The main thread simply starts the other threads
* and then sleeps on a pthread_join. The forkingThread repeatedly calls 
* fork. The signalingThread repeatedly signals the forking thread, which  
* causes the forking thread to do sigsuspend. The third thread repeatedly 
* callocs and frees memory. Only when it is done with the calloc does it 
* signal the suspended thread to continue. The theory is that when the forking 
* thread gets suspended in the right place, it is holding a lock that the
* callocing thread needs to continue, so the calloc thread hangs waiting on 
* that lock, and it cannot signal the forking thread to continue, creating a 
* deadlock. 
*/

/* Written from a signal handler and polled by signalingThread. */
volatile sig_atomic_t killflag = 1;
pthread_t forkThread;
pthread_t sigThread;
pthread_t calThread;


void  sigusr1Handler(int signum){
	sigset_t set1;
	sigfillset(&set1);
	sigdelset(&set1, SIGUSR2);
	sigsuspend(&set1);
	killflag = 1;
}

void  sigusr2Handler(int signum){
	return;
}

void* callocingThread(void *ptr)
{ 
	int * memptr;
  
	while(1)
	{ 
		memptr = calloc(CALLOC_NMEMB,4);
		if (!memptr){
			fprintf(stderr, "calloc failed\n");
		}
		pthread_kill(forkThread, SIGUSR2);
		free(memptr);
	}
}

void* signalingThread(void *ptr)
{ 
	while(1)
	{ 
		if (killflag) {
			killflag = 0;
			pthread_kill(forkThread, SIGUSR1);
		}
	}
}


void* forkingThread(void *ptr)
{ 
	pid_t pid;
	int i;

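	struct sigaction sigusr1_action;
	struct sigaction sigusr2_action;

	/* Zero both structs so that sa_flags is not left uninitialized. */
	memset(&sigusr1_action, 0, sizeof(sigusr1_action));
	memset(&sigusr2_action, 0, sizeof(sigusr2_action));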

	sigfillset(&sigusr1_action.sa_mask);
	sigfillset(&sigusr2_action.sa_mask);

	sigusr1_action.sa_handler = &sigusr1Handler;
	sigusr2_action.sa_handler = &sigusr2Handler;

	sigaction(SIGUSR1, &sigusr1_action, NULL);
	sigaction(SIGUSR2, &sigusr2_action, NULL);

	while(1)
	{ 
		pid = fork();
		fprintf(stderr, ".");
		if (pid == 0){
		/* child */
			exit(0);
		} else if (pid > 0) {
		/* parent */
			waitpid(pid, NULL, 0);
			continue;
		} else {
			fprintf(stderr, "fork failed\n");
		}
	}
}


int main(int argc , char *argv[])
{




	pthread_create(&forkThread, 0, &forkingThread, 0);
	pthread_create(&calThread, 0, &callocingThread, 0);
	pthread_create(&sigThread, 0, &signalingThread, 0);

	pthread_join(forkThread, NULL);

	return 0;
}

