This is the mail archive of the libc-hacker@sourceware.org mailing list for the glibc project.
Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.
One of our larger applications is experiencing hangs, and we have tracked this down to an interaction between fork/atfork and the malloc implementation. We have a simplified test case (attached) that illustrates the problem.

Basically, the NPTL fork is not atomic with respect to signals because of the atfork handling, which must run before (atfork prepare) and after (atfork parent and child) the fork syscall. The GLIBC runtime uses atfork processing internally to ensure correct behaviour for the parent and child after the fork. This includes IO and malloc. For example, calloc contains the following code sequence:

    /* Suspend the thread until the `atfork' handlers have completed.
       By that time, the hooks will have been reset as well, so that
       mALLOc() can be used again. */
    (void)mutex_lock(&list_lock);
    (void)mutex_unlock(&list_lock);
    return public_mALLOc(sz);

This is no problem as long as fork processing continues and calls the malloc atfork parent/child handler. However, the code in sysdeps/unix/sysv/linux/fork.c is exposed to signals interrupting its operation. If the thread calling fork is interrupted by a signal after it has processed the atfork prepare handlers but before it has processed the atfork parent handlers, and the signal handler blocks for any reason (sigsuspend or attempted IO), the process can hang. For example, any other thread attempting to call malloc will wait for the atfork handlers to release "list_lock", but the thread processing the fork is now blocked and cannot proceed. If the forking thread depends on one of the other threads to wake it (via a signal), that thread may block on list_lock first, and now we have a deadlock.

So is it OK for NPTL's fork implementation to not be atomic relative to signals? From the POSIX spec we see statements like:

    13089 ... Since the fork ( ) call can be considered as atomic
    13090 from the application's perspective, the set would be initialized as empty and such signals would
    13091 have arrived after the fork ( ); see also <signal.h>.
In this case fork is definitely not atomic. So what should we do about this?

One possible solution is to use the signal mask and disable async signals for the duration of __libc_fork(), or at least from just before atfork prepare processing until after atfork parent/child processing. We have experimented with this in our application (masking signals before the fork call and restoring them afterwards in the parent and child), and this does seem to eliminate the hang. But should we change the libc NPTL fork implementation to use signal masks to give the application the appearance that fork is atomic?
    #include <pthread.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <signal.h>

    #define CALLOC_NMEMB 10000

    /*
     * VERSION 1.1
     * This testcase has 4 threads. The main thread simply starts the other threads
     * and then sleeps on a pthread_join. The forkingThread repeatedly calls
     * fork. The signalingThread repeatedly signals the forking thread, which
     * causes the forking thread to do sigsuspend. The third thread repeatedly
     * callocs and frees memory. Only when it is done with the calloc does it
     * signal the suspended thread to continue. The theory is that when the forking
     * thread gets suspended in the right place, it is holding a lock that the
     * callocing thread needs to continue, so the calloc thread hangs waiting on
     * that lock, and it cannot signal the forking thread to continue, creating a
     * deadlock.
     */

    /* Written from a signal handler and polled by another thread. */
    volatile sig_atomic_t killflag = 1;
    pthread_t forkThread;
    pthread_t sigThread;
    pthread_t calThread;

    /* Park the forking thread until SIGUSR2 arrives. */
    void sigusr1Handler(int signum)
    {
        sigset_t set1;

        sigfillset(&set1);
        sigdelset(&set1, SIGUSR2);
        sigsuspend(&set1);
        killflag = 1;
    }

    void sigusr2Handler(int signum)
    {
        return;
    }

    void *callocingThread(void *ptr)
    {
        int *memptr;

        while (1) {
            memptr = calloc(CALLOC_NMEMB, 4);
            if (!memptr) {
                fprintf(stderr, "calloc failed\n");
            }
            /* Wake the forking thread only after calloc has returned. */
            pthread_kill(forkThread, SIGUSR2);
            free(memptr);
        }
    }

    void *signalingThread(void *ptr)
    {
        while (1) {
            if (killflag) {
                killflag = 0;
                pthread_kill(forkThread, SIGUSR1);
            }
        }
    }

    void *forkingThread(void *ptr)
    {
        pid_t pid;
        struct sigaction sigusr1_action;
        struct sigaction sigusr2_action;

        sigfillset(&sigusr1_action.sa_mask);
        sigfillset(&sigusr2_action.sa_mask);
        sigusr1_action.sa_flags = 0;
        sigusr2_action.sa_flags = 0;
        sigusr1_action.sa_handler = &sigusr1Handler;
        sigusr2_action.sa_handler = &sigusr2Handler;
        sigaction(SIGUSR1, &sigusr1_action, NULL);
        sigaction(SIGUSR2, &sigusr2_action, NULL);

        while (1) {
            pid = fork();
            fprintf(stderr, ".");
            if (pid == 0) {
                /* child */
                exit(0);
            } else if (pid > 0) {
                /* parent */
                waitpid(pid, NULL, 0);
                continue;
            } else {
                fprintf(stderr, "fork failed\n");
            }
        }
    }

    int main(int argc, char *argv[])
    {
        pthread_create(&forkThread, 0, &forkingThread, 0);
        pthread_create(&calThread, 0, &callocingThread, 0);
        pthread_create(&sigThread, 0, &signalingThread, 0);
        pthread_join(forkThread, NULL);
        return 0;
    }