not that this is a great idea, but my read of the documentation implies that forking in a signal handler should work. indeed it does in glibc, unless there has been a call to malloc. i originally thought this might be a kernel problem (see: http://lkml.org/lkml/2004/5/4/150) but they directed me here.

here's some ver_linux output:

-----
Linux e.mscd.edu 2.6.10-1.766_FC3 #1 Wed Feb 9 23:06:42 EST 2005 i686 i686 i386 GNU/Linux

Gnu C                  3.4.2
Gnu make               3.80
binutils               2.15.92.0.2
Linux C Library        2.3.4
Dynamic linker (ldd)   2.3.4
-----

here's the program:

-----
#include <stdio.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdlib.h>
#include <assert.h>

/* fork a child from inside the handler and wait for it */
void sig_handler (int signum)
{
    int child;

    if ((child = fork ()) == 0)
        exit (0);
    waitpid (child, NULL, 0);
}

int main (int argc, char **argv)
{
    int parent = getpid();
    int child;
    struct sigaction action;

    sigemptyset (&action.sa_mask);
    action.sa_flags = 0;    /* no special flags; matches the strace below */
    action.sa_handler = sig_handler;

    /* works if the following line is commented out */
    malloc (sizeof (int));

    assert (sigaction (SIGALRM, &action, NULL) == 0);

    /*
    ** create a child that sends the signal to be caught
    */
    if ((child = fork ()) == 0)
    {
        if (kill (parent, SIGALRM) == -1)
            perror ("kill");
        exit (0);
    }
    waitpid (child, NULL, 0);
}
-----

and here is a capture of an strace:

-----
beaty@emess->Problem$ strace ./problem
execve("./problem", ["./problem"], [/* 28 vars */]) = 0
uname({sys="Linux", node="emess.mscd.edu", ...}) = 0
brk(0) = 0x9360000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=124129, ...}) = 0
old_mmap(NULL, 124129, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7fcc000
close(3) = 0
open("/lib/tls/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 O\1\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1521612, ...}) = 0
old_mmap(NULL, 1219740, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe000
mprotect(0x921000, 27804, PROT_NONE) = 0
old_mmap(0x922000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x123000) = 0x922000
old_mmap(0x926000, 7324, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x926000
close(3) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fcb000
mprotect(0x922000, 8192, PROT_READ) = 0
mprotect(0xe5c000, 4096, PROT_READ) = 0
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7fcb940, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0xb7fcc000, 124129) = 0
getpid() = 12119
brk(0) = 0x9360000
brk(0x9381000) = 0x9381000
rt_sigaction(SIGALRM, {0x8048550, [], SA_RESTORER, 0x825a48}, NULL, 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7fcb988) = 12120
--- SIGALRM (Alarm clock) @ 0 (0) ---
--- SIGCHLD (Child exited) @ 0 (0) ---
futex(0x925c90, FUTEX_WAIT, 2, NULL <unfinished ...>
-----

any ideas/pointers would be most appreciated.
The problem is that the signal from the child arrives before the parent has time to finish the atfork work. This causes a deadlock. I added some code to the mainline version to prevent this.