Summary: getpid() wrong in child's signal handler after clone()
Product: glibc
Component: libc
Reporter: Michael Kerrisk <mtk.manpages>
Assignee: Ulrich Drepper <drepper.fsp>
Description Michael Kerrisk 2008-09-22 11:44:12 UTC
As of glibc 2.8, glibc's caching of the PID for getpid() means that if a signal is delivered to the child soon after a clone() (i.e., before the child has had a chance to update the cache), then a call to getpid() within the signal handler in the child returns the wrong value.

To test this, the attached program creates a child process that continuously sends a SIGQUIT signal to the process group. Meanwhile, the parent loops creating children that sleep for a moment and then terminate. During that time, the SIGQUIT handler will be invoked in the child. If the getpid() cache has not yet been updated, then occasionally the values returned by glibc's getpid() and by a raw syscall(SYS_getpid) will not match. When that occurs, the child prints a message noting the mismatch.

If this program is invoked with any command-line argument, then it uses fork() instead of clone(). This can be used to show that the problem does not occur for fork().
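The check described above can be sketched in C. This is a minimal, deterministic sketch, not the attached test program (which is not reproduced here). The helper names `quit_handler`, `child_fn` and `run_child` are hypothetical, and instead of racing a separate signal-sending process against the child's startup, the child sends SIGQUIT to itself after glibc's clone() wrapper has already run. It therefore illustrates the getpid() versus syscall(SYS_getpid) comparison and the clone()/fork() switch rather than reproducing the race window, and it should report agreement (a result of 0) in both modes. On glibc 2.25 and later, which no longer cache the PID in getpid(), the two values should always agree.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set (in the child's own copy of memory) by the SIGQUIT handler when
   glibc's getpid() disagrees with the kernel's answer. */
static volatile sig_atomic_t mismatch_seen = 0;

/* The comparison from the bug report: the (possibly cached) glibc
   getpid() versus a raw syscall(SYS_getpid). */
static void quit_handler(int sig)
{
    (void) sig;
    if (getpid() != (pid_t) syscall(SYS_getpid))
        mismatch_seen = 1;
}

/* Hypothetical child body: send SIGQUIT to itself (for a single-threaded
   process the handler runs before kill() returns), then report the
   result through the exit status. */
static int child_fn(void *arg)
{
    (void) arg;
    kill((pid_t) syscall(SYS_getpid), SIGQUIT);
    return mismatch_seen ? 1 : 0;
}

/* Hypothetical helper: create one child with clone() (or with fork() when
   use_fork is nonzero), wait for it, and return 0 if the PIDs agreed,
   1 if a mismatch was seen, or -1 on error. */
static int run_child(int use_fork)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = quit_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGQUIT, &sa, NULL) == -1)
        return -1;

    const size_t stack_size = 64 * 1024;
    char *stack = NULL;
    pid_t pid;

    if (use_fork) {
        pid = fork();
        if (pid == 0)
            _exit(child_fn(NULL));
    } else {
        stack = malloc(stack_size);
        if (stack == NULL)
            return -1;
        /* No CLONE_VM: as with fork(), the child gets its own copy of
           memory. SIGCHLD as the termination signal makes the child
           waitable with a plain waitpid(). The stack pointer passed is
           the top of the buffer, since the stack grows downward on most
           architectures. */
        pid = clone(child_fn, stack + stack_size, SIGCHLD, NULL);
    }

    int status = 0;
    int result = -1;
    if (pid != -1 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);
    free(stack);
    return result;
}
```

Calling `run_child(0)` exercises the clone() path and `run_child(1)` the fork() path, mirroring the command-line switch of the original program. Passing only `SIGCHLD` as the clone() flags is a choice made here so that the cloned child behaves like a fork()ed one from the parent's point of view.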
Comment 1 Michael Kerrisk 2008-09-22 11:47:13 UTC
Created attachment 2959: Test program

When running this program on glibc 2.8 on an i386 system, I see output such as the following:

    $ ./clone_getpid_sighandler_bug
    Before clone getpid() = 1991
    sigsender PID = 1993
    getpid() mismatch (loop=2710): getpid()=1991; syscall(SYS_getpid)=4823
    getpid() mismatch (loop=5383): getpid()=1991; syscall(SYS_getpid)=7504
    getpid() mismatch (loop=5383): getpid()=1991; syscall(SYS_getpid)=7504
Comment 2 Ulrich Drepper 2008-09-22 23:56:31 UTC
You cannot use clone this way. In fact, nobody should use clone. There are assumptions made in the system about the way clone is used. If you want to use clone you have to do everything yourself, including preparing the thread descriptor.
Comment 3 Michael Kerrisk 2008-09-23 01:12:42 UTC
(In reply to comment #2)
> You cannot use clone this way. In fact, nobody should use clone. There are
> assumptions made in the system about the way clone is used. If you want to use
> clone you have to do everything yourself, including preparing the thread descriptor.

All of this does kind of beg the question: why does glibc provide a clone() wrapper, then?