Bug 6910

Summary: getpid() wrong in child's signal handler after clone()
Product: glibc
Reporter: Michael Kerrisk <mtk.manpages>
Component: libc
Assignee: Ulrich Drepper <drepper.fsp>
Status: RESOLVED WONTFIX
Severity: normal
CC: glibc-bugs, michael.kerrisk
Priority: P2
Flags: fweimer: security-
Version: 2.8
Target Milestone: ---
Attachments: Test program

Description Michael Kerrisk 2008-09-22 11:44:12 UTC
As of glibc 2.8, glibc's caching of the PID for getpid() means that if a signal is
delivered to the child soon after a clone() (i.e., before the child has had a
chance to update the cache), then a call to getpid() within the signal handler
in the child returns the wrong value: the parent's PID, which is still in the cache.

To test this, the attached program creates a child process that continuously
sends a SIGQUIT signal to the process group. Meanwhile, the parent loops,
creating children that sleep for a moment and then terminate. During that time,
the SIGQUIT handler will be invoked in the child. If the signal arrives before the
getpid() cache has been updated, the values returned by glibc's getpid() and by a
raw syscall(SYS_getpid) will (occasionally) not match. When that occurs, the child
prints a message noting the mismatch.

If this program is invoked with any command-line argument, then it uses fork()
instead of clone().  This can be used to show that the problem does not occur
for fork().

Comment 1 Michael Kerrisk 2008-09-22 11:47:13 UTC
Created attachment 2959
Test program

When running this program on glibc 2.8 on an i386 system, I see output such as
the following:

$ ./clone_getpid_sighandler_bug
Before clone getpid() = 1991
sigsender PID = 1993
getpid() mismatch (loop=2710): getpid()=1991; syscall(SYS_getpid)=4823
getpid() mismatch (loop=5383): getpid()=1991; syscall(SYS_getpid)=7504
getpid() mismatch (loop=5383): getpid()=1991; syscall(SYS_getpid)=7504

Comment 2 Ulrich Drepper 2008-09-22 23:56:31 UTC
You cannot use clone this way.  In fact, nobody should use clone.  There are
assumptions made in the system about the way clone is used.  If you want to use
clone you have to do everything yourself, including preparing the thread descriptor.

Comment 3 Michael Kerrisk 2008-09-23 01:12:42 UTC
(In reply to comment #2)
> You cannot use clone this way.  In fact, nobody should use clone.  There are
> assumptions made in the system about the way clone is used.  If you want to use
> clone you have to do everything yourself, including preparing the thread descriptor.

All of this does kind of beg the question: why does glibc provide a clone() 
wrapper then?