6910 – getpid() wrong in child's signal handler after clone()

Status:	RESOLVED WONTFIX

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	libc
Version:	2.8

Importance:	P2 normal
Target Milestone:	---
Assignee:	Ulrich Drepper

Reported:	2008-09-22 11:44 UTC by Michael Kerrisk
Modified:	2014-07-02 07:22 UTC
CC List:	2 users

Flags:	fweimer: security-

Attachments
Test program (1.39 KB, text/plain) 2008-09-22 11:47 UTC, Michael Kerrisk

Description Michael Kerrisk 2008-09-22 11:44:12 UTC

As of glibc 2.8, glibc caches the PID returned by getpid().  If a signal is
delivered to the child soon after a clone() (i.e., before the child has had a
chance to update the cache), then a call to getpid() within the signal handler
in the child returns the wrong value.

To test this, the attached program creates a child process that continuously
sends SIGQUIT to the process group.  Meanwhile, the parent loops, creating
children that sleep for a moment and then terminate.  During that sleep, the
SIGQUIT handler is invoked in the child.  If the getpid() cache has not yet
been updated, the values returned by glibc's getpid() and a raw
syscall(SYS_getpid) will occasionally differ.  When that occurs, the child
prints a message noting the mismatch.

If this program is invoked with any command-line argument, then it uses fork()
instead of clone().  This can be used to show that the problem does not occur
for fork().
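
The attached test program is not reproduced on this page.  As a rough
illustration only, the core check it describes (comparing glibc's getpid() with
a raw syscall(SYS_getpid) inside a signal handler) might be sketched as below.
This is not the attachment: the names on_sigquit and check_pid_in_handler are
hypothetical, the child is created with fork() rather than clone(), and the
signal is raised by the child itself instead of by a separate sender, so the
sketch shows the comparison and is not expected to trigger the race.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Flags written by the handler and read by the child afterwards. */
static volatile sig_atomic_t handler_ran = 0;
static volatile sig_atomic_t pid_mismatch = 0;

/* Hypothetical handler: compare the library's view of the PID with the
 * kernel's view obtained through a raw system call. */
static void on_sigquit(int sig)
{
    (void)sig;
    pid_t from_libc = getpid();
    pid_t from_kernel = (pid_t)syscall(SYS_getpid);

    if (from_libc != from_kernel)
        pid_mismatch = 1;
    handler_ran = 1;
}

/* Hypothetical helper: fork a child, run the comparison in the child's
 * SIGQUIT handler, and report the result to the parent.
 * Returns 0 if both PIDs agreed, 1 on a mismatch, 2 if the handler never
 * ran, 3 if the handler could not be installed, and -1 on other errors. */
int check_pid_in_handler(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        struct sigaction sa;
        sa.sa_handler = on_sigquit;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        if (sigaction(SIGQUIT, &sa, NULL) == -1)
            _exit(3);

        /* In a single-threaded process, raise() returns only after the
         * handler has run. */
        raise(SIGQUIT);
        _exit(handler_ran ? (int)pid_mismatch : 2);
    }

    int status;
    if (waitpid(child, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling check_pid_in_handler() from main() and printing its return value is
enough to try it out.  Because the child comes from fork(), the result should
be 0, in line with the observation above that the problem does not occur for
fork().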

Comment 1 Michael Kerrisk 2008-09-22 11:47:13 UTC

Created attachment 2959
Test program

When running this program on glibc 2.8 on an i386 system, I see output such as
the following:

$ ./clone_getpid_sighandler_bug
Before clone getpid() = 1991
sigsender PID = 1993
getpid() mismatch (loop=2710): getpid()=1991; syscall(SYS_getpid)=4823
getpid() mismatch (loop=5383): getpid()=1991; syscall(SYS_getpid)=7504
getpid() mismatch (loop=5383): getpid()=1991; syscall(SYS_getpid)=7504

Comment 2 Ulrich Drepper 2008-09-22 23:56:31 UTC

You cannot use clone this way.  In fact, nobody should use clone.  There are
assumptions made in the system about the way clone is used.  If you want to use
clone you have to do everything yourself, including preparing the thread descriptor.

Comment 3 Michael Kerrisk 2008-09-23 01:12:42 UTC

(In reply to comment #2)
> You cannot use clone this way.  In fact, nobody should use clone.  There are
> assumptions made in the system about the way clone is used.  If you want to
> use clone you have to do everything yourself, including preparing the thread
> descriptor.

All of this does kind of beg the question: why does glibc provide a clone() 
wrapper then?