This is the mail archive of the
mailing list for the glibc project.
[PATCH v2] test-skeleton: Kill any child process's offspring
- From: "Maciej W. Rozycki" <macro at codesourcery dot com>
- To: <libc-alpha at sourceware dot org>
- Cc: Ondřej Bílka <neleai at seznam dot cz>, Roland McGrath <roland at hack dot frob dot com>
- Date: Mon, 23 Jun 2014 01:25:34 +0100
- Subject: [PATCH v2] test-skeleton: Kill any child process's offspring
- Authentication-results: sourceware.org; auth=none
- References: <alpine dot DEB dot 1 dot 10 dot 1309231748080 dot 4379 at tp dot orcam dot me dot uk> <20131011163332 dot GA20086 at domone dot podge> <20140210171709 dot GA27843 at domone>
On Mon, 10 Feb 2014, Ondřej Bílka wrote:
> friendly ping
> On Fri, Oct 11, 2013 at 06:33:32PM +0200, Ondřej Bílka wrote:
> > On Mon, Sep 23, 2013 at 09:24:36PM +0100, Maciej W. Rozycki wrote:
> > > Hi,
> > >
> > > Some of the test cases driven by our test-skeleton create subprocesses.
> > > In a test scenario involving what looks to me like a Linux kernel bug I
> > > have observed stray processes of this kind left behind where their parent
> > > was killed by test-skeleton after a timeout however the process in
> > > question was not. This leads to a clutter in the environment our test
> > > suite is run within and on weaker systems may also affect test results
> > > where the processing power consumed by such a leftover causes the lack of
> > > same for subsequent test cases and consequently timeouts.
> > >
> > > The cause of leaving such processes behind is signal_handler that calls
> > > kill to send SIGKILL to test-skeleton's immediate child only and not any
> > > further descendants. Conveniently as the child starts it places itself in
> > > a separate process group that is then inherited by any further children.
> > > Therefore a simple if not obvious fix to this problem is to send SIGKILL
> > > to the child's process group instead which this change implements. There
> > > is no need to wait for any additional descendants left as the exit status
> > > of the immediate child is enough for our purposes and init(8) will take
> > > care of the rest in the unlikely case this change addresses.
> > >
> > > The problem itself was observed with nptl/tst-mutexpi9 and that this fix
> > > makes go away -- no stray processes after the test suite has terminated
> > > anymore (nptl/tst-mutexpi9 has its own problem of course on this target,
> > > but that's another matter; I think there's a value in making the test
> > > suite itself more robust).
> > >
> > > OK to apply?
> > >
> > What is status of this patch?
Here's a version updated according to concerns expressed. Code now
checks for an error from setpgid and in the unlikely event of one it
retains the current behaviour and kills itself only rather than the
The issue with nptl/tst-mutexpi9 has since gone for one reason or
another, so this has been only lightly tested, making sure that another
test case observed to time out kills itself.
OK to apply?
2014-06-23 Maciej W. Rozycki <email@example.com>
* test-skeleton.c (signal_handler): Undo any PID negation made
(main): Negate the PID if setpgid succeeded, otherwise report an
--- glibc-fsf-trunk-quilt.orig/test-skeleton.c 2014-06-17 16:03:16.621929397 +0100
+++ glibc-fsf-trunk-quilt/test-skeleton.c 2014-06-21 00:03:23.141622346 +0100
@@ -138,8 +138,9 @@ signal_handler (int sig __attribute__ ((
- /* Send signal. */
+ /* Send signal. Undo any PID negation made after setpgid. */
kill (pid, SIGKILL);
+ pid = abs (pid);
/* Wait for it to terminate. */
@@ -341,8 +342,14 @@ main (int argc, char *argv)
/* We put the test process in its own pgrp so that if it bogusly
- generates any job control signals, they won't hit the whole build. */
- setpgid (0, 0);
+ generates any job control signals, they won't hit the whole build.
+ Negate the PID if successful so that kill targets the PGID, to
+ reach any child process's offspring. */
+ status = setpgid (0, 0);
+ if (status == 0)
+ pid = -pid;
+ printf ("Failed to set the process group ID: %m\n");
/* Execute the test function and exit with the return value. */