|Summary:||Multiple stopped threads aren't terminating|
|Product:||frysk||Reporter:||Andrew Cagney <cagney>|
|Component:||general||Assignee:||Chris Moller <cmoller>|
|Bug Depends on:||3502|
C testcase for this bug
Testcase for this bug
Description Andrew Cagney 2006-10-18 14:44:05 UTC
Downstream bug: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=210693

The frysk TestProcStopped testMultiThreadedStoppedAckDaemon case hangs during teardown. Attempts are made to kill each of two ptrace-attached threads:

kill -KILL 2741
kill -KILL 2741
kill -CONT 2741
kill -CONT 2741
detach -KILL 2741
detach -KILL 2741

but a subsequent waitpid(-1,...) blocks indefinitely, suggesting that the kill signals were never delivered.

Kernel: 2.6.18-1.2725.el5

How reproducible: 100%

Steps to Reproduce:
1. Install a kernel with the latest utrace patch.
2. Install and build frysk.
3. cd to the frysk build directory/frysk-core.
4. Run ./TestRunner -c FINE frysk.proc.TestProcStopped

Actual results: Test hangs after the testMultiThreadedStoppedAckDaemon case.

Expected results: Test runs to completion.

Additional info: May be related to bug 207674 (PTRACE_DETACH doesn't deliver signals under utrace), https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=207674, the difference being that 207674 dealt with runnable procs but this bug deals with stopped procs. This is entirely my conjecture at this point, based on very little investigation.
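For reference, the teardown sequence listed above (KILL, CONT, then detach with KILL) can be sketched in C for a single ptrace-attached child. This is only an illustration under assumptions, not the frysk code: the helper name teardown_attached_child is hypothetical, it uses a forked child process rather than threads, and it polls waitpid() for a bounded time (about 5 seconds) instead of blocking, so that an affected kernel shows up as a 0 return rather than a hang.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of frysk.
 * Returns 1 if the child was reaped after the teardown sequence,
 * 0 if it was still not reapable after the timeout (the hang reported
 * here), and -1 if the child could not be created or attached.
 */
int teardown_attached_child(void)
{
    struct timespec delay = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    int tries;
    pid_t child = fork();

    if (child < 0) {
        perror("fork");
        return -1;
    }
    if (child == 0) {
        for (;;)
            pause();            /* child idles until it is killed */
    }

    if (ptrace(PTRACE_ATTACH, child, NULL, NULL) < 0) {
        perror("PTRACE_ATTACH");
        kill(child, SIGKILL);   /* clean up the untraced child */
        waitpid(child, NULL, 0);
        return -1;
    }
    waitpid(child, NULL, 0);    /* wait for the attach stop */

    /* Same order as the log: KILL, CONT, then detach delivering KILL. */
    kill(child, SIGKILL);
    kill(child, SIGCONT);
    /* The detach may fail with ESRCH if the child is already dying. */
    ptrace(PTRACE_DETACH, child, NULL, (void *)(long)SIGKILL);

    /* Bounded poll instead of a blocking waitpid(). */
    for (tries = 0; tries < 500; tries++) {
        pid_t pid = waitpid(child, NULL, WNOHANG);
        if (pid == child)
            return 1;           /* child was killed and reaped */
        if (pid < 0)
            return -1;          /* unexpected waitpid() error */
        nanosleep(&delay, NULL);
    }
    return 0;                   /* never became reapable */
}
```

On a kernel where SIGKILL reaches a stopped, attached process, the helper would be expected to return 1.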
Comment 1 Chris Moller 2006-10-23 15:17:45 UTC
Created attachment 1387 [details]
C testcase for this bug

Anyway, what the test does is start a few procs, reparent them to init, ptrace-attach them, and then go through the frysk tearDown sequence to detach and kill them. This works, init seems to get notified that the procs have been killed, but frysk doesn't.
Comment 2 Andrew Cagney 2006-10-24 14:36:03 UTC
Chris, I'm seeing this test also fail on FC-5? FAIL: frysk3381/reparent
Comment 3 Andrew Cagney 2006-11-10 18:01:27 UTC
I saw notes on what fail/pass behavior this test was looking for, can that be added here? Also, on FC-5, this test appears to largely pass (at least follow the expected behavior). Dies just at end, probably a nit, will create separate bug for that fix.
Comment 4 Chris Moller 2006-11-15 05:06:42 UTC
Created attachment 1421 [details]
Testcase for this bug

What appears to have been happening is that buggy kernels either don't deliver kill(pid, SIGKILL) signals to attached processes, or prevent the process from acting on that signal.

This testcase creates and attaches a few child procs, waitpid()s to make sure the attach succeeds, then kill(pid, SIGKILL)s the procs. It then spins on a waitpid(-1, NULL, WNOHANG) until a non-positive pid is returned. If /no/ positive pids are returned, it is assumed that the kill()s did not succeed and the test fails; otherwise it passes.

This test works as expected by passing on FC5 machines and an FC6 machine with a 2.6.18-1.2849.fc6 kernel, and failing otherwise.
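A minimal sketch of the check described in this comment might look like the following. It is not the attached testcase: the function name count_reaped_after_sigkill, the MAX_CHILDREN limit, and the retry count are all assumptions made for illustration. One deliberate difference from the description: rather than stopping at the first 0 returned by waitpid(-1, NULL, WNOHANG), the sketch keeps polling for a bounded time (about 5 seconds), so that it does not race with signal delivery.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_CHILDREN 16   /* arbitrary limit for this sketch */

/*
 * Hypothetical helper, not the attached testcase.
 * Forks nchildren idle children, ptrace-attaches each one, sends each
 * a SIGKILL, and returns how many children could then be reaped.
 * A return of 0 corresponds to the "fail" case described above.
 */
int count_reaped_after_sigkill(int nchildren)
{
    pid_t pids[MAX_CHILDREN];
    struct timespec delay = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    int started = 0;
    int reaped = 0;
    int i, tries;

    assert(nchildren > 0 && nchildren <= MAX_CHILDREN);

    for (i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            for (;;)
                pause();        /* child idles until it is killed */
        }
        pids[started++] = pid;

        /* Attach, then wait for the resulting stop to be reported. */
        if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) < 0)
            perror("PTRACE_ATTACH");
        else if (waitpid(pid, NULL, 0) < 0)
            perror("waitpid");
    }

    for (i = 0; i < started; i++)
        kill(pids[i], SIGKILL);

    /* Poll until no children remain (waitpid() < 0) or we give up. */
    for (tries = 0; tries < 500; tries++) {
        pid_t pid = waitpid(-1, NULL, WNOHANG);
        if (pid > 0)
            reaped++;           /* a killed child was collected */
        else if (pid < 0)
            break;              /* ECHILD: nothing left to wait for */
        else
            nanosleep(&delay, NULL);
    }
    return reaped;
}
```

A caller would treat a result greater than zero as a pass, matching the criterion above; on an affected kernel the expectation is that nothing is reaped and the function returns 0 after the timeout.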
Comment 5 Andrew Cagney 2006-11-23 21:00:55 UTC
Test f3381 is passing on broken FC 5 and FC 6 systems!
Comment 6 Chris Moller 2006-11-30 16:46:41 UTC
All f3381 does is check that a kill(pid, SIGKILL) to an attached stopped process actually succeeds in killing the process. Ptrace and some older versions of utrace didn't do that, and it appears that Roland did something to change the behaviour, presumably because SIGKILLs should always work. All that passing the test means is that SIGKILLs work under the circumstances described--it doesn't imply a thing about otherwise "broken FC 5 and FC 6 systems."

The test that demonstrated this failure mode (frysk.proc.TestProcStopped.testMultiThreadedStoppedAckDaemon) is still failing, but it fails only intermittently now and appears to be failing by a different mechanism. I'm trying to isolate the mechanism now--when I figure it out, I'll try to come up with another C testcase that demonstrates it reliably.
Comment 7 Andrew Cagney 2006-11-30 20:04:18 UTC
This looks very similar to: http://sourceware.org/bugzilla/show_bug.cgi?id=3595 for which I created a test and it gets the results:

fail: 2.6.18-1.2239.fc5 (my machine)
fail: 2.6.18-1.2849.fc6 (towns)
pass: 2.6.17-1.2174_FC5 (toadstool)