Summary: | Multiple stopped threads aren't terminating | ||
---|---|---|---|
Product: | frysk | Reporter: | Andrew Cagney <cagney> |
Component: | general | Assignee: | Chris Moller <cmoller> |
Status: | RESOLVED DUPLICATE | ||
Severity: | normal | CC: | cmoller |
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Bug Depends on: | 3502 | ||
Bug Blocks: | 3595 | ||
Attachments: | C testcase for this bug; Testcase for this bug | ||
Description
Andrew Cagney
2006-10-18 14:44:05 UTC
Created attachment 1387
C testcase for this bug
Anyway, what the test does is start a few procs, reparent them to init,
ptrace-attach them, and then go through the frysk tearDown sequence
to detach and kill them. This works; init seems to get notified that
the procs have been killed, but frysk doesn't.
Chris, I'm seeing this test fail on FC-5 as well?

FAIL: frysk3381/reparent

I saw notes on what fail/pass behavior this test was looking for; can they be added here?

Also, on FC-5, this test appears to largely pass (at least it follows the expected behavior). It dies just at the end, which is probably a nit; I will create a separate bug for that fix.

Created attachment 1421
Testcase for this bug
What appears to have been happening is that buggy kernels either don't deliver
kill(pid, SIGKILL) signals to attached processes, or prevent the process from
acting on that signal. This testcase creates and attaches a few child procs,
waitpid()s to make sure the attach succeeds, then kill(pid, SIGKILL)s the
procs. It then spins on a waitpid(-1, NULL, WNOHANG) until a non-positive pid
is returned. If /no/ positive pids are returned, it is assumed that the
kill()s did not succeed and the test fails; otherwise it passes.
This test works as expected by passing on FC5 machines and an FC6 machine with
a 2.6.18-1.2849.fc6 kernel, and failing otherwise.
Test f3381 is passing on broken FC 5 and FC 6 systems!

All f3381 does is check that a kill(pid, SIGKILL) to an attached, stopped process actually succeeds in killing the process. Ptrace and some older versions of utrace didn't do that, and it appears that Roland did something to change the behaviour, presumably because SIGKILLs should always work. All that passing the test means is that SIGKILLs work under the circumstances described; it doesn't imply anything about otherwise "broken FC 5 and FC 6 systems."

The test that demonstrated this failure mode (frysk.proc.TestProcStopped.testMultiThreadedStoppedAckDaemon) is still failing, but it now fails only intermittently and appears to be failing by a different mechanism. I'm trying to isolate the mechanism now; when I figure it out, I'll try to come up with another C testcase that demonstrates it reliably.

This looks very similar to http://sourceware.org/bugzilla/show_bug.cgi?id=3595, for which I created a test. It gets these results:

fail: 2.6.18-1.2239.fc5 (my machine)
fail: 2.6.18-1.2849.fc6 (towns)
pass: 2.6.17-1.2174_FC5 (toadstool)