Bug 3639 - fc6: testTerminateKillKILL(frysk.proc.TestTaskTerminateObserver)
Summary: fc6: testTerminateKillKILL(frysk.proc.TestTaskTerminateObserver)
Status: RESOLVED FIXED
Alias: None
Product: frysk
Classification: Unclassified
Component: general
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Andrew Cagney
URL:
Keywords:
Duplicates: 3640
Depends on: 3640
Blocks: 3385 2654 3489
Reported: 2006-12-04 18:07 UTC by Chris Moller
Modified: 2011-03-16 21:19 UTC

Attachments
side by side trace comparison (523 bytes, text/plain)
2007-03-27 04:39 UTC, Chris Moller

Description Chris Moller 2006-12-04 18:07:58 UTC
testTerminateKillKILL(frysk.proc.TestTaskTerminateObserver)
junit.framework.AssertionFailedError: terminating value expected:<-9> but was:<128>
   at frysk.proc.TestTaskTerminateObserver.check(TestRunner)
   at frysk.proc.TestTaskTerminateObserver.terminate(TestRunner)
   at frysk.proc.TestTaskTerminateObserver.testTerminateKillKILL(TestRunner)
   at frysk.junit.Runner.runCases(TestRunner)
   at frysk.junit.Runner.runArchCases(TestRunner)
   at frysk.junit.Runner.runTestCases(TestRunner)
   at TestRunner.main(TestRunner)
Comment 1 Andrew Cagney 2006-12-04 20:05:16 UTC

*** This bug has been marked as a duplicate of 3489 ***
Comment 2 Andrew Cagney 2007-01-31 22:15:26 UTC
The fixes that make the exit47 test case pass do not fix this bug.  Assuming
this is a separate problem and re-splitting.
Comment 3 Andrew Cagney 2007-01-31 22:16:28 UTC
*** Bug 3640 has been marked as a duplicate of this bug. ***
Comment 4 Andrew Cagney 2007-01-31 22:22:01 UTC
RHEL https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=226684
Comment 5 Chris Moller 2007-03-26 17:26:04 UTC
What's happening is that in TestTaskTerminateObserver.java,
Terminate.updateTerminating(...) is never executed when funit-exit is passed a
-Sig.KILL_.  Before I go to all the effort of possibly re-chasing an already
chased bug, I thought I'd ask whether this could be related to the
x-state bug; i.e., is a "terminating" condition related to the transient X
state?  If so, when you SIGKILL a process, is it possible to miss "terminating"
and go straight to "terminated"?

(A little background for Roland, to whom I've cc-ed this:  this is a bug where a
process being sent a kill(SIGKILL) should trigger a "terminating" observer, but
doesn't seem to be doing that.)
Comment 6 Andrew Cagney 2007-03-26 17:55:05 UTC
The waitpid call should return a reasonable value.  For instance:

-> some sort of error indication, for instance ESRCH, EINTR, ...

-> PID killed with -9

But instead it's getting back that the process exited with 128.

Our tests show that exit47, the previous bug, is fixed.
Comment 7 Chris Moller 2007-03-26 19:34:15 UTC
Okay, I'll chase that, but it's not the only possibility. 
TestTaskTerminateObserver.Terminate initialises int terminating = INVALID (where
INVALID = 128), but public Action updateTerminating (...) never runs, leaving
terminating = INVALID, which is the proximate cause of the failure.  I'll wire
up the waitpid() and check if it's causing the problem.
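
A minimal sketch of the sentinel problem described in this comment. The class below is a hypothetical stand-in, not frysk's actual observer; only the names INVALID, updateTerminating, and updateTerminated come from the discussion, and the -9 passed to updateTerminated is an assumed encoding borrowed from the expected value in the failure message.

```java
// Hypothetical stand-in for the test's observer; not frysk's real classes.
final class TerminateSketch {
    static final int INVALID = 128;   // sentinel value named in the comment above

    int terminating = INVALID;        // stays INVALID unless updateTerminating runs
    int terminated = INVALID;         // stays INVALID unless updateTerminated runs

    // Invoked only if a "terminating" event is delivered.
    void updateTerminating(int value) { terminating = value; }

    // Invoked when the task is reported dead.
    void updateTerminated(int value) { terminated = value; }

    public static void main(String[] args) {
        TerminateSketch observer = new TerminateSketch();
        // Simulate the SIGKILL case: only the "terminated" callback fires.
        // The value -9 is an assumption for illustration.
        observer.updateTerminated(-9);
        System.out.println("terminating = " + observer.terminating);
        System.out.println("terminated  = " + observer.terminated);
    }
}
```

Because the sentinel shares its value with a plausible exit code, a check on the terminating field cannot tell "callback never ran" apart from "process exited with 128".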

The exit47 test does indeed pass.
Comment 8 Andrew Cagney 2007-03-26 21:10:55 UTC
Ah, so it is seeing no notification at all that the task was terminated?
Comment 9 Chris Moller 2007-03-26 21:23:32 UTC
Yeah, for SIGKILL, updateTerminating never gets hit; updateTerminated does.  The
pattern of waitpid returns is similar in all test cases (terminate(),
terminated(), and terminating()), nothing unexpected, but the first two fail for
a lack of a terminating event.
Comment 10 Andrew Cagney 2007-03-26 21:51:34 UTC
Nice analysis.

Can you check with Roland whether this is an intended behavior change?
Comment 11 Roland McGrath 2007-03-26 21:52:19 UTC
I don't know what's going on in the frysk world, but a couple things from the
kernel side might be relevant.

First is that for death by SIGKILL you may well not see any EXIT event
(WIFSTOPPED, SIGTRAP|PTRACE_EVENT_EXIT<<16).
You will see the death event (WIFSIGNALED) for sure except possibly in the case
of multi-threaded exec by a non-leader thread (when you won't see a report from
the old leader, but the exec'ing thread will change its PID to the leader's).

Second is there is a rare race bug in kernels before the recent test kernels,
that can produce a bogus wait status value.  The bogus value will be a
WIFSTOPPED with WSTOPSIG 0 or some high bits set.  This is a very unlikely race.
Also, it doesn't produce a WIFEXITED value in the bug case, so it doesn't seem
likely to be relevant to what you are seeing.

Nothing comes to mind with a bogus status of 0x8000.  An _exit(128) produces
that status.
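
To make the status values in this comment concrete, here is a small sketch that decodes a raw wait status by hand. It assumes the usual Linux/glibc bit layout (low 7 bits for the terminating signal, 0x7f in the low byte for a stop, exit code or stop signal in bits 8-15, ptrace event in bits 16 and up); the class and method names are made up for illustration and are not part of frysk.

```java
// Illustrative decoder for Linux wait statuses; names are hypothetical.
final class WaitStatusSketch {
    static final int SIGTRAP = 5;
    static final int SIGKILL = 9;
    static final int PTRACE_EVENT_EXIT = 6;

    static boolean wifexited(int s)   { return (s & 0x7f) == 0; }
    static int     wexitstatus(int s) { return (s >> 8) & 0xff; }
    static boolean wifsignaled(int s) { int low = s & 0x7f; return low != 0 && low != 0x7f; }
    static int     wtermsig(int s)    { return s & 0x7f; }
    static boolean wifstopped(int s)  { return (s & 0xff) == 0x7f; }
    static int     wstopsig(int s)    { return (s >> 8) & 0xff; }
    static int     wstopevent(int s)  { return (s >> 16) & 0xffff; }

    // Turn a raw status into a short human-readable description.
    static String describe(int s) {
        if (wifstopped(s)) {
            return "stopped, sig " + wstopsig(s) + ", event " + wstopevent(s);
        } else if (wifsignaled(s)) {
            return "killed, sig " + wtermsig(s);
        } else if (wifexited(s)) {
            return "exited, code " + wexitstatus(s);
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // The EXIT event stop: SIGTRAP in bits 8-15, PTRACE_EVENT_EXIT above that.
        int exitEvent = 0x7f | (SIGTRAP << 8) | (PTRACE_EVENT_EXIT << 16);
        System.out.println(describe(exitEvent));
        System.out.println(describe(SIGKILL));  // death by SIGKILL
        System.out.println(describe(0x8000));   // what _exit(128) produces
    }
}
```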
Comment 12 Chris Moller 2007-03-27 04:39:58 UTC
Created attachment 1653
side by side trace comparison

This attachment shows the diagnostic output of two failing tests and one
passing test, all of which do a kill(SIGKILL).  None of the tests get a
"terminating" event; the one that passes does so only because it's not
/expecting/ a terminating event.

None of the waitpid()s look wrong to me except possibly lines 13 and 14: should
there really be two waitpid()s in a row returning WIFSIGNALED(9) on the same
task?
Comment 13 Roland McGrath 2007-03-27 04:49:05 UTC
Your trace doesn't indicate whether different threads are doing different wait
calls or did ptrace calls or forks.  If thread A forks a child, and thread B
does PTRACE_ATTACH to that child, then on death there is one report "to B" (but
available to all threads in the same process calling wait*) and then there is a
second one "to A".  The second one happens because you are the real parent of
the child that is no longer ptrace'd after the ptracer's wait returns
WIFSIGNALED/WIFEXITED.  The first one happens because you are the ptracer but
not the real parent, but there are two of you so all things can be true and false.
Comment 14 Chris Moller 2007-03-27 17:07:33 UTC
Here's what's happening:

1. Wait.cxx:processStatus() decodes the waitpid status and if (WIFSTOPPED
(status) && (PTRACE_EVENT_EXIT ==  WSTOPEVENT (status))) it calls exitEvent()

2. LinuxPtraceHost.PollWaitOnSigChld.exitEvent() calls processTerminatingEvent()

3. Task.processTerminatingEvent() calls .handleTerminatingEvent()

4. LinuxPtraceTaskState.handleTerminatingEvent() calls notifyTerminating()

5. Task.notifyTerminating() calls updateTerminating()

6. TestTaskTerminateObserver.updateTerminating() sets the int terminating value.

If, in Wait.cxx:processStatus(), status == 9 (KILL), then WIFSIGNALED (status)
is true rather than WIFSTOPPED (status), so none of the foregoing happens,
causing the test to fail.  What I don't know is whether the process described
above is in fact what the programmer who wrote it intended, and the test
exercises conditions that weren't meant to be exercised, or whether the process
described is flawed or incomplete.
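
The dispatch described in the steps above can be sketched roughly as follows. This is a simplified model, not the code in Wait.cxx: the class, the eventsFor helper, and the event strings are hypothetical, and the bit tests assume the usual Linux wait-status layout.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the status dispatch; all names here are hypothetical.
final class DispatchSketch {
    static final int PTRACE_EVENT_EXIT = 6;

    // Map a sequence of raw wait statuses to the observer events they would trigger.
    static List<String> eventsFor(int[] statuses) {
        List<String> events = new ArrayList<>();
        for (int status : statuses) {
            boolean stopped = (status & 0xff) == 0x7f;
            int low = status & 0x7f;
            if (stopped && (status >> 16) == PTRACE_EVENT_EXIT) {
                // Only this branch leads to the "terminating" notification.
                events.add("terminating");
            } else if (low != 0 && low != 0x7f) {
                // WIFSIGNALED: the task is already dead.
                events.add("terminated(signal " + low + ")");
            } else if (low == 0) {
                // WIFEXITED: normal exit.
                events.add("terminated(exit " + ((status >> 8) & 0xff) + ")");
            }
            // Other stops are ignored in this sketch.
        }
        return events;
    }

    public static void main(String[] args) {
        // An exit(47) preceded by the EXIT event stop.
        System.out.println(eventsFor(new int[] {0x6057f, 0x2f00}));
        // Death by SIGKILL with no EXIT event reported beforehand.
        System.out.println(eventsFor(new int[] {9}));
    }
}
```

In this model a SIGKILL death reaches only the "terminated" branch, which matches the behaviour seen in the failing tests.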
Comment 15 Andrew Cagney 2007-04-05 16:22:41 UTC
(In reply to comment #14)

> If, in Wait.cxx:processStatus(), status == 9,  (KILL), WIFSIGNALED (status) is
> true rather than WIFSTOPPED (status), so none of the foregoing happens, causing
> the test to fail.  What I don't know is if the process described above is in
> fact what the programmer who wrote it intended and the test exercises conditions
> that weren't meant to be exercised, or if the process described is flawed or
> incomplete.

The programmer, me, didn't know that the "terminating" event was not guaranteed
when the process was killed using -9.  Only the test case needs to be adjusted
to be more flexible.
Comment 16 Andrew Cagney 2007-07-04 19:06:51 UTC
Index: frysk-core/frysk/proc/ChangeLog
2007-07-04  Andrew Cagney  <cagney@redhat.com>

	* TestTaskTerminateObserver.java (check): Remove brokenIfUtraceXXX
	check for bug 3489.
	(testTerminateKillKILL, testTerminatingKillKILL): Delete.