Bug 3491

Summary: testTerm(frysk.proc.TestTaskTerminateObserver)junit.framework.AssertionFailedError: event loop run explictly stopped (startChild (Sig_HUP))
Product: frysk Reporter: Andrew Cagney <cagney>
Component: general Assignee: Andrew Cagney <cagney>
Status: RESOLVED INVALID    
Severity: normal CC: cmoller, scox
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Bug Depends on:    
Bug Blocks: 1496, 1582, 3385, 2654    

Description Andrew Cagney 2006-11-08 18:19:34 UTC
testTerm(frysk.proc.TestTaskTerminateObserver)junit.framework.AssertionFailedError:
event loop run explictly stopped (startChild (Sig_HUP))
   at frysk.proc.TestLib.assertRunUntilStop(TestRunner)
   at frysk.proc.TestLib.assertRunUntilStop(TestRunner)
   at frysk.proc.TestLib$AckHandler.assertAwait(TestRunner)
   at frysk.proc.TestLib$AckHandler.await(TestRunner)
   at frysk.proc.TestLib$Child.<init>(TestRunner)
   at frysk.proc.TestLib$AckProcess.<init>(TestRunner)
   at frysk.proc.TestLib$DetachedAckProcess.<init>(TestRunner)
   at frysk.proc.TestTaskTerminateObserver.testTerm(TestRunner)
   at frysk.junit.Runner.runCases(TestRunner)
   at frysk.junit.Runner.runArchCases(TestRunner)
   at frysk.junit.Runner.runTestCases(TestRunner)
   at TestRunner.main(TestRunner)
Comment 1 Andrew Cagney 2006-11-26 21:24:16 UTC
This is a utrace bug; it blocks on FC 6, not on FC 5.
Comment 2 Andrew Cagney 2006-11-27 20:29:16 UTC
This is due to a change in behavior between the .17 and .18.utrace kernels.

Consider a non-main task that has exited but has not yet been joined. In
kernel .17 that task appears in /proc in state 'X'; in kernel .18.utrace the
task disappears completely.
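For reference, the state letter discussed here is the field that follows the
parenthesised command name in /proc/PID/task/TID/stat. A minimal sketch of
reading it for one thread (the helper names are hypothetical, not frysk code):
when the thread has already vanished from /proc, the open() fails (typically
with ENOENT) rather than yielding a line in state 'X'.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the one-letter state field of a
   /proc/.../stat line, or 0 if the line is malformed.  The command name
   is wrapped in parentheses and may contain ')' or spaces, so scan from
   the LAST ')' instead of splitting on whitespace.  */
static char stat_state(const char *line)
{
  const char *paren = strrchr(line, ')');
  if (paren == NULL || paren[1] != ' ' || paren[2] == '\0')
    return 0;
  return paren[2];
}

/* Hypothetical helper: read the state of thread TID in process PID.
   Returns the state letter (for example 'R', 'S' or 'X'), or -1 with
   errno set.  If the thread is no longer listed in /proc, open() fails,
   typically with ENOENT.  */
static int read_task_state(pid_t pid, pid_t tid)
{
  char path[64];
  char buf[1024];

  snprintf(path, sizeof path, "/proc/%d/task/%d/stat", (int) pid, (int) tid);
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;

  ssize_t n = pread(fd, buf, sizeof buf - 1, 0);
  int saved = errno;            /* close() may overwrite errno */
  close(fd);
  if (n <= 0) {
    errno = n < 0 ? saved : EIO;
    return -1;
  }
  buf[n] = '\0';

  char state = stat_state(buf);
  if (state == 0) {
    errno = EINVAL;
    return -1;
  }
  return (unsigned char) state;
}
```

Scanning back from the last ')' matters because the command name may itself
contain spaces or parentheses.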

Is this considered a change in defined behavior?
Comment 3 Andrew Cagney 2006-11-27 21:43:36 UTC
RHEL 5 bug:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217433
Comment 4 Andrew Cagney 2006-11-27 21:47:13 UTC
The test proper was rewritten, but the problem / question still exists.

Index: frysk-core/frysk/pkglibexecdir/ChangeLog
2006-11-27  Andrew Cagney  <cagney@redhat.com>

        * funit-threadexit.c (running_thread_can_exit): New barrier.
        (main, op_thread): Use running_thread_can_exit to block thread's
        exit until after main has opened the thread's /proc/stat file.

        * funit-threadexit.c (scan_thread): Delete.
        (main): Do the scan for thread in 'X' state here, instead of in
        scan_thread.  Create only one thread.
        (condition_cond, condition_mutex): Delete.
        (thread_running_barrier): Rename "barrier".
        (thread_id): Make volatile.
        (op_thread): Simplify, use only one barrier.


Index: frysk-core/frysk/proc/ChangeLog
2006-11-27  Andrew Cagney  <cagney@redhat.com>

        * TestTaskTerminateObserver.java (TerminatingCounter.addedTo):
        Add; stop the event loop.
        (testAttachToUnJoinedTask): Rename testTerm; simplify, explicitly
        terminate the thread.
Comment 5 Andrew Cagney 2006-11-27 22:05:54 UTC
Test case added, closing.

Index: frysk-imports/tests/ChangeLog
2006-11-27  Andrew Cagney  <cagney@redhat.com>

        * frysk3491/x-state.c: New file.
        * Makefile.am (TESTS, noinst_PROGRAMS): Add frysk3491/x-state.
        (frysk3491_x_state_SOURCES, frysk3491_x_state_LDFLAGS): Define.
Comment 6 Andrew Cagney 2007-02-06 19:21:26 UTC
Moving to suspended state ...
Comment 7 Elena Zannoni 2007-02-06 19:39:37 UTC
The bug is still present in the FC6 kernel. The Red Hat Bugzilla bug referred
to in the comments is not accessible from outside Red Hat, so there is no
public status on this problem. Can somebody at Red Hat please clone
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217433
as the kernel bug for FC6?
Comment 8 Andrew Cagney 2007-02-06 19:49:01 UTC
Suspending this bug while the upstream issue is resolved.

(It looks like this test illustrates a race condition that became easier to
hit in the switch from .17- to .18-based kernels.)
Comment 9 Kris Van Hees 2007-02-09 01:31:52 UTC
Looking closer at this problem, I think that the problem is actually with the
test itself.  It contains the following code (specific to FC6):

      char buffer [1024];
      int n = pread (fd, buffer, sizeof (buffer), 0);
      if (n <= 0) {
        // On FC-6 the thread completely disappears from /proc.
        if (errno == ESRCH) {
          printf ("%d.%d pread returns %d (%s)\n",
                  getpid (), gettid(),
                  errno, strerror (errno));
          exit (1);
        }
        perror ("pread");
        exit (1);
      }

Since the comment states that on FC6 the thread disappears completely from
/proc, the exit code in this case should be 0, signaling that the test case
completed successfully.
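The two exit-code policies at issue can be sketched as a single decision,
assuming the process exit status is all the harness inspects.
pread_exit_status and its vanish_is_success flag are hypothetical; false
matches the current test (ESRCH is a failure) and true matches the change
proposed here. Unlike the excerpt above, the sketch only consults errno when
pread() actually returned -1.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical helper: map the result of pread() on the thread's /proc
   stat file to the program's exit status.
     -1 : the read succeeded; keep going and inspect the state letter.
      0 : exit with success.
      1 : exit with failure.
   VANISH_IS_SUCCESS selects the policy for "thread vanished" (ESRCH).  */
static int pread_exit_status(ssize_t n, int err, bool vanish_is_success)
{
  if (n > 0)
    return -1;
  if (n < 0 && err == ESRCH)
    return vanish_is_success ? 0 : 1;
  return 1;   /* EOF or any other read error is always a failure.  */
}
```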

As far as I know I don't have CVS commit privileges, so could someone make
this change (assuming, of course, that I am right)?
Comment 10 Andrew Cagney 2007-02-09 14:59:41 UTC
(In reply to comment #9)
> Looking closer at this problem, I think that the problem is actually with the
> test itself.

That was deliberate: the test detects the specific condition that causes the
corresponding testTerm to fail, and then exits with a failure when it sees it.

Here is Roland's description of what is going on:
> please show the /proc/pid/status contents with X state.
> The X (EXIT_DEAD) state means in the middle of being reaped.
> For a noninitial nptl thread, this means almost finished dying,
> since the threads "reap" themselves (when not ptraced).  There
> is just a short race window after the thread starts dying when
> it can still be looked up in /proc.  I suspect nothing changed
> but the timing.  The only non-race way you can ever see X state
> is if you opened an fd on the /proc file before it died, then
> read later from that open fd.
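The ordering Roland describes can be sketched roughly as below: open the
thread's stat file while the thread is known to be alive, let it exit without
joining it, then read through the already-open fd. This mirrors the ordering
the funit-threadexit.c ChangeLog in comment 4 describes, but it is a
hypothetical reconstruction, not that file; probe_exited_thread, the two
barriers and the 100 ms sleep are assumptions. What the read returns still
depends on timing and kernel version: it may yield a state letter (possibly
'X') or fail with ESRCH once the thread has been fully reaped.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

static pthread_barrier_t alive, may_exit;
static volatile pid_t thread_tid;

static void *worker(void *arg)
{
  (void) arg;
  thread_tid = (pid_t) syscall(SYS_gettid);
  pthread_barrier_wait(&alive);     /* tell main our TID is valid */
  pthread_barrier_wait(&may_exit);  /* block until main has opened the fd */
  return NULL;
}

/* Returns the state letter read through the early fd, -errno if the read
   failed (for example -ESRCH), or 0 if setup failed or nothing parsed.  */
static int probe_exited_thread(void)
{
  pthread_t th;
  char path[64], buf[1024];
  const struct timespec delay = { 0, 100 * 1000 * 1000 };  /* 100 ms */
  int result = 0;

  pthread_barrier_init(&alive, NULL, 2);
  pthread_barrier_init(&may_exit, NULL, 2);
  if (pthread_create(&th, NULL, worker, NULL) != 0) {
    pthread_barrier_destroy(&alive);
    pthread_barrier_destroy(&may_exit);
    return 0;
  }
  pthread_barrier_wait(&alive);

  /* Open the stat file while the thread is guaranteed to exist.  */
  snprintf(path, sizeof path, "/proc/self/task/%d/stat", (int) thread_tid);
  int fd = open(path, O_RDONLY);

  pthread_barrier_wait(&may_exit);  /* let the thread exit...       */
  nanosleep(&delay, NULL);          /* ...and crudely wait for it   */

  if (fd >= 0) {
    ssize_t n = pread(fd, buf, sizeof buf - 1, 0);
    if (n < 0) {
      result = -errno;
    } else if (n > 0) {
      buf[n] = '\0';
      const char *paren = strrchr(buf, ')');
      if (paren != NULL && paren[1] == ' ')
        result = (unsigned char) paren[2];
    }
    close(fd);
  }

  pthread_join(th, NULL);           /* join only after the read */
  pthread_barrier_destroy(&alive);
  pthread_barrier_destroy(&may_exit);
  return result;
}
```

The second barrier is what makes the open race-free; the later read is the
part whose result varies between kernels.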

So the kernel test needs adjusting and many more comments, and the
corresponding testTerm might need a rethink.
Comment 11 Andrew Cagney 2007-07-16 22:16:24 UTC
There is a race between ptrace/waitpid seeing an event and /proc/$$/stat[us]
seeing or reflecting that same event. Consequently, what can be observed on one
kernel (here, the X state) may not be observable on later kernels.
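Given that race, a test that cross-checks waitpid against /proc should not
assume /proc has caught up the moment waitpid returns. A minimal sketch,
assuming a plain (non-ptraced) child that stops itself with SIGSTOP; the
helper names and the budget of 100 polls 10 ms apart are hypothetical, and
bounded polling narrows the window rather than eliminating it.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Read the one-letter state of PID from /proc/PID/stat, or 0 on failure. */
static char proc_state(pid_t pid)
{
  char path[64], buf[1024];
  snprintf(path, sizeof path, "/proc/%d/stat", (int) pid);
  FILE *f = fopen(path, "r");
  if (f == NULL)
    return 0;
  size_t n = fread(buf, 1, sizeof buf - 1, f);
  fclose(f);
  buf[n] = '\0';
  const char *paren = strrchr(buf, ')');
  return (paren != NULL && paren[1] == ' ') ? paren[2] : 0;
}

/* Poll /proc until it reports WANT for PID, at most TRIES times.
   Returns the last state seen.  */
static char await_proc_state(pid_t pid, char want, int tries)
{
  const struct timespec delay = { 0, 10 * 1000 * 1000 };  /* 10 ms */
  char state = 0;
  for (int i = 0; i < tries; i++) {
    state = proc_state(pid);
    if (state == want)
      break;
    nanosleep(&delay, NULL);
  }
  return state;
}

/* Stop a child, let waitpid report the stop, then cross-check /proc.  */
static char stopped_child_state(void)
{
  pid_t pid = fork();
  if (pid < 0)
    return 0;
  if (pid == 0) {
    raise(SIGSTOP);
    _exit(0);
  }

  int status;
  if (waitpid(pid, &status, WUNTRACED) != pid) {  /* wait failed */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return 0;
  }
  if (!WIFSTOPPED(status))   /* child exited and has been reaped */
    return 0;

  /* waitpid has seen the stop; do not assume /proc already agrees.  */
  char state = await_proc_state(pid, 'T', 100);
  kill(pid, SIGKILL);
  waitpid(pid, &status, 0);
  return state;
}
```

The bounded budget keeps a lagging /proc from producing a spurious failure,
while still failing, rather than hanging, if the expected state never appears.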