3491 – testTerm(frysk.proc.TestTaskTerminateObserver)junit.framework.AssertionFailedError: event loop run explictly stopped (startChild (Sig_HUP))

Status:	RESOLVED INVALID

Alias:	None

Product:	frysk
Classification:	Unclassified
Component:	general
Version:	unspecified

Importance:	P2 normal
Target Milestone:	---
Assignee:	Andrew Cagney

URL:
Keywords:

Depends on:
Blocks:	1496 1582 3385 2654

Reported:	2006-11-08 18:19 UTC by Andrew Cagney
Modified:	2007-07-16 22:16 UTC
CC List:	2 users

See Also:
Host:
Target:
Build:
Last reconfirmed:

Description Andrew Cagney 2006-11-08 18:19:34 UTC

testTerm(frysk.proc.TestTaskTerminateObserver)junit.framework.AssertionFailedError:
event loop run explictly stopped (startChild (Sig_HUP))
   at frysk.proc.TestLib.assertRunUntilStop(TestRunner)
   at frysk.proc.TestLib.assertRunUntilStop(TestRunner)
   at frysk.proc.TestLib$AckHandler.assertAwait(TestRunner)
   at frysk.proc.TestLib$AckHandler.await(TestRunner)
   at frysk.proc.TestLib$Child.<init>(TestRunner)
   at frysk.proc.TestLib$AckProcess.<init>(TestRunner)
   at frysk.proc.TestLib$DetachedAckProcess.<init>(TestRunner)
   at frysk.proc.TestTaskTerminateObserver.testTerm(TestRunner)
   at frysk.junit.Runner.runCases(TestRunner)
   at frysk.junit.Runner.runArchCases(TestRunner)
   at frysk.junit.Runner.runTestCases(TestRunner)
   at TestRunner.main(TestRunner)

Comment 1 Andrew Cagney 2006-11-26 21:24:16 UTC

This is a utrace bug, block on FC 6, not FC 5.

Comment 2 Andrew Cagney 2006-11-27 20:29:16 UTC

This is due to a change in behavior between kernel.17 and kernel.18.utrace.

Given a non-main task that has exited but not yet been joined: in kernel.17
that task would appear in /proc in the state 'X'; in kernel.18.utrace the task
disappears completely.

Is this considered a change in defined behavior?
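
For reference, a minimal sketch of how a task's state character can be read
from /proc. It is not taken from the frysk sources: the helper names
parse_stat_state and read_task_state are hypothetical, and the sketch assumes
the per-task file lives at /proc/<pid>/task/<tid>/stat and uses the usual
"pid (comm) state ..." layout.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Extract the state character from the text of a stat file.  The
   layout is "pid (comm) S ..."; comm may itself contain ')' or spaces,
   so search for the last ')'.  Returns '\0' if the text is malformed.  */
static char
parse_stat_state (const char *buf)
{
  assert (buf != NULL);
  const char *rparen = strrchr (buf, ')');
  if (rparen == NULL || rparen[1] != ' ' || rparen[2] == '\0')
    return '\0';
  return rparen[2];
}

/* Return the state of task TID in process PID, or '\0' if the stat
   file cannot be opened or read (for instance because the task has
   already disappeared from /proc).  Hypothetical helper.  */
static char
read_task_state (pid_t pid, pid_t tid)
{
  char path[64];
  char buf[1024];
  snprintf (path, sizeof (path), "/proc/%d/task/%d/stat",
            (int) pid, (int) tid);
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return '\0';
  ssize_t n = read (fd, buf, sizeof (buf) - 1);
  close (fd);
  if (n <= 0)
    return '\0';
  buf[n] = '\0';
  return parse_stat_state (buf);
}
```

With a lookup like this, the difference described above would show up as
read_task_state returning 'X' on one kernel and '\0' (task not found) on the
other, for the same exited-but-unjoined task.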

Comment 3 Andrew Cagney 2006-11-27 21:43:36 UTC

RHEL 5 bug:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217433

Comment 4 Andrew Cagney 2006-11-27 21:47:13 UTC

The test proper was rewritten, but the problem / question still exists.

Index: frysk-core/frysk/pkglibexecdir/ChangeLog
2006-11-27  Andrew Cagney  <cagney@redhat.com>

        * funit-threadexit.c (running_thread_can_exit): New barrier.
        (main, op_thread): Use running_thread_can_exit to block thread's
        exit until after main has opened the thread's /proc/stat file.

        * funit-threadexit.c (scan_thread): Delete.
        (main): Do the scan for thread in 'X' state here, instead of in
        scan_thread.  Create only one thread.
        (condition_cond, condition_mutex): Delete.
        (thread_running_barrier): Rename "barrier".
        (thread_id): Make volatile.
        (op_thread): Simplify, use only one barrier.


Index: frysk-core/frysk/proc/ChangeLog
2006-11-27  Andrew Cagney  <cagney@redhat.com>

        * TestTaskTerminateObserver.java (TerminatingCounter.addedTo):
        Add; stop the event loop.
        (testAttachToUnJoinedTask): Rename testTerm; simplify, explicitly
        terminate the thread.

Comment 5 Andrew Cagney 2006-11-27 22:05:54 UTC

Test case added, closing.

Index: frysk-imports/tests/ChangeLog
2006-11-27  Andrew Cagney  <cagney@redhat.com>

        * frysk3491/x-state.c: New file.
        * Makefile.am (TESTS, noinst_PROGRAMS): Add frysk3491/x-state.
        (frysk3491_x_state_SOURCES, frysk3491_x_state_LDFLAGS): Define.

Comment 6 Andrew Cagney 2007-02-06 19:21:26 UTC

Moving to suspended state ...

Comment 7 Elena Zannoni 2007-02-06 19:39:37 UTC

The bug is still there in the FC6 kernel. The Red Hat Bugzilla bug referred to in the
comments is not accessible from outside Red Hat. There is no public status
on this problem. Can somebody in RH please clone
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217433
as the kernel bug for FC6?

Comment 8 Andrew Cagney 2007-02-06 19:49:01 UTC

Suspending bug while the upstream issue is resolved.

(It looks like this test is illustrating a race condition that became more racy
in the switch from .17 to .18 based kernels.)

Comment 9 Kris Van Hees 2007-02-09 01:31:52 UTC

Looking closer at this problem, I think that the problem is actually with the
test itself.  It contains the following code (specific to FC6):

      char buffer [1024];
      int n = pread (fd, buffer, sizeof (buffer), 0);
      if (n <= 0) {
        // On FC-6 the thread completely disappears from /proc.
        if (errno == ESRCH) {
          printf ("%d.%d pread returns %d (%s)\n",
                  getpid (), gettid(),
                  errno, strerror (errno));
          exit (1);
        }
        perror ("pread");
        exit (1);
      }

Given that the comment states that the behaviour in FC6 is that the thread
disappears completely from /proc, the exit code should be 0 in this case,
signaling a successful completion of the test case.

I don't have CVS commit privs as far as I know, so could someone make this
change (assuming of course I am right)?

Comment 10 Andrew Cagney 2007-02-09 14:59:41 UTC

(In reply to comment #9)
> Looking closer at this problem, I think that the problem is actually with the
> test itself.

That was deliberate - detect the specific condition causing the corresponding
test to fail and then exit with failure on that.

Here's a description of what is going on from roland:
> please show the /proc/pid/status contents with X state.
> The X (EXIT_DEAD) state means in the middle of being reaped.
> For a noninitial nptl thread, this means almost finished dying,
> since the threads "reap" themselves (when not ptraced).  There
> is just a short race window after the thread starts dying when
> it can still be looked up in /proc.  I suspect nothing changed
> but the timing.  The only non-race way you can ever see X state
> is if you opened an fd on the /proc file before it died, then
> read later from that open fd.

So the kernel test needs adjusting, and a lot more comments, and the
corresponding testTerm might need a re-think.
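
A rough sketch of the read side of the approach roland describes (open the
/proc file before the thread dies, read from that fd afterwards). This is not
the code in funit-threadexit.c or x-state.c; stat_outcome, classify_stat_read
and read_open_stat are hypothetical names. It keeps three outcomes apart: a
state was read (possibly 'X'), the task is already gone (ESRCH, as in the
snippet quoted in comment 9), or some other failure occurred.

```c
#define _XOPEN_SOURCE 700 /* for pread */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum stat_outcome { STAT_STATE_SEEN, STAT_TASK_GONE, STAT_READ_ERROR };

/* Classify the result of a pread on a task's stat fd.  N is pread's
   return value, ERR is errno captured right after the call, and BUF
   holds the N bytes read.  On STAT_STATE_SEEN, *STATE receives the
   state character; otherwise it is set to '\0'.  */
static enum stat_outcome
classify_stat_read (ssize_t n, int err, const char *buf, char *state)
{
  assert (state != NULL);
  *state = '\0';
  if (n < 0)
    return err == ESRCH ? STAT_TASK_GONE : STAT_READ_ERROR;
  /* Scan backwards for the ')' that ends the command name.  */
  ssize_t i = n - 1;
  while (i >= 0 && buf[i] != ')')
    i--;
  if (i < 0 || i + 2 >= n || buf[i + 1] != ' ')
    return STAT_READ_ERROR;
  *state = buf[i + 2];
  return STAT_STATE_SEEN;
}

/* Read from FD, which the caller opened on the task's stat file while
   the task was still known to be alive.  */
static enum stat_outcome
read_open_stat (int fd, char *state)
{
  char buf[1024];
  errno = 0;
  ssize_t n = pread (fd, buf, sizeof (buf), 0);
  return classify_stat_read (n, errno, buf, state);
}
```

A test would open the fd while the thread is held back (for example behind a
barrier such as running_thread_can_exit), release the thread, and then call
read_open_stat. Whether STAT_TASK_GONE counts as a pass or a failure is left
to the caller, since that is exactly the point under discussion here.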

Comment 11 Andrew Cagney 2007-07-16 22:16:24 UTC

There is a race between ptrace/waitpid seeing an event and /proc/$$/stat[us]
seeing or reflecting that same event.  Consequently, what can be seen on one
kernel (here the X state) may not be seen on later kernels.