Bug 2359

Summary: testRefreshZombie(frysk.proc.TestRefresh)junit.framework.AssertionFailedError: event loop run explictly stopped (waiting for ack)
Product: frysk Reporter: Andrew Cagney <cagney>
Component: general Assignee: Unassigned <frysk-bugzilla>
Status: NEW ---    
Severity: normal CC: woodzltc
Priority: P1    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Bug Depends on: 2430, 2431, 2432    
Bug Blocks: 2081    

Description Andrew Cagney 2006-02-19 19:30:57 UTC
RHEL-4 system

testRefreshZombie(frysk.proc.TestRefresh)junit.framework.AssertionFailedError:
event loop run explictly stopped (waiting for ack)
   at _ZN4java4lang11VMThrowable16fillInStackTraceEPNS0_9ThrowableE (/usr/lib/libgcj.so.6)
   at _ZN4java4lang9Throwable16fillInStackTraceEv (/usr/lib/libgcj.so.6)
   at _ZN4java4lang9ThrowableC1EPNS0_6StringE (/usr/lib/libgcj.so.6)
   at _ZN4java4lang5ErrorC1EPNS0_6StringE (/usr/lib/libgcj.so.6)
   at 0x080afcfc (Unknown Source)
   at 0x080aedc1 (Unknown Source)
   at 0x080aecc0 (Unknown Source)
   at 0x0807fbbe (Unknown Source)
   at 0x0807fa8a (Unknown Source)
   at 0x0807f2f5 (Unknown Source)
   at 0x0807f395 (Unknown Source)
   at 0x08082618 (Unknown Source)
   at ffi_call_SYSV (/usr/lib/libgcj.so.6)
   at ffi_call (/usr/lib/libgcj.so.6)
   at _Z18_Jv_CallAnyMethodAPN4java4lang6ObjectEPNS0_5ClassEP10_Jv_MethodbbP6JArrayIS4_EP6jvalueSB_bS4_ (/usr/lib/libgcj.so.6)
   at _Z18_Jv_CallAnyMethodAPN4java4lang6ObjectEPNS0_5ClassEP10_Jv_MethodbP6JArrayIS4_EPS7_IS2_ES4_ (/usr/lib/libgcj.so.6)
   at _ZN4java4lang7reflect6Method6invokeEPNS0_6ObjectEP6JArrayIS4_E (/usr/lib/libgcj.so.6)
   at 0x080ae17e (Unknown Source)
   at 0x080adf96 (Unknown Source)
   at 0x080afcb4 (Unknown Source)
   at 0x080ae9cd (Unknown Source)
   at 0x080ae937 (Unknown Source)
   at 0x080adf64 (Unknown Source)
   at 0x080add7d (Unknown Source)
   at 0x080add3f (Unknown Source)
   at 0x080add7d (Unknown Source)
   at 0x080add3f (Unknown Source)
   at 0x080ac3b1 (Unknown Source)
   at 0x080ac34e (Unknown Source)
   at 0x0809a307 (Unknown Source)
   at 0x0807939e (Unknown Source)
   at _ZN3gnu4java4lang10MainThread9call_mainEv (/usr/lib/libgcj.so.6)
   at _ZN3gnu4java4lang10MainThread3runEv (/usr/lib/libgcj.so.6)
   at _Z13_Jv_ThreadRunPN4java4lang6ThreadE (/usr/lib/libgcj.so.6)
   at _Z11_Jv_RunMainP14_Jv_VMInitArgsPN4java4lang5ClassEPKciPS6_b (/usr/lib/libgcj.so.6)
   at _Z11_Jv_RunMainPN4java4lang5ClassEPKciPS4_b (/usr/lib/libgcj.so.6)
   at JvRunMain (/usr/lib/libgcj.so.6)
   at 0x08079344 (Unknown Source)
   at __libc_start_main (/lib/tls/libc.so.6)
   at 0x08079289 (Unknown Source)
Comment 1 Andrew Cagney 2006-02-19 21:58:49 UTC
- stracing TestRunner makes the problem go away
- enabling logging makes the problem go away
Comment 2 Andrew Cagney 2006-02-19 22:02:28 UTC
- on a mono-processor, this rarely happens
- on an SMP system, this always happens

With SMP, after a fork(), both the parent and the child run freely and in
parallel.  On a mono-processor, only one of them runs at a time.  That strongly
suggests some sort of race condition.

The other possibility is that the child is being hit by a signal while it is
still trying to find its feet.
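
To make the suspected race concrete, here is a minimal sketch of a parent
waiting for a signal "ack" from a forked child.  It is not frysk's code: the
function name, the choice of SIGUSR1 as the ack and the timeout are all
assumptions made for illustration.  The point is only that the ack signal is
blocked before the fork, so the outcome does not depend on which of parent and
child runs first.

```python
import os
import signal


def fork_and_wait_for_ack(timeout=5.0):
    """Fork a child that signals the parent; return True if the ack arrived.

    Hypothetical helper: SIGUSR1 as the ack signal is an assumption.
    """
    ack = {signal.SIGUSR1}
    # Block the ack signal *before* forking.  A blocked signal stays pending
    # until it is collected, so it cannot be lost (or kill the parent) no
    # matter how the scheduler interleaves parent and child.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, ack)
    try:
        child = os.fork()
        if child == 0:
            # Child: acknowledge to the parent, then exit without cleanup.
            try:
                os.kill(os.getppid(), signal.SIGUSR1)
            finally:
                os._exit(0)
        # Parent: collect the pending ack, or give up after `timeout` seconds.
        info = signal.sigtimedwait(ack, timeout)
        os.waitpid(child, 0)  # reap the child so no zombie is left behind
        return info is not None
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


if __name__ == "__main__":
    print("ack received" if fork_and_wait_for_ack() else "timed out waiting for ack")
```

If the signal were left unblocked until after the fork, the result would depend
on scheduling, which would fit the difference seen between mono-processor and
SMP machines.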
Comment 3 Andrew Cagney 2006-02-19 22:04:26 UTC
The system call sequence is: vfork -> fork -> exec.  The output below comes from
simply adding print statements:

Running testRefreshZombie(frysk.proc.TestRefresh) ...zombie test started
program /home/cagney/native/frysk-core/frysk/pkglibexecdir/funit-child
v 24952 pid 24953 status 0x0
child pid 24953
zombie created
FAIL
  junit.framework.AssertionFailedError: event loop run explictly stopped
(waiting for ack)

<<program ..>> was printed by frysk.sys.Fork.spawn just before the exec call,
which strongly suggests that the call sequence (vfork -> fork -> exec)
succeeded, but the final exec killed the entire process.
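
For reference, the shape of that call sequence can be sketched as follows.
This is an illustration only, not frysk.sys.Fork.spawn: Python does not expose
vfork(), so os.fork() stands in for it, and the function name, the pipe used to
report the grandchild's pid and the exit code 127 are assumptions of the sketch.

```python
import os
import sys


def spawn_detached(argv):
    """Fork an intermediate process that forks again and execs argv.

    Returns (intermediate_pid, grandchild_pid, wait_status).
    os.fork() stands in for vfork(), which Python does not expose.
    """
    read_fd, write_fd = os.pipe()
    intermediate = os.fork()
    if intermediate == 0:
        try:
            os.close(read_fd)
            grandchild = os.fork()
            if grandchild == 0:
                # Grandchild: drop the pipe and become the target program.
                os.close(write_fd)
                os.execv(argv[0], argv)  # does not return on success
            # Intermediate: report the grandchild's pid, then exit at once.
            os.write(write_fd, str(grandchild).encode())
            os._exit(0)
        finally:
            os._exit(127)  # reached only if fork, exec or write failed
    # Original process: read the grandchild's pid, then reap the intermediate.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        grandchild = int(pipe.read())
    _, status = os.waitpid(intermediate, 0)
    return intermediate, grandchild, status


if __name__ == "__main__":
    v, pid, status = spawn_detached([sys.executable, "-c", "pass"])
    # Format loosely modeled on the trace above; the meaning given to each
    # field here is this sketch's guess, not taken from the frysk sources.
    print(f"v {v} pid {pid} status {status:#x}")
```

The grandchild is not a direct child of the original process, so the original
process cannot wait for it and has to learn about its progress some other way.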
Comment 4 Andrew Cagney 2006-02-23 20:52:06 UTC
Index: frysk-imports/tests/ChangeLog
2006-02-23  Andrew Cagney  <cagney@redhat.com>

        * Makefile.am (vfork_exec_vfork_exec_SOURCES, noinst_PROGRAMS)
        (TESTS): Add vfork-exec/vfork-exec.c.
        * vfork-exec/vfork-exec.c: New test.
Comment 5 Andrew Cagney 2006-03-07 21:37:18 UTC
Index: frysk-core/frysk/proc/ChangeLog
This detects the problem, and cleans up the mess; it doesn't yet fix it.

2006-03-07  Andrew Cagney  <cagney@redhat.com>

        * TestLib.java: Check for still pending signals.

Index: frysk-sys/frysk/sys/ChangeLog
2006-03-07  Andrew Cagney  <cagney@redhat.com>

        * Poll.java (poll): Add description.
        * SigSet.java, cni/SigSet.cxx: Change all void methods to return
        this SigSet.

2006-03-06  Andrew Cagney  <cagney@redhat.com>

        * SigSet.java (getPending, suspend, blockProcMask)
        (unblockProcMask, setProcMask, getProcMask): Add.
        * cni/SigSet.cxx: Ditto.
        * TestSigSet.java (testProcMask): New test.

        * cni/SigSet.hxx, cni/SigSet.cxx, SigSet.java, TestSigSet.java:
        New files.
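
The idea behind "check for still pending signals" can be sketched outside of
frysk as follows.  This is not the TestLib.java or SigSet code: the helper name
is hypothetical, and Python's signal module is used in place of the new SigSet
methods.  It reports which of the given signals are still pending at the end of
a test and consumes them so they cannot leak into the next one.

```python
import signal


def drain_pending(signals):
    """Return the subset of `signals` that was pending, consuming each one.

    Hypothetical helper.  The signals are blocked while the check runs and
    the previous signal mask is restored afterwards.
    """
    signals = set(signals)
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signals)
    try:
        leftover = signal.sigpending() & signals
        # Standard signals do not queue, so one sigwait() call per pending
        # signal is enough to clear them all.
        for _ in range(len(leftover)):
            signal.sigwait(leftover)
        return leftover
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


if __name__ == "__main__":
    # Simulate a dangling signal: block SIGUSR1, then raise it in this thread.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    signal.raise_signal(signal.SIGUSR1)
    dangling = drain_pending({signal.SIGUSR1})
    print("dangling signals:", sorted(s.name for s in dangling))
```

A test harness could call such a helper during teardown and fail the test when
the returned set is non-empty, which detects the mess and cleans it up without
addressing its cause.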
Comment 6 Andrew Cagney 2006-03-07 22:12:23 UTC
This fixes TestRefresh where there was a possibly dangling signal:

Index: frysk-core/frysk/proc/ChangeLog
2006-03-07  Andrew Cagney  <cagney@redhat.com>

        * TestRefresh.java (testExitLoosesChild): Replace
        testExitLoosesAllChildren, only create one child process.
Comment 7 Andrew Cagney 2006-03-07 22:15:20 UTC
The remaining cases of dangling signals have been turned into separate bugs.