I've been refactoring and fixing the OffspringObserver (this will get very confusing very quickly) and have been getting the above error very often.

For reference:

ProcTasksObserver is the old OffspringObserver, so called because its purpose is to keep track of all the threads of a process. It now implements TaskObserver.Cloned and TaskObserver.Terminated.
ProcObserver.ProcTasks is the old ProcObserver.Offspring interface.
ProcTasksTester is the old OffspringTester.

Here is the code that I have been using. In a nutshell, I start a new process with 2 tasks and explicitly kill one of them, then wait for the process to die on its own (I will eventually kill the process explicitly, but I'm having trouble doing that at the right time):

    public void testDeleteDetached()
    {
        AckProcess ackProcess = new DetachedAckProcess (1);

        //Create Process
        Proc proc = ackProcess.findProcUsingRefresh();

        //Add observer
        ProcTasksTester observerTester = new ProcTasksTester();
        new ProcTasksObserver (proc, observerTester);

        //Delete a clone.
        ackProcess.delClone();

        new StopEventLoopWhenProcRemoved(proc.getPid());
        assertRunUntilStop (6000, "running to attach");
    }

And now, what you've all been waiting for, the error message:

    testDeleteDetached(frysk.proc.TestProcTasksObserver)java.lang.RuntimeException: {frysk.proc.LinuxTask@1e1ba0,pid=11192,tid=11192,state=attaching} in state "attaching" did not handle handleSignaledEvent
        at frysk.proc.State.unhandled(TestRunner)
        at frysk.proc.TaskState.handleSignaledEvent(TestRunner)
        at frysk.proc.Task.processSignaledEvent(TestRunner)
        at frysk.proc.LinuxHost$PollWaitOnSigChld$2.stopped(TestRunner)
        at frysk.sys.Wait.waitAllNoHang(TestRunner)
        at frysk.proc.LinuxHost$PollWaitOnSigChld.execute(TestRunner)
        at frysk.event.EventLoop.runEventLoop(TestRunner)
        at frysk.event.EventLoop.runPolling(TestRunner)
        at frysk.proc.TestLib.assertRunUntilStop(TestRunner)
        at frysk.proc.TestLib.assertRunUntilStop(TestRunner)
        at frysk.proc.TestLib$AckHandler.assertAwait(TestRunner)
        at frysk.proc.TestLib$AckHandler.await(TestRunner)
        at frysk.proc.TestLib$AckProcess.delClone(TestRunner)
        at frysk.proc.TestProcTasksObserver.delete(TestRunner)
        at frysk.proc.TestProcTasksObserver.testDeleteDetached(TestRunner)
        at frysk.junit.Runner.<init>(TestRunner)
        at TestRunner.main(TestRunner)
And here is the ProcTasksObserver class in its entirety. It's a little long:

    public final class ProcTasksObserver
        implements TaskObserver.Cloned, TaskObserver.Terminated
    {
        protected static final Logger logger = Logger.getLogger (Config.FRYSK_LOG_ID);
        private final Proc proc;
        private final ProcObserver.ProcTasks procTasksObserver;
        private Task mainTask;

        /**
         * An observer that monitors all Tasks of a process notifying the
         * caller of each new Task as it is added.
         */
        public ProcTasksObserver (Proc theProc,
                                  ProcObserver.ProcTasks theProcTasksObserver)
        {
            logger.log (Level.FINE, "{0} new\n", this);
            proc = theProc;
            procTasksObserver = theProcTasksObserver;

            // The rest of the construction must be done synchronous to
            // the EventLoop, schedule it.
            Manager.eventLoop.add (new Event () {
                public void execute () {
                    // Get a preliminary list of tasks - XXX: hack really.
                    proc.sendRefresh ();

                    mainTask = Manager.host.get (new TaskId (proc.getPid ()));
                    if (mainTask == null) {
                        logger.log (Level.FINE,
                                    "Could not get main thread of "
                                    + "this process\n {0}", proc);
                        procTasksObserver.addFailed (proc,
                            new RuntimeException ("Process lost: could not "
                                                  + "get the main thread of this process.\n"
                                                  + proc));
                        return;
                    }

                    requestAddObservers(mainTask);
                }
            });
        }

        private void requestAddObservers(Task task)
        {
            task.requestAddClonedObserver(ProcTasksObserver.this);
            task.requestAddTerminatedObserver(ProcTasksObserver.this);
        }

        // Never block the parent.
        public Action updateClonedParent (Task parent, Task offspring)
        {
            return Action.CONTINUE;
        }

        /**
         * Whenever a new cloned offspring appears notify the observer,
         * and add a cloned observer to it.
         */
        public Action updateClonedOffspring (Task parent, Task offspring)
        {
            procTasksObserver.taskAdded (offspring);
            logger.log (Level.FINE,
                        "ProcTasksObserver.updateClonedOffspring() "
                        + "parent: {0} offspring: {1}\n",
                        new Object[] { parent, offspring});
            requestAddObservers(offspring);

            // Need to BLOCK and UNBLOCK so that the
            // request to add an observer has enough time
            // to be processed before the task continues.
            offspring.requestUnblock (this);
            return Action.BLOCK;
        }

        private boolean isMainTaskAdded;

        public void addedTo(Object observable)
        {
            if (!isMainTaskAdded) {
                isMainTaskAdded = true;
                // XXX: Is there a race here with a rapidly cloning task?
                for (Iterator iterator = proc.getTasks().iterator();
                     iterator.hasNext(); ) {
                    Task task = (Task) iterator.next();
                    procTasksObserver.existingTask (task);
                    if (task != mainTask) {
                        logger.log (Level.FINE, "{0} Inside if not mainTask\n", this);
                        requestAddObservers(task);
                    }
                }
            }
        }

        public void addFailed(Object observable, Throwable w)
        {
            throw new RuntimeException("How did this (addFailed) happen ?!");
        }

        public void deletedFrom(Object observable)
        {
            //procTasksObserver.taskRemoved ((Task) observable);
        }

        public Action updateTerminated(Task task, boolean signal, int value)
        {
            procTasksObserver.taskRemoved(task);
            return Action.CONTINUE;
        }
    }
Just FYI, Nurden and I just figured out why http://sourceware.org/bugzilla/show_bug.cgi?id=2819 occurs. Briefly, the frysk process sends the kernel the sequence:

    ptrace ATTACH PID
    signal PID SIGNAL

The kernel, via waitpid, then usually returns stopped-for-no-reason, matching the ATTACH, but occasionally the SIGNAL is delivered first, leading to a stopped-with-signal.

I don't think it's possible to create a deterministic test for this.
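To make the failure mode concrete, here is a minimal sketch of the two orderings, written as a toy model rather than against frysk itself. Every name in it (AttachRaceModel, StrictTask, Event, replay) is hypothetical; the only thing taken from the report is that a task in state "attaching" has no handler for a signaled event.

```java
// Toy model of the attach/signal race. None of these names are frysk APIs;
// they are assumptions made purely for illustration.
final class AttachRaceModel {
    // The two kinds of wait result discussed in the thread.
    enum Event { STOPPED, SIGNALED }

    // A task that, like the one in the stack trace, only expects a plain
    // stop while it is still in the "attaching" state.
    static final class StrictTask {
        String state = "attaching";

        void handle(Event event) {
            if (state.equals("attaching")) {
                if (event == Event.STOPPED) {
                    state = "running";   // attach completed
                    return;
                }
                throw new RuntimeException("state \"" + state
                                           + "\" did not handle " + event);
            }
            // Once running, this model accepts either event silently.
        }
    }

    // Replay the events in the given order against a fresh task and return
    // the final state, or the error message if the task rejected an event.
    static String replay(Event... events) {
        StrictTask task = new StrictTask();
        try {
            for (Event event : events) {
                task.handle(event);
            }
            return task.state;
        } catch (RuntimeException ex) {
            return "error: " + ex.getMessage();
        }
    }

    public static void main(String[] args) {
        // Common ordering: the stop for the ATTACH arrives first.
        System.out.println(replay(Event.STOPPED, Event.SIGNALED));
        // Rare ordering: the SIGNAL is reported first.
        System.out.println(replay(Event.SIGNALED, Event.STOPPED));
    }
}
```

In this model the stop-first ordering ends in "running", while the signal-first ordering is rejected in the same way as the exception in the original report. The real kernel chooses the ordering, which is why the sketch replays both by hand instead of trying to provoke the race.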
There's a race when both an attach and a signal, for the same process, are sent to the kernel. The result is either a stop-then-signal or a signal-then-stop.

Index: frysk-core/frysk/proc/ChangeLog
2006-06-25  Andrew Cagney  <cagney@redhat.com>

	* StressAttachDetachSignaledTask.java: Enable.  Remove code doing
	back-to-back attach-detach as it is so aggressive that the process
	never makes forward progress.
	* TaskState.java: During an attach save any signal, delivering it
	after the attach has completed.
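The idea in the TaskState.java entry, saving a signal seen during the attach and delivering it afterwards, can be sketched roughly as follows. This is not the actual TaskState code: DeferredSignalModel, Task, handleStopped, handleSignaled and run are hypothetical names, and the signal number 15 used in main is just an example value.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "save the signal during attach, deliver it afterwards".
// All names here are assumptions for illustration, not frysk APIs.
final class DeferredSignalModel {
    static final class Task {
        String state = "attaching";
        // Signals that arrived while the attach was still in progress.
        final List<Integer> pending = new ArrayList<>();
        // Signals that have been passed on to the task.
        final List<Integer> delivered = new ArrayList<>();

        // The stop that matches the ATTACH: finish attaching, then flush
        // anything that was saved in the meantime.
        void handleStopped() {
            if (state.equals("attaching")) {
                state = "running";
                delivered.addAll(pending);
                pending.clear();
            }
        }

        // A stop caused by a signal: save it while attaching, otherwise
        // deliver it straight away.
        void handleSignaled(int signal) {
            if (state.equals("attaching")) {
                pending.add(signal);
            } else {
                delivered.add(signal);
            }
        }
    }

    // Drive a fresh task through one of the two possible orderings.
    static Task run(boolean signalFirst, int signal) {
        Task task = new Task();
        if (signalFirst) {
            task.handleSignaled(signal);
            task.handleStopped();
        } else {
            task.handleStopped();
            task.handleSignaled(signal);
        }
        return task;
    }

    public static void main(String[] args) {
        Task a = run(false, 15);
        Task b = run(true, 15);
        System.out.println(a.state + " " + a.delivered);
        System.out.println(b.state + " " + b.delivered);
    }
}
```

With this handling, both orderings leave the modelled task in the same final state with the signal delivered exactly once, so the caller no longer needs to care which one the kernel reported.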