This is the mail archive of the frysk@sources.redhat.com mailing list for the frysk project.



Re: fc6 frysk-core failures.


Andrew,

I've been cross-matching the history of the failing vs. passing pids all
morning.  So far the most significant difference appears to be in the
way signals are handled.  E.g., at one point the failing fc6 log shows:

    17742.17742: argv[0]=/home/moller/tinkering/frysk/09-12/build/frysk-core/frysk/pkglibexecdir/funit-child
    17742.17742: argv[1]=--wait=suspend
    17742.17742: argv[2]=10
    17742.17742: argv[3]=17739
    17742.17742: starting 17742
    17742.17742: new thread 17742.17742
    17742.17742: notify 17739 with 10 (User defined signal 1) -- new thread 17742.17742

The passing fc5 code is identical except that it /doesn't/ have that
last 'notify...' line.  Then, later the passing version shows:

    17015.17015: clone 0x40a00940 @ 0 created (added)
    17015.17015: notify 17012 with 12 (User defined signal 2) -- clone 0x40a00940 @ 0 created (added)

    13-Sep-06 11:07:31 AM frysk.proc.LinuxHost$PollWaitOnSigChld$2 getTask
    FINE:{TaskId,17017}stopped

    13-Sep-06 11:07:31 AM frysk.proc.Host get
    FINE: {frysk.proc.LinuxHost@2e1bd335,state=running} get TaskId

whereas the failing log shows about 90 lines of stuff between the
"FINE:{TaskId,17017}stopped" line and the "13-Sep-06 11:07:31 AM
frysk.proc.Host get" line.  The intervening stuff includes things like:

    13-Sep-06 11:07:42 AM frysk.event.EventLoop remove
    FINEST: ... return {frysk.proc.TestLib$AckHandler$AckSignal@26bdb0,sig=Sig_USR2}

    13-Sep-06 11:07:42 AM frysk.proc.TestLib$AckHandler$AckSignal execute
    FINE: {frysk.proc.TestLib$AckHandler$AckSignal@26bdb0,sig=Sig_USR2} execute (assertSendAddCloneWaitForAcks (Sig_USR1,Sig_USR2))

and a bunch of other things the passing code doesn't seem to do.  I'm
now trying to narrow down which of them are significant.
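The `notify 17739 with 10 (User defined signal 1)` and `notify 17012 with 12 (User defined signal 2)` lines, together with the `AckSignal ... sig=Sig_USR2` records, look like a signal-based handshake: funit-child signals the pid it was given (here `argv[3]=17739`) and the test waits for that acknowledgement. Below is a minimal Python sketch of that kind of handshake, assuming Linux/POSIX signals; `notify_and_wait` is a hypothetical name, and this is not frysk's `TestLib$AckHandler` code.

```python
import os
import signal


def notify_and_wait(sig=signal.SIGUSR2):
    """Fork a child that notifies its parent with `sig`, then wait for it.

    Hypothetical helper illustrating a notify/ack handshake; returns the
    signal number the parent received.
    """
    # Block the signal *before* forking so an early notification stays
    # pending instead of taking its default action (terminating us).
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    try:
        parent = os.getpid()
        child = os.fork()
        if child == 0:
            # Child: "notify <parent> with <sig>", then exit at once.
            os.kill(parent, sig)
            os._exit(0)
        received = signal.sigwait({sig})  # block until the ack is pending
        os.waitpid(child, 0)              # reap the child
        return received
    finally:
        # Restore the caller's signal mask (parent only; the child _exit()s).
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


if __name__ == "__main__":
    print("ack received:", signal.Signals(notify_and_wait()).name)
```

Blocking the signal before `fork()` is the important design choice here: if the child's notification arrived before the parent started waiting, an unblocked `SIGUSR1`/`SIGUSR2` would take its default action and terminate the parent instead of being queued for `sigwait()`.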

Chris


Andrew Cagney mumbled something on 09/13/2006 01:17 PM:
> Chris,
>
> This is good progress and the thing to be looking at.  Unfortunately,
> it may or may not be the source of the problems; it's hard to say right now.
>
> Broadly, what is happening is that the test has run successfully (true?
> was there an earlier message reporting a failure?) and now the tearDown
> code is going through and trying to destroy (using very brute force)
> any processes created during the test's run.  The brute-force process,
> put simply, throws everything and anything repeatedly at the processes
> -- kill -9, kill -cont, detach, ... -- in the hope it can be made to
> go away.  A failure here could be due to kernel differences, but could
> just as easily be a timing issue.
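As a rough illustration of the brute-force sequence described above (kill -9, kill -cont, detach), here is a minimal Python sketch of a single pass, assuming Linux, where `PTRACE_DETACH` is request 17, and calling glibc's `ptrace()` through `ctypes`. `brute_force_kill` is a hypothetical helper, not frysk's `TestLib.tearDown`; the teardown described above repeats its attempts rather than making one pass.

```python
import ctypes
import os
import signal

# Assumption: Linux, where PTRACE_DETACH is request 17 in <sys/ptrace.h>.
PTRACE_DETACH = 17
_libc = ctypes.CDLL(None, use_errno=True)


def brute_force_kill(pid):
    """One pass of a hypothetical brute-force teardown of `pid`.

    Sends SIGKILL, then SIGCONT, then tries a ptrace detach that delivers
    SIGKILL.  Each step is logged in the style of the tearDown records; a
    step that fails is marked "(failed)" and the sequence carries on.
    """
    log = []
    for name, sig in (("kill -KILL", signal.SIGKILL),
                      ("kill -CONT", signal.SIGCONT)):
        try:
            os.kill(pid, sig)
            log.append(f"{name} {pid}")
        except ProcessLookupError:
            log.append(f"{name} {pid} (failed)")
    # ptrace(PTRACE_DETACH, pid, NULL, SIGKILL) only succeeds for a tracee
    # of the caller; otherwise it returns -1 (errno ESRCH).
    rc = _libc.ptrace(PTRACE_DETACH, pid, None,
                      ctypes.c_void_p(int(signal.SIGKILL)))
    log.append(f"detach -KILL {pid}" + (" (failed)" if rc == -1 else ""))
    return log
```

In this sketch a detach aimed at a process the caller is not tracing fails (`ptrace()` returns -1 with `errno` set to ESRCH), so that step is logged as `(failed)` while the preceding `SIGKILL` still takes effect.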
>
> In the log, going backwards from:
>
> testManyExistingThreadAttached(frysk.proc.TestProcTasksObserver)
>    >>>>>>>>>>>>>>>> start tearDown
>
> what are the last few interactions involving pid 14860?  In particular,
> is there anything indicating the state of that pid/tid's object?
>
> Andrew
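Andrew's question (working backwards from the `start tearDown` record to the last few records that mention a given pid) can be answered mechanically. A minimal Python sketch, assuming each log message occupies a single line of the raw log; `last_interactions` and its parameters are hypothetical:

```python
import re
from collections import deque


def last_interactions(lines, pid, test_name, n=5):
    """Return the last `n` log lines mentioning `pid` before the
    "<test_name> ... start tearDown" line (hypothetical helper).

    `lines` is any iterable of strings, e.g. an open log file.
    """
    pid_re = re.compile(rf"\b{pid}\b")  # 14860 matches, 148600 does not
    recent = deque(maxlen=n)            # keeps only the n newest matches
    for line in lines:
        if test_name in line and "start tearDown" in line:
            break                       # the tearDown we work back from
        if pid_re.search(line):
            recent.append(line.rstrip("\n"))
    return list(recent)
```

For example, passing an open log file as `lines`, with `pid=14860` and `test_name="testManyExistingThreadAttached"`, returns the last five lines mentioning that pid before its tearDown starts. The word-boundary match also catches tid forms such as `14860.14860` and `{TaskId,14860}`.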
>
>
>
> Chris Moller wrote:
>> Running ./frysk-core/TestRunner -c FINE frysk.proc.TestProcTasksObserver
>> fails on fc6.  The fc6 logs and the fc5 logs are, aside from minor
>> variations in the order of things, identical until they get to:
>>
>>     12-Sep-06 11:13:47 PM frysk.proc.TestLib tearDown
>>     FINE: testManyExistingThreadAttached(frysk.proc.TestProcTasksObserver) >>>>>>>>>>>>>>>> start tearDown
>>
>>     12-Sep-06 11:13:47 PM frysk.proc.TestLib tearDown
>>     FINE: testManyExistingThreadAttached(frysk.proc.TestProcTasksObserver) kill -KILL 14860
>>
>>     12-Sep-06 11:13:47 PM frysk.proc.TestLib tearDown
>>     FINE: testManyExistingThreadAttached(frysk.proc.TestProcTasksObserver) kill -CONT 14860
>>
>>     12-Sep-06 11:13:47 PM frysk.proc.TestLib tearDown
>>     FINE: testManyExistingThreadAttached(frysk.proc.TestProcTasksObserver) detach -KILL 14860 (failed)
>>
>> under fc5, and
>>
>>     12-Sep-06 11:20:26 PM frysk.proc.TestLib tearDown
>>     FINE: testDoCloneAttached(frysk.proc.TestProcTasksObserver) >>>>>>>>>>>>>>>> start tearDown
>>
>>     12-Sep-06 11:20:26 PM frysk.proc.TestLib tearDown
>>     FINE: testDoCloneAttached(frysk.proc.TestProcTasksObserver) kill -KILL 29450
>>
>>     12-Sep-06 11:20:26 PM frysk.proc.TestLib tearDown
>>     FINE: testDoCloneAttached(frysk.proc.TestProcTasksObserver) kill -CONT 29450
>>
>>     12-Sep-06 11:20:26 PM frysk.proc.TestLib tearDown
>>     FINE: testDoCloneAttached(frysk.proc.TestProcTasksObserver) detach -KILL 29450
>>
>> under fc6.  I.e., prima facie, the detach -KILL seems to show a failure
>> under fc6 and a pass under fc5.  I'll poke at it some more tomorrow.
>>
>


