This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Re: [PATCH] Increase timeout on gdb.base/exitsignal.exp


On Tuesday, August 25 2015, Pedro Alves wrote:

> On 08/25/2015 06:42 AM, Sergio Durigan Junior wrote:
>> I have noticed that BuildBot is showing random failures of
>> gdb.base/exitsignal.exp, specifically when testing on the
>> Fedora-ppc64be-native-gdbserver-m64 builder.  Since I wrote this test
>> a while ago, I decided to investigate this further.
>> 
>> This is what you see when you examine gdb.log:
>> 
>>   Breakpoint 1, main (argc=1, argv=0x3fffffffe3c8) at ../../../binutils-gdb/gdb/testsuite/gdb.base/segv.c:26
>>   26	     raise (SIGSEGV);
>>   (gdb) print $_exitsignal
>>   $1 = void
>>   (gdb) PASS: gdb.base/exitsignal.exp: $_exitsignal is void before running
>>   print $_exitcode
>>   $2 = void
>>   (gdb) PASS: gdb.base/exitsignal.exp: $_exitcode is void before running
>>   continue
>>   Continuing.
>> 
>>   Program received signal SIGSEGV, Segmentation fault.
>>   0x00003fffb7cbf808 in .raise () from target:/lib64/libc.so.6
>>   (gdb) PASS: gdb.base/exitsignal.exp: trigger SIGSEGV
>>   continue
>>   Continuing.
>>   FAIL: gdb.base/exitsignal.exp: program terminated with SIGSEGV (timeout)
>>   print $_exitsignal
>>   FAIL: gdb.base/exitsignal.exp: $_exitsignal is 11 (SIGSEGV) after SIGSEGV. (timeout)
>>   print $_exitcode
>>   FAIL: gdb.base/exitsignal.exp: $_exitcode is still void after SIGSEGV (timeout)
>>   kill
>> 
>>   Program terminated with signal SIGSEGV, Segmentation fault.
>>   The program no longer exists.
>>   (gdb) print $_exitsignal
>>   $3 = 11
>>   (gdb) print $_exitcode
>>   $4 = void
>> 
>> Clearly a timeout issue: one can see that even though the tests failed
>> because the program was still running, both 'print' commands actually
>> succeeded later.
>>
>
> I recently bumped timeouts for a few reverse/record tests, but in that
> case it's justified: recording requires single-stepping every
> instruction, so it naturally takes a while.  In this case, I don't see what
> could reasonably be causing the delay.  It shouldn't ever take 60
> seconds just to deliver a signal and have the kernel report back
> process exit.  What could cause this delay?  I'm not sure whether the
> process's signalled exit status is reported to the parent before or after
> the kernel fully writes the core dump; it occurred to me that if after,
> then writing a big core dump could explain a delay.  So I would
> suggest switching to a signal that does not cause a core dump by default,
> e.g., SIGKILL or SIGTERM.  Though in this case, the core dump generated
> should be small, so I'm mystified.  This could be papering over some
> latent problem...

TBH I am also mystified by this failure.

I don't think the problem is the time it takes for the Linux kernel to
write the coredump.  As you can see, this test is using:

  <https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/testsuite/gdb.base/segv.c>

It is absurdly simple, and I don't think writing a core file for it would
take more than 2 seconds.

As I said, I could not reproduce this failure on a PPC64 machine here.
I'll try to log into the buildslave and see if I can test directly from
there.
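Pedro's question (is the signalled exit status reported before or after the core dump is written?) can be probed outside GDB.  Below is a minimal sketch, not part of the original thread or the GDB testsuite, assuming Linux with glibc.  The helper name time_signal_exit is hypothetical: it forks a child that raises a signal with the default disposition, times how long the parent waits for waitpid to report the signalled exit, and checks WCOREDUMP.  Whether a core file is actually written for SIGSEGV depends on `ulimit -c` and /proc/sys/kernel/core_pattern; SIGTERM never dumps core by default.

```c
#define _DEFAULT_SOURCE  /* exposes WCOREDUMP in glibc's <sys/wait.h> */
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not from GDB): fork a child that raises SIG and
   measure the time from just before fork until waitpid reports the
   child's exit.  Stores the elapsed seconds in *ELAPSED and whether the
   kernel reported a core dump in *DUMPED.  Returns the terminating
   signal number, 0 if the child exited normally, or -1 on error.  */
int
time_signal_exit (int sig, double *elapsed, int *dumped)
{
  struct timespec start, end;
  int status;
  pid_t pid;

  clock_gettime (CLOCK_MONOTONIC, &start);
  pid = fork ();
  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      /* Make sure SIG is neither ignored nor blocked in the child.  */
      sigset_t set;

      signal (sig, SIG_DFL);
      sigemptyset (&set);
      sigaddset (&set, sig);
      sigprocmask (SIG_UNBLOCK, &set, NULL);
      raise (sig);
      _exit (0);  /* Only reached if SIG does not terminate the child.  */
    }

  /* Retry if the wait is interrupted by an unrelated signal.  */
  while (waitpid (pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;
  clock_gettime (CLOCK_MONOTONIC, &end);

  *elapsed = (double) (end.tv_sec - start.tv_sec)
             + (end.tv_nsec - start.tv_nsec) / 1e9;
  *dumped = WIFSIGNALED (status) && WCOREDUMP (status);
  return WIFSIGNALED (status) ? WTERMSIG (status) : 0;
}
```

Calling it once with SIGSEGV and once with SIGTERM on the affected machine gives a rough comparison: if the SIGSEGV timing is far above the SIGTERM one, that would hint at core-dump handling (for example a slow core_pattern helper) rather than at GDB or gdbserver.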

>>  gdb_continue_to_end
>>  
>> -# Checking $_exitcode.  It should be 0.
>> -gdb_test "print \$_exitcode" " = 0" \
>> -    "\$_exitcode is zero after normal inferior is executed"
>> +with_timeout_factor 10 {
>> +    # Checking $_exitcode.  It should be 0.
>> +    gdb_test "print \$_exitcode" " = 0" \
>> +	"\$_exitcode is zero after normal inferior is executed"
>>  
>> -# Checking $_exitsignal.  It should still be void, since the inferior
>> -# has not received any signal.
>> -gdb_test "print \$_exitsignal" " = void" \
>> -    "\$_exitsignal is still void after normal inferior is executed"
>> +    # Checking $_exitsignal.  It should still be void, since the inferior
>> +    # has not received any signal.
>> +    gdb_test "print \$_exitsignal" " = void" \
>> +	"\$_exitsignal is still void after normal inferior is executed"
>> +}
>> 
>
> This (many instances) doesn't make sense to me, and I think it wouldn't
> fix anything.  It seems to me the bumped timeout, if any, should be around
> the continue that caused the first timeout:
>
> # Continue until the end.
> gdb_test "continue" "Program terminated with signal SIGSEGV.*" \
>     "program terminated with SIGSEGV"

Sorry, I am confused...  The timeout does not occur on this command: it
occurs on the print commands.  So I think we should extend the timeout for
the print commands, shouldn't we?

Thanks,

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/

