This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [PATCH] Increase timeout on gdb.base/exitsignal.exp
- From: Sergio Durigan Junior <sergiodj at redhat dot com>
- To: Pedro Alves <palves at redhat dot com>
- Cc: GDB Patches <gdb-patches at sourceware dot org>
- Date: Tue, 25 Aug 2015 14:24:42 -0400
- Subject: Re: [PATCH] Increase timeout on gdb.base/exitsignal.exp
- References: <1440481342-25971-1-git-send-email-sergiodj at redhat dot com> <55DC46C5 dot 4050808 at redhat dot com>
On Tuesday, August 25 2015, Pedro Alves wrote:
> On 08/25/2015 06:42 AM, Sergio Durigan Junior wrote:
>> I have noticed that BuildBot is showing random failures of
>> gdb.base/exitsignal.exp, specifically when testing on the
>> Fedora-ppc64be-native-gdbserver-m64 builder. Since I wrote this test
>> a while ago, I decided to investigate this further.
>>
>> This is what you see when you examine gdb.log:
>>
>> Breakpoint 1, main (argc=1, argv=0x3fffffffe3c8) at ../../../binutils-gdb/gdb/testsuite/gdb.base/segv.c:26
>> 26 raise (SIGSEGV);
>> (gdb) print $_exitsignal
>> $1 = void
>> (gdb) PASS: gdb.base/exitsignal.exp: $_exitsignal is void before running
>> print $_exitcode
>> $2 = void
>> (gdb) PASS: gdb.base/exitsignal.exp: $_exitcode is void before running
>> continue
>> Continuing.
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x00003fffb7cbf808 in .raise () from target:/lib64/libc.so.6
>> (gdb) PASS: gdb.base/exitsignal.exp: trigger SIGSEGV
>> continue
>> Continuing.
>> FAIL: gdb.base/exitsignal.exp: program terminated with SIGSEGV (timeout)
>> print $_exitsignal
>> FAIL: gdb.base/exitsignal.exp: $_exitsignal is 11 (SIGSEGV) after SIGSEGV. (timeout)
>> print $_exitcode
>> FAIL: gdb.base/exitsignal.exp: $_exitcode is still void after SIGSEGV (timeout)
>> kill
>>
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> The program no longer exists.
>> (gdb) print $_exitsignal
>> $3 = 11
>> (gdb) print $_exitcode
>> $4 = void
>>
>> Clearly a timeout issue: one can see that even though the tests failed
>> because the program was still running, both 'print' commands actually
>> succeeded later.
>>
>
> I recently bumped timeouts for a few reverse/record tests, but in that
> case, it's justified because recording requires single-stepping all
> instructions, so it naturally takes a while. In this case, I don't see what
> could reasonably be causing the delay. It shouldn't really ever take 60
> seconds just to deliver a signal and have the kernel report back
> process exit. What could cause this delay? I'm not sure whether the
> process's signalled exit status is reported to the parent before or after
> the kernel fully writes the core dump --- it occurred to me that if after,
> then writing a big core dump could explain a delay. So I would
> suggest switching to a signal that does not cause a core dump by default,
> e.g., SIGKILL/SIGTERM. Though in this case, the core dump generated
> should be small, so I'm mystified. This could be papering over some
> latent problem...
TBH I am also mystified by this failure.
I don't think the problem is the time it takes for the Linux kernel to
write the coredump. As you can see, this test is using:
<https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/testsuite/gdb.base/segv.c>
Which is absurdly simple, and I don't think it would take more than 2
seconds to write a corefile for it.
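
One way to check the core dump hypothesis outside of GDB is to time how long the parent waits before it sees the signalled exit status. The following is only a minimal sketch in Python, not part of the testsuite; the function name time_signalled_exit is made up for illustration, and the child merely imitates what segv.c does by sending itself a signal. Whether a core file is actually written depends on the core size limit (ulimit -c) of the machine it runs on.

```python
import os
import signal
import time


def time_signalled_exit(signum=signal.SIGSEGV):
    """Fork a child that kills itself with `signum`.

    Returns (elapsed_seconds, wait_status), where elapsed_seconds covers
    fork through waitpid, so it is an upper bound on the time between the
    signal and the exit report. Hypothetical helper, for illustration only.
    """
    start = time.monotonic()
    pid = os.fork()
    if pid == 0:
        # Child: make sure the default (fatal) disposition is in place,
        # then send the signal to ourselves, much like raise() in segv.c.
        signal.signal(signum, signal.SIG_DFL)
        os.kill(os.getpid(), signum)
        os._exit(1)  # only reached if the signal did not terminate us
    # Parent: block until the kernel reports the child's termination.
    _, status = os.waitpid(pid, 0)
    return time.monotonic() - start, status


if __name__ == "__main__":
    # SIGSEGV dumps core by default; SIGTERM does not.
    for signum in (signal.SIGSEGV, signal.SIGTERM):
        elapsed, status = time_signalled_exit(signum)
        print(
            f"{signal.Signals(signum).name}: "
            f"signalled={os.WIFSIGNALED(status)} "
            f"core={os.WCOREDUMP(status)} "
            f"elapsed={elapsed:.3f}s"
        )
```

If the SIGSEGV case were consistently much slower than the SIGTERM case on the builder, that would point at the core dump; if both are fast, the delay is more likely somewhere between gdbserver and GDB.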
As I said, I could not reproduce this failure on a PPC64 machine here.
I'll try to log into the buildslave and see if I can test directly from
there.
>> gdb_continue_to_end
>>
>> -# Checking $_exitcode. It should be 0.
>> -gdb_test "print \$_exitcode" " = 0" \
>> -    "\$_exitcode is zero after normal inferior is executed"
>> +with_timeout_factor 10 {
>> +    # Checking $_exitcode. It should be 0.
>> +    gdb_test "print \$_exitcode" " = 0" \
>> +        "\$_exitcode is zero after normal inferior is executed"
>>
>> -# Checking $_exitsignal. It should still be void, since the inferior
>> -# has not received any signal.
>> -gdb_test "print \$_exitsignal" " = void" \
>> -    "\$_exitsignal is still void after normal inferior is executed"
>> +    # Checking $_exitsignal. It should still be void, since the inferior
>> +    # has not received any signal.
>> +    gdb_test "print \$_exitsignal" " = void" \
>> +        "\$_exitsignal is still void after normal inferior is executed"
>> +}
>>
>
> This (many instances) doesn't make sense to me. And I think it wouldn't
> fix anything. Seems to me the bumped timeout, if any, should be around
> the continue that caused the first timeout:
>
> # Continue until the end.
> gdb_test "continue" "Program terminated with signal SIGSEGV.*" \
>     "program terminated with SIGSEGV"
Sorry, I am confused... The timeout does not occur on this command: it
occurs on the print commands. So I think we need to extend the timeout for
the print commands, don't we?
Thanks,
--
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF 31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/