This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Re: [BuildBot] Notifications disabled for Debian-s390x-* and Fedora-ppc64*-* builders


On 12/15/2017 03:53 PM, David Edelsohn wrote:
> On Fri, Dec 15, 2017 at 10:42 AM, Pedro Alves <palves@redhat.com> wrote:
>> On 12/15/2017 03:06 PM, David Edelsohn wrote:
>>
>>> Third, the testsuite summaries that no one from the GDB community
>>> monitored show that the testsuite runtime jumped from a relatively
>>> short amount of time to over 9 hours for each run, which points to a
>>> newly introduced problem in GDB or in the testsuite (timeouts?).
>>
>> That may well be.  Can you point at some representative builds,
>> before/after the jump?
> 
> The testsuite runs for 6 minutes on RHEL7 s390x buildslave and 9 hours
> on Debian Jessie s390x buildslave.

Those are separate machines.  I'd like to see the jump on the same
machine, so we can maybe pinpoint what caused it.

I was really asking for URLs.  It looks like there are some here:

 https://gdb-build.sergiodj.net/builders/Debian-s390x-native-gdbserver-m64

Here, for example:

 https://gdb-build.sergiodj.net/builders/Debian-s390x-native-gdbserver-m64/builds/4351

"test gdb tested GDB failed (9 hrs, 2 mins, 56 secs)"

That's definitely too long.

I downloaded the gdb.log file, and did:

$ grep FAIL gdb.log  | grep timeout | sed 's/.exp.*/.exp/g' | sort | uniq -c | sort -n
      1 FAIL: gdb.base/watch-cond.exp
      1 FAIL: gdb.multi/watchpoint-multi-exit.exp
      1 FAIL: gdb.threads/interrupted-hand-call.exp
      1 FAIL: gdb.threads/thread-unwindonsignal.exp
      2 FAIL: gdb.base/value-double-free.exp
      3 FAIL: gdb.mi/mi-async.exp
      3 FAIL: gdb.threads/process-dies-while-detaching.exp
      4 FAIL: gdb.base/pr11022.exp
     10 FAIL: gdb.base/watch-bitfields.exp
     15 FAIL: gdb.base/watchpoints.exp
     20 FAIL: gdb.threads/interrupt-while-step-over.exp
     32 FAIL: gdb.threads/watchpoint-fork.exp
     45 FAIL: gdb.threads/step-over-trips-on-watchpoint.exp
     46 FAIL: gdb.base/display.exp
     51 FAIL: gdb.base/watchpoint.exp

Not _that_ many.  Could they explain the long time?  I suspect not.
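
As a rough check on that suspicion, the counts in the list above can be summed and multiplied by a per-test timeout.  This is only a sketch: the 10-second timeout is an assumption (the actual value depends on the board file and on individual tests, and was not checked for this builder), and `sum_timeouts` is a made-up helper name.

```shell
#!/bin/sh
# Sum the leading counts of `uniq -c`-style "N FAIL: file.exp" lines on stdin.
# Hypothetical helper for illustration; not part of the GDB testsuite.
sum_timeouts() {
    awk '$2 == "FAIL:" { total += $1 } END { print total + 0 }'
}

# ASSUMPTION: every timed-out test waits this many seconds before giving up.
PER_TEST_TIMEOUT=10

# The per-file timeout counts, copied from the grep output above.
counts='      1 FAIL: gdb.base/watch-cond.exp
      1 FAIL: gdb.multi/watchpoint-multi-exit.exp
      1 FAIL: gdb.threads/interrupted-hand-call.exp
      1 FAIL: gdb.threads/thread-unwindonsignal.exp
      2 FAIL: gdb.base/value-double-free.exp
      3 FAIL: gdb.mi/mi-async.exp
      3 FAIL: gdb.threads/process-dies-while-detaching.exp
      4 FAIL: gdb.base/pr11022.exp
     10 FAIL: gdb.base/watch-bitfields.exp
     15 FAIL: gdb.base/watchpoints.exp
     20 FAIL: gdb.threads/interrupt-while-step-over.exp
     32 FAIL: gdb.threads/watchpoint-fork.exp
     45 FAIL: gdb.threads/step-over-trips-on-watchpoint.exp
     46 FAIL: gdb.base/display.exp
     51 FAIL: gdb.base/watchpoint.exp'

total=$(printf '%s\n' "$counts" | sum_timeouts)
worst_case=$((total * PER_TEST_TIMEOUT))

echo "timeouts: $total"
echo "serial worst case: ${worst_case}s (~$((worst_case / 60)) min)"
```

Under that assumption, 235 timeouts cost about 39 minutes even if they ran strictly one after another, and less in wall-clock terms with -j8.  That is nowhere near 9 hours.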

We see this:

 $ grep "Test run by" gdb.log | head -n 3
 Test run by dje on Tue Nov 21 03:23:01 2017
 Test run by dje on Tue Nov 21 03:23:01 2017
 Test run by dje on Tue Nov 21 03:23:01 2017

 $ grep "Test run by" gdb.log | tail -n 3
 Test run by dje on Tue Nov 21 03:29:54 2017
 Test run by dje on Tue Nov 21 03:29:54 2017
 Test run by dje on Tue Nov 21 03:29:54 2017

So most of the testsuite actually ran for about 7 minutes.  And then
something hung for 9 hours?  I can't tell from the existing logs how
that could happen.  The tail end of the log has:

~~~
FAIL: gdb.base/watchpoint.exp: delete all breakpoints in delete_breakpoints (timeout)
ERROR: breakpoints not deleted
ERROR: breakpoints not deleted

command timed out: 1200 seconds without output running ['make', '-k', 'check', 'RUNTESTFLAGS=--target_board native-gdbserver', '-j8', 'FORCE_PARALLEL=1'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=32576.210392
~~~

I don't understand how 7 minutes plus 1200 seconds (~20min)
resulted in "elapsedTime=32576.210392" (~9h).  Maybe that number
isn't to be trusted.
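
The mismatch can be put in numbers.  The sketch below takes the first and last "Test run by" stamps, the 1200-second kill timeout, and the reported elapsedTime, and prints how much time is left unexplained.  It assumes GNU date (for `-d`); the helper names are made up for illustration.  Note that each stamp most likely marks when a runtest invocation started, so the computed span is a lower bound on how long the tests were really running.

```shell
#!/bin/sh
# Convert a "Test run by ... on <stamp>" timestamp to epoch seconds.
# ASSUMPTION: GNU date.  The stamp is read as UTC; only differences matter here.
stamp_to_epoch() {
    date -u -d "$1" +%s
}

# Format a number of seconds (fraction dropped) as "Hh Mm Ss".
fmt_hms() {
    s=${1%.*}
    printf '%dh %dm %ds\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

first=$(stamp_to_epoch "Tue Nov 21 03:23:01 2017")
last=$(stamp_to_epoch "Tue Nov 21 03:29:54 2017")
span=$((last - first))

kill_timeout=1200          # "1200 seconds without output"
elapsed=32576.210392       # elapsedTime from the tail of the log
elapsed_whole=${elapsed%.*}

accounted=$((span + kill_timeout))
unexplained=$((elapsed_whole - accounted))

echo "first-to-last stamp: $(fmt_hms "$span")"
echo "plus kill timeout:   $(fmt_hms "$accounted")"
echo "reported elapsed:    $(fmt_hms "$elapsed")"
echo "unexplained:         $(fmt_hms "$unexplained")"
```

The reported elapsedTime works out to 9h 2m 56s, matching the "9 hrs, 2 mins, 56 secs" shown on the build page, while the stamps plus the kill timeout account for only about 27 minutes.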

Anyway, I'm sorry, but I really don't have the time to be
looking at this.  Someone with the motivation and access to
the machine could, for example, try running the testsuite manually
to see how long that takes and where the hang is.
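
One way to time such a manual run is sketched below.  `timed_run` is a hypothetical wrapper, not part of GDB or BuildBot; the make command line in the comment is the one the builder logged, and it is left commented out because it needs a configured GDB build tree on the machine in question.

```shell
#!/bin/sh
# Run a command, then report its wall-clock time and exit status on stderr.
# Hypothetical helper for illustration only.
timed_run() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s, exit status: $status" >&2
    return "$status"
}

# Example, from the top of a GDB build directory:
# timed_run make -k check 'RUNTESTFLAGS=--target_board native-gdbserver' -j8 FORCE_PARALLEL=1
```

If the run hangs again, whichever test is still writing to gdb.log at that point is a natural place to start looking.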

> The Debian Jessie system also runs a Python buildslave without
> problem.  The system has 4 virtual cpus and 16GB of memory, which
> should be more than adequately sized.

Thanks,
Pedro Alves

