This is the mail archive of the gdb-testers@sourceware.org mailing list for the GDB project.

Re: Failures on Fedora-ppc64be-native-extended-gdbserver-m64, branch master

From: Pedro Alves <palves at redhat dot com>
To: Sergio Durigan Junior <sergiodj at redhat dot com>, gdb-testers at sourceware dot org
Date: Fri, 06 Feb 2015 11:32:35 +0100
Subject: Re: Failures on Fedora-ppc64be-native-extended-gdbserver-m64, branch master
Authentication-results: sourceware.org; auth=none
References: <E1YJG3L-0002Wg-HL at kwanyin dot sergiodj dot net> <87386kaxyd dot fsf at redhat dot com>

On 02/05/2015 09:46 PM, Sergio Durigan Junior wrote:

> I'm thinking about including gdb-sigterm.exp in the "--ignore" list for
> RUNTESTFLAGS.  Opinions?

I think that should be the last resort, and that we should avoid it
as best we can.

If a test is racy, then it's better to XFAIL it.  Putting it in --ignore is
just like not having the test at all.  If the reason that leads us to
consider --ignore is that the test is racy _and_ the set of FAILs
changes frequently from run to run, then we should make the
test handle failures in a better way.  After all, we also run the tests
ourselves locally, and racy random FAILs are no good locally either.
Failing that (or in addition), maybe we should make the XFAIL machinery
cope, e.g., with regexes.

So, I took a better look at this one.

The test sets the inferior stepping forever, and then sends a SIGTERM
to GDB, expecting it to exit.  It sounds like either the SIGTERM never
reaches GDB, which could be a kernel or a testsuite machinery bug, or
GDB is somehow ignoring/losing it sometimes, which is exactly what this
test is supposed to catch.

Thinking this is load related, I stressed all cores by running a few
infinite loops in the shell, like:

 $ (set -e; while true; do :; done)&

and then set the test running in a loop forever, like:

 $ (set -e; while true; do make check RUNTESTFLAGS="--target_board=native-gdbserver gdb-sigterm.exp"; done)

This more or less emulates a busy machine, though it's just CPU bound, no extra I/O.
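
For repeated runs it can be handy to start one busy loop per core and clean them all up afterwards. The following is only a minimal sketch of that idea; the helper names start_burners and stop_burners and the BURNER_PIDS variable are hypothetical, not part of the testsuite.

```shell
# Hypothetical helpers: start N CPU-bound busy loops in the background
# and remember their PIDs so they can be killed again later.
BURNER_PIDS=""

start_burners() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        # Each subshell spins forever, keeping one core busy.
        ( while :; do :; done ) &
        BURNER_PIDS="$BURNER_PIDS $!"
        i=$((i + 1))
    done
}

stop_burners() {
    for pid in $BURNER_PIDS; do
        kill "$pid" 2>/dev/null || true
        # Reap the killed loop so no zombie is left behind.
        wait "$pid" 2>/dev/null || true
    done
    BURNER_PIDS=""
}
```

Assuming GNU coreutils' nproc is available, something like `start_burners "$(nproc)"` before the test loop and `stop_burners` after it would load every core. As with the one-liner above, this adds CPU load only, no extra I/O.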

With this, I occasionally see a FAIL.

Then I added a bit of extra logging to GDB, with this patch:

---
 gdb/event-top.c                        |  5 +++++
 gdb/testsuite/gdb.base/gdb-sigterm.exp | 11 ++++++-----
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/gdb/event-top.c b/gdb/event-top.c
index bbda5dc..4b512f0 100644
--- a/gdb/event-top.c
+++ b/gdb/event-top.c
@@ -863,6 +863,8 @@ handle_sigint (int sig)
 static void
 async_sigterm_handler (gdb_client_data arg)
 {
+  if (debug_infrun)
+    fprintf_unfiltered (gdb_stdlog, "infrun: handling async SIGTERM\n");
   quit_force (NULL, stdin == instream);
 }

@@ -876,6 +878,9 @@ handle_sigterm (int sig)
 {
   signal (sig, handle_sigterm);

+  if (debug_infrun)
+    write (2, "infrun: got SIGTERM\n", strlen ("infrun: got SIGTERM\n"));
+
   /* Call quit_force in a signal safe way.
      quit_force itself is not signal safe.  */
   if (target_can_async_p ())

I see that in the FAIL case, SIGTERM doesn't reach GDB.  At least
not before the 200 steps are issued.  With a trivial systemtap script, hooking
signal.send, I saw that SIGTERM _is_ sent, and is sent to a GDB process.
That still doesn't say whether the signal is actually ever
handled by GDB, though.

So I patched the test file too, like this:

diff --git a/gdb/testsuite/gdb.base/gdb-sigterm.exp b/gdb/testsuite/gdb.base/gdb-sigterm.exp
index 12eaf24..256a84d 100644
--- a/gdb/testsuite/gdb.base/gdb-sigterm.exp
+++ b/gdb/testsuite/gdb.base/gdb-sigterm.exp
@@ -61,18 +61,19 @@ proc do_test { pass } {
 	    verbose -log "$pf_prefix $test"
 	    set abort 0
 	}
+	timeout {
+	    fail "$test (timeout, stepped $stepping times)"
+	}
 	-re "infrun: stepping inside range" {
 	    incr stepping
-	    if { $stepping > 200 } {
-		fail "$test (stepping inside range $stepping times)"
-	    } else {
-		exp_continue
-	    }
+	    exp_continue
 	}
     }
     if $abort {
 	return
     }
+
+    gdb_assert {$stepping < 200} "SIGTERM stepped $stepping times"
 }

 # Testcase was FAILing approx. on 10th pass with unpatched GDB.
-- 

And voila, here's what I occasionally see, after a few minutes of testing:

...
infrun: stop_pc = 0x4005de
infrun: stepping inside range [0x4005de-0x4005e0]
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 30011] at 0x4005de
infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   30011 [Thread 30011],
infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x4005de
infrun: stepping inside range [0x4005de-0x4005e0]
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 30011] at 0x4005de
infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   30011 [Thread 30011],
infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x4005de
infrun: got SIGTERM
infrun: stepping inside range [0x4005de-0x4005e0]
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 30011] at 0x4005de
infrun: prepare_to_wait
infrun: handling async SIGTERM
Cannot execute this command while the target is running.
Use the "interrupt" command to stop the target
and then try again.
gdb.base/gdb-sigterm.exp: expect eof #27
FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times

Note, SIGTERM did arrive, but it took more than 200 single-steps.

gdb.sum shows:

PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times
PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times
PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times
PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times
FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times
PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times
PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times
PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times
PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times
PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 9 times
PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times
PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times
PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times

That FAIL comes from the gdb_assert line.  Because we don't
see a "timeout" fail, we know that the test would actually pass, if
I removed that gdb_assert.  It just happens that occasionally it takes
more than 200 single-steps before the SIGTERM reaches GDB.
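
The same effect can be seen outside GDB. The sketch below is only a rough analogy, not the test itself: a child shell counts loop iterations until it notices a SIGTERM sent by its parent, and prints the count. The function name count_steps_until_sigterm and the one-second delay are assumptions made for illustration.

```shell
# Hypothetical helper: run a child shell that counts loop iterations
# ("steps") until it observes SIGTERM, then print that count.
count_steps_until_sigterm() {
    out=$(mktemp)
    sh -c '
        got=0
        steps=0
        # The trap only records the signal; the loop notices it later.
        trap "got=1" TERM
        while [ "$got" -eq 0 ]; do
            steps=$((steps + 1))
        done
        echo "$steps" > "$1"
    ' sh "$out" &
    pid=$!
    sleep 1                 # let the child get going before signalling it
    kill -TERM "$pid"
    wait "$pid"
    cat "$out"
    rm -f "$out"
}
```

Calling `count_steps_until_sigterm` a few times, with and without background load, should give a different count each time. The signal is always handled, but how many iterations complete first depends on scheduling, which is why a fixed step limit makes a poor pass/fail criterion.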

I pushed a fix:

  [pushed] gdb.base/gdb-sigterm.exp: Fix spurious FAILs
  https://sourceware.org/ml/gdb-patches/2015-02/msg00151.html

Crossing fingers now!

Thanks,
Pedro Alves
