This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] fix gdb.threads/non-stop-fair-events.exp timeouts


On 09/04/2015 05:54 PM, Sandra Loosemore wrote:
> While running GDB tests on nios2-linux-gnu with gdbserver and "target 
> remote", I've been seeing random failures in 
> gdb.threads/non-stop-fair-events.exp.  E.g. in one test run I got
> 
> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=6: thread 1 
> broke out of loop (timeout)
> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=6: thread 2 
> broke out of loop (timeout)
> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=6: thread 3 
> broke out of loop (timeout)
> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=7: thread 1 
> broke out of loop (timeout)
> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=10: thread 1 
> broke out of loop (timeout)
> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=10: thread 2 
> broke out of loop (timeout)
> 
> and in other test runs I got a different ones.  The pattern seemed to be 
> that sometimes it took an extra long time for the first thread to break 
> out of the loop, but once that happened they would all stop correctly 
> and send the expected replies even though GDB had given up on waiting 
> for the first few already.

Yeah, I've seen this before with a local series I use for debugging
software single-step things that implements software single-stepping
on x86.  I just re-tried it now after rebasing that series to
current mainline, and I still see the time outs against gdbserver.

AFAICS, nios2 is a software single-step target that does not implement
displaced stepping either.  I had a patch for this that I had
never posted.  See attached.

> 
> I've come up with the attached patch to factor the timeout for the 
> failing tests by the number of threads still running, which seems to 
> take care of the problem.  Does this seem reasonable?

I'd rather avoid it unconditionally; it's just 10 threads, and if the
target supports displaced stepping, if starvation avoidance in gdb is
working correctly, the test should complete quickly.  I takes a couple
seconds on my getting-old x86-64 laptop.

> 
> I'm somewhat confused because, in spite of it sometimes taking at least 
> 3 times the normal timeout for the first stop message to appear, the 
> alarm in the test case (which is tied to the normal timeout) was never 
> triggering.  My best theory on that is that the slowness is not in the 
> test case, but rather in gdbserver.  IOW, all the threads are already 
> stopped by the time the alarm would expire, but gdb and gdbserver 
> haven't finished all the notifications and requests to print a stop 
> message for any of the threads yet.  Is that plausible?  Should the 
> timeout for the alarm be factored by the number of threads, too, just to 
> be safe?

Or maybe it was, and the SIGALRM never manages to be processed by gdb
and passed down to the inferior.

> 
> I'm also not entirely sure what this test case is supposed to test. 
>  From the original commit message and comments in the .exp file it seems 
> like timeouts were supposed to be a sign of a broken kernel with thread 
> starvation problems, not bugs in gdb or gdbserver.  

On the kernel side, "waitpid(-1, ...)" just walks the task list linearly
looking for the first that had an event.  Say you have two threads, A and
B which are constantly hitting events/breakpoints.  If A is quick enough,
"waitpid(-1, ...)" returns the event for thread A over and over, and thread B is
starved.  The linux backends in both gdb and gdbserver have code in place that
picks an event LWP at random out of all that have had events.  A similar problem
exists as soon as events are queued out of the target backends into gdb's core
run control -- events can end up pending for processing later in gdb's core
data structures too, and so if gdb just picked those events by walking its own
thread list looking for the first thread that has an event pending, it'd starve
some threads.  So again infrun.c has similar randomization code to avoid
starvation (random_pending_event_thread).

> But, don't we 
> normally just skip tests that the target doesn't support or can't run 
> properly, rather than report them as FAILs?  And, I don't know how to 
> distinguish timeouts that mean the kernel is broken from timeouts that 
> mean the target is just slow and you need to set a bigger value in the 
> test harness.

Pedro Alves

>From f84e249f33a8d80f9ffc137f8505f5fd79cd13ab Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Mon, 13 Apr 2015 20:59:37 +0100
Subject: [PATCH 1/2] Make it easier to debug non-stop-fair-events.exp

If we enable infrun debug running this test, it quickly fails with a
full expect buffer.  That can be simply handled with a couple
exp_continues.  As it's annoying to hack this every time we need to
debug the test, this patch adds bits to enable debugging support
easily, with a one-line change.

And then, if any iteration of the test fails, we end up with a long
cascade of time outs.  Just bail out when we see the first fail.

gdb/testsuite/
2015-09-08  Pedro Alves  <palves@redhat.com>

	* gdb.threads/non-stop-fair-events.exp (gdb_test_no_anchor)
	(enable_debug): New procedures.
	(test): Use them.  Bail out if waiting for threads fails.
	(top level): Bail out if a test fails.
---
 gdb/testsuite/gdb.threads/non-stop-fair-events.exp | 57 ++++++++++++++++++++--
 1 file changed, 54 insertions(+), 3 deletions(-)

diff --git a/gdb/testsuite/gdb.threads/non-stop-fair-events.exp b/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
index e2d3f7d..37f5bcb 100644
--- a/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
+++ b/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
@@ -98,6 +98,29 @@ proc restart {} {
     delete_breakpoints
 }
 
+# Run command and wait for the prompt, without end anchor.
+
+proc gdb_test_no_anchor {cmd} {
+    global gdb_prompt
+
+    gdb_test_multiple $cmd $cmd {
+	-re "$gdb_prompt " {
+	    pass $cmd
+	}
+    }
+}
+
+# Enable/disable debugging.
+
+proc enable_debug {enable} {
+
+    # Comment out to debug problems with the test.
+    return
+
+    gdb_test_no_anchor "set debug infrun $enable"
+    gdb_test_no_anchor "set debug displaced $enable"
+}
+
 # The test proper.  SIGNAL_THREAD is the thread that has been elected
 # to receive the SIGUSR1 signal.
 
@@ -126,30 +149,55 @@ proc test {signal_thread} {
 
 	# Let the main thread queue the signal.
 	gdb_breakpoint "loop_broke"
+
+	enable_debug 1
+
+	set saw_continuing 0
 	set test "continue &"
 	gdb_test_multiple $test $test {
-	    -re "Continuing.\r\n$gdb_prompt " {
-		pass $test
+	    -re "Continuing.\r\n" {
+		set saw_continuing 1
+		exp_continue
+	    }
+	    -re "$gdb_prompt " {
+		gdb_assert $saw_continuing $test
+	    }
+	    -re "infrun:" {
+		exp_continue
 	    }
 	}
 
+	set gotit 0
+
 	# Wait for all threads to finish their steps, and for the main
 	# thread to hit the breakpoint.
 	for {set i 1} { $i <= $NUM_THREADS } { incr i } {
 	    set test "thread $i broke out of loop"
+	    set gotit 0
 	    gdb_test_multiple "" $test {
 		-re "loop_broke" {
 		    # The prompt was already matched in the "continue
 		    # &" test above.  We're now consuming asynchronous
 		    # output that comes after the prompt.
+		    set gotit 1
 		    pass $test
 		}
+		-re "infrun:" {
+		    exp_continue
+		}
+	    }
+	    if {!$gotit} {
+		break
 	    }
 	}
 
+	enable_debug 0
+
 	# It's helpful to have this in the log if the test ever
 	# happens to fail.
 	gdb_test "info threads"
+
+	return $gotit
     }
 }
 
@@ -158,5 +206,8 @@ proc test {signal_thread} {
 # with lowest kernel thread ID.  So test once with the signal pending
 # in each thread, except the main thread.
 for {set i 2} { $i <= $NUM_THREADS } { incr i } {
-    test $i
+    if {![test $i]} {
+	# Avoid cascading timeouts, and bail out.
+	return
+    }
 }
-- 
1.9.3

>From ccd1e1cca1409ea8eb6f564561280f345f321077 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Tue, 8 Sep 2015 17:26:02 +0100
Subject: [PATCH 2/2] non-stop-fair-events.exp slower on software single-step
 && !displ-step targets

On software single-step targets that don't support displaced stepping,
threads keep hitting each other's single-step breakpoints, and then
GDB needs to pause all threads to step past those.  The end result is
that progress in the main thread will be slower and it may take a bit
longer for the signal to be queued.  This patch bumps the timeout on
such targets.

gdb/testsuite/ChangeLog:
2015-09-08  Pedro Alves  <palves@redhat.com>

	* gdb.threads/non-stop-fair-events.c (timeout): New global.
	(SECONDS): Redefine.
	(main): Call pthread_kill and alarm early.
	* gdb.threads/non-stop-fair-events.exp: Probe displaced stepping
	support.
	(test): If the target can't hardware step and doesn't support
	displaced stepping, increase the timeout.
---
 gdb/testsuite/gdb.threads/non-stop-fair-events.c   |  9 ++-
 gdb/testsuite/gdb.threads/non-stop-fair-events.exp | 91 +++++++++++++++-------
 gdb/testsuite/lib/gdb.exp                          | 20 +++--
 3 files changed, 82 insertions(+), 38 deletions(-)

diff --git a/gdb/testsuite/gdb.threads/non-stop-fair-events.c b/gdb/testsuite/gdb.threads/non-stop-fair-events.c
index f82c366..700676b 100644
--- a/gdb/testsuite/gdb.threads/non-stop-fair-events.c
+++ b/gdb/testsuite/gdb.threads/non-stop-fair-events.c
@@ -24,7 +24,9 @@
 const int num_threads = NUM_THREADS;
 /* Allow for as much timeout as DejaGnu wants, plus a bit of
    slack.  */
-#define SECONDS (TIMEOUT + 20)
+
+volatile unsigned int timeout = TIMEOUT;
+#define SECONDS (timeout + 20)
 
 pthread_t child_thread[NUM_THREADS];
 volatile pthread_t signal_thread;
@@ -69,6 +71,11 @@ main (void)
   int res;
   int i;
 
+  /* Call these early so that we're sure their PLTs are quickly
+     resolved now, instead of in the busy threads.  */
+  pthread_kill (pthread_self (), 0);
+  alarm (0);
+
   signal (SIGUSR1, handler);
 
   for (i = 0; i < NUM_THREADS; i++)
diff --git a/gdb/testsuite/gdb.threads/non-stop-fair-events.exp b/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
index 37f5bcb..ba14f6a 100644
--- a/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
+++ b/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
@@ -62,6 +62,19 @@ set NUM_THREADS [get_value "num_threads" "get num_threads"]
 # Account for the main thread.
 incr NUM_THREADS
 
+# Probe for displaced stepping support.
+set displaced_stepping_enabled 0
+set msg "check displaced-stepping"
+gdb_test_no_output "set debug displaced 1"
+gdb_test_multiple "si" $msg {
+    -re "displaced pc to.*$gdb_prompt $" {
+	set displaced_stepping_enabled 1
+    }
+    -re ".*$gdb_prompt $" {
+    }
+}
+gdb_test_no_output "set debug displaced 0"
+
 # Run threads to their start positions.  This prepares for a new test
 # sequence.
 
@@ -127,6 +140,8 @@ proc enable_debug {enable} {
 proc test {signal_thread} {
     global gdb_prompt
     global NUM_THREADS
+    global timeout
+    global displaced_stepping_enabled
 
     with_test_prefix "signal_thread=$signal_thread" {
 	restart
@@ -152,42 +167,58 @@ proc test {signal_thread} {
 
 	enable_debug 1
 
-	set saw_continuing 0
-	set test "continue &"
-	gdb_test_multiple $test $test {
-	    -re "Continuing.\r\n" {
-		set saw_continuing 1
-		exp_continue
-	    }
-	    -re "$gdb_prompt " {
-		gdb_assert $saw_continuing $test
-	    }
-	    -re "infrun:" {
-		exp_continue
-	    }
+	# On software single-step targets that don't support displaced
+	# stepping, threads keep hitting each others' single-step
+	# breakpoints, and then GDB needs to pause all threads to step
+	# past those.  The end result is that progress in the main
+	# thread will be slower and it may take a bit longer for the
+	# signal to be queued; bump the timeout.
+	if {!$displaced_stepping_enabled && ![can_hardware_single_step]} {
+	    set factor 5
+	} else {
+	    set factor 1
 	}
-
-	set gotit 0
-
-	# Wait for all threads to finish their steps, and for the main
-	# thread to hit the breakpoint.
-	for {set i 1} { $i <= $NUM_THREADS } { incr i } {
-	    set test "thread $i broke out of loop"
-	    set gotit 0
-	    gdb_test_multiple "" $test {
-		-re "loop_broke" {
-		    # The prompt was already matched in the "continue
-		    # &" test above.  We're now consuming asynchronous
-		    # output that comes after the prompt.
-		    set gotit 1
-		    pass $test
+	with_timeout_factor $factor {
+	    gdb_test "print timeout = $timeout" " = $timeout" \
+		"set timeout in the inferior"
+
+	    set saw_continuing 0
+	    set test "continue &"
+	    gdb_test_multiple $test $test {
+		-re "Continuing.\r\n" {
+		    set saw_continuing 1
+		    exp_continue
+		}
+		-re "$gdb_prompt " {
+		    gdb_assert $saw_continuing $test
 		}
 		-re "infrun:" {
 		    exp_continue
 		}
 	    }
-	    if {!$gotit} {
-		break
+
+	    set gotit 0
+
+	    # Wait for all threads to finish their steps, and for the main
+	    # thread to hit the breakpoint.
+	    for {set i 1} { $i <= $NUM_THREADS } { incr i } {
+		set test "thread $i broke out of loop"
+		set gotit 0
+		gdb_test_multiple "" $test {
+		    -re "loop_broke" {
+			# The prompt was already matched in the "continue
+			# &" test above.  We're now consuming asynchronous
+			# output that comes after the prompt.
+			set gotit 1
+			pass $test
+		    }
+		    -re "infrun:" {
+			exp_continue
+		    }
+		}
+		if {!$gotit} {
+		    break
+		}
 	    }
 	}
 
diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index 56cde7a..9eaf721 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -2150,15 +2150,10 @@ proc supports_get_siginfo_type {} {
     }
 }
 
-# Return 1 if target hardware or OS supports single stepping to signal
-# handler, otherwise, return 0.
+# Return 1 if the target supports hardware single stepping.
 
-proc can_single_step_to_signal_handler {} {
+proc can_hardware_single_step {} {
 
-    # Targets don't have hardware single step.  On these targets, when
-    # a signal is delivered during software single step, gdb is unable
-    # to determine the next instruction addresses, because start of signal
-    # handler is one of them.
     if { [istarget "arm*-*-*"] || [istarget "mips*-*-*"]
 	 || [istarget "tic6x-*-*"] || [istarget "sparc*-*-linux*"]
 	 || [istarget "nios2-*-*"] } {
@@ -2168,6 +2163,17 @@ proc can_single_step_to_signal_handler {} {
     return 1
 }
 
+# Return 1 if target hardware or OS supports single stepping to signal
+# handler, otherwise, return 0.
+
+proc can_single_step_to_signal_handler {} {
+    # Targets don't have hardware single step.  On these targets, when
+    # a signal is delivered during software single step, gdb is unable
+    # to determine the next instruction addresses, because start of signal
+    # handler is one of them.
+    return [can_hardware_single_step]
+}
+
 # Return 1 if target supports process record, otherwise return 0.
 
 proc supports_process_record {} {
-- 
1.9.3


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]