This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

[PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread

From: Sterling Augustine <saugustine at google dot com>
To: gdb-patches <gdb-patches at sourceware dot org>, Pedro Alves <palves at redhat dot com>
Date: Tue, 21 Jan 2014 14:17:37 -0800
Subject: [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread
Authentication-results: sourceware.org; auth=none

I am having trouble reducing a large and heavily multi-threaded
program (with many SIGPROFs) to a publishable test case, but I have
been intermittently hitting the following assertion in gdb's "resume"
function:

gdb_assert (pc_in_thread_step_range (pc, tp));

The PC is in a function called inside the step range, and therefore
you would expect process_event_stop_test's checks for this (starting
with the comment, "We stepped out of the stepping range") to catch it
and fix it up.

But the problem is that the thread out of the step range is not the
currently stepped thread. Process_event_stop_test calls
switch_back_to_stepped_thread, which, in turn, calls resume, bypassing
the extra logic in process_event_stop_test to fix up the step range,
and leading to the assertion error.

But I don't see any reason to assume that only the current thread
would need the additional cleanup found in process_event_stop_test. In
fact, switch_back_to_stepped_thread has a special case (hidden in
currently_stepping_or_nexting_callback), to _prevent_ it from
restarting the current thread, presumably so it that thread can get
the additional cleanup.

The enclosed patch does two things:

1. Adds a large amount of tracing, which helped me diagnose the problem.
2. Changes switch_back_to_stepped_thread to still switch back to a
stepped thread, but to avoid restarting it, allowing the additional
checks in process_event_stop_test to work their magic.

The GDB testsuite still passes (in particular, no new failures in
testsuite/gdb.threads), but I know there are a lot of subtleties here.

This is the best fix I can come up with, but are there any other
suggestions on how to approach the problem? (I'll do a ChangeLog entry
after taking comments.)

Thanks,

Sterling

diff --git a/gdb/infcmd.c b/gdb/infcmd.c
index 2d50f41..c042e0d 100644
--- a/gdb/infcmd.c
+++ b/gdb/infcmd.c
@@ -32,6 +32,7 @@
 #include "gdbcore.h"
 #include "target.h"
 #include "language.h"
+//#include "symfile.h"
 #include "objfiles.h"
 #include "completer.h"
 #include "ui-out.h"
@@ -1049,6 +1050,10 @@ step_once (int skip_subroutines, int single_inst, int count, int thread)
 				 &tp->control.step_range_start,
 				 &tp->control.step_range_end);
 
+          if (debug_infrun)
+            fprintf_unfiltered (gdb_stdlog,
+				"infrun: %s may range step\n",
+                                target_pid_to_str (tp->ptid));
 	  tp->control.may_range_step = 1;
 
 	  /* If we have no line info, switch to stepi mode.  */
@@ -1056,6 +1061,10 @@ step_once (int skip_subroutines, int single_inst, int count, int thread)
 	    {
 	      tp->control.step_range_start = tp->control.step_range_end = 1;
 	      tp->control.may_range_step = 0;
+              if (debug_infrun)
+                fprintf_unfiltered (gdb_stdlog,
+                                    "infrun: %s may not range step\n",
+                                    target_pid_to_str (tp->ptid));
 	    }
 	  else if (tp->control.step_range_end == 0)
 	    {
@@ -1346,6 +1355,10 @@ until_next_command (int from_tty)
       tp->control.step_range_end = sal.end;
     }
   tp->control.may_range_step = 1;
+  if (debug_infrun)
+    fprintf_unfiltered (gdb_stdlog,
+                        "infrun: %s may range step\n",
+                        target_pid_to_str (tp->ptid));
 
   tp->control.step_over_calls = STEP_OVER_ALL;
 
diff --git a/gdb/infrun.c b/gdb/infrun.c
index 311bf9c..e6f51c9 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -1315,6 +1315,10 @@ displaced_step_prepare (ptid_t ptid)
      the scratch buffer lands within the stepping range (e.g., a
      jump/branch).  */
   tp->control.may_range_step = 0;
+  if (debug_infrun)
+    fprintf_unfiltered (gdb_stdlog,
+                        "infrun: %s may not range step\n",
+                        target_pid_to_str (tp->ptid));
 
   /* We have to displaced step one thread at a time, as we only have
      access to a single scratch space per inferior.  */
@@ -1775,7 +1779,14 @@ a command like `return' or `jump' to continue execution."));
   /* If we have a breakpoint to step over, make sure to do a single
      step only.  Same if we have software watchpoints.  */
   if (tp->control.trap_expected || bpstat_should_step ())
-    tp->control.may_range_step = 0;
+    {
+      tp->control.may_range_step = 0;
+      if (debug_infrun)
+        fprintf_unfiltered (gdb_stdlog,
+                            "infrun: %s may not range step\n",
+                            target_pid_to_str (tp->ptid));
+    }
+
 
   /* If enabled, step over breakpoints by executing a copy of the
      instruction at a different address.
@@ -1945,6 +1956,14 @@ a command like `return' or `jump' to continue execution."));
 	     operation, like stepping the thread out of the dynamic
 	     linker or the displaced stepping scratch pad.  We
 	     shouldn't have allowed a range step then.  */
+	  if (debug_infrun)
+	    fprintf_unfiltered (gdb_stdlog,
+				"infrun: %s step range = [%lx-%lx] pc = %lx\n",
+                                target_pid_to_str (tp->ptid),
+                                tp->control.step_range_start,
+                                tp->control.step_range_end,
+                                pc);
+
 	  gdb_assert (pc_in_thread_step_range (pc, tp));
 	}
 
@@ -1990,6 +2009,10 @@ clear_proceed_status_thread (struct thread_info *tp)
   tp->control.step_range_start = 0;
   tp->control.step_range_end = 0;
   tp->control.may_range_step = 0;
+  if (debug_infrun)
+    fprintf_unfiltered (gdb_stdlog,
+                        "infrun: %s may not range step\n",
+                        target_pid_to_str (tp->ptid));
   tp->control.step_frame_id = null_frame_id;
   tp->control.step_stack_frame_id = null_frame_id;
   tp->control.step_over_calls = STEP_OVER_UNDEBUGGABLE;
@@ -3254,6 +3277,10 @@ handle_inferior_event (struct execution_control_state *ecs)
       /* Disable range stepping.  If the next step request could use a
 	 range, this will be end up re-enabled then.  */
       ecs->event_thread->control.may_range_step = 0;
+      if (debug_infrun)
+        fprintf_unfiltered (gdb_stdlog,
+                            "infrun: %s may not range step\n",
+                            target_pid_to_str (ecs->ptid));
     }
 
   /* Dependent on valid ECS->EVENT_THREAD.  */
@@ -4644,7 +4671,9 @@ process_event_stop_test (struct execution_control_state *ecs)
 
     case BPSTAT_WHAT_HP_STEP_RESUME:
       if (debug_infrun)
-	fprintf_unfiltered (gdb_stdlog, "infrun: BPSTAT_WHAT_HP_STEP_RESUME\n");
+	fprintf_unfiltered (gdb_stdlog, "infrun: BPSTAT_WHAT_HP_STEP_RESUME "
+                            "step_after_step_resume_breakpoint = %d\n",
+                            ecs->event_thread->step_after_step_resume_breakpoint);
 
       delete_step_resume_breakpoint (ecs->event_thread);
       if (ecs->event_thread->step_after_step_resume_breakpoint)
@@ -4727,6 +4756,10 @@ process_event_stop_test (struct execution_control_state *ecs)
 	 necessary (e.g., if we're stepping over a breakpoint or we
 	 have software watchpoints).  */
       ecs->event_thread->control.may_range_step = 1;
+      if (debug_infrun)
+        fprintf_unfiltered (gdb_stdlog,
+                            "infrun: %s may range step\n",
+                            target_pid_to_str (ecs->ptid));
 
       /* When stepping backward, stop at beginning of line range
 	 (unless it's the function entry point, in which case
@@ -5245,6 +5278,10 @@ process_event_stop_test (struct execution_control_state *ecs)
   ecs->event_thread->control.step_range_start = stop_pc_sal.pc;
   ecs->event_thread->control.step_range_end = stop_pc_sal.end;
   ecs->event_thread->control.may_range_step = 1;
+  if (debug_infrun)
+    fprintf_unfiltered (gdb_stdlog,
+                        "infrun: %s may range step\n",
+                        target_pid_to_str (ecs->ptid));
   set_step_info (frame, stop_pc_sal);
 
   if (debug_infrun)
@@ -5274,10 +5311,7 @@ switch_back_to_stepped_thread (struct execution_control_state *ecs)
 	  if ((ecs->event_thread->control.trap_expected
 	       && ecs->event_thread->suspend.stop_signal != GDB_SIGNAL_TRAP)
 	      || ecs->event_thread->stepping_over_breakpoint)
-	    {
-	      keep_going (ecs);
-	      return 1;
-	    }
+            return 0;
 
 	  /* If the stepping thread exited, then don't try to switch
 	     back and resume it, which could fail in several different
@@ -5307,24 +5341,18 @@ switch_back_to_stepped_thread (struct execution_control_state *ecs)
 				    "stepped thread, it has vanished\n");
 
 	      delete_thread (tp->ptid);
-	      keep_going (ecs);
-	      return 1;
+	      return 0;
 	    }
 
-	  /* Otherwise, we no longer expect a trap in the current thread.
-	     Clear the trap_expected flag before switching back -- this is
-	     what keep_going would do as well, if we called it.  */
-	  ecs->event_thread->control.trap_expected = 0;
-
 	  if (debug_infrun)
 	    fprintf_unfiltered (gdb_stdlog,
-				"infrun: switching back to stepped thread\n");
+				"infrun: switching back to stepped thread %s\n",
+                                target_pid_to_str (tp->ptid));
 
 	  ecs->event_thread = tp;
 	  ecs->ptid = tp->ptid;
 	  context_switch (ecs->ptid);
-	  keep_going (ecs);
-	  return 1;
+          return 0;
 	}
     }
   return 0;
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 2371ad4..1c6bd45 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -46,6 +46,7 @@
 #include "gregset.h"		/* for gregset */
 #include "gdbcore.h"		/* for get_exec_file */
 #include <ctype.h>		/* for isdigit */
+#include "gdbthread.h"		/* for struct thread_info etc.  */
 #include <sys/stat.h>		/* for struct stat */
 #include <fcntl.h>		/* for O_RDONLY */
 #include "inf-loop.h"
@@ -64,6 +65,7 @@
 #include "agent.h"
 #include "tracepoint.h"
 #include "exceptions.h"
+#include "linux-ptrace.h"
 #include "buffer.h"
 #include "target-descriptions.h"
 #include "filestuff.h"
@@ -2601,6 +2603,9 @@ stop_wait_callback (struct lwp_info *lp, void *data)
 
       if (WSTOPSIG (status) != SIGSTOP)
 	{
+          struct thread_info *tp = find_thread_ptid (lp->ptid);
+          CORE_ADDR pc = regcache_read_pc (get_thread_regcache (lp->ptid));
+
 	  /* The thread was stopped with a signal other than SIGSTOP.  */
 
 	  save_sigtrap (lp);
@@ -2611,6 +2616,18 @@ stop_wait_callback (struct lwp_info *lp, void *data)
 				status_to_str ((int) status),
 				target_pid_to_str (lp->ptid));
 
+          if (debug_linux_nat
+              && tp->control.may_range_step
+              && !pc_in_thread_step_range (pc, tp))
+            {
+              fprintf_unfiltered (gdb_stdlog,
+                                  "SWC: %s %lx out of step range [%lx-%lx]\n",
+                                  target_pid_to_str (tp->ptid),
+                                  pc,
+                                  tp->control.step_range_start,
+                                  tp->control.step_range_end);
+            }
+
 	  /* Save the sigtrap event.  */
 	  lp->status = status;
 	  gdb_assert (!lp->stopped);
@@ -2686,7 +2703,13 @@ count_events_callback (struct lwp_info *lp, void *data)
 
   /* Count only resumed LWPs that have a SIGTRAP event pending.  */
   if (lp->resumed && linux_nat_lp_status_is_event (lp))
-    (*count)++;
+    {
+      if (debug_linux_nat)
+	fprintf_unfiltered (gdb_stdlog,
+			    "CEC: LWP %ld has an event\n",
+			    ptid_get_lwp (lp->ptid));
+      (*count)++;
+    }
 
   return 0;
 }
@@ -3212,6 +3235,9 @@ linux_nat_wait_1 (struct target_ops *ops,
   block_child_signals (&prev_mask);
 
 retry:
+  if (debug_linux_nat)
+    fprintf_unfiltered (gdb_stdlog, "LLW: retry\n");
+
   lp = NULL;
   status = 0;

Follow-Ups:
- Re: [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread
  - From: Pedro Alves

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]