[PATCH/7.10 2/2] gdbserver: Fix non-stop / fork / step-over issues

Wed Aug 5 11:41:00 GMT 2015

Pedro Alves <palves@redhat.com> writes:

> I've now extended the test to run in a few different modes, along
> a couple different axis.  One axis is with/without conditional
> breakpoints on the target enabled.  That exposes the same fails you
> saw on arm, on x86 gdbserver as well.  And then the other axis is
> with/without joining _all_ threads before exiting.  If we gracefully
> terminate the breakpoint thread (new mode), then the test should be
> passing everywhere, because what fails is gdb's handling of the inferior
> disappearing while a thread is stopped (and being inspected).
> Therefore that new mode is not kfailed.
>
> For testing convenience, I've pushed this along with the previous patch
> to the users/palves/gdbserver-fork-issues branch on sourceware.org.
> Let me know if this works for you.

Thanks, Pedro.  There are no fails on arm-linux with GDBserver.  Some
comments on your patch below,

> @@ -3089,8 +3142,25 @@ linux_wait_1 (ptid_t ptid,
>  	info_p = &info;
>        else
>  	info_p = NULL;
> -      linux_resume_one_lwp (event_child, event_child->stepping,
> -			    WSTOPSIG (w), info_p);
> +
> +      if (step_over_finished)
> +	{
> +	  /* We cancelled this thread's step-over above.  We still
> +	     need to unsuspend all other LWPs, and set then back

s/set then/set them/?

> +	     running again while the signal handler runs.  */
> +	  unsuspend_all_lwps (event_child);
> +
> +	  /* Enqueue the pending signal info so that proceed_all_lwps
> +	     doesn't lose it.  */
> +	  enqueue_pending_signal (event_child, WSTOPSIG (w), info_p);
> +
> +	  proceed_all_lwps ();
> +	}
> +      else
> +	{
> +	  linux_resume_one_lwp (event_child, event_child->stepping,
> +				WSTOPSIG (w), info_p);
> +	}
>        return ignore_event (ourstatus);
>      }
>  
> @@ -3111,8 +3181,15 @@ linux_wait_1 (ptid_t ptid,
>  		   || (current_thread->last_resume_kind == resume_step
>  		       && !in_step_range)
>  		   || event_child->stop_reason == TARGET_STOPPED_BY_WATCHPOINT
> -		   || (!step_over_finished && !in_step_range
> -		       && !bp_explains_trap && !trace_event)
> +		   || (!in_step_range
> +		       && !bp_explains_trap
> +		       && !trace_event
> +		       /* A step-over was finished just now?  */
> +		       && !step_over_finished
> +		       /* A step-over had been finished previously,
> +			  and the single-step was left pending?  */
> +		       && !(current_thread->last_resume_kind == resume_continue
> +			    && event_child->stop_reason == TARGET_STOPPED_BY_SINGLE_STEP))
>  		   || (gdb_breakpoint_here (event_child->stop_pc)

I don't fully understand this, what is a case that "step-over had been
finished previously, but the single-step was left pending"?

> 	(linux_wait_1): If passing a signal to the inferior after
> 	finishing a step-over, unsuspend and re-resume all lwps.  If we
> 	see a single-step event but the thread should be continuing, don't
> 	pass the trap to gdb.

however, the explanations in ChangeLog look more reasonable.

>  		       && gdb_condition_true_at_breakpoint (event_child->stop_pc)
>  		       && gdb_no_commands_at_breakpoint (event_child->stop_pc))

> @@ -4189,16 +4298,36 @@ static int
>  start_step_over (struct lwp_info *lwp)
>  {
>    struct thread_info *thread = get_lwp_thread (lwp);
> +  ptid_t thread_ptid;
>    struct thread_info *saved_thread;
>    CORE_ADDR pc;
>    int step;
>  
> +  thread_ptid = ptid_of (thread);
> +
>    if (debug_threads)
>      debug_printf ("Starting step-over on LWP %ld.  Stopping all threads\n",
>  		  lwpid_of (thread));
>  
>    stop_all_lwps (1, lwp);
> -  gdb_assert (lwp->suspended == 0);
> +
> +  /* Re-find the LWP as it may have exited.  */
> +  lwp = find_lwp_pid (thread_ptid);
> +  if (lwp == NULL || lwp_is_marked_dead (lwp))
> +    {
> +      if (debug_threads)
> +	debug_printf ("Step-over thread died "
> +		      "(another thread exited the process?).\n");
> +      unstop_all_lwps (1, lwp);
> +      return 0;
> +    }
> +
> +  if (lwp->suspended != 0)
> +    {
> +      internal_error (__FILE__, __LINE__,
> +		      "LWP %ld suspended=%d\n", lwpid_of (thread),
> +		      lwp->suspended);
> +    }
>  
>    if (debug_threads)
>      debug_printf ("Done stopping all threads for step-over.\n");
> @@ -4229,7 +4358,19 @@ start_step_over (struct lwp_info *lwp)
>  
>    current_thread = saved_thread;
>  
> -  linux_resume_one_lwp (lwp, step, 0, NULL);
> +  TRY
> +    {
> +      linux_resume_one_lwp_throw (lwp, step, 0, NULL);
> +    }

IIUC, we do TRY/CATCH here because thread may have exited, but we've
done that in this function (start_step_over), do we still need to worry
about these exited threads?

> +  CATCH (ex, RETURN_MASK_ERROR)
> +    {
> +      unstop_all_lwps (1, lwp);
> +
> +      if (!check_ptrace_stopped_lwp_gone (lwp))

I am thinking about the order of these two function calls.  Does the
order matter here?  Probably we need to check_ptrace_stopped_lwp_gone
first before unstop_all_lwps.

> +	throw_exception (ex);
> +      return 0;
> +    }
> +  END_CATCH
>  
>    /* Require next event from this LWP.  */
>    step_over_bkpt = thread->entry.id;
> @@ -4270,6 +4411,39 @@ finish_step_over (struct lwp_info *lwp)
>      return 0;
>  }
>  
> +/* If there's a step over in progress, wait until all threads stop
> +   (that is, until the stepping thread finishes its step), and
> +   unsuspend all lwps.  The stepping thread ends with its status
> +   pending, which is processed later when we get back to processing
> +   events.  */
> +
> +static void
> +complete_ongoing_step_over (void)
> +{
> +  if (!ptid_equal (step_over_bkpt, null_ptid))
> +    {
> +      struct lwp_info *lwp;
> +      int wstat;
> +      int ret;
> +
> +      if (debug_threads)
> +	debug_printf ("detach: step over in progress, finish it first\n");

"detach:" in the debug output looks obsolete.

-- 
Yao (齐尧)