[PATCH 09/12] gdb: move displaced stepping logic to gdbarch, allow starting concurrent displaced steps

Mon Nov 30 20:26:55 GMT 2020

On 2020-11-26 9:24 a.m., Pedro Alves wrote:
> The function still throws errors anyway, e.g. a bit above:
>
>   int status = target_read_memory (m_addr, m_saved_copy.data (), len);
>   if (status != 0)
>     throw_error (MEMORY_ERROR,
> 		 _("Error accessing memory address %s (%s) for "
> 		   "displaced-stepping scratch space."),
> 		 paddress (arch, m_addr), safe_strerror (status));
>
> That's then caught by displaced_step_prepare, which results in
> the "disabling displaced stepping" warning, and GDB disabling
> displaced stepping for the inferior.
>
> Also, NOT_SUPPORTED_ERROR is handled the same way.  See:
>
>  commit 16b4184277c4ad5b4a20278060fd3f6259d1ed49
>  Author:     Pedro Alves <palves@redhat.com>
>  AuthorDate: Tue Mar 15 16:33:04 2016 +0000
>
>     Fix PR gdb/19676: Disable displaced stepping if /proc not mounted
>
> Any other error is just propagated out, likely all the way to
> the prompt:
>
>  static displaced_step_prepare_status
>  displaced_step_prepare (thread_info *thread)
>  {
>  ...
>    catch (const gdb_exception_error &ex)
>      {
>       struct displaced_step_inferior_state *displaced_state;
>
>       if (ex.error != MEMORY_ERROR
> 	  && ex.error != NOT_SUPPORTED_ERROR)
> 	throw;
>
>> Would it make sense to remove the _ERROR status code and report all
>> errors using exceptions then?
>
> I don't think so, because there are 3 kinds of "errors":
>
>  #1 - MEMORY and NOT_SUPPORTED errors result in disabling displaced stepping
>       for the inferior permanently.
>
>  #2 - other errors/exceptions propagate outwards.
>
>  #3 - "prepare" returning "Can't do displace step for this instruction" for
>       some reason.  Like the backend not supporting displaced stepping
>       that particular instruction, or the breakpoint address falling
>       within the scratch pad.  I.e., these cases:
>
> ...
>    if (breakpoint_in_range_p (aspace, m_addr, len))
>     {
>       /* There's a breakpoint set in the scratch pad location range
> 	 (which is usually around the entry point).  We'd either
> 	 install it before resuming, which would overwrite/corrupt the
> 	 scratch pad, or if it was already inserted, this displaced
> 	 step would overwrite it.  The latter is OK in the sense that
> 	 we already assume that no thread is going to execute the code
> 	 in the scratch pad range (after initial startup) anyway, but
> 	 the former is unacceptable.  Simply punt and fallback to
> 	 stepping over this breakpoint in-line.  */
>       displaced_debug_printf ("breakpoint set in scratch pad.  "
> 			      "Stepping over breakpoint in-line instead.");
>
>       return DISPLACED_STEP_PREPARE_STATUS_ERROR;
>     }
> ...
>    m_copy_insn_closure = gdbarch_displaced_step_copy_insn (arch,
>  							  m_original_pc,
>  							  m_addr,
>  							  regcache);
>    if (m_copy_insn_closure == nullptr)
>      {
>        /* The architecture doesn't know how or want to displaced step
>          this instruction or instruction sequence.  Fallback to
>          stepping over the breakpoint in-line.  */
>        return DISPLACED_STEP_PREPARE_STATUS_ERROR;
>      }
>
> The "failed to write pc" try/catch case seems debatable, but I'd
> lean on it being case #2.
>
> I think that case #3 calls for renaming DISPLACED_STEP_PREPARE_STATUS_ERROR
> to something else to avoid confusion.  Note that in master, -1 wasn't
> documented in terms of error:
>
> ...
>    Returns 1 if preparing was successful -- this thread is going to be
>    stepped now; 0 if displaced stepping this thread got queued; or -1
>    if this instruction can't be displaced stepped.  */
>
>  static int
>  displaced_step_prepare_throw (thread_info *tp)

Ok, thanks for this explanation.  I didn't understand well the different
error modes (can't displaced-step this particular instruction vs some
other unexpected error) and how they were handled.

I'll rename DISPLACED_STEP_PREPARE_STATUS_ERROR to
DISPLACED_STEP_PREPARE_STATUS_CANT.  It fits well with the comments in
the code where it is used, that all say variants of "can't displaced
step this instruction, fall back on in-line stepping".

>>>> -  return false;
>>>> +    /* If there are threads left in the THREADS_TO_STEP list, but we have
>>>> +       detected that we can't start anything more, put back these threads
>>>> +       in the global list.  */
>>>
>>> Do we also need to do this if an exception happens to escape the function?
>>> We might end up pretty bonkers anyhow if that happens, though...
>>
>> Yes, I realized that when playing with this code again yesterday.  A
>> scope_exit would help for this I think.
>>
>> But I'm also wondering if we should enqueue back the thread TP that was
>> dequeued, that caused an error.  This one has already been dequeued from
>> THREADS_TO_STEP.  For example, if thread_still_needs_step_over throws
>> (it reads the PC, so that could happen), should we put back the thread
>> in the global chain?  If we do, and the same error happens over and
>> over, the thread will never leave the step over chain and be the first
>> in line, preventing any other displaced step to happen.
>>
>> If, on the other hand, the error was just an intermittent one and we
>> don't put it back in the queue, the next time we'll resume the thread
>> we'll realize it needs a step over and enqueue it again.  So it sounds
>> less risky to me to just not enqueue back the thread on error.  I'm
>> really not sure, I find it difficult to reason about all the possible
>> cases.
>
> Yeah, not enqueuing sounds OK.  And indeed, error handling in all of
> infrun is very hard to reason about, and likely to end up with things
> messed up, e.g., with a thread marked as running when it is really
> stopped.

Ok, did that.

>> Also, re-reading that code, I noticed that we only return "true" when an
>> inline step over is started, or if a displaced step on an all-stop
>> target was started.  That's the case even before the patch.  I suppose
>> that's on purpose, because the caller wants to know whether it can
>> resume more stuff or not?
>
> Yes.  If it returns true, then the caller shouldn't resume anything
> and go straight to waiting.  ISTR having some trouble coming up with
> the right conditions at the callers.  Maybe it could be simplified.
> I no longer remember why start_step_over returns false here:
>
>   /* Don't start a new step-over if we already have an in-line
>      step-over operation ongoing.  */
>   if (step_over_info_valid_p ())
>     return false;
>
> Off hand, it seems like that should return true.  But it may
> be that the intention was to only return true if anything
> that was stopped before was started.  I'd need to experiment
> to know better now.  :-/

Ok, I'll leave it as-is for now and treat is as a separete issue.

>>>> @@ -5293,25 +5226,23 @@ handle_inferior_event (struct execution_control_state *ecs)
>>>>        {
>>>>  	struct regcache *regcache = get_thread_regcache (ecs->event_thread);
>>>>  	struct gdbarch *gdbarch = regcache->arch ();
>>>> +	inferior *parent_inf = find_inferior_ptid (ecs->target, ecs->ptid);
>>>> +
>>>> +	if (ecs->ws.kind == TARGET_WAITKIND_FORKED)
>>>> +	  {
>>>> +	    /* Restore in the child process any displaced stepping buffers that
>>>> +	       were in use at the time of the fork.  */
>>>> +	    gdbarch_displaced_step_restore_all_in_ptid
>>>> +	      (gdbarch, parent_inf, ecs->ws.value.related_pid);
>>>> +	  }
>>>
>>> Why was this moved out of the displaced_step_in_progress_thread check
>>> below?
>>
>> If:
>>
>> 1. thread 1 is doing a displaced step
>> 2. thread 2 does a fork, not letting time for thread 1 to complete
>>
>> event_thread is thread 2, and it is not doing a displaced step, so we
>> don't enter the
>>
>>   if (displaced_step_in_progress_thread (ecs->event_thread))
>>
>> But we still want to restore the bytes in the child used as the
>> displaced stepping buffer for thread 1.  Does that make sense?  Is it
>> possible that the current code is not correct in that regard?
>
> Indeed, looks like it.  Good catch.  How about fixing that
> in a separate patch, then, with its own commit log?

Will do.  I didn't do a separate patch at first because I initially
thought: "now that we can do multiple displaced steps in parallel, it
can be that the other displaced steps does a fork!".  I didn't think
right away of the case where it's a non-displaced-stepping thread who
does a fork.

Simon