[PATCH] [RFC][AArch64] Fix some erroneous behavior in gdb.base/step-over-syscall.exp

Fri Jan 10 19:54:00 GMT 2020

On 1/9/20 12:21 PM, Alan Hayward wrote:
> 
> 
>> On 30 Dec 2019, at 16:24, Luis Machado <luis.machado@linaro.org> wrote:
>>
>> I've been chasing/investigating this particular bug for a while now, and this
>> is a tentative patch to fix the problem.
>>
>> Although this may not be aarch64-specific, i can reproduce it consitently in
>> this particular architecture.
>>
>> In summary, this bug is a mix of random signal handling (SIGCHLD) and
>> fork/vfork handling when single-stepping.
>>
>> Forks and vforks are handled in 3 to 4 steps.
>>
>> Fork
>>
>> 1 - We get a PTRACE_EVENT_FORK event.
>> 2 - We single-step the inferior to get a SIGTRAP.
>> 3 - Inferior is ready to be resumed as we wish.
>>
>> Vfork
>>
>> 1 - We get a PTRACE_EVENT_VFORK event.
>> 2 - We continue the inferior to get a PTRACE_EVENT_VFORK_DONE event.
>> 3 - We single-step the inferior to get a SIGTRAP.
>> 4 - Inferior is ready to be resumed as we wish.
>>
>> The problem manifests itself when we are sitting at a syscall instruction and
>> we try to instruction-step past it. There may or may not be a breakpoint
>> inserted at the syscall instruction location.
>>
>> The expected outcome is that we step a single instruction and the resulting PC
>> points at the next instruction ($pc + 4 in aarch64).
>>
>> What i see is that we end up in $pc + 8, that is, one instruction further than
>> we should've been.
>>
>> This happens because, when forking/vforking, the child exits too quickly (in the
>> particular case of gdb.base/step-over-syscall.exp) and a SIGCHLD signal is
>> issued to the parent. Sometimes this signal arrives before GDB is done handling
>> the fork/vfork situation. Usually in step 2 for fork and step 3 for vfork.
>>
>> In turn, this causes GDB to handle that SIGCHLD as a random signal that must be
>> passed to the inferior, which is correct. But this particular code in infrun.c
>> doesn't cope with this particular situation and GDB inserts a HP step-resume
>> breakpoint and registers its wish for the inferior to be stepped yet again.
>>
>> Thus we end up in $pc + 8, even though we shouldn't have stepped again in this
>> particular situation.
> 
> Iâ€™ve tried debugging this issue in the past, and didnâ€™t figure out the issue.
> This _sounds_ like a reasonable explanation of why.
> 
>>
>> The delivery of SIGCHLD in between fork/vfork handling steps is sensitive to
>> timing. If we tweak things a little, enable debugging output or do any other
>> thing that affects the timing, we may not see this behavior.
>>
>> This bug doesn't show up as failures in gdb.base/step-over-syscall.exp due to
>> the way the test is written.  As long as the PC is the same from the previous
>> test to the next, GDB thinks it is good. The proposed patch doesn't cause a
>> change in the number of PASSes/FAIL's.
>>
> 
> Is it possible to add an explicit test to test for the issue? (possibly as a new
> .exp file).

I'll have to think about that. The timing-sensitive nature of this 
problem is within the kernel. The kernel lets the child process run and 
then notifies the parent. I can delay the child, but i can't delay the 
kernel notifying the parent.

If i could defer the kernel notification to the parent enough so the 
child exits meanwhile, that would work.

> 
> 
>> The proposed patch adds a variable waiting_for_fork_sigtrap that gets set just
>> before the last step of fork/vfork handling and gets reset when we end up
>> seeing that SIGTRAP we wanted.
>>
>> This variable is used in the random_signal handling of handle_signal_stop,
>> where there are a couple conditional blocks that handle signals reaching the
>> inferior at an unexpected time. If waiting_for_fork_sigtrap is set, we don't
>> set GDB to do the additional single-step it wants to do.
>>
>> I gave this a thought and tried to find a better place to put the check in,
>> but this seemed the most appropriate place to me.
>>
>> I also tried to solve this without having to introduce a new variable.
>> But it seems we can't distinguish between a random signal reaching GDB in
>> the middle of single-stepping and a random signal reaching GDB in the middle
>> of single-stepping AND handling fork/vfork.
>>
>> Thoughts?
>>
>> gdb/ChangeLog:
>>
>> 2019-12-30  Luis Machado  <luis.machado@linaro.org>
>>
>> 	* inferior.h (class inferior) <waiting_for_fork_sigtrap>: New member.
>> 	Set to false by default.
>> 	* infrun.c (handle_inferior_event): Set waiting_for_fork_sigtrap when
>> 	a fork or vfork_done is detected.
>> 	(handle_signal_stop): Don't step again if waiting_for_fork_sigtrap is
>> 	true.
>> 	Reset waiting_for_fork_sigtrap to false when nexti/stepi is detected.
>>
>> Change-Id: I2849e0dc49ad9c0a026daa8ced4610aa0ddbe637
>> ---
>> gdb/inferior.h | 12 ++++++++++++
>> gdb/infrun.c   | 31 +++++++++++++++++++++++++++++--
>> 2 files changed, 41 insertions(+), 2 deletions(-)
>>
>> diff --git a/gdb/inferior.h b/gdb/inferior.h
>> index 3bd9e8c3d7..2c23d7302b 100644
>> --- a/gdb/inferior.h
>> +++ b/gdb/inferior.h
>> @@ -447,6 +447,18 @@ public:
>>       exits or execs.  */
>>    bool pending_detach = false;
>>
>> +  /* True if we are still waiting for the final SIGTRAP when handling
>> +     fork or vfork events.  This is used in case a signal gets caught amidst
>> +     handling fork and vfork events.
>> +
>> +     For fork events we first get a PTRACE_EVENT_FORK, then SINGLESTEP the
>> +     process and then get a SIGTRAP.
>> +
>> +     For vfork events we first get a PTRACE_EVENT_VFORK, then we CONTINUE
>> +     the process, get a PTRACE_EVENT_VFORK_DONE event, SINGLESTEP the process
>> +     again and then get a SIGTRAP.  */
>> +  bool waiting_for_fork_sigtrap = false;
>> +
> 
> 
> What happens with a multithreaded process that forks/vforks? Should this bool be
> held per thread instead? (A test would be nice to have here).
> 
> 

I'll have to try that. Unlike waiting for vfork-done, where the parent 
can't move until the child is done with the shared memory region, 
waiting for this final SIGTRAP doesn't have this particular restriction.

Maybe Pedro has a better insight into this particular case.

>>    /* True if this inferior is a vfork parent waiting for a vfork child
>>       not under our control to be done with the shared memory region,
>>       either by exiting or execing.  */
>> diff --git a/gdb/infrun.c b/gdb/infrun.c
>> index 7ddd21dd09..c3316a891a 100644
>> --- a/gdb/infrun.c
>> +++ b/gdb/infrun.c
>> @@ -4937,6 +4937,10 @@ Cannot fill $_exitsignal with the correct signal number.\n"));
>> 	struct regcache *regcache = get_thread_regcache (ecs->event_thread);
>> 	struct gdbarch *gdbarch = regcache->arch ();
>>
>> +	if (ecs->ws.kind == TARGET_WAITKIND_FORKED)
>> +	  /* We should see a SIGTRAP next time we stop.  */
>> +	  current_inferior ()->waiting_for_fork_sigtrap = true;
>> +
>> 	/* If checking displaced stepping is supported, and thread
>> 	   ecs->ptid is displaced stepping.  */
>> 	if (displaced_step_in_progress_thread (ecs->event_thread))
>> @@ -5094,6 +5098,8 @@ Cannot fill $_exitsignal with the correct signal number.\n"));
>>
>>        context_switch (ecs);
>>
>> +      /* We should see a SIGTRAP next time we stop.  */
>> +      current_inferior ()->waiting_for_fork_sigtrap = true;
>>        current_inferior ()->waiting_for_vfork_done = 0;
>>        current_inferior ()->pspace->breakpoints_not_allowed = 0;
>>
>> @@ -5912,7 +5918,15 @@ handle_signal_stop (struct execution_control_state *ecs)
>>                                  "breakpoint\n");
>>
>> 	  insert_hp_step_resume_breakpoint_at_frame (frame);
>> -	  ecs->event_thread->step_after_step_resume_breakpoint = 1;
>> +
>> +	  /* If we're still waiting for the final fork/vfork SIGTRAP and we
>> +	     got interrupted by a random signal, don't step further after
>> +	     returning from the signal handler.  */
>> +	  if (current_inferior ()->waiting_for_fork_sigtrap)
>> +	    ecs->event_thread->step_after_step_resume_breakpoint = 0;
>> +	  else
>> +	    ecs->event_thread->step_after_step_resume_breakpoint = 1;
>> +
> 
> Simpler to write as this?
> ecs->event_thread->step_after_step_resume_breakpoint = !current_inferior ()->waiting_for_fork_sigtrap;
> 
> 

I'll change it.

>> 	  /* Reset trap_expected to ensure breakpoints are re-inserted.  */
>> 	  ecs->event_thread->control.trap_expected = 0;
>>
>> @@ -5947,7 +5961,15 @@ handle_signal_stop (struct execution_control_state *ecs)
>>
>> 	  clear_step_over_info ();
>> 	  insert_hp_step_resume_breakpoint_at_frame (frame);
>> -	  ecs->event_thread->step_after_step_resume_breakpoint = 1;
>> +
>> +	  /* If we're still waiting for the final fork/vfork SIGTRAP and we
>> +	     got interrupted by a random signal, don't step further after
>> +	     returning from the signal handler.  */
>> +	  if (current_inferior ()->waiting_for_fork_sigtrap)
>> +	    ecs->event_thread->step_after_step_resume_breakpoint = 0;
>> +	  else
>> +	    ecs->event_thread->step_after_step_resume_breakpoint = 1;
>> +
> 
> Ditto.
> 

I'll change it as well.