[PATCH] [RFC][AArch64] Fix some erroneous behavior in gdb.base/step-over-syscall.exp

Thu Jul 16 19:44:35 GMT 2020

Hi,

For the record, this problem seems to go away with the fix to a 
single-stepping issue I reported. The kernel series is here: 
https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=316353.

Trying the testcase with the patched kernel doesn't show the failures 
anymore.

I'm dropping this patch then. The bug doesn't look harmful enough for us 
to have to work around it in GDB.

On 1/10/20 4:54 PM, Luis Machado wrote:
> On 1/9/20 12:21 PM, Alan Hayward wrote:
>>
>>
>>> On 30 Dec 2019, at 16:24, Luis Machado <luis.machado@linaro.org> wrote:
>>>
>>> I've been chasing/investigating this particular bug for a while now, 
>>> and this
>>> is a tentative patch to fix the problem.
>>>
>>> Although this may not be aarch64-specific, i can reproduce it 
>>> consitently in
>>> this particular architecture.
>>>
>>> In summary, this bug is a mix of random signal handling (SIGCHLD) and
>>> fork/vfork handling when single-stepping.
>>>
>>> Forks and vforks are handled in 3 to 4 steps.
>>>
>>> Fork
>>>
>>> 1 - We get a PTRACE_EVENT_FORK event.
>>> 2 - We single-step the inferior to get a SIGTRAP.
>>> 3 - Inferior is ready to be resumed as we wish.
>>>
>>> Vfork
>>>
>>> 1 - We get a PTRACE_EVENT_VFORK event.
>>> 2 - We continue the inferior to get a PTRACE_EVENT_VFORK_DONE event.
>>> 3 - We single-step the inferior to get a SIGTRAP.
>>> 4 - Inferior is ready to be resumed as we wish.
>>>
>>> The problem manifests itself when we are sitting at a syscall 
>>> instruction and
>>> we try to instruction-step past it. There may or may not be a breakpoint
>>> inserted at the syscall instruction location.
>>>
>>> The expected outcome is that we step a single instruction and the 
>>> resulting PC
>>> points at the next instruction ($pc + 4 in aarch64).
>>>
>>> What i see is that we end up in $pc + 8, that is, one instruction 
>>> further than
>>> we should've been.
>>>
>>> This happens because, when forking/vforking, the child exits too 
>>> quickly (in the
>>> particular case of gdb.base/step-over-syscall.exp) and a SIGCHLD 
>>> signal is
>>> issued to the parent. Sometimes this signal arrives before GDB is 
>>> done handling
>>> the fork/vfork situation. Usually in step 2 for fork and step 3 for 
>>> vfork.
>>>
>>> In turn, this causes GDB to handle that SIGCHLD as a random signal 
>>> that must be
>>> passed to the inferior, which is correct. But this particular code in 
>>> infrun.c
>>> doesn't cope with this particular situation and GDB inserts a HP 
>>> step-resume
>>> breakpoint and registers its wish for the inferior to be stepped yet 
>>> again.
>>>
>>> Thus we end up in $pc + 8, even though we shouldn't have stepped 
>>> again in this
>>> particular situation.
>>
>> I’ve tried debugging this issue in the past, and didn’t figure out the 
>> issue.
>> This _sounds_ like a reasonable explanation of why.
>>
>>>
>>> The delivery of SIGCHLD in between fork/vfork handling steps is 
>>> sensitive to
>>> timing. If we tweak things a little, enable debugging output or do 
>>> any other
>>> thing that affects the timing, we may not see this behavior.
>>>
>>> This bug doesn't show up as failures in 
>>> gdb.base/step-over-syscall.exp due to
>>> the way the test is written.  As long as the PC is the same from the 
>>> previous
>>> test to the next, GDB thinks it is good. The proposed patch doesn't 
>>> cause a
>>> change in the number of PASSes/FAIL's.
>>>
>>
>> Is it possible to add an explicit test to test for the issue? 
>> (possibly as a new
>> .exp file).
> 
> I'll have to think about that. The timing-sensitive nature of this 
> problem is within the kernel. The kernel lets the child process run and 
> then notifies the parent. I can delay the child, but i can't delay the 
> kernel notifying the parent.
> 
> If i could defer the kernel notification to the parent enough so the 
> child exits meanwhile, that would work.
> 
>>
>>
>>> The proposed patch adds a variable waiting_for_fork_sigtrap that gets 
>>> set just
>>> before the last step of fork/vfork handling and gets reset when we 
>>> end up
>>> seeing that SIGTRAP we wanted.
>>>
>>> This variable is used in the random_signal handling of 
>>> handle_signal_stop,
>>> where there are a couple conditional blocks that handle signals 
>>> reaching the
>>> inferior at an unexpected time. If waiting_for_fork_sigtrap is set, 
>>> we don't
>>> set GDB to do the additional single-step it wants to do.
>>>
>>> I gave this a thought and tried to find a better place to put the 
>>> check in,
>>> but this seemed the most appropriate place to me.
>>>
>>> I also tried to solve this without having to introduce a new variable.
>>> But it seems we can't distinguish between a random signal reaching 
>>> GDB in
>>> the middle of single-stepping and a random signal reaching GDB in the 
>>> middle
>>> of single-stepping AND handling fork/vfork.
>>>
>>> Thoughts?
>>>
>>> gdb/ChangeLog:
>>>
>>> 2019-12-30  Luis Machado  <luis.machado@linaro.org>
>>>
>>>     * inferior.h (class inferior) <waiting_for_fork_sigtrap>: New 
>>> member.
>>>     Set to false by default.
>>>     * infrun.c (handle_inferior_event): Set waiting_for_fork_sigtrap 
>>> when
>>>     a fork or vfork_done is detected.
>>>     (handle_signal_stop): Don't step again if 
>>> waiting_for_fork_sigtrap is
>>>     true.
>>>     Reset waiting_for_fork_sigtrap to false when nexti/stepi is 
>>> detected.
>>>
>>> Change-Id: I2849e0dc49ad9c0a026daa8ced4610aa0ddbe637
>>> ---
>>> gdb/inferior.h | 12 ++++++++++++
>>> gdb/infrun.c   | 31 +++++++++++++++++++++++++++++--
>>> 2 files changed, 41 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/gdb/inferior.h b/gdb/inferior.h
>>> index 3bd9e8c3d7..2c23d7302b 100644
>>> --- a/gdb/inferior.h
>>> +++ b/gdb/inferior.h
>>> @@ -447,6 +447,18 @@ public:
>>>       exits or execs.  */
>>>    bool pending_detach = false;
>>>
>>> +  /* True if we are still waiting for the final SIGTRAP when handling
>>> +     fork or vfork events.  This is used in case a signal gets 
>>> caught amidst
>>> +     handling fork and vfork events.
>>> +
>>> +     For fork events we first get a PTRACE_EVENT_FORK, then 
>>> SINGLESTEP the
>>> +     process and then get a SIGTRAP.
>>> +
>>> +     For vfork events we first get a PTRACE_EVENT_VFORK, then we 
>>> CONTINUE
>>> +     the process, get a PTRACE_EVENT_VFORK_DONE event, SINGLESTEP 
>>> the process
>>> +     again and then get a SIGTRAP.  */
>>> +  bool waiting_for_fork_sigtrap = false;
>>> +
>>
>>
>> What happens with a multithreaded process that forks/vforks? Should 
>> this bool be
>> held per thread instead? (A test would be nice to have here).
>>
>>
> 
> I'll have to try that. Unlike waiting for vfork-done, where the parent 
> can't move until the child is done with the shared memory region, 
> waiting for this final SIGTRAP doesn't have this particular restriction.
> 
> Maybe Pedro has a better insight into this particular case.
> 
>>>    /* True if this inferior is a vfork parent waiting for a vfork child
>>>       not under our control to be done with the shared memory region,
>>>       either by exiting or execing.  */
>>> diff --git a/gdb/infrun.c b/gdb/infrun.c
>>> index 7ddd21dd09..c3316a891a 100644
>>> --- a/gdb/infrun.c
>>> +++ b/gdb/infrun.c
>>> @@ -4937,6 +4937,10 @@ Cannot fill $_exitsignal with the correct 
>>> signal number.\n"));
>>>     struct regcache *regcache = get_thread_regcache (ecs->event_thread);
>>>     struct gdbarch *gdbarch = regcache->arch ();
>>>
>>> +    if (ecs->ws.kind == TARGET_WAITKIND_FORKED)
>>> +      /* We should see a SIGTRAP next time we stop.  */
>>> +      current_inferior ()->waiting_for_fork_sigtrap = true;
>>> +
>>>     /* If checking displaced stepping is supported, and thread
>>>        ecs->ptid is displaced stepping.  */
>>>     if (displaced_step_in_progress_thread (ecs->event_thread))
>>> @@ -5094,6 +5098,8 @@ Cannot fill $_exitsignal with the correct 
>>> signal number.\n"));
>>>
>>>        context_switch (ecs);
>>>
>>> +      /* We should see a SIGTRAP next time we stop.  */
>>> +      current_inferior ()->waiting_for_fork_sigtrap = true;
>>>        current_inferior ()->waiting_for_vfork_done = 0;
>>>        current_inferior ()->pspace->breakpoints_not_allowed = 0;
>>>
>>> @@ -5912,7 +5918,15 @@ handle_signal_stop (struct 
>>> execution_control_state *ecs)
>>>                                  "breakpoint\n");
>>>
>>>       insert_hp_step_resume_breakpoint_at_frame (frame);
>>> -      ecs->event_thread->step_after_step_resume_breakpoint = 1;
>>> +
>>> +      /* If we're still waiting for the final fork/vfork SIGTRAP and we
>>> +         got interrupted by a random signal, don't step further after
>>> +         returning from the signal handler.  */
>>> +      if (current_inferior ()->waiting_for_fork_sigtrap)
>>> +        ecs->event_thread->step_after_step_resume_breakpoint = 0;
>>> +      else
>>> +        ecs->event_thread->step_after_step_resume_breakpoint = 1;
>>> +
>>
>> Simpler to write as this?
>> ecs->event_thread->step_after_step_resume_breakpoint = 
>> !current_inferior ()->waiting_for_fork_sigtrap;
>>
>>
> 
> I'll change it.
> 
>>>       /* Reset trap_expected to ensure breakpoints are re-inserted.  */
>>>       ecs->event_thread->control.trap_expected = 0;
>>>
>>> @@ -5947,7 +5961,15 @@ handle_signal_stop (struct 
>>> execution_control_state *ecs)
>>>
>>>       clear_step_over_info ();
>>>       insert_hp_step_resume_breakpoint_at_frame (frame);
>>> -      ecs->event_thread->step_after_step_resume_breakpoint = 1;
>>> +
>>> +      /* If we're still waiting for the final fork/vfork SIGTRAP and we
>>> +         got interrupted by a random signal, don't step further after
>>> +         returning from the signal handler.  */
>>> +      if (current_inferior ()->waiting_for_fork_sigtrap)
>>> +        ecs->event_thread->step_after_step_resume_breakpoint = 0;
>>> +      else
>>> +        ecs->event_thread->step_after_step_resume_breakpoint = 1;
>>> +
>>
>> Ditto.
>>
> 
> I'll change it as well.