RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS

Mon Dec 15 13:11:00 GMT 2014

On 12/13/2014 03:46 PM, Joel Brobecker wrote:
> Hi Pedro,
> 
>> The main issue is that we're trying to move the thread past a
>> breakpoint.  Barring displaced stepping support, to move the
>> thread past the breakpoint, we have to remove the breakpoint from
>> the target temporarily.  But then we _cannot_ resume other threads
>> but the one that is stopped at the breakpoint, because then those
>> other threads could fly by the removed breakpoint and miss it.
> 
> Attached is a patch that does just that, tested on ppc-lynx5 and
> ppc-lynx178.  I waited a while before posting it here, because
> I wanted to keep it under observation for a while first...
> 
> gdb/gdbserver/ChangeLog:
> 
>         * lynx-low.c (lynx_resume): Use PTRACE_SINGLESTEP_ONE if N == 1.
>         Remove FIXME comment about assumption about N.
> 
> OK to commit?

Sure, OK.

> 
> Note that parallel to that, I came across another issue, which I am
> going to call a limitation for now: consider the case where we have
> 2 threads, A and B, and we are trying to next/step some code in thread
> A. While doing so, thread B receives a signal, and therefore reports
> it to GDB. GDB sees that this signal is configured as
> nostop/noprint/pass, so presumably, you would think that we'd resume
> the inferior passing that signal to thread B. However, how do you do
> that while at the same time stepping thread A?

GDB nowadays sends a single vCont packet that both steps thread A,
continues thread B with a signal and continues all other threads with
no signal (previously in some cases it'd just lose control of the
inferior, or deliver the signal to the wrong thread).  Something like:

  vCont;s:A;C SIG:B;c

See the switch_back_to_stepped_thread calls within:

  if (random_signal)
    {

at the tail end of handle_signal_stop, and
remote.c:append_pending_thread_resumptions.

There are tests in the testsuite that result in packets
just like that.
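
To make the packet shape concrete, here is a minimal sketch of how
such a vCont packet body could be assembled.  It is not the code in
remote.c: struct resume_action and build_vcont are hypothetical
names made up for this illustration, thread ids are printed as
plain hex (no multiprocess "pPID.TID" form), and the signal is
assumed to already be a remote-protocol signal number, printed as
two hex digits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-thread action, for illustration only (not a GDB
   type).  */
struct resume_action
{
  unsigned long tid;  /* Thread id; 0 means "no thread id" (default).  */
  int step;           /* Nonzero to single-step, zero to continue.  */
  int sig;            /* Signal number to deliver, 0 for none.  */
};

/* Write a vCont packet body for the N actions in ACTS into BUF.
   Return its length, or -1 if BUF is too small.  */
int
build_vcont (char *buf, size_t size, const struct resume_action *acts,
             size_t n)
{
  size_t used;
  size_t i;
  int len;

  assert (buf != NULL && acts != NULL);

  len = snprintf (buf, size, "vCont");
  if (len < 0 || (size_t) len >= size)
    return -1;
  used = (size_t) len;

  for (i = 0; i < n; i++)
    {
      char action[32];
      char thread[32] = "";

      /* 's'/'c' carry no signal; 'S'/'C' are followed by the signal
         as two hex digits.  */
      if (acts[i].sig != 0)
        snprintf (action, sizeof action, ";%c%02x",
                  acts[i].step ? 'S' : 'C', (unsigned) acts[i].sig);
      else
        snprintf (action, sizeof action, ";%c",
                  acts[i].step ? 's' : 'c');

      /* An action without a thread id applies to all threads not
         named by an earlier action.  */
      if (acts[i].tid != 0)
        snprintf (thread, sizeof thread, ":%lx", acts[i].tid);

      len = snprintf (buf + used, size - used, "%s%s", action, thread);
      if (len < 0 || (size_t) len >= size - used)
        return -1;
      used += (size_t) len;
    }

  return (int) used;
}
```

With three actions (step thread 0x1a, continue thread 0x1b with
signal 10, continue everything else), this produces
"vCont;s:1a;C0a:1b;c", which is the same shape as the packet above.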

> 
> IIRC, what happens currently in this case is that GDB keeps trying
> to resume/step thread A, and the kernel keeps telling GDB "no,
> thread B just received a signal", and so GDB and the kernel go
> into that infinite loop where nothing advances. I'm not quite sure
> why we keep getting the signal for thread B, if it's a new signal
> each time, or if it's about the signal not being passed back (the
> program I saw this in is fairly large and complicated).
> 
> In any case, I don't see how we could improve this situation
> without setting sss-like breakpoints... Something I'm not really
> eager to do, at least for now, since "set scheduler-locking step"
> seems to work around the issue.

Couldn't you iterate over the threads, and use PTRACE_SINGLESTEP_ONE
for the stepped threads, and PTRACE_CONT_ONE for the others,
instead of PTRACE_CONT?  For the case above, lynx_resume would
end up issuing:

 PTRACE_SINGLESTEP_ONE, thread A, sig 0
 PTRACE_CONT_ONE, thread B, sig SIG
 PTRACE_CONT_ONE, thread C, sig 0
 PTRACE_CONT_ONE, thread D, sig 0
 ...

Otherwise, yeah, sounds like handling the step request with
breakpoints instead might be the solution.
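
As a rough sketch of the per-thread loop I have in mind (untested
against LynxOS, and not lynx-low.c code): every name below is
hypothetical, and the actual ptrace call is replaced by a callback
so the dispatch logic can be exercised on any host.  A real
lynx_resume would issue the per-thread single-step and continue
requests where the callback is invoked.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the per-thread ptrace requests; hypothetical names,
   not the LynxOS constants.  */
enum resume_request { REQ_STEP_ONE, REQ_CONT_ONE };

/* What to do with one thread; hypothetical, not a gdbserver type.  */
struct thread_resume_plan
{
  long tid;   /* Thread to resume.  */
  int step;   /* Nonzero to single-step, zero to continue.  */
  int sig;    /* Signal to pass to the thread, 0 for none.  */
};

/* Callback that would wrap the ptrace call.  Returns 0 on success.  */
typedef int (*issue_fn) (enum resume_request req, long tid, int sig,
                         void *ctx);

/* Resume each thread individually: step the stepped threads and
   continue the rest, each with its own signal.  Stop at the first
   failure and return -1; return 0 if every request succeeded.  */
int
resume_each_thread (const struct thread_resume_plan *plans, size_t n,
                    issue_fn issue, void *ctx)
{
  size_t i;

  assert (plans != NULL && issue != NULL);

  for (i = 0; i < n; i++)
    {
      enum resume_request req
        = plans[i].step ? REQ_STEP_ONE : REQ_CONT_ONE;

      if (issue (req, plans[i].tid, plans[i].sig, ctx) != 0)
        return -1;
    }

  return 0;
}

/* Test double: record the requests instead of calling ptrace.  */
struct request_log
{
  size_t count;
  enum resume_request req[8];
  long tid[8];
  int sig[8];
};

int
record_request (enum resume_request req, long tid, int sig, void *ctx)
{
  struct request_log *rlog = (struct request_log *) ctx;

  if (rlog->count >= 8)
    return -1;
  rlog->req[rlog->count] = req;
  rlog->tid[rlog->count] = tid;
  rlog->sig[rlog->count] = sig;
  rlog->count++;
  return 0;
}
```

Feeding it the four-thread case above (A stepped, B continued with
SIG, C and D continued with no signal) records one request per
thread, in that order.  Whether the kernel copes with threads being
resumed one at a time like this is something that would need
checking on the target.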

Thanks,
Pedro Alves