Move threads out of jumppad without single step

Fri Jan 29 20:43:00 GMT 2016

On 01/27/2016 11:47 AM, Antoine Tremblay wrote:
>
>
> On 12/01/2015 06:36 AM, Yao Qi wrote:
>> Pedro Alves <palves@redhat.com> writes:
>>
>>> You may be able to handle this by retrieving state from the saved
>>> registers buffer in the jump pad, similar to how gdb_collect cooks
>>> up a regcache, though unlike gdb_collect, you'll have to handle the
>>> case of the thread stopping midway through that register saving too
>>> (some registers already saved, some not yet).
>>
>> Computing the next PCs on the basis of a regcache cooked up from the
>> stack is what I intended, but I didn't consider the case where the
>> thread is stopped midway through register saving.
>>
>>>
>>> So I assume it's much simpler to just run to [1] as well, and then issue
>>> a normal software single-step when you get there.
>>
>> Then it looks like we have to use software single-step.
>>
>
> Hi,
>    I'm testing software single-stepping to move threads out of the
> jump pad, and I've run into a problem that I'm unsure how to fix.
> Since the run control code is quite hard to follow, help is
> appreciated.
>
> Some context :
>
> I'm using the following program to test on ARM :
>
> #include "trace-common.h"
> #include <stdio.h>
>
> static void
> begin (void)
> {}
> static void
> end (void)
> {}
>
> int
> main ()
> {
>    begin ();
>    FAST_TRACEPOINT_LABEL(set_point);
>    FAST_TRACEPOINT_LABEL(other_point);
>    end ();
>    return 0;
> }
>
> To compile :
> gcc -marm -Wl,--no-as-needed move-out.c libinproctrace.so -g
> -Wl,-rpath,$ORIGIN -lm -o move-out
>
> My gdb version is a soon to be posted fast tracepoint branch for ARM:
> https://github.com/hexa00/binutils-gdb/commit/14912518f37abc2eb4594b42ca161c912ef6b6cd
>
>
> Running gdbserver under gdb as such :
>
> gdb gdbserver  --debug -ex "break linux_stabilize_threads" -ex "run
> --once :7777 ./move-out"
>
> and gdb with the following commands:
> set pagination off
> set non-stop on
> set remotetimeout unlimited
> tar rem :7777
> break main
> break end
> c
> #break on last jump pad instruction
> break *gdb_agent_gdb_jump_pad_buffer + 43*4
> ftrace set_point
> tstart
> c
> #delete the breakpoint to fool gdbserver a bit.
> delete 3
> ftrace other_point
>
> In the logs I can see :
>
> stop_all_lwps done, setting stopping_threads back to !stopping
> <<<< exiting stop_all_lwps
> Checking whether LWP 5148 needs to move out of the jump pad.
> fast_tracepoint_collecting
> in jump pad of tpoint (4, 85d8); jump_pad(33000, 330b0); adj_insn(330ac,
> 9393939393939393)
> fast_tracepoint_collecting, returning need-single-step
> (330ac-9393939393939393)
> Checking whether LWP 5148 needs to move out of the jump pad...it does
> LWP 5148 needs stabilizing (in jump pad)
> Resuming lwp 5148 (continue, signal 0, stop not expected)
> lwp 5148 wants to get out of fast tracepoint jump pad single-stepping
> stop pc is 0x330ac
> pc is 0x330ac
> Writing f001f0e7 to 0x000085dc in process 5148
> stop pc is 0x330ac
>    continue from pc 0x330ac
> Checking whether LWP 5650 needs to move out of the jump pad.
> sigchld_handler
> fast_tracepoint_collecting
> fast_tracepoint_collecting: not collecting (and nobody is).
> Checking whether LWP 5650 needs to move out of the jump pad...no
>  >>>> entering linux_wait_1
> linux_wait_1: [<all threads>]
> my_waitpid (-1, 0x40000001)
> my_waitpid (-1, 0x40000001): status(57f), 5148
> LWFE: waitpid(-1, ...) returned 5148, ERRNO-OK
> LLW: waitpid 5148 received Trace/breakpoint trap (stopped)
> stop pc is 0x85dc
> pc is 0x85dc
> CSBB: LWP 5148.5148 stopped by software breakpoint
> my_waitpid (-1, 0x40000001)
> my_waitpid (-1, 0x40000001): status(ffffffff), 0
> LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
> leader_pid=5148, leader_lp!=NULL=1, num_lwps=2, zombie=0
> LLW: exit (no unwaited-for LWP)
> linux_wait_1 ret = null_ptid, TARGET_WAITKIND_NO_RESUMED
> <<<< exiting linux_wait_1
> ../../../gdb/gdbserver/linux-low.c:1922: A problem internal to GDBserver
> has been detected.
> unsuspend LWP 5148, suspended=-1
>
>
> The main problem seems to be that, as we enter linux_wait_1 in
> stabilize_threads, gdbserver gets the stop event for the single-step
> breakpoint:
>
> LLW: waitpid 5148 received Trace/breakpoint trap (stopped)
> stop pc is 0x85dc
> pc is 0x85dc
> CSBB: LWP 5148.5148 stopped by software breakpoint
>
> but this event is filtered out at linux-low.c:2683:
>
>   /* ... and find an LWP with a status to report to the core, if
>       any.  */
>        event_thread = (struct thread_info *)
>      find_inferior (&all_threads, status_pending_p_callback, &filter_ptid);
>
> Here status_pending_p_callback finds that lwp_resumed is false and
> returns 0, thus filtering out the event.
>
> We then go back to linux_stabilize_threads, do nothing, and end the
> loop since the lwp is now stopped.
>
> gdbserver then tries to unsuspend a thread that was not suspended and
> hits the assert.
>
> Ideas on how this should be fixed ?
>
> Thanks,
> Antoine
>
>

I managed to fix it like so :

--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -1693,7 +1693,10 @@ status_pending_p_callback (struct inferior_list_entry *entry, void *arg)
    if (!ptid_match (ptid_of (thread), ptid))
      return 0;

-  if (!lwp_resumed (lp))
+  /* If we are stabilizing threads, threads have been stopped except the
+     ones that are moving out of the jump pad. The events of those threads
+     need to be reported whatever the last_resume_kind is.  */
+  if (!lwp_resumed (lp) && !stabilizing_threads)
      return 0;

    if (lp->status_pending_p

I've tested this in both all-stop and non-stop modes and it works properly.

Basically, if stabilize_threads is not called in the context of
linux_resume and gdbserver needs to report an event, it won't, since
last_resume_kind can be resume_stop.

In the current case gdbserver is in cmd_qtdp; the last command was
continue (vCont;c) in all-stop mode, so last_resume_kind is resume_stop.

So on entering linux_wait, the event is filtered out by:
  event_thread = (struct thread_info *)
	find_inferior (&all_threads, status_pending_p_callback, &filter_ptid);

since status_pending_p_callback returns false.
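
To make the filtering easier to see, here is a stripped-down model of
the check before and after the patch.  It is only a sketch: the
model_* names and struct model_lwp are made up for illustration and are
not the gdbserver definitions, and model_lwp_resumed assumes that a
thread counts as not resumed exactly when last_resume_kind is
resume_stop (the real lwp_resumed may look at more state than that).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for gdbserver state, for illustration only.  */
enum resume_kind { resume_continue, resume_step, resume_stop };

struct model_lwp
{
  enum resume_kind last_resume_kind;  /* What GDB last asked for.  */
  bool status_pending_p;              /* An unreported wait status exists.  */
};

/* Assumption: the thread is "resumed" from GDB's point of view unless
   the last request was a stop.  */
static bool
model_lwp_resumed (const struct model_lwp *lp)
{
  return lp->last_resume_kind != resume_stop;
}

/* Check as it is before the patch: a pending status on a thread that
   is not resumed is never reported.  */
static bool
model_status_pending_before (const struct model_lwp *lp)
{
  if (!model_lwp_resumed (lp))
    return false;

  return lp->status_pending_p;
}

/* Check as it is after the patch: while stabilizing threads, the
   pending status is reported whatever last_resume_kind is.  */
static bool
model_status_pending_after (const struct model_lwp *lp,
                            bool stabilizing_threads)
{
  if (!model_lwp_resumed (lp) && !stabilizing_threads)
    return false;

  return lp->status_pending_p;
}
```

With last_resume_kind == resume_stop and a pending status, the first
function returns false (the event is dropped), while the second returns
true as long as stabilizing_threads is set.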

Note that this fix may not be the best one, but it may be some progress.

Any ideas are welcome, otherwise I will add it to my patch set and there 
can be more discussion at review.

Thanks,
Antoine