This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [non-stop] 08/10 linux native support
A Wednesday 25 June 2008 21:19:46, Daniel Jacobowitz wrote:
> On Sun, Jun 15, 2008 at 10:05:49PM +0100, Pedro Alves wrote:
> > @@ -920,7 +929,7 @@ delete_lwp (ptid_t ptid)
> > /* Return a pointer to the structure describing the LWP corresponding
> > to PID. If no corresponding LWP could be found, return NULL. */
> >
> > -static struct lwp_info *
> > +struct lwp_info *
> > find_lwp_pid (ptid_t ptid)
> > {
> > struct lwp_info *lp;
>
> If you need this function global, please rename it first.
>
Ack, will do.
> > @@ -1306,16 +1315,76 @@ get_pending_status (struct lwp_info *lp,
> > events are always cached in waitpid_queue. */
> >
> > *status = 0;
> > - if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
> > +
> > + if (non_stop)
> > {
> > - if (stop_signal != TARGET_SIGNAL_0
> > - && signal_pass_state (stop_signal))
> > - *status = W_STOPCODE (target_signal_to_host (stop_signal));
> > + enum target_signal signo = TARGET_SIGNAL_0;
> > +
> > + if (is_executing (lp->ptid))
> > + {
> > + /* If the core thought this lwp was executing, we can only
> > + have pending events in the local queue. */
> > + if (queued_waitpid (GET_LWP (lp->ptid), status, __WALL) != -1)
> > + {
> > + if (WIFSTOPPED (status))
> > + signo = target_signal_from_host (WSTOPSIG (status));
> > +
> > + /* If not stopped, then the lwp is gone, no use in
> > + resending a signal. */
> > + }
>
> How do we get here if the core thinks the thread is executing? Is it
> when linux-nat.c resumes the thread without telling the core it
> stopped? A little more detail here would be helpful.
Nope, this function is used while detaching, as you know. Due to the
fact that PTRACE_DETACH needs the threads to be stopped to work,
there's a stop_callback/stop_wait_callback sequence over
all threads just before detaching. In non-stop mode, for
threads that *were* running when linux_nat_detach is reached, the core
will still believe they were executing, as the executing state
is managed in handle_inferior_event. I'll more comments there, unless
you think I should do things differently.
>
> > + else
> > + {
> > + /* If the core knows the thread is not executing, then we
> > + have then last signal recorded in
> > + thread_info->stop_signal, unless this is inferior_ptid,
> > + in which case, it's in the global stop_signal, due to
> > + context switching. */
>
> I wish we could keep this stuff in the thread struct all the time...
>
Working on it... That pesky context switching...
> > @@ -1489,6 +1580,9 @@ linux_nat_resume (ptid_t ptid, int step,
> > /* Mark this LWP as resumed. */
> > lp->resumed = 1;
> >
> > + /* Remove the SIGINT mark. Used in non-stop mode. */
> > + lp->sigint = 0;
> > +
>
> Confused. Why does resuming the thread affect whether we have sent it
> a SIGINT, but not received it back yet?
>
Hmm, I was under the impression that it was possible to push more
than one SIGINT into a thread's signal queue, but I just tried it, and
it doesn't seem like it is. This check was meant to prevent that
happening.
> > @@ -1650,20 +1746,43 @@ linux_handle_extended_wait (struct lwp_i
> > else
> > status = 0;
> >
> > + /* Make thread_db aware of this thread. We do this this
> > + early, so in non-stop mode, threads show up as they're
> > + created, instead of on next stop. thread_db needs a
> > + stopped inferior_ptid --- since we know LP is stopped,
> > + use it this time. */
> > + old_chain = save_inferior_ptid ();
> > + inferior_ptid = lp->ptid;
> > + lp->stopped = 1;
> > + target_find_new_threads ();
> > + do_cleanups (old_chain);
> > + if (!in_thread_list (new_lp->ptid))
> > + {
> > + /* We're not using thread_db. Attach and add it to
> > + GDB's list. */
> > + lin_lwp_attach_lwp (new_lp->ptid);
> > + target_post_attach (GET_LWP (new_lp->ptid));
> > + add_thread (new_lp->ptid);
> > + }
> > +
>
> This may be trouble. Sometimes the thread state is not
> atomically updated, so peeking at it right after creation but before
> an event can fail.
>
Oh, that's not nice. Is this something that's worth and/or possible
to fix in libthreaddb?
> Why is it necessary? We already know the ptid since we made them
> independent of thread_db TID some time ago. attach_thread should cope
> if the thread is already in GDB's thread list when the event
> eventually arrives. So we should be able to just add the new
> thread directly.
That's right, the only thing we'll miss if we do that, is the
thread_db id of the thread in output like:
[New Thread 0xf7e11b90 (LWP 26100)]
^^^^^^^^
And info threads:
2 Thread 0xf7e11b90 (LWP 26100) (running)
^^^^^^^^
Those will only show up on the next stop event (of any thread).
It may take a while, if all threads are running (unless we do
momentarily stop threads trick).
Having a TARGET_WAITKIND_NEW_THREAD so we could pass the event
to the thread-db layer (and do the immediate resume either there,
or in handle_inferior_event would also get rid of the
target_find_new_threads call, but it has then the
same race issue...
>
> > @@ -2796,13 +2915,26 @@ static int
> > kill_callback (struct lwp_info *lp, void *data)
> > {
> > errno = 0;
> > - ptrace (PTRACE_KILL, GET_LWP (lp->ptid), 0, 0);
> > - if (debug_linux_nat)
> > - fprintf_unfiltered (gdb_stdlog,
> > - "KC: PTRACE_KILL %s, 0, 0 (%s)\n",
> > - target_pid_to_str (lp->ptid),
> > - errno ? safe_strerror (errno) : "OK");
> >
> > + /* PTRACE_KILL doesn't work when the thread is running. */
> > + if (!lp->stopped)
> > + {
> > + kill_lwp (GET_LWP (lp->ptid), SIGKILL);
> > + if (debug_linux_nat)
> > + fprintf_unfiltered (gdb_stdlog,
> > + "KC: kill_lwp (SIGKILL) %s (%s)\n",
> > + target_pid_to_str (lp->ptid),
> > + errno ? safe_strerror (errno) : "OK");
> > + }
> > + else
> > + {
> > + ptrace (PTRACE_KILL, GET_LWP (lp->ptid), 0, 0);
> > + if (debug_linux_nat)
> > + fprintf_unfiltered (gdb_stdlog,
> > + "KC: PTRACE_KILL %s, 0, 0 (%s)\n",
> > + target_pid_to_str (lp->ptid),
> > + errno ? safe_strerror (errno) : "OK");
> > + }
> > return 0;
> > }
>
> SIGKILL should work even if the thread is stopped.
I think I'll need a SIGCONT as well in that case. For some
reason, I wasn't getting that to work all the times. I'll
experiment some more.
As always, thanks much for the review.
--
Pedro Alves