This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [non-stop] 08/10 linux native support

From: Pedro Alves <pedro at codesourcery dot com>
To: Daniel Jacobowitz <drow at false dot org>
Cc: gdb-patches at sourceware dot org
Date: Wed, 25 Jun 2008 22:17:25 +0100
Subject: Re: [non-stop] 08/10 linux native support
References: <200806152205.49241.pedro@codesourcery.com> <20080625201946.GH25575@caradoc.them.org>

A Wednesday 25 June 2008 21:19:46, Daniel Jacobowitz wrote:
> On Sun, Jun 15, 2008 at 10:05:49PM +0100, Pedro Alves wrote:
> > @@ -920,7 +929,7 @@ delete_lwp (ptid_t ptid)
> >  /* Return a pointer to the structure describing the LWP corresponding
> >     to PID.  If no corresponding LWP could be found, return NULL.  */
> >
> > -static struct lwp_info *
> > +struct lwp_info *
> >  find_lwp_pid (ptid_t ptid)
> >  {
> >    struct lwp_info *lp;
>
> If you need this function global, please rename it first.
>

Ack, will do.

> > @@ -1306,16 +1315,76 @@ get_pending_status (struct lwp_info *lp,
> >       events are always cached in waitpid_queue.  */
> >
> >    *status = 0;
> > -  if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
> > +
> > +  if (non_stop)
> >      {
> > -      if (stop_signal != TARGET_SIGNAL_0
> > -	  && signal_pass_state (stop_signal))
> > -	*status = W_STOPCODE (target_signal_to_host (stop_signal));
> > +      enum target_signal signo = TARGET_SIGNAL_0;
> > +
> > +      if (is_executing (lp->ptid))
> > +	{
> > +	  /* If the core thought this lwp was executing, we can only
> > +	     have pending events in the local queue.  */
> > +	  if (queued_waitpid (GET_LWP (lp->ptid), status, __WALL) != -1)
> > +	    {
> > +	      if (WIFSTOPPED (status))
> > +		signo = target_signal_from_host (WSTOPSIG (status));
> > +
> > +	      /* If not stopped, then the lwp is gone, no use in
> > +		 resending a signal.  */
> > +	    }
>
> How do we get here if the core thinks the thread is executing?  Is it
> when linux-nat.c resumes the thread without telling the core it
> stopped?  A little more detail here would be helpful.

Nope, this function is used while detaching, as you know.  Due to the
fact that PTRACE_DETACH needs the threads to be stopped to work,
there's a stop_callback/stop_wait_callback sequence over
all threads just before detaching.  In non-stop mode, for
threads that *were* running when linux_nat_detach is reached, the core
will still believe they were executing, as the executing state
is managed in handle_inferior_event.  I'll more comments there, unless
you think I should do things differently.

>
> > +      else
> > +	{
> > +	  /* If the core knows the thread is not executing, then we
> > +	     have then last signal recorded in
> > +	     thread_info->stop_signal, unless this is inferior_ptid,
> > +	     in which case, it's in the global stop_signal, due to
> > +	     context switching.  */
>
> I wish we could keep this stuff in the thread struct all the time...
>


Working on it... That pesky context switching...


> > @@ -1489,6 +1580,9 @@ linux_nat_resume (ptid_t ptid, int step,
> >    /* Mark this LWP as resumed.  */
> >    lp->resumed = 1;
> >
> > +  /* Remove the SIGINT mark.  Used in non-stop mode.  */
> > +  lp->sigint = 0;
> > +
>
> Confused.  Why does resuming the thread affect whether we have sent it
> a SIGINT, but not received it back yet?
>

Hmm, I was under the impression that it was possible to push more
than one SIGINT into a thread's signal queue, but I just tried it, and
it doesn't seem like it is.  This check was meant to prevent that
happening.

> > @@ -1650,20 +1746,43 @@ linux_handle_extended_wait (struct lwp_i
> >  	  else
> >  	    status = 0;
> >
> > +	  /* Make thread_db aware of this thread.  We do this this
> > +	     early, so in non-stop mode, threads show up as they're
> > +	     created, instead of on next stop.  thread_db needs a
> > +	     stopped inferior_ptid --- since we know LP is stopped,
> > +	     use it this time.  */
> > +	  old_chain = save_inferior_ptid ();
> > +	  inferior_ptid = lp->ptid;
> > +	  lp->stopped = 1;
> > +	  target_find_new_threads ();
> > +	  do_cleanups (old_chain);
> > +	  if (!in_thread_list (new_lp->ptid))
> > +	    {
> > +	      /* We're not using thread_db.  Attach and add it to
> > +		 GDB's list.  */
> > +	      lin_lwp_attach_lwp (new_lp->ptid);
> > +	      target_post_attach (GET_LWP (new_lp->ptid));
> > +	      add_thread (new_lp->ptid);
> > +	    }
> > +
>
> This may be trouble.  Sometimes the thread state is not
> atomically updated, so peeking at it right after creation but before
> an event can fail.
>

Oh, that's not nice.  Is this something that's worth and/or possible
to fix in libthreaddb?

> Why is it necessary?  We already know the ptid since we made them
> independent of thread_db TID some time ago.  attach_thread should cope
> if the thread is already in GDB's thread list when the event
> eventually arrives.  So we should be able to just add the new
> thread directly.

That's right, the only thing we'll miss if we do that, is the
thread_db id of the thread in output like:

[New Thread 0xf7e11b90 (LWP 26100)]
             ^^^^^^^^
And info threads:

  2 Thread 0xf7e11b90 (LWP 26100)  (running)
             ^^^^^^^^

Those will only show up on the next stop event (of any thread).
It may take a while, if all threads are running (unless we do
momentarily stop threads trick).

Having a TARGET_WAITKIND_NEW_THREAD so we could pass the event
to the thread-db layer (and do the immediate resume either there,
or in handle_inferior_event would also get rid of the
target_find_new_threads call, but it has then the
same race issue...

>
> > @@ -2796,13 +2915,26 @@ static int
> >  kill_callback (struct lwp_info *lp, void *data)
> >  {
> >    errno = 0;
> > -  ptrace (PTRACE_KILL, GET_LWP (lp->ptid), 0, 0);
> > -  if (debug_linux_nat)
> > -    fprintf_unfiltered (gdb_stdlog,
> > -			"KC:  PTRACE_KILL %s, 0, 0 (%s)\n",
> > -			target_pid_to_str (lp->ptid),
> > -			errno ? safe_strerror (errno) : "OK");
> >
> > +  /* PTRACE_KILL doesn't work when the thread is running.  */
> > +  if (!lp->stopped)
> > +    {
> > +      kill_lwp (GET_LWP (lp->ptid), SIGKILL);
> > +      if (debug_linux_nat)
> > +	fprintf_unfiltered (gdb_stdlog,
> > +			    "KC:  kill_lwp (SIGKILL) %s (%s)\n",
> > +			    target_pid_to_str (lp->ptid),
> > +			    errno ? safe_strerror (errno) : "OK");
> > +    }
> > +  else
> > +    {
> > +      ptrace (PTRACE_KILL, GET_LWP (lp->ptid), 0, 0);
> > +      if (debug_linux_nat)
> > +	fprintf_unfiltered (gdb_stdlog,
> > +			    "KC:  PTRACE_KILL %s, 0, 0 (%s)\n",
> > +			    target_pid_to_str (lp->ptid),
> > +			    errno ? safe_strerror (errno) : "OK");
> > +    }
> >    return 0;
> >  }
>

> SIGKILL should work even if the thread is stopped.

I think I'll need a SIGCONT as well in that case.  For some
reason, I wasn't getting that to work all the times.  I'll
experiment some more.

As always, thanks much for the review.

-- 
Pedro Alves

Follow-Ups:
- Re: [non-stop] 08/10 linux native support
  - From: Pedro Alves
- Re: [non-stop] 08/10 linux native support
  - From: Daniel Jacobowitz

References:
- [non-stop] 08/10 linux native support
  - From: Pedro Alves
- Re: [non-stop] 08/10 linux native support
  - From: Daniel Jacobowitz

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]