[PATCHv3 1/2] gdb/remote: some fixes for 'maint set target-async off'

Andrew Burgess aburgess@redhat.com
Fri Dec 17 13:35:07 GMT 2021


* Pedro Alves <pedro@palves.net> [2021-12-16 21:15:31 +0000]:

> On 2021-12-01 10:40, Andrew Burgess via Gdb-patches wrote:
> 
> > This problem can clearly be seen I feel by looking at the
> > remote_state::cached_wait_status flag.  This flag tells GDB if there
> > is a wait status cached in remote_state::buf.  However, in
> > remote_target::putpkt_binary and remote_target::getpkt_or_notif_sane_1
> > this flag is just set back to 0, doing this immediately discards any
> > cached data.
> > 
> > I don't know if this scheme ever made sense, maybe once upon a time,
> > GDB would, when it found it had no cached stop, simply re-request the
> > stop information from the target, however, this certainly isn't the
> > case now, and resetting the cached_wait_status is (I claim) a bad
> > thing.
> 
> I don't think that was ever the case.  Take a look at 2d717e4f8a54,
> where the cached status was introduced to handle "attach".  It was simply the case
> back then that nothing would talk to the target between the initial attach
> and consuming the event.  It's not clear to me why putpkt/getpkt would
> need to clear the flag back then.  Looks more like a "just in case" safeguard.

Thanks for this insight.  I've updated the commit message to try and
describe the history here a little more accurately, including adding
the commit SHA you gave above, which should help if anyone needs to
dig into this code in the future.

> 
> > So, finally, in this commit, I propose to remove the
> > remote_state::cached_wait_status flag and to stop using the ::buf to
> > cache stop replies.  Instead, stop replies will now always be stored
> > in the ::stop_reply_queue.
> 
> To be honest, I don't recall exactly why I didn't do that when introducing
> the stop reply queue.
> 
> > @@ -8163,15 +8151,12 @@ remote_target::wait_as (ptid_t ptid, target_waitstatus *status,
> >  
> >    stop_reply = queued_stop_reply (ptid);
> >    if (stop_reply != NULL)
> > -    return process_stop_reply (stop_reply, status);
> > -
> > -  if (rs->cached_wait_status)
> > -    /* Use the cached wait status, but only once.  */
> > -    rs->cached_wait_status = 0;
> > +    {
> > +      rs->waiting_for_stop_reply = 0;
> 
> This is a difference described in the commit log, but looking at the resulting code,
> I don't understand why clearing this flag is needed here, it looks like dead code to me.
> I mean, if we have a cached status already, then we're not waiting for a stop reply
> from the target.  Did you run into a case where it was needed?

No, I never hit a case where it was definitely needed.  Honestly, I
just looked at the different code paths, and saw that it was pretty
easy to merge the paths and carry out all the actions, so did that.

However, you're right to call me out on that.  It's exactly this sort
of "just in case" code that was causing the cached packets to be
discarded originally without warning.

So, I've replaced the unconditional "set flag to false" with an assert
and a comment.  If it turns out we are wrong in our understanding of
the situation, then hopefully the problem will get reported, and we
can figure out what's going on at that point!
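
To illustrate the difference between silently clearing the flag and
asserting on it, here is a minimal standalone sketch.  It is not the
GDB code: toy_remote_state and take_queued_stop_reply are hypothetical
names, and plain assert stands in for gdb_assert.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical stand-in for the relevant parts of remote_state.
struct toy_remote_state
{
  bool waiting_for_stop_reply = false;
  std::deque<std::string> stop_reply_queue;
};

// Return true and fill REPLY if a queued stop reply was available.
// Rather than resetting waiting_for_stop_reply "just in case", check
// the invariant so that a wrong assumption is reported loudly.
bool
take_queued_stop_reply (toy_remote_state &rs, std::string &reply)
{
  if (rs.stop_reply_queue.empty ())
    return false;

  // Assumption under test: nothing that queues a stop reply leaves
  // this flag set.
  assert (!rs.waiting_for_stop_reply);

  reply = rs.stop_reply_queue.front ();
  rs.stop_reply_queue.pop_front ();
  return true;
}
```

If the assumption is ever violated, the assert fires at the point
where the queued reply is consumed, rather than the flag being
quietly reset.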

Below is the updated patch.  The only code change is the assert I
mention above.  All other changes are in the commit message.

I'll give this a few days in case you want to follow up, then I'll
push this.

Thanks,
Andrew

---

commit 723053fbcc4e89467d95b28ee317c3d4d5431da1
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Wed Nov 17 17:17:37 2021 +0000

    gdb/remote: some fixes for 'maint set target-async off'
    
    While working on another patch relating to remote targets, I wanted to
    test with 'maint set target-async off' in place.  Unfortunately I ran
    into some problems.  This commit is an attempt to fix one of the
    issues I hit.
    
    In my particular case I was actually running with:
    
      maint set target-async off
      maint set target-non-stop off
    
    that is, we're telling GDB to force the targets to operate in
    non-async mode, and in all-stop mode.  Here's my GDB session showing
    the problem:
    
      (gdb) maintenance set target-async off
      (gdb) maintenance set target-non-stop off
      (gdb) target extended-remote :54321
      Remote debugging using :54321
      (gdb) attach 2365960
      Attaching to process 2365960
      No unwaited-for children left.
      (gdb)
    
    Notice the 'No unwaited-for children left.' error; this is the
    problem.  There's no reason why GDB should not be able to attach to
    the process.
    
    The problem is this:
    
      1. The user runs 'attach PID' and this sends GDB into attach_command
      in infcmd.c.  From here we call the ::attach method on the attach
      target, which will be the extended_remote_target.
    
      2. In extended_remote_target::attach, we attach to the remote target
      and get the first reply (which is a stop packet).  We put off
      processing the stop packet until the end of ::attach.  We set up the
      inferior and thread to represent the process we attached to, and
      download the target description.  Finally, we process the initial
      stop packet.
    
      If '!target_is_non_stop_p ()' and '!target_can_async_p ()', which is
      the case for us given the maintenance commands we used, we cache the
      stop packet within the remote_state::buf for later processing.
    
      3. Back in attach_command, if 'target_is_non_stop_p ()' then we
      request that the target stops.  This will either process any cached
      stop replies, or request that the target stops, and process the stop
      replies.  However, this code is not what we use due to non-stop mode
      being disabled.  So, we skip to the next step which is to call
      validate_exec_file.
    
      4. Calling validate_exec_file can cause packets to be sent to the
      remote target, and replies received.  The first path I hit is the
      call to target_pid_to_exec_file, which calls
      remote_target::pid_to_exec_file, which can then try to read the
      executable from the remote.  Sending and receiving packets will make
      use of the remote_state::buf object.
    
      5. The attempt to attach continues, but the damage is already done...
    
    So, the problem is that, in step #2 we cache a stop reply in the
    remote_state::buf, and then in step #4 we reuse the remote_state::buf
    object, discarding any cached stop reply.  As a result, the initial
    stop, which is sent when GDB first attaches to the target, is lost.
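
The interaction between steps #2 and #4 can be sketched with a toy
model.  This is only an illustration under simplifying assumptions:
toy_connection and its members are hypothetical names and do not
correspond to the real remote.c code.

```cpp
#include <cassert>
#include <string>

// Toy model: a single buffer is shared between the cached stop reply
// and all other packet traffic.  All names here are hypothetical.
struct toy_connection
{
  std::string buf;
  bool cached_wait_status = false;

  // Step #2: stash the initial stop reply in the shared buffer.
  void cache_stop_reply (const std::string &packet)
  {
    buf = packet;
    cached_wait_status = true;
  }

  // Step #4: an unrelated request reuses the buffer for its reply and
  // clears the flag, so the cached stop reply is lost.
  void send_request (const std::string &request)
  {
    cached_wait_status = false;
    buf = "reply-to:" + request;
  }

  // A later wait only sees the cached stop if the flag survived.
  bool have_cached_stop () const
  {
    return cached_wait_status;
  }
};
```

After cache_stop_reply followed by any send_request, have_cached_stop
returns false and buf no longer holds the stop packet, which mirrors
how the initial stop goes missing.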
    
    This problem can clearly be seen, I feel, by looking at the
    remote_state::cached_wait_status flag.  This flag tells GDB if there
    is a wait status cached in remote_state::buf.  However, in
    remote_target::putpkt_binary and remote_target::getpkt_or_notif_sane_1
    this flag is just set back to 0, which immediately discards any
    cached data.
    
    I don't know if this scheme ever made sense.  Looking at commit
    2d717e4f8a54, where the cached_wait_status flag was added, it appears
    that there was nothing between where the stop was cached and where
    the stop was consumed, so I suspect there never was a situation
    where we ended up in putpkt_binary or getpkt_or_notif_sane_1 and
    needed to clear the flag; maybe the clearing was added "just in
    case".  Whatever the history, I claim that clearing this flag is no
    longer a good idea.
    
    So, my first step toward fixing this issue was to replace the two
    instances of 'rs->cached_wait_status = 0;' in ::putpkt_binary and
    ::getpkt_or_notif_sane_1 with 'gdb_assert (rs->cached_wait_status ==
    0);'.  This, at least, would show me when GDB was doing something
    dangerous, and indeed, this assert is now hit in my test case above.
    
    I did play with using some kind of scoped restore to backup, and
    restore the remote_state::buf object in all the places within remote.c
    that I was hitting where the ::buf was being corrupted.  The first
    problem with this is that, where the ::cached_wait_status flag is
    reset is _not_ where ::buf is corrupted.  For the ::putpkt_binary
    case, by the time we get to the method the buffer has already been
    corrupted in many cases, so we end up needing to add the scoped
    save/restore within the callers, which means we need the save/restore
    in _lots_ of places.
    
    Plus, using this save/restore model feels like the wrong solution.  I
    don't think that it's obvious that the buffer might be holding cached
    data, and I think it would be too easy for new corruptions of the
    buffer to be introduced, which could easily go unnoticed for a long
    time.
    
    So, I really wanted a solution that didn't require us to cache data in
    the ::buf object.
    
    Luckily, I think we already have such a solution in place: the
    remote_state::stop_reply_queue.  It seems like this does exactly the
    same task, just in a slightly different way.  With the
    ::stop_reply_queue, the stop packets are processed upon receipt and
    the stop_reply object is added to the queue.  With the ::buf cache
    solution, the unprocessed stop reply is cached in the ::buf, and
    processed later.
    
    So, finally, in this commit, I propose to remove the
    remote_state::cached_wait_status flag and to stop using the ::buf to
    cache stop replies.  Instead, stop replies will now always be stored
    in the ::stop_reply_queue.
    
    There are two places where we use the ::buf to hold a cached stop
    reply, the first is in the ::attach method, and the second is in
    remote_target::start_remote, however, the second of these cases is far
    less problematic, as after caching the stop reply in ::buf we call the
    global start_remote function, which does very little work before
    calling normal_stop, which processes the cached stop reply.  However,
    my plan is to switch both users over to using ::stop_reply_queue so
    that the old (unsafe) ::cached_wait_status mechanism can be completely
    removed.
    
    The next problem is that the ::stop_reply_queue is currently only used
    for async-mode, and so, in remote_target::push_stop_reply, where we
    push stop_reply objects into the ::stop_reply_queue, we currently also
    mark the async event token.  I've modified this so we only mark the
    async event token if 'target_is_async_p ()' - note, _is_, not _can_
    here. The ::push_stop_reply method is called in places where async
    mode has been temporarily disabled, but, when async mode is switched
    back on (see remote_target::async) we will mark the event token if
    there are events in the queue.
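
The intended behaviour of the event token can be sketched as follows.
This is a simplified model under stated assumptions, not the GDB
implementation: toy_async_target, set_async and event_token_marked
are hypothetical names standing in for remote_target,
remote_target::async and the async event token.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Toy model of a stop reply queue with an async event token.
// All names here are hypothetical.
struct toy_async_target
{
  std::deque<std::string> stop_reply_queue;
  bool async_enabled = false;
  bool event_token_marked = false;

  // Queue the event, but only mark the token if async mode is
  // enabled right now ("is", not "can").
  void push_stop_reply (const std::string &reply)
  {
    stop_reply_queue.push_back (reply);
    if (async_enabled)
      event_token_marked = true;
  }

  // When async mode is switched on, mark the token if events were
  // queued while it was off.
  void set_async (bool enable)
  {
    async_enabled = enable;
    if (enable && !stop_reply_queue.empty ())
      event_token_marked = true;
  }
};
```

Pushing a reply while async mode is off leaves the token unmarked;
the later set_async (true) call is what marks it, so the queued event
is not forgotten.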
    
    Another change of interest is in remote_target::remote_interrupt_as.
    Previously this code checked ::cached_wait_status, but didn't check
    for events in the ::stop_reply_queue.  Now that ::cached_wait_status
    has been removed we now check the queue length instead, which should
    have the same result.
    
    Finally, in remote_target::wait_as, I've tried to merge the processing
    of the ::stop_reply_queue with how we used to handle the
    ::cached_wait_status flag.
    
    Currently, when processing the ::stop_reply_queue we call
    process_stop_reply and immediately return.  However, when handling
    ::cached_wait_status we run through the whole of ::wait_as, and return
    at the end of the function.
    
    If we consider a standard stop packet, the two differences I see are:
    
      1. Resetting of the remote_state::waiting_for_stop_reply flag; this
      is not currently done when processing a stop from the
      ::stop_reply_queue.
    
      2. The final return value has the possibility of being adjusted at
      the end of ::wait_as, as well as there being calls to
      record_currthread, none of which are done if we process a stop from
      the ::stop_reply_queue.
    
    After discussion on the mailing list:
    
      https://sourceware.org/pipermail/gdb-patches/2021-December/184535.html
    
    it was suggested that, when an event is pushed into the
    ::stop_reply_queue, the ::waiting_for_stop_reply flag is never going
    to be set.  As a result, we don't need to worry about the first
    difference.  I have however, added a gdb_assert to validate the
    assumption that the flag is never going to be set.  If in future the
    situation ever changes, then we should find out pretty quickly.
    
    As for the second difference, I have resolved this by having all stop
    packets taken from the ::stop_reply_queue, pass through the return
    value adjustment code at the end of ::wait_as.
    
    An example of a test that reveals the benefits of this commit is:
    
      make check-gdb \
        RUNTESTFLAGS="--target_board=native-extended-gdbserver \
                      GDBFLAGS='-ex maint\ set\ target-async\ off \
                                -ex maint\ set\ target-non-stop\ off' \
                      gdb.base/attach.exp"
    
    For testing I've been running tests on x86-64 GNU/Linux, with the
    target boards unix, native-gdbserver, and native-extended-gdbserver.
    For each board I've run with the default GDBFLAGS, as well as with:
    
      GDBFLAGS='-ex maint\ set\ target-async\ off \
                -ex maint\ set\ target-non-stop\ off' \
    
    Though running with the above GDBFLAGS is clearly a lot more unstable
    both before and after my patch, I'm not seeing any consistent new
    failures with my patch, except, with the native-extended-gdbserver
    board, where I am seeing new failures, but only because more tests are
    now running.  For that configuration alone I see the number of
    unresolved go down by 49, the number of passes goes up by 446, and the
    number of failures also increases by 144.  All of the failures are new
    tests as far as I can tell.

diff --git a/gdb/remote.c b/gdb/remote.c
index 1f977d57fba..74770649d9c 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -258,15 +258,6 @@ class remote_state
      Otherwise zero, meaning to use the guessed size.  */
   long explicit_packet_size = 0;
 
-  /* remote_wait is normally called when the target is running and
-     waits for a stop reply packet.  But sometimes we need to call it
-     when the target is already stopped.  We can send a "?" packet
-     and have remote_wait read the response.  Or, if we already have
-     the response, we can stash it in BUF and tell remote_wait to
-     skip calling getpkt.  This flag is set when BUF contains a
-     stop reply packet and the target is not waiting.  */
-  int cached_wait_status = 0;
-
   /* True, if in no ack mode.  That is, neither GDB nor the stub will
      expect acks from each other.  The connection is assumed to be
      reliable.  */
@@ -4969,8 +4960,9 @@ remote_target::start_remote_1 (int from_tty, int extended_p)
 
       /* Use the previously fetched status.  */
       gdb_assert (wait_status != NULL);
-      strcpy (rs->buf.data (), wait_status);
-      rs->cached_wait_status = 1;
+      struct notif_event *reply
+	= remote_notif_parse (this, &notif_client_stop, wait_status);
+      push_stop_reply ((struct stop_reply *) reply);
 
       ::start_remote (from_tty); /* Initialize gdb process mechanisms.  */
     }
@@ -5804,7 +5796,6 @@ remote_target::open_1 (const char *name, int from_tty, int extended_p)
   /* Reset the target state; these things will be queried either by
      remote_query_supported or as they are needed.  */
   reset_all_packet_configs_support ();
-  rs->cached_wait_status = 0;
   rs->explicit_packet_size = 0;
   rs->noack_mode = 0;
   rs->extended = extended_p;
@@ -6199,21 +6190,13 @@ extended_remote_target::attach (const char *args, int from_tty)
       /* Use the previously fetched status.  */
       gdb_assert (wait_status != NULL);
 
-      if (target_can_async_p ())
-	{
-	  struct notif_event *reply
-	    =  remote_notif_parse (this, &notif_client_stop, wait_status);
+      struct notif_event *reply
+	=  remote_notif_parse (this, &notif_client_stop, wait_status);
 
-	  push_stop_reply ((struct stop_reply *) reply);
+      push_stop_reply ((struct stop_reply *) reply);
 
-	  target_async (1);
-	}
-      else
-	{
-	  gdb_assert (wait_status != NULL);
-	  strcpy (rs->buf.data (), wait_status);
-	  rs->cached_wait_status = 1;
-	}
+      if (target_can_async_p ())
+	target_async (1);
     }
   else
     {
@@ -7084,9 +7067,9 @@ remote_target::remote_interrupt_as ()
   rs->ctrlc_pending_p = 1;
 
   /* If the inferior is stopped already, but the core didn't know
-     about it yet, just ignore the request.  The cached wait status
+     about it yet, just ignore the request.  The pending stop events
      will be collected in remote_wait.  */
-  if (rs->cached_wait_status)
+  if (stop_reply_queue_length () > 0)
     return;
 
   /* Send interrupt_sequence to remote target.  */
@@ -7480,7 +7463,7 @@ remote_target::queued_stop_reply (ptid_t ptid)
   remote_state *rs = get_remote_state ();
   struct stop_reply *r = remote_notif_remove_queued_reply (ptid);
 
-  if (!rs->stop_reply_queue.empty ())
+  if (!rs->stop_reply_queue.empty () && target_can_async_p ())
     {
       /* There's still at least an event left.  */
       mark_async_event_handler (rs->remote_async_inferior_event_token);
@@ -7505,7 +7488,12 @@ remote_target::push_stop_reply (struct stop_reply *new_event)
 			target_pid_to_str (new_event->ptid).c_str (),
 			int (rs->stop_reply_queue.size ()));
 
-  mark_async_event_handler (rs->remote_async_inferior_event_token);
+  /* Mark the pending event queue only if async mode is currently enabled.
+     If async mode is not currently enabled, then, if it later becomes
+     enabled, and there are events in this queue, we will mark the event
+     token at that point, see remote_target::async.  */
+  if (target_is_async_p ())
+    mark_async_event_handler (rs->remote_async_inferior_event_token);
 }
 
 /* Returns true if we have a stop reply for PTID.  */
@@ -8214,15 +8202,14 @@ remote_target::wait_as (ptid_t ptid, target_waitstatus *status,
 
   stop_reply = queued_stop_reply (ptid);
   if (stop_reply != NULL)
-    return process_stop_reply (stop_reply, status);
-
-  if (rs->cached_wait_status)
-    /* Use the cached wait status, but only once.  */
-    rs->cached_wait_status = 0;
+    {
+      /* Currently none of the paths that push a stop reply onto the queue
+	 will have set the waiting_for_stop_reply flag.  */
+      gdb_assert (!rs->waiting_for_stop_reply);
+      event_ptid = process_stop_reply (stop_reply, status);
+    }
   else
     {
-      int ret;
-      int is_notif;
       int forever = ((options & TARGET_WNOHANG) == 0
 		     && rs->wait_forever_enabled_p);
 
@@ -8236,7 +8223,8 @@ remote_target::wait_as (ptid_t ptid, target_waitstatus *status,
 	 _never_ wait for ever -> test on target_is_async_p().
 	 However, before we do that we need to ensure that the caller
 	 knows how to take the target into/out of async mode.  */
-      ret = getpkt_or_notif_sane (&rs->buf, forever, &is_notif);
+      int is_notif;
+      int ret = getpkt_or_notif_sane (&rs->buf, forever, &is_notif);
 
       /* GDB gets a notification.  Return to core as this event is
 	 not interesting.  */
@@ -8245,73 +8233,73 @@ remote_target::wait_as (ptid_t ptid, target_waitstatus *status,
 
       if (ret == -1 && (options & TARGET_WNOHANG) != 0)
 	return minus_one_ptid;
-    }
 
-  buf = rs->buf.data ();
+      buf = rs->buf.data ();
 
-  /* Assume that the target has acknowledged Ctrl-C unless we receive
-     an 'F' or 'O' packet.  */
-  if (buf[0] != 'F' && buf[0] != 'O')
-    rs->ctrlc_pending_p = 0;
+      /* Assume that the target has acknowledged Ctrl-C unless we receive
+	 an 'F' or 'O' packet.  */
+      if (buf[0] != 'F' && buf[0] != 'O')
+	rs->ctrlc_pending_p = 0;
 
-  switch (buf[0])
-    {
-    case 'E':		/* Error of some sort.	*/
-      /* We're out of sync with the target now.  Did it continue or
-	 not?  Not is more likely, so report a stop.  */
-      rs->waiting_for_stop_reply = 0;
+      switch (buf[0])
+	{
+	case 'E':		/* Error of some sort.	*/
+	  /* We're out of sync with the target now.  Did it continue or
+	     not?  Not is more likely, so report a stop.  */
+	  rs->waiting_for_stop_reply = 0;
 
-      warning (_("Remote failure reply: %s"), buf);
-      status->set_stopped (GDB_SIGNAL_0);
-      break;
-    case 'F':		/* File-I/O request.  */
-      /* GDB may access the inferior memory while handling the File-I/O
-	 request, but we don't want GDB accessing memory while waiting
-	 for a stop reply.  See the comments in putpkt_binary.  Set
-	 waiting_for_stop_reply to 0 temporarily.  */
-      rs->waiting_for_stop_reply = 0;
-      remote_fileio_request (this, buf, rs->ctrlc_pending_p);
-      rs->ctrlc_pending_p = 0;
-      /* GDB handled the File-I/O request, and the target is running
-	 again.  Keep waiting for events.  */
-      rs->waiting_for_stop_reply = 1;
-      break;
-    case 'N': case 'T': case 'S': case 'X': case 'W':
-      {
-	/* There is a stop reply to handle.  */
-	rs->waiting_for_stop_reply = 0;
+	  warning (_("Remote failure reply: %s"), buf);
+	  status->set_stopped (GDB_SIGNAL_0);
+	  break;
+	case 'F':		/* File-I/O request.  */
+	  /* GDB may access the inferior memory while handling the File-I/O
+	     request, but we don't want GDB accessing memory while waiting
+	     for a stop reply.  See the comments in putpkt_binary.  Set
+	     waiting_for_stop_reply to 0 temporarily.  */
+	  rs->waiting_for_stop_reply = 0;
+	  remote_fileio_request (this, buf, rs->ctrlc_pending_p);
+	  rs->ctrlc_pending_p = 0;
+	  /* GDB handled the File-I/O request, and the target is running
+	     again.  Keep waiting for events.  */
+	  rs->waiting_for_stop_reply = 1;
+	  break;
+	case 'N': case 'T': case 'S': case 'X': case 'W':
+	  {
+	    /* There is a stop reply to handle.  */
+	    rs->waiting_for_stop_reply = 0;
 
-	stop_reply
-	  = (struct stop_reply *) remote_notif_parse (this,
-						      &notif_client_stop,
-						      rs->buf.data ());
+	    stop_reply
+	      = (struct stop_reply *) remote_notif_parse (this,
+							  &notif_client_stop,
+							  rs->buf.data ());
 
-	event_ptid = process_stop_reply (stop_reply, status);
-	break;
-      }
-    case 'O':		/* Console output.  */
-      remote_console_output (buf + 1);
-      break;
-    case '\0':
-      if (rs->last_sent_signal != GDB_SIGNAL_0)
-	{
-	  /* Zero length reply means that we tried 'S' or 'C' and the
-	     remote system doesn't support it.  */
-	  target_terminal::ours_for_output ();
-	  printf_filtered
-	    ("Can't send signals to this remote system.  %s not sent.\n",
-	     gdb_signal_to_name (rs->last_sent_signal));
-	  rs->last_sent_signal = GDB_SIGNAL_0;
-	  target_terminal::inferior ();
-
-	  strcpy (buf, rs->last_sent_step ? "s" : "c");
-	  putpkt (buf);
+	    event_ptid = process_stop_reply (stop_reply, status);
+	    break;
+	  }
+	case 'O':		/* Console output.  */
+	  remote_console_output (buf + 1);
+	  break;
+	case '\0':
+	  if (rs->last_sent_signal != GDB_SIGNAL_0)
+	    {
+	      /* Zero length reply means that we tried 'S' or 'C' and the
+		 remote system doesn't support it.  */
+	      target_terminal::ours_for_output ();
+	      printf_filtered
+		("Can't send signals to this remote system.  %s not sent.\n",
+		 gdb_signal_to_name (rs->last_sent_signal));
+	      rs->last_sent_signal = GDB_SIGNAL_0;
+	      target_terminal::inferior ();
+
+	      strcpy (buf, rs->last_sent_step ? "s" : "c");
+	      putpkt (buf);
+	      break;
+	    }
+	  /* fallthrough */
+	default:
+	  warning (_("Invalid remote reply: %s"), buf);
 	  break;
 	}
-      /* fallthrough */
-    default:
-      warning (_("Invalid remote reply: %s"), buf);
-      break;
     }
 
   if (status->kind () == TARGET_WAITKIND_NO_RESUMED)
@@ -9596,10 +9584,6 @@ remote_target::putpkt_binary (const char *buf, int cnt)
 	       "and then try again."));
     }
 
-  /* We're sending out a new packet.  Make sure we don't look at a
-     stale cached response.  */
-  rs->cached_wait_status = 0;
-
   /* Copy the packet into buffer BUF2, encapsulating it
      and giving it a checksum.  */
 
@@ -9937,10 +9921,6 @@ remote_target::getpkt_or_notif_sane_1 (gdb::char_vector *buf,
   int timeout;
   int val = -1;
 
-  /* We're reading a new response.  Make sure we don't look at a
-     previously cached response.  */
-  rs->cached_wait_status = 0;
-
   strcpy (buf->data (), "timeout");
 
   if (forever)


