Bug 22597 - Regression: backwards compatibility with old GDBservers broken
Summary: Regression: backwards compatibility with old GDBservers broken
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: remote (show other bugs)
Version: HEAD
: P2 normal
Target Milestone: 8.1
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-12-12 21:35 UTC by Maciej W. Rozycki
Modified: 2020-03-02 15:08 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Maciej W. Rozycki 2017-12-12 21:35:50 UTC
At <https://sourceware.org/ml/gdb-patches/2017-12/msg00285.html>, I wrote:

On Tue, 12 Dec 2017, Maciej W. Rozycki wrote:

>  Another regression.  This makes `mips-mti-linux-gnu' GDB stop working 
> with older stubs.  I have discovered it in an attempt to regression-test 
> said GDB with a remote n64 target and `x86_64-pc-linux-gnu' host, and an 
> outstanding change which affects non-XML stubs.  Any attempt to continue 
> execution after the initial connection fails with:
> 
> [...]
> Process .../gdb/testsuite/outputs/gdb.base/advance/advance created; pid = 2670
> Listening on port 2346
> target remote [...]:2346
> Remote debugging using [...]:2346
> Reading symbols from .../lib64/ld.so.1...done.
> [Switching to Thread <main>]
> (gdb) continue
> Cannot execute this command without a live selected thread.
> (gdb) 
> 
> This is as from commit 5cd63fda035d.  Up to 5cd63fda035d^ instead I get:
> 
> [...]
> Process .../gdb/testsuite/outputs/gdb.base/advance/advance created; pid = 30044
> Listening on port 2346
> target remote [...]:2346
> Remote debugging using [...]:2346
> Reading symbols from .../lib64/ld.so.1...done.
> 0x000000fff79534f0 in __start () from .../lib64/ld.so.1
> (gdb) continue
> Continuing.
> warning: Could not load shared library symbols for linux-vdso.so.1.
> Do you need "set solib-search-path" or "set sysroot"?
> 
> Breakpoint 1, main () at .../gdb/testsuite/gdb.base/advance.c:41
> 41	  c = 5;
> (gdb) 
> 
> At the protocol level the difference starts here (bad):
> 
> Sending packet: $qfThreadInfo#bb...Ack
> Packet received: m1814
> Sending packet: $qsThreadInfo#c8...Ack
> Packet received: l
> [Switching to Thread <main>]
> Sending packet: $qSymbol::#5b...Ack
> Packet received: qSymbol:6e70746c5f76657273696f6e
> Sending packet: $qSymbol::6e70746c5f76657273696f6e#4d...Ack
> Packet received: OK
> Sending packet: $Hg1814#7d...Ack
> Packet received: OK
> Sending packet: $Hg0#df...Ack
> Packet received: E01
> (gdb) continue
> Cannot execute this command without a live selected thread.
> (gdb)
> 
> vs (good):
> 
> Sending packet: $qfThreadInfo#bb...Ack
> Packet received: m154d
> Sending packet: $qsThreadInfo#c8...Ack
> Packet received: l
> Sending packet: $mfff726c4f0,4#cb...Ack
> Packet received: 03e0c825
> Sending packet: $mfff726c4ec,4#fd...Ack
> Packet received: 00000000
> Sending packet: $mfff726c4f0,4#cb...Ack
> Packet received: 03e0c825
> 0x000000fff726c4f0 in __start () from .../lib64/ld.so.1
> Sending packet: $qSymbol::#5b...Ack
> Packet received: qSymbol:6e70746c5f76657273696f6e
> Sending packet: $qSymbol::6e70746c5f76657273696f6e#4d...Ack
> Packet received: OK
> (gdb) continue
> Continuing.
> Sending packet: $mfff726c4f0,4#cb...Ack
> Packet received: 03e0c825
> Sending packet: $Z0,120000d6c,4#36...Ack
> Packet received:
> Packet Z0 (software-breakpoint) is NOT supported
> [...]
> 
> The version of `gdbserver' causing this regression is as at commit 
> f8b73d13b7ca^, the last MIPS backend version with no XML support.  It's 
> likely that later versions hit this regression too, as the mode of failure 
> is clearly not XML-related.
Comment 1 Maciej W. Rozycki 2018-01-04 21:06:11 UTC
As investigated here:

<https://sourceware.org/ml/gdb-patches/2017-12/msg00299.html>

the problem comes from the use of a `Hg0' packet which isn't supported
by older stubs that support the general `Hg' form.  Given that `Hg1814'
has been previously issued there's no point in following up with `Hg0'.
Comment 2 Sourceware Commits 2018-01-11 00:25:15 UTC
The master branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=3cada74087687907311b52781354ff551e10a0ed

commit 3cada74087687907311b52781354ff551e10a0ed
Author: Pedro Alves <palves@redhat.com>
Date:   Thu Jan 11 00:23:04 2018 +0000

    Fix backwards compatibility with old GDBservers (PR remote/22597)
    
    At <https://sourceware.org/ml/gdb-patches/2017-12/msg00285.html>,
    Maciej reported that commit:
    
      commit 5cd63fda035d4ba949e6478406162c4673b3c9ef
      Date: Wed Oct 4 18:21:10 2017 +0100
      Subject: Fix "Remote 'g' packet reply is too long" problems with multiple inferiors
    
    made GDB stop working with older stubs.  Any attempt to continue
    execution after the initial connection fails with:
    
      [...]
      Process .../gdb/testsuite/outputs/gdb.base/advance/advance created; pid = 2670
      Listening on port 2346
      target remote [...]:2346
      Remote debugging using [...]:2346
      Reading symbols from .../lib64/ld.so.1...done.
      [Switching to Thread <main>]
      (gdb) continue
      Cannot execute this command without a live selected thread.
      (gdb)
    
    The problem is:
    
      (gdb) c
      Cannot execute this command without a live selected thread.
      (gdb) info threads
        Id   Target Id         Frame
        1    Thread 14917      0x00007f341cd98ed0 in _start () from /lib64/ld-linux-x86-64.so.2
    
      The current thread <Thread ID 2> has terminated.  See `help thread'.
    		      ^^^^^^^^^^^
      (gdb)
    
    Note, thread _2_.  There's really only one thread in the inferior
    (it's still at the entry point), but still GDB added a bogus second
    thread.
    
    The reason GDB started adding a second thread after 5cd63fda035d is
    this hunk:
    
    +                 if (event->ptid == null_ptid)
    +                   {
    +                     const char *thr = strstr (p1 + 1, ";thread:");
    +                     if (thr != NULL)
    +                       event->ptid = read_ptid (thr + strlen (";thread:"),
    +                                                NULL);
    +                     else
    +                       event->ptid = magic_null_ptid;
    +                   }
    
    Note the else branch that falls back to magic_null_ptid.  We reach
    that when we process the initial stop reply sent back in response to
    the the "?" (status) packet early in the connection setup:
    
     Sending packet: $?#3f...Ack
     Packet received: T0506:0000000000000000;07:40a510f4fd7f0000;10:d0fe1201577f0000;
    
    And note that that response does not include a ";thread:XXX" part.
    
    This stop reply is processed after listing threads with qfThreadInfo /
    qsThreadInfo :
    
     Sending packet: $qfThreadInfo#bb...Ack
     Packet received: m3915
     Sending packet: $qsThreadInfo#c8...Ack
     Packet received: l
    
    meaning, when we process that stop reply, we treat the event as coming
    from a thread with ptid == magic_null_ptid, which is not yet in the
    thread list, so we add it then:
    
      (top-gdb) p ptid
      $1 = {m_pid = 42000, m_lwp = -1, m_tid = 1}
      (top-gdb) bt
      #0  0x0000000000840a8c in add_thread_silent(ptid_t) (ptid=...) at src/gdb/thread.c:269
      #1  0x00000000007ad61d in remote_add_thread(ptid_t, int, int) (ptid=..., running=0, executing=0)
          at src/gdb/remote.c:1838
      #2  0x00000000007ad8de in remote_notice_new_inferior(ptid_t, int) (currthread=..., executing=0)
          at src/gdb/remote.c:1921
      #3  0x00000000007b758b in process_stop_reply(stop_reply*, target_waitstatus*) (stop_reply=0x1158860, status=0x7fffffffcc00)
          at src/gdb/remote.c:7217
      #4  0x00000000007b7a38 in remote_wait_as(ptid_t, target_waitstatus*, int) (ptid=..., status=0x7fffffffcc00, options=0)
          at src/gdb/remote.c:7380
      #5  0x00000000007b7cd1 in remote_wait(target_ops*, ptid_t, target_waitstatus*, int) (ops=0x102fac0 <remote_ops>, ptid=..., status=0x7fffffffcc00, options=0) at src/gdb/remote.c:7446
      #6  0x000000000081587b in delegate_wait(target_ops*, ptid_t, target_waitstatus*, int) (self=0x102fac0 <remote_ops>, arg1=..., arg2=0x7fffffffcc00, arg3=0) at src/gdb/target-delegates.c:138
      #7  0x0000000000827d77 in target_wait(ptid_t, target_waitstatus*, int) (ptid=..., status=0x7fffffffcc00, options=0)
          at src/gdb/target.c:2179
      #8  0x0000000000715fda in do_target_wait(ptid_t, target_waitstatus*, int) (ptid=..., status=0x7fffffffcc00, options=0)
          at src/gdb/infrun.c:3589
      #9  0x0000000000716351 in wait_for_inferior() () at src/gdb/infrun.c:3707
      #10 0x0000000000715435 in start_remote(int) (from_tty=1) at src/gdb/infrun.c:3212
    
    things go downhill from this.
    
    We don't see the problem with current master gdbserver, because that
    version always sends the ";thread:" part in the initial stop reply:
    
     Sending packet: $?#3f...Packet received: T0506:0000000000000000;07:a0d4ffffff7f0000;10:d05eddf7ff7f0000;thread:p3cea.3cea;core:3;
    
    Years ago I had added a "--disable-packet=" command line option to
    gdbserver which comes in handy for testing this, since the existing
    "--disable-packet=Tthread" precisely makes gdbserver not send that
    ";thread:" part in stop replies.  The testcase added by this commit
    emulates old gdbserver making use of that.
    
    I've compared a testrun at 5cd63fda035d^ (before regression) with
    'current master+patch', against old gdbserver at f8b73d13b7ca^.  I
    hacked out --once, and "monitor exit" to be able to test.  The results
    are a bit too unstable to tell accurately, but it looked like there
    were no regressions.  Maciej confirmed this worked for him as well.
    
    No regressions on master (against master gdbserver).
    
    gdb/ChangeLog:
    2018-01-11  Pedro Alves  <palves@redhat.com>
    
    	PR remote/22597
    	* remote.c (remote_parse_stop_reply): Default to the last-set
    	general thread instead of to 'magic_null_ptid'.
    
    gdb/testsuite/ChangeLog:
    2018-01-11  Pedro Alves  <palves@redhat.com>
    
    	PR remote/22597
    	* gdb.server/stop-reply-no-thread.c: New file.
    	* gdb.server/stop-reply-no-thread.exp: New file.
Comment 3 Sourceware Commits 2018-01-11 00:31:22 UTC
The gdb-8.1-branch branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=936a312c04f9f1dd856571bf7573b5a8f51ed895

commit 936a312c04f9f1dd856571bf7573b5a8f51ed895
Author: Pedro Alves <palves@redhat.com>
Date:   Thu Jan 11 00:24:59 2018 +0000

    Fix backwards compatibility with old GDBservers (PR remote/22597)
    
    At <https://sourceware.org/ml/gdb-patches/2017-12/msg00285.html>,
    Maciej reported that commit:
    
      commit 5cd63fda035d4ba949e6478406162c4673b3c9ef
      Date: Wed Oct 4 18:21:10 2017 +0100
      Subject: Fix "Remote 'g' packet reply is too long" problems with multiple inferiors
    
    made GDB stop working with older stubs.  Any attempt to continue
    execution after the initial connection fails with:
    
      [...]
      Process .../gdb/testsuite/outputs/gdb.base/advance/advance created; pid = 2670
      Listening on port 2346
      target remote [...]:2346
      Remote debugging using [...]:2346
      Reading symbols from .../lib64/ld.so.1...done.
      [Switching to Thread <main>]
      (gdb) continue
      Cannot execute this command without a live selected thread.
      (gdb)
    
    The problem is:
    
      (gdb) c
      Cannot execute this command without a live selected thread.
      (gdb) info threads
        Id   Target Id         Frame
        1    Thread 14917      0x00007f341cd98ed0 in _start () from /lib64/ld-linux-x86-64.so.2
    
      The current thread <Thread ID 2> has terminated.  See `help thread'.
    		      ^^^^^^^^^^^
      (gdb)
    
    Note, thread _2_.  There's really only one thread in the inferior
    (it's still at the entry point), but still GDB added a bogus second
    thread.
    
    The reason GDB started adding a second thread after 5cd63fda035d is
    this hunk:
    
    +                 if (event->ptid == null_ptid)
    +                   {
    +                     const char *thr = strstr (p1 + 1, ";thread:");
    +                     if (thr != NULL)
    +                       event->ptid = read_ptid (thr + strlen (";thread:"),
    +                                                NULL);
    +                     else
    +                       event->ptid = magic_null_ptid;
    +                   }
    
    Note the else branch that falls back to magic_null_ptid.  We reach
    that when we process the initial stop reply sent back in response to
    the the "?" (status) packet early in the connection setup:
    
     Sending packet: $?#3f...Ack
     Packet received: T0506:0000000000000000;07:40a510f4fd7f0000;10:d0fe1201577f0000;
    
    And note that that response does not include a ";thread:XXX" part.
    
    This stop reply is processed after listing threads with qfThreadInfo /
    qsThreadInfo :
    
     Sending packet: $qfThreadInfo#bb...Ack
     Packet received: m3915
     Sending packet: $qsThreadInfo#c8...Ack
     Packet received: l
    
    meaning, when we process that stop reply, we treat the event as coming
    from a thread with ptid == magic_null_ptid, which is not yet in the
    thread list, so we add it then:
    
      (top-gdb) p ptid
      $1 = {m_pid = 42000, m_lwp = -1, m_tid = 1}
      (top-gdb) bt
      #0  0x0000000000840a8c in add_thread_silent(ptid_t) (ptid=...) at src/gdb/thread.c:269
      #1  0x00000000007ad61d in remote_add_thread(ptid_t, int, int) (ptid=..., running=0, executing=0)
          at src/gdb/remote.c:1838
      #2  0x00000000007ad8de in remote_notice_new_inferior(ptid_t, int) (currthread=..., executing=0)
          at src/gdb/remote.c:1921
      #3  0x00000000007b758b in process_stop_reply(stop_reply*, target_waitstatus*) (stop_reply=0x1158860, status=0x7fffffffcc00)
          at src/gdb/remote.c:7217
      #4  0x00000000007b7a38 in remote_wait_as(ptid_t, target_waitstatus*, int) (ptid=..., status=0x7fffffffcc00, options=0)
          at src/gdb/remote.c:7380
      #5  0x00000000007b7cd1 in remote_wait(target_ops*, ptid_t, target_waitstatus*, int) (ops=0x102fac0 <remote_ops>, ptid=..., status=0x7fffffffcc00, options=0) at src/gdb/remote.c:7446
      #6  0x000000000081587b in delegate_wait(target_ops*, ptid_t, target_waitstatus*, int) (self=0x102fac0 <remote_ops>, arg1=..., arg2=0x7fffffffcc00, arg3=0) at src/gdb/target-delegates.c:138
      #7  0x0000000000827d77 in target_wait(ptid_t, target_waitstatus*, int) (ptid=..., status=0x7fffffffcc00, options=0)
          at src/gdb/target.c:2179
      #8  0x0000000000715fda in do_target_wait(ptid_t, target_waitstatus*, int) (ptid=..., status=0x7fffffffcc00, options=0)
          at src/gdb/infrun.c:3589
      #9  0x0000000000716351 in wait_for_inferior() () at src/gdb/infrun.c:3707
      #10 0x0000000000715435 in start_remote(int) (from_tty=1) at src/gdb/infrun.c:3212
    
    things go downhill from this.
    
    We don't see the problem with current master gdbserver, because that
    version always sends the ";thread:" part in the initial stop reply:
    
     Sending packet: $?#3f...Packet received: T0506:0000000000000000;07:a0d4ffffff7f0000;10:d05eddf7ff7f0000;thread:p3cea.3cea;core:3;
    
    Years ago I had added a "--disable-packet=" command line option to
    gdbserver which comes in handy for testing this, since the existing
    "--disable-packet=Tthread" precisely makes gdbserver not send that
    ";thread:" part in stop replies.  The testcase added by this commit
    emulates old gdbserver making use of that.
    
    I've compared a testrun at 5cd63fda035d^ (before regression) with
    'current master+patch', against old gdbserver at f8b73d13b7ca^.  I
    hacked out --once, and "monitor exit" to be able to test.  The results
    are a bit too unstable to tell accurately, but it looked like there
    were no regressions.  Maciej confirmed this worked for him as well.
    
    No regressions on master (against master gdbserver).
    
    gdb/ChangeLog:
    2018-01-11  Pedro Alves  <palves@redhat.com>
    
    	PR remote/22597
    	* remote.c (remote_parse_stop_reply): Default to the last-set
    	general thread instead of to 'magic_null_ptid'.
    
    gdb/testsuite/ChangeLog:
    2018-01-11  Pedro Alves  <palves@redhat.com>
    
    	PR remote/22597
    	* gdb.server/stop-reply-no-thread.c: New file.
    	* gdb.server/stop-reply-no-thread.exp: New file.
Comment 4 Pedro Alves 2018-01-11 00:37:49 UTC
Fix merged.
Comment 5 Sourceware Commits 2020-03-02 15:08:31 UTC
The master branch has been updated by Andrew Burgess <aburgess@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=24ed6739b699f329c2c45aedee5f8c7d2f54e493

commit 24ed6739b699f329c2c45aedee5f8c7d2f54e493
Author: Andrew Burgess <andrew.burgess@embecosm.com>
Date:   Thu Jan 30 14:35:40 2020 +0000

    gdb/remote: Restore support for 'S' stop reply packet
    
    With this commit:
    
      commit 5b6d1e4fa4fc6827c7b3f0e99ff120dfa14d65d2
      Date:   Fri Jan 10 20:06:08 2020 +0000
    
          Multi-target support
    
    There was a regression in GDB's support for older aspects of the
    remote protocol.  Specifically, when a target sends the 'S' stop reply
    packet (which doesn't include a thread-id) then GDB has to figure out
    which thread actually stopped.
    
    Before the above commit GDB figured this out by using inferior_ptid in
    process_stop_reply, which contained the ptid of the current
    process/thread.  This would be fine for single threaded
    targets (which is the only place using an S packet makes sense), but
    in the general case, relying on inferior_ptid for processing a stop is
    wrong - there's no reason to believe that what was GDB's current
    thread will be the same thread that just stopped in the target.
    
    With the above commit the inferior_ptid now has the value null_ptid
    inside process_stop_reply, this can be seen in do_target_wait, where
    we call switch_to_inferior_no_thread before calling do_target_wait_1.
    
    The problem this causes can be seen in the new test that runs
    gdbserver using the flag --disable-packet=T, and causes GDB to throw
    this assertion:
    
      inferior.c:279: internal-error: inferior* find_inferior_pid(process_stratum_target*, int): Assertion `pid != 0' failed.
    
    A similar problem was fixed in this commit:
    
      commit 3cada74087687907311b52781354ff551e10a0ed
      Date:   Thu Jan 11 00:23:04 2018 +0000
    
          Fix backwards compatibility with old GDBservers (PR remote/22597)
    
    However, this commit deals with the case where the T packet doesn't
    include a thread-id, not the S packet case.  This commit solves the
    problem providing a thread-id at the GDB side if the remote target
    doesn't provide one.  The thread-id provided comes from
    remote_state::general_thread, however, though this does work, I don't
    think it is the ideal solution.
    
    The remote_state tracks two threads, the continue_thread and the
    general_thread, these are updated when GDB asks the remote target to
    switch threads.  The general_thread is set before performing things
    like register or memory accesses, and the continue_thread is set
    before things like continue or step commands.  Further, the
    general_thread is updated after a target stops to reference the thread
    that stopped.
    
    The first thing to note from the above description is that we have a
    cycle of dependency, when a T packet arrives without a thread-id we
    fill in the thread-id from the general_thread data.  The thread-id
    from the stop event is then used to set the general_thread.  This in
    itself feels a little weird.
    
    The second question is why use the general_thread at all? You'd think
    given how they are originally set that the continue thread would be a
    better choice.  The problem with this is that the continue_thread, if
    the user just does "continue", will be set to the minus_one_ptid, in
    the remote protocol this means all threads.  When the stop arrives
    with no thread-id and we use continue_thread we end up with a very
    similar assertion to before because we now end up trying to lookup a
    thread using the minus_one_ptid.  By contrast, once GDB has connected
    to a remote target the general_thread will be set to a valid
    thread-id, after which, if the target is single threaded, and stop
    events arrive without a thread-id, everything works fine.
    
    There is one slight weirdness with the above behaviour though.  When
    GDB first connects to the remote target inferior_ptid is null_ptid,
    however, upon connecting we query the remote for its threads.  As the
    thread information arrives GDB adds the threads to its internal
    database, and this process involves setting inferior_ptid to the id of
    each new thread in turn.  Once we know about all the threads we wait
    for a stop event from the remote target to indicate that GDB is now in
    control of the target.
    
    The problem is that after adding the new threads we don't reset
    inferior_ptid, and the code path we use to wait for a stop event from
    the target also doesn't reset inferior_ptid, so it turns out that
    during the initial connection inferior_ptid is not null_ptid.  This is
    lucky, because during the initial connection the general_thread
    variable _is_ set to null_ptid.
    
    So, during the initial connection, if the first stop event is missing
    a thread-id then we "provide" a thead-id from general_thread.  This
    turns out to be null_ptid meaning no thread-id is known, and then
    during process_stop_reply we fill in the missing thread-id using
    inferior_ptid.
    
    This was all discussed on the mailing list here:
    
      https://sourceware.org/ml/gdb-patches/2020-02/msg01011.html
    
    My proposal for a fix then is:
    
     1. Move the call to switch_to_inferior_no_thread into
     do_target_wait_1, this means that in all cases where we are waiting
     for an inferior the inferior_ptid will be set to null_ptid.  This is
     good as no wait code should rely on inferior_ptid.
    
     2. Remove the use of general_thread from the 'T' packet processing.
     The general_thread read here was only ever correct by chance, and we
     shouldn't be using it this way.
    
     3. Remove use of inferior_ptid from process_stop_event as this is
     wrong, and will always be null_ptid now anyway.
    
     4. When a stop_event has null_ptid due to a lack of thread-id (either
     from a T packet or an S packet) then pick the first non exited thread
     in the target and use that.  This will be fine for single threaded
     targets.  A multi-thread or multi-inferior aware remote target
     should be using T packets with a thread-id, so we give a warning if
     the target is multi-threaded, and we are still missing a thread-id.
    
     5. Extend the existing test that covered the T packet with missing
     thread-id to also cover the S packet.
    
    gdb/ChangeLog:
    
    	* remote.c (remote_target::remote_parse_stop_reply): Don't use the
    	general_thread if the stop reply is missing a thread-id.
    	(remote_target::process_stop_reply): Use the first non-exited
    	thread if the target didn't pass a thread-id.
    	* infrun.c (do_target_wait): Move call to
    	switch_to_inferior_no_thread to ....
    	(do_target_wait_1): ... here.
    
    gdb/testsuite/ChangeLog:
    
    	* gdb.server/stop-reply-no-thread.exp: Add test where T packet is
    	disabled.