Bug 28405 - arm-none-eabi: internal-error: ptid_t remote_target::select_thread_for_ambiguous_stop_reply(const target_waitstatus*): Assertion `first_resumed_thread != nullptr' failed
Summary: arm-none-eabi: internal-error: ptid_t remote_target::select_thread_for_ambigu...
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb (show other bugs)
Version: 11.1
: P2 normal
Target Milestone: 11.2
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-10-01 09:45 UTC by John Whittington
Modified: 2021-12-23 13:18 UTC (History)
4 users (show)

See Also:
Host: Arch
Target: arm-none-eabi
Build: 11.1.90.20210930-git
Last reconfirmed: 2021-10-04 00:00:00


Attachments
core dump (2.53 MB, application/octet-stream)
2021-10-01 09:45 UTC, John Whittington
Details
core dump 2 (2.54 MB, application/octet-stream)
2021-10-01 09:45 UTC, John Whittington
Details
capture of gdb prompt (1.61 KB, text/plain)
2021-10-01 09:46 UTC, John Whittington
Details
Possible fix. (2.97 KB, patch)
2021-10-04 15:29 UTC, Andrew Burgess
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description John Whittington 2021-10-01 09:45:08 UTC
Created attachment 13691 [details]
core dump

The changes in https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=8f66807b98f7634c43149ea62e454ea8f877691d appear to have broken usage with the Black Magic Probe remote debugger.

I'm not familiar with the remote protocol and whether the device should change but the developer suggests not and I would assume that the gdb changes should be backwards compatible: https://github.com/blacksphere/blackmagic/issues/929

The problem occurs when trying to attach to a target:

Attaching to Remote target                                                      
remote.c:7979: internal-error: ptid_t remote_target::select_thread_for_ambiguous_stop_reply(const target_waitstatus*): Assertion `first_resumed_thread != nullptr' failed.

I've attached two core dumps and a capture of gdb.
Comment 1 John Whittington 2021-10-01 09:45:43 UTC
Created attachment 13692 [details]
core dump 2
Comment 2 John Whittington 2021-10-01 09:46:01 UTC
Created attachment 13693 [details]
capture of gdb prompt
Comment 3 Simon Marchi 2021-10-01 13:48:19 UTC
Is there a way to reproduce without an actual device, like with a pseudo software blackmagic probe hooked to a simulator or something?
Comment 4 John Whittington 2021-10-01 15:15:26 UTC
Not that I'm aware of...not ideal I know. It's probably possible to flash a STM32103 in qemu or something with the black magic firmware and remote to that? Not sure how one would then attempt attaching to a target then though.

I have checked out the source however so can assist in any suggested fix - I had hoped it might be simple for someone familiar with the codebase, when considering the changes made in that commit.

I can attempt some digging myself but some pointers would be helpful.
Comment 5 Simon Marchi 2021-10-01 16:56:44 UTC
At least with the RSP logs you gave, it should be possible to understand what's going on.  The advantage of being able to reproduce it locally is that whoever works on a fix can test the fix themselves.  But worst case, they'll send you a patch to test.
Comment 6 Andrew Burgess 2021-10-04 13:52:08 UTC
This can be reproduced with standard gdbserver on x86-64 using current HEAD:

1. Start some process with process-id PID

2. Start gdbserver like this:
     gdbserver --disable-packet=Tthread --multi :54321

3. Then in GDB:
     target extended-remote :54321
     attach PID

I'm not seeing the assert, but I do see the segfault.  I'm continuing to investigate.
Comment 7 Andrew Burgess 2021-10-04 14:10:23 UTC
I told a little lie in my last comment.  I was actually testing with:

     target extended-remote :54321
     set debug remote 1
     attach PID

The segfault is caused by a dereference of `first_resumed_thread` from within a debug print out statement.  If 'set debug remote 1' is not done then I do, indeed, hit the assert.
Comment 8 Andrew Burgess 2021-10-04 15:29:36 UTC
Created attachment 13699 [details]
Possible fix.

This patch seems like it might fix the issue.  This is still in testing here.  Assume all tests pass I'll post this to the mailing list.
Comment 9 John Whittington 2021-10-05 14:20:31 UTC
I can confirm that the patch fixes the issue with the Black Magic Probe too.
Comment 10 Andrew Burgess 2021-10-06 08:11:04 UTC
John, thanks for confirming that.  I posted this patch to the mailing list here:

  https://sourceware.org/pipermail/gdb-patches/2021-October/182394.html
Comment 11 cvs-commit@gcc.gnu.org 2021-12-23 12:19:07 UTC
The master branch has been updated by Andrew Burgess <aburgess@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b622494ee378fd0a490c934c509364b4c7735273

commit b622494ee378fd0a490c934c509364b4c7735273
Author: Andrew Burgess <andrew.burgess@embecosm.com>
Date:   Mon Oct 4 15:48:11 2021 +0100

    gdb/remote: handle attach when stop packet lacks thread-id
    
    Bug PR gdb/28405 reports a regression when using attach with an
    extended-remote target.  In this case the target is not including a
    thread-id in the stop packet it sends back after the attach.
    
    The regression was introduced with this commit:
    
      commit 8f66807b98f7634c43149ea62e454ea8f877691d
      Date:   Wed Jan 13 20:26:58 2021 -0500
    
          gdb: better handling of 'S' packets
    
    The problem is that when GDB processes the stop packet, it sees that
    there is no thread-id and so has to "guess" which thread the stop
    should apply to.
    
    In this case the target only has one thread, so really, there's no
    guessing needed, but GDB still runs through the same process, this
    shouldn't cause us any problems.
    
    However, after the above commit, GDB now expects itself to be more
    internally consistent, specifically, only a thread that GDB thinks is
    resumed, can be a candidate for having stopped.
    
    It turns out that, when GDB attaches to a process through an
    extended-remote target, the threads of the process being attached too,
    are not, initially, marked as resumed.
    
    And so, when GDB tries to figure out which thread the stop might apply
    too, it finds no threads in the processes marked resumed, and so an
    assert triggers.
    
    In extended_remote_target::attach we create a new thread with a call
    to add_thread_silent, rather than remote_target::remote_add_thread,
    the reason is that calling the latter will result in a call to
    'add_thread' rather than 'add_thread_silent'.  However,
    remote_target::remote_add_thread includes additional
    actions (i.e. calling remote_thread_info::set_resumed and set_running)
    which are missing from extended_remote_target::attach.  These missing
    calls are what would serve to mark the new thread as resumed.
    
    In this commit I propose that we add an extra parameter to
    remote_target::remote_add_thread.  This new parameter will force the
    new thread to be added with a call to add_thread_silent.  We can now
    call remote_add_thread from the ::attach method, the extra
    actions (listed above) will now be performed, and the thread will be
    left in the correct state.
    
    Additionally, in PR gdb/28405, a segfault is reported.  This segfault
    triggers when 'set debug remote 1' is used before trying to reproduce
    the original assertion failure.  The cause of this is in
    remote_target::select_thread_for_ambiguous_stop_reply, where we do
    this:
    
      remote_debug_printf ("first resumed thread is %s",
                           pid_to_str (first_resumed_thread->ptid).c_str ());
      remote_debug_printf ("is this guess ambiguous? = %d", ambiguous);
    
      gdb_assert (first_resumed_thread != nullptr);
    
    Notice that when debug printing is on we dereference
    first_resumed_thread before we assert that the pointer is not
    nullptr.  This is the cause of the segfault, and is resolved by moving
    the assert before the debug printing code.
    
    I've extended an existing test, ext-attach.exp, so that the original
    test is run multiple times; we run in the original mode, as normal,
    but also, we now run with different packets disabled in gdbserver.  In
    particular, disabling Tthread would trigger the assertion as it was
    reported in the original bug.  I also run the test in all-stop and
    non-stop modes now for extra coverage, we also run the tests with
    target-async enabled, and disabled.
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28405
Comment 12 cvs-commit@gcc.gnu.org 2021-12-23 13:17:46 UTC
The gdb-11-branch branch has been updated by Andrew Burgess <aburgess@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=6eccc2c811ad292ce3234d2a0cd71b2184ac40eb

commit 6eccc2c811ad292ce3234d2a0cd71b2184ac40eb
Author: Andrew Burgess <andrew.burgess@embecosm.com>
Date:   Mon Oct 4 15:48:11 2021 +0100

    gdb/remote: handle attach when stop packet lacks thread-id
    
    Bug PR gdb/28405 reports a regression when using attach with an
    extended-remote target.  In this case the target is not including a
    thread-id in the stop packet it sends back after the attach.
    
    The regression was introduced with this commit:
    
      commit 8f66807b98f7634c43149ea62e454ea8f877691d
      Date:   Wed Jan 13 20:26:58 2021 -0500
    
          gdb: better handling of 'S' packets
    
    The problem is that when GDB processes the stop packet, it sees that
    there is no thread-id and so has to "guess" which thread the stop
    should apply to.
    
    In this case the target only has one thread, so really, there's no
    guessing needed, but GDB still runs through the same process, this
    shouldn't cause us any problems.
    
    However, after the above commit, GDB now expects itself to be more
    internally consistent, specifically, only a thread that GDB thinks is
    resumed, can be a candidate for having stopped.
    
    It turns out that, when GDB attaches to a process through an
    extended-remote target, the threads of the process being attached too,
    are not, initially, marked as resumed.
    
    And so, when GDB tries to figure out which thread the stop might apply
    too, it finds no threads in the processes marked resumed, and so an
    assert triggers.
    
    In extended_remote_target::attach we create a new thread with a call
    to add_thread_silent, rather than remote_target::remote_add_thread,
    the reason is that calling the latter will result in a call to
    'add_thread' rather than 'add_thread_silent'.  However,
    remote_target::remote_add_thread includes additional
    actions (i.e. calling remote_thread_info::set_resumed and set_running)
    which are missing from extended_remote_target::attach.  These missing
    calls are what would serve to mark the new thread as resumed.
    
    In this commit I propose that we add an extra parameter to
    remote_target::remote_add_thread.  This new parameter will force the
    new thread to be added with a call to add_thread_silent.  We can now
    call remote_add_thread from the ::attach method, the extra
    actions (listed above) will now be performed, and the thread will be
    left in the correct state.
    
    Additionally, in PR gdb/28405, a segfault is reported.  This segfault
    triggers when 'set debug remote 1' is used before trying to reproduce
    the original assertion failure.  The cause of this is in
    remote_target::select_thread_for_ambiguous_stop_reply, where we do
    this:
    
      remote_debug_printf ("first resumed thread is %s",
                           pid_to_str (first_resumed_thread->ptid).c_str ());
      remote_debug_printf ("is this guess ambiguous? = %d", ambiguous);
    
      gdb_assert (first_resumed_thread != nullptr);
    
    Notice that when debug printing is on we dereference
    first_resumed_thread before we assert that the pointer is not
    nullptr.  This is the cause of the segfault, and is resolved by moving
    the assert before the debug printing code.
    
    I've extended an existing test, ext-attach.exp, so that the original
    test is run multiple times; we run in the original mode, as normal,
    but also, we now run with different packets disabled in gdbserver.  In
    particular, disabling Tthread would trigger the assertion as it was
    reported in the original bug.  I also run the test in all-stop and
    non-stop modes now for extra coverage, we also run the tests with
    target-async enabled, and disabled.
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28405
    
    This is a cherry pick of commit b622494ee378fd0a490 with a minor edit
    in gdb.server/ext-attach.exp to disable some tests that fail due to
    unrelated bugs.  Those unrelated bugs have been fixed in the master
    branch.
    
    gdb/ChangeLog:
    
            PR gdb/28405
            * remote.c (remote_target::remote_add_thread): Add new silent_p
            argument, use as needed.
            (remote_target::remote_notice_new_inferior): Pass additional
            argument to remote_add_thread.
            (remote_target::remote_notice_new_inferior): Likewise.
            (extended_remote_target::attach): Call remote_add_thread instead
            of add_thred_silent directly.
            (remote_target::select_thread_for_ambiguous_stop_reply): Move
            assert earlier, before we use the thing we're asserting is not
            nullptr.
    
    gdb/testsuite/ChangeLog:
    
            PR gdb/28405
            * gdb.server/ext-attach.exp (run_test): New proc containing all of
            the old code for running the core of the test.  This proc is then
            called multiple times from the global scope.
Comment 13 Andrew Burgess 2021-12-23 13:18:26 UTC
This issue should now be resolved in both master, and gdb-11-branch.