Bug 26819 - RISC-V: internal-error: int finish_step_over(execution_control_state*): Assertion
Summary: RISC-V: internal-error: int finish_step_over(execution_control_state*): Asser...
Status: REOPENED
Alias: None
Product: gdb
Classification: Unclassified
Component: tdep (show other bugs)
Version: HEAD
: P2 normal
Target Milestone: ---
Assignee: Andrew Burgess
URL:
Keywords:
: 22925 (view as bug list)
Depends on:
Blocks:
 
Reported: 2020-10-30 16:11 UTC by Jan Matyas
Modified: 2022-11-29 19:38 UTC (History)
8 users (show)

See Also:
Host:
Target: riscv
Build:
Last reconfirmed: 2021-01-08 00:00:00


Attachments
All information needed to reproduce the issue (9.10 KB, application/gzip)
2020-10-30 16:11 UTC, Jan Matyas
Details
Logs from Spike, GDB and OpenOCD (3.78 KB, text/x-log)
2020-11-08 11:13 UTC, Jan Matyas
Details
20210116-101308-spike32_2-ProgramSwWatchpoint (4.12 KB, text/plain)
2021-01-16 11:03 UTC, Jan Matyas
Details
20210116-101128-spike32-CheckMisa (3.17 KB, text/x-log)
2021-01-16 11:04 UTC, Jan Matyas
Details
spike32-CheckMisa_with_debug (4.22 KB, text/plain)
2021-01-17 22:37 UTC, Jan Matyas
Details
spike32_2-ProgramSwWatchpoint_with_debug (5.97 KB, text/plain)
2021-01-17 22:38 UTC, Jan Matyas
Details
attachment-3656629-0.html (2.09 KB, text/html)
2021-02-08 02:00 UTC, Simon Marchi
Details
attachment-3668536-0.html (219 bytes, text/html)
2021-02-08 02:02 UTC, Simon Marchi
Details
Steps to reproduce (1.48 KB, text/plain)
2021-03-04 23:23 UTC, Jan Matyas
Details
gdb-26819-thread1-verbose.txt (7.14 KB, text/plain)
2021-03-04 23:24 UTC, Jan Matyas
Details
gdb-26819-thread2.txt (1.41 KB, text/plain)
2021-03-04 23:25 UTC, Jan Matyas
Details
gdb-26819-thread2-verbose.txt (9.88 KB, text/plain)
2021-03-04 23:25 UTC, Jan Matyas
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jan Matyas 2020-10-30 16:11:49 UTC
Created attachment 12935 [details]
All information needed to reproduce the issue

Dear GDB developers,

when debugging a remote RISC-V target, I have run into the following assertion:

../../binutils-gdb/gdb/infrun.c:5692: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.


Step-by-step instructions how to reproduce it are attached, and the scenario utilizes: 
- Spike RISC-V ISA simulator as the actual target
- OpenOCD as the gdb server

Background: The actual scenario has been encountered when running this automated RISC-V debugger test suite (https://github.com/riscv/riscv-tests/tree/master/debug) but has been isolated and instructions written up how to re-create the scenario manually via several simple steps (see the attached .txt file).

Attached is also information about the environment in which I encountered this problem.

If there are more useful details that I could provide, please do let me know.

Kind regards,
Jan
Comment 1 Andrew Burgess 2020-11-08 10:07:23 UTC
Jan,

Thank you for including such clear steps to reproduce, they were very helpful.

I tried following the steps but could not reproduce this issue, however, thanks to your great instructions I was able to see where I diverge from your behaviour.

When you start GDB, you report this:

 (gdb) info threads
   Id   Target Id                     Frame 
 * 1    Thread 1 (Name: Hart 0, RV64) 0x00000012123408b0 in stack_bottom ()
   2    Thread 2 (Name: Hart 1, RV64) 0x000000121234089c in stack_bottom ()
 (gdb) thread 2

However, when I start GDB, I see this:

 (gdb) info threads
   Id   Target Id                     Frame 
 * 1    Thread 1 (Name: Hart 0, RV64) 0x0000001212340840 in stack_bottom ()

I did try following the rest of your steps using thread 1, but the bug didn't reproduce.

The fact that I only see a single thread doesn't seem too surprising given that in my openocd startup I see this:

 Warn : `-rtos riscv` is deprecated and will be removed at the end of 2020! Please
 Warn : change your configuration to use `-rtos hwthread` instead. To do that, you
 Warn : will have to explicitly list every hart in the system as a separate target. See
 Warn : https://github.com/riscv/riscv-tests/blob/ec6537fc4a527ca88be2f045e01c460e640ab9c5/debug/targets/SiFive/HiFiveUnleashed.cfg#L11
 Warn : for an example.

And:

 Info : Examined RISC-V core; found 1 harts
 Info :  hart 0: XLEN=64, misa=0x8000000000141129

Could you confirm the exact git SHA you are using for spike and openocd please.  Or any other suggestions for what might be different between our environments?
Comment 2 Jan Matyas 2020-11-08 11:10:28 UTC
Hello Andrew,

thank you for taking the time to look at this.

I have re-run the scenario in question against the very latest spike and openocd, and can confirm the assertion still gets triggered.

These are the revisions of Spike and RISC-V OpenOCD I have used. Both should be the latest "trunk" code as of now:

riscv-openocd:
1ba1b8784 (HEAD -> riscv, origin/riscv, origin/HEAD) Fix build outside of a git repo. (#551)

riscv-isa-sim:
641d7d0 (HEAD -> master, origin/master, origin/HEAD) Merge pull request #588 from kito-cheng/master

Riscv-gcc and riscv-gdb I have taken from this fresh build by Embecosm:
https://buildbot.embecosm.com/job/riscv32-gcc-ubuntu1804/25/

The OpenOCD deprecation warning about "-rtos riscv" shows in my setup, too, and I believe it is of no consequence for this issue.

However, our results differ in the number of harts discovered by OpenOCD:

Your output:
 Info : Examined RISC-V core; found 1 harts
 Info :  hart 0: XLEN=64, misa=0x8000000000141129

My results:
 Info : Examined RISC-V core; found 2 harts
 Info :  hart 0: currently disabled
 Info :  hart 1: XLEN=64, misa=0x8000000000141129

Could it be that your spike is configured differently - with just a single hart?

For reference, I am attaching "20201108-102839-spike64_2-PrivChange.log" which shows logs from GDB, OpenOCD and Spike. Note that the exact GDB command sequence may be slightly different from the original scenario, but showing the same issue.

I'd be happy to provide further data on the issue, if needed.

Regards,
Jan
Comment 3 Jan Matyas 2020-11-08 11:13:11 UTC
Created attachment 12946 [details]
Logs from Spike, GDB and OpenOCD
Comment 4 Andrew Burgess 2020-11-09 09:51:52 UTC
Jan,

Thanks.  By using the command lines in the new logs I was able to reproduce this bug.  I can see what's going on and am working on a fix.
Comment 5 Jan Matyas 2021-01-06 06:37:03 UTC
Hi Andrew
Have you please had a chance to figure a fix for this assertion? Is there anything I can help with, e.g. re-running the scenario with a patched GDB?

Thank you. 

Wishing a good year 2021,
Jan
Comment 6 Andrew Burgess 2021-01-06 09:35:53 UTC
There's a patch posted here:
  https://sourceware.org/pipermail/gdb-patches/2020-December/174277.html

which should resolve this issue.  Do feel free to apply this to current HEAD of GDB and test, your feedback would be most welcome.
Comment 7 Jan Matyas 2021-01-08 11:03:55 UTC
Andrew,
Thank you for pointing me to the patch. I will be able to re-test it within approx. 4 days and will be happy to share the results here.
Comment 8 Simon Marchi 2021-01-08 16:09:30 UTC
(In reply to Jan Matyas from comment #7)
> Andrew,
> Thank you for pointing me to the patch. I will be able to re-test it within
> approx. 4 days and will be happy to share the results here.

I posted an updated version of Andrew's patch here:

https://sourceware.org/pipermail/gdb-patches/2021-January/174786.html

Can you please use this for your testing?
Comment 9 Sourceware Commits 2021-01-14 01:27:45 UTC
The master branch has been updated by Simon Marchi <simark@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=8f66807b98f7634c43149ea62e454ea8f877691d

commit 8f66807b98f7634c43149ea62e454ea8f877691d
Author: Andrew Burgess <andrew.burgess@embecosm.com>
Date:   Wed Jan 13 20:26:58 2021 -0500

    gdb: better handling of 'S' packets
    
    This commit builds on work started in the following two commits:
    
      commit 24ed6739b699f329c2c45aedee5f8c7d2f54e493
      Date:   Thu Jan 30 14:35:40 2020 +0000
    
          gdb/remote: Restore support for 'S' stop reply packet
    
      commit cada5fc921e39a1945c422eea055c8b326d8d353
      Date:   Wed Mar 11 12:30:13 2020 +0000
    
          gdb: Handle W and X remote packets without giving a warning
    
    This is related to how GDB handles remote targets that send back 'S'
    packets.
    
    In the first of the above commits we fixed GDB's ability to handle a
    single process, single threaded target that sends back 'S' packets.
    Although the 'T' packet would always be preferred to 'S' these days,
    there's nothing really wrong with 'S' for this situation.
    
    The second commit above fixed an oversight in the first commit, a
    single-process, multi-threaded target can send back a process wide
    event, for example the process exited event 'W' without including a
    process-id, this also is fine as there is no ambiguity in this case.
    
    In PR gdb/26819 we run into yet another problem with the above
    commits.  In this case we have a single process with two threads, GDB
    hits a breakpoint in thread 2 and then performs a stepi:
    
      (gdb) b main
      Breakpoint 1 at 0x1212340830: file infinite_loop.S, line 10.
      (gdb) c
      Continuing.
    
      Thread 2 hit Breakpoint 1, main () at infinite_loop.S:10
      10    in infinite_loop.S
      (gdb) set debug remote 1
      (gdb) stepi
      Sending packet: $vCont;s:2#24...Packet received: S05
      ../binutils-gdb/gdb/infrun.c:5807: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed.
    
    What happens in this case is that on the RISC-V target displaced
    stepping is not supported, so when the stepi is issued GDB steps just
    thread 2.  As only a single thread was set running the target decides
    that is can get away with sending back an 'S' packet without a
    thread-id.  GDB then associates the stop with thread 1 (the first
    non-exited thread), but as thread 1 was not previously set executing
    the assertion seen above triggers.
    
    As an aside I am surprised that the target sends pack 'S' in this
    situation.  The target is happy to send back 'T' (including thread-id)
    when multiple threads are set running, so (to me) it would seem easier
    to just always use the 'T' packet when multiple threads are in use.
    However, the target only uses 'T' when multiple threads are actually
    executing, otherwise an 'S' packet it used.
    
    Still, when looking at the above situation we can see that GDB should
    be able to understand which thread the 'S' reply is referring too.
    
    The problem is that is that in commit 24ed6739b699 (above) when a stop
    reply comes in with no thread-id we look for the first non-exited
    thread and select that as the thread the stop applies too.
    
    What we should really do is select the first non-exited, resumed thread,
    and associate the stop event with this thread.  In the above example
    both thread 1 and 2 are non-exited, but only thread 2 is resumed, so
    this is what we should use.
    
    There's a test for this issue included which works with stock
    gdbserver by disabling use of the 'T' packet, and enabling
    'scheduler-locking' within GDB so only one thread is set running.
    
    gdb/ChangeLog:
    
            PR gdb/26819
            * remote.c
            (remote_target::select_thread_for_ambiguous_stop_reply): New
            member function.
            (remote_target::process_stop_reply): Call
            select_thread_for_ambiguous_stop_reply.
    
    gdb/testsuite/ChangeLog:
    
            PR gdb/26819
            * gdb.server/stop-reply-no-thread-multi.c: New file.
            * gdb.server/stop-reply-no-thread-multi.exp: New file.
    
    Change-Id: I9b49d76c2a99063dcc76203fa0f5270a72825d15
Comment 10 Simon Marchi 2021-01-14 01:29:43 UTC
Andrew, can we consider this bug fixed, or did the commit only address part of this issue?
Comment 11 Andrew Burgess 2021-01-14 09:11:32 UTC
Yes, this issue should be fixed now.  

Jan, you should be able to retest now using current HEAD of master.  If you find the issue is still present then feel free to reopen this bug.
Comment 12 Jan Matyas 2021-01-14 09:41:21 UTC
Hi Andrew and all, 

Thank you for all your work towards fixing this, greatly appreciated. 

I haven't yet managed to re-test the original scenarios. But it is on my TODO list and I will be happy to inform you about the outcome here, once I find a moment to do so.

Best regards,
Jan
Comment 13 Jan Matyas 2021-01-16 11:00:46 UTC
Hello,

I have finally managed to re-test the updated code. I am sorry to report back that the issue appears not yet fully resolved. Re-running the GDB test scenarios in which this issue was originally encountered (that is: https://github.com/riscv/riscv-tests/tree/master/debug) now triggers a different assert. 


(1)
In many of the above test scenarios, it is the following assert that gets triggered immediately after connecting GDB to the remote server, i.e. OpenOCD. 
OpenOCD is at that point already connected to the underlying target, which is the Spike RISC-V ISA simulator:

(gdb) target extended-remote localhost:44553
Remote debugging using localhost:44553
warning: No executable has been specified and target does not support
determining executable automatically.  Try using the "file" command.
../../../binutils-gdb/gdb/remote.c:7820: internal-error: ptid_t remote_target::select_thread_for_ambiguous_stop_reply(const target_waitstatus*): Assertion `first_resumed_thread != nullptr' failed.

A full log from one such test run is attached - 20210116-101128-spike32-CheckMisa.log. It shows the steps to reproduce the issue - most importantly the cmdline arguments to start Spike and OpenOCD.


(2)
There was one test run in which the original assert was encountered:

../../../binutils-gdb/gdb/infrun.c:5614: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed.

It is recorded in file: 20210116-101308-spike32_2-ProgramSwWatchpoint.log



The revisions of Spike ISA sim, OpenOCD and gdb-binutils that I used were in all cases the latest HEADs as of now:

- binutils-gdb: 68b007788a (Fri Jan 15 18:07:07 2021 -0800)
- Spike: 35d50bc (Thu Jan 14 09:43:04 2021 -0800)
- OpenOCD: ed37337b0


Please let me know if any further information is needed for reproduction or if you'd like me to re-test any other scenario.

Best regards,
Jan
Comment 14 Jan Matyas 2021-01-16 11:03:18 UTC
Created attachment 13122 [details]
20210116-101308-spike32_2-ProgramSwWatchpoint
Comment 15 Jan Matyas 2021-01-16 11:04:11 UTC
Created attachment 13123 [details]
20210116-101128-spike32-CheckMisa
Comment 16 Simon Marchi 2021-01-17 03:36:37 UTC
Ok, it hits an assertion added in the fix.

Having the logs you attached, but with "set debug remote 1" enabled _before_ you connect, would be useful.
Comment 17 Jan Matyas 2021-01-17 22:37:37 UTC
Created attachment 13127 [details]
spike32-CheckMisa_with_debug
Comment 18 Jan Matyas 2021-01-17 22:38:13 UTC
Created attachment 13128 [details]
spike32_2-ProgramSwWatchpoint_with_debug
Comment 19 Jan Matyas 2021-01-17 22:39:53 UTC
I have attached the logs with "set debug remote 1".

Regards, 
Jan
Comment 20 Simon Marchi 2021-01-18 05:15:24 UTC
I tried connecting to simavr (since I've used that in the past and it's really easy) and hit the same assert:

I think the reason is that the target reports no threads at all.  So the sole implicit thread is added by "add_current_inferior_and_thread" path in "remote_target::start_remote".

With GDBserver, even when disabling fetching threads with qXfer:threads:read, GDB still fetches them with qfThreadInfo.  So the thread is added by remote_notice_new_inferior called by update_thread_list.  In there, the thread's remote resume state is set to "resumed", and them becomes "not resumed" when we process the stop reply.

When the remote stub doesn't return threads with qxfer/qfThreadInfo/qL, GDB adds one using add_current_inferior_and_thread, because we need one thread object in GDB after all.  In that path, the thread's remote resume state is left to the initial "not resumed" value.  So it is not considered when we look for a thread for the ambiguous stop reply.

The patch below should fix it.

Note that we can extend the gdb.server/stop-reply-no-thread.exp test case to cover this, by disabling some more gdbserver features:

../gdbserver/gdbserver  --disable-packet=T --disable-packet=qfThreadInfo --once :1234 /bin/ls
../gdbserver/gdbserver  --disable-packet=threads --once :1234 /bin/ls

With both of these, this hits the assert:

./gdb -nx --data-directory=data-directory /bin/ls -ex "set remote multiprocess-feature-packet off" -ex "set remote threads-packet off" -ex "tar rem :1234"


From 8576dcab919ba839c97a2cd391ad1f4d6e6a6e2f Mon Sep 17 00:00:00 2001
From: Simon Marchi <simon.marchi@polymtl.ca>
Date: Sun, 17 Jan 2021 21:20:16 -0500
Subject: [PATCH] Fix for 26819

Change-Id: I32f1bd02256db1cdd5ca7c882bd7fae43f1d4000
---
 gdb/remote.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index 28391fe957b9..1e1eae3956e3 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -741,7 +741,7 @@ class remote_target : public process_stratum_target
   int remote_resume_with_vcont (ptid_t ptid, int step,
 				gdb_signal siggnal);
 
-  void add_current_inferior_and_thread (char *wait_status);
+  thread_info *add_current_inferior_and_thread (char *wait_status);
 
   ptid_t wait_ns (ptid_t ptid, struct target_waitstatus *status,
 		  target_wait_flags options);
@@ -4412,7 +4412,7 @@ remote_target::get_current_thread (char *wait_status)
    qC query, we infer the current thread from that stop reply, passed
    in in WAIT_STATUS, which may be NULL.  */
 
-void
+thread_info *
 remote_target::add_current_inferior_and_thread (char *wait_status)
 {
   struct remote_state *rs = get_remote_state ();
@@ -4446,6 +4446,8 @@ remote_target::add_current_inferior_and_thread (char *wait_status)
      yet.  */
   thread_info *tp = add_thread_silent (this, curr_ptid);
   switch_to_thread_no_regs (tp);
+
+  return tp;
 }
 
 /* Print info about a thread that was found already stopped on
@@ -4800,7 +4802,8 @@ remote_target::start_remote (int from_tty, int extended_p)
 	  /* Target has no concept of threads at all.  GDB treats
 	     non-threaded target as single-threaded; add a main
 	     thread.  */
-	  add_current_inferior_and_thread (wait_status);
+	  thread_info *tp = add_current_inferior_and_thread (wait_status);
+	  get_remote_thread_info (tp)->set_resumed ();
 	}
       else
 	{
-- 
2.29.2
Comment 21 Jan Matyas 2021-01-18 11:01:18 UTC
Without further exploration of the patch, I have used it and rebuilt GDB. Then I have re-run the RISC-V debug test suite (mentioned earlier).

The patch works fine - I am no longer getting any assertions.

Thank you, 
Jan
Comment 22 Sebastian Huber 2021-01-18 15:23:19 UTC
(In reply to Simon Marchi from comment #20)
[...]
> From 8576dcab919ba839c97a2cd391ad1f4d6e6a6e2f Mon Sep 17 00:00:00 2001
> From: Simon Marchi <simon.marchi@polymtl.ca>
> Date: Sun, 17 Jan 2021 21:20:16 -0500
> Subject: [PATCH] Fix for 26819
> 
> Change-Id: I32f1bd02256db1cdd5ca7c882bd7fae43f1d4000
[...]
This patch fixes also the issue while connecting to a nios2-gdb-server:

../../sourceware-mirror-binutils-gdb-edf0f28/gdb/remote.c:7820: internal-error: ptid_t remote_target::select_thread_for_ambiguous_stop_reply(const target_waitstatus*): Assertion `first_resumed_thread != nullptr' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Comment 23 Jan Matyas 2021-01-27 12:35:54 UTC
Hello,

I would like to ask about the status of the latest patch for this issue. As a newcomer, I am not certain what process is followed to have the patch merged. Is there any way I can help the process?

Thank you, regards
Jan
Comment 24 Simon Marchi 2021-01-27 16:00:23 UTC
(In reply to Jan Matyas from comment #23)
> Hello,
> 
> I would like to ask about the status of the latest patch for this issue. As
> a newcomer, I am not certain what process is followed to have the patch
> merged. Is there any way I can help the process?
> 
> Thank you, regards
> Jan

You can if you'd like.  The steps missing are:

- Write a good commit message that explains why things fail currently and why the patch fixes them (including a ChangeLog entry)
- Enhance the test (gdb.server/stop-reply-no-thread.exp I think) so it covers the two scenarios I mentioned ("--disable-packet=T --disable-packet=qfThreadInfo" and "--disable-packet=threads")
- Send the patch to the mailing list for review.
Comment 25 Jan Matyas 2021-02-07 21:20:24 UTC
(In reply to Simon Marchi from comment #24)
> [...]  The steps missing are:
> 
> - Write a good commit message that explains why things fail currently and
> why the patch fixes them (including a ChangeLog entry)
> - Enhance the test (gdb.server/stop-reply-no-thread.exp I think) so it
> covers the two scenarios I mentioned ("--disable-packet=T
> --disable-packet=qfThreadInfo" and "--disable-packet=threads")
> - Send the patch to the mailing list for review.

Hi Simon,
I have rebased your patch from Jan-17 and enhanced the test case as suggested. Please, check if that is all right.

As my knowledge of gdb code is low, I'd appreciate if you could write the commit message as the author of the patch (and submit it to the mailing list).

Thank you, regards,
Jan



From c99eb5f06a0c25129875ed5addcc2cf2e7a9ab39 Mon Sep 17 00:00:00 2001
From: Jan Matyas <jmatyas@codasip.com>
Date: Sun, 7 Feb 2021 22:04:50 +0100
Subject: [PATCH] Patch for gdb/26819

TODO: Proper commit message

- Rebased Simon Marchi's patch from Jan/17.
- Extended the gdb.server/stop-reply-no-thread.exp test case
  to cover the situation when the gdbserver does not report
  any threads.
---
 gdb/remote.c                                      | 9 ++++++---
 gdb/testsuite/gdb.server/stop-reply-no-thread.exp | 9 +++++++++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index 31c6e17a1c..4cda15470e 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -741,7 +741,7 @@ class remote_target : public process_stratum_target
   int remote_resume_with_vcont (ptid_t ptid, int step,
 				gdb_signal siggnal);
 
-  void add_current_inferior_and_thread (const char *wait_status);
+  thread_info *add_current_inferior_and_thread (const char *wait_status);
 
   ptid_t wait_ns (ptid_t ptid, struct target_waitstatus *status,
 		  target_wait_flags options);
@@ -4411,7 +4411,7 @@ remote_target::get_current_thread (const char *wait_status)
    qC query, we infer the current thread from that stop reply, passed
    in in WAIT_STATUS, which may be NULL.  */
 
-void
+thread_info *
 remote_target::add_current_inferior_and_thread (const char *wait_status)
 {
   struct remote_state *rs = get_remote_state ();
@@ -4445,6 +4445,8 @@ remote_target::add_current_inferior_and_thread (const char *wait_status)
      yet.  */
   thread_info *tp = add_thread_silent (this, curr_ptid);
   switch_to_thread_no_regs (tp);
+
+  return tp;
 }
 
 /* Print info about a thread that was found already stopped on
@@ -4800,7 +4802,8 @@ remote_target::start_remote (int from_tty, int extended_p)
 	  /* Target has no concept of threads at all.  GDB treats
 	     non-threaded target as single-threaded; add a main
 	     thread.  */
-	  add_current_inferior_and_thread (wait_status);
+	  thread_info *tp = add_current_inferior_and_thread (wait_status);
+	  get_remote_thread_info (tp)->set_resumed ();
 	}
       else
 	{
diff --git a/gdb/testsuite/gdb.server/stop-reply-no-thread.exp b/gdb/testsuite/gdb.server/stop-reply-no-thread.exp
index 68bf42ac1a..e698eaff6e 100644
--- a/gdb/testsuite/gdb.server/stop-reply-no-thread.exp
+++ b/gdb/testsuite/gdb.server/stop-reply-no-thread.exp
@@ -112,3 +112,12 @@ foreach_with_prefix to_disable { "" Tthread T } {
 	run_test $to_disable $t_nonstop
     }
 }
+
+# Regression for gdb/26819: Cover the case when GDBserver does not report
+# any threads. Non-stop Mode is off.
+#
+# Scenario 1: Disable 'T' and 'qfThreadInfo' packets
+# Scenario 2: Disable all threading packets
+foreach_with_prefix to_disable { "T,qfThreadInfo" "threads" } {
+    run_test $to_disable off
+}
-- 
2.20.1
Comment 26 Simon Marchi 2021-02-08 02:00:09 UTC
Created attachment 13208 [details]
attachment-3656629-0.html

Hi Jan,

(off list) 

I just went into paternity leave, so I will probably won't be able to take a look before a few weeks.  If you want to submit the patch for merging and get a review, please send it to the gdb-patches list as described here.

https://sourceware.org/gdb/wiki/ContributionChecklist

Thanks,

Simon 

On February 7, 2021 4:20:24 PM EST, jmatyas at codasip dot com <sourceware-bugzilla@sourceware.org> wrote:
>https://sourceware.org/bugzilla/show_bug.cgi?id=26819
>
>--- Comment #25 from Jan Matyas <jmatyas at codasip dot com> ---
>(In reply to Simon Marchi from comment #24)
>> [...]  The steps missing are:
>> 
>> - Write a good commit message that explains why things fail currently
>and
>> why the patch fixes them (including a ChangeLog entry)
>> - Enhance the test (gdb.server/stop-reply-no-thread.exp I think) so
>it
>> covers the two scenarios I mentioned ("--disable-packet=T
>> --disable-packet=qfThreadInfo" and "--disable-packet=threads")
>> - Send the patch to the mailing list for review.
>
>Hi Simon,
>I have rebased your patch from Jan-17 and enhanced the test case as
>suggested.
>Please, check if that is all right.
>
>As my knowledge of gdb code is low, I'd appreciate if you could write
>the
>commit message as the author of the patch (and submit it to the mailing
>list).
>
>Thank you, regards,
>Jan
>
>
>
>From c99eb5f06a0c25129875ed5addcc2cf2e7a9ab39 Mon Sep 17 00:00:00 2001
>From: Jan Matyas <jmatyas@codasip.com>
>Date: Sun, 7 Feb 2021 22:04:50 +0100
>Subject: [PATCH] Patch for gdb/26819
>
>TODO: Proper commit message
>
>- Rebased Simon Marchi's patch from Jan/17.
>- Extended the gdb.server/stop-reply-no-thread.exp test case
>  to cover the situation when the gdbserver does not report
>  any threads.
>---
> gdb/remote.c                                      | 9 ++++++---
> gdb/testsuite/gdb.server/stop-reply-no-thread.exp | 9 +++++++++
> 2 files changed, 15 insertions(+), 3 deletions(-)
>
>diff --git a/gdb/remote.c b/gdb/remote.c
>index 31c6e17a1c..4cda15470e 100644
>--- a/gdb/remote.c
>+++ b/gdb/remote.c
>@@ -741,7 +741,7 @@ class remote_target : public process_stratum_target
>   int remote_resume_with_vcont (ptid_t ptid, int step,
>                                gdb_signal siggnal);
>
>-  void add_current_inferior_and_thread (const char *wait_status);
>+  thread_info *add_current_inferior_and_thread (const char
>*wait_status);
>
>   ptid_t wait_ns (ptid_t ptid, struct target_waitstatus *status,
>                  target_wait_flags options);
>@@ -4411,7 +4411,7 @@ remote_target::get_current_thread (const char
>*wait_status)
>    qC query, we infer the current thread from that stop reply, passed
>    in in WAIT_STATUS, which may be NULL.  */
>
>-void
>+thread_info *
>remote_target::add_current_inferior_and_thread (const char
>*wait_status)
> {
>   struct remote_state *rs = get_remote_state ();
>@@ -4445,6 +4445,8 @@ remote_target::add_current_inferior_and_thread
>(const
>char *wait_status)
>      yet.  */
>   thread_info *tp = add_thread_silent (this, curr_ptid);
>   switch_to_thread_no_regs (tp);
>+
>+  return tp;
> }
>
> /* Print info about a thread that was found already stopped on
>@@ -4800,7 +4802,8 @@ remote_target::start_remote (int from_tty, int
>extended_p)
>          /* Target has no concept of threads at all.  GDB treats
>             non-threaded target as single-threaded; add a main
>             thread.  */
>-         add_current_inferior_and_thread (wait_status);
>+         thread_info *tp = add_current_inferior_and_thread
>(wait_status);
>+         get_remote_thread_info (tp)->set_resumed ();
>        }
>       else
>        {
>diff --git a/gdb/testsuite/gdb.server/stop-reply-no-thread.exp
>b/gdb/testsuite/gdb.server/stop-reply-no-thread.exp
>index 68bf42ac1a..e698eaff6e 100644
>--- a/gdb/testsuite/gdb.server/stop-reply-no-thread.exp
>+++ b/gdb/testsuite/gdb.server/stop-reply-no-thread.exp
>@@ -112,3 +112,12 @@ foreach_with_prefix to_disable { "" Tthread T } {
>        run_test $to_disable $t_nonstop
>     }
> }
>+
>+# Regression for gdb/26819: Cover the case when GDBserver does not
>report
>+# any threads. Non-stop Mode is off.
>+#
>+# Scenario 1: Disable 'T' and 'qfThreadInfo' packets
>+# Scenario 2: Disable all threading packets
>+foreach_with_prefix to_disable { "T,qfThreadInfo" "threads" } {
>+    run_test $to_disable off
>+}
>-- 
>2.20.1
>
>-- 
>You are receiving this mail because:
>You are on the CC list for the bug.
Comment 27 Simon Marchi 2021-02-08 02:02:29 UTC
Created attachment 13209 [details]
attachment-3668536-0.html

Oh well apparently if you reply to the email it goes to Bugzilla even though it looked like Jan's email address. That's what I get for trying to answer on my phone. Anyhow, the info about sending to the list is still valid :)
Comment 28 Jan Matyas 2021-02-24 09:17:44 UTC
Gentlemen, 

I have assembled, tested and submitted the patch for review per your suggestions:

https://sourceware.org/pipermail/gdb-patches/2021-February/176542.html

If you could help by looking at the patch, that would be very appreciated.

Thank you, regards
Jan
Comment 29 Jan Matyas 2021-02-25 07:41:37 UTC
Hi Simon,
Thank you for pointing out that git-send-email should be used, I missed that part in the ContributionChecklist.

I have re-submitted the patch that way:
https://sourceware.org/pipermail/gdb-patches/2021-February/176569.html

Now I realize I missed the version 2 marker (v2) in the subject this time. If that should be added, shall I re-submit it one more time?

Jan
Comment 30 Simon Marchi 2021-02-25 20:50:46 UTC
This should be fixed now, commit 64d38fdd9956 ("Fix initial thread state of non-threaded remote targets").  Please reopen or open a new bug if you stumble on something else.  Thanks for preparing and submitting the patch on the list.
Comment 31 Jan Matyas 2021-03-04 23:23:37 UTC
Created attachment 13284 [details]
Steps to reproduce
Comment 32 Jan Matyas 2021-03-04 23:24:48 UTC
Created attachment 13285 [details]
gdb-26819-thread1-verbose.txt
Comment 33 Jan Matyas 2021-03-04 23:25:10 UTC
Created attachment 13286 [details]
gdb-26819-thread2.txt
Comment 34 Jan Matyas 2021-03-04 23:25:52 UTC
Created attachment 13287 [details]
gdb-26819-thread2-verbose.txt
Comment 35 Simon Marchi 2021-03-04 23:27:52 UTC
Did you find another issue related to this that isn't fixed?
Comment 36 Jan Matyas 2021-03-04 23:46:33 UTC
Hello,

I am sorry to re-open this, but I have come across one more scenario where the initial assertion gets triggered:

../../gdb/infrun.c:5672: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed.

The assertion appears when software watchpoints are used on a riscv32 bare metal extended-remote target with two threads.

The target is represented by Spike RISC-V ISA simulator and the remote server is OpenOCD, i.e. the same as already discussed in this ticket.

The test scenario is a simple program in which a SW watchpoint on "counter == 5" is used:

int main()
{
    int sum = 0;
    int counter = 0;
    while (counter < 10) {
        counter = counter + 1;
        sum = sum + counter;
    }
    return counter;
}

Then "c" (continue) command is issued three times. The assertion is triggered after the third "c", which I guess is somehow related to the watchpoint expression going out of scope.

The assertion only appears on software watchpoints, and when the second thread is the selected one ("thread 2").

I am attaching the logs:
- gdb-26819-thread2.txt - brief log, showing the sequence of the commands
- gdb-26819-thread2-verbose.txt - verbose log with debug prints for "infrun" and "remote

When "thread 1" is selected, the assertion does not appear. However the log shows that in such case "thread 2" is not being single-stepped after the target gets resumed ("c"). Only thread 1 is stepped. Could this be another issue or is it expected behavior?

The log for this second case is in: gdb-26819-thread1-verbose.txt

The steps to reproduce (building OpenOCD, Spike, compiling the demo program ...) are also attached and should be hopefully easy to follow.

I have tried this SW watchpoint scenario on a native machine (x86_64) with gdbserver, where it would be easier to reproduce, but I couldn't get it to fail.

Please let me know if I can provide any further info.

Regards,
Jan
Comment 37 Jan Matyas 2021-03-04 23:49:09 UTC
Reopening, per the previous comment.
Comment 38 Andrew Burgess 2021-06-03 20:41:30 UTC
I took a look at this issue and I can reproduce the failure.

First, an interesting aside: I originally tried to reproduce this issue on a GDB built with --enable-targets=all, and couldn't reproduce the failure.  But, when I built a GDB with --target=riscv-elf I was able to reproduce the issue just fine.

The problem turns out to be that by default the all-targets GDB would default to osabi GNU/Linux, while the riscv-elf GDB does not include Linux support, so defaults to osabi 'none'.  If I use the all-targets GDB and explicitly do 'set osabi none' then I can reproduce the failure with all-targets GDB.

The difference is that GNU/Linux RISC-V doesn't support single stepping, so, when using that osabi we will make use of software single-step.  The bare-metal 'none' osabi assumes that single step support is available.

So, why is this test failing when using single stepping?

It is my belief that the problem here is a bug in the multi-core support of openocd.

My first clue is the following sequence of vCont and stop replies sent between GDB and openocd:

  Sending packet: $vCont?#49
  Packet received: vCont;c;C;s;S
  Sending packet: $vCont;c#a8
  Packet received: T05thread:2;
  Sending packet: $vCont;s:2#24
  Packet received: T05
  Sending packet: $vCont;s:2;c#c2
  Packet received: T05
  Sending packet: $vCont;s:1;c#c1
  Packet received: T05thread:2;

Notice that after the first 'vCont;c' we get back 'T05thread:2' clearly indicating which thread stopped.

Next GDB sends 'vCont;s:2', so steps only the single thread '2', now we get back 'T05'.  This is annoying (no thread-id), but the original patches for this issue addressed this, as only one thread was set running GDB correctly "guesses" the thread and carries on.

After that GDB sends 'vCont;s:2;c', now we're single stepping thread 2, but allowing thread 1 to continue.  Again the reply comes back 'T05'.  This time GDB guesses thread 1 as the stopped thread, and things start to go off the rails.

Notice however, that in the next packet GDB sends 'vCont;s:1;c', which is kind of the reverse, step thread 1, continue thread 2, but now we get a reply 'T05thread:2' which includes a thread-id.  Weird!

The summary of the above then is that sometimes openocd does not return a thread-id even when multiple threads are running.

I looked a little into openocd, specifically I looked at the function gdb_signal_reply in server/gdb_server.c.  This function is passed a 'struct target *target'.  Whether we send a thread-id back or not depends on whether the 'target->rtos' field is set or not.

Now, if I debug this function I see two different target pointers passed in at different times, one target represents "riscv.cpu0", this target has its rtos field set, the other target represents "riscv.cpu1", this target does not have its rtos field set.

Now, I don't know the openocd internals, but what seems to happen is that sometimes the target stops, and gdb_signal_reply is called with "riscv.cpu0" target, but because target->rtos is set the stop is reported against target->rtos->current_thread, which can be thread 2.

Then, sometimes, the target stops with "riscv.cpu1", in this case openocd just makes no attempt to add a thread-id as target->rtos is NULL.

So, where does this leave us?

The above explains why GDB starts getting confused, but doesn't fully explain why we eventually hit the assert.  I'm still looking into the details of that, but wanted to record what I knew so far.

GDB _could_ possibly be slightly smarter when it guesses, as in if we send 'vCont;s:2;c', then maybe we should guess that thread-2 is the likely thread to have stopped, as most of the time thread-2 single stepping will complete before thread-1 hits anything interesting.... most of the time.

But this really feels like working around a broken target.
Comment 39 Andrew Burgess 2021-06-03 21:19:02 UTC
At the point where we run into the assertion the vCont/stop-reply packets look 
like:

  ...
  Sending packet: $vCont;s:2;c#c2
  Packet received: T05
  Sending packet: $vCont;s:1;c#c1
  Packet received: T05thread:2;
  Sending packet: $vCont;s:2;c#c2
  Packet received: T05
  Sending packet: $vCont;s:1;c#c1
  Packet received: T05thread:2;
  Sending packet: $vCont;s:2;c#c2
  Packet received: T05
  Sending packet: $vCont;s:1#23
  Packet received: T05thread:2

Notice all of the 'vCont;s:2;c' packets get a 'T05' reply.  In each of these cases it is almost certain that thread-2 was the one that stopped (after the single step), but in each case GDB will have guessed thread-1.

But things really go wrong when with the last packet we send 'vCont;s:1', but get back the reply 'T05thread:2' - a thread that shouldn't be running reports a stop!
Comment 40 Andrew Burgess 2021-06-03 21:38:45 UTC
Final comment for the night.  I now have this test passing reliably, I changed the file spike-2-hwthread.cfg (this is one of the files that the steps to reproduce document has us download), this is the openocd config file.  Within this file are two lines like this:

  target create $_TARGETNAME_0 riscv -chain-position $_CHIPNAME.cpu -rtos hwthread
  target create $_TARGETNAME_1 riscv -chain-position $_CHIPNAME.cpu -coreid 1

I changed these lines to be this:

  target create $_TARGETNAME_0 riscv -chain-position $_CHIPNAME.cpu -rtos hwthread -coreid 0
  target create $_TARGETNAME_1 riscv -chain-position $_CHIPNAME.cpu -rtos hwthread -coreid 1

And now the test passes just fine.

As I've said in previous comments, I'm no openocd expert, I added the extra '-rtos hwthread' for the second core as it is the presence of this rtos setting that is causing openocd to send back the thread-id.  Maybe there's more going on here and adding this flag is bad for some reason?  I added the '-coreid 0' to the first core just for symmetry, I suspect 0 is the default.

Unless there's new evidence that comes up, I'm pretty convinced that the issue here was either openocd has a bug, or openocd was misconfigured, however,  I think there is an open issue for whether GDB could/should have handled this situation better?

I think that the packets coming from the remote should be considered user input, and as such, we should not rely on them being "correct".  As such, when invalid packets arrive GDB should be throwing an error, not quitting with an assertion failure.

I'm not really sure where we should be catching this particular error, or what we should do.  I doubt we can reasonably expect to recover, I think the target has shown itself to be unreliable in this case (reporting a stop against a thread that should not be running)...
Comment 41 Joel Brobecker 2021-06-06 14:28:00 UTC
Thanks very much for the analysis, Andrew.

I agree with you that OpenOCD seems to be behavior very strangely (to say the least). It would be nice indeed to improve the situation, but I agree again that it's not clear where and under what condition we could decide to throw the towel. What I'm concerned about is users starting to get some errors in borderline scenarios that have been working OK thanks to the current heuristics. If we change that, we could negatively affect them. Maybe raise an error when we're getting a stop from a thread that wasn't supposed to run at all, like you highlighted?

In any case, I agree also that this is not blocking for the release anymore, so I'm removing the 11.1 target milestone.
Comment 42 Jan Matyas 2021-06-07 05:28:44 UTC
(In reply to Joel Brobecker from comment #41)
> Thanks very much for the analysis, Andrew.
> 
> I agree with you that OpenOCD seems to be behavior very strangely (to say
> the least). 

Gentlemen, thank you for looking at this. If the finger is pointing at OpenOCD, I'd be happy to take a closer look at that and eventually provide an OpenOCD fix, if possible.
Comment 43 Andrew Burgess 2021-06-07 08:30:47 UTC
Jan,

My guess for what is happening is this (and it is just a guess):

  - GDB sets all threads running, maybe does a step for one thread 
    and continue for all the others, or maybe it really does continue
    all threads, I doubt it matters.

  - One thread (lets call it thread 1) stops for a legitimate 
    reason (e.g. breakpoint).

  - As we are in all stop mode, openocd then tries to stop the remaining
    threads, but

  - Before the other thread(s) can be stopped one of them (lets call this 
    thread 2) stops for some other legitimate reason (e.g. breakpoint).

  - OpenOCD reports the first stop (from thread 1) to GDB,

  - GDB decides to single step forward just that one thread (thread 1),
    and so sends a vCont;s:2 to the remote, but

  - OpenOCD spots that it has a pending stop for thread 2, and so sends
    that stop back to GDB.

It also seems weird that OpenOCD can have multiple threads in existence, but gates its use of sending back the 'thread-id' in a stop packet on whether the thread has rtos data attached.  I don't know OpenOCD internals, but it feels like, if OpenOCD has multiple target objects in existence then it should always be sending back a thread-id.  To be honest, it's really not going to hurt if OpenOCD _always_ sent back a thread-id, even if there is only 1 thread - much better that than the current situation where sometimes there is no thread-id at all.

Joel,

I started looking at whether we can spot an invalid stop packet coming in, I think I have a plan, I'll post something to the list once I have a patch.
Comment 44 Lennor 2021-08-06 01:41:10 UTC Comment hidden (spam)
Comment 45 Tom Tromey 2022-11-29 19:38:19 UTC
*** Bug 22925 has been marked as a duplicate of this bug. ***