Created attachment 12935 [details] All information needed to reproduce the issue Dear GDB developers, when debugging a remote RISC-V target, I have run into the following assertion: ../../binutils-gdb/gdb/infrun.c:5692: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Step-by-step instructions how to reproduce it are attached, and the scenario utilizes: - Spike RISC-V ISA simulator as the actual target - OpenOCD as the gdb server Background: The actual scenario has been encountered when running this automated RISC-V debugger test suite (https://github.com/riscv/riscv-tests/tree/master/debug) but has been isolated and instructions written up how to re-create the scenario manually via several simple steps (see the attached .txt file). Attached is also information about the environment in which I encountered this problem. If there are more useful details that I could provide, please do let me know. Kind regards, Jan
Jan, Thank you for including such clear steps to reproduce, they were very helpful. I tried following the steps but could not reproduce this issue, however, thanks to your great instructions I was able to see where I diverge from your behaviour. When you start GDB, you report this: (gdb) info threads Id Target Id Frame * 1 Thread 1 (Name: Hart 0, RV64) 0x00000012123408b0 in stack_bottom () 2 Thread 2 (Name: Hart 1, RV64) 0x000000121234089c in stack_bottom () (gdb) thread 2 However, when I start GDB, I see this: (gdb) info threads Id Target Id Frame * 1 Thread 1 (Name: Hart 0, RV64) 0x0000001212340840 in stack_bottom () I did try following the rest of your steps using thread 1, but the bug didn't reproduce. The fact that I only see a single thread doesn't seem too surprising given that in my openocd startup I see this: Warn : `-rtos riscv` is deprecated and will be removed at the end of 2020! Please Warn : change your configuration to use `-rtos hwthread` instead. To do that, you Warn : will have to explicitly list every hart in the system as a separate target. See Warn : https://github.com/riscv/riscv-tests/blob/ec6537fc4a527ca88be2f045e01c460e640ab9c5/debug/targets/SiFive/HiFiveUnleashed.cfg#L11 Warn : for an example. And: Info : Examined RISC-V core; found 1 harts Info : hart 0: XLEN=64, misa=0x8000000000141129 Could you confirm the exact git SHA you are using for spike and openocd please. Or any other suggestions for what might be different between our environments?
Hello Andrew, thank you for taking the time to look at this. I have re-run the scenario in question against the very latest spike and openocd, and can confirm the assertion still gets triggered. These are the revisions of Spike and RISC-V OpenOCD I have used. Both should be the latest "trunk" code as of now: riscv-openocd: 1ba1b8784 (HEAD -> riscv, origin/riscv, origin/HEAD) Fix build outside of a git repo. (#551) riscv-isa-sim: 641d7d0 (HEAD -> master, origin/master, origin/HEAD) Merge pull request #588 from kito-cheng/master Riscv-gcc and riscv-gdb I have taken from this fresh build by Embecosm: https://buildbot.embecosm.com/job/riscv32-gcc-ubuntu1804/25/ The OpenOCD deprecation warning about "-rtos riscv" shows in my setup, too, and I believe it is of no consequence for this issue. However, our results differ in the number of harts discovered by OpenOCD: Your output: Info : Examined RISC-V core; found 1 harts Info : hart 0: XLEN=64, misa=0x8000000000141129 My results: Info : Examined RISC-V core; found 2 harts Info : hart 0: currently disabled Info : hart 1: XLEN=64, misa=0x8000000000141129 Could it be that your spike is configured differently - with just a single hart? For reference, I am attaching "20201108-102839-spike64_2-PrivChange.log" which shows logs from GDB, OpenOCD and Spike. Note that the exact GDB command sequence may be slightly different from the original scenario, but showing the same issue. I'd be happy to provide further data on the issue, if needed. Regards, Jan
Created attachment 12946 [details] Logs from Spike, GDB and OpenOCD
Jan, Thanks. By using the command lines in the new logs I was able to reproduce this bug. I can see what's going on and am working on a fix.
Hi Andrew Have you please had a chance to figure a fix for this assertion? Is there anything I can help with, e.g. re-running the scenario with a patched GDB? Thank you. Wishing a good year 2021, Jan
There's a patch posted here: https://sourceware.org/pipermail/gdb-patches/2020-December/174277.html which should resolve this issue. Do feel free to apply this to current HEAD of GDB and test, your feedback would be most welcome.
Andrew, Thank you for pointing me to the patch. I will be able to re-test it within approx. 4 days and will be happy to share the results here.
(In reply to Jan Matyas from comment #7) > Andrew, > Thank you for pointing me to the patch. I will be able to re-test it within > approx. 4 days and will be happy to share the results here. I posted an updated version of Andrew's patch here: https://sourceware.org/pipermail/gdb-patches/2021-January/174786.html Can you please use this for your testing?
The master branch has been updated by Simon Marchi <simark@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=8f66807b98f7634c43149ea62e454ea8f877691d commit 8f66807b98f7634c43149ea62e454ea8f877691d Author: Andrew Burgess <andrew.burgess@embecosm.com> Date: Wed Jan 13 20:26:58 2021 -0500 gdb: better handling of 'S' packets This commit builds on work started in the following two commits: commit 24ed6739b699f329c2c45aedee5f8c7d2f54e493 Date: Thu Jan 30 14:35:40 2020 +0000 gdb/remote: Restore support for 'S' stop reply packet commit cada5fc921e39a1945c422eea055c8b326d8d353 Date: Wed Mar 11 12:30:13 2020 +0000 gdb: Handle W and X remote packets without giving a warning This is related to how GDB handles remote targets that send back 'S' packets. In the first of the above commits we fixed GDB's ability to handle a single process, single threaded target that sends back 'S' packets. Although the 'T' packet would always be preferred to 'S' these days, there's nothing really wrong with 'S' for this situation. The second commit above fixed an oversight in the first commit, a single-process, multi-threaded target can send back a process wide event, for example the process exited event 'W' without including a process-id, this also is fine as there is no ambiguity in this case. In PR gdb/26819 we run into yet another problem with the above commits. In this case we have a single process with two threads, GDB hits a breakpoint in thread 2 and then performs a stepi: (gdb) b main Breakpoint 1 at 0x1212340830: file infinite_loop.S, line 10. (gdb) c Continuing. Thread 2 hit Breakpoint 1, main () at infinite_loop.S:10 10 in infinite_loop.S (gdb) set debug remote 1 (gdb) stepi Sending packet: $vCont;s:2#24...Packet received: S05 ../binutils-gdb/gdb/infrun.c:5807: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed. What happens in this case is that on the RISC-V target displaced stepping is not supported, so when the stepi is issued GDB steps just thread 2. As only a single thread was set running the target decides that is can get away with sending back an 'S' packet without a thread-id. GDB then associates the stop with thread 1 (the first non-exited thread), but as thread 1 was not previously set executing the assertion seen above triggers. As an aside I am surprised that the target sends pack 'S' in this situation. The target is happy to send back 'T' (including thread-id) when multiple threads are set running, so (to me) it would seem easier to just always use the 'T' packet when multiple threads are in use. However, the target only uses 'T' when multiple threads are actually executing, otherwise an 'S' packet it used. Still, when looking at the above situation we can see that GDB should be able to understand which thread the 'S' reply is referring too. The problem is that is that in commit 24ed6739b699 (above) when a stop reply comes in with no thread-id we look for the first non-exited thread and select that as the thread the stop applies too. What we should really do is select the first non-exited, resumed thread, and associate the stop event with this thread. In the above example both thread 1 and 2 are non-exited, but only thread 2 is resumed, so this is what we should use. There's a test for this issue included which works with stock gdbserver by disabling use of the 'T' packet, and enabling 'scheduler-locking' within GDB so only one thread is set running. gdb/ChangeLog: PR gdb/26819 * remote.c (remote_target::select_thread_for_ambiguous_stop_reply): New member function. (remote_target::process_stop_reply): Call select_thread_for_ambiguous_stop_reply. gdb/testsuite/ChangeLog: PR gdb/26819 * gdb.server/stop-reply-no-thread-multi.c: New file. * gdb.server/stop-reply-no-thread-multi.exp: New file. Change-Id: I9b49d76c2a99063dcc76203fa0f5270a72825d15
Andrew, can we consider this bug fixed, or did the commit only address part of this issue?
Yes, this issue should be fixed now. Jan, you should be able to retest now using current HEAD of master. If you find the issue is still present then feel free to reopen this bug.
Hi Andrew and all, Thank you for all your work towards fixing this, greatly appreciated. I haven't yet managed to re-test the original scenarios. But it is on my TODO list and I will be happy to inform you about the outcome here, once I find a moment to do so. Best regards, Jan
Hello, I have finally managed to re-test the updated code. I am sorry to report back that the issue appears not yet fully resolved. Re-running the GDB test scenarios in which this issue was originally encountered (that is: https://github.com/riscv/riscv-tests/tree/master/debug) now triggers a different assert. (1) In many of the above test scenarios, it is the following assert that gets triggered immediately after connecting GDB to the remote server, i.e. OpenOCD. OpenOCD is at that point already connected to the underlying target, which is the Spike RISC-V ISA simulator: (gdb) target extended-remote localhost:44553 Remote debugging using localhost:44553 warning: No executable has been specified and target does not support determining executable automatically. Try using the "file" command. ../../../binutils-gdb/gdb/remote.c:7820: internal-error: ptid_t remote_target::select_thread_for_ambiguous_stop_reply(const target_waitstatus*): Assertion `first_resumed_thread != nullptr' failed. A full log from one such test run is attached - 20210116-101128-spike32-CheckMisa.log. It shows the steps to reproduce the issue - most importantly the cmdline arguments to start Spike and OpenOCD. (2) There was one test run in which the original assert was encountered: ../../../binutils-gdb/gdb/infrun.c:5614: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed. It is recorded in file: 20210116-101308-spike32_2-ProgramSwWatchpoint.log The revisions of Spike ISA sim, OpenOCD and gdb-binutils that I used were in all cases the latest HEADs as of now: - binutils-gdb: 68b007788a (Fri Jan 15 18:07:07 2021 -0800) - Spike: 35d50bc (Thu Jan 14 09:43:04 2021 -0800) - OpenOCD: ed37337b0 Please let me know if any further information is needed for reproduction or if you'd like me to re-test any other scenario. Best regards, Jan
Created attachment 13122 [details] 20210116-101308-spike32_2-ProgramSwWatchpoint
Created attachment 13123 [details] 20210116-101128-spike32-CheckMisa
Ok, it hits an assertion added in the fix. Having the logs you attached, but with "set debug remote 1" enabled _before_ you connect, would be useful.
Created attachment 13127 [details] spike32-CheckMisa_with_debug
Created attachment 13128 [details] spike32_2-ProgramSwWatchpoint_with_debug
I have attached the logs with "set debug remote 1". Regards, Jan
I tried connecting to simavr (since I've used that in the past and it's really easy) and hit the same assert: I think the reason is that the target reports no threads at all. So the sole implicit thread is added by "add_current_inferior_and_thread" path in "remote_target::start_remote". With GDBserver, even when disabling fetching threads with qXfer:threads:read, GDB still fetches them with qfThreadInfo. So the thread is added by remote_notice_new_inferior called by update_thread_list. In there, the thread's remote resume state is set to "resumed", and them becomes "not resumed" when we process the stop reply. When the remote stub doesn't return threads with qxfer/qfThreadInfo/qL, GDB adds one using add_current_inferior_and_thread, because we need one thread object in GDB after all. In that path, the thread's remote resume state is left to the initial "not resumed" value. So it is not considered when we look for a thread for the ambiguous stop reply. The patch below should fix it. Note that we can extend the gdb.server/stop-reply-no-thread.exp test case to cover this, by disabling some more gdbserver features: ../gdbserver/gdbserver --disable-packet=T --disable-packet=qfThreadInfo --once :1234 /bin/ls ../gdbserver/gdbserver --disable-packet=threads --once :1234 /bin/ls With both of these, this hits the assert: ./gdb -nx --data-directory=data-directory /bin/ls -ex "set remote multiprocess-feature-packet off" -ex "set remote threads-packet off" -ex "tar rem :1234" From 8576dcab919ba839c97a2cd391ad1f4d6e6a6e2f Mon Sep 17 00:00:00 2001 From: Simon Marchi <simon.marchi@polymtl.ca> Date: Sun, 17 Jan 2021 21:20:16 -0500 Subject: [PATCH] Fix for 26819 Change-Id: I32f1bd02256db1cdd5ca7c882bd7fae43f1d4000 --- gdb/remote.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/gdb/remote.c b/gdb/remote.c index 28391fe957b9..1e1eae3956e3 100644 --- a/gdb/remote.c +++ b/gdb/remote.c @@ -741,7 +741,7 @@ class remote_target : public process_stratum_target int remote_resume_with_vcont (ptid_t ptid, int step, gdb_signal siggnal); - void add_current_inferior_and_thread (char *wait_status); + thread_info *add_current_inferior_and_thread (char *wait_status); ptid_t wait_ns (ptid_t ptid, struct target_waitstatus *status, target_wait_flags options); @@ -4412,7 +4412,7 @@ remote_target::get_current_thread (char *wait_status) qC query, we infer the current thread from that stop reply, passed in in WAIT_STATUS, which may be NULL. */ -void +thread_info * remote_target::add_current_inferior_and_thread (char *wait_status) { struct remote_state *rs = get_remote_state (); @@ -4446,6 +4446,8 @@ remote_target::add_current_inferior_and_thread (char *wait_status) yet. */ thread_info *tp = add_thread_silent (this, curr_ptid); switch_to_thread_no_regs (tp); + + return tp; } /* Print info about a thread that was found already stopped on @@ -4800,7 +4802,8 @@ remote_target::start_remote (int from_tty, int extended_p) /* Target has no concept of threads at all. GDB treats non-threaded target as single-threaded; add a main thread. */ - add_current_inferior_and_thread (wait_status); + thread_info *tp = add_current_inferior_and_thread (wait_status); + get_remote_thread_info (tp)->set_resumed (); } else { -- 2.29.2
Without further exploration of the patch, I have used it and rebuilt GDB. Then I have re-run the RISC-V debug test suite (mentioned earlier). The patch works fine - I am no longer getting any assertions. Thank you, Jan
(In reply to Simon Marchi from comment #20) [...] > From 8576dcab919ba839c97a2cd391ad1f4d6e6a6e2f Mon Sep 17 00:00:00 2001 > From: Simon Marchi <simon.marchi@polymtl.ca> > Date: Sun, 17 Jan 2021 21:20:16 -0500 > Subject: [PATCH] Fix for 26819 > > Change-Id: I32f1bd02256db1cdd5ca7c882bd7fae43f1d4000 [...] This patch fixes also the issue while connecting to a nios2-gdb-server: ../../sourceware-mirror-binutils-gdb-edf0f28/gdb/remote.c:7820: internal-error: ptid_t remote_target::select_thread_for_ambiguous_stop_reply(const target_waitstatus*): Assertion `first_resumed_thread != nullptr' failed. A problem internal to GDB has been detected, further debugging may prove unreliable.
Hello, I would like to ask about the status of the latest patch for this issue. As a newcomer, I am not certain what process is followed to have the patch merged. Is there any way I can help the process? Thank you, regards Jan
(In reply to Jan Matyas from comment #23) > Hello, > > I would like to ask about the status of the latest patch for this issue. As > a newcomer, I am not certain what process is followed to have the patch > merged. Is there any way I can help the process? > > Thank you, regards > Jan You can if you'd like. The steps missing are: - Write a good commit message that explains why things fail currently and why the patch fixes them (including a ChangeLog entry) - Enhance the test (gdb.server/stop-reply-no-thread.exp I think) so it covers the two scenarios I mentioned ("--disable-packet=T --disable-packet=qfThreadInfo" and "--disable-packet=threads") - Send the patch to the mailing list for review.
(In reply to Simon Marchi from comment #24) > [...] The steps missing are: > > - Write a good commit message that explains why things fail currently and > why the patch fixes them (including a ChangeLog entry) > - Enhance the test (gdb.server/stop-reply-no-thread.exp I think) so it > covers the two scenarios I mentioned ("--disable-packet=T > --disable-packet=qfThreadInfo" and "--disable-packet=threads") > - Send the patch to the mailing list for review. Hi Simon, I have rebased your patch from Jan-17 and enhanced the test case as suggested. Please, check if that is all right. As my knowledge of gdb code is low, I'd appreciate if you could write the commit message as the author of the patch (and submit it to the mailing list). Thank you, regards, Jan From c99eb5f06a0c25129875ed5addcc2cf2e7a9ab39 Mon Sep 17 00:00:00 2001 From: Jan Matyas <jmatyas@codasip.com> Date: Sun, 7 Feb 2021 22:04:50 +0100 Subject: [PATCH] Patch for gdb/26819 TODO: Proper commit message - Rebased Simon Marchi's patch from Jan/17. - Extended the gdb.server/stop-reply-no-thread.exp test case to cover the situation when the gdbserver does not report any threads. --- gdb/remote.c | 9 ++++++--- gdb/testsuite/gdb.server/stop-reply-no-thread.exp | 9 +++++++++ 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/gdb/remote.c b/gdb/remote.c index 31c6e17a1c..4cda15470e 100644 --- a/gdb/remote.c +++ b/gdb/remote.c @@ -741,7 +741,7 @@ class remote_target : public process_stratum_target int remote_resume_with_vcont (ptid_t ptid, int step, gdb_signal siggnal); - void add_current_inferior_and_thread (const char *wait_status); + thread_info *add_current_inferior_and_thread (const char *wait_status); ptid_t wait_ns (ptid_t ptid, struct target_waitstatus *status, target_wait_flags options); @@ -4411,7 +4411,7 @@ remote_target::get_current_thread (const char *wait_status) qC query, we infer the current thread from that stop reply, passed in in WAIT_STATUS, which may be NULL. */ -void +thread_info * remote_target::add_current_inferior_and_thread (const char *wait_status) { struct remote_state *rs = get_remote_state (); @@ -4445,6 +4445,8 @@ remote_target::add_current_inferior_and_thread (const char *wait_status) yet. */ thread_info *tp = add_thread_silent (this, curr_ptid); switch_to_thread_no_regs (tp); + + return tp; } /* Print info about a thread that was found already stopped on @@ -4800,7 +4802,8 @@ remote_target::start_remote (int from_tty, int extended_p) /* Target has no concept of threads at all. GDB treats non-threaded target as single-threaded; add a main thread. */ - add_current_inferior_and_thread (wait_status); + thread_info *tp = add_current_inferior_and_thread (wait_status); + get_remote_thread_info (tp)->set_resumed (); } else { diff --git a/gdb/testsuite/gdb.server/stop-reply-no-thread.exp b/gdb/testsuite/gdb.server/stop-reply-no-thread.exp index 68bf42ac1a..e698eaff6e 100644 --- a/gdb/testsuite/gdb.server/stop-reply-no-thread.exp +++ b/gdb/testsuite/gdb.server/stop-reply-no-thread.exp @@ -112,3 +112,12 @@ foreach_with_prefix to_disable { "" Tthread T } { run_test $to_disable $t_nonstop } } + +# Regression for gdb/26819: Cover the case when GDBserver does not report +# any threads. Non-stop Mode is off. +# +# Scenario 1: Disable 'T' and 'qfThreadInfo' packets +# Scenario 2: Disable all threading packets +foreach_with_prefix to_disable { "T,qfThreadInfo" "threads" } { + run_test $to_disable off +} -- 2.20.1
Created attachment 13208 [details] attachment-3656629-0.html Hi Jan, (off list) I just went into paternity leave, so I will probably won't be able to take a look before a few weeks. If you want to submit the patch for merging and get a review, please send it to the gdb-patches list as described here. https://sourceware.org/gdb/wiki/ContributionChecklist Thanks, Simon On February 7, 2021 4:20:24 PM EST, jmatyas at codasip dot com <sourceware-bugzilla@sourceware.org> wrote: >https://sourceware.org/bugzilla/show_bug.cgi?id=26819 > >--- Comment #25 from Jan Matyas <jmatyas at codasip dot com> --- >(In reply to Simon Marchi from comment #24) >> [...] The steps missing are: >> >> - Write a good commit message that explains why things fail currently >and >> why the patch fixes them (including a ChangeLog entry) >> - Enhance the test (gdb.server/stop-reply-no-thread.exp I think) so >it >> covers the two scenarios I mentioned ("--disable-packet=T >> --disable-packet=qfThreadInfo" and "--disable-packet=threads") >> - Send the patch to the mailing list for review. > >Hi Simon, >I have rebased your patch from Jan-17 and enhanced the test case as >suggested. >Please, check if that is all right. > >As my knowledge of gdb code is low, I'd appreciate if you could write >the >commit message as the author of the patch (and submit it to the mailing >list). > >Thank you, regards, >Jan > > > >From c99eb5f06a0c25129875ed5addcc2cf2e7a9ab39 Mon Sep 17 00:00:00 2001 >From: Jan Matyas <jmatyas@codasip.com> >Date: Sun, 7 Feb 2021 22:04:50 +0100 >Subject: [PATCH] Patch for gdb/26819 > >TODO: Proper commit message > >- Rebased Simon Marchi's patch from Jan/17. >- Extended the gdb.server/stop-reply-no-thread.exp test case > to cover the situation when the gdbserver does not report > any threads. >--- > gdb/remote.c | 9 ++++++--- > gdb/testsuite/gdb.server/stop-reply-no-thread.exp | 9 +++++++++ > 2 files changed, 15 insertions(+), 3 deletions(-) > >diff --git a/gdb/remote.c b/gdb/remote.c >index 31c6e17a1c..4cda15470e 100644 >--- a/gdb/remote.c >+++ b/gdb/remote.c >@@ -741,7 +741,7 @@ class remote_target : public process_stratum_target > int remote_resume_with_vcont (ptid_t ptid, int step, > gdb_signal siggnal); > >- void add_current_inferior_and_thread (const char *wait_status); >+ thread_info *add_current_inferior_and_thread (const char >*wait_status); > > ptid_t wait_ns (ptid_t ptid, struct target_waitstatus *status, > target_wait_flags options); >@@ -4411,7 +4411,7 @@ remote_target::get_current_thread (const char >*wait_status) > qC query, we infer the current thread from that stop reply, passed > in in WAIT_STATUS, which may be NULL. */ > >-void >+thread_info * >remote_target::add_current_inferior_and_thread (const char >*wait_status) > { > struct remote_state *rs = get_remote_state (); >@@ -4445,6 +4445,8 @@ remote_target::add_current_inferior_and_thread >(const >char *wait_status) > yet. */ > thread_info *tp = add_thread_silent (this, curr_ptid); > switch_to_thread_no_regs (tp); >+ >+ return tp; > } > > /* Print info about a thread that was found already stopped on >@@ -4800,7 +4802,8 @@ remote_target::start_remote (int from_tty, int >extended_p) > /* Target has no concept of threads at all. GDB treats > non-threaded target as single-threaded; add a main > thread. */ >- add_current_inferior_and_thread (wait_status); >+ thread_info *tp = add_current_inferior_and_thread >(wait_status); >+ get_remote_thread_info (tp)->set_resumed (); > } > else > { >diff --git a/gdb/testsuite/gdb.server/stop-reply-no-thread.exp >b/gdb/testsuite/gdb.server/stop-reply-no-thread.exp >index 68bf42ac1a..e698eaff6e 100644 >--- a/gdb/testsuite/gdb.server/stop-reply-no-thread.exp >+++ b/gdb/testsuite/gdb.server/stop-reply-no-thread.exp >@@ -112,3 +112,12 @@ foreach_with_prefix to_disable { "" Tthread T } { > run_test $to_disable $t_nonstop > } > } >+ >+# Regression for gdb/26819: Cover the case when GDBserver does not >report >+# any threads. Non-stop Mode is off. >+# >+# Scenario 1: Disable 'T' and 'qfThreadInfo' packets >+# Scenario 2: Disable all threading packets >+foreach_with_prefix to_disable { "T,qfThreadInfo" "threads" } { >+ run_test $to_disable off >+} >-- >2.20.1 > >-- >You are receiving this mail because: >You are on the CC list for the bug.
Created attachment 13209 [details] attachment-3668536-0.html Oh well apparently if you reply to the email it goes to Bugzilla even though it looked like Jan's email address. That's what I get for trying to answer on my phone. Anyhow, the info about sending to the list is still valid :)
Gentlemen, I have assembled, tested and submitted the patch for review per your suggestions: https://sourceware.org/pipermail/gdb-patches/2021-February/176542.html If you could help by looking at the patch, that would be very appreciated. Thank you, regards Jan
Hi Simon, Thank you for pointing out that git-send-email should be used, I missed that part in the ContributionChecklist. I have re-submitted the patch that way: https://sourceware.org/pipermail/gdb-patches/2021-February/176569.html Now I realize I missed the version 2 marker (v2) in the subject this time. If that should be added, shall I re-submit it one more time? Jan
This should be fixed now, commit 64d38fdd9956 ("Fix initial thread state of non-threaded remote targets"). Please reopen or open a new bug if you stumble on something else. Thanks for preparing and submitting the patch on the list.
Created attachment 13284 [details] Steps to reproduce
Created attachment 13285 [details] gdb-26819-thread1-verbose.txt
Created attachment 13286 [details] gdb-26819-thread2.txt
Created attachment 13287 [details] gdb-26819-thread2-verbose.txt
Did you find another issue related to this that isn't fixed?
Hello, I am sorry to re-open this, but I have come across one more scenario where the initial assertion gets triggered: ../../gdb/infrun.c:5672: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed. The assertion appears when software watchpoints are used on a riscv32 bare metal extended-remote target with two threads. The target is represented by Spike RISC-V ISA simulator and the remote server is OpenOCD, i.e. the same as already discussed in this ticket. The test scenario is a simple program in which a SW watchpoint on "counter == 5" is used: int main() { int sum = 0; int counter = 0; while (counter < 10) { counter = counter + 1; sum = sum + counter; } return counter; } Then "c" (continue) command is issued three times. The assertion is triggered after the third "c", which I guess is somehow related to the watchpoint expression going out of scope. The assertion only appears on software watchpoints, and when the second thread is the selected one ("thread 2"). I am attaching the logs: - gdb-26819-thread2.txt - brief log, showing the sequence of the commands - gdb-26819-thread2-verbose.txt - verbose log with debug prints for "infrun" and "remote When "thread 1" is selected, the assertion does not appear. However the log shows that in such case "thread 2" is not being single-stepped after the target gets resumed ("c"). Only thread 1 is stepped. Could this be another issue or is it expected behavior? The log for this second case is in: gdb-26819-thread1-verbose.txt The steps to reproduce (building OpenOCD, Spike, compiling the demo program ...) are also attached and should be hopefully easy to follow. I have tried this SW watchpoint scenario on a native machine (x86_64) with gdbserver, where it would be easier to reproduce, but I couldn't get it to fail. Please let me know if I can provide any further info. Regards, Jan
Reopening, per the previous comment.
I took a look at this issue and I can reproduce the failure. First, an interesting aside: I originally tried to reproduce this issue on a GDB built with --enable-targets=all, and couldn't reproduce the failure. But, when I built a GDB with --target=riscv-elf I was able to reproduce the issue just fine. The problem turns out to be that by default the all-targets GDB would default to osabi GNU/Linux, while the riscv-elf GDB does not include Linux support, so defaults to osabi 'none'. If I use the all-targets GDB and explicitly do 'set osabi none' then I can reproduce the failure with all-targets GDB. The difference is that GNU/Linux RISC-V doesn't support single stepping, so, when using that osabi we will make use of software single-step. The bare-metal 'none' osabi assumes that single step support is available. So, why is this test failing when using single stepping? It is my belief that the problem here is a bug in the multi-core support of openocd. My first clue is the following sequence of vCont and stop replies sent between GDB and openocd: Sending packet: $vCont?#49 Packet received: vCont;c;C;s;S Sending packet: $vCont;c#a8 Packet received: T05thread:2; Sending packet: $vCont;s:2#24 Packet received: T05 Sending packet: $vCont;s:2;c#c2 Packet received: T05 Sending packet: $vCont;s:1;c#c1 Packet received: T05thread:2; Notice that after the first 'vCont;c' we get back 'T05thread:2' clearly indicating which thread stopped. Next GDB sends 'vCont;s:2', so steps only the single thread '2', now we get back 'T05'. This is annoying (no thread-id), but the original patches for this issue addressed this, as only one thread was set running GDB correctly "guesses" the thread and carries on. After that GDB sends 'vCont;s:2;c', now we're single stepping thread 2, but allowing thread 1 to continue. Again the reply comes back 'T05'. This time GDB guesses thread 1 as the stopped thread, and things start to go off the rails. Notice however, that in the next packet GDB sends 'vCont;s:1;c', which is kind of the reverse, step thread 1, continue thread 2, but now we get a reply 'T05thread:2' which includes a thread-id. Weird! The summary of the above then is that sometimes openocd does not return a thread-id even when multiple threads are running. I looked a little into openocd, specifically I looked at the function gdb_signal_reply in server/gdb_server.c. This function is passed a 'struct target *target'. Whether we send a thread-id back or not depends on whether the 'target->rtos' field is set or not. Now, if I debug this function I see two different target pointers passed in at different times, one target represents "riscv.cpu0", this target has its rtos field set, the other target represents "riscv.cpu1", this target does not have its rtos field set. Now, I don't know the openocd internals, but what seems to happen is that sometimes the target stops, and gdb_signal_reply is called with "riscv.cpu0" target, but because target->rtos is set the stop is reported against target->rtos->current_thread, which can be thread 2. Then, sometimes, the target stops with "riscv.cpu1", in this case openocd just makes no attempt to add a thread-id as target->rtos is NULL. So, where does this leave us? The above explains why GDB starts getting confused, but doesn't fully explain why we eventually hit the assert. I'm still looking into the details of that, but wanted to record what I knew so far. GDB _could_ possibly be slightly smarter when it guesses, as in if we send 'vCont;s:2;c', then maybe we should guess that thread-2 is the likely thread to have stopped, as most of the time thread-2 single stepping will complete before thread-1 hits anything interesting.... most of the time. But this really feels like working around a broken target.
At the point where we run into the assertion the vCont/stop-reply packets look like: ... Sending packet: $vCont;s:2;c#c2 Packet received: T05 Sending packet: $vCont;s:1;c#c1 Packet received: T05thread:2; Sending packet: $vCont;s:2;c#c2 Packet received: T05 Sending packet: $vCont;s:1;c#c1 Packet received: T05thread:2; Sending packet: $vCont;s:2;c#c2 Packet received: T05 Sending packet: $vCont;s:1#23 Packet received: T05thread:2 Notice all of the 'vCont;s:2;c' packets get a 'T05' reply. In each of these cases it is almost certain that thread-2 was the one that stopped (after the single step), but in each case GDB will have guessed thread-1. But things really go wrong when with the last packet we send 'vCont;s:1', but get back the reply 'T05thread:2' - a thread that shouldn't be running reports a stop!
Final comment for the night. I now have this test passing reliably, I changed the file spike-2-hwthread.cfg (this is one of the files that the steps to reproduce document has us download), this is the openocd config file. Within this file are two lines like this: target create $_TARGETNAME_0 riscv -chain-position $_CHIPNAME.cpu -rtos hwthread target create $_TARGETNAME_1 riscv -chain-position $_CHIPNAME.cpu -coreid 1 I changed these lines to be this: target create $_TARGETNAME_0 riscv -chain-position $_CHIPNAME.cpu -rtos hwthread -coreid 0 target create $_TARGETNAME_1 riscv -chain-position $_CHIPNAME.cpu -rtos hwthread -coreid 1 And now the test passes just fine. As I've said in previous comments, I'm no openocd expert, I added the extra '-rtos hwthread' for the second core as it is the presence of this rtos setting that is causing openocd to send back the thread-id. Maybe there's more going on here and adding this flag is bad for some reason? I added the '-coreid 0' to the first core just for symmetry, I suspect 0 is the default. Unless there's new evidence that comes up, I'm pretty convinced that the issue here was either openocd has a bug, or openocd was misconfigured, however, I think there is an open issue for whether GDB could/should have handled this situation better? I think that the packets coming from the remote should be considered user input, and as such, we should not rely on them being "correct". As such, when invalid packets arrive GDB should be throwing an error, not quitting with an assertion failure. I'm not really sure where we should be catching this particular error, or what we should do. I doubt we can reasonably expect to recover, I think the target has shown itself to be unreliable in this case (reporting a stop against a thread that should not be running)...
Thanks very much for the analysis, Andrew. I agree with you that OpenOCD seems to be behavior very strangely (to say the least). It would be nice indeed to improve the situation, but I agree again that it's not clear where and under what condition we could decide to throw the towel. What I'm concerned about is users starting to get some errors in borderline scenarios that have been working OK thanks to the current heuristics. If we change that, we could negatively affect them. Maybe raise an error when we're getting a stop from a thread that wasn't supposed to run at all, like you highlighted? In any case, I agree also that this is not blocking for the release anymore, so I'm removing the 11.1 target milestone.
(In reply to Joel Brobecker from comment #41) > Thanks very much for the analysis, Andrew. > > I agree with you that OpenOCD seems to be behavior very strangely (to say > the least). Gentlemen, thank you for looking at this. If the finger is pointing at OpenOCD, I'd be happy to take a closer look at that and eventually provide an OpenOCD fix, if possible.
Jan, My guess for what is happening is this (and it is just a guess): - GDB sets all threads running, maybe does a step for one thread and continue for all the others, or maybe it really does continue all threads, I doubt it matters. - One thread (lets call it thread 1) stops for a legitimate reason (e.g. breakpoint). - As we are in all stop mode, openocd then tries to stop the remaining threads, but - Before the other thread(s) can be stopped one of them (lets call this thread 2) stops for some other legitimate reason (e.g. breakpoint). - OpenOCD reports the first stop (from thread 1) to GDB, - GDB decides to single step forward just that one thread (thread 1), and so sends a vCont;s:2 to the remote, but - OpenOCD spots that it has a pending stop for thread 2, and so sends that stop back to GDB. It also seems weird that OpenOCD can have multiple threads in existence, but gates its use of sending back the 'thread-id' in a stop packet on whether the thread has rtos data attached. I don't know OpenOCD internals, but it feels like, if OpenOCD has multiple target objects in existence then it should always be sending back a thread-id. To be honest, it's really not going to hurt if OpenOCD _always_ sent back a thread-id, even if there is only 1 thread - much better that than the current situation where sometimes there is no thread-id at all. Joel, I started looking at whether we can spot an invalid stop packet coming in, I think I have a plan, I'll post something to the list once I have a patch.
Great question and answer website! Thank you so much, keep it up the good deed! Please visit our sites also: https://www.bradentonrescreening.com/screen-repair--re-screening--st-petersburg-fl.html/ : https://www.bradentonrescreening.com/screen-repair--re-screening--sarasota-fl.html/ : https://www.vahomeloanmi.com/
*** Bug 22925 has been marked as a duplicate of this bug. ***