Summary: | gdb-13.1/gdb/target.c:2641: internal-error: target_resume: Assertion `inferior_ptid != null_ptid' failed | ||
---|---|---|---|
Product: | gdb | Reporter: | Petr Šumbera <petr.sumbera> |
Component: | tdep | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | marcel, pedro, simark, tromey |
Priority: | P2 | ||
Version: | 13.1 | ||
Target Milestone: | 14.1 | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Attachments: |
assertion hit with set debug infrun 1
Fix |
Description
Petr Šumbera
2023-03-20 16:00:19 UTC
The problematic commit added the following two assertions:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/target.c;h=0cebecfafc379534180a6563d246cdbc3488311e;hb=HEAD#l2648

    2648   gdb_assert (inferior_ptid != null_ptid);
    2649   gdb_assert (inferior_ptid.matches (scope_ptid));

When they are removed, GDB seems to work as expected, as it did before.

I'm not super conversant with this area but I think that, most likely, patching out these asserts will let it work in some scenarios but crash in others. Probably the native target needs some extra work here, though unfortunately I don't know exactly what.

(In reply to Tom Tromey from comment #2)
> I'm not super conversant with this area but I think that, most likely,
> patching out these asserts will let it work in some scenarios but
> crash in others. Probably the native target needs some extra work here,
> though unfortunately I don't know exactly what.

I agree. Can you run the same test case, but run "set debug infrun 1" first?

Created attachment 14771 [details]
assertion hit with set debug infrun 1
Thanks for looking at it. What are we looking for?
The problem is that the backend called target_resume without switching to the leader thread of that resumption.

- inferior_ptid is a global in gdb that points at the currently selected thread.
- inferior_ptid == null_ptid means no thread is selected at that point.
- The target_resume interface is:

    /* Resume execution (or prepare for execution) of the current thread
       (INFERIOR_PTID), while optionally letting other threads of the
       current process or all processes run free.
       ...

Thus calling target_resume with inferior_ptid == null_ptid is bogus.

target_wait (which leads to procfs_target::wait on Solaris) is called with inferior_ptid == null_ptid on entry exactly to help catch such bogus uses. It's procfs_target::wait that is calling target_resume incorrectly.

From the backtrace, it seems that the line in question is 2187, which has:

    /* How to keep going without returning to wfi: */
    target_continue_no_signal (ptid);
    goto wait_again;

target_continue_no_signal is a small wrapper around target_resume, which would make sense. I guess we don't see target_continue_no_signal in the backtrace because that is an optimized gdb build.

I'm writing a fix.

Created attachment 14777 [details]
Fix
Could you give the attached patch a try?
I was able to confirm that gdb/procfs.c (the modified file) at least compiles using an old Solaris 11 VM I had lying around, but I wasn't able to build GDB fully due to other issues.
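
To make the invariant behind the two assertions concrete, here is a minimal standalone sketch. It is not GDB code: `Ptid`, `current_ptid`, `checked_resume` and `resume_is_rejected` are hypothetical names, the matching rule is a simplified assumption (a scope with `lwp == 0` means "every thread of that process"), and the checks throw instead of aborting so the outcome can be observed.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, simplified stand-in for GDB's ptid_t: just a (pid, lwp) pair.
struct Ptid {
  int pid;
  long lwp;
  bool operator==(const Ptid &o) const { return pid == o.pid && lwp == o.lwp; }
  bool operator!=(const Ptid &o) const { return !(*this == o); }
  // Assumption for this sketch: a scope with lwp == 0 covers every thread
  // of process `pid`; otherwise the scope names exactly one thread.
  bool matches(const Ptid &scope) const {
    return pid == scope.pid && (scope.lwp == 0 || lwp == scope.lwp);
  }
};

const Ptid null_ptid{0, 0};
Ptid current_ptid = null_ptid;  // models the global inferior_ptid

// Models the two checks added to target_resume. Throws rather than
// aborting, so that a caller can see which calls are rejected.
void checked_resume(const Ptid &scope) {
  if (current_ptid == null_ptid)
    throw std::logic_error("no thread selected");
  if (!current_ptid.matches(scope))
    throw std::logic_error("selected thread is outside the resume scope");
  // A real backend would resume the selected thread here.
}

// Returns true when checked_resume refuses the call.
bool resume_is_rejected(const Ptid &scope) {
  try {
    checked_resume(scope);
    return false;
  } catch (const std::logic_error &) {
    return true;
  }
}
```

In this model, a wait routine that is entered with `current_ptid == null_ptid` and then calls `checked_resume` is always rejected, which mirrors the situation reported in the backtrace.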
(In reply to Pedro Alves from comment #6)
> Created attachment 14777 [details]
> Fix
>
> Could you give the attached patch a try?

Yes. Thank you! Assertion is not hit in my test case.

> I was able to confirm that gdb/procfs.c (the modified file) at least
> compiles using an old Solaris 11 VM I had lying around, but I wasn't able to
> build GDB fully due to other issues.

There is one year old release which you should be able to use for such testing:
https://blogs.oracle.com/solaris/post/building-open-source-software-on-oracle-solaris-114-cbe-release

I hit the same bug with gdb 13.2 on OpenIndiana (illumos based distro) and the attached patch solved the problem.

The master branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b2ad7bb9e6a012699195d3eda9d40679c406ebdc

commit b2ad7bb9e6a012699195d3eda9d40679c406ebdc
Author: Pedro Alves <pedro@palves.net>
Date: Thu Jul 6 15:05:11 2023 +0100

    Fix Solaris regression (PR tdep/30252)

    PR tdep/30252 reports that using GDB on Solaris fails an assertion in
    target_resume:

      target.c:2648: internal-error: target_resume: Assertion `inferior_ptid != null_ptid' failed.
      A problem internal to GDB has been detected,
      further debugging may prove unreliable.
      Quit this debugging session? (y or n)

    The backtrace, after running it through c++filt, looks like:

      ----- Backtrace -----
      0xa18914 gdb_internal_backtrace_1
              /root/binutils-gdb/gdb/bt-utils.c:122
      0xa18914 gdb_internal_backtrace()
              /root/binutils-gdb/gdb/bt-utils.c:168
      0xdec834 internal_vproblem
              /root/binutils-gdb/gdb/utils.c:401
      0xdecad8 internal_verror(char const*, int, char const*, __va_list_tag*)
              /root/binutils-gdb/gdb/utils.c:481
      0xf3638c internal_error_loc(char const*, int, char const*, ...)
              /root/binutils-gdb/gdbsupport/errors.cc:58
      0xd70580 target_resume(ptid_t, int, gdb_signal)
              /root/binutils-gdb/gdb/target.c:2648
      0xc59e85 procfs_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>)
              /root/binutils-gdb/gdb/procfs.c:2187
      0xcf6da7 sol_thread_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>)
              /root/binutils-gdb/gdb/sol-thread.c:442
      0xd73711 target_wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>)
              /root/binutils-gdb/gdb/target.c:2586
      ...

    The problem is that the procfs backend, while inside target_wait,
    called target_resume without switching to the leader thread of that
    resumption.

    The target_resume interface is:

      /* Resume execution (or prepare for execution) of the current thread
         (INFERIOR_PTID), while optionally letting other threads of the
         current process or all processes run free.
         ...

    Thus calling target_resume with inferior_ptid == null_ptid is bogus.

    target_wait (which leads to procfs_target::wait on Solaris) is called
    with inferior_ptid == null_ptid on entry exactly to help catch such
    bogus uses.

    From the backtrace, it seems that the relevant line in question is
    procfs.c:2187:

      2186   /* How to keep going without returning to wfi: */
      2187   target_continue_no_signal (ptid);
      2188   goto wait_again;

    target_continue_no_signal is a small wrapper around target_resume,
    which would make sense.

    The fix is to not call target_resume or go via the target stack at
    all. Instead, factor out a new proc_resume function out of
    procfs_target::resume, and call that. The new function does not rely
    on inferior_ptid.

    I've not been able to test it myself, but Petr confirmed it fixes the
    assertion failure with his test case, and Marcel Telka also confirmed
    it solves the problem.

    Tested-By: Petr Šumbera <petr.sumbera@oracle.com>
    Tested-By: Marcel Telka <marcel@telka.sk>
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30252
    Change-Id: I6213c59b081d400a22e799ee621c2eff6dcafbf3

The gdb-13-branch branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=dfe07f10de81a38f595022556873d16585ea2b7e

commit dfe07f10de81a38f595022556873d16585ea2b7e
Author: Pedro Alves <pedro@palves.net>
Date: Thu Jul 6 15:05:11 2023 +0100

    Fix Solaris regression (PR tdep/30252)

    PR tdep/30252 reports that using GDB on Solaris fails an assertion in
    target_resume:

      target.c:2648: internal-error: target_resume: Assertion `inferior_ptid != null_ptid' failed.
      A problem internal to GDB has been detected,
      further debugging may prove unreliable.
      Quit this debugging session? (y or n)

    The backtrace, after running it through c++filt, looks like:

      ----- Backtrace -----
      0xa18914 gdb_internal_backtrace_1
              /root/binutils-gdb/gdb/bt-utils.c:122
      0xa18914 gdb_internal_backtrace()
              /root/binutils-gdb/gdb/bt-utils.c:168
      0xdec834 internal_vproblem
              /root/binutils-gdb/gdb/utils.c:401
      0xdecad8 internal_verror(char const*, int, char const*, __va_list_tag*)
              /root/binutils-gdb/gdb/utils.c:481
      0xf3638c internal_error_loc(char const*, int, char const*, ...)
              /root/binutils-gdb/gdbsupport/errors.cc:58
      0xd70580 target_resume(ptid_t, int, gdb_signal)
              /root/binutils-gdb/gdb/target.c:2648
      0xc59e85 procfs_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>)
              /root/binutils-gdb/gdb/procfs.c:2187
      0xcf6da7 sol_thread_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>)
              /root/binutils-gdb/gdb/sol-thread.c:442
      0xd73711 target_wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>)
              /root/binutils-gdb/gdb/target.c:2586
      ...

    The problem is that the procfs backend, while inside target_wait,
    called target_resume without switching to the leader thread of that
    resumption.

    The target_resume interface is:

      /* Resume execution (or prepare for execution) of the current thread
         (INFERIOR_PTID), while optionally letting other threads of the
         current process or all processes run free.
         ...

    Thus calling target_resume with inferior_ptid == null_ptid is bogus.

    target_wait (which leads to procfs_target::wait on Solaris) is called
    with inferior_ptid == null_ptid on entry exactly to help catch such
    bogus uses.

    From the backtrace, it seems that the relevant line in question is
    procfs.c:2187:

      2186   /* How to keep going without returning to wfi: */
      2187   target_continue_no_signal (ptid);
      2188   goto wait_again;

    target_continue_no_signal is a small wrapper around target_resume,
    which would make sense.

    The fix is to not call target_resume or go via the target stack at
    all. Instead, factor out a new proc_resume function out of
    procfs_target::resume, and call that. The new function does not rely
    on inferior_ptid.

    I've not been able to test it myself, but Petr confirmed it fixes the
    assertion failure with his test case, and Marcel Telka also confirmed
    it solves the problem.

    Tested-By: Petr Šumbera <petr.sumbera@oracle.com>
    Tested-By: Marcel Telka <marcel@telka.sk>
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30252
    Change-Id: I6213c59b081d400a22e799ee621c2eff6dcafbf3

Thanks for extra confirmation Marcel, I would have forgotten about this patch without it.

I've now merged the patch to both master and the gdb 13 branch. I am not sure if there will ever be another GDB 13 release, most likely there won't, but, if you need to pull the fix for some downstream release, it's there. |
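
The shape of the fix described in the commit message (factor the resume logic into a helper that does not depend on the globally selected thread, and call that helper from the wait path) can be sketched in isolation. This is only an illustration under stated assumptions, not the actual procfs.c change: `Ptid`, `current_ptid`, `resumed_log`, `backend_resume_one`, `target_style_resume` and `backend_wait` are all hypothetical names invented for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for GDB's ptid_t.
struct Ptid {
  int pid;
  long lwp;
  bool operator==(const Ptid &o) const { return pid == o.pid && lwp == o.lwp; }
  bool operator!=(const Ptid &o) const { return !(*this == o); }
};

const Ptid null_ptid{0, 0};
Ptid current_ptid = null_ptid;   // models the global inferior_ptid
std::vector<Ptid> resumed_log;   // records resumed threads, for inspection

// Factored-out helper: operates only on the thread it is handed and
// never reads the global selection.
void backend_resume_one(const Ptid &ptid) {
  resumed_log.push_back(ptid);
}

// Public entry point: validates the global selection, then delegates.
void target_style_resume() {
  assert(current_ptid != null_ptid);
  backend_resume_one(current_ptid);
}

// Wait loop: runs with no thread selected. Events it decides to skip are
// resumed through the helper; calling target_style_resume() here instead
// would trip the assertion.
Ptid backend_wait(const std::vector<Ptid> &events_to_skip,
                  const Ptid &final_event) {
  current_ptid = null_ptid;
  for (const Ptid &ev : events_to_skip)
    backend_resume_one(ev);
  return final_event;
}
```

The design point is that the wait path passes the thread explicitly, so it never has to select a thread just to satisfy the public entry point's precondition.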