I see the following regression (an internal error) when using gdb 13.1 on Solaris x86:

(gdb) run -P
Starting program: /usr/bin/firefox -P
[Thread debugging using libthread_db enabled]
warning: could not convert 'mutex_t' from the host encoding (ISO-8859-1) to UTF-32.
This normally should not happen, please file a bug report.
[New Thread 1 (LWP 1)]
/builds/psumbera/userland-gdb-13/components/gdb/gdb-13.1/gdb/target.c:2641: internal-error: target_resume: Assertion `inferior_ptid != null_ptid' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
----- Backtrace -----
0x9b6213 ???
0xd3e52c ???
0xd3e778 ???
0xe798bc ???
0xcee510 ???
0xbdefca ???
0xc7273b ???
0xcf1443 ???
0xb3b9c4 ???
0xb4d008 ???
0x979b2f ???
0xe7a684 ???
0xb777f1 ???
0xb791e4 ???
0x922da6 ???
0x922c12 ???
---------------------
/builds/psumbera/userland-gdb-13/components/gdb/gdb-13.1/gdb/target.c:2641: internal-error: target_resume: Assertion `inferior_ptid != null_ptid' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

I used git bisect and it says:

d51926f06a7f4854bebdd71dcb0a78dbaa2f4168 is the first bad commit
commit d51926f06a7f4854bebdd71dcb0a78dbaa2f4168
Author: Pedro Alves <pedro@palves.net>
Date:   Thu Apr 21 14:20:36 2022 +0100

    Slightly tweak and clarify target_resume's interface
The problematic commit added the following two assertions:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/target.c;h=0cebecfafc379534180a6563d246cdbc3488311e;hb=HEAD#l2648

2648   gdb_assert (inferior_ptid != null_ptid);
2649   gdb_assert (inferior_ptid.matches (scope_ptid));

When they are removed, it seems to work as expected, as it did before.
I'm not super conversant with this area but I think that, most likely, patching out these asserts will let it work in some scenarios but crash in others. Probably the native target needs some extra work here, though unfortunately I don't know exactly what.
(In reply to Tom Tromey from comment #2)
> I'm not super conversant with this area but I think that, most likely,
> patching out these asserts will let it work in some scenarios but
> crash in others. Probably the native target needs some extra work here,
> though unfortunately I don't know exactly what.

I agree.

Can you run the same test case, but run "set debug infrun 1" first?
Created attachment 14771
assertion hit with set debug infrun 1

Thanks for looking at it. What are we looking for?
The problem is that the backend called target_resume without switching to the leader thread of that resumption.

- inferior_ptid is a global in gdb that points at the currently selected thread.
- inferior_ptid == null_ptid means no thread is selected at that point.
- The target_resume interface is:

  /* Resume execution (or prepare for execution) of the current thread
     (INFERIOR_PTID), while optionally letting other threads of the
     current process or all processes run free.
     ...

Thus calling target_resume with inferior_ptid == null_ptid is bogus.

target_wait (which leads to procfs_target::wait on Solaris) is called with inferior_ptid == null_ptid on entry exactly to help catch such bogus uses. It's procfs_target::wait that is calling target_resume incorrectly.

From the backtrace, it seems that the line in question is 2187, which has:

	      /* How to keep going without returning to wfi: */
	      target_continue_no_signal (ptid);
	      goto wait_again;

target_continue_no_signal is a small wrapper around target_resume, which would make sense. I guess we don't see target_continue_no_signal in the backtrace because that is an optimized gdb build.

I'm writing a fix.
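To make the invariant above concrete, here is a minimal standalone sketch. It is not GDB code: Ptid, model_target_resume and model_wait_hits_assertion are hypothetical stand-ins for ptid_t, target_resume and the wait path, and an exception replaces gdb_assert so the failure can be observed without aborting.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, simplified stand-in for GDB's ptid_t.
struct Ptid {
  int pid;
  int lwp;
  bool operator==(const Ptid &o) const { return pid == o.pid && lwp == o.lwp; }
  bool operator!=(const Ptid &o) const { return !(*this == o); }
};

const Ptid null_ptid{0, 0};

// Models the global that names the currently selected thread.
Ptid inferior_ptid = null_ptid;

// Models the precondition described above: a thread must be selected.
// The real code uses gdb_assert; an exception is an assumption of this sketch.
void model_target_resume(Ptid scope_ptid) {
  if (inferior_ptid == null_ptid)
    throw std::logic_error("inferior_ptid != null_ptid");
  (void)scope_ptid;  // a real target would resume threads matching this
}

// Models the failing pattern: the wait path is entered with no thread
// selected and then resumes through the interface that needs one.
// Returns true if the modelled assertion fired.
bool model_wait_hits_assertion(Ptid event_ptid) {
  inferior_ptid = null_ptid;  // state on entry to the wait path
  try {
    model_target_resume(event_ptid);
  } catch (const std::logic_error &) {
    return true;
  }
  return false;
}
```

Calling model_wait_hits_assertion with any ptid reports that the check fired, while selecting a thread first (assigning inferior_ptid) lets model_target_resume pass.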
Created attachment 14777
Fix

Could you give the attached patch a try?

I was able to confirm that gdb/procfs.c (the modified file) at least compiles using an old Solaris 11 VM I had lying around, but I wasn't able to build GDB fully due to other issues.
(In reply to Pedro Alves from comment #6)
> Created attachment 14777
> Fix
>
> Could you give the attached patch a try?

Yes. Thank you! The assertion is not hit in my test case.

> I was able to confirm that gdb/procfs.c (the modified file) at least
> compiles using an old Solaris 11 VM I had lying around, but I wasn't able to
> build GDB fully due to other issues.

There is a one-year-old release that you should be able to use for such testing:

https://blogs.oracle.com/solaris/post/building-open-source-software-on-oracle-solaris-114-cbe-release
I hit the same bug with gdb 13.2 on OpenIndiana (illumos based distro) and the attached patch solved the problem.
The master branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b2ad7bb9e6a012699195d3eda9d40679c406ebdc

commit b2ad7bb9e6a012699195d3eda9d40679c406ebdc
Author: Pedro Alves <pedro@palves.net>
Date:   Thu Jul 6 15:05:11 2023 +0100

    Fix Solaris regression (PR tdep/30252)

    PR tdep/30252 reports that using GDB on Solaris fails an assertion in
    target_resume:

      target.c:2648: internal-error: target_resume: Assertion `inferior_ptid != null_ptid' failed.
      A problem internal to GDB has been detected,
      further debugging may prove unreliable.
      Quit this debugging session? (y or n)

    The backtrace, after running it through c++filt, looks like:

      ----- Backtrace -----
      0xa18914 gdb_internal_backtrace_1
              /root/binutils-gdb/gdb/bt-utils.c:122
      0xa18914 gdb_internal_backtrace()
              /root/binutils-gdb/gdb/bt-utils.c:168
      0xdec834 internal_vproblem
              /root/binutils-gdb/gdb/utils.c:401
      0xdecad8 internal_verror(char const*, int, char const*, __va_list_tag*)
              /root/binutils-gdb/gdb/utils.c:481
      0xf3638c internal_error_loc(char const*, int, char const*, ...)
              /root/binutils-gdb/gdbsupport/errors.cc:58
      0xd70580 target_resume(ptid_t, int, gdb_signal)
              /root/binutils-gdb/gdb/target.c:2648
      0xc59e85 procfs_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>)
              /root/binutils-gdb/gdb/procfs.c:2187
      0xcf6da7 sol_thread_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>)
              /root/binutils-gdb/gdb/sol-thread.c:442
      0xd73711 target_wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>)
              /root/binutils-gdb/gdb/target.c:2586
      ...

    The problem is that the procfs backend, while inside target_wait,
    called target_resume without switching to the leader thread of that
    resumption. The target_resume interface is:

      /* Resume execution (or prepare for execution) of the current thread
         (INFERIOR_PTID), while optionally letting other threads of the
         current process or all processes run free.
         ...
    Thus calling target_resume with inferior_ptid == null_ptid is bogus.
    target_wait (which leads to procfs_target::wait on Solaris) is called
    with inferior_ptid == null_ptid on entry exactly to help catch such
    bogus uses.

    From the backtrace, it seems that the relevant line in question is
    procfs.c:2187:

      2186            /* How to keep going without returning to wfi: */
      2187            target_continue_no_signal (ptid);
      2188            goto wait_again;

    target_continue_no_signal is a small wrapper around target_resume,
    which would make sense.

    The fix is to not call target_resume or go via the target stack at
    all. Instead, factor out a new proc_resume function out of
    procfs_target::resume, and call that. The new function does not rely
    on inferior_ptid.

    I've not been able to test it myself, but Petr confirmed it fixes the
    assertion failure with his test case, and Marcel Telka also confirmed
    it solves the problem.

    Tested-By: Petr Šumbera <petr.sumbera@oracle.com>
    Tested-By: Marcel Telka <marcel@telka.sk>
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30252
    Change-Id: I6213c59b081d400a22e799ee621c2eff6dcafbf3
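The shape of the fix described in the commit message can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the actual patch: resume_lwp, resume_selected, wait_and_continue and the Ptid struct are hypothetical names, and the real signature of proc_resume is not given in this thread.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a thread identifier.
struct Ptid {
  int pid;
  int lwp;
};

// Stands in for the "currently selected thread" global; lwp 0 means none.
Ptid selected_thread{0, 0};

// Records which LWPs were resumed, so the behaviour can be checked.
std::vector<int> resumed_lwps;

// Factored-out helper: everything it needs is passed in explicitly, so it
// does not depend on the selected-thread global.
void resume_lwp(int lwp, bool step) {
  (void)step;  // a real backend would choose between stepping and continuing
  resumed_lwps.push_back(lwp);
}

// Public entry point: keeps the "a thread must be selected" precondition and
// delegates the actual work to the helper.
void resume_selected(bool step) {
  assert(selected_thread.lwp != 0);
  resume_lwp(selected_thread.lwp, step);
}

// Wait path: no thread is selected here, so it calls the helper directly
// instead of going through the public entry point.
int wait_and_continue(Ptid event) {
  selected_thread = Ptid{0, 0};  // state on entry to the wait path
  resume_lwp(event.lwp, false);
  return event.lwp;
}
```

The design point is that the precondition stays on the public entry point, where it catches misuse, while internal callers that legitimately run with no selected thread use the helper with explicit arguments.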
The gdb-13-branch branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=dfe07f10de81a38f595022556873d16585ea2b7e

commit dfe07f10de81a38f595022556873d16585ea2b7e
Author: Pedro Alves <pedro@palves.net>
Date:   Thu Jul 6 15:05:11 2023 +0100

    Fix Solaris regression (PR tdep/30252)
Thanks for the extra confirmation, Marcel. I would have forgotten about this patch without it.

I've now merged the patch to both master and the gdb 13 branch. I am not sure whether there will ever be another GDB 13 release (most likely there won't), but if you need to pull the fix for some downstream release, it's there.