After quite some time, I just built gdb master on Solaris 11.4 (both SPARC and x86) and found that, while it still compiles thanks to the buildbot, every single execution test FAILs with an assertion failure: $ ./gdb -D ./data-directory ./hello GNU gdb (GDB) 10.0.50.20200106-git [...] Reading symbols from ./hello... (gdb) run Starting program: /vol/obj/gnu/gdb/gdb/reghunt/no-resync/122448/gdb/hello /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:336: internal-error: thread_info::thread_info(inferior*, ptid_t): Assertion `inf_ != NULL' failed. Looing at the changes to procfs.c within the last half year, I came across the Multi-target patch as a likely culprit. Indeed, the revision before it tested ok (well, just as bad as always), while the multi-target one showed the problem above, which renders gdb useless. Since I know nothing about this code, I'd be greatful for suggestions where to look. A very first attempt found the following (debugging with gdb 9.1): (top-gdb) run -D data-directory hello Starting program: /vol/obj/gnu/gdb/gdb/reghunt/no-resync/122448/gdb/gdb -D data-directory hello [...] GNU gdb (GDB) 10.0.50.20200106-git [...] Reading symbols from hello... (gdb) run Starting program: /vol/obj/gnu/gdb/gdb/reghunt/no-resync/122448/gdb/hello [New Thread 2 ] [...] [New Thread 25 ] [Switching to Thread 1 (LWP 1)] Thread 2 hit Breakpoint 1, internal_error (During symbol reading: incomplete CFI data; unspecified registers (e.g., rax) at 0xc34524 file=file@entry=0x966150 "/vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c", line=line@entry=336, fmt=0x9ddb94 "%s: Assertion `%s' failed.") at /vol/src/gnu/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c:51 51 { (top-gdb) where #0 internal_error ( file=file@entry=0x966150 "/vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c", line=line@entry=336, fmt=0x9ddb94 "%s: Assertion `%s' failed.") at /vol/src/gnu/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c:51 #1 0x0000000000ef81f4 in thread_info::thread_info (this=0x1212020, inf_=<optimized out>, ptid_=...) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:344 #2 0x0000000000ef82cd in new_thread (inf=inf@entry=0x0, ptid=...) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:239 #3 0x0000000000efac3c in add_thread_silent ( targ=targ@entry=0x11b0940 <the_procfs_target>, ptid=...) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:304 #4 0x0000000000d90692 in procfs_target::create_inferior ( this=0x11b0940 <the_procfs_target>, exec_file=0x13dbef0 "/vol/obj/gnu/gdb/gdb/reghunt/no-resync/122448/gdb/hello", allargs="", env=0x13c48f0, from_tty=<optimized out>) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/gdbsupport/ptid.h:47 #5 0x0000000000c84e64 in run_command_1 (args=<optimized out>, from_tty=1, run_how=run_how@entry=RUN_NORMAL) at /vol/gcc-9/include/c++/9.1.0/bits/basic_string.h:263 #6 0x0000000000c85007 in run_command (args=<optimized out>, from_tty=<optimized out>) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/infcmd.c:687 #7 0x0000000000b006c7 in do_const_cfunc (c=<optimized out>, args=<optimized out>, from_tty=<optimized out>) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/cli/cli-decode.c:107 #8 0x0000000000b02839 in cmd_func (During symbol reading: block at 0xa22919 out of order cmd=cmd@entry=0x1338a30, args=args@entry=0x0, from_tty=from_tty@entry=1) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/cli/cli-decode.c:1952 #9 0x0000000000f07212 in execute_command (p=<optimized out>, p@entry=0x120b320 "", from_tty=1) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/top.c:653 #10 0x0000000000c00bf2 in command_handler (command=0x120b320 "") at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-top.c:587 #11 0x0000000000c01df7 in command_line_handler (rl=...) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-top.c:772 #12 0x0000000000c0172c in gdb_rl_callback_handler (rl=0x1211c80 "run") at /vol/gcc-9/include/c++/9.1.0/bits/unique_ptr.h:153 #13 0x0000000000fccba2 in rl_callback_read_char () at /vol/src/gnu/gdb/hg/master/reghunt/readline/readline/callback.c:281 #14 0x0000000000c012ad in gdb_rl_callback_read_char_wrapper_noexcept () at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-top.c:176 #15 gdb_rl_callback_read_char_wrapper (client_data=<optimized out>) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-top.c:193 #16 0x0000000000c004e5 in stdin_event_handler (error=<optimized out>, client_data=0x12109b0) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-top.c:515 #17 0x0000000000bff700 in handle_file_event (file_ptr=0x13f6fe0, ready_mask=<optimized out>) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-loop.c:731 #18 0x0000000000bff9ba in gdb_wait_for_event (block=block@entry=1) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-loop.c:857 #19 0x0000000000bffd1e in gdb_do_one_event () at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-loop.c:346 #20 0x0000000000c003cd in start_event_loop () at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-loop.c:370 #21 0x0000000000ce5fff in captured_command_loop () at /vol/src/gnu/gdb/hg/master/reghunt/gdb/main.c:360 #22 0x0000000000ce8ec9 in captured_main (data=0x7fffbfffecc0) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/main.c:1203 #23 gdb_main (args=args@entry=0x7fffbfffecf0) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/main.c:1218 #24 0x0000000000a1ca59 in main (argc=<optimized out>, argv=<optimized out>) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/gdb.c:32 One thing immediately sticks out as fishy here: when starting the inferior (hello) 24 threads are created. For one, this didn't happen before and the test program doesn't create any threads by itself. 24 happens to be the number of cores on the test system, btw. I turned out that debugging this is useful with an optimized gdb binary, so will rebuild with -g3 -O0 next.
The master branch has been updated by Rainer Orth <ro@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=7f2043399809c0ba5c4819172214371ed820e8c6 commit 7f2043399809c0ba5c4819172214371ed820e8c6 Author: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> Date: Mon May 18 17:56:00 2020 +0200 Remove unused ps_lgetLDT etc. on Solaris/x86 [PR25981] As reported in PR build/25981, a future Solaris 11.4 update will soon remove the short i386 register names like SS etc. from <sys/regset.h>. They could leak into user code (e.g. via <signal.h> -> <sys/signal.h> -> <sys/ucontext.h>) and pollute the user namespace. Affected code would have a hard time avoiding the issue: LLVM is one of those. While the short names are required to be present by the i386 psABI, that document only demands that they exist in <ucontext.h>, which is what the upcoming update assures. With this change, in a 64-bit-default configuration, procfs.c fails to compile on Solaris/x86: /vol/src/gnu/gdb/hg/master/git/gdb/procfs.c: In function 'ssd* procfs_find_LDT_entry(ptid_t)': /vol/src/gnu/gdb/hg/master/git/gdb/procfs.c:1643:18: error: 'GS' was not declared in this scope 1643 | key = (*gregs)[GS] & 0xffff; | ^~ make[2]: *** [Makefile:1607: procfs.o] Error 1 Initially I meant to provide a definition using the planned replacement macro, but closer inspection revealed a better way. procfs_find_LDT_entry and its helper proc_get_LDT_entry are only used to implement ps_lgetLDT, one of the callback functions required by libthread_db.so.1 (cf. <proc_service.h>). While that function is still documented as being required even in Solaris 11.4, I found that calls to it had been removed long ago in Solaris 9, so just removing the three functions above is the easiest fix. The following patch does just that. It compiled successfully on amd64-pc-solaris2.11, however, as reported in PR gdb/25939, master is completely broken on Solaris since the multi-target patch. The patch applies cleanly to the gdb-9 branch and there I could test it successfully. PR build/25981 * procfs.c [(__i386__ || __x86_64__) && sun] (proc_get_LDT_entry, procfs_find_LDT_entry): Remove. * procfs.h [(__i386__ || __x86_64__) && sun] (struct ssd, procfs_find_LDT_entry): Remove. * sol-thread.c [(__i386__ || __x86_64__) && sun] (ps_lgetLDT): Remove.
The gdb-9-branch branch has been updated by Rainer Orth <ro@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ff6da943781d65f7ebc8e2e5ab52fb85590ea60b commit ff6da943781d65f7ebc8e2e5ab52fb85590ea60b Author: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> Date: Tue May 19 10:03:14 2020 +0200 Remove unused ps_lgetLDT etc. on Solaris/x86 [PR25981] As reported in PR build/25981, a future Solaris 11.4 update will soon remove the short i386 register names like SS etc. from <sys/regset.h>. They could leak into user code (e.g. via <signal.h> -> <sys/signal.h> -> <sys/ucontext.h>) and pollute the user namespace. Affected code would have a hard time avoiding the issue: LLVM is one of those. While the short names are required to be present by the i386 psABI, that document only demands that they exist in <ucontext.h>, which is what the upcoming update assures. With this change, in a 64-bit-default configuration, procfs.c fails to compile on Solaris/x86: /vol/src/gnu/gdb/hg/master/git/gdb/procfs.c: In function 'ssd* procfs_find_LDT_entry(ptid_t)': /vol/src/gnu/gdb/hg/master/git/gdb/procfs.c:1643:18: error: 'GS' was not declared in this scope 1643 | key = (*gregs)[GS] & 0xffff; | ^~ make[2]: *** [Makefile:1607: procfs.o] Error 1 Initially I meant to provide a definition using the planned replacement macro, but closer inspection revealed a better way. procfs_find_LDT_entry and its helper proc_get_LDT_entry are only used to implement ps_lgetLDT, one of the callback functions required by libthread_db.so.1 (cf. <proc_service.h>). While that function is still documented as being required even in Solaris 11.4, I found that calls to it had been removed long ago in Solaris 9, so just removing the three functions above is the easiest fix. The following patch does just that. It compiled successfully on amd64-pc-solaris2.11, however, as reported in PR gdb/25939, master is completely broken on Solaris since the multi-target patch. The patch applies cleanly to the gdb-9 branch and there I could test it successfully. PR build/25981 * procfs.c [(__i386__ || __x86_64__) && sun] (proc_get_LDT_entry, procfs_find_LDT_entry): Remove. * procfs.h [(__i386__ || __x86_64__) && sun] (struct ssd, procfs_find_LDT_entry): Remove. * sol-thread.c [(__i386__ || __x86_64__) && sun] (ps_lgetLDT): Remove.
Hi Rainer, Assigning this one to you, since you seem to be on it. I see that you already pushed some changes for this PR. Does it mean we can now close it, or do you think there is still work to be done on that one?
> --- Comment #3 from Joel Brobecker <brobecker at gnat dot com> --- > Hi Rainer, > > Assigning this one to you, since you seem to be on it. At least a bit, though I haven't made much progress. Besides, I got distracted by the GCC 10 release process, among others. > I see that you already pushed some changes for this PR. Does it mean we can now > close it, or do you think there is still work to be done on that one? The two are unrelated: those patches were for compilation failures affecting only the very latest Solaris 11.4 updates. This bug is about the failure to run a program on any Solaris version, caused by the multi-target merge. What I've found so far is that in thread.c (add_thread_silent) the call to find_inferior_ptid returns NULL because the all_inferiors (targ) iterator comes up empty. Looking further, in add_thread_silent, I see m_target_stack = {m_top = file_stratum, m_stack = {0x20190e0 <the_dummy_target>, 0x200b8c0 <exec_ops>, 0x0, 0x0, 0x0, 0x0, 0x0}}} i.e. the_procfs_target is missing while Linux/x86_64 has (among others) the_amd64_linux_nat_target. I managed to get a bit further with this patch: diff --git a/gdb/procfs.c b/gdb/procfs.c --- a/gdb/procfs.c +++ b/gdb/procfs.c @@ -3086,6 +3090,9 @@ procfs_target::create_inferior (const ch shell_file = tryname; } + if (!target_is_pushed (this)) + push_target (this); + pid = fork_inferior (exec_file, allargs, env, procfs_set_exec_trap, NULL, NULL, shell_file, NULL); However, now the startup fails with procfs: couldn't find pid 0 in procinfo list. procfs: init_inferior, open_proc_files line 2878, /proc/6031: No such file or directory. I haven't got any further yet, unfortunately.
The master branch has been updated by Rainer Orth <ro@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=cf6f3e86ded2cd950f59a0f2c164f2c953ef534b commit cf6f3e86ded2cd950f59a0f2c164f2c953ef534b Author: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> Date: Sun Jun 21 18:32:27 2020 +0200 [PR gdb/25939] Move push_target call earlier in procfs.c Since the multi-target patch, the run command fails on Solaris with an assertion failure even for a trivial program: $ ./gdb -D ./data-directory ./hello GNU gdb (GDB) 10.0.50.20200106-git [...] Reading symbols from ./hello... (gdb) run Starting program: /vol/obj/gnu/gdb/gdb/reghunt/no-resync/122448/gdb/hello /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:336: internal-error: thread_info::thread_info(inferior*, ptid_t): Assertion `inf_ != NULL' failed. Here's the start of the corresponding stack trace: #0 internal_error ( file=file@entry=0x966150 "/vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c", line=line@entry=336, fmt=0x9ddb94 "%s: Assertion `%s' failed.") at /vol/src/gnu/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c:51 #1 0x0000000000ef81f4 in thread_info::thread_info (this=0x1212020, inf_=<optimized out>, ptid_=...) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:344 #2 0x0000000000ef82cd in new_thread (inf=inf@entry=0x0, ptid=...) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:239 #3 0x0000000000efac3c in add_thread_silent ( targ=targ@entry=0x11b0940 <the_procfs_target>, ptid=...) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:304 #4 0x0000000000d90692 in procfs_target::create_inferior ( this=0x11b0940 <the_procfs_target>, exec_file=0x13dbef0 "/vol/obj/gnu/gdb/gdb/reghunt/no-resync/122448/gdb/hello", allargs="", env=0x13c48f0, from_tty=<optimized out>) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/gdbsupport/ptid.h:47 #5 0x0000000000c84e64 in run_command_1 (args=<optimized out>, from_tty=1, run_how=run_how@entry=RUN_NORMAL) at /vol/gcc-9/include/c++/9.1.0/bits/basic_string.h:263 #6 0x0000000000c85007 in run_command (args=<optimized out>, from_tty=<optimized out>) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/infcmd.c:687 Looking closer, I found that in add_thread_silent as called from procfs.c (procfs_target::create_inferior) find_inferior_ptid returns NULL. The all_inferiors (targ) iterator comes up empty. Going from there, I see that in add_thread_silent m_target_stack = {m_top = file_stratum, m_stack = {0x20190e0 <the_dummy_target>, 0x200b8c0 <exec_ops>, 0x0, 0x0, 0x0, 0x0, 0x0}}} i.e. the_procfs_target is missing compared to the_amd64_linux_nat_target on Linux/x86_64. Moving the push_target call earlier allows debugging to get over the initial assertion failure. I run instead into procfs: couldn't find pid 0 in procinfo list. which is fixed by https://sourceware.org/pipermail/gdb-patches/2020-June/169674.html Both patches tested together on amd64-pc-solaris2.11. PR gdb/25939 * procfs.c (procfs_target::procfs_init_inferior): Move push_target call ... (procfs_target::create_inferior): ... here.
The master branch has been updated by Pedro Alves <palves@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=f809832224cc45eb58812f6d4bb03cbf52fad980 commit f809832224cc45eb58812f6d4bb03cbf52fad980 Author: Pedro Alves <palves@redhat.com> Date: Mon Jun 22 10:54:08 2020 +0100 Solaris, target_wait(), don't rely on inferior_ptid Debugging on Solaris is broken, with the procfs target backend failing with: procfs: couldn't find pid 0 in procinfo list. as soon as you start a program. The problem is procfs_target::wait assuming that inferior_ptid is meaningful on entry, but, since the multi-target series, inferior_ptid is null_ptid before we call target_wait, in infrun.c: static ptid_t do_target_wait_1 (inferior *inf, ptid_t ptid, target_waitstatus *status, int options) { ... /* We know that we are looking for an event in the target of inferior INF, but we don't know which thread the event might come from. As such we want to make sure that INFERIOR_PTID is reset so that none of the wait code relies on it - doing so is always a mistake. */ switch_to_inferior_no_thread (inf); This patch tweaks the backend to remove the assumption that inferior_ptid points at something. sol-thread.c (the thread_stratum that sits on top of procfs.c) also has the same issue. Some spots in procfs_target::wait were returning TARGET_WAITKIND_SPURIOUS+inferior_ptid. This commit replaces those with waiting again without returning to the core. This fixes the relying on inferior_ptid, and also should fix the issue discussed here: https://sourceware.org/pipermail/gdb/2020-May/048616.html https://sourceware.org/pipermail/gdb/2020-June/048660.html gdb/ChangeLog: 2020-06-22 Pedro Alves <palves@redhat.com> PR gdb/25939 * procfs.c (procfs_target::wait): Don't reference inferior_ptid. Use the current inferior instead. Don't return TARGET_WAITKIND_SPURIOUS/inferior_ptid -- instead continue and wait again. * sol-thread.c (sol_thread_target::wait): Don't reference inferior_ptid. (ps_lgetregs, ps_lsetregs, ps_lgetfpregs, ps_lsetfpregs) (sol_update_thread_list_callback): Use the current inferior's pid instead of inferior_ptid.
Fixes pushed.
Just a quick note to say Thank You to both of you guys. You rock :).