Bug 25939 - [10 regression] run fails with ICE on Solaris
Summary: [10 regression] run fails with ICE on Solaris
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb (show other bugs)
Version: HEAD
: P2 normal
Target Milestone: 10.1
Assignee: Rainer Orth
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-05-07 11:11 UTC by Rainer Orth
Modified: 2020-06-22 18:19 UTC (History)
3 users (show)

See Also:
Host: *-*-solaris2.11
Target: *-*-solaris2.11
Build: *-*-solaris2.11
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Rainer Orth 2020-05-07 11:11:37 UTC
After quite some time, I just built gdb master on Solaris 11.4 (both SPARC and x86)
and found that, while it still compiles thanks to the buildbot, every single
execution test FAILs with an assertion failure:

$ ./gdb -D ./data-directory ./hello
GNU gdb (GDB) 10.0.50.20200106-git
[...]
Reading symbols from ./hello...
(gdb) run
Starting program: /vol/obj/gnu/gdb/gdb/reghunt/no-resync/122448/gdb/hello 
/vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:336: internal-error: thread_info::thread_info(inferior*, ptid_t): Assertion `inf_ != NULL' failed.

Looing at the changes to procfs.c within the last half year, I came across the
Multi-target patch as a likely culprit.  Indeed, the revision before it tested
ok (well, just as bad as always), while the multi-target one showed the problem
above, which renders gdb useless.

Since I know nothing about this code, I'd be greatful for suggestions where to
look.  A very first attempt found the following (debugging with gdb 9.1):

(top-gdb) run -D data-directory hello
Starting program: /vol/obj/gnu/gdb/gdb/reghunt/no-resync/122448/gdb/gdb -D data-directory hello
[...]
GNU gdb (GDB) 10.0.50.20200106-git
[...]
Reading symbols from hello...
(gdb) run
Starting program: /vol/obj/gnu/gdb/gdb/reghunt/no-resync/122448/gdb/hello 
[New Thread 2        ]
[...]
[New Thread 25        ]
[Switching to Thread 1 (LWP 1)]

Thread 2 hit Breakpoint 1, internal_error (During symbol reading: incomplete CFI data; unspecified registers (e.g., rax) at 0xc34524
file=file@entry=0x966150 "/vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c", line=line@entry=336, fmt=0x9ddb94 "%s: Assertion `%s' failed.") at /vol/src/gnu/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c:51
51	{
(top-gdb) where
#0  internal_error (
    file=file@entry=0x966150 "/vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c", line=line@entry=336, fmt=0x9ddb94 "%s: Assertion `%s' failed.")
    at /vol/src/gnu/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c:51
#1  0x0000000000ef81f4 in thread_info::thread_info (this=0x1212020, 
    inf_=<optimized out>, ptid_=...)
    at /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:344
#2  0x0000000000ef82cd in new_thread (inf=inf@entry=0x0, ptid=...)
    at /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:239
#3  0x0000000000efac3c in add_thread_silent (
    targ=targ@entry=0x11b0940 <the_procfs_target>, ptid=...)
    at /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:304
#4  0x0000000000d90692 in procfs_target::create_inferior (
    this=0x11b0940 <the_procfs_target>, 
    exec_file=0x13dbef0 "/vol/obj/gnu/gdb/gdb/reghunt/no-resync/122448/gdb/hello", allargs="", env=0x13c48f0, from_tty=<optimized out>)
    at /vol/src/gnu/gdb/hg/master/reghunt/gdb/gdbsupport/ptid.h:47
#5  0x0000000000c84e64 in run_command_1 (args=<optimized out>, from_tty=1, 
    run_how=run_how@entry=RUN_NORMAL)
    at /vol/gcc-9/include/c++/9.1.0/bits/basic_string.h:263
#6  0x0000000000c85007 in run_command (args=<optimized out>, 
    from_tty=<optimized out>)
    at /vol/src/gnu/gdb/hg/master/reghunt/gdb/infcmd.c:687
#7  0x0000000000b006c7 in do_const_cfunc (c=<optimized out>, args=<optimized out>, from_tty=<optimized out>) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/cli/cli-decode.c:107
#8  0x0000000000b02839 in cmd_func (During symbol reading: block at 0xa22919 out of order
cmd=cmd@entry=0x1338a30, args=args@entry=0x0, from_tty=from_tty@entry=1) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/cli/cli-decode.c:1952
#9  0x0000000000f07212 in execute_command (p=<optimized out>, p@entry=0x120b320 "", from_tty=1) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/top.c:653
#10 0x0000000000c00bf2 in command_handler (command=0x120b320 "") at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-top.c:587
#11 0x0000000000c01df7 in command_line_handler (rl=...) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-top.c:772
#12 0x0000000000c0172c in gdb_rl_callback_handler (rl=0x1211c80 "run") at /vol/gcc-9/include/c++/9.1.0/bits/unique_ptr.h:153
#13 0x0000000000fccba2 in rl_callback_read_char () at /vol/src/gnu/gdb/hg/master/reghunt/readline/readline/callback.c:281
#14 0x0000000000c012ad in gdb_rl_callback_read_char_wrapper_noexcept () at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-top.c:176
#15 gdb_rl_callback_read_char_wrapper (client_data=<optimized out>) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-top.c:193
#16 0x0000000000c004e5 in stdin_event_handler (error=<optimized out>, client_data=0x12109b0) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-top.c:515
#17 0x0000000000bff700 in handle_file_event (file_ptr=0x13f6fe0, ready_mask=<optimized out>) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-loop.c:731
#18 0x0000000000bff9ba in gdb_wait_for_event (block=block@entry=1) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-loop.c:857
#19 0x0000000000bffd1e in gdb_do_one_event () at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-loop.c:346
#20 0x0000000000c003cd in start_event_loop () at /vol/src/gnu/gdb/hg/master/reghunt/gdb/event-loop.c:370
#21 0x0000000000ce5fff in captured_command_loop () at /vol/src/gnu/gdb/hg/master/reghunt/gdb/main.c:360
#22 0x0000000000ce8ec9 in captured_main (data=0x7fffbfffecc0) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/main.c:1203
#23 gdb_main (args=args@entry=0x7fffbfffecf0) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/main.c:1218
#24 0x0000000000a1ca59 in main (argc=<optimized out>, argv=<optimized out>) at /vol/src/gnu/gdb/hg/master/reghunt/gdb/gdb.c:32

One thing immediately sticks out as fishy here: when starting the inferior (hello)
24 threads are created.  For one, this didn't happen before and the test program
doesn't create any threads by itself.  24 happens to be the number of cores on
the test system, btw.

I turned out that debugging this is useful with an optimized gdb binary, so will
rebuild with -g3 -O0 next.
Comment 1 Sourceware Commits 2020-05-18 15:57:49 UTC
The master branch has been updated by Rainer Orth <ro@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=7f2043399809c0ba5c4819172214371ed820e8c6

commit 7f2043399809c0ba5c4819172214371ed820e8c6
Author: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
Date:   Mon May 18 17:56:00 2020 +0200

    Remove unused ps_lgetLDT etc. on Solaris/x86 [PR25981]
    
    As reported in PR build/25981, a future Solaris 11.4 update will soon
    remove the short i386 register names like SS etc. from <sys/regset.h>.
    They could leak into user code (e.g. via <signal.h> -> <sys/signal.h> ->
    <sys/ucontext.h>) and pollute the user namespace.  Affected code would
    have a hard time avoiding the issue: LLVM is one of those.
    
    While the short names are required to be present by the i386 psABI, that
    document only demands that they exist in <ucontext.h>, which is what the
    upcoming update assures.
    
    With this change, in a 64-bit-default configuration, procfs.c fails to
    compile on Solaris/x86:
    
    /vol/src/gnu/gdb/hg/master/git/gdb/procfs.c: In function 'ssd* procfs_find_LDT_entry(ptid_t)':
    /vol/src/gnu/gdb/hg/master/git/gdb/procfs.c:1643:18: error: 'GS' was not declared in this scope
     1643 |   key = (*gregs)[GS] & 0xffff;
          |                  ^~
    make[2]: *** [Makefile:1607: procfs.o] Error 1
    
    Initially I meant to provide a definition using the planned replacement
    macro, but closer inspection revealed a better way.  procfs_find_LDT_entry
    and its helper proc_get_LDT_entry are only used to implement ps_lgetLDT,
    one of the callback functions required by libthread_db.so.1
    (cf. <proc_service.h>).  While that function is still documented as being
    required even in Solaris 11.4, I found that calls to it had been removed
    long ago in Solaris 9, so just removing the three functions above is the
    easiest fix.
    
    The following patch does just that.  It compiled successfully on
    amd64-pc-solaris2.11, however, as reported in PR gdb/25939, master is
    completely broken on Solaris since the multi-target patch.  The patch
    applies cleanly to the gdb-9 branch and there I could test it
    successfully.
    
            PR build/25981
            * procfs.c [(__i386__ || __x86_64__) && sun] (proc_get_LDT_entry,
            procfs_find_LDT_entry): Remove.
            * procfs.h [(__i386__ || __x86_64__) && sun] (struct ssd,
            procfs_find_LDT_entry): Remove.
            * sol-thread.c [(__i386__ || __x86_64__) && sun] (ps_lgetLDT):
            Remove.
Comment 2 Sourceware Commits 2020-05-19 08:04:38 UTC
The gdb-9-branch branch has been updated by Rainer Orth <ro@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ff6da943781d65f7ebc8e2e5ab52fb85590ea60b

commit ff6da943781d65f7ebc8e2e5ab52fb85590ea60b
Author: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
Date:   Tue May 19 10:03:14 2020 +0200

    Remove unused ps_lgetLDT etc. on Solaris/x86 [PR25981]
    
    As reported in PR build/25981, a future Solaris 11.4 update will soon
    remove the short i386 register names like SS etc. from <sys/regset.h>.
    They could leak into user code (e.g. via <signal.h> -> <sys/signal.h> ->
    <sys/ucontext.h>) and pollute the user namespace.  Affected code would
    have a hard time avoiding the issue: LLVM is one of those.
    
    While the short names are required to be present by the i386 psABI, that
    document only demands that they exist in <ucontext.h>, which is what the
    upcoming update assures.
    
    With this change, in a 64-bit-default configuration, procfs.c fails to
    compile on Solaris/x86:
    
    /vol/src/gnu/gdb/hg/master/git/gdb/procfs.c: In function 'ssd* procfs_find_LDT_entry(ptid_t)':
    /vol/src/gnu/gdb/hg/master/git/gdb/procfs.c:1643:18: error: 'GS' was not declared in this scope
     1643 |   key = (*gregs)[GS] & 0xffff;
          |                  ^~
    make[2]: *** [Makefile:1607: procfs.o] Error 1
    
    Initially I meant to provide a definition using the planned replacement
    macro, but closer inspection revealed a better way.  procfs_find_LDT_entry
    and its helper proc_get_LDT_entry are only used to implement ps_lgetLDT,
    one of the callback functions required by libthread_db.so.1
    (cf. <proc_service.h>).  While that function is still documented as being
    required even in Solaris 11.4, I found that calls to it had been removed
    long ago in Solaris 9, so just removing the three functions above is the
    easiest fix.
    
    The following patch does just that.  It compiled successfully on
    amd64-pc-solaris2.11, however, as reported in PR gdb/25939, master is
    completely broken on Solaris since the multi-target patch.  The patch
    applies cleanly to the gdb-9 branch and there I could test it
    successfully.
    
            PR build/25981
            * procfs.c [(__i386__ || __x86_64__) && sun] (proc_get_LDT_entry,
            procfs_find_LDT_entry): Remove.
            * procfs.h [(__i386__ || __x86_64__) && sun] (struct ssd,
            procfs_find_LDT_entry): Remove.
            * sol-thread.c [(__i386__ || __x86_64__) && sun] (ps_lgetLDT):
            Remove.
Comment 3 Joel Brobecker 2020-06-14 01:44:00 UTC
Hi Rainer,

Assigning this one to you, since you seem to be on it.

I see that you already pushed some changes for this PR. Does it mean we can now close it, or do you think there is still work to be done on that one?
Comment 4 Rainer Orth 2020-06-14 15:12:17 UTC
> --- Comment #3 from Joel Brobecker <brobecker at gnat dot com> ---
> Hi Rainer,
>
> Assigning this one to you, since you seem to be on it.

At least a bit, though I haven't made much progress.  Besides, I got
distracted by the GCC 10 release process, among others.

> I see that you already pushed some changes for this PR. Does it mean we can now
> close it, or do you think there is still work to be done on that one?

The two are unrelated: those patches were for compilation failures
affecting only the very latest Solaris 11.4 updates.  This bug is about
the failure to run a program on any Solaris version, caused by the
multi-target merge.

What I've found so far is that in thread.c (add_thread_silent) the call
to find_inferior_ptid returns NULL because the all_inferiors (targ)
iterator comes up empty.

Looking further, in add_thread_silent, I see

m_target_stack = {m_top = file_stratum, m_stack = {0x20190e0 <the_dummy_target>, 0x200b8c0 <exec_ops>, 0x0, 0x0, 0x0, 0x0, 0x0}}}

i.e. the_procfs_target is missing while Linux/x86_64 has (among others)
the_amd64_linux_nat_target.

I managed to get a bit further with this patch:

diff --git a/gdb/procfs.c b/gdb/procfs.c
--- a/gdb/procfs.c
+++ b/gdb/procfs.c
@@ -3086,6 +3090,9 @@ procfs_target::create_inferior (const ch
       shell_file = tryname;
     }
 
+  if (!target_is_pushed (this))
+    push_target (this);
+
   pid = fork_inferior (exec_file, allargs, env, procfs_set_exec_trap,
 		       NULL, NULL, shell_file, NULL);
 

However, now the startup fails with

procfs: couldn't find pid 0 in procinfo list.
procfs: init_inferior, open_proc_files line 2878, /proc/6031: No such file or directory.

I haven't got any further yet, unfortunately.
Comment 5 Sourceware Commits 2020-06-21 16:34:32 UTC
The master branch has been updated by Rainer Orth <ro@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=cf6f3e86ded2cd950f59a0f2c164f2c953ef534b

commit cf6f3e86ded2cd950f59a0f2c164f2c953ef534b
Author: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
Date:   Sun Jun 21 18:32:27 2020 +0200

    [PR gdb/25939] Move push_target call earlier in procfs.c
    
    Since the multi-target patch, the run command fails on Solaris with an
    assertion failure even for a trivial program:
    
    $ ./gdb -D ./data-directory ./hello
    GNU gdb (GDB) 10.0.50.20200106-git
    [...]
    Reading symbols from ./hello...
    (gdb) run
    Starting program: /vol/obj/gnu/gdb/gdb/reghunt/no-resync/122448/gdb/hello
    /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:336: internal-error:
    thread_info::thread_info(inferior*, ptid_t): Assertion `inf_ != NULL'
    failed.
    
    Here's the start of the corresponding stack trace:
    
    #0  internal_error (
        file=file@entry=0x966150
    "/vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c", line=line@entry=336,
    fmt=0x9ddb94 "%s: Assertion `%s' failed.")
        at /vol/src/gnu/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c:51
    #1  0x0000000000ef81f4 in thread_info::thread_info (this=0x1212020,
        inf_=<optimized out>, ptid_=...)
        at /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:344
    #2  0x0000000000ef82cd in new_thread (inf=inf@entry=0x0, ptid=...)
        at /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:239
    #3  0x0000000000efac3c in add_thread_silent (
        targ=targ@entry=0x11b0940 <the_procfs_target>, ptid=...)
        at /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:304
    #4  0x0000000000d90692 in procfs_target::create_inferior (
        this=0x11b0940 <the_procfs_target>,
        exec_file=0x13dbef0
    "/vol/obj/gnu/gdb/gdb/reghunt/no-resync/122448/gdb/hello", allargs="",
    env=0x13c48f0, from_tty=<optimized out>)
        at /vol/src/gnu/gdb/hg/master/reghunt/gdb/gdbsupport/ptid.h:47
    #5  0x0000000000c84e64 in run_command_1 (args=<optimized out>, from_tty=1,
        run_how=run_how@entry=RUN_NORMAL)
        at /vol/gcc-9/include/c++/9.1.0/bits/basic_string.h:263
    #6  0x0000000000c85007 in run_command (args=<optimized out>,
        from_tty=<optimized out>)
        at /vol/src/gnu/gdb/hg/master/reghunt/gdb/infcmd.c:687
    
    Looking closer, I found that in add_thread_silent as called from
    procfs.c (procfs_target::create_inferior) find_inferior_ptid returns
    NULL.  The all_inferiors (targ) iterator comes up empty.
    
    Going from there, I see that in add_thread_silent
    
    m_target_stack = {m_top = file_stratum, m_stack = {0x20190e0
    <the_dummy_target>, 0x200b8c0 <exec_ops>, 0x0, 0x0, 0x0, 0x0, 0x0}}}
    
    i.e. the_procfs_target is missing compared to the_amd64_linux_nat_target
    on Linux/x86_64.
    
    Moving the push_target call earlier allows debugging to get over the
    initial assertion failure.  I run instead into
    
    procfs: couldn't find pid 0 in procinfo list.
    
    which is fixed by
    
            https://sourceware.org/pipermail/gdb-patches/2020-June/169674.html
    
    Both patches tested together on amd64-pc-solaris2.11.
    
            PR gdb/25939
            * procfs.c (procfs_target::procfs_init_inferior): Move push_target
            call ...
            (procfs_target::create_inferior): ... here.
Comment 6 Sourceware Commits 2020-06-22 10:18:31 UTC
The master branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=f809832224cc45eb58812f6d4bb03cbf52fad980

commit f809832224cc45eb58812f6d4bb03cbf52fad980
Author: Pedro Alves <palves@redhat.com>
Date:   Mon Jun 22 10:54:08 2020 +0100

    Solaris, target_wait(), don't rely on inferior_ptid
    
    Debugging on Solaris is broken, with the procfs target backend failing
    with:
    
     procfs: couldn't find pid 0 in procinfo list.
    
    as soon as you start a program.
    
    The problem is procfs_target::wait assuming that inferior_ptid is
    meaningful on entry, but, since the multi-target series, inferior_ptid
    is null_ptid before we call target_wait, in infrun.c:
    
      static ptid_t
      do_target_wait_1 (inferior *inf, ptid_t ptid,
                        target_waitstatus *status, int options)
      {
    ...
        /* We know that we are looking for an event in the target of inferior
           INF, but we don't know which thread the event might come from.  As
           such we want to make sure that INFERIOR_PTID is reset so that none of
           the wait code relies on it - doing so is always a mistake.  */
        switch_to_inferior_no_thread (inf);
    
    This patch tweaks the backend to remove the assumption that
    inferior_ptid points at something.  sol-thread.c (the thread_stratum
    that sits on top of procfs.c) also has the same issue.
    
    Some spots in procfs_target::wait were returning
    TARGET_WAITKIND_SPURIOUS+inferior_ptid.  This commit replaces those
    with waiting again without returning to the core.  This fixes the
    relying on inferior_ptid, and also should fix the issue discussed
    here:
      https://sourceware.org/pipermail/gdb/2020-May/048616.html
      https://sourceware.org/pipermail/gdb/2020-June/048660.html
    
    gdb/ChangeLog:
    2020-06-22  Pedro Alves  <palves@redhat.com>
    
            PR gdb/25939
            * procfs.c (procfs_target::wait): Don't reference inferior_ptid.
            Use the current inferior instead.  Don't return
            TARGET_WAITKIND_SPURIOUS/inferior_ptid -- instead continue and
            wait again.
            * sol-thread.c (sol_thread_target::wait): Don't reference
            inferior_ptid.
            (ps_lgetregs, ps_lsetregs, ps_lgetfpregs, ps_lsetfpregs)
            (sol_update_thread_list_callback): Use the current inferior's pid
            instead of inferior_ptid.
Comment 7 Pedro Alves 2020-06-22 10:20:25 UTC
Fixes pushed.
Comment 8 Joel Brobecker 2020-06-22 18:19:03 UTC
Just a quick note to say Thank You to both of you guys. You rock :).