Unbreaking gdb on Solaris post-multitarget [PR 25939]

Rainer Orth ro@CeBiTec.Uni-Bielefeld.DE
Wed Jun 17 14:45:51 GMT 2020


Hi Pedro,

> On 6/16/20 3:21 PM, Rainer Orth wrote:
>> Some time ago, when testing gdb master on Solaris again after several
>> months, I discovered that gdb couldn't execute even a trivial program
>> anymore.  This had gone unnoticed by the Solaris buildbots since the
>> code continued to compile just fine.  Those bots are build-only since
>> many tests (especially thread tests) are either flaky or time out.
>> 
>> A reghunt identified the multi-target merge as the culprit.
>
> I'm sorry about that.

No worries: the Solaris port had been in relatively bad shape even
before, so maybe this will allow us to get to the bottom of things and
fix them.

>> I've managed to get a bit further with the following patch which is
>> intended to push the procfs target first:
>
> That patch looks good to me.

Thanks.

>> However, while I now get over the initial assertion failure, I run
>> instead into
>> 
>> procfs: couldn't find pid 0 in procinfo list.
>> procfs: init_inferior, open_proc_files line 2878, /proc/6031: No such file or directory.
>> 
>> When I break in procfs.c (procfs_init_inferior), I can see that
>> create_procinfo succeeds.  However, looking at the process tree at this
>> point, I see that the debuggee is still marked as defunct
>> 
>>                   18377 /vol/gcc/bin/gdb -i=mi /vol/gnu/obj/gdb/gdb/reghunt/no-r
>>                     18379 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb 
>>                       18382 <defunct>
>> 
>> so open_procinfo_files fails because /proc/<pid> only contains psinfo
>> and usage, but no ctl file yet.
>> 
>> I tried to do the same with a version of gdb from immediately before the
>> multi-target merge: while that can run a test program interactively just
>> fine, 
>
> It's not clear to me whether you're saying that a version from before
> the multi-target changes can run a test program fine due to not needing
> the push_target fix, or whether the multi-target patchset itself caused
> this second issue you're observing even when debugging a simple hello
> program.

I experimented a bit more yesterday.  Immediately before the
multi-target patch, I have:

$ cat top-gdb.gdb
file ./gdb
run -q -D data-directory -x bottom-gdb.gdb
$ cat bottom-gdb.gdb
file ./hello
b main
run
$ gdb-9 -q -x top-gdb.gdb
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x196c898: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x179e138: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/cli/cli-cmds.c, line 201.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
[New LWP    3        ]
[New LWP    4        ]
[New LWP    5        ]
[New LWP    6        ]
[New LWP    7        ]
[New LWP    8        ]
[New LWP    9        ]
Breakpoint 1 at 0x401036: file hello.c, line 6.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[Switching to Thread 1 (LWP 1)]

Thread 2 hit Breakpoint 1, main () at hello.c:6
6	  printf ("Hello world\n");

At that point the process hierarchy is as expected:

                22745 gdb-9 -q -x top-gdb.gdb
                  22761 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122456/gdb/gdb -q
                    22768 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122456/gdb/hell

With the multi-target merge, my push_target patch applied, and the
worker threads disabled (more below), I get instead

$ gdb -q -x ~/top-gdb.gdb 
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x197ca44: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x17adf8a: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/cli/cli-cmds.c, line 201.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb.gdb:3: Error in sourced command file:
procfs: couldn't find pid 0 in procinfo list.

and this process tree:

                23011 gdb-9 -q -x top-gdb.gdb
                  23012 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb -q
                    23013 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/hell

However, if I add

b find_procinfo_or_die

to investigate the above error ("couldn't find pid 0"), with the mt
patch I get

Setting up the environment for debugging gdb.
Breakpoint 1 at 0x197ca44: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x17adf8a: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/cli/cli-cmds.c, line 201.
Breakpoint 3 at 0x1afc288: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/procfs.c, line 327.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb.gdb:3: Error in sourced command file:
procfs: init_inferior, open_proc_files line 2879, /proc/23022: No such file or directory.
[Switching to Thread 1 (LWP 1)]

Thread 2 hit Breakpoint 3, find_procinfo_or_die (pid=23022, tid=0)
    at /vol/gnu/src/gdb/hg/master/reghunt/gdb/procfs.c:327
327	  procinfo *pi = find_procinfo (pid, tid);

which is no wonder given the child process is marked as defunct, so its
/proc files cannot be opened:

                23020 gdb-9 -q -x top-gdb.gdb
                  23021 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb -q
                    23022 <defunct>
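
As noted earlier in the thread, the /proc directory of the defunct
process only had psinfo and usage, but no ctl.  A minimal sketch of that
check, in case someone wants to script it: missing_proc_files is a
made-up helper name, not anything in gdb, and the result is of course
only a snapshot, so it is just as racy as looking at the process tree by
hand.

```python
import os


def missing_proc_files(proc_dir, needed=("ctl",)):
    """Return the names in `needed` that are absent from proc_dir.

    Hypothetical helper for illustration only.  Returns None when
    proc_dir itself does not exist (the process is already gone).
    """
    try:
        present = set(os.listdir(proc_dir))
    except FileNotFoundError:
        return None
    # Keep the order of `needed` so the report is stable.
    return [name for name in needed if name not in present]
```

Calling missing_proc_files("/proc/23022") while the inferior gdb sits at
the breakpoint would be expected to report ctl as missing in the
situation described above.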

However, when I try the same in the pre-mt-patch gdb:

Setting up the environment for debugging gdb.
Breakpoint 1 at 0x196c898: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x179e138: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/cli/cli-cmds.c, line 201.
Breakpoint 3 at 0x1ae7e26: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/procfs.c, line 325.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
[New LWP    3        ]
[New LWP    4        ]
[New LWP    5        ]
[New LWP    6        ]
[New LWP    7        ]
[New LWP    8        ]
[New LWP    9        ]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb.gdb:3: Error in sourced command file:
procfs: init_inferior, open_proc_files line 2870, /proc/23028: No such file or directory.
[New Thread 2        ]
[New Thread 3        ]
[New Thread 4        ]
[New Thread 5        ]
[New Thread 6        ]
[New Thread 7        ]
[New Thread 8        ]
[New Thread 9        ]
[Switching to Thread 1 (LWP 1)]

Thread 2 hit Breakpoint 3, find_procinfo_or_die (pid=23028, tid=0) at /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/procfs.c:325
325	  procinfo *pi = find_procinfo (pid, tid);

I get the same error and the same defunct process:

                23026 gdb-9 -q -x top-gdb.gdb
                  23027 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122456/gdb/gdb -q
                    23028 <defunct>

This obviously makes debugging extra hard ;-(  However, this error isn't
entirely new: when running the gdb testsuite before the mt merge, I get
several variations of it:

$ grep -a "couldn't find pid" gdb.log |sort|uniq -c
      2 Error in re-setting breakpoint 2: procfs: couldn't find pid 0 in procinfo list.
      2 Error in re-setting breakpoint 5: procfs: couldn't find pid 0 in procinfo list.
     99 procfs: couldn't find pid -1 in procinfo list.
     22 procfs: couldn't find pid 0 in procinfo list.
      5 procfs: couldn't find pid 21415 in procinfo list.
      5 procfs: couldn't find pid 21618 in procinfo list.
     10 procfs: couldn't find pid 22032 in procinfo list.
      5 procfs: couldn't find pid 22457 in procinfo list.
      5 procfs: couldn't find pid 22678 in procinfo list.
     10 procfs: couldn't find pid 22985 in procinfo list.
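
The individual pids matter less than which class they fall into (-1, 0,
or a real pid).  A rough sketch of the same tally grouped that way;
tally_pid_errors and the regex are my own illustrative assumptions based
on the message text above, not part of gdb or its testsuite:

```python
import re
from collections import Counter

# Matches the procfs error text as it appears in gdb.log; the pid may
# be negative.
PID_ERROR = re.compile(r"couldn't find pid (-?\d+) in procinfo list")


def tally_pid_errors(lines):
    """Count errors by pid class: '-1', '0', or 'real' for any other pid."""
    counts = Counter()
    for line in lines:
        match = PID_ERROR.search(line)
        if match is None:
            continue  # not one of the procfs errors
        pid = match.group(1)
        counts[pid if pid in ("-1", "0") else "real"] += 1
    return counts
```

It takes any iterable of lines, so an open gdb.log works directly; the
log should be opened with errors="replace", for the same reason grep
needs -a above.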

>> running that gdb under gdb itself most often leads to the same
>> error.  This very much seems like a race condition to me, but at the
>> moment I'm pretty much at a loss how to investigate this further.
>
> Could this be a race somehow more exposed now due to GDB now spawning worker
> threads?  What happens if you debug a GDB that doesn't spawn worker
> threads?  Like:
>
> ./gdb -D ./data-directory --args ./gdb -ex "maint set worker-threads 0"

This doesn't work because master gdb cannot debug anything, with or
without the push_target fix.

When instead I use a gdb 9.1 as top gdb, I get

$ gdb-9 -q --args ./gdb -D data-directory -ex "maint set worker-threads 0"
Reading symbols from ./gdb...
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x197ca44: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x17adf8a: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/cli/cli-cmds.c, line 201.
(top-gdb) run
Starting program: /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb can't handle command-line argument containing whitespace

When instead I use

$ cat top-gdb-mt.gdb
file ./gdb-mt
run -q -D data-directory -x bottom-gdb-mt.gdb
$ cat bottom-gdb-mt.gdb
maint set worker-threads 0
file ./hello
b main
run
$ gdb-9 -q -x top-gdb-mt.gdb
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
[New LWP    3        ]
[New LWP    4        ]
[New LWP    5        ]
[New LWP    6        ]
[New LWP    7        ]
[New LWP    8        ]
[New LWP    9        ]
[LWP    8         exited]
[New LWP    8        ]
[LWP    6         exited]
[New LWP    6        ]
[LWP    9         exited]
[New LWP    9        ]
[LWP    5         exited]
[New LWP    5        ]
[LWP    7         exited]
[New LWP    7        ]
[LWP    2         exited]
[New LWP    2        ]
[LWP    3         exited]
[New LWP    3        ]
[LWP    4         exited]
[New LWP    4        ]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb-mt.gdb:4: Error in sourced command file:
procfs: couldn't find pid 0 in procinfo list.

> Does that problem trigger as often that way?

The failure is still reproducible that way, but even more verbose
(imagine that on that 160-core system I spoke of ;-)

To avoid that for the moment, I've changed n_worker_threads to 0.

> Or, what happens if you use master GDB with your push_target fix
> to debug an older GDB?

Master GDB cannot debug anything, unfortunately.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

