Unbreaking gdb on Solaris post-multitarget [PR 25939]
Rainer Orth
ro@CeBiTec.Uni-Bielefeld.DE
Wed Jun 17 14:45:51 GMT 2020
Hi Pedro,
> On 6/16/20 3:21 PM, Rainer Orth wrote:
>> Some time ago, when testing gdb master on Solaris again after several
>> months, I discovered that gdb couldn't execute even a trivial program
>> anymore. This had gone unnoticed by the Solaris buildbots since the
>> code continued to compile just fine. Those bots are build-only since
>> many tests (especially thread tests) are either flaky or time out.
>>
>> A reghunt identified the multi-target merge as the culprit.
>
> I'm sorry about that.
No worries: the Solaris port had been in relatively bad shape even
before, so maybe this will allow us to get to the bottom of things and fix
them.
>> I've managed to get a bit further with the following patch which is
>> intended to push the procfs target first:
>
> That patch looks good to me.
Thanks.
>> However, while I now get over the initial assertion failure, I run
>> instead into
>>
>> procfs: couldn't find pid 0 in procinfo list.
>> procfs: init_inferior, open_proc_files line 2878, /proc/6031: No such file or directory.
>>
>> When I break in procfs.c (procfs_init_inferior), I can see that
>> create_procinfo succeeds. However, looking at the process tree at this
>> point, I see that the debuggee is still marked as defunct
>>
>> 18377 /vol/gcc/bin/gdb -i=mi /vol/gnu/obj/gdb/gdb/reghunt/no-r
>> 18379 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb
>> 18382 <defunct>
>>
>> so open_procinfo_files fails because /proc/<pid> only contains psinfo
>> and usage, but no ctl file yet.
>>
>> I tried to do the same with a version of gdb from immediately before the
>> multi-target merge: while that can run a test program interactively just
>> fine,
>
> It's not clear to me whether you're saying that a version from before
> the multi-target changes can run a test program fine due to not needing
> the push_target fix, or whether the multi-target patchset itself caused
> this second issue you're observing even when debugging a simple hello
> program.
I experimented a bit more yesterday. With a gdb built immediately before
the multi-target patch, I have:
$ cat top-gdb.gdb
file ./gdb
run -q -D data-directory -x bottom-gdb.gdb
$ cat bottom-gdb.gdb
file ./hello
b main
run
$ gdb-9 -q -x top-gdb.gdb
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x196c898: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x179e138: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/cli/cli-cmds.c, line 201.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
[New LWP 3 ]
[New LWP 4 ]
[New LWP 5 ]
[New LWP 6 ]
[New LWP 7 ]
[New LWP 8 ]
[New LWP 9 ]
Breakpoint 1 at 0x401036: file hello.c, line 6.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[Switching to Thread 1 (LWP 1)]
Thread 2 hit Breakpoint 1, main () at hello.c:6
6 printf ("Hello world\n");
At that point the process hierarchy is as expected:
22745 gdb-9 -q -x top-gdb.gdb
22761 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122456/gdb/gdb -q
22768 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122456/gdb/hell
With the multi-target merge, my push_target patch applied and worker threads
disabled (more below), I get instead:
$ gdb -q -x ~/top-gdb.gdb
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x197ca44: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x17adf8a: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/cli/cli-cmds.c, line 201.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb.gdb:3: Error in sourced command file:
procfs: couldn't find pid 0 in procinfo list.
and this process tree:
23011 gdb-9 -q -x top-gdb.gdb
23012 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb -q
23013 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/hell
However, if I add
b find_procinfo_or_die
to investigate the above error ("couldn't find pid 0"), with the mt patch
I get
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x197ca44: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x17adf8a: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/cli/cli-cmds.c, line 201.
Breakpoint 3 at 0x1afc288: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/procfs.c, line 327.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb.gdb:3: Error in sourced command file:
procfs: init_inferior, open_proc_files line 2879, /proc/23022: No such file or directory.
[Switching to Thread 1 (LWP 1)]
Thread 2 hit Breakpoint 3, find_procinfo_or_die (pid=23022, tid=0)
at /vol/gnu/src/gdb/hg/master/reghunt/gdb/procfs.c:327
327 procinfo *pi = find_procinfo (pid, tid);
which is no surprise, given that the child process is marked as defunct,
so its /proc files cannot be opened:
23020 gdb-9 -q -x top-gdb.gdb
23021 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb -q
23022 <defunct>
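To confirm the defunct-process explanation independently of gdb, one could probe for the control file directly while the child is in this state. The following is a minimal hypothetical sketch, not code from procfs.c: the helper names proc_ctl_path and proc_ctl_exists are made up for illustration, and it assumes the Solaris /proc layout described above, where the file open_procinfo_files needs is /proc/<pid>/ctl.

```c
/* Hypothetical diagnostic helpers, NOT part of gdb's procfs.c.
   Assumption: on Solaris, a process's control file is /proc/<pid>/ctl,
   and for a process still shown as <defunct>, /proc/<pid> may contain
   only psinfo and usage, so the ctl file is missing.  */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Format "/proc/<pid>/ctl" into BUF of size LEN.
   Returns 0 on success, -1 if the path does not fit.  */
static int
proc_ctl_path (char *buf, size_t len, pid_t pid)
{
  assert (buf != NULL && len > 0);
  int n = snprintf (buf, len, "/proc/%ld/ctl", (long) pid);
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Return 1 if /proc/<pid>/ctl exists, 0 if it does not (ENOENT),
   and -1 on any other error.  */
static int
proc_ctl_exists (pid_t pid)
{
  char path[64];

  if (proc_ctl_path (path, sizeof path, pid) != 0)
    return -1;
  if (access (path, F_OK) == 0)
    return 1;
  return errno == ENOENT ? 0 : -1;
}
```

Under that assumption, calling proc_ctl_exists (23022) while ps still lists the child as <defunct> would be expected to return 0. On systems without a Solaris-style /proc (e.g. Linux), the ctl file never exists, so the check is only meaningful on Solaris.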
However, when I try the same with the pre-mt-patch gdb:
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x196c898: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x179e138: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/cli/cli-cmds.c, line 201.
Breakpoint 3 at 0x1ae7e26: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/procfs.c, line 325.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
[New LWP 3 ]
[New LWP 4 ]
[New LWP 5 ]
[New LWP 6 ]
[New LWP 7 ]
[New LWP 8 ]
[New LWP 9 ]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb.gdb:3: Error in sourced command file:
procfs: init_inferior, open_proc_files line 2870, /proc/23028: No such file or directory.
[New Thread 2 ]
[New Thread 3 ]
[New Thread 4 ]
[New Thread 5 ]
[New Thread 6 ]
[New Thread 7 ]
[New Thread 8 ]
[New Thread 9 ]
[Switching to Thread 1 (LWP 1)]
Thread 2 hit Breakpoint 3, find_procinfo_or_die (pid=23028, tid=0) at /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/procfs.c:325
325 procinfo *pi = find_procinfo (pid, tid);
I get the same error and the same defunct process:
23026 gdb-9 -q -x top-gdb.gdb
23027 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122456/gdb/gdb -q
23028 <defunct>
This obviously makes debugging extra hard ;-( However, this error isn't
entirely new: when running the gdb testsuite before the mt merge, I get
several variations of it:
$ grep -a "couldn't find pid" gdb.log |sort|uniq -c
2 Error in re-setting breakpoint 2: procfs: couldn't find pid 0 in procinfo list.
2 Error in re-setting breakpoint 5: procfs: couldn't find pid 0 in procinfo list.
99 procfs: couldn't find pid -1 in procinfo list.
22 procfs: couldn't find pid 0 in procinfo list.
5 procfs: couldn't find pid 21415 in procinfo list.
5 procfs: couldn't find pid 21618 in procinfo list.
10 procfs: couldn't find pid 22032 in procinfo list.
5 procfs: couldn't find pid 22457 in procinfo list.
5 procfs: couldn't find pid 22678 in procinfo list.
10 procfs: couldn't find pid 22985 in procinfo list.
>> running that gdb under gdb itself most often leads to the same
>> error. This very much seems like a race condition to me, but at the
>> moment I'm pretty much at a loss how to investigate this further.
>
> Could this be a race somehow more exposed now due to GDB now spawning worker
> threads? What happens if you debug a GDB that doesn't spawn worker
> threads? Like:
>
> ./gdb -D ./data-directory --args ./gdb -ex "maint set worker-threads 0"
This doesn't work because master gdb cannot debug anything, with or
without the push_target fix.
When I instead use gdb 9.1 as the top-level gdb, I get
$ gdb-9 -q --args ./gdb -D data-directory -ex "maint set worker-threads 0"
Reading symbols from ./gdb...
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x197ca44: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x17adf8a: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/cli/cli-cmds.c, line 201.
(top-gdb) run
Starting program: /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb can't handle command-line argument containing whitespace
When instead I use
$ cat top-gdb-mt.gdb
file ./gdb-mt
run -q -D data-directory -x bottom-gdb-mt.gdb
$ cat bottom-gdb-mt.gdb
maint set worker-threads 0
file ./hello
b main
run
$ gdb-9 -q -x top-gdb-mt.gdb
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
[New LWP 3 ]
[New LWP 4 ]
[New LWP 5 ]
[New LWP 6 ]
[New LWP 7 ]
[New LWP 8 ]
[New LWP 9 ]
[LWP 8 exited]
[New LWP 8 ]
[LWP 6 exited]
[New LWP 6 ]
[LWP 9 exited]
[New LWP 9 ]
[LWP 5 exited]
[New LWP 5 ]
[LWP 7 exited]
[New LWP 7 ]
[LWP 2 exited]
[New LWP 2 ]
[LWP 3 exited]
[New LWP 3 ]
[LWP 4 exited]
[New LWP 4 ]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb-mt.gdb:4: Error in sourced command file:
procfs: couldn't find pid 0 in procinfo list.
> Does that problem trigger as often that way?
The failure is still reproducible that way, but the output is even more
verbose (imagine that on the 160-core system I spoke of ;-)
To avoid that for the moment, I've changed n_worker_threads to 0.
> Or, what happens if you use master GDB with your push_target fix
> to debug an older GDB?
Master GDB cannot debug anything, unfortunately.
Rainer
--
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University