Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list

Tue Jun 2 17:14:24 GMT 2020

On 6/2/20 5:30 PM, Petr Sumbera wrote:

> I have adapted your change to gdb 9.2 and applied it to the correct occurrence (you had added it to the second occurrence of 'exited'):
> 
> --- ../../gdb-9.2/gdb/procfs.c.orig     2020-06-02 17:10:32.057735432 +0000
> +++ ../../gdb-9.2/gdb/procfs.c  2020-06-02 18:02:45.496117117 +0000
> @@ -2207,9 +2207,10 @@
>                     if (print_thread_events)
>                       printf_unfiltered (_("[%s exited]\n"),
>                                          target_pid_to_str (retval).c_str ());
> -                   delete_thread (find_thread_ptid (retval));
> -                   status->kind = TARGET_WAITKIND_SPURIOUS;
> -                   return retval;
> +                   thread_info *thr = find_thread_ptid (retval);
> +                   if (thr)
> +                     delete_thread (thr);
> +                   goto wait_again;
>                   }
>                 else if (syscall_is_exit (pi, what))
>                   {
> 
> But this time the exited message repeats forever:
> 
> [LWP    24         exited]
> [LWP    24         exited]
> [LWP    24         exited]

Sounds like the LWP is stuck with the status, or the status is
cached.  We probably need to resume the process to move it out
of the syscall, I guess.  There's this bit in the file, at
another spot where we call goto wait_again:

	/* How to keep going without returning to wfi: */
	target_continue_no_signal (ptid);
	goto wait_again;

wfi == wait_for_inferior, the name of a function that used
to be pretty core in infrun.c.  Nowadays handle_inferior_event
has taken over that role.

Try doing the same.  Like:

	delete_thread (find_thread_ptid (this, retval));
	target_continue_no_signal (ptid);
	goto wait_again;

You may need to split the delete_thread/find_thread bits, or
you may not.  I'm not sure.
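
To illustrate why I suspect the resume is needed, here is a toy
model of the wait loop.  It is not GDB code: toy_process,
make_example and count_exit_reports are made-up names, the
"status stays until resumed" behavior is my assumption about what
procfs is doing, and max_iterations merely stands in for "forever".

```cpp
#include <cassert>
#include <deque>
#include <set>

// Toy model only; every name below is hypothetical, none is a GDB API.
enum class event_kind { lwp_exit, stopped };

struct event
{
  event_kind kind;
  int lwp;
};

struct toy_process
{
  std::deque<event> pending;  // events the process will report, in order
  std::set<int> threads;      // LWPs the debugger still knows about

  // Assumed behavior: the same status is reported until the process
  // is resumed.
  const event &wait () const { return pending.front (); }

  // Resuming lets the process move past the reported event.
  void resume ()
  {
    if (pending.size () > 1)
      pending.pop_front ();
  }
};

// One LWP exit (LWP 24) followed by an ordinary stop of LWP 1.
toy_process make_example ()
{
  toy_process proc;
  proc.pending.push_back ({event_kind::lwp_exit, 24});
  proc.pending.push_back ({event_kind::stopped, 1});
  proc.threads = {1, 24};
  return proc;
}

// Returns how many times an LWP exit was reported before a real stop
// was seen, or -1 if the loop was still spinning after max_iterations.
int count_exit_reports (toy_process &proc, bool resume_before_retry,
                        int max_iterations)
{
  int exit_reports = 0;
  for (int i = 0; i < max_iterations; ++i)
    {
      const event ev = proc.wait ();  // copy: resume () may pop it
      if (ev.kind != event_kind::lwp_exit)
        return exit_reports;
      ++exit_reports;

      // Lookup split from delete: on a repeated report the thread is
      // already gone, so the lookup finds nothing.
      auto it = proc.threads.find (ev.lwp);
      if (it != proc.threads.end ())
        proc.threads.erase (it);

      if (resume_before_retry)
        proc.resume ();
      // Falling through plays the role of "goto wait_again".
    }
  return -1;
}
```

In this model, looping back without resume () sees the same exit
report on every pass, which matches the repeated "[LWP 24 exited]"
lines, while resuming first reports the exit once and then moves on.
It also shows why the lookup can come back empty the second time
around.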

The TARGET_WAITKIND_SPURIOUS handling in infrun.c also
just calls resume(GDB_SIGNAL_0), so I _think_ this will work as
well as before.  I have no idea how this was supposed to handle
the case of an LWP exiting while another one is single-stepping.
Looks like we lose the original single-stepping
request.  Maybe.  Not sure.  But it doesn't look like we're
making things any worse.

Thanks,
Pedro Alves