Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list

Petr Sumbera petr.sumbera@oracle.com
Wed Jun 3 13:09:14 GMT 2020


On 02.06.2020 19:14, Pedro Alves wrote:
> On 6/2/20 5:30 PM, Petr Sumbera wrote:
> 
>> I have modified your change to gdb 9.2 and to correct occurrence (you have added it to second occurrence of 'exited'):
>>
>> --- ../../gdb-9.2/gdb/procfs.c.orig     2020-06-02 17:10:32.057735432 +0000
>> +++ ../../gdb-9.2/gdb/procfs.c  2020-06-02 18:02:45.496117117 +0000
>> @@ -2207,9 +2207,10 @@
>>                      if (print_thread_events)
>>                        printf_unfiltered (_("[%s exited]\n"),
>>                                           target_pid_to_str (retval).c_str ());
>> -                   delete_thread (find_thread_ptid (retval));
>> -                   status->kind = TARGET_WAITKIND_SPURIOUS;
>> -                   return retval;
>> +                   thread_info *thr = find_thread_ptid (retval);
>> +                   if (thr)
>> +                     delete_thread (thr);
>> +                   goto wait_again;
>>                    }
>>                  else if (syscall_is_exit (pi, what))
>>                    {
>>
>> But this time exited message repeats forever:
>>
>> [LWP    24         exited]
>> [LWP    24         exited]
>> [LWP    24         exited]
> 
> Sounds like the LWP is stuck with the status, or the status is
> cached.  We probably need to resume the process to move it out
> of the syscall, I guess.  There's this bit in the file, at
> another spot we call goto wait_again:
> 
> 	/* How to keep going without returning to wfi: */
> 	target_continue_no_signal (ptid);
> 	goto wait_again;
> 
> wfi == wait_for_inferior, the name of a function that used
> to be pretty core in infrun.c.  Nowadays handle_inferior_event
> took the role.
> 
> Try doing the same.  Like:
> 
> 	delete_thread (find_thread_ptid (this, retval));
> 	target_continue_no_signal (ptid);
> 	goto wait_again;
> 
> You may need to split the delete_thread/find_thread bits, or
> you may not.  I'm not sure.
> 
> The TARGET_WAITKIND_SPURIOUS handling in infrun.c also
> just calls resume(GDB_SIGNAL_0), so I _think_ this will work as
> well as before.  I have no idea how this was supposed to handle
> the case of an LWP exiting while another one is single
> stepping.  Looks like we lose the original single-stepping
> request.  Maybe.  Not sure.  But doesn't look like we're
> making things any worse.

This time it looks very promising. This is the patch against gdb 9.2:

--- gdb-9.2/gdb/procfs.c
+++ gdb-9.2/gdb/procfs.c
@@ -2208,8 +2208,8 @@
                       printf_unfiltered (_("[%s exited]\n"),
                                          target_pid_to_str (retval).c_str ());
                     delete_thread (find_thread_ptid (retval));
-                   status->kind = TARGET_WAITKIND_SPURIOUS;
-                   return retval;
+                   target_continue_no_signal (ptid);
+                   goto wait_again;
                   }
                 else if (syscall_is_exit (pi, what))
                   {
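
To illustrate why the resume matters, here is a toy model of the wait
loop. It is only a sketch and not gdb code: FakeProcess, Event and
wait_for_event are hypothetical names, and the only assumption carried
over from the discussion above is that an LWP-exit status stays pending
until the process is resumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: none of these names exist in gdb.
enum class EventKind { LwpExit, Stopped };

struct Event
{
  EventKind kind;
  int lwp;
};

// Hypothetical stand-in for the inferior.  The current event stays
// pending until resume () is called.
struct FakeProcess
{
  std::vector<Event> queue;   // events in the order they get reported
  std::size_t current = 0;

  const Event &poll () const
  {
    assert (current < queue.size ());
    return queue[current];
  }

  // Move on to the next pending event, if there is one.
  void resume ()
  {
    if (current + 1 < queue.size ())
      ++current;
  }
};

// Mimics the shape of the patched loop: LWP exits are consumed here,
// and the process is optionally resumed before waiting again.
// max_iterations guards against the "repeats forever" case.
// Returns true and fills OUT when a non-exit event is seen.
bool wait_for_event (FakeProcess &proc, bool resume_on_exit,
                     int max_iterations, std::vector<int> &exited,
                     Event &out)
{
  for (int i = 0; i < max_iterations; ++i)
    {
      const Event ev = proc.poll ();
      if (ev.kind != EventKind::LwpExit)
        {
          out = ev;
          return true;
        }
      exited.push_back (ev.lwp);   // analogous to printing "[LWP n exited]"
      if (resume_on_exit)
        proc.resume ();            // analogous to target_continue_no_signal
      // then fall through to the next iteration, i.e. "goto wait_again"
    }
  return false;
}
```

With resume_on_exit set to false and a queue that starts with an LWP
exit, the loop records the same exit on every iteration until
max_iterations runs out, which matches the repeated "[LWP 24 exited]"
lines from the earlier attempt. With resume_on_exit set to true, the
exit is recorded once and the next event is returned.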


This patch works for a few test cases. I have also started the gdb test
suite to see whether it causes any regressions (though it may take some
time to run).

But in one particular case it prints the following:

..
[LWP    33         exited1]
[LWP    31         exited1]
[LWP    32         exited1]
[LWP    28         exited1]
[LWP    30         exited1]
[LWP    2         exited1]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
(gdb)

It might be related...

Thank you very much!

Petr

