Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list

Petr Sumbera petr.sumbera@oracle.com
Tue Jun 2 16:30:32 GMT 2020


On 02.06.2020 16:53, Pedro Alves wrote:
> On 6/2/20 8:32 AM, Petr Sumbera via Gdb wrote:
>> On 01.06.2020 21:12, Pedro Alves wrote:
>>> On 6/1/20 12:39 PM, Petr Sumbera via Gdb wrote:
>>>> The issue seems to be that the LWP exits and the status->kind is set to TARGET_WAITKIND_SPURIOUS:
>>>>
>>>> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/procfs.c;h=f6c6b0e71c16224d3e7345ca09e011cdcf06349a;hb=HEAD#l2214
>>>>
>>>> But instantly it's added into the list again here:
>>>>
>>>> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/infrun.c;h=95fc3bfe45930b53c33cb4de165db9c070449ad8;hb=HEAD#l5200
>>>>
>>>> But there is no longer such LWP in /proc.
>>>>
>>>> Any suggestion?
>>
>> Thanks for looking at it!
>>
>>> Either:
>>>
>>> - replace TARGET_WAITKIND_SPURIOUS with TARGET_WAITKIND_THREAD_EXITED, or,
>>
>> With this I'm getting:
>>
>> [LWP    21         exited]
>> [LWP    21         exited]
>> /builds/psumbera/userland-gdb-procinfo/components/gdb/gdb-9.2/gdb/thread.c:459: internal-error: void delete_thread_1(thread_info*, bool): Assertion `thr != nullptr' failed.
>> A problem internal to GDB has been detected,
>> further debugging may prove unreliable.
>>
>>> - replace
>>>       status->kind = TARGET_WAITKIND_SPURIOUS;
>>>       return retval;
>>>     with
>>>       goto wait_again;
>>>     instead.
>>
>> and with this:
>>
>> [LWP    20         exited]
>> [LWP    20         exited]
>> /builds/psumbera/userland-gdb-procinfo/components/gdb/gdb-9.2/gdb/thread.c:459: internal-error: void delete_thread_1(thread_info*, bool): Assertion `thr != nullptr' failed.
>> A problem internal to GDB has been detected,
>> further debugging may prove unreliable.
>>
>> -- 
>>
>> Note that in both cases there are TWO exits for one LWP. But LWP numbers differ.
> 
> You mean, it was 21 in one run, and 20 in another run?
> Those were two different runs, and some timing difference
> probably tweaked the order of which thread exits first or
> something.  Doesn't seem unusual.
> 
> Sounds like the patch below would fix it.

Unfortunately no.

> But why do we get two exits in a row for each LWP?  Oh, I guess
> once for PR_SYSENTRY of the exit syscall, and another time for
> PR_SYSEXIT.

Only the PR_SYSENTRY path is reached in my test case (the first occurrence of
'exited]'; I changed the strings so the two can be told apart).

>  From 0be6c82e754dd676e9f1259ab0f9a7849d985ffd Mon Sep 17 00:00:00 2001
> From: Pedro Alves <pedro@palves.net>
> Date: Tue, 2 Jun 2020 15:44:54 +0100
> Subject: [PATCH] fix-solaris
> 
> ---
>   gdb/procfs.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/gdb/procfs.c b/gdb/procfs.c
> index f6c6b0e71c1..e2042f3edc4 100644
> --- a/gdb/procfs.c
> +++ b/gdb/procfs.c
> @@ -2331,9 +2331,10 @@ procfs_target::wait (ptid_t ptid, struct target_waitstatus *status,
>   		    if (print_thread_events)
>   		      printf_unfiltered (_("[%s exited]\n"),
>   					 target_pid_to_str (retval).c_str ());
> -		    delete_thread (find_thread_ptid (this, retval));
> -		    status->kind = TARGET_WAITKIND_SPURIOUS;
> -		    return retval;
> +		    thread_info *thr = find_thread_ptid (this, retval);
> +		    if (thr != nullptr)
> +		      delete_thread (thr);
> +		    goto wait_again;
>   		  }
>   		else if (0)
>   		  {
> 
> base-commit: f6eee2d098049afd18f90b8f4bb6a5d1a49d900c
> 

I have adapted your change to gdb 9.2 and applied it to the correct occurrence
(you added it to the second occurrence of 'exited'):

--- ../../gdb-9.2/gdb/procfs.c.orig     2020-06-02 17:10:32.057735432 +0000
+++ ../../gdb-9.2/gdb/procfs.c  2020-06-02 18:02:45.496117117 +0000
@@ -2207,9 +2207,10 @@
                     if (print_thread_events)
                       printf_unfiltered (_("[%s exited]\n"),
                                          target_pid_to_str (retval).c_str ());
-                   delete_thread (find_thread_ptid (retval));
-                   status->kind = TARGET_WAITKIND_SPURIOUS;
-                   return retval;
+                   thread_info *thr = find_thread_ptid (retval);
+                   if (thr)
+                     delete_thread (thr);
+                   goto wait_again;
                   }
                 else if (syscall_is_exit (pi, what))
                   {

But this time the 'exited' message repeats forever:

[LWP    24         exited]
[LWP    24         exited]
[LWP    24         exited]
..
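
For reference, the null check used in the hunk above can be sketched on its
own. This is only an illustration: toy_thread, toy_thread_list and
handle_lwp_exit are hypothetical stand-ins, not GDB's thread_info,
find_thread_ptid or delete_thread. It shows why a second exit report for the
same LWP no longer trips the `thr != nullptr' assertion. It says nothing about
why the message now repeats.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-ins; GDB's real thread_info and thread list differ.
struct toy_thread
{
  int lwp;
};

struct toy_thread_list
{
  std::map<int, std::unique_ptr<toy_thread>> threads;

  void add (int lwp)
  {
    threads[lwp] = std::make_unique<toy_thread> (toy_thread{lwp});
  }

  // Returns nullptr when the LWP is not (or no longer) in the list.
  toy_thread *find (int lwp) const
  {
    auto it = threads.find (lwp);
    return it == threads.end () ? nullptr : it->second.get ();
  }

  // Same precondition as the assertion in the report above:
  // the caller must pass a thread that exists.
  void remove (toy_thread *thr)
  {
    assert (thr != nullptr);
    const int lwp = thr->lwp;  // copy the key before the object is destroyed
    threads.erase (lwp);
  }
};

// Handle one exit report.  Returns true if a thread was actually removed,
// false if the LWP was already gone (e.g. a second report for the same LWP).
bool handle_lwp_exit (toy_thread_list &list, int lwp)
{
  toy_thread *thr = list.find (lwp);
  if (thr == nullptr)
    return false;
  list.remove (thr);
  return true;
}
```

Calling handle_lwp_exit twice for the same LWP removes the thread the first
time and is a no-op the second time, instead of asserting.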

---

Petr

