Fw: RE: [PATCH] Fix assert pid != 0 assertion failure in AIX

Simon Marchi simark@simark.ca
Wed Jul 6 17:50:54 GMT 2022


On 7/6/22 00:25, Aditya Vidyadhar Kamath wrote:
>
> Morning Simon.
>
> The reason we were adding one more inferior_ptid!= 0 , condition is the previous condition in the and logic i.e. pid != inferior_ptd.pid() will satisfy as -1 is not equal to 0. [inferior_ptid is set to null before coming into wait]. So, in the next iteration since the process has exited waitpid (), will lead to ERRCHILD error though the current iteration fetched the right pid using waitpid ().
>
> However, we get your point that inferior_ptid should not be used initially [for any condition check till we fetch the pid using waitpid ()] as it is being reset.
>
> Please find attached our modified patch where we do a check of the inferior being in the list. I hope this solution matches to what you suggested.
>
> [See 0001-Fix-gdb_assert-pid-0-assertion-failure-in-AIX.patch file attached to this email]
>
> Have a great day.
>
> Thanks and regards,
> Aditya.
>


> From a1c5ab5338a5d46eab675a85c28a9b00256d395a Mon Sep 17 00:00:00 2001
> From: "aditya@ibm" <aditya.vidyadhar.kamath@ibm.com>
> Date: Tue, 5 Jul 2022 23:05:18 -0500
> Subject: [PATCH] Fix gdb_assert (pid != 0); assertion failure in AIX
>
> ---
>  gdb/aix-thread.c     | 2 ++
>  gdb/rs6000-aix-nat.c | 4 ++--
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/gdb/aix-thread.c b/gdb/aix-thread.c
> index ecd8200b692..e5c287a3fad 100644
> --- a/gdb/aix-thread.c
> +++ b/gdb/aix-thread.c
> @@ -1091,6 +1091,8 @@ aix_thread_target::wait (ptid_t ptid, struct target_waitstatus *status,
>    if (ptid.pid () == -1)
>      return ptid_t (-1);
>
> +  inferior_ptid = ptid;

I get why you are doing this, because the other functions (pd_update and
friends) then use it.  However, it would be nicer to just change them
all to not use inferior_ptid, but take whatever information is needed
through parameters.  See below.

> +
>    /* Check whether libpthdebug might be ready to be initialized.  */
>    if (!pd_active && status->kind () == TARGET_WAITKIND_STOPPED
>        && status->sig () == GDB_SIGNAL_TRAP)
> diff --git a/gdb/rs6000-aix-nat.c b/gdb/rs6000-aix-nat.c
> index 8563aea313a..24071a3742f 100644
> --- a/gdb/rs6000-aix-nat.c
> +++ b/gdb/rs6000-aix-nat.c
> @@ -525,11 +525,11 @@ rs6000_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
>
>  	  /* Claim it exited with unknown signal.  */
>  	  ourstatus->set_signalled (GDB_SIGNAL_UNKNOWN);
> -	  return inferior_ptid;
> +	  return ptid_t(pid);

This is not right, as we return a "signalled" event with a
minus_one_ptid (pid is -1 here).  This is unexpected to the core of GDB,
because a "signalled" event means that some inferior has received a
fatal signal.  So the returned ptid should say which inferior received
the signal.

The code in rs6000_nat_target::wait appears to have been copied from
inf_ptrace_target::wait (the base class of rs6000_nat_target).  In
inf_ptrace_target::wait, that snippet has been changed to return an
"ignore" status in that case, so I suppose we should to that here.  The
change in inf_ptrace_target::wait was done here:

  https://gitlab.com/gnutools/binutils-gdb/-/commit/85e8c48c73a5c39a6980f9b2bd16ec96062fc4c3

See my patch below.

>  	}
>
>        /* Ignore terminated detached child processes.  */
> -      if (!WIFSTOPPED (status) && pid != inferior_ptid.pid ())
> +      if (!WIFSTOPPED (status) && find_inferior_pid(this,pid) == NULL)

I think this is correct.  Just make sure to add spaces where appropriate.
And we prefer nullptr over NULL for new code.  See my patch below.

I managed to build GDB on gcc119, on the GCC compile farm and wrote the
patch below that at least fix things enough to be able to debug a simple
program.  I tried a multi-threaded program, and while gdb did not crash,
I was not able to see the multiple threads, so there's more work to do.
But at least, this should be a good starting point.  Please let me know
what you think.

Simon


>From e9e35416a45a8454dc87cabf9462e6cf4040d088 Mon Sep 17 00:00:00 2001
From: Simon Marchi <simon.marchi@efficios.com>
Date: Wed, 6 Jul 2022 13:39:22 -0400
Subject: [PATCH] gdb: fix {rs6000_nat_target,aix_thread_target}::wait to not
 use inferior_ptid

Trying to run a simple program (empty main) on AIX, I get:

    (gdb) run
    Starting program: /scratch/simark/build/gdb/a.out
    Child process unexpectedly missing: There are no child processes..
    ../../src/binutils-gdb/gdb/inferior.c:304: internal-error: find_inferior_pid: Assertion `pid != 0' failed.
    A problem internal to GDB has been detected,
    further debugging may prove unreliable.
    ----- Backtrace -----
    0x10ef12a8 gdb_internal_backtrace_1()
            ../../src/binutils-gdb/gdb/bt-utils.c:122
    0x10ef1470 gdb_internal_backtrace()
            ../../src/binutils-gdb/gdb/bt-utils.c:168
    0x1004d368 internal_vproblem(internal_problem*, char const*, int, char const*, char*)
            ../../src/binutils-gdb/gdb/utils.c:396
    0x1004d8a8 internal_verror(char const*, int, char const*, char*)
            ../../src/binutils-gdb/gdb/utils.c:476
    0x1004c424 internal_error(char const*, int, char const*, ...)
            ../../src/binutils-gdb/gdbsupport/errors.cc:55
    0x102ab344 find_inferior_pid(process_stratum_target*, int)
            ../../src/binutils-gdb/gdb/inferior.c:304
    0x102ab4a4 find_inferior_ptid(process_stratum_target*, ptid_t)
            ../../src/binutils-gdb/gdb/inferior.c:318
    0x1061bae8 find_thread_ptid(process_stratum_target*, ptid_t)
            ../../src/binutils-gdb/gdb/thread.c:519
    0x10319e98 handle_inferior_event(execution_control_state*)
            ../../src/binutils-gdb/gdb/infrun.c:5532
    0x10315544 fetch_inferior_event()
            ../../src/binutils-gdb/gdb/infrun.c:4221
    0x10952e34 inferior_event_handler(inferior_event_type)
            ../../src/binutils-gdb/gdb/inf-loop.c:41
    0x1032640c infrun_async_inferior_event_handler(void*)
            ../../src/binutils-gdb/gdb/infrun.c:9548
    0x10673188 check_async_event_handlers()
            ../../src/binutils-gdb/gdb/async-event.c:335
    0x1066fce4 gdb_do_one_event()
            ../../src/binutils-gdb/gdbsupport/event-loop.cc:214
    0x10001a94 start_event_loop()
            ../../src/binutils-gdb/gdb/main.c:411
    0x10001ca0 captured_command_loop()
            ../../src/binutils-gdb/gdb/main.c:471
    0x10003d74 captured_main(void*)
            ../../src/binutils-gdb/gdb/main.c:1329
    0x10003e48 gdb_main(captured_main_args*)
            ../../src/binutils-gdb/gdb/main.c:1344
    0x10000744 main
            ../../src/binutils-gdb/gdb/gdb.c:32
    ---------------------
    ../../src/binutils-gdb/gdb/inferior.c:304: internal-error: find_inferior_pid: Assertion `pid != 0' failed.
    A problem internal to GDB has been detected,
    further debugging may prove unreliable.
    Quit this debugging session? (y or n)

This is due to some bit-rot in the AIX port, still relying on the entry
value of inferior_ptid in the wait methods.

Problem #1 is in rs6000_nat_target::wait, here:

      /* Ignore terminated detached child processes.  */
      if (!WIFSTOPPED (status) && pid != inferior_ptid.pid ())
	pid = -1;

At this point, waitpid has returned an "exited" status for some pid, so
pid is non-zero.  Since inferior_ptid is set to null_ptid on entry, the
pid returned by wait is not equal to `inferior_ptid.pid ()`, so we reset
pid to -1 and go to waiting again.  Since there are not more children to
wait for, waitpid then returns -1 so we get here:

      if (pid == -1)
	{
	  gdb_printf (gdb_stderr,
		      _("Child process unexpectedly missing: %s.\n"),
		      safe_strerror (save_errno));

	  /* Claim it exited with unknown signal.  */
	  ourstatus->set_signalled (GDB_SIGNAL_UNKNOWN);
	  return inferior_ptid;
	}

We therefore return a "signalled" status with a null_ptid (again,
inferior_ptid is null_ptid).  This confuses infrun, because if the
target returns a "signalled" status, it should be coupled with a ptid
for an inferior that exists.

So, the first step is to fix the snippets above to not use
inferior_ptid.  In the first snippet, use find_inferior_pid to see if
we know the event process.  If there is no inferior with that pid, we
assume it's a detached child process to we ignore the event.  That
should be enough to fix the problem, because it should make it so we
won't go into the second snippet.  But still, fix the second snippet to
return an "ignore" status.  This is copied from inf_ptrace_target::wait,
which is where rs6000_nat_target::wait appears to be copied from in the
first place.

These changes, are not sufficient, as the aix_thread_target, which sits
on top of rs6000_nat_target, also relies on inferior_ptid.
aix_thread_target::wait, by calling pd_update, assumes that
rs6000_nat_target has set inferior_ptid to the appropriate value (the
ptid of the event thread), but that's not the case.  pd_update
returns inferior_ptid - null_ptid - and therefore
aix_thread_target::wait returns null_ptid too, and we still hit the
assert shown above.

Fix this by changing pd_activate, pd_update, sync_threadlists and
get_signaled_thread to all avoid using inferior_ptid.  Instead, they
accept as a parameter the pid of the process we are working on.

With this patch, I am able to run the program to completion:

    (gdb) r
    Starting program: /scratch/simark/build/gdb/a.out
    [Inferior 1 (process 11010794) exited normally]

As well as break on main:

    (gdb) b main
    Breakpoint 1 at 0x1000036c
    (gdb) r
    Starting program: /scratch/simark/build/gdb/a.out

    Breakpoint 1, 0x1000036c in main ()
    (gdb) c
    Continuing.
    [Inferior 1 (process 26083688) exited normally]

Change-Id: I7c2613bbefe487d75fa1a0c0994423471d961ee9
---
 gdb/aix-thread.c     | 59 +++++++++++++++++++++-----------------------
 gdb/rs6000-aix-nat.c |  7 +++---
 2 files changed, 31 insertions(+), 35 deletions(-)

diff --git a/gdb/aix-thread.c b/gdb/aix-thread.c
index ecd8200b6928..d47f5132592a 100644
--- a/gdb/aix-thread.c
+++ b/gdb/aix-thread.c
@@ -701,14 +701,14 @@ gcmp (const void *t1v, const void *t2v)
    Return 0 if none found.  */

 static pthdb_tid_t
-get_signaled_thread (void)
+get_signaled_thread (int pid)
 {
   struct thrdsinfo64 thrinf;
   tid_t ktid = 0;

   while (1)
     {
-      if (getthrds (inferior_ptid.pid (), &thrinf,
+      if (getthrds (pid, &thrinf,
 		    sizeof (thrinf), &ktid, 1) != 1)
 	break;

@@ -734,9 +734,9 @@ get_signaled_thread (void)
        have difficulty with certain call patterns */

 static void
-sync_threadlists (void)
+sync_threadlists (int pid)
 {
-  int cmd, status, infpid;
+  int cmd, status;
   int pcount, psize, pi, gcount, gi;
   struct pd_thread *pbuf;
   struct thread_info **gbuf, **g, *thread;
@@ -790,8 +790,6 @@ sync_threadlists (void)
   qsort (gbuf, gcount, sizeof *gbuf, gcmp);

   /* Apply differences between the two arrays to GDB's thread list.  */
-
-  infpid = inferior_ptid.pid ();
   for (pi = gi = 0; pi < pcount || gi < gcount;)
     {
       if (pi == pcount)
@@ -808,7 +806,7 @@ sync_threadlists (void)
 	  process_stratum_target *proc_target
 	    = current_inferior ()->process_target ();
 	  thread = add_thread_with_info (proc_target,
-					 ptid_t (infpid, 0, pbuf[pi].pthid),
+					 ptid_t (pid, 0, pbuf[pi].pthid),
 					 priv);

 	  pi++;
@@ -818,7 +816,7 @@ sync_threadlists (void)
 	  ptid_t pptid, gptid;
 	  int cmp_result;

-	  pptid = ptid_t (infpid, 0, pbuf[pi].pthid);
+	  pptid = ptid_t (pid, 0, pbuf[pi].pthid);
 	  gptid = gbuf[gi]->ptid;
 	  pdtid = pbuf[pi].pdtid;
 	  tid = pbuf[pi].tid;
@@ -872,10 +870,11 @@ iter_tid (struct thread_info *thread, void *tidp)

 /* Synchronize libpthdebug's state with the inferior and with GDB,
    generate a composite process/thread <pid> for the current thread,
-   set inferior_ptid to <pid> if SET_INFPID, and return <pid>.  */
+   Return the ptid of the event thread if one can be found, else
+   return a pid-only ptid with PID.  */

 static ptid_t
-pd_update (int set_infpid)
+pd_update (int pid)
 {
   int status;
   ptid_t ptid;
@@ -883,36 +882,33 @@ pd_update (int set_infpid)
   struct thread_info *thread = NULL;

   if (!pd_active)
-    return inferior_ptid;
+    return ptid_t (pid);

   status = pthdb_session_update (pd_session);
   if (status != PTHDB_SUCCESS)
-    return inferior_ptid;
+    return ptid_t (pid);

-  sync_threadlists ();
+  sync_threadlists (pid);

   /* Define "current thread" as one that just received a trap signal.  */

-  tid = get_signaled_thread ();
+  tid = get_signaled_thread (pid);
   if (tid != 0)
     thread = iterate_over_threads (iter_tid, &tid);
   if (!thread)
-    ptid = inferior_ptid;
+    ptid = ptid_t (pid);
   else
-    {
-      ptid = thread->ptid;
-      if (set_infpid)
-	switch_to_thread (thread);
-    }
+    ptid = thread->ptid;
+
   return ptid;
 }

 /* Try to start debugging threads in the current process.
-   If successful and SET_INFPID, set inferior_ptid to reflect the
-   current thread.  */
+   If successful and there exists and we can find an event thread, return a ptid
+   for that thread.  Otherwise, return a ptid-only ptid using PID.  */

 static ptid_t
-pd_activate (int set_infpid)
+pd_activate (int pid)
 {
   int status;
 		
@@ -921,10 +917,10 @@ pd_activate (int set_infpid)
 			       &pd_session);
   if (status != PTHDB_SUCCESS)
     {
-      return inferior_ptid;
+      return ptid_t (pid);
     }
   pd_active = 1;
-  return pd_update (set_infpid);
+  return pd_update (pid);
 }

 /* Undo the effects of pd_activate().  */
@@ -1080,17 +1076,18 @@ aix_thread_target::wait (ptid_t ptid, struct target_waitstatus *status,
 			 target_wait_flags options)
 {
   {
-    scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid);
-
     pid_to_prc (&ptid);

-    inferior_ptid = ptid_t (inferior_ptid.pid ());
     ptid = beneath ()->wait (ptid, status, options);
   }

   if (ptid.pid () == -1)
     return ptid_t (-1);

+  /* The target beneath does not deal with threads, so it should only return
+     pid-only ptids.  */
+  gdb_assert (ptid.is_pid ());
+
   /* Check whether libpthdebug might be ready to be initialized.  */
   if (!pd_active && status->kind () == TARGET_WAITKIND_STOPPED
       && status->sig () == GDB_SIGNAL_TRAP)
@@ -1102,10 +1099,10 @@ aix_thread_target::wait (ptid_t ptid, struct target_waitstatus *status,

       if (regcache_read_pc (regcache)
 	  - gdbarch_decr_pc_after_break (gdbarch) == pd_brk_addr)
-	return pd_activate (0);
+	return pd_activate (ptid.pid ());
     }

-  return pd_update (0);
+  return pd_update (ptid.pid ());
 }

 /* Record that the 64-bit general-purpose registers contain VALS.  */
@@ -1765,7 +1762,7 @@ aix_thread_target::pid_to_str (ptid_t ptid)
   if (!PD_TID (ptid))
     return beneath ()->pid_to_str (ptid);

-  return string_printf (_("Thread %ld"), ptid.tid ());
+  return string_printf (_("Thread %s"), pulongest (ptid.tid ()));
 }

 /* Return a printable representation of extra information about
diff --git a/gdb/rs6000-aix-nat.c b/gdb/rs6000-aix-nat.c
index 8563aea313a2..f604f7d503e9 100644
--- a/gdb/rs6000-aix-nat.c
+++ b/gdb/rs6000-aix-nat.c
@@ -523,13 +523,12 @@ rs6000_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
 		      _("Child process unexpectedly missing: %s.\n"),
 		      safe_strerror (save_errno));

-	  /* Claim it exited with unknown signal.  */
-	  ourstatus->set_signalled (GDB_SIGNAL_UNKNOWN);
-	  return inferior_ptid;
+	  ourstatus->set_ignore ();
+	  return minus_one_ptid;
 	}

       /* Ignore terminated detached child processes.  */
-      if (!WIFSTOPPED (status) && pid != inferior_ptid.pid ())
+      if (!WIFSTOPPED (status) && find_inferior_pid (this, pid) == nullptr)
 	pid = -1;
     }
   while (pid == -1);
-- 
2.36.1



More information about the Gdb-patches mailing list