Possible regression on gdb.multi/multi-arch-exec.exp (was: Re: [PATCH] Use thread_info and inferior pointers more throughout)

Wed Jun 27 18:16:00 GMT 2018

On Thursday, June 07 2018, Pedro Alves wrote:

> This is more preparation bits for multi-target support.

Hi Pedro,

While preparing a new Fedora GDB rawhide release, I noticed a regression
related to this commit.  The curious thing is that I am only able to
reproduce the regression on a Fedora Rawhide system; it doesn't happen
on my Fedora 27 machine (initially I thought it might be related to GCC,
but testing against GCC HEAD on my Fedora 27 machine also did not
trigger the regression).

The failing test is gdb.multi/multi-arch-exec.exp, and here's what I'm seeing:

  (gdb) break all_started
  Breakpoint 1 at 0x400848: file /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c, line 42.
  (gdb) run 
  Starting program: /home/sergio/build/gdb/testsuite/outputs/gdb.multi/multi-arch-exec/1-multi-arch-exec 
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".
  [New Thread 0x7ffff7476700 (LWP 1354)]

  Thread 1 "1-multi-arch-ex" hit Breakpoint 1, all_started () at /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c:42
  42      }
  (gdb) delete breakpoints
  Delete all breakpoints? (y or n) y
  (gdb) info breakpoints
  No breakpoints or watchpoints.
  (gdb) break main
  Breakpoint 2 at 0x400862: file /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c, line 51.
  (gdb) thread 1
  [Switching to thread 1 (Thread 0x7ffff7fdf740 (LWP 1350))]
  #0  all_started () at /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c:42
  42      }
  (gdb) PASS: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: thread 1
  set follow-exec-mode new
  (gdb) PASS: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: set follow-exec-mode new
  continue
  Continuing.
  [Thread 0x7ffff7476700 (LWP 1354) exited]
  process 1350 is executing new program: /home/sergio/build/gdb/testsuite/outputs/gdb.multi/multi-arch-exec/1-multi-arch-exec-hello
  [New inferior 2 (process 0)]
  [New process 1350]
  ../../binutils-gdb/gdb/target.c:3200: internal-error: gdbarch* default_thread_architecture(target_ops*, ptid_t): Assertion `inf != NULL' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.
  Quit this debugging session? (y or n) FAIL: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: continue across exec that changes architecture (GDB internal error)

I spent some time investigating this, and here's what I've learned so
far:

1) When infrun.c:handle_inferior_event_1 handles
TARGET_WAITKIND_EXECD (around line 5275), it does:

    ...
    case TARGET_WAITKIND_EXECD:
      if (debug_infrun)
        fprintf_unfiltered (gdb_stdlog, "infrun: TARGET_WAITKIND_EXECD\n");

      /* Note we can't read registers yet (the stop_pc), because we
	 don't yet know the inferior's post-exec architecture.
	 'stop_pc' is explicitly read below instead.  */
      switch_to_thread_no_regs (ecs->event_thread);

      /* Do whatever is necessary to the parent branch of the vfork.  */
      handle_vfork_child_exec_or_exit (1);

      /* This causes the eventpoints and symbol table to be reset.
         Must do this now, before trying to determine whether to
         stop.  */
      follow_exec (inferior_ptid, ecs->ws.value.execd_pathname);   // <---- #1

      stop_pc = regcache_read_pc (get_thread_regcache (ecs->event_thread)); // <---- #2
      ...

2) When follow_exec is called (#1 above), it does:

  ...
  /* The target reports the exec event to the main thread, even if
     some other thread does the exec, and even if the main thread was
     stopped or already gone.  We may still have non-leader threads of
     the process on our list.  E.g., on targets that don't have thread
     exit events (like remote); or on native Linux in non-stop mode if
     there were only two threads in the inferior and the non-leader
     one is the one that execs (and nothing forces an update of the
     thread list up to here).  When debugging remotely, it's best to
     avoid extra traffic, when possible, so avoid syncing the thread
     list with the target, and instead go ahead and delete all threads
     of the process but one that reported the event.  Note this must
     be done before calling update_breakpoints_after_exec, as
     otherwise clearing the threads' resources would reference stale
     thread breakpoints -- it may have been one of these threads that
     stepped across the exec.  We could just clear their stepping
     states, but as long as we're iterating, might as well delete
     them.  Deleting them now rather than at the next user-visible
     stop provides a nicer sequence of events for user and MI
     notifications.  */
  ALL_THREADS_SAFE (th, tmp)
    if (ptid_get_pid (th->ptid) == pid && !ptid_equal (th->ptid, ptid))
      delete_thread (th);
  ...

On my Fedora Rawhide box, delete_thread ends up deleting the very thread
that ecs->event_thread points to.  On my Fedora 27 machine, it deletes a
different thread.
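
To make the shape of that loop concrete, here is a minimal standalone
sketch of "delete every thread of the process except one" over a singly
linked list.  Everything in it (toy_thread, delete_other_threads,
make_threads) is a hypothetical stand-in rather than GDB's real code,
and it compares bare LWP numbers instead of full ptids.  It only
illustrates why the successor has to be saved before a node is freed,
and that the loop removes whichever threads do not match the "keep"
argument, with no regard for who else still holds a pointer to them.

```cpp
#include <cassert>

// Toy stand-ins for illustration only; not GDB's real definitions.
struct toy_thread
{
  int pid;
  long lwp;
  toy_thread *next;
};

// Delete every thread of PID except the one whose LWP is KEEP_LWP.
// The successor is saved before the node may be freed, which is the
// idea behind a "safe" iteration.  Returns the number of threads
// deleted.
static int
delete_other_threads (toy_thread **head, int pid, long keep_lwp)
{
  int deleted = 0;
  toy_thread **link = head;

  for (toy_thread *th = *head, *tmp; th != nullptr; th = tmp)
    {
      tmp = th->next;           // Saved first: TH may be freed below.
      if (th->pid == pid && th->lwp != keep_lwp)
        {
          *link = tmp;          // Unlink TH from the list.
          delete th;
          deleted++;
        }
      else
        link = &th->next;
    }
  return deleted;
}

// Build a list holding one thread per LWP, in the given order.
static toy_thread *
make_threads (int pid, const long *lwps, int n)
{
  toy_thread *head = nullptr;
  for (int i = n - 1; i >= 0; i--)
    head = new toy_thread {pid, lwps[i], head};
  return head;
}
```

With the two LWPs from the log above (1350 and 1354), calling
delete_other_threads with keep_lwp 1350 leaves only the leader on the
list.  Passing an LWP that is not on the list deletes every thread of
the process, including any thread a caller may still be pointing at.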

3) Back in handle_inferior_event_1, by the time #2 runs,
ecs->event_thread points to an invalid (already deleted) object, which
triggers the assertion shown above.
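
The mechanism in steps 1-3 can be modelled in a few lines.  The sketch
below is only a toy: toy_ptid, toy_thread, find_toy_thread and
event_thread_survives are hypothetical names, not GDB functions, and I
am not claiming this is how the problem should be fixed.  It remembers
the event thread by ptid, runs a delete loop with the same filter as the
one quoted in step 2, and then looks the thread up again rather than
reusing a pointer taken before the loop.

```cpp
#include <algorithm>
#include <cassert>
#include <list>

// Toy stand-ins for illustration only; not GDB's real definitions.
struct toy_ptid
{
  int pid;
  long lwp;
};

static bool
same_ptid (toy_ptid a, toy_ptid b)
{
  return a.pid == b.pid && a.lwp == b.lwp;
}

struct toy_thread
{
  toy_ptid ptid;
};

// Look a thread up by ptid; nullptr means it is no longer on the list.
static toy_thread *
find_toy_thread (std::list<toy_thread> &threads, toy_ptid ptid)
{
  auto it = std::find_if (threads.begin (), threads.end (),
                          [&] (const toy_thread &t)
                          { return same_ptid (t.ptid, ptid); });
  return it == threads.end () ? nullptr : &*it;
}

// Model of the exec handling: the event thread is identified by EVENT,
// the delete loop keeps only the thread matching KEEP, and afterwards
// the event thread is looked up again instead of trusting an old
// pointer.  Returns true if the event thread is still on the list.
static bool
event_thread_survives (toy_ptid event, toy_ptid keep)
{
  // LWP numbers borrowed from the log above, purely as sample data.
  std::list<toy_thread> threads
    = { toy_thread {toy_ptid {1350, 1350}},
        toy_thread {toy_ptid {1350, 1354}} };

  threads.remove_if ([&] (const toy_thread &t)
                     { return t.ptid.pid == keep.pid
                              && !same_ptid (t.ptid, keep); });

  return find_toy_thread (threads, event) != nullptr;
}
```

In this model the event thread survives only when its ptid equals the
one the delete loop was told to keep.  Whenever the two differ, any
pointer taken before the loop is dangling afterwards.  Which pair of
ptids actually shows up on Rawhide versus Fedora 27 is something I have
not pinned down yet.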

I haven't progressed much further (other things to wrap up), but I
wanted to get the ball rolling.  If you need access to a Fedora
Rawhide VM, please let me know and I can provide one.

Thanks,

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/