This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



RFH: failed assert debugging threaded+fork program over gdbserver


Hello,

I have noticed the following problem when debugging a program which
uses both threads and fork. The program is attached, and it was
compiled by simply doing:

    % gnatmake -g a_test

The issue appears only randomly, but it seems to show up fairly
reliably on certain GNU/Linux distributions such as RHES7 or
WRSLinux. I also see it on Ubuntu, but less reliably. Here is
what I have found while debugging on WRSLinux (we set it up as a
cross configuration, but it should be the same with native GNU/Linux
distros):

    % gdb a_test
    (gdb) break a_test.adb:30
    (gdb) break a_test.adb:39
    (gdb) target remote my_board:4444
    (gdb) continue
    Continuing.
    [...]
    [New Thread 866.868]
    [New Thread 866.869]
    [New Thread 870.870]
    /[...]/gdb/thread.c:89: internal-error: thread_info* inferior_thread(): Assertion `tp' failed.
    A problem internal to GDB has been detected,
    further debugging may prove unreliable.
    Quit this debugging session? (y or n) 

The error happens because GDBserver returns a list of threads
to GDB in which a new thread has a different PID (870 in the case
above, instead of 866).

This makes remote_notice_new_inferior think that there is a new
inferior (which is, in fact, true), causing it to call
remote_add_inferior, which does the following:

      /* In the traditional debugging scenario, there's a 1-1 match
         between program/address spaces.  We simply bind the inferior
         to the program space's address space.  */
      inf = current_inferior ();
      inferior_appeared (inf, pid);

These two lines change the PID of the current inferior to the PID
of the new process (the fork child). This is where I *think* we're
making the mistake; see below...
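
To make the rebinding concrete, here is a deliberately simplified C
model. All types and functions in it are hypothetical stand-ins, not
GDB's real structures; it assumes a single-inferior list and mimics
only what the quoted lines do: instead of creating a new inferior for
the fork child, the current inferior is rebound to the new PID.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of an inferior.
   This is NOT GDB's real struct inferior.  */
struct model_inferior
{
  int num;   /* Stable inferior number.  */
  int pid;   /* Process ID the inferior is currently bound to.  */
};

/* A single inferior, as in the traditional 1-1 debugging scenario.  */
static struct model_inferior inferiors[] = { { 1, 866 } };
static const size_t n_inferiors = sizeof inferiors / sizeof inferiors[0];
static struct model_inferior *current_inf = &inferiors[0];

/* Return the inferior bound to PID, or NULL if there is none.
   Loosely models the lookup done by find_inferior_ptid.  */
static struct model_inferior *
model_find_inferior_pid (int pid)
{
  for (size_t i = 0; i < n_inferiors; i++)
    if (inferiors[i].pid == pid)
      return &inferiors[i];
  return NULL;
}

/* Models the quoted branch of remote_add_inferior: rather than adding
   a new inferior for PID, rebind the current one to it.  */
static void
model_remote_add_inferior (int pid)
{
  assert (current_inf != NULL);
  current_inf->pid = pid;   /* Like inferior_appeared (inf, pid).  */
}
```

After model_remote_add_inferior (870), a lookup for PID 866 returns
NULL, even though threads 866.868 and 866.869 are still alive.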

However, remote_notice_new_inferior also calls notice_new_inferior
a few lines later, which starts by setting up a cleanup to restore
the current thread:

  if (!ptid_equal (inferior_ptid, null_ptid))
    make_cleanup_restore_current_thread ();

At that point in time, the current thread, which is the thread
that received the event, is one of the threads belonging to
the original inferior (pid=866). That's what we set up to restore.
Unfortunately, the restoration does not go according to plan,
because our inferior list still has exactly one inferior in it,
except that its PID is no longer 866, but rather 870. If we look at
thread.c::do_restore_current_thread_cleanup, we see:

    tp = find_thread_ptid (old->inferior_ptid);

    /* If the previously selected thread belonged to a process that has
       in the mean time been deleted (due to normal exit, detach, etc.),
       then don't revert back to it, but instead simply drop back to no
       thread selected.  */
    if (tp
        && find_inferior_ptid (tp->ptid) != NULL)
      restore_current_thread (old->inferior_ptid);
    else
      {
        restore_current_thread (null_ptid);
        set_current_inferior (find_inferior_id (old->inf_id));
      }

In our case, find_inferior_ptid no longer finds an inferior with
the old PID, so we go into the else branch, which sets
inferior_ptid to null_ptid. This causes problems a little later,
in normal_stop:

    /* Notify observers about the stop.  This is where the interpreters
       print the stop event.  */
    if (!ptid_equal (inferior_ptid, null_ptid))
      observer_notify_normal_stop (inferior_thread ()->control.stop_bpstat,
                                   stop_print_frame);
    else
      observer_notify_normal_stop (NULL, stop_print_frame);

In our case, we're in the "else" branch. This leads to cli_on_normal_stop,
which calls print_stop_event -> print_stop_location, which starts by
calling inferior_thread:

  struct thread_info *tp = inferior_thread ();

And looking at inferior_thread, we see:

  struct thread_info *tp = find_thread_ptid (inferior_ptid);
  gdb_assert (tp);

We trip the assertion because inferior_ptid is the null_ptid.
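
The failing path can be summarized with one more hypothetical model
(the model_* names and the simplified two-field ptid are assumptions,
not GDB code): inferior_thread requires the lookup of inferior_ptid
to succeed, and no thread ever has the null ptid, so the assertion
is guaranteed to fire whenever the stop printer runs with no
selected thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified ptid and thread; not GDB's real types.  */
typedef struct { int pid; long lwp; } model_ptid;
struct model_thread { model_ptid ptid; };

/* Threads known to GDB in the session above.  */
static struct model_thread threads[] = {
  { { 866, 866 } }, { { 866, 868 } }, { { 866, 869 } }, { { 870, 870 } }
};

/* The null ptid (modeled as {0, 0}), as left by the failed restore.  */
static model_ptid inferior_ptid = { 0, 0 };

/* Return the thread with PTID, or NULL.  Models find_thread_ptid.  */
static struct model_thread *
model_find_thread_ptid (model_ptid ptid)
{
  for (size_t i = 0; i < sizeof threads / sizeof threads[0]; i++)
    if (threads[i].ptid.pid == ptid.pid && threads[i].ptid.lwp == ptid.lwp)
      return &threads[i];
  return NULL;
}

/* Models inferior_thread: the lookup must succeed, otherwise the
   assertion (gdb_assert (tp) in GDB) fires.  */
static struct model_thread *
model_inferior_thread (void)
{
  struct model_thread *tp = model_find_thread_ptid (inferior_ptid);
  assert (tp != NULL);
  return tp;
}
```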

At first sight, I think that the main problem is that we muck with
the current inferior's PID when we really shouldn't. I'm not really
sure how the new PID should be handled, though, which is why I'm
asking for advice here.

I think this also unearthed a secondary issue: it looks like
normal_stop really isn't prepared to handle a null inferior_ptid,
even though the fact that it explicitly checks for a null
inferior_ptid before notifying observers suggests that it should be.
But what does it mean to show where we stopped when we don't know
which thread caused the stop? I think discussing this separately
would be best, but I wanted to mention it here so it doesn't get
overlooked.

Any advice on how I should be fixing the issue?

Thanks!
-- 
Joel

Attachment: a_test.adb
Description: Text document

