[RFA] Fix crash on Linux 2.4 when threaded program exits

Joel Brobecker brobecker@adacore.com
Wed Apr 1 18:22:00 GMT 2009


The debugger crashes when a threaded program being debugged exits:

    (gdb) run
    Starting program: /[...]/q 
    [Thread debugging using libthread_db enabled]
    [New Thread 0xb748ebb0 (LWP 9340)]
    [New Thread 0xb728abb0 (LWP 9341)]
    Test2
    Test1
    [Thread 0xb748ebb0 (LWP 9340) exited]
    [Thread 0xb728abb0 (LWP 9341) exited]
    [Thread 0xb75d9b80 (LWP 9337) exited]
    Recursive internal problem.
    zsh: 9330 abort      gdb-head q

This appears to be specific to Linux 2.4 kernels and the way NPTL
behaves on that version of the kernel: with 2.4, we only receive an
"exited" notification for the main thread, whereas with 2.6 we receive
the notification for each and every thread.

What happens in the 2.4 case is that we delete the lp structure for
the thread that exited and then still try to use it shortly afterwards.
At that point, the memory has been freed and its contents have been
corrupted. As a result, we hit an internal error which in turn hits
another internal error, and that causes the abort.

The code in linux-nat.c:linux_nat_filter_event looks like this:

  if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1)
    {
      [delete threads that have vanished]

      exit_lwp (lp);

      /* If there is at least one more LWP, then the exit signal was
         not the end of the debugged application and should be
         ignored.  */
      if (num_lwps > 0)
        return NULL;
    }

As you can see, in the linux-2.4 case, we end up deleting all the other
threads, then call exit_lwp to delete the main thread. Next we check
num_lwps, which is now zero, so we continue. Shortly after that, in the
same routine, we access lp again (around line 2717, "lp->ignore_sigint"),
but the symptoms only appear slightly later, when accessing the lp ptid
in order to set inferior_ptid, which is used to get the associated
inferior.

The fix is to delete the lp and return NULL only if other lwps
still exist.
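
To make the intended control flow concrete, here is a small standalone
model of it. This is only a sketch and not the attached patch: struct lwp,
the "vanished" flag, add_lwp, delete_lwp, delete_vanished_lwps and
filter_exit_event are all hypothetical stand-ins for the real structures
and routines in linux-nat.c. The point it shows is that the exiting lwp is
only freed when other lwps remain; when it is the last one, it is returned
intact so the caller can keep using it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for GDB's lwp structure (hypothetical, not the real one).  */
struct lwp
{
  int pid;
  int vanished;          /* Nonzero if the thread is gone (assumed flag).  */
  struct lwp *next;
};

static struct lwp *lwp_list;   /* All known lwps.  */
static int num_lwps;           /* Number of entries in lwp_list.  */

/* Add a new lwp at the head of the list and return it.  */
static struct lwp *
add_lwp (int pid)
{
  struct lwp *lp = malloc (sizeof *lp);

  assert (lp != NULL);
  lp->pid = pid;
  lp->vanished = 0;
  lp->next = lwp_list;
  lwp_list = lp;
  num_lwps++;
  return lp;
}

/* Unlink LP from the list and free it.  LP is invalid afterwards.  */
static void
delete_lwp (struct lwp *lp)
{
  struct lwp **link;

  for (link = &lwp_list; *link != NULL; link = &(*link)->next)
    if (*link == lp)
      {
        *link = lp->next;
        num_lwps--;
        free (lp);
        return;
      }
}

/* Delete every lwp marked as vanished, except KEEP.  */
static void
delete_vanished_lwps (struct lwp *keep)
{
  struct lwp **link = &lwp_list;

  while (*link != NULL)
    {
      struct lwp *cur = *link;

      if (cur != keep && cur->vanished)
        {
          *link = cur->next;
          num_lwps--;
          free (cur);
        }
      else
        link = &cur->next;
    }
}

/* LP has reported an exit.  Return NULL if the event should be ignored
   because other lwps remain (LP is freed in that case); otherwise
   return LP, still valid, as the last lwp of the program.  */
static struct lwp *
filter_exit_event (struct lwp *lp)
{
  if (num_lwps > 1)
    {
      delete_vanished_lwps (lp);

      /* Only delete LP if it is not the last lwp left.  */
      if (num_lwps > 1)
        {
          delete_lwp (lp);
          return NULL;
        }
    }

  return lp;
}
```

In the 2.4-like scenario (main thread exits while the other threads have
silently vanished), filter_exit_event returns the main lwp untouched; in
the 2.6-like scenario (a non-final thread reports its own exit), it frees
that lwp and returns NULL.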

2009-04-01  Joel Brobecker  <brobecker@adacore.com>

        * linux-nat.c (linux_nat_filter_event): Do not delete the lwp if
        this is the last one.

Tested on x86-linux (with a 2.4.21 Linux kernel). It fixes ~25 failures.
Tested on x86_64-linux (with a 2.6 kernel). No regression.

Does this look correct?

Thanks,
-- 
Joel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: threads-24.diff
Type: text/x-diff
Size: 782 bytes
Desc: not available
URL: <http://sourceware.org/pipermail/gdb-patches/attachments/20090401/d7bd6a8d/attachment.bin>

