This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v2] GDBserver crashes when killing a multi-thread process


Hi Pedro,

Thanks for your patch.  It fixed the issue in my part.

Best,
Hui

On 07/10/14 23:16, Pedro Alves wrote:
Hi Hui,

On 07/02/2014 10:07 AM, Pedro Alves wrote:
This issue is introduced by the patches for PR 12702 that was pushed by you.
Maybe you could take a look on this patch.

Thanks.  I'll need to think a bit about whether this is the right
solution.

I've added it to the 7.8 regression queue at:

  https://sourceware.org/gdb/wiki/GDB_7.8_Release

I'll look better at it as soon as I have a chance.


First, it's really annoying that the testsuite doesn't catch this!
We're really missing such a basic test, so I went ahead and wrote one.
I was actually quite surprised that we didn't have any
test that makes sure "kill" works without complaints/spurious output.

However, unfortunately, the new test revealed that the fix isn't right.  :-/

Sometimes the test passed, but other times (that'll depend on
machine, of course), I'd get:

  kill
  Kill the program being debugged? (y or n) y
  Ignoring packet error, continuing...
  (gdb) FAIL: gdb.threads/kill.exp: kill

That means GDBserver didn't reply to the vKill packet.

Attaching to GDBserver at this point shows that it's stuck forever
in the new loop:


  +  /* Keep call kill_one_lwp_callback until find_inferior cannot find any
  +     lwps that is for pid.  */
  +  while (find_inferior (&all_threads, kill_one_lwp_callback , &pid) != NULL);


The reason was that the leader lwp goes zombie before the other
lwps are reaped, and linux_wait_for_event would delete it.  We'd be
left with e.g., only one lwp that is _not_ the leader.

That lwp had disappeared already (due to SIGKILL), but there's
no status to collect (waitpid would return -1).  Nothing in linux-low.c
would notice the lwp is already gone.  Because that lwp has lwp->stopped == 0,
linux_wait_for_event would return, here:

       if ((find_inferior (&all_threads,
			  not_stopped_callback,
			  &wait_ptid) == NULL))
	{
	  if (debug_threads)
	    debug_printf ("LLW: exit (no unwaited-for LWP)\n");
	  sigprocmask (SIG_SETMASK, &prev_mask, NULL);
	  return -1;
	}

So this would return true:

  --- a/gdb/gdbserver/linux-low.c
  +++ b/gdb/gdbserver/linux-low.c
  @@ -944,7 +944,9 @@ kill_one_lwp_callback (struct inferior_l
        pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL);
      } while (pid > 0 && WIFSTOPPED (wstat));

  -  return 0;
  +  /* Let find_inferior return because maybe other lwp in the list will
  +     be deleted by delete_lwp.  */
  +  return 1;
   }

So the new loop would never break.

Here's a different patch that passes the test cleanly for me.
Let me know if it works for you.

8<-------------------
 From ed3e42e5447e0276d6fc759a27408d06e9ccf0b7 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Thu, 10 Jul 2014 15:13:26 +0100
Subject: [PATCH] GDBserver crashes when killing a multi-thread process

Here's an example, with the new test:

  gdbserver :9999 gdb.threads/kill
  gdb gdb.threads/kill
  (gdb) b 52
  Breakpoint 1 at 0x4007f4: file kill.c, line 52.
  Continuing.

  Breakpoint 1, main () at kill.c:52
  52        return 0; /* set break here */
  (gdb) k
  Kill the program being debugged? (y or n) y

  gdbserver :9999 gdb.threads/kill
  Process gdb.base/watch_thread_num created; pid = 9719
  Listening on port 1234
  Remote debugging from host 127.0.0.1
  Killing all inferiors
  Segmentation fault (core dumped)

Backtrace:

  (gdb) bt
  #0  0x00000000004068a0 in find_inferior (list=0x66b060 <all_threads>, func=0x427637 <kill_one_lwp_callback>, arg=0x7fffffffd3fc) at src/gdb/gdbserver/inferiors.c:199
  #1  0x00000000004277b6 in linux_kill (pid=15708) at src/gdb/gdbserver/linux-low.c:966
  #2  0x000000000041354d in kill_inferior (pid=15708) at src/gdb/gdbserver/target.c:163
  #3  0x00000000004107e9 in kill_inferior_callback (entry=0x6704f0) at src/gdb/gdbserver/server.c:2934
  #4  0x0000000000406522 in for_each_inferior (list=0x66b050 <all_processes>, action=0x4107a6 <kill_inferior_callback>) at src/gdb/gdbserver/inferiors.c:57
  #5  0x0000000000412377 in process_serial_event () at src/gdb/gdbserver/server.c:3767
  #6  0x000000000041267c in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:3880
  #7  0x00000000004189ff in handle_file_event (event_file_desc=4) at src/gdb/gdbserver/event-loop.c:434
  #8  0x00000000004181c6 in process_event () at src/gdb/gdbserver/event-loop.c:189
  #9  0x0000000000418f45 in start_event_loop () at src/gdb/gdbserver/event-loop.c:552
  #10 0x0000000000411272 in main (argc=3, argv=0x7fffffffd8d8) at src/gdb/gdbserver/server.c:3283

The problem is that linux_wait_for_event deletes lwps that have exited
(even those not passed in as lwps of interest), while the lwp/thread
list is being walked on with find_inferior.  find_inferior can handle
the current iterated inferior being deleted, but not others.

When killing lwps, we don't really care about any of the pending
status handling of linux_wait_for_event.  We can just waitpid the lwps
directly, which is also what GDB does (see
linux-nat.c:kill_wait_callback).  This way the lwps are not deleted
while we're walking the list.  They'll be deleted by linux_mourn
afterwards.

This crash triggers several times when running the testsuite against
GDBserver with the native-gdbserver board (target remote), but as GDB
can't distinguish between GDBserver crashing and "kill" being
sucessful, as in both cases the connection is closed (the 'k' packet
doesn't require a reply), and the inferior is gone, that results in no
FAIL.

The patch adds a generic test that catches the issue with
extended-remote mode (and works fine with native testing too).  Here's
how it fails with the native-extended-gdbserver board without the fix:

  (gdb) info threads
    Id   Target Id         Frame
    6    Thread 15367.15374 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
    5    Thread 15367.15373 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
    4    Thread 15367.15372 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
    3    Thread 15367.15371 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
    2    Thread 15367.15370 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
  * 1    Thread 15367.15367 main () at .../gdb.threads/kill.c:52
  (gdb) kill
  Kill the program being debugged? (y or n) y
  Remote connection closed
  ^^^^^^^^^^^^^^^^^^^^^^^^
  (gdb) FAIL: gdb.threads/kill.exp: kill

Extended remote should remain connected after the kill.

gdb/gdbserver/
2014-07-10  Pedro Alves  <palves@redhat.com>

	* linux-low.c (kill_wait_lwp): New function, based on
	kill_one_lwp_callback, but use my_waitpid directly.
	(kill_one_lwp_callback, linux_kill): Use it.

gdb/testsuite/
2014-07-10  Pedro Alves  <palves@redhat.com>

	* gdb.threads/kill.c: New file.
	* gdb.threads/kill.exp: New file.
---
  gdb/gdbserver/linux-low.c          | 68 ++++++++++++++++++++-------------
  gdb/testsuite/gdb.threads/kill.c   | 64 +++++++++++++++++++++++++++++++
  gdb/testsuite/gdb.threads/kill.exp | 77 ++++++++++++++++++++++++++++++++++++++
  3 files changed, 183 insertions(+), 26 deletions(-)
  create mode 100644 gdb/testsuite/gdb.threads/kill.c
  create mode 100644 gdb/testsuite/gdb.threads/kill.exp

diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index 61552f4..bdc9aaa 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -909,6 +909,46 @@ linux_kill_one_lwp (struct lwp_info *lwp)
  		  errno ? strerror (errno) : "OK");
  }

+/* Kill LWP and wait for it to die.  */
+
+static void
+kill_wait_lwp (struct lwp_info *lwp)
+{
+  struct thread_info *thr = get_lwp_thread (lwp);
+  int pid = ptid_get_pid (ptid_of (thr));
+  int lwpid = ptid_get_lwp (ptid_of (thr));
+  int wstat;
+  int res;
+
+  if (debug_threads)
+    debug_printf ("kwl: killing lwp %d, for pid: %d\n", lwpid, pid);
+
+  do
+    {
+      linux_kill_one_lwp (lwp);
+
+      /* Make sure it died.  Notes:
+
+	 - The loop is most likely unnecessary.
+
+         - We don't use linux_wait_for_event as that could delete lwps
+           while we're iterating over them.  We're not interested in
+           any pending status at this point, only in making sure all
+           wait status on the kernel side are collected until the
+           process is reaped.
+
+	 - We don't use __WALL here as the __WALL emulation relies on
+	   SIGCHLD, and killing a stopped process doesn't generate
+	   one, nor an exit status.
+      */
+      res = my_waitpid (lwpid, &wstat, 0);
+      if (res == -1 && errno == ECHILD)
+	res = my_waitpid (lwpid, &wstat, __WCLONE);
+    } while (res > 0 && WIFSTOPPED (wstat));
+
+  gdb_assert (res > 0);
+}
+
  /* Callback for `find_inferior'.  Kills an lwp of a given process,
     except the leader.  */

@@ -917,7 +957,6 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args)
  {
    struct thread_info *thread = (struct thread_info *) entry;
    struct lwp_info *lwp = get_thread_lwp (thread);
-  int wstat;
    int pid = * (int *) args;

    if (ptid_get_pid (entry->id) != pid)
@@ -936,14 +975,7 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args)
        return 0;
      }

-  do
-    {
-      linux_kill_one_lwp (lwp);
-
-      /* Make sure it died.  The loop is most likely unnecessary.  */
-      pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL);
-    } while (pid > 0 && WIFSTOPPED (wstat));
-
+  kill_wait_lwp (lwp);
    return 0;
  }

@@ -952,8 +984,6 @@ linux_kill (int pid)
  {
    struct process_info *process;
    struct lwp_info *lwp;
-  int wstat;
-  int lwpid;

    process = find_process_pid (pid);
    if (process == NULL)
@@ -976,21 +1006,7 @@ linux_kill (int pid)
  		      pid);
      }
    else
-    {
-      struct thread_info *thr = get_lwp_thread (lwp);
-
-      if (debug_threads)
-	debug_printf ("lk_1: killing lwp %ld, for pid: %d\n",
-		      lwpid_of (thr), pid);
-
-      do
-	{
-	  linux_kill_one_lwp (lwp);
-
-	  /* Make sure it died.  The loop is most likely unnecessary.  */
-	  lwpid = linux_wait_for_event (thr->entry.id, &wstat, __WALL);
-	} while (lwpid > 0 && WIFSTOPPED (wstat));
-    }
+    kill_wait_lwp (lwp);

    the_target->mourn (process);

diff --git a/gdb/testsuite/gdb.threads/kill.c b/gdb/testsuite/gdb.threads/kill.c
new file mode 100644
index 0000000..aefbb06
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/kill.c
@@ -0,0 +1,64 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2014 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifdef USE_THREADS
+
+#include <unistd.h>
+#include <pthread.h>
+
+#define NUM 5
+
+pthread_barrier_t barrier;
+
+void *
+thread_function (void *arg)
+{
+  volatile unsigned int counter = 1;
+
+  pthread_barrier_wait (&barrier);
+
+  while (counter > 0)
+    {
+      counter++;
+      usleep (1);
+    }
+
+  pthread_exit (NULL);
+}
+
+#endif /* USE_THREADS */
+
+void
+setup (void)
+{
+#ifdef USE_THREADS
+  pthread_t threads[NUM];
+  int i;
+
+  pthread_barrier_init (&barrier, NULL, NUM + 1);
+  for (i = 0; i < NUM; i++)
+    pthread_create (&threads[i], NULL, thread_function, NULL);
+  pthread_barrier_wait (&barrier);
+#endif /* USE_THREADS */
+}
+
+int
+main (void)
+{
+  setup ();
+  return 0; /* set break here */
+}
diff --git a/gdb/testsuite/gdb.threads/kill.exp b/gdb/testsuite/gdb.threads/kill.exp
new file mode 100644
index 0000000..a3f921f
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/kill.exp
@@ -0,0 +1,77 @@
+# This testcase is part of GDB, the GNU debugger.
+
+# Copyright 2014 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+standard_testfile
+
+# Run the test proper.  THREADED indicates whether to build a threaded
+# program and spawn several threads before trying to kill the program.
+
+proc test {threaded} {
+    global testfile srcfile
+
+    with_test_prefix [expr ($threaded)?"threaded":"non-threaded"] {
+
+	set options {debug}
+	if {$threaded} {
+	    lappend options "pthreads"
+	    lappend options "additional_flags=-DUSE_THREADS"
+	    set prog ${testfile}_threads
+	} else {
+	    set prog ${testfile}_nothreads
+	}
+
+	if {[prepare_for_testing "failed to prepare" $prog $srcfile $options] == -1} {
+	    return -1
+	}
+
+	if { ![runto main] } then {
+	    fail "run to main"
+	    return
+	}
+
+	set linenum [gdb_get_line_number "set break here"]
+	gdb_breakpoint "$srcfile:$linenum"
+	gdb_continue_to_breakpoint "break here" ".*break here.*"
+
+	if {$threaded} {
+	    gdb_test "info threads" "6.*5.*4.*3.*2.*1.*" "all threads started"
+	}
+
+	# This kills and ensures no output other than the prompt comes out,
+	# like:
+	#
+	#  (gdb) kill
+	#  Kill the program being debugged? (y or n) y
+	#  (gdb)
+	#
+	# If we instead saw more output, like e.g., with an extended-remote
+	# connection:
+	#
+	#  (gdb) kill
+	#  Kill the program being debugged? (y or n) y
+	#  Remote connection closed
+	#  (gdb)
+	#
+	# the above would mean that the remote end crashed.
+
+	gdb_test "kill" "^y" "kill program" "Kill the program being debugged\\? \\(y or n\\) $" "y"
+    }
+}
+
+foreach threaded {true false} {
+    test $threaded
+}



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]