Hi Hui,
On 07/02/2014 10:07 AM, Pedro Alves wrote:
This issue is introduced by the patches for PR 12702 that was pushed by you.
Maybe you could take a look on this patch.
Thanks. I'll need to think a bit about whether this is the right
solution.
I've added it to the 7.8 regression queue at:
https://sourceware.org/gdb/wiki/GDB_7.8_Release
I'll look better at it as soon as I have a chance.
First, it's really annoying that the testsuite doesn't catch this!
We're really missing such a basic test, so I went ahead and wrote one.
I was actually quite surprised that we didn't have any
test that makes sure "kill" works without complaints/spurious output.
However, unfortunately, the new test revealed that the fix isn't right. :-/
Sometimes the test passed, but other times (that'll depend on
machine, of course), I'd get:
kill
Kill the program being debugged? (y or n) y
Ignoring packet error, continuing...
(gdb) FAIL: gdb.threads/kill.exp: kill
That means GDBserver didn't reply to the vKill packet.
Attaching to GDBserver at this point shows that it's stuck forever
in the new loop:
+ /* Keep call kill_one_lwp_callback until find_inferior cannot find any
+ lwps that is for pid. */
+ while (find_inferior (&all_threads, kill_one_lwp_callback , &pid) != NULL);
The reason was that the leader lwp goes zombie before the other
lwps are reaped, and linux_wait_for_event would delete it. We'd be
left with e.g., only one lwp that is _not_ the leader.
That lwp had disappeared already (due to SIGKILL), but there's
no status to collect (waitpid would return -1). Nothing in linux-low.c
would notice the lwp is already gone. Because that lwp has lwp->stopped == 0,
linux_wait_for_event would return, here:
if ((find_inferior (&all_threads,
not_stopped_callback,
&wait_ptid) == NULL))
{
if (debug_threads)
debug_printf ("LLW: exit (no unwaited-for LWP)\n");
sigprocmask (SIG_SETMASK, &prev_mask, NULL);
return -1;
}
So this would return true:
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -944,7 +944,9 @@ kill_one_lwp_callback (struct inferior_l
pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL);
} while (pid > 0 && WIFSTOPPED (wstat));
- return 0;
+ /* Let find_inferior return because maybe other lwp in the list will
+ be deleted by delete_lwp. */
+ return 1;
}
So the new loop would never break.
Here's a different patch that passes the test cleanly for me.
Let me know if it works for you.
8<-------------------
From ed3e42e5447e0276d6fc759a27408d06e9ccf0b7 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Thu, 10 Jul 2014 15:13:26 +0100
Subject: [PATCH] GDBserver crashes when killing a multi-thread process
Here's an example, with the new test:
gdbserver :9999 gdb.threads/kill
gdb gdb.threads/kill
(gdb) b 52
Breakpoint 1 at 0x4007f4: file kill.c, line 52.
Continuing.
Breakpoint 1, main () at kill.c:52
52 return 0; /* set break here */
(gdb) k
Kill the program being debugged? (y or n) y
gdbserver :9999 gdb.threads/kill
Process gdb.base/watch_thread_num created; pid = 9719
Listening on port 1234
Remote debugging from host 127.0.0.1
Killing all inferiors
Segmentation fault (core dumped)
Backtrace:
(gdb) bt
#0 0x00000000004068a0 in find_inferior (list=0x66b060 <all_threads>, func=0x427637 <kill_one_lwp_callback>, arg=0x7fffffffd3fc) at src/gdb/gdbserver/inferiors.c:199
#1 0x00000000004277b6 in linux_kill (pid=15708) at src/gdb/gdbserver/linux-low.c:966
#2 0x000000000041354d in kill_inferior (pid=15708) at src/gdb/gdbserver/target.c:163
#3 0x00000000004107e9 in kill_inferior_callback (entry=0x6704f0) at src/gdb/gdbserver/server.c:2934
#4 0x0000000000406522 in for_each_inferior (list=0x66b050 <all_processes>, action=0x4107a6 <kill_inferior_callback>) at src/gdb/gdbserver/inferiors.c:57
#5 0x0000000000412377 in process_serial_event () at src/gdb/gdbserver/server.c:3767
#6 0x000000000041267c in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:3880
#7 0x00000000004189ff in handle_file_event (event_file_desc=4) at src/gdb/gdbserver/event-loop.c:434
#8 0x00000000004181c6 in process_event () at src/gdb/gdbserver/event-loop.c:189
#9 0x0000000000418f45 in start_event_loop () at src/gdb/gdbserver/event-loop.c:552
#10 0x0000000000411272 in main (argc=3, argv=0x7fffffffd8d8) at src/gdb/gdbserver/server.c:3283
The problem is that linux_wait_for_event deletes lwps that have exited
(even those not passed in as lwps of interest), while the lwp/thread
list is being walked on with find_inferior. find_inferior can handle
the current iterated inferior being deleted, but not others.
When killing lwps, we don't really care about any of the pending
status handling of linux_wait_for_event. We can just waitpid the lwps
directly, which is also what GDB does (see
linux-nat.c:kill_wait_callback). This way the lwps are not deleted
while we're walking the list. They'll be deleted by linux_mourn
afterwards.
This crash triggers several times when running the testsuite against
GDBserver with the native-gdbserver board (target remote), but as GDB
can't distinguish between GDBserver crashing and "kill" being
sucessful, as in both cases the connection is closed (the 'k' packet
doesn't require a reply), and the inferior is gone, that results in no
FAIL.
The patch adds a generic test that catches the issue with
extended-remote mode (and works fine with native testing too). Here's
how it fails with the native-extended-gdbserver board without the fix:
(gdb) info threads
Id Target Id Frame
6 Thread 15367.15374 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
5 Thread 15367.15373 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
4 Thread 15367.15372 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
3 Thread 15367.15371 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
2 Thread 15367.15370 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
* 1 Thread 15367.15367 main () at .../gdb.threads/kill.c:52
(gdb) kill
Kill the program being debugged? (y or n) y
Remote connection closed
^^^^^^^^^^^^^^^^^^^^^^^^
(gdb) FAIL: gdb.threads/kill.exp: kill
Extended remote should remain connected after the kill.
gdb/gdbserver/
2014-07-10 Pedro Alves <palves@redhat.com>
* linux-low.c (kill_wait_lwp): New function, based on
kill_one_lwp_callback, but use my_waitpid directly.
(kill_one_lwp_callback, linux_kill): Use it.
gdb/testsuite/
2014-07-10 Pedro Alves <palves@redhat.com>
* gdb.threads/kill.c: New file.
* gdb.threads/kill.exp: New file.
---
gdb/gdbserver/linux-low.c | 68 ++++++++++++++++++++-------------
gdb/testsuite/gdb.threads/kill.c | 64 +++++++++++++++++++++++++++++++
gdb/testsuite/gdb.threads/kill.exp | 77 ++++++++++++++++++++++++++++++++++++++
3 files changed, 183 insertions(+), 26 deletions(-)
create mode 100644 gdb/testsuite/gdb.threads/kill.c
create mode 100644 gdb/testsuite/gdb.threads/kill.exp
diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index 61552f4..bdc9aaa 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -909,6 +909,46 @@ linux_kill_one_lwp (struct lwp_info *lwp)
errno ? strerror (errno) : "OK");
}
+/* Kill LWP and wait for it to die. */
+
+static void
+kill_wait_lwp (struct lwp_info *lwp)
+{
+ struct thread_info *thr = get_lwp_thread (lwp);
+ int pid = ptid_get_pid (ptid_of (thr));
+ int lwpid = ptid_get_lwp (ptid_of (thr));
+ int wstat;
+ int res;
+
+ if (debug_threads)
+ debug_printf ("kwl: killing lwp %d, for pid: %d\n", lwpid, pid);
+
+ do
+ {
+ linux_kill_one_lwp (lwp);
+
+ /* Make sure it died. Notes:
+
+ - The loop is most likely unnecessary.
+
+ - We don't use linux_wait_for_event as that could delete lwps
+ while we're iterating over them. We're not interested in
+ any pending status at this point, only in making sure all
+ wait status on the kernel side are collected until the
+ process is reaped.
+
+ - We don't use __WALL here as the __WALL emulation relies on
+ SIGCHLD, and killing a stopped process doesn't generate
+ one, nor an exit status.
+ */
+ res = my_waitpid (lwpid, &wstat, 0);
+ if (res == -1 && errno == ECHILD)
+ res = my_waitpid (lwpid, &wstat, __WCLONE);
+ } while (res > 0 && WIFSTOPPED (wstat));
+
+ gdb_assert (res > 0);
+}
+
/* Callback for `find_inferior'. Kills an lwp of a given process,
except the leader. */
@@ -917,7 +957,6 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args)
{
struct thread_info *thread = (struct thread_info *) entry;
struct lwp_info *lwp = get_thread_lwp (thread);
- int wstat;
int pid = * (int *) args;
if (ptid_get_pid (entry->id) != pid)
@@ -936,14 +975,7 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args)
return 0;
}
- do
- {
- linux_kill_one_lwp (lwp);
-
- /* Make sure it died. The loop is most likely unnecessary. */
- pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL);
- } while (pid > 0 && WIFSTOPPED (wstat));
-
+ kill_wait_lwp (lwp);
return 0;
}
@@ -952,8 +984,6 @@ linux_kill (int pid)
{
struct process_info *process;
struct lwp_info *lwp;
- int wstat;
- int lwpid;
process = find_process_pid (pid);
if (process == NULL)
@@ -976,21 +1006,7 @@ linux_kill (int pid)
pid);
}
else
- {
- struct thread_info *thr = get_lwp_thread (lwp);
-
- if (debug_threads)
- debug_printf ("lk_1: killing lwp %ld, for pid: %d\n",
- lwpid_of (thr), pid);
-
- do
- {
- linux_kill_one_lwp (lwp);
-
- /* Make sure it died. The loop is most likely unnecessary. */
- lwpid = linux_wait_for_event (thr->entry.id, &wstat, __WALL);
- } while (lwpid > 0 && WIFSTOPPED (wstat));
- }
+ kill_wait_lwp (lwp);
the_target->mourn (process);
diff --git a/gdb/testsuite/gdb.threads/kill.c b/gdb/testsuite/gdb.threads/kill.c
new file mode 100644
index 0000000..aefbb06
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/kill.c
@@ -0,0 +1,64 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+ Copyright 2014 Free Software Foundation, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#ifdef USE_THREADS
+
+#include <unistd.h>
+#include <pthread.h>
+
+#define NUM 5
+
+pthread_barrier_t barrier;
+
+void *
+thread_function (void *arg)
+{
+ volatile unsigned int counter = 1;
+
+ pthread_barrier_wait (&barrier);
+
+ while (counter > 0)
+ {
+ counter++;
+ usleep (1);
+ }
+
+ pthread_exit (NULL);
+}
+
+#endif /* USE_THREADS */
+
+void
+setup (void)
+{
+#ifdef USE_THREADS
+ pthread_t threads[NUM];
+ int i;
+
+ pthread_barrier_init (&barrier, NULL, NUM + 1);
+ for (i = 0; i < NUM; i++)
+ pthread_create (&threads[i], NULL, thread_function, NULL);
+ pthread_barrier_wait (&barrier);
+#endif /* USE_THREADS */
+}
+
+int
+main (void)
+{
+ setup ();
+ return 0; /* set break here */
+}
diff --git a/gdb/testsuite/gdb.threads/kill.exp b/gdb/testsuite/gdb.threads/kill.exp
new file mode 100644
index 0000000..a3f921f
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/kill.exp
@@ -0,0 +1,77 @@
+# This testcase is part of GDB, the GNU debugger.
+
+# Copyright 2014 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+standard_testfile
+
+# Run the test proper. THREADED indicates whether to build a threaded
+# program and spawn several threads before trying to kill the program.
+
+proc test {threaded} {
+ global testfile srcfile
+
+ with_test_prefix [expr ($threaded)?"threaded":"non-threaded"] {
+
+ set options {debug}
+ if {$threaded} {
+ lappend options "pthreads"
+ lappend options "additional_flags=-DUSE_THREADS"
+ set prog ${testfile}_threads
+ } else {
+ set prog ${testfile}_nothreads
+ }
+
+ if {[prepare_for_testing "failed to prepare" $prog $srcfile $options] == -1} {
+ return -1
+ }
+
+ if { ![runto main] } then {
+ fail "run to main"
+ return
+ }
+
+ set linenum [gdb_get_line_number "set break here"]
+ gdb_breakpoint "$srcfile:$linenum"
+ gdb_continue_to_breakpoint "break here" ".*break here.*"
+
+ if {$threaded} {
+ gdb_test "info threads" "6.*5.*4.*3.*2.*1.*" "all threads started"
+ }
+
+ # This kills and ensures no output other than the prompt comes out,
+ # like:
+ #
+ # (gdb) kill
+ # Kill the program being debugged? (y or n) y
+ # (gdb)
+ #
+ # If we instead saw more output, like e.g., with an extended-remote
+ # connection:
+ #
+ # (gdb) kill
+ # Kill the program being debugged? (y or n) y
+ # Remote connection closed
+ # (gdb)
+ #
+ # the above would mean that the remote end crashed.
+
+ gdb_test "kill" "^y" "kill program" "Kill the program being debugged\\? \\(y or n\\) $" "y"
+ }
+}
+
+foreach threaded {true false} {
+ test $threaded
+}