[PATCH][gdb] Fix hang after ext sigkill
Tom de Vries
tdevries@suse.de
Tue Apr 21 12:38:17 GMT 2020
On 16-04-2020 15:28, Pedro Alves wrote:
> Hi,
>
> Sorry for the delay, and thanks much for working on this.
>
> On 3/25/20 3:51 PM, Tom de Vries wrote:
>> On 25-03-2020 15:44, Simon Marchi wrote:
>>> On 2020-03-25 6:29 a.m., Tom de Vries wrote:
>>>> Here's the updated patch.
>>> Thanks. Some comments about the test:
>>>
>>> - Please add a comment at the top to describe briefly what this is testing.
>>> - Please replace the infinite loops with bounded ones (e.g. for (i = 0; i < 300; i++)),
>>> so that the test program eventually exits if something goes wrong and it is allowed to run
>>> freely.
>> Done.
>> 0001-gdb-Fix-hang-after-ext-sigkill.patch
>>
>> [gdb] Fix hang after ext sigkill
>>
>> Consider the test-case from this patch, compiled with pthread support:
>> ...
>> $ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread
>> ...
>>
>> After running, the program sleeps:
>> ...
>> $ gdb a.out
>> Reading symbols from a.out...
>> (gdb) r
>> Starting program: /data/gdb_versions/devel/a.out
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> [New Thread 0x7ffff77fe700 (LWP 22604)]
>> ...
>>
>> Until we interrupt it with a control-C:
>> ...
>> ^C
>> Thread 1 "a.out" received signal SIGINT, Interrupt.
>> 0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6
>> (gdb)
>> ...
>>
>> If we then kill the inferior using an external SIGKILL:
>> ...
>> (gdb) shell killall -s SIGKILL a.out
>> ...
>> and subsequently continue:
>> ...
>> (gdb) c
>> Continuing.
>> Couldn't get registers: No such process.
>> Couldn't get registers: No such process.
>> (gdb) Couldn't get registers: No such process.
>> (gdb) Couldn't get registers: No such process.
>> (gdb) Couldn't get registers: No such process.
>> <repeat>
>> ...
>> gdb hangs repeating the same warning. Typing control-C no longer helps,
>> and we have to kill gdb.
>>
>> This is a regression since commit 873657b9e8 "Preserve selected thread in
>> all-stop w/ background execution". The commit adds a
>> scoped_restore_current_thread typed variable restore_thread to
>> fetch_inferior_event, and the hang is caused by the constructor throwing an
>> exception.
>>
>> Fix this by catching the exception in the constructor.
>>
>> Build and reg-tested on x86_64-linux.
>>
>> gdb/ChangeLog:
>>
>> 2020-02-24 Tom de Vries <tdevries@suse.de>
>>
>> PR gdb/25471
>> * thread.c
>> (scoped_restore_current_thread::scoped_restore_current_thread): Catch
>> exception in get_frame_id.
>>
>> gdb/testsuite/ChangeLog:
>>
>> 2020-02-24 Tom de Vries <tdevries@suse.de>
>>
>> PR gdb/25471
>> * gdb.threads/hang-after-ext-sigkill.c: New test.
>> * gdb.threads/hang-after-ext-sigkill.exp: New file.
>
> "hang-after-ext-sigkill" is named in terms of how the
> bug manifested (a hang), but once the bug is fixed, it won't
> be obvious to remember to look for "hang" when someone goes
> look for a testcase related to a process being killed outside of
> gdb's control. Plus, then testcase may be extended in the future
> for related bugs that do not cause a hang.
>
> There's a gdb.base/killed-outside.exp testcase already exactly for
> this sort of issue -- the testcase does the same thing with killing
> with SIGKILL from outside, and then making sure that GDB
> behaves. I'd rather this new testcase was given the same or
> a similar name, so that e.g. 'make check TESTS="*/*killed-outside*.exp"'
> runs it too.
Ack, test-case renamed to gdb.threads/killed-outside.{exp,c}.
> Or maybe merge the testcases, though it's useful
> to run the existing one on non-threaded environments too.
> But I'm not sure this one needs to be threaded at all. Won't
> we see the failure to read registers with a single-threaded program
> too?
>
I was not able to reproduce the hang with a single-threaded program.
>> * lib/gdb.exp (runto): Handle "Temporary breakpoint" string.
>>
>> ---
>> gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 43 +++++++++++
>> .../gdb.threads/hang-after-ext-sigkill.exp | 88 ++++++++++++++++++++++
>> gdb/testsuite/lib/gdb.exp | 2 +-
>> gdb/thread.c | 12 ++-
>> 4 files changed, 142 insertions(+), 3 deletions(-)
>>
>> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
>> new file mode 100644
>> index 0000000000..b93d6c644a
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
>> @@ -0,0 +1,43 @@
>> +/* This testcase is part of GDB, the GNU debugger.
>> +
>> + Copyright 2020 Free Software Foundation, Inc.
>> +
>> + This program is free software; you can redistribute it and/or modify
>> + it under the terms of the GNU General Public License as published by
>> + the Free Software Foundation; either version 3 of the License, or
>> + (at your option) any later version.
>> +
>> + This program is distributed in the hope that it will be useful,
>> + but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + GNU General Public License for more details.
>> +
>> + You should have received a copy of the GNU General Public License
>> + along with this program. If not, see <http://www.gnu.org/licenses/>. */
>> +
>> +#include <pthread.h>
>> +#include <unistd.h>
>> +
>> +static void *
>> +fun (void *dummy)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < 300; i++)
>> + sleep (1);
>> +
>> + return NULL;
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> + int i;
>> + pthread_t thread;
>> + pthread_create (&thread, NULL, fun, NULL);
>> +
>> + for (i = 0; i < 300; i++)
>> + sleep (1);
>> +
>> + return 0;
>> +}
>> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
>> new file mode 100644
>> index 0000000000..89b38b1f6c
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
>> @@ -0,0 +1,88 @@
>> +# Copyright (C) 2020 Free Software Foundation, Inc.
>> +
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program. If not, see <http://www.gnu.org/licenses/>.
>> +
>> +# This test-case tests that continuing an inferior that has been killed
>> +# using an external sigkill does not make gdb hang.
>> +
>> +standard_testfile
>> +
>> +if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
>> + {pthreads}] == -1} {
>> + return -1
>> +}
>> +
>> +set res [runto main no-message temporary]
>> +if { $res != 1 } {
>> + return -1
>> +}
>> +
>> +set pid -1
>> +gdb_test_multiple "info inferior 1" "get inferior pid" {
>> + -re -wrap "process (\[0-9\]*).*" {
>> + set pid $expect_out(1,string)
>> + pass $gdb_test_name
>> + }
>> +}
>
> This won't work with remote targets that don't support
> the process extensions, since on that case, you'll
> get "Remote target" instead of "process $PID". See
> remote_target::pid_to_str. Likewise probably other
> targets. See gdb.base/killed-outside.exp.
>
>
Ack, updated accordingly.
>> +if { $pid == -1 } {
>> + return -1
>> +}
>> +
>> +gdb_test_multiple "continue" "" {
>> + -re "Continuing" {
>> + pass $gdb_test_name
>> + }
>> +}
>> +
>> +send_gdb "\003"
>> +
>> +gdb_test_multiple "" "get sigint" {
>> + -re -wrap "received signal SIGINT, Interrupt\..*" {
>> + pass $gdb_test_name
>> + }
>> +}
>> +
>
>
> I don't think interrupting with Ctrl-C is really important,
> compared to e.g., running to a breakpoint. I'd prefer to run
> to a breakpoint instead. E.g., call a all_started(); function
> after the child thread is spawned and after a pthread_barrier_wait
> call, to make sure the child was scheduled.
> See e.g., gdb.threads/async.c. Then all you need is to
> "runto all_started" instead of runto_main.
>
Done.
>> +gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid"
>
> This will always kill a process on the host gdb is running on, which
> of course does wrong the thing in cross scenarios. So this should
> do instead:
>
> remote_exec target "kill -9 ${testpid}"
>
> and sleeping a bit is a good idea to make sure the kill is actually
> scheduled and does its thing before gdb does the "continue".
> See gdb.base/killed-outside.exp.
>
>
Ack, done.
>> +
>> +set no_such_process_msg "Couldn't get registers: No such process\."
>> +set killed_msg "Program terminated with signal SIGKILL, Killed\."
>> +set no_longer_exists_msg "The program no longer exists\."
>> +set not_being_run_msg "The program is not being run\."
>> +
>> +gdb_test_multiple "continue" "prompt after first continue" {
>> + -re "Continuing\.\r\n\r\n$killed_msg\r\n$no_longer_exists_msg\r\n$gdb_prompt $" {
>> + pass $gdb_test_name
>> + # Regular output, bug condition was not triggered, we're done.
>> + return -1
>> + }
>> + -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " {
>> + pass $gdb_test_name
>> + # Two times $no_such_process_msg. The bug condition was triggered, go
>> + # check for it.
>> + }
>> + -re "Continuing\.\r\n$no_such_process_msg\r\n$gdb_prompt $" {
>> + pass $gdb_test_name
>> + # One time $no_such_process_msg. We're stuck here. The bug condition
>> + # was not triggered, but we're not getting correct gdb behaviour either:
>> + # every subsequent continue produces one no_such_process_msg. Give up.
>> + return -1
>
> I'm confused here -- the comment says we're not getting correct behavior,
> but this won't result in any FAIL?
>
Um, yes. I made that decision because I was trying to trigger another
scenario. But agreed, it's debatable. Anyway, it also doesn't matter
anymore since rewriting the test to stop at a breakpoint eliminated the
need for this. The scenario I was trying to trigger now always reproduces.
>> + }
>> +}
>> +
>> +gdb_test_multiple "" "messages" {
>> + -re ".*$killed_msg.*$no_longer_exists_msg\r\n" {
>> + pass $gdb_test_name
>> + gdb_test "continue" $not_being_run_msg "second continue"
>> + }
>> +}
>
> It isn't obvious to me why put this one separately instead of
> nested within the pass case in the other gdb_test_multiple above.
> Is this also meant to run if the previous gdb_test_multiple fails
> due to an internal gdb_test_multiple case being hit,
> like the default gdb_prompt match?
>
Fixed.
>
>> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
>> index e17ac0ef75..4cf2beca00 100644
>> --- a/gdb/testsuite/lib/gdb.exp
>> +++ b/gdb/testsuite/lib/gdb.exp
>> @@ -570,7 +570,7 @@ proc runto { function args } {
>> }
>> return 1
>> }
>> - -re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>> + -re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>> if { $print_pass } {
>> pass $test_name
>> }
>> diff --git a/gdb/thread.c b/gdb/thread.c
>> index c6e3d356a5..d287bce45f 100644
>> --- a/gdb/thread.c
>> +++ b/gdb/thread.c
>> @@ -1488,8 +1488,16 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
>> else
>> frame = NULL;
>>
>> - m_selected_frame_id = get_frame_id (frame);
>> - m_selected_frame_level = frame_relative_level (frame);
>> + try
>> + {
>> + m_selected_frame_id = get_frame_id (frame);
>> + m_selected_frame_level = frame_relative_level (frame);
>> + }
>> + catch (const gdb_exception &ex)
>
> This silently swallows Ctrl-C/QUIT too. That's usually not a good
> idea. gdb_exception_error should be the default choice, unless you
> really want to handle Ctrl-C here.
>
Fixed.
Thanks for the review, that was helpful.
Any more comments?
- Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-gdb-Fix-hang-after-ext-sigkill.patch
Type: text/x-patch
Size: 6699 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/gdb-patches/attachments/20200421/b42e9c53/attachment.bin>
More information about the Gdb-patches
mailing list