[PATCH][gdb] Fix hang after ext sigkill

Tom de Vries tdevries@suse.de
Tue Apr 21 12:38:17 GMT 2020


On 16-04-2020 15:28, Pedro Alves wrote:
> Hi,
> 
> Sorry for the delay, and thanks much for working on this.
> 
> On 3/25/20 3:51 PM, Tom de Vries wrote:
>> On 25-03-2020 15:44, Simon Marchi wrote:
>>> On 2020-03-25 6:29 a.m., Tom de Vries wrote:
>>>> Here's the updated patch.
>>> Thanks.  Some comments about the test:
>>>
>>> - Please add a comment at the top to describe briefly what this is testing.
>>> - Please replace the infinite loops with bounded ones (e.g. for (i = 0; i < 300; i++)),
>>>   so that the test program eventually exits if something goes wrong and it is allowed to run
>>>   freely.
>> Done.
>> 0001-gdb-Fix-hang-after-ext-sigkill.patch
>>
>> [gdb] Fix hang after ext sigkill
>>
>> Consider the test-case from this patch, compiled with pthread support:
>> ...
>> $ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread
>> ...
>>
>> After running, the program sleeps:
>> ...
>> $ gdb a.out
>> Reading symbols from a.out...
>> (gdb) r
>> Starting program: /data/gdb_versions/devel/a.out
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> [New Thread 0x7ffff77fe700 (LWP 22604)]
>> ...
>>
>> Until we interrupt it with a control-C:
>> ...
>> ^C
>> Thread 1 "a.out" received signal SIGINT, Interrupt.
>> 0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6
>> (gdb)
>> ...
>>
>> If we then kill the inferior using an external SIGKILL:
>> ...
>> (gdb) shell killall -s SIGKILL a.out
>> ...
>> and subsequently continue:
>> ...
>> (gdb) c
>> Continuing.
>> Couldn't get registers: No such process.
>> Couldn't get registers: No such process.
>> (gdb) Couldn't get registers: No such process.
>> (gdb) Couldn't get registers: No such process.
>> (gdb) Couldn't get registers: No such process.
>> <repeat>
>> ...
>> gdb hangs repeating the same warning.  Typing control-C no longer helps,
>> and we have to kill gdb.
>>
>> This is a regression since commit 873657b9e8 "Preserve selected thread in
>> all-stop w/ background execution".  The commit adds a
>> scoped_restore_current_thread typed variable restore_thread to
>> fetch_inferior_event, and the hang is caused by the constructor throwing an
>> exception.
>>
>> Fix this by catching the exception in the constructor.
>>
>> Built and reg-tested on x86_64-linux.
>>
>> gdb/ChangeLog:
>>
>> 2020-02-24  Tom de Vries  <tdevries@suse.de>
>>
>> 	PR gdb/25471
>> 	* thread.c
>> 	(scoped_restore_current_thread::scoped_restore_current_thread): Catch
>> 	exception in get_frame_id.
>>
>> gdb/testsuite/ChangeLog:
>>
>> 2020-02-24  Tom de Vries  <tdevries@suse.de>
>>
>> 	PR gdb/25471
>> 	* gdb.threads/hang-after-ext-sigkill.c: New test.
>> 	* gdb.threads/hang-after-ext-sigkill.exp: New file.
> 
> "hang-after-ext-sigkill" is named in terms of how the
> bug manifested (a hang), but once the bug is fixed, it won't
> be obvious to remember to look for "hang" when someone goes
> looking for a testcase related to a process being killed outside of
> gdb's control.  Plus, the testcase may be extended in the future
> for related bugs that do not cause a hang.
> 
> There's a gdb.base/killed-outside.exp testcase already exactly for
> this sort of issue -- the testcase does the same thing with killing
> with SIGKILL from outside, and then making sure that GDB
> behaves.  I'd rather this new testcase was given the same or
> a similar name, so that e.g. 'make check TESTS="*/*killed-outside*.exp"'
> runs it too.

Ack, test-case renamed to gdb.threads/killed-outside.{exp,c}.

>  Or maybe merge the testcases, though it's useful
> to run the existing one on non-threaded environments too.
> But I'm not sure this one needs to be threaded at all.  Won't
> we see the failure to read registers with a single-threaded program
> too?
> 

I was not able to reproduce the hang with a single-threaded program.

>> 	* lib/gdb.exp (runto): Handle "Temporary breakpoint" string.
>>
>> ---
>>  gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 43 +++++++++++
>>  .../gdb.threads/hang-after-ext-sigkill.exp         | 88 ++++++++++++++++++++++
>>  gdb/testsuite/lib/gdb.exp                          |  2 +-
>>  gdb/thread.c                                       | 12 ++-
>>  4 files changed, 142 insertions(+), 3 deletions(-)
>>
>> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
>> new file mode 100644
>> index 0000000000..b93d6c644a
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
>> @@ -0,0 +1,43 @@
>> +/* This testcase is part of GDB, the GNU debugger.
>> +
>> +   Copyright 2020 Free Software Foundation, Inc.
>> +
>> +   This program is free software; you can redistribute it and/or modify
>> +   it under the terms of the GNU General Public License as published by
>> +   the Free Software Foundation; either version 3 of the License, or
>> +   (at your option) any later version.
>> +
>> +   This program is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +   GNU General Public License for more details.
>> +
>> +   You should have received a copy of the GNU General Public License
>> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
>> +
>> +#include <pthread.h>
>> +#include <unistd.h>
>> +
>> +static void *
>> +fun (void *dummy)
>> +{
>> +  int i;
>> +
>> +  for (i = 0; i < 300; i++)
>> +    sleep (1);
>> +
>> +  return NULL;
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> +  int i;
>> +  pthread_t thread;
>> +  pthread_create (&thread, NULL, fun, NULL);
>> +
>> +  for (i = 0; i < 300; i++)
>> +    sleep (1);
>> +
>> +  return 0;
>> +}
>> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
>> new file mode 100644
>> index 0000000000..89b38b1f6c
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
>> @@ -0,0 +1,88 @@
>> +# Copyright (C) 2020 Free Software Foundation, Inc.
>> +
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> +
>> +# This test-case tests that continuing an inferior that has been killed
>> +# using an external sigkill does not make gdb hang.
>> +
>> +standard_testfile
>> +
>> +if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
>> +	 {pthreads}] == -1} {
>> +    return -1
>> +}
>> +
>> +set res [runto main no-message temporary]
>> +if { $res != 1 } {
>> +    return -1
>> +}
>> +
>> +set pid -1
>> +gdb_test_multiple "info inferior 1" "get inferior pid" {
>> +    -re -wrap "process (\[0-9\]*).*" {
>> +       set pid $expect_out(1,string)
>> +       pass $gdb_test_name
>> +    }
>> +}
> 
> This won't work with remote targets that don't support
> the process extensions, since in that case, you'll
> get "Remote target" instead of "process $PID".  See
> remote_target::pid_to_str.  Likewise probably other
> targets.  See gdb.base/killed-outside.exp.
> 
> 

Ack, updated accordingly.

>> +if { $pid == -1 } {
>> +    return -1
>> +}
>> +
>> +gdb_test_multiple "continue" "" {
>> +    -re "Continuing" {
>> +	pass $gdb_test_name
>> +    }
>> +}
>> +
>> +send_gdb "\003"
>> +
>> +gdb_test_multiple "" "get sigint" {
>> +    -re -wrap "received signal SIGINT, Interrupt\..*" {
>> +       pass $gdb_test_name
>> +   }
>> +}
>> +
> 
> 
> I don't think interrupting with Ctrl-C is really important,
> compared to e.g., running to a breakpoint.  I'd prefer to run
> to a breakpoint instead.  E.g., call an all_started() function
> after the child thread is spawned and after a pthread_barrier_wait
> call, to make sure the child was scheduled.
> See e.g., gdb.threads/async.c.  Then all you need is to
> "runto all_started" instead of runto_main.
> 

Done.
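
For readers following along, here is a minimal sketch of the shape Pedro
describes, not the contents of the updated patch.  It is written so that
it compiles as C++; the testcase itself is C.  The names barrier,
spawn_and_sync and all_started_calls are made up for illustration, and
the loop bound is a parameter only so the sketch can finish quickly
(the testcase uses a fixed bound of 300).

```cpp
#include <cassert>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical names; only all_started comes from the review.  */
pthread_barrier_t barrier;
int all_started_calls;

/* Breakpoint anchor: a test would "runto all_started" and stop here
   once the child thread is known to be running.  */
void
all_started (void)
{
  all_started_calls++;
}

void *
fun (void *arg)
{
  int seconds = *(int *) arg;

  /* Meet the main thread at the barrier, proving we were scheduled.  */
  pthread_barrier_wait (&barrier);

  /* Bounded loop, so the program exits if left to run freely.  */
  for (int i = 0; i < seconds; i++)
    sleep (1);

  return NULL;
}

/* Returns 0 on success, -1 if thread setup failed.  */
int
spawn_and_sync (int seconds)
{
  pthread_t thread;

  if (pthread_barrier_init (&barrier, NULL, 2) != 0)
    return -1;
  if (pthread_create (&thread, NULL, fun, &seconds) != 0)
    return -1;

  /* Returns only after the child has also reached the barrier.  */
  pthread_barrier_wait (&barrier);
  all_started ();

  for (int i = 0; i < seconds; i++)
    sleep (1);

  pthread_join (thread, NULL);
  pthread_barrier_destroy (&barrier);
  return 0;
}
```

A main function would simply call spawn_and_sync (300).  The point of
the barrier is that the breakpoint at all_started cannot be reached
before the child thread exists and has run, so no Ctrl-C is needed to
get a stop with both threads alive.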

>> +gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid"
> 
> This will always kill a process on the host gdb is running on, which
> of course does the wrong thing in cross scenarios.  So this should
> do instead:
> 
>     remote_exec target "kill -9 ${testpid}"
> 
> and sleeping a bit is a good idea to make sure the kill is actually
> scheduled and does its thing before gdb does the "continue".
> See gdb.base/killed-outside.exp.
> 
> 

Ack, done.
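
As a standalone illustration of what an external SIGKILL looks like
from the parent's side (this is not gdb code and not part of the patch),
the sketch below forks a child, kills it with SIGKILL, and checks the
wait status.  "No such process" is the error text for ESRCH, which is
what the sketch gets back when it signals the pid after reaping it.
The helper name kill_child_from_outside and the kill_result struct are
made up for this example, and gdb's ptrace-based handling differs in
detail.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result type for this sketch.  */
struct kill_result
{
  bool killed_by_sigkill;	/* Wait status reported SIGKILL.  */
  int errno_after;		/* errno from signalling the dead pid.  */
};

kill_result
kill_child_from_outside ()
{
  kill_result r = { false, 0 };

  pid_t pid = fork ();
  if (pid < 0)
    return r;

  if (pid == 0)
    {
      /* Child: stand-in for the inferior.  Bounded, so it exits on its
	 own if the parent never kills it.  */
      sleep (30);
      _exit (0);
    }

  /* Parent: kill the child behind its back, then reap it.  */
  kill (pid, SIGKILL);

  int status = 0;
  if (waitpid (pid, &status, 0) == pid)
    r.killed_by_sigkill = (WIFSIGNALED (status)
			   && WTERMSIG (status) == SIGKILL);

  /* The pid is gone now; probing it with signal 0 fails with ESRCH.  */
  errno = 0;
  if (kill (pid, 0) == -1)
    r.errno_after = errno;

  return r;
}
```

The waitpid call here also plays the role of the "sleep a bit" advice:
it guarantees the kill has taken effect before anything else looks at
the process.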

>> +
>> +set no_such_process_msg "Couldn't get registers: No such process\."
>> +set killed_msg "Program terminated with signal SIGKILL, Killed\."
>> +set no_longer_exists_msg "The program no longer exists\."
>> +set not_being_run_msg "The program is not being run\."
>> +
>> +gdb_test_multiple "continue" "prompt after first continue" {
>> +    -re "Continuing\.\r\n\r\n$killed_msg\r\n$no_longer_exists_msg\r\n$gdb_prompt $" {
>> +	pass $gdb_test_name
>> +	# Regular output, bug condition was not triggered, we're done.
>> +	return -1
>> +    }
>> +    -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " {
>> +	pass $gdb_test_name
>> +	# Two times $no_such_process_msg.  The bug condition was triggered, go
>> +	# check for it.
>> +    }
>> +    -re "Continuing\.\r\n$no_such_process_msg\r\n$gdb_prompt $" {
>> +	pass $gdb_test_name
>> +	# One time $no_such_process_msg.  We're stuck here.  The bug condition
>> +	# was not triggered, but we're not getting correct gdb behaviour either:
>> +	# every subsequent continue produces one no_such_process_msg.  Give up.
>> +	return -1
> 
> I'm confused here -- the comment says we're not getting correct behavior,
> but this won't result in any FAIL?
> 

Yes.  I made that decision because I was trying to trigger a different
scenario, but agreed, it's debatable.  It no longer matters, though:
rewriting the test to stop at a breakpoint eliminated the need for this
case, and the scenario I was trying to trigger now always reproduces.

>> +    }
>> +}
>> +
>> +gdb_test_multiple "" "messages" {
>> +    -re ".*$killed_msg.*$no_longer_exists_msg\r\n" {
>> +	pass $gdb_test_name
>> +	gdb_test "continue" $not_being_run_msg "second continue"
>> +    }
>> +}
> 
> It isn't obvious to me why put this one separately instead of
> nested within the pass case in the other gdb_test_multiple above.
> Is this also meant to run if the previous gdb_test_multiple fails
> due to an internal gdb_test_multiple case being hit,
> like the default gdb_prompt match?
> 

Fixed.

> 
>> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
>> index e17ac0ef75..4cf2beca00 100644
>> --- a/gdb/testsuite/lib/gdb.exp
>> +++ b/gdb/testsuite/lib/gdb.exp
>> @@ -570,7 +570,7 @@ proc runto { function args } {
>>  	    }
>>  	    return 1
>>  	}
>> -	-re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { 
>> +	-re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>>  	    if { $print_pass } {
>>  		pass $test_name
>>  	    }
>> diff --git a/gdb/thread.c b/gdb/thread.c
>> index c6e3d356a5..d287bce45f 100644
>> --- a/gdb/thread.c
>> +++ b/gdb/thread.c
>> @@ -1488,8 +1488,16 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
>>        else
>>  	frame = NULL;
>>  
>> -      m_selected_frame_id = get_frame_id (frame);
>> -      m_selected_frame_level = frame_relative_level (frame);
>> +      try
>> +	{
>> +	  m_selected_frame_id = get_frame_id (frame);
>> +	  m_selected_frame_level = frame_relative_level (frame);
>> +	}
>> +      catch (const gdb_exception &ex)
> 
> This silently swallows Ctrl-C/QUIT too.  That's usually not a good
> idea.  gdb_exception_error should be the default choice, unless you
> really want to handle Ctrl-C here.
> 

Fixed.
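
To illustrate the difference Pedro points out, here is a minimal
standalone sketch.  The types in namespace sketch are simplified
stand-ins that only mirror the names used in the review; they are not
gdb's real definitions, and the helpers catch_base, catch_errors_only,
quit_reaches_caller and error_reaches_caller are made up for this
example.  The only assumption is that the error and quit types share a
common base, as the review comment implies.

```cpp
#include <cassert>
#include <functional>
#include <string>

namespace sketch {

/* Simplified stand-ins for the exception types named in the review.  */
struct gdb_exception { std::string message; };
struct gdb_exception_error : gdb_exception {};
struct gdb_exception_quit : gdb_exception {};

using guard = void (*) (const std::function<void ()> &);

/* Catches the base type: errors and quits are both dropped.  */
void
catch_base (const std::function<void ()> &body)
{
  try
    {
      body ();
    }
  catch (const gdb_exception &)
    {
    }
}

/* Catches only errors: a quit propagates to the caller.  */
void
catch_errors_only (const std::function<void ()> &body)
{
  try
    {
      body ();
    }
  catch (const gdb_exception_error &)
    {
    }
}

/* Returns true if a quit thrown under guard G escapes it.  */
bool
quit_reaches_caller (guard g)
{
  try
    {
      g ([] { throw gdb_exception_quit {}; });
    }
  catch (const gdb_exception_quit &)
    {
      return true;
    }
  return false;
}

/* Returns true if an error thrown under guard G escapes it.  */
bool
error_reaches_caller (guard g)
{
  try
    {
      g ([] { throw gdb_exception_error {}; });
    }
  catch (const gdb_exception_error &)
    {
      return true;
    }
  return false;
}

} /* namespace sketch */
```

With these definitions, quit_reaches_caller (catch_base) is false while
quit_reaches_caller (catch_errors_only) is true, and an error is
contained by either guard.  That is the behaviour wanted in the
constructor: the "No such process" error is handled, but a Ctrl-C still
reaches whoever is prepared to deal with it.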

Thanks for the review, that was helpful.

Any more comments?

- Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-gdb-Fix-hang-after-ext-sigkill.patch
Type: text/x-patch
Size: 6699 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/gdb-patches/attachments/20200421/b42e9c53/attachment.bin>

