[RFC PATCH] gdb: fix eval.c assert during inferior exit event

Thu May 6 01:59:32 GMT 2021

y
On 2021-05-05 11:56 a.m., Magne Hov via Gdb-patches wrote:
> Evaluating expressions from within an inferior exit event handler can
> cause a crash:
> 
>     echo "int main() { return 0; }" > repro.cc
>     g++ -g repro.cc -o repro
>     ./gdb -q --ex "start" --ex "python gdb.events.exited.connect(lambda _: gdb.execute('set \$_a=0'))" --ex "continue" repro
> 
>     Reading symbols from repro...
>     Temporary breakpoint 1 at 0x1131: file repro.cc, line 1.
>     Starting program: /home/mhov/repos/binutils-gdb-master/install/bin/repro
> 
>     Temporary breakpoint 1, main () at repro.cc:1
>     1	int main() { return 0; }
>     Continuing.
>     [Inferior 1 (process 75524) exited normally]
>     ../../gdb/thread.c:72: internal-error: thread_info* inferior_thread(): Assertion `current_thread_ != nullptr' failed.
>     A problem internal to GDB has been detected,
>     further debugging may prove unreliable.
>     Quit this debugging session? (y or n) [answered Y; input not from terminal]
> 
>     This is a bug, please report it.  For instructions, see:
>     <https://www.gnu.org/software/gdb/bugs/>.
> 
> Backtrace
>     0  in internal_error of ../../gdbsupport/errors.cc:51
>     1  in inferior_thread of ../../gdb/thread.c:72
>     2  in expression::evaluate of ../../gdb/eval.c:98
>     3  in evaluate_expression of ../../gdb/eval.c:115
>     4  in set_command of ../../gdb/printcmd.c:1502
>     5  in do_const_cfunc of ../../gdb/cli/cli-decode.c:101
>     6  in cmd_func of ../../gdb/cli/cli-decode.c:2181
>     7  in execute_command of ../../gdb/top.c:670
>     ...
>     22 in python_inferior_exit of ../../gdb/python/py-inferior.c:182
> 
> This is the same assertion site as in
> <https://sourceware.org/bugzilla/show_bug.cgi?id=26761>, but I'm not
> sure if it is the same problem.

I don't think so, this assertion in inferior_thread (trying to get the
current thread while there is not current thread) is kind of a generic
symptom, but the root cause can be any thing.

> 
> In `expression::evaluate (...)' there is a call to `inferior_thread
> ()' that is guarded by `target_has_execution ()':
> 
>     struct value *
>     expression::evaluate (struct type *expect_type, enum noside noside)
>     {
>       gdb::optional<enable_thread_stack_temporaries> stack_temporaries;
>       if (target_has_execution ()
>           && language_defn->la_language == language_cplus
>           && !thread_stack_temporaries_enabled_p (inferior_thread ()))
>         stack_temporaries.emplace (inferior_thread ());
> 
> The `target_has_execution ()' guard maps onto `inf->pid' and the
> `inferior_thread ()' call assumes that `current_thread_' is set to
> something meaningful:
> 
>     struct thread_info*
>     inferior_thread (void)
>     {
>       gdb_assert (current_thread_ != nullptr);
>       return current_thread_;
>     }
> 
> In other words, it is assumed that if `inf->pid' is set then
> `current_thread_' must also be set. This does not hold at the point
> where inferior exit observers are notified:
> - `generic_mourn_inferior (...)'
>   - `switch_to_no_thread ()'
>     - `current_thread_ = nullptr;'
>   - `exit_inferior (...)'
>     - `gdb::observers::inferior_exit.notify (...)'
>     - `inf->pid = 0'
> 
> The inferior exit notification means that a Python handler can get a
> chance to run while `current_thread' has been cleared and the
> `inf->pid' has not been cleared. Since the Python handler can call any
> GDB command with `gdb.execute(...)' (in my case `gdb.execute("set
> $_a=0")' we can end evaluating expressions and asserting in
> `evaluate_subexp (...)'.
> 
> This patch adds a test in `evaluate_subexp (...)' to check the global
> `inferior_ptid' which is reset at the same time as `current_thread_'.
> Checking `inferior_ptid' at the same time as `target_has_execution ()'
> seems to be a common pattern:
> 
>     $ git grep -n -e inferior_ptid --and -e target_has_execution
>     gdb/breakpoint.c:2998:    && (inferior_ptid == null_ptid || !target_has_execution ()))
>     gdb/breakpoint.c:3054:    && (inferior_ptid == null_ptid || !target_has_execution ()))
>     gdb/breakpoint.c:4587:  if (inferior_ptid == null_ptid || !target_has_execution ())
>     gdb/infcmd.c:360:  if (inferior_ptid != null_ptid && target_has_execution ())
>     gdb/infcmd.c:2380:  /* FIXME:  This should not really be inferior_ptid (or target_has_execution).
>     gdb/infrun.c:3438:  if (!target_has_execution () || inferior_ptid == null_ptid)
>     gdb/remote.c:11961:  if (!target_has_execution () || inferior_ptid == null_ptid)
>     gdb/solib.c:725:  if (target_has_execution () && inferior_ptid != null_ptid)
> 
> The testsuite has been run on 5.4.0-59-generic x86_64 GNU/Linux:
> - Ubuntu 20.04.1 LTS
> - gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
> - DejaGnu version 1.6.2
>   - Expect version 5.45.4
>   - Tcl version 8.6
> - Native configuration: x86_64-pc-linux-gnu
> - Target: unix
> 
> Diffing results before and after the patch shows a few XFAIL in
> gdb.threads/attach-many-short-lived-threads.exp, and one unexpected
> failure in gdb.base/run-attach-while-running.exp which seems to be
> flaky on master as well.

First, let me say that your commit message is fantastic.

I do agree with your analysis and the fix.  inferior_ptid is always
supposed to be in sync with current_thread_.

> diff --git a/gdb/testsuite/gdb.python/py-events.exp b/gdb/testsuite/gdb.python/py-events.exp
> index e89cd8b021b..bec3c9f2cab 100644
> --- a/gdb/testsuite/gdb.python/py-events.exp
> +++ b/gdb/testsuite/gdb.python/py-events.exp
> @@ -197,18 +197,33 @@ gdb_test_multiple "continue" $test {
>  gdb_test_no_output "delete $second_breakpoint"
>  
>  #test exited event.
> +proc get_process_id {test} {
> +    global gdb_prompt
> +    global process_id
> +    gdb_test_multiple "info proc" $test {
> +	-re "process (\\d+).*$gdb_prompt $" {
> +	    set process_id $expect_out(1,string)
> +	    pass $test
> +	}
> +    }
> +}

Insted of a proc setting a global variable (process_id), I would prefer
if the proc returned the value.  And then you could do:

set process_id [get_process_id]

> +
> +get_process_id "get inferior 1 process id"
>  gdb_test "continue" ".*event type: continue.*
>  .*clear_objfiles\[\r\n\]*progspace: .*py-events.*
>  .*event type: exit.*
>  .*exit code: 12.*
>  .*exit inf: 1.*
> +.*exit pid: $process_id.*
>  dir ok: True.*" "Inferior 1 terminated."
>  
>  gdb_test "inferior 2" ".*Switching to inferior 2.*"
> +get_process_id "get inferior 2 process id"
>  gdb_test "continue" ".*event type: continue.*

I get:

    DUPLICATE: gdb.python/py-events.exp: continue

We try to avoid duplicate test names, to make it easier to track
down failures.  A handy way to make test names unique is to use
with_test_prefix:

with_test_prefix "inferior 1" {
    ...
}

with_test_prefix "inferior 2" {
    .
}

That will add "inferior 1: " and "inferior 2: " to the test names of
whatever is within these { }.

>  .*event type: exit.*
>  .*exit code: 12.*
>  .*exit inf: 2.*
> +.*exit pid: $process_id.*
>  dir ok: True.*" "Inferior 2 terminated."
>  
>  
> @@ -235,3 +250,27 @@ gdb_test "python print(count)" 2 "check for before_prompt event"
>  
>  gdb_test_no_output "xxz" "run a canned sequence"
>  gdb_test "python print(count)" 4 "check for before_prompt event again"
> +
> +# Test evaluating expressions from within an inferior exit event handler. This used to cause a crash
> +# when C++ expression were evaluated.

Wrap the comment to 80 columns, and use two spaces after the period
(this is a common rule in this codebase).

> +standard_testfile .cc

Is C++ really required?  I was able to reproduce the bug with your
reproducer, but compiled as C.  If we can re-use the existing test
program for this test (just let it run to completion), it would be
simpler and preferable.

> +
> +if {[prepare_for_testing "failed to prepare" $testfile $srcfile]} {
> +    return -1
> +}
> +
> +gdb_test_multiline "add exited listener" \
> +    "python" "" \
> +    "def increment_foo(_):" "" \
> +    "  gdb.execute('set \$_foo=\$_foo+1')" "" \
> +    "gdb.events.exited.connect(increment_foo)" "" \
> +    "end" ""
> +gdb_test "set \$_foo=0" "" "initialize foo variable"
> +gdb_test "print \$_foo" "= 0" "check foo initialized"
> +gdb_test "run" "exited normally.*"
> +gdb_test "print \$_foo" "= 1" "check foo after run"
> +gdb_test "start" "Temporary breakpoint .* main .*" "stop on a C++ frame"
> +gdb_test "continue" "exited normally.*"

We want the tests to also work when using the native-gdbserver
(gdbserver using the remote protocol) and native-extended-gdbserver
(gdbserver using the extended-remote protocol) boards.  For this reason,
it's generally not good to use "run" and "start" literally in the tests.

Instead, try to use the procedures runto_main, gdb_run_cmd,
gdb_start_cmd, etc.

Unfortunately, I can see that this test is completely skipped when
running with the native-gdbserver board.  I don't understand why, Python
events are still relevant then.  And it's broken with the
native-extended-gdbserver board.  But let's at least try to do the right
thing and use the right procedures.  It will help for when someone goes
and tries to fix that test.

> +gdb_test "print \$_foo" "= 2" "check foo after start continue"
> +
> +>>>>>>> 2d04478f2ca... gdb: check inferior_ptid before calling inferior_thread in eval.c

I guess this last line is not meant to be included.

Simon