This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v2 4/4] Unregister the inferior from the event loop if failed to resume


On 11/6/19 5:52 PM, Maciej W. Rozycki wrote:
Fix an issue with GDB starting to effectively busy-loop (though still
accepting and interpreting commands) using the full processing power
available when a remote stub has gone closing the connection while GDB
is waiting for user input once the inferior has failed to resume.

This was observed with a `riscv64-linux-gnu' cross-GDB and RISC-V/Linux
`gdbserver' being developed and an attempt to step over the `ret' aka
`c.jr ra' instruction where the value held in the `ra' aka `x1' register
and therefore the address of a software-stepping breakpoint to insert is
0.  Here's a record of a remote debug session leading to the issue:

Sending packet: $vCont?#49...Packet received: vCont;c;C;t
Packet vCont (verbose-resume) is supported
Sending packet: $vCont;c:p6857.-1#b8...infrun: infrun_async(1)
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = ignore
infrun: handle_inferior_event status->kind = ignore
infrun: prepare_to_wait
Packet received: T05swbreak:;02:f0f8ffff3f000000;20:b807565515000000;thread:p6857.6857;core:1;
infrun: target_wait (-1.0.0, status) =
infrun:   26711.26711.0 [Thread 26711.26711],
infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: handle_inferior_event status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: stop_pc = 0x15555607b8
Sending packet: $qXfer:libraries-svr4:read::0,1000#20...Packet received: l<library-list-svr4 version="1.0" main-lm="0x155556f160"><library name=".../sysroot/lib/ld-linux-riscv64-lp64d.so.1" lm="0x155556ea18" l_addr="0x1555556000" l_ld="0x155556de90"/><library name="linux-vdso.so.1" lm="0x155556f6f0" l_addr="0x1555570000" l_ld="0x1555570350"/></library-list-svr4>
infrun: BPSTAT_WHAT_SINGLE
infrun: thread [Thread 26711.26711] still needs step-over
infrun: skipping breakpoint: stepping past insn at: 0x15555607b8
Sending packet: $X15555607b8,2:\202\200#2e...Packet received: OK
infrun: skipping breakpoint: stepping past insn at: 0x15555607b8
infrun: skipping breakpoint: stepping past insn at: 0x15555607b8
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 26711.26711] at 0x15555607b8
Sending packet: $m15555607b8,2#07...Packet received: 8280
Sending packet: $m15555607b8,2#07...Packet received: 8280
Sending packet: $g#67...Packet received: 00000000000000000000000000000000f0f8ffff3f000000d0c1b9aa2a00000020f76d55150000000f00000000000000ec6e555515000000ffffff6f00000000f0faffff3f00000090f0565515000000000000000000000052e57464000000000500000000000000887801000000000028f1565515000000010000000000000008f8ffff3f000000722f6c696236342f60f156551500000000000000000000000000000000000000000000000000000008f75655150000000100000000000000f5feffefffffffff80de56551500000050f9ffff3f00000068f9ffff3f000000ae585655150000000b00000000000000fffdff6f00000000fcffffffffffffff[552 bytes omitted]
Sending packet: $m0,40#2d...Packet received: E01
Sending packet: $m0,1#fa...Packet received: E01
Sending packet: $m0,40#2d...Packet received: E01
Sending packet: $m0,1#fa...Packet received: E01
infrun: skipping breakpoint: stepping past insn at: 0x15555607b8
infrun: clear_step_over_info
Cannot access memory at address 0x0
(gdb)

At this point a command was issued to kill `gdbserver' and GDB responded
immediately by entering the event loop:

infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = no-resumed
infrun: handle_inferior_event status->kind = no-resumed
infrun: TARGET_WAITKIND_NO_RESUMED (ignoring: bg)
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = no-resumed
infrun: handle_inferior_event status->kind = no-resumed
infrun: TARGET_WAITKIND_NO_RESUMED (ignoring: bg)
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = no-resumed
infrun: handle_inferior_event status->kind = no-resumed
infrun: TARGET_WAITKIND_NO_RESUMED (ignoring: bg)
infrun: prepare_to_wait
[...]

This is because `remote_target::resume' enables the asynchronous event
loop, as indicated by the first `infrun: infrun_async(1)' record above,
and then the EOF condition on the remote connection is delivered as an
event, but there are no resumed children to be waited for and no other
event waiting that would stop the loop.

Correct that then by disabling the asynchronous event as execution
completion would, where applicable, i.e. in the all-stop mode.  This
makes the final sequence of the session look like:

Sending packet: $m0,40#2d...Packet received: E01
Sending packet: $m0,1#fa...Packet received: E01
Sending packet: $m0,40#2d...Packet received: E01
Sending packet: $m0,1#fa...Packet received: E01
infrun: infrun_async(0)
infrun: skipping breakpoint: stepping past insn at: 0x15555607b8
infrun: clear_step_over_info
Cannot access memory at address 0x0
(gdb)

and GDB does not trigger the loop at this point if `gdbserver' goes away
as the asynchronous event loop has already been disabled.

	gdb/
	* infrun.c (resume): In the `catch' clause also unregister the
	inferior from the event loop.
---
Hi,

  It might be that the non-stop mode requires additional clean-up here,
however I have no way to work with that at the moment, so I'll be leaving
any investigation to someone who has and will be inclined to look into it.

   Maciej

New change in v2.
---
  gdb/infrun.c |   16 +++++++++++-----
  1 file changed, 11 insertions(+), 5 deletions(-)

gdb-resume-async.diff
Index: binutils-gdb/gdb/infrun.c
===================================================================
--- binutils-gdb.orig/gdb/infrun.c
+++ binutils-gdb/gdb/infrun.c
@@ -2614,11 +2614,17 @@ resume (gdb_signal sig)
      }
    catch (const gdb_exception &ex)
      {
-      /* If resuming is being aborted for any reason, delete any
-	 single-step breakpoint resume_1 may have created, to avoid
-	 confusing the following resumption, and to avoid leaving
-	 single-step breakpoints perturbing other threads, in case
-	 we're running in non-stop mode.  */
+      /* If resuming is being aborted for any reason, and we are in
+	 the all-stop mode, then unregister the inferior from the event
+	 loop.  This is done so that when the inferior is not running
+	 we don't get distracted by spurious inferior output.  */
+      if (!non_stop && target_has_execution && target_can_async_p ())
+	target_async (0);
+
+      /* Also delete any single-step breakpoint resume_1 may have
+	 created, to avoid confusing the following resumption, and to
+	 avoid leaving single-step breakpoints perturbing other threads,
+	 in case we're running in non-stop mode.  */
        if (inferior_ptid != null_ptid)
  	{
  	  thread_info *tp = inferior_thread ();


This looks reasonable, though the call to target_async (0) is always a bit cryptic to me. Makes me wonder if we're missing such invocations in other places to make GDB more recoverable on failures.

Does this fix also address 02/04? Since 02/04 is also a failure during resume? Not sure if it goes through the catch block.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]