Moribund breakpoints and hardware single-step

Frederic Riss
Thu Apr 28 16:27:00 GMT 2011


I just debugged a very interesting problem in the moribund breakpoints
machinery. First, a caveat: I'm working on sources that are about 2 months
old. I haven't had time to upgrade, but from looking at the diff, the
current GDB master should exhibit the same behavior.

The target is in async + non-stop mode and uses displaced stepping.
When stepping into a function, infrun.c:handle_step_into_function()
inserts a step_resume breakpoint at the end of the prologue and
resumes execution. When the breakpoint is hit, it is removed from the
target and from the breakpoint list, and is remembered in the moribund
breakpoints list for a while. The current PC now points to the
location of the moribund breakpoint, and we try to step further: GDB
asks the target to step one instruction and gets control back.
Currently, if the size of that instruction equals decr_pc_after_break(),
the stop PC minus decr_pc_after_break() is exactly the breakpoint
address, so infrun.c:adjust_pc_after_break() concludes that the target
hit the moribund breakpoint and resets the PC to the breakpoint address.
The same instruction is thus executed again and again until the
breakpoint is retired from the moribund list.

The issue is quite serious, as it breaks the inferior's behavior. It
goes unnoticed if the instruction being repeatedly stepped always has
the same side effect, but $r0 = $r0 + 1 will become $r0 = $r0 + 3 *
(thread_count () + 1), that being the number of events a moribund
location survives before it is retired.

The comment in adjust_pc_after_break reads:

      /* When using hardware single-step, a SIGTRAP is reported for both
	 a completed single-step and a software breakpoint.  Need to
	 differentiate between the two, as the latter needs adjusting
	 but the former does not.

	 The SIGTRAP can be due to a completed hardware single-step only if
	  - we didn't insert software single-step breakpoints
	  - the thread to be examined is still the current thread
	  - this thread is currently being stepped

	 If any of these events did not occur, we must have stopped due
	 to hitting a software breakpoint, and have to back up to the
	 breakpoint address.

	 As a special case, we could have hardware single-stepped a
	 software breakpoint.  In this case (prev_pc == breakpoint_pc),
	 we also need to back up to the breakpoint address.  */

It's the last special case here that bites. I 'fixed' that in my tree
with the following simple patch:

@@ -2941,7 +2884,8 @@ adjust_pc_after_break (struct execution_control_state *ecs)
       if (singlestep_breakpoints_inserted_p
          || !ptid_equal (ecs->ptid, inferior_ptid)
          || !currently_stepping (ecs->event_thread)
-         || ecs->event_thread->prev_pc == breakpoint_pc)
+         || (software_breakpoint_inserted_here_p (aspace, breakpoint_pc)
+             && ecs->event_thread->prev_pc == breakpoint_pc))
        regcache_write_pc (regcache, breakpoint_pc);

       if (RECORD_IS_USED)

The patch relies on the assumption that we will never hardware
single-step a moribund breakpoint. However, I'm not sure this assumption
always holds, and I'm a bit nervous that there might be other cases that
lead to the same kind of behavior. What do you think?

As an aside, why do we use a step-resume breakpoint when stepping into
a function? In these days of massive multi-threading, wouldn't it be
much better to just change the thread's stepping range, so that other
threads don't hit the temporary breakpoint?

