Hardware breakpoints and watchpoints: patches

Mon Aug 23 22:26:00 GMT 1999

> Date: Thu, 19 Aug 1999 12:49:29 -0700
> From: Michael Snyder <msnyder@cygnus.com>
> Cc: gdb-patches@sourceware.cygnus.com
> 
> > *************** bpstat_stop_status (pc, not_a_breakpoint
> > *** 2124,2130 ****
> >         bs->print = 1;
> > 
> >         sprintf (message, message1, b->number);
> > !       if (b->type == bp_watchpoint || b->type == bp_hardware_watchpoint)
> >         {
> >           switch (catch_errors (watchpoint_check, bs, message, RETURN_MASK_ALL))
> >             {
> > --- 2137,2143 ----
> >         bs->print = 1;
> > 
> >         sprintf (message, message1, b->number);
> > !       if (b->type == bp_watchpoint)
> >         {
> >           switch (catch_errors (watchpoint_check, bs, message, RETURN_MASK_ALL))
> >             {
> 
> This change is not OK.
> You might think that it makes more sense to treat a 
> hardware watchpoint like a hardware read/access watchpoint, 
> rather than like a software watchpoint.  But if I make this
> change, then the old vs. new values are not reported.

I don't see why the old vs new values shouldn't be reported after this
change; I still see them after I made the change.  Could you please
identify the code which produces this adverse effect?

Here's what I see in print_it_normal:

  else if ((bs->old_val != NULL) &&
	(bs->breakpoint_at->type == bp_watchpoint ||
	 bs->breakpoint_at->type == bp_access_watchpoint ||
	 bs->breakpoint_at->type == bp_hardware_watchpoint))
    {
      annotate_watchpoint (bs->breakpoint_at->number);
      mention (bs->breakpoint_at);
      printf_filtered ("\nOld value = ");
      value_print (bs->old_val, gdb_stdout, 0, Val_pretty_default);
      printf_filtered ("\nNew value = ");
      value_print (bs->breakpoint_at->val, gdb_stdout, 0,
		   Val_pretty_default);
      printf_filtered ("\n");
      value_free (bs->old_val);
      bs->old_val = NULL;
      /* More than one watchpoint may have been triggered.  */
      return -1;
    }

Perhaps some changes checked in after GDB 4.18 changed this?  (I don't
track the CVS tree and the snapshots, for technical reasons.)

> It does no harm to treat the 
> hardware watchpoint like the software watchpoint in this
> context, since the code will only execute when and if the
> hardware watchpoint "fires".

The difference between hardware and software watchpoints is that the
latter causes the program to single-step, and therefore whenever the
watched value changes, we *know* the watchpoint triggered without any
need for checking anything else.

In contrast, with hardware watchpoints, we need to check whether the
watched address(es) really triggered; the code that handles software
watchpoints doesn't do that.

Without checking which address triggered, we could print incorrect
watchpoint information, or think that a watchpoint triggered when in
fact it didn't.  For example, the existing version would think that
the hardware watchpoint triggered each time any of the hardware
watchpoints triggered; it would avoid announcing itself solely because
the watched value didn't change; but it would decrement the hit_count
for the breakpoint, and "info breakpoints" would show negative counts
for watchpoints that in reality never triggered.

> > *************** bpstat_stop_status (pc, not_a_breakpoint
> > *** 2141,2148 ****
> >               /* Don't stop.  */
> >               bs->print_it = print_it_noop;
> >               bs->stop = 0;
> > -             /* Don't consider this a hit.  */
> > -             --(b->hit_count);
> >               continue;
> >             default:
> >               /* Can't happen.  */
> > --- 2154,2159 ----
> 
> Can you tell me why you want to make this change?

"info breakpoints" was printing negative hit counts for watchpoints
that didn't trigger.  To reproduce, set two watchpoints for two
different addresses and run a fragment of a program where only one of
the watchpoints gets hit; then type "info breakpoints".

I traced this problem to the line above that decrements the hit count.
I didn't find any place that incremented the hit count, so it seemed
to me that decrementing it is a bug.  Did I miss something?

> > *************** bpstat_stop_status (pc, not_a_breakpoint
> > *** 2160,2173 ****
> >               break;
> >             }
> >         }
> > !       else if (b->type == bp_read_watchpoint || b->type == bp_access_watchpoint)
> >           {
> >           CORE_ADDR addr;
> >           value_ptr v;
> >             int found = 0;
> > 
> >           addr = target_stopped_data_address();
> > !         if (addr == 0) continue;
> >             for (v = b->val_chain; v; v = v->next)
> >               {
> >                 if (v->lval == lval_memory)
> > --- 2171,2192 ----
> >               break;
> >             }
> >         }
> > !       else if (b->type == bp_read_watchpoint
> > !              || b->type == bp_access_watchpoint
> > !              || b->type == bp_hardware_watchpoint)
> >           {
> >           CORE_ADDR addr;
> >           value_ptr v;
> >             int found = 0;
> > 
> >           addr = target_stopped_data_address();
> > !         if (addr == 0)
> > !           {
> > !             /* Don't stop.  */
> > !             bs->print_it = print_it_noop;
> > !             bs->stop = 0;
> > !             continue;
> > !           }
> >             for (v = b->val_chain; v; v = v->next)
> >               {
> >                 if (v->lval == lval_memory)
> 
> Regarding the change around "if (addr == 0)"
> this will break targets that don't define 
> "target_stopped_data_address".
> 
> The downside without this change is that, when a read/access BP
> fires, all read/access BP's will announce themselves.
> 
> The downside WITH the change (and without the macro defined),
> is that NONE of them will announce, and indeed the target 
> won't stop.
> 
> Downside #2 is worse than downside #1.

Perhaps we could have the cake and eat it, too ;-).  The test for
target_stopped_data_address being zero can be conditionalized.  For
example, we could remove the default definition of
target_stopped_data_address from target.h and have the test above be
conditioned on it being defined.  I expect every platform with
hardware watchpoints to define target_stopped_data_address.  Without
testing which address triggered, there's no way to have several
watchpoints at the same time.  If you are watching several values, it
is a real mess to have all of them trigger, especially with read
watchpoints where you don't have any good way to verify whether or not
it indeed triggered.

Is it acceptable to remove the default definition from target.h and
make the test in bpstat_stop_status be conditional-compiled only if
target_stopped_data_address is defined?

> > --- 2212,2251 ----
> >                     /* Stop.  */
> >                     break;
> >                   case WP_VALUE_CHANGED:
> > +                 if (b->type == bp_read_watchpoint)
> > +                   {
> > +                     /* Don't stop: read watchpoints shouldn't fire if
> > +                        the value has changed.  This is for targets
> > +                        which cannot set read-only watchpoints.  */
> > +                     bs->print_it = print_it_noop;
> > +                     bs->stop = 0;
> > +                     continue;
> > +                   }
> > +                 ++(b->hit_count);
> > +                 break;
> 
> This is no good.  If there is a write, followed by a read, 
> GDB will decide not to stop for the read.

I cannot imagine the scenario under which this would happen.  Could
you please show a code fragment that you think would produce such an
effect?

As far as I understand, if a data write is followed by a data read,
the data-write watchpoint will trigger first, and GDB will then update
the ``old value'' it maintains with the watchpoints; so the data-read
watchpoint correctly triggers after the data-write one.  I actually
tested this in a simple program, and it worked like I'd expect.

> It is better to 
> get a false positive (unnecessary stop) than a false negative
> (fail to stop when you should.)

We are talking about watchpoints, not code breakpoints, so I can only
agree to this up to a point.  Specifically, if using watchpoints
triggers false alarms all over the place as soon as you have several
of them and of different types, the results are too messy to be
useful.  After all, watchpoints are supposed to be the silver bullet
that lets you find bugs whereby values change when the programmer
thinks they shouldn't.  If we put the burden of finding the
watchpoints that *really* triggered on the programmer, the watchpoints
will not serve their purpose.

> >                   case WP_VALUE_NOT_CHANGED:
> > !                   /* Stop, but not if this is a write-only watchpoint.
> > !                    FIXME: this causes GDB to not report watchpoints
> > !                    if the memory was written with the same value.
> > !                    However, to correct this, we need maintain a
> > !                    one-to-one correspeondence between the watchpoint
> > !                    and the target resources used to implement it, and
> > !                    let the target tell us which watchpoint(s) fired.
> > !                    With the current design, if we don't check that the
> > !                    value has changed, we might erroneously say that a
> > !                    read watchpoint triggered if it watches the same
> > !                    address as some other write watchpoint.  */
> > !                 if (b->type == bp_hardware_watchpoint)
> > !                   {
> > !                     bs->print_it = print_it_noop;
> > !                     bs->stop = 0;
> > !                     continue;
> > !                   }
> 
> Again, it is better to stop when you don't need to, 
> than to keep going when you should have stopped.

I think we should also consider the probability for each one of these
problems to happen in real life.  In my view, the case where somebody
is interested in a data-write watchpoint that doesn't change the value
is much, much less probable than a case where the same address is
watched with both data-read and data-write watchpoints (each one with
its own condition).  Telling the user that both triggered will
effectively disallow such usage of watchpoints.

I think the suggested change would be a reasonable compromise (it
should be accompanied by a suitable documentation in the manual, of
course; I will submit such a change for the manual if you decide to
accept the code change).

> Should we extract the inner part of the loop in "insert_breakpoints"
> and make it a separate function, analogous to "delete_breakpoint"?
> Then call that function here?

I'm not sure.  The inner part of the loop is not replicated anywhere
else in the source (unlike delete_breakpoint that is called from many
places).  We never need to insert a single breakpoint, only all of
them together.

> Want to volunteer to do it?

I don't mind doing this, if you think it is really needed.

> > --- 1046,1061 ----
> > 
> >   #ifndef TARGET_HAS_HARDWARE_WATCHPOINTS
> > 
> > ! /* Returns the number of hardware watchpoints of type TYPE that we can
> > !    set.  Value is positive if we can set CNT watchpoints, zero if
> > !    setting watchpoints of type TYPE is not supported, and negative if
> > !    CNT is more than the maximum number of watchpoints of type TYPE
> > !    that we can support.  TYPE is one of bp_hardware_watchpoint,
> > !    bp_read_watchpoint, bp_write_watchpoint, or bp_hardware_breakpoint.
> > !    CNT is the number of such watchpoints used so far (including this
> > !    one).  OTHERTYPE is non-zero if other types of watchpoints are
> > !    currently enabled (this is for targets where all watchpoints need to
> > !    be of the same type, for example).  */
> > 
> >   #define TARGET_CAN_USE_HARDWARE_WATCHPOINT(TYPE,CNT,OTHERTYPE) 0
> > 
> 
> This is incorrect.  The macro is boolean, it only returns zero
> or non-zero.  Well, one implementation returns 1, 0, or -1, but
> that distinction isn't well-documented...

Well, one of the main reasons for this change (all it changes is a
comment) was precisely to document that distinction.  Here's the code
that uses this distinction (from watch_command_1):

  mem_cnt = can_use_hardware_watchpoint (val);
  if (mem_cnt == 0 && bp_type != bp_hardware_watchpoint)
    error ("Expression cannot be implemented with read/access watchpoint.");
  if (mem_cnt != 0) { 
    i = hw_watchpoint_used_count (bp_type, &other_type_used);
    target_resources_ok = TARGET_CAN_USE_HARDWARE_WATCHPOINT(
		bp_type, i + mem_cnt, other_type_used);
    if (target_resources_ok == 0 && bp_type != bp_hardware_watchpoint)
      error ("Target does not have this type of hardware watchpoint support.");
    if (target_resources_ok < 0 && bp_type != bp_hardware_watchpoint)
      error ("Target resources have been allocated for other types of watchpoints.");
  }

This clearly shows that positive, zero, and negative values each has
its own distinct meaning.  I tried to document that meaning after I
naively followed the existing comment and got burned...

I deciphered the meaning of the negative value based on the error
message text printed by GDB, which did make sense to me: on x86, the
same debug registers are used both for hardware breakpoints and for
watchpoints.  Perhaps I understood the code incorrectly, but the code
clearly does not treat this as a simple boolean macro, and I think it
shouldn't be documented as such.

> > *************** extern char *normal_pid_to_str PARAMS ((
> > *** 1058,1063 ****
> > --- 1064,1076 ----
> >           (LONGEST)(byte_count) <= REGISTER_SIZE
> >   #endif
> > 
> > + /* Returns non-zero if we can use hardware watchpoints to watch a region
> > +    whose address is ADDR and whose length is LEN.  */
> > + #if !defined(TARGET_REGION_OK_FOR_HW_WATCHPOINT)
> > + #define TARGET_REGION_OK_FOR_HW_WATCHPOINT(addr,len) \
> > +       TARGET_REGION_SIZE_OK_FOR_HW_WATCHPOINT(len)
> > + #endif
> > +
> >   /* However, some addresses may not be profitable to use hardware to watch,
> >      or may be difficult to understand when the addressed object is out of
> >      scope, and hence should be unwatched.  On some targets, this may have
> 
> This change is a duplicate.
> You've already defined this macro once earlier.

The other definition (in target.h) is only visible if
TARGET_HAS_HARDWARE_WATCHPOINTS is undefined.  So targets that do have
watchpoints won't see it.  I understand that this was the reason for
having a second definition of TARGET_REGION_SIZE_OK_FOR_HW_WATCHPOINT
in breakpoint.c, although target.h defines it as well.