[PATCHv3 2/2] gdb: remove VALUE_FRAME_ID

Andrew Burgess andrew.burgess@embecosm.com
Tue Jul 20 09:10:24 GMT 2021


While working on the previous commit I happened to switch on 'set
debug frame 1', and ran into a different assertion than the one I was
trying to fix.

The new problem I see is this assertion triggering:

  gdb/frame.c:622: internal-error: frame_id get_frame_id(frame_info*): Assertion `fi->this_id.p != frame_id_status::COMPUTING' failed.

We attempt to get the frame_id for a frame while we are computing the
frame_id for that same frame.

What happens is we have a stack like this:

  normal_frame -> inline_frame -> sentinel_frame

When we initially stop, GDB creates a frame for inline_frame but
doesn't sniff its type, or try to fill in its frame_id (see code near
the top of get_prev_frame_if_no_cycle).

Later on during the stop, this happens:

  process_event_stop_test
    get_stack_frame_id
      skip_artificial_frames
        get_frame_type

The call to get_frame_type causes inline_frame to sniff its type,
establishing that it is an INLINE_FRAME, but does not cause the frame
to build its frame_id.

The same skip_artificial_frames call then calls get_prev_frame_always
to unwind the stack, this will then try to get the frame previous to
inline_frame, i.e. normal_frame.

So, we create a new frame, but unlike frame #0, in
get_prev_frame_if_no_cycle, we immediately try to compute the frame_id
for this new frame #1.

Computing the frame_id for frame #1 invokes the sniffer.  If this
sniffer tries to read a register then we will create a lazy register
value by calling value_of_register_lazy.  As the next frame (frame #0)
is an INLINE_FRAME, we will skip this and instead create the lazy
register value using the frame_id for frame #-1 (the sentinel frame).

Now, when we try to resolve the lazy register value, in the value.c
function value_fetch_lazy_register, we want to print which frame the
lazy register value was for (in debug mode), so we call:

  frame = frame_find_by_id (VALUE_FRAME_ID (val));

where:

  #define VALUE_FRAME_ID(val) (get_prev_frame_id_by_id (VALUE_NEXT_FRAME_ID (val)))

In our case we call get_prev_frame_id_by_id with the frame_id of the
sentinel_frame frame, we then call frame_find_by_id, followed by
get_prev_frame (which gives us frame #0), followed by get_frame_id.

Frame #0 has not yet computed its frame_id, so we start that process.
The first thing we need when computing the frame_id of an inline frame
is the frame_id of the previous frame, so that's what we ask for.
Unfortunately, that's where we started this journey, we are already
computing the frame_id for frame #1, and thus the assert fires.

Solving the assertion failure is pretty easy, if we consider the code
in value_fetch_lazy_register and get_prev_frame_id_by_id then what we
do is:

  1. Start with a frame_id taken from a value,
  2. Lookup the corresponding frame,
  3. Find the previous frame,
  4. Get the frame_id for that frame, and
  5. Lookup the corresponding frame
  6. Print the frame's level

Notice that steps 3 and 5 give us the exact same result, step 4 is
just wasted effort.  We could shorten this process such that we drop
steps 4 and 5, thus:

  1. Start with a frame_id taken from a value,
  2. Lookup the corresponding frame,
  3. Find the previous frame,
  6. Print the frame's level

This will give the exact same frame as a result, and this is what I
have done in this patch by removing the use of VALUE_FRAME_ID from
value_fetch_lazy_register.

Out of curiosity I looked to see how widely VALUE_FRAME_ID was used,
and saw it was only used in one other place in valops.c:value_assign,
where, once again, we take the result of VALUE_FRAME_ID and pass it to
frame_find_by_id, thus introducing a redundant frame_id lookup.

I don't think the value_assign case risks triggering the assertion
though, as we are unlikely to call value_assign while computing the
frame_id for a frame, however, we could make value_assign slightly
more efficient, with no real additional complexity, by removing the
use of VALUE_FRAME_ID.

So, in this commit, I completely remove VALUE_FRAME_ID, and replace it
with a use of VALUE_NEXT_FRAME_ID, followed by a direct call to
get_prev_frame_always, this should make no difference in either case,
and resolves the assertion issue from value.c.

One thing that is worth noticing here is that in these two situations
we don't end up getting the frame we expected to, though this is not a
result of my change, we were not getting the expected result before
either.

Consider the debug printing case, the stack is:

  normal_frame -> inline_frame -> sentinel_frame

We read a register from normal_frame (frame #1), the value of which is
fetched from sentinel_frame (frame #-1).  The debug print is trying to
say:

  frame=1,regnum=....

However, as the lazy register value points at frame #-1, we will
actually (incorrectly) print:

  frame=0,regnum=....

Like I said, this bug existed before this commit, and so, if we didn't
assert (i.e. the lazy register read occurred in some context other
than during the frame sniffer), then the debug print would previous
have, and will continue to, print the wrong frame level.  Thankfully,
this is only in debug output, so not something a normal user should
see.

In theory, the same bug exists in the value_assign code, if we are in
normal_frame and try to perform a register assignment, then GDB will
get confused and think we are assigning in the context of
inline_frame.  However, having looked at the code I think we get away
with this as the frame is used for two things that I can see:

  1. Getting the gdbarch for the frame, I can't imagine a situation
  where inline_frame has a different gdbarch to normal_frame, and

  2. Unwinding register values from frame->next.  We should be asking
  to unwind the register values from inline_frame, but really we end
  up asking to unwind from sentinel_frame.  However, if we did ask to
  unwind the values from inline_frame this would just forward the
  request on to the next frame, i.e. sentinel_frame, so we would get
  the exact same result.

In short, though we do use the wrong frame in value_assign, I think
this is harmless.

Fixing this debug printing would require GDB to require extra
information in its value location to indicate how many frames had been
skipped.  For example, with the stack:

  normal_frame -> inline_frame -> sentinel_frame

A lazy register value read in frame normal_frame would have a location
frame_id for sentinel_frame, and a skip value of 2 indicating the
value was read for a frame 2 previous.  In contrast, for a more
standard case, with a stack like this:

  normal_frame -> sentinel_frame

A lazy register value read in frame normal_frame would have a location
frame_id for sentinel_frame and a skip value of 1 indicating the value
was read for a frame 1 previous.

However, adding this seems like a lot of work to fix a single like of
debug print, but might be something we want to consider in the future.
---
 gdb/frame.c                                     | 16 ----------------
 gdb/frame.h                                     |  4 ----
 .../gdb.base/inline-frame-cycle-unwind.exp      | 17 +++++++++++++++++
 gdb/valops.c                                    | 17 +++++++++--------
 gdb/value.c                                     |  5 ++---
 gdb/value.h                                     |  6 ------
 6 files changed, 28 insertions(+), 37 deletions(-)

diff --git a/gdb/frame.c b/gdb/frame.c
index 4f612d3546b..45a34badec6 100644
--- a/gdb/frame.c
+++ b/gdb/frame.c
@@ -2552,22 +2552,6 @@ get_prev_frame (struct frame_info *this_frame)
   return get_prev_frame_always (this_frame);
 }
 
-struct frame_id
-get_prev_frame_id_by_id (struct frame_id id)
-{
-  struct frame_id prev_id;
-  struct frame_info *frame;
-
-  frame = frame_find_by_id (id);
-
-  if (frame != NULL)
-    prev_id = get_frame_id (get_prev_frame (frame));
-  else
-    prev_id = null_frame_id;
-
-  return prev_id;
-}
-
 CORE_ADDR
 get_frame_pc (struct frame_info *frame)
 {
diff --git a/gdb/frame.h b/gdb/frame.h
index 0d2bc08a47b..2548846c1ed 100644
--- a/gdb/frame.h
+++ b/gdb/frame.h
@@ -394,10 +394,6 @@ extern struct frame_info *get_prev_frame_always (struct frame_info *);
    is not found.  */
 extern struct frame_info *frame_find_by_id (struct frame_id id);
 
-/* Given a frame's ID, find the previous frame's ID.  Returns null_frame_id
-   if the frame is not found.  */
-extern struct frame_id get_prev_frame_id_by_id (struct frame_id id);
-
 /* Base attributes of a frame: */
 
 /* The frame's `resume' address.  Where the program will resume in
diff --git a/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.exp b/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.exp
index 27709fe2232..2801b683a03 100644
--- a/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.exp
+++ b/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.exp
@@ -126,3 +126,20 @@ with_test_prefix "cycle at level 1" {
 	     "#1 \[^\r\n\]* normal_func \\(\\) at \[^\r\n\]+" \
 	     "Backtrace stopped: previous frame identical to this frame \\(corrupt stack\\?\\)"]
 }
+
+# Flush the register cache (which also flushes the frame cache) so we
+# get a full backtrace again, then switch on frame debugging and try
+# to back trace.  At one point this triggered an assertion.
+gdb_test "maint flush register-cache" \
+    "Register cache flushed\\." ""
+gdb_test_no_output "set debug frame 1"
+gdb_test_multiple "bt" "backtrace with debugging on" {
+    -re "^$gdb_prompt $" {
+	pass $gdb_test_name
+    }
+    -re "\[^\r\n\]+\r\n" {
+	exp_continue
+    }
+}
+gdb_test "p 1 + 2 + 3" " = 6" \
+    "ensure GDB is still alive"
diff --git a/gdb/valops.c b/gdb/valops.c
index bd547923496..50874a5f55d 100644
--- a/gdb/valops.c
+++ b/gdb/valops.c
@@ -1197,14 +1197,15 @@ value_assign (struct value *toval, struct value *fromval)
 	struct gdbarch *gdbarch;
 	int value_reg;
 
-	/* Figure out which frame this is in currently.
-	
-	   We use VALUE_FRAME_ID for obtaining the value's frame id instead of
-	   VALUE_NEXT_FRAME_ID due to requiring a frame which may be passed to
-	   put_frame_register_bytes() below.  That function will (eventually)
-	   perform the necessary unwind operation by first obtaining the next
-	   frame.  */
-	frame = frame_find_by_id (VALUE_FRAME_ID (toval));
+	/* Figure out which frame this register value is in.  The value
+	   holds the frame_id for the next frame, that is the frame this
+	   register value was unwound from.
+
+	   Below we will call put_frame_register_bytes which requires that
+	   we pass it the actual frame in which the register value is
+	   valid, i.e. not the next frame.  */
+	frame = frame_find_by_id (VALUE_NEXT_FRAME_ID (toval));
+	frame = get_prev_frame_always (frame);
 
 	value_reg = VALUE_REGNUM (toval);
 
diff --git a/gdb/value.c b/gdb/value.c
index 6a07495d32b..91db66fa3be 100644
--- a/gdb/value.c
+++ b/gdb/value.c
@@ -3950,9 +3950,8 @@ value_fetch_lazy_register (struct value *val)
     {
       struct gdbarch *gdbarch;
       struct frame_info *frame;
-      /* VALUE_FRAME_ID is used here, instead of VALUE_NEXT_FRAME_ID,
-	 so that the frame level will be shown correctly.  */
-      frame = frame_find_by_id (VALUE_FRAME_ID (val));
+      frame = frame_find_by_id (VALUE_NEXT_FRAME_ID (val));
+      frame = get_prev_frame_always (frame);
       regnum = VALUE_REGNUM (val);
       gdbarch = get_frame_arch (frame);
 
diff --git a/gdb/value.h b/gdb/value.h
index 379cddafbe7..e1c6aabfa29 100644
--- a/gdb/value.h
+++ b/gdb/value.h
@@ -458,12 +458,6 @@ extern struct internalvar **deprecated_value_internalvar_hack (struct value *);
 extern struct frame_id *deprecated_value_next_frame_id_hack (struct value *);
 #define VALUE_NEXT_FRAME_ID(val) (*deprecated_value_next_frame_id_hack (val))
 
-/* Frame ID of frame to which a register value is relative.  This is
-   similar to VALUE_NEXT_FRAME_ID, above, but may not be assigned to. 
-   Note that VALUE_FRAME_ID effectively undoes the "next" operation
-   that was performed during the assignment to VALUE_NEXT_FRAME_ID.  */
-#define VALUE_FRAME_ID(val) (get_prev_frame_id_by_id (VALUE_NEXT_FRAME_ID (val)))
-
 /* Register number if the value is from a register.  */
 extern int *deprecated_value_regnum_hack (struct value *);
 #define VALUE_REGNUM(val) (*deprecated_value_regnum_hack (val))
-- 
2.25.4



More information about the Gdb-patches mailing list