[rfc, v2] Fix frame_id_inner comparison false positives
Ulrich Weigand
uweigand@de.ibm.com
Thu Aug 21 19:43:00 GMT 2008
Daniel Jacobowitz wrote:
> I had a concern about this patch, which I couldn't explain well in
> person - let me see if I can do better with examples.
Thanks for looking at this, and sorry for the long delay in my
getting back to this patch ...
> > - dummy_frame_push checks for stale dummy frame IDs. As suggested
> > in a comment by Andrew, this can simply do a frame_find_by_id check.
>
> frame_find_by_id is a loop calling get_prev_frame until we run out of
> stack. By default there is no limit on the number of frames GDB will
> unwind. So if the stack is very deep, or if the unwinder is confused
> and the resulting stack is essentially infinite (a common failure
> mode), frame_find_by_id may take a very long time. frame_find_by_id
> is constant time (though, as you've noted, sometimes wrong).
I think the frame_id_inner check in dummy_frame_push is simply wrong
anyway. On the one hand, there are false positives on targets where
the range of valid stack addresses is non-contiguous (e.g. with
sigaltstack, or in a multi-arch setup). On the other hand, even so
there are common situations where there are false negatives: if you
simply call multiple inferior functions one after the other from the
same stack frame, their dummy IDs will all have the same stack address,
and therefore the frame_id_inner check will not recognize the old
dummy frames may be cleaned up.
Overall, I guess I still agree with Andrew's comment that at this point,
we just should do a check whether the frame ID is still valid.
On the other hand, you're probably right that we need to take care to
avoid unnecessary looping/backtracing -- but that should be fixed *in*
frame_find_by_id.
> On the other hand, if the user has set a backtrace depth limit, then
> frame_find_by_id may fail if a function with a deep stack is called.
> We'll discard the dummy frame when it's still valid and get an extra
> sigtrap. This behavior should be straightforward to write a test case
> for: set the backtrace limit to ten, call a function which recurses
> twenty times and then hits a breakpoint, call another function, raise
> the backtrace limit and see if we've lost the outer dummy frame.
This seems simply a bug of frame_find_by_id. This function should be
using get_prev_frame_1 instead of get_prev_frame. There is no reason
why re-identifying a frame you had already selected previously should
suddenly fail just because some user-interface setting was changed ...
> This is a limit on the number of cycles frame_find_by_id will travel
> if we're in a new location or if we have a confused unwinder.
> Suppose we record a varobj at frame sp==0x1000 (stack grows down).
> We continue. Next time we hit a breakpoint sp==0xfc0. The frontend
> requests a varobj update. We unwind, find the next frame is
> sp==0x1200. The old frame is gone; we stop. Good thing, since if
> we'd kept backtracing we'd find the stack went all the way up to
> 0xff00... it'd take seconds or minutes to get up there.
Agreed, we should keep some sanity check here. See below ...
> I think the part using frame_id_inner is another corrupt stack check,
> and the chances of it forming an exact cycle are slim to none. But
> it's clear that there's a problem with the multi-stack or multi-arch
> cases that you're fixing.
This part is actually OK, as long as cross-arch frames are treated just
like signal trampoline frames are today.
> Is there some other way we can avoid unnecessary backtracing? Maybe a
> predicate which determines whether two consecutive frames are
> reasonably on the same stack - in the presence of sigaltstack that
> could be quite fragile though.
Even in the presence of non-contiguous stack ranges, we can still
keep a significant set of safety-net changes because for NORMAL
frames changes in the stack address still follow predictable rules:
* If frame NEXT is the immediate inner frame to THIS, and NEXT
is a NORMAL frame, then the stack address of NEXT must be
inner-than-or-equal to the stack address of THIS.
Therefore, if frame_id_inner (THIS, NEXT) holds, some unwind
error has occurred.
* If frame NEXT is the immediate inner frame to THIS, and NEXT
is a NORMAL frame, and NEXT and THIS have different stack
addresses, no other frame in the frame chain may have a stack
address in between.
Therefore, if frame_id_inner (TEST, THIS) holds, but
frame_id_inner (TEST, NEXT) does not hold, TEST cannot refer
to a valid frame in the frame chain.
This will catch for example the case you mention below, while at
the same time never yielding false positives in non-contiguous
stack situations.
The patch below implements this logic. Note that frame_id_inner
is now only used in frame.c, so I've made it static -- due to the
somewhat fragile logic, it's probably better if is isn't used
outside the core frame code anyway ...
What do you think?
Tested on powerpc-linux, powerpc64-linux, and spu-elf with no
regressions.
Bye,
Ulrich
ChangeLog:
* frame.h (struct frame_id): Update comments.
(frame_id_inner): Remove prototype.
* frame.c (frame_id_inner): Make static. Add comments.
(frame_find_by_id): Update frame_id_inner safety net check to avoid
false positives for targets using non-contiguous stack ranges.
Use get_prev_frame_1 instead of get_prev_frame.
(get_prev_frame_1): Update frame_id_inner safety net check.
* dummy-frame.c (dummy_frame_push): Use frame_find_by_id to
detect stale dummy frames.
* stack.c (return_command): Directly pop the selected frame.
* infrun.c (handle_inferior_event): Remove dead code.
* i386-tdep.c (i386_push_dummy_call): Update comment.
diff -urNp gdb-orig/gdb/dummy-frame.c gdb-head/gdb/dummy-frame.c
--- gdb-orig/gdb/dummy-frame.c 2008-05-01 01:19:59.000000000 +0200
+++ gdb-head/gdb/dummy-frame.c 2008-08-21 19:34:49.716119838 +0200
@@ -87,7 +87,6 @@ void
dummy_frame_push (struct regcache *caller_regcache,
const struct frame_id *dummy_id)
{
- struct gdbarch *gdbarch = get_regcache_arch (caller_regcache);
struct dummy_frame *dummy_frame;
/* Check to see if there are stale dummy frames, perhaps left over
@@ -95,8 +94,7 @@ dummy_frame_push (struct regcache *calle
the debugger. */
dummy_frame = dummy_frame_stack;
while (dummy_frame)
- /* FIXME: cagney/2004-08-02: Should just test IDs. */
- if (frame_id_inner (gdbarch, dummy_frame->id, (*dummy_id)))
+ if (frame_find_by_id (dummy_frame->id) == NULL)
/* Stale -- destroy! */
{
dummy_frame_stack = dummy_frame->next;
diff -urNp gdb-orig/gdb/frame.c gdb-head/gdb/frame.c
--- gdb-orig/gdb/frame.c 2008-08-05 22:16:25.000000000 +0200
+++ gdb-head/gdb/frame.c 2008-08-21 20:39:52.682528854 +0200
@@ -368,7 +368,33 @@ frame_id_eq (struct frame_id l, struct f
return eq;
}
-int
+/* Safety net to check whether frame ID L should be inner to
+ frame ID R, according to their stack addresses.
+
+ This method cannot be used to compare arbitrary frames, as the
+ ranges of valid stack addresses may be discontiguous (e.g. due
+ to sigaltstack).
+
+ However, it can be used as safety net to discover invalid frame
+ IDs in certain circumstances.
+
+ * If frame NEXT is the immediate inner frame to THIS, and NEXT
+ is a NORMAL frame, then the stack address of NEXT must be
+ inner-than-or-equal to the stack address of THIS.
+
+ Therefore, if frame_id_inner (THIS, NEXT) holds, some unwind
+ error has occurred.
+
+ * If frame NEXT is the immediate inner frame to THIS, and NEXT
+ is a NORMAL frame, and NEXT and THIS have different stack
+ addresses, no other frame in the frame chain may have a stack
+ address in between.
+
+ Therefore, if frame_id_inner (TEST, THIS) holds, but
+ frame_id_inner (TEST, NEXT) does not hold, TEST cannot refer
+ to a valid frame in the frame chain. */
+
+static int
frame_id_inner (struct gdbarch *gdbarch, struct frame_id l, struct frame_id r)
{
int inner;
@@ -395,28 +421,34 @@ frame_id_inner (struct gdbarch *gdbarch,
struct frame_info *
frame_find_by_id (struct frame_id id)
{
- struct frame_info *frame;
+ struct frame_info *frame, *prev_frame;
/* ZERO denotes the null frame, let the caller decide what to do
about it. Should it instead return get_current_frame()? */
if (!frame_id_p (id))
return NULL;
- for (frame = get_current_frame ();
- frame != NULL;
- frame = get_prev_frame (frame))
+ for (frame = get_current_frame (); ; frame = prev_frame)
{
struct frame_id this = get_frame_id (frame);
if (frame_id_eq (id, this))
/* An exact match. */
return frame;
- if (frame_id_inner (get_frame_arch (frame), id, this))
- /* Gone to far. */
+
+ prev_frame = get_prev_frame_1 (frame);
+ if (!prev_frame)
+ return NULL;
+
+ /* As a safety net to avoid unnecessary backtracing while trying
+ to find an invalid ID, we check for a common situation where
+ we can detect from comparing stack addresses that no other
+ frame in the current frame chain can have this ID. See the
+ comment at frame_id_inner for details. */
+ if (get_frame_type (frame) == NORMAL_FRAME
+ && !frame_id_inner (get_frame_arch (frame), id, this)
+ && frame_id_inner (get_frame_arch (prev_frame), id,
+ get_frame_id (prev_frame)))
return NULL;
- /* Either we're not yet gone far enough out along the frame
- chain (inner(this,id)), or we're comparing frameless functions
- (same .base, different .func, no test available). Struggle
- on until we've definitly gone to far. */
}
return NULL;
}
@@ -1222,11 +1254,10 @@ get_prev_frame_1 (struct frame_info *thi
/* Check that this frame's ID isn't inner to (younger, below, next)
the next frame. This happens when a frame unwind goes backwards.
- Exclude signal trampolines (due to sigaltstack the frame ID can
- go backwards) and sentinel frames (the test is meaningless). */
- if (this_frame->next->level >= 0
- && this_frame->next->unwind->type != SIGTRAMP_FRAME
- && frame_id_inner (get_frame_arch (this_frame), this_id,
+ This check is valid only if the next frame is NORMAL. See the
+ comment at frame_id_inner for details. */
+ if (this_frame->next->unwind->type == NORMAL_FRAME
+ && frame_id_inner (get_frame_arch (this_frame->next), this_id,
get_frame_id (this_frame->next)))
{
if (frame_debug)
diff -urNp gdb-orig/gdb/frame.h gdb-head/gdb/frame.h
--- gdb-orig/gdb/frame.h 2008-08-05 22:16:25.000000000 +0200
+++ gdb-head/gdb/frame.h 2008-08-21 19:51:46.145791293 +0200
@@ -111,7 +111,7 @@ struct frame_id
frames that do not change the stack but are still distinct and have
some form of distinct identifier (e.g. the ia64 which uses a 2nd
stack for registers). This field is treated as unordered - i.e. will
- not be used in frame ordering comparisons such as frame_id_inner().
+ not be used in frame ordering comparisons.
This field is valid only if special_addr_p is true. Otherwise, this
frame is considered to have a wildcard special address, i.e. one that
@@ -124,22 +124,7 @@ struct frame_id
unsigned int special_addr_p : 1;
};
-/* Methods for constructing and comparing Frame IDs.
-
- NOTE: Given stackless functions A and B, where A calls B (and hence
- B is inner-to A). The relationships: !eq(A,B); !eq(B,A);
- !inner(A,B); !inner(B,A); all hold.
-
- This is because, while B is inner-to A, B is not strictly inner-to A.
- Being stackless, they have an identical .stack_addr value, and differ
- only by their unordered .code_addr and/or .special_addr values.
-
- Because frame_id_inner is only used as a safety net (e.g.,
- detect a corrupt stack) the lack of strictness is not a problem.
- Code needing to determine an exact relationship between two frames
- must instead use frame_id_eq and frame_id_unwind. For instance,
- in the above, to determine that A stepped-into B, the equation
- "A.id != B.id && A.id == id_unwind (B)" can be used. */
+/* Methods for constructing and comparing Frame IDs. */
/* For convenience. All fields are zero. */
extern const struct frame_id null_frame_id;
@@ -176,12 +161,6 @@ extern int frame_id_p (struct frame_id l
either L or R have a zero .func, then the same frame base. */
extern int frame_id_eq (struct frame_id l, struct frame_id r);
-/* Returns non-zero when L is strictly inner-than R (they have
- different frame .bases). Neither L, nor R can be `null'. See note
- above about frameless functions. */
-extern int frame_id_inner (struct gdbarch *gdbarch, struct frame_id l,
- struct frame_id r);
-
/* Write the internal representation of a frame ID on the specified
stream. */
extern void fprint_frame_id (struct ui_file *file, struct frame_id id);
diff -urNp gdb-orig/gdb/i386-tdep.c gdb-head/gdb/i386-tdep.c
--- gdb-orig/gdb/i386-tdep.c 2008-08-12 19:13:15.000000000 +0200
+++ gdb-head/gdb/i386-tdep.c 2008-08-21 19:34:49.736116974 +0200
@@ -1712,8 +1712,8 @@ i386_push_dummy_call (struct gdbarch *gd
(i386_frame_this_id, i386_sigtramp_frame_this_id,
i386_dummy_id). It's there, since all frame unwinders for
a given target have to agree (within a certain margin) on the
- definition of the stack address of a frame. Otherwise
- frame_id_inner() won't work correctly. Since DWARF2/GCC uses the
+ definition of the stack address of a frame. Otherwise frame id
+ comparison might not work correctly. Since DWARF2/GCC uses the
stack address *before* the function call as a frame's CFA. On
the i386, when %ebp is used as a frame pointer, the offset
between the contents %ebp and the CFA as defined by GCC. */
diff -urNp gdb-orig/gdb/infrun.c gdb-head/gdb/infrun.c
--- gdb-orig/gdb/infrun.c 2008-08-08 20:14:08.000000000 +0200
+++ gdb-head/gdb/infrun.c 2008-08-21 19:34:49.796108383 +0200
@@ -3284,33 +3284,6 @@ infrun: BPSTAT_WHAT_SET_LONGJMP_RESUME (
tss->current_line = stop_pc_sal.line;
tss->current_symtab = stop_pc_sal.symtab;
- /* In the case where we just stepped out of a function into the
- middle of a line of the caller, continue stepping, but
- step_frame_id must be modified to current frame */
-#if 0
- /* NOTE: cagney/2003-10-16: I think this frame ID inner test is too
- generous. It will trigger on things like a step into a frameless
- stackless leaf function. I think the logic should instead look
- at the unwound frame ID has that should give a more robust
- indication of what happened. */
- if (step - ID == current - ID)
- still stepping in same function;
- else if (step - ID == unwind (current - ID))
- stepped into a function;
- else
- stepped out of a function;
- /* Of course this assumes that the frame ID unwind code is robust
- and we're willing to introduce frame unwind logic into this
- function. Fortunately, those days are nearly upon us. */
-#endif
- {
- struct frame_info *frame = get_current_frame ();
- struct frame_id current_frame = get_frame_id (frame);
- if (!(frame_id_inner (get_frame_arch (frame), current_frame,
- step_frame_id)))
- step_frame_id = current_frame;
- }
-
if (debug_infrun)
fprintf_unfiltered (gdb_stdlog, "infrun: keep going\n");
keep_going (ecs);
diff -urNp gdb-orig/gdb/stack.c gdb-head/gdb/stack.c
--- gdb-orig/gdb/stack.c 2008-08-05 22:16:26.000000000 +0200
+++ gdb-head/gdb/stack.c 2008-08-21 19:34:49.850100651 +0200
@@ -1855,29 +1855,8 @@ If you continue, the return value that y
error (_("Not confirmed"));
}
- /* NOTE: cagney/2003-01-18: Is this silly? Rather than pop each
- frame in turn, should this code just go straight to the relevant
- frame and pop that? */
-
- /* First discard all frames inner-to the selected frame (making the
- selected frame current). */
- {
- struct frame_id selected_id = get_frame_id (get_selected_frame (NULL));
- while (!frame_id_eq (selected_id, get_frame_id (get_current_frame ())))
- {
- struct frame_info *frame = get_current_frame ();
- if (frame_id_inner (get_frame_arch (frame), selected_id,
- get_frame_id (frame)))
- /* Caught in the safety net, oops! We've gone way past the
- selected frame. */
- error (_("Problem while popping stack frames (corrupt stack?)"));
- frame_pop (get_current_frame ());
- }
- }
-
- /* Second discard the selected frame (which is now also the current
- frame). */
- frame_pop (get_current_frame ());
+ /* Discard the selected frame and all frames inner-to it. */
+ frame_pop (get_selected_frame (NULL));
/* Store RETURN_VALUE in the just-returned register set. */
if (return_value != NULL)
--
Dr. Ulrich Weigand
GNU Toolchain for Linux on System z and Cell BE
Ulrich.Weigand@de.ibm.com
More information about the Gdb-patches
mailing list