[PATCH] Fix inline frame unwinding breakage

Andrew Burgess andrew.burgess@embecosm.com
Thu Jun 18 17:40:59 GMT 2020


* Andrew Burgess <andrew.burgess@embecosm.com> [2020-06-18 18:29:22 +0100]:

> * Andrew Burgess <andrew.burgess@embecosm.com> [2020-06-18 17:58:55 +0100]:
> 
> > * Luis Machado via Gdb-patches <gdb-patches@sourceware.org> [2020-04-14 18:38:36 -0300]:
> > 
> > > *** re-sending due to the poor choice of characters for the backtrace
> > > annotations. GIT swallowed parts of it.
> > > 
> > > There has been some breakage for aarch64-linux, arm-linux and s390-linux in
> > > terms of inline frame unwinding. There may be other targets, but these are
> > > the ones i'm aware of.
> > > 
> > > The following testcases started to show numerous failures and trigger internal
> > > errors in GDB after commit 1009d92fc621bc4d017029b90a5bfab16e17fde5,
> > > "Find tailcall frames before inline frames".
> > > 
> > > gdb.opt/inline-break.exp
> > > gdb.opt/inline-cmds.exp
> > > gdb.python/py-frame-inline.exp
> > > gdb.reverse/insn-reverse.exp
> > > 
> > > The internal errors were of this kind:
> > > 
> > > binutils-gdb/gdb/frame.c:579: internal-error: frame_id get_frame_id(frame_info*): Assertion `fi->level == 0' failed.
> > > 
> > > After a lengthy investigation to try and find the cause of these assertions,
> > > it seems we're dealing with some fragile/poorly documented code to handle inline
> > > frames and we are attempting to unwind from this fragile section of code.
> > > 
> > > Before commit 1009d92fc621bc4d017029b90a5bfab16e17fde5, the tailcall sniffer
> > > was invoked from dwarf2_frame_prev_register. By the time we invoke the
> > > dwarf2_frame_prev_register function, we've probably already calculated the
> > > frame id (via compute_frame_id).
> > > 
> > > After said commit, the call to dwarf2_tailcall_sniffer_first was moved to
> > > dwarf2_frame_cache. This is very early in a frame creation process, and
> > > we're still calculating the frame ID (so compute_frame_id is in the call
> > > stack).
> > > 
> > > This would be fine for regular frames, but the above testcases all deal
> > > with some inline frames.
> > > 
> > > The particularity of inline frames is that their frame ID's depend on
> > > the previous frame's ID, and the previous frame's ID relies in the inline
> > > frame's registers. So it is a bit of a messy situation.
> > > 
> > > We have comments in various parts of the code warning about some of these
> > > particularities.
> > > 
> > > In the case of dwarf2_tailcall_sniffer_first, we attempt to unwind the PC,
> > > which goes through various functions until we eventually invoke
> > > frame_unwind_got_register. This function will eventually attempt to create
> > > a lazy value for a particular register, and this lazy value will require
> > > a valid frame ID.  Since the inline frame doesn't have a valid frame ID
> > > yet (remember we're still calculating the previous frame's ID so we can tell
> > > what the inline frame ID is) we will call compute_frame_id for the inline
> > > frame (level 0).
> > > 
> > > We'll eventually hit the assertion above, inside get_frame_id:
> > > 
> > > --
> > >       /* If we haven't computed the frame id yet, then it must be that
> > >          this is the current frame.  Compute it now, and stash the
> > >          result.  The IDs of other frames are computed as soon as
> > >          they're created, in order to detect cycles.  See
> > >          get_prev_frame_if_no_cycle.  */
> > >       gdb_assert (fi->level == 0);
> > > --
> > > 
> > > It seems to me we shouldn't have reached this assertion without having the
> > > inline frame ID already calculated. In fact, it seems we even start recursing
> > > a bit when we invoke get_prev_frame_always within inline_frame_this_id. But
> > > a check makes us quit the recursion and proceed to compute the id.
> > > 
> > > Here's the call stack for context:
> > > 
> > > #0  get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2109
> > > RECURSION - #1  0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124
> > > #2  0x0000aaaaaae95768 in inline_frame_this_id (this_frame=0xaaaaab85a670, this_cache=0xaaaaab85a688, this_id=0xaaaaab85a6d0)
> > >     at ../../../repos/binutils-gdb/gdb/inline-frame.c:165
> > > #3  0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:550
> > > #4  0x0000aaaaaae19318 in get_frame_id (fi=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:582
> > > #5  0x0000aaaaaae13480 in value_of_register_lazy (frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/findvar.c:296
> > > #6  0x0000aaaaaae16c00 in frame_unwind_got_register (frame=0xaaaaab85a730, regnum=30, new_regnum=30) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:268
> > > #7  0x0000aaaaaad52604 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=30)
> > >     at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1296
> > > #8  0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1229
> > > #9  0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=30) at ../../../repos/binutils-gdb/gdb/frame.c:1320
> > > #10 0x0000aaaaaab76574 in aarch64_dwarf2_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)
> > >     at ../../../repos/binutils-gdb/gdb/aarch64-tdep.c:1114
> > > #11 0x0000aaaaaad52724 in dwarf2_frame_prev_register (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, regnum=32)
> > >     at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1316
> > > #12 0x0000aaaaaae1ae68 in frame_unwind_register_value (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1229
> > > #13 0x0000aaaaaae1b304 in frame_unwind_register_unsigned (next_frame=0xaaaaab85a730, regnum=32) at ../../../repos/binutils-gdb/gdb/frame.c:1320
> > > #14 0x0000aaaaaae16a84 in default_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame-unwind.c:223
> > > #15 0x0000aaaaaae32124 in gdbarch_unwind_pc (gdbarch=0xaaaaab81edc0, next_frame=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/gdbarch.c:3074
> > > #16 0x0000aaaaaad4f15c in dwarf2_tailcall_sniffer_first (this_frame=0xaaaaab85a730, tailcall_cachep=0xaaaaab85a830, entry_cfa_sp_offsetp=0x0)
> > >     at ../../../repos/binutils-gdb/gdb/dwarf2/frame-tailcall.c:388
> > > #17 0x0000aaaaaad520c0 in dwarf2_frame_cache (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748) at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1190
> > > #18 0x0000aaaaaad52204 in dwarf2_frame_this_id (this_frame=0xaaaaab85a730, this_cache=0xaaaaab85a748, this_id=0xaaaaab85a790)
> > >     at ../../../repos/binutils-gdb/gdb/dwarf2/frame.c:1218
> > > #19 0x0000aaaaaae1916c in compute_frame_id (fi=0xaaaaab85a730) at ../../../repos/binutils-gdb/gdb/frame.c:550
> > > #20 0x0000aaaaaae1c958 in get_prev_frame_if_no_cycle (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:1927
> > > #21 0x0000aaaaaae1cc44 in get_prev_frame_always_1 (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2006
> > > FIRST CALL - #22 0x0000aaaaaae1d098 in get_prev_frame_always (this_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:2124
> > > #23 0x0000aaaaaae18f68 in skip_artificial_frames (frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:495
> > > #24 0x0000aaaaaae193e8 in get_stack_frame_id (next_frame=0xaaaaab85a670) at ../../../repos/binutils-gdb/gdb/frame.c:596
> > > #25 0x0000aaaaaae87a54 in process_event_stop_test (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6857
> > > #26 0x0000aaaaaae86bdc in handle_signal_stop (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:6381
> > > #27 0x0000aaaaaae84fd0 in handle_inferior_event (ecs=0xffffffffefc8) at ../../../repos/binutils-gdb/gdb/infrun.c:5578
> > > #28 0x0000aaaaaae81588 in fetch_inferior_event (client_data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:4020
> > > #29 0x0000aaaaaae5f7fc in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at ../../../repos/binutils-gdb/gdb/inf-loop.c:43
> > > #30 0x0000aaaaaae8d768 in infrun_async_inferior_event_handler (data=0x0) at ../../../repos/binutils-gdb/gdb/infrun.c:9377
> > > #31 0x0000aaaaaabff970 in check_async_event_handlers () at ../../../repos/binutils-gdb/gdb/async-event.c:291
> > > #32 0x0000aaaaab27cbec in gdb_do_one_event () at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:194
> > > #33 0x0000aaaaaaef1894 in start_event_loop () at ../../../repos/binutils-gdb/gdb/main.c:356
> > > #34 0x0000aaaaaaef1a04 in captured_command_loop () at ../../../repos/binutils-gdb/gdb/main.c:416
> > > #35 0x0000aaaaaaef3338 in captured_main (data=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1254
> > > #36 0x0000aaaaaaef33a0 in gdb_main (args=0xfffffffff1f0) at ../../../repos/binutils-gdb/gdb/main.c:1269
> > > #37 0x0000aaaaaab6e0dc in main (argc=6, argv=0xfffffffff348) at ../../../repos/binutils-gdb/gdb/gdb.c:32
> > > 
> > > The following patch addresses this by using a function that unwinds the PC
> > > from the next (inline) frame directly as opposed to creating a lazy value
> > > that is bound to the next frame's ID (still not computed).
> > > 
> > > I've validated this for aarch64-linux and x86_64-linux by running the
> > > testsuite.
> > > 
> > > Tromey, would you mind checking if this suits your problematic core file
> > > tailcall scenario?
> > > 
> > > gdb/ChangeLog:
> > > 
> > > 2020-04-14  Luis Machado  <luis.machado@linaro.org>
> > > 
> > > 	* dwarf2/frame-tailcall.c (dwarf2_tailcall_sniffer_first): Use
> > > 	get_frame_register instead of gdbarch_unwind_pc.
> > > ---
> > >  gdb/dwarf2/frame-tailcall.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/gdb/dwarf2/frame-tailcall.c b/gdb/dwarf2/frame-tailcall.c
> > > index 2d219f13f9..01bb134a5c 100644
> > > --- a/gdb/dwarf2/frame-tailcall.c
> > > +++ b/gdb/dwarf2/frame-tailcall.c
> > > @@ -385,7 +385,9 @@ dwarf2_tailcall_sniffer_first (struct frame_info *this_frame,
> > >        prev_gdbarch = frame_unwind_arch (this_frame);
> > >  
> > >        /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */
> > > -      prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);
> > > +      get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),
> > > +			  (gdb_byte *) &prev_pc);
> > > +      prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);
> > >  
> > >        /* call_site_find_chain can throw an exception.  */
> > >        chain = call_site_find_chain (prev_gdbarch, prev_pc, this_pc);
> > 
> > I'm now no longer convinced that this patch is correct, and I'd like
> > to reopen the discussion.
> > 
> > Here's what concerns me, we used to make the following call-chain:
> > 
> >   gdbarch_unwind_pc --> frame_unwind_register_unsigned --> frame_unwind_register_value
> > 
> > Now we do this:
> > 
> >   get_frame_register --> frame_unwind_register --> frame_register_unwind --> frame_unwind_register_value
> > 
> > The problem is that gdbarch_unwind_pc' takes an argument 'next_frame',
> > while, get_frame_register takes an argument called frame', but is
> > really 'this_frame', it then passes 'frame->next' to
> > 'frame_unwind_register'.
> > 
> > What this means is that if we have a call stack like this:
> > 
> >   #3 --> #2 --> #1 --> #0
> > 
> > And we invoke the tail-call sniffer in frame #1, previously we figured
> > out the $pc value in frame #2, while now we figure out the $pc value
> > in frame #1.
> > 
> > I'm even more convinced that this is an error based on the fix patch
> > you applied later:
> > 
> >   commit 991a3e2e9944a4b3a27bd989ac03c18285bd545d
> >   Date:   Sat Apr 25 00:32:44 2020 -0300
> > 
> >       Fix remaining inline/tailcall unwinding breakage for x86_64
> > 

After this commit the code looks like this:

      /* Simulate frame_unwind_pc without setting this_frame->prev_pc.p.  */
      if (next_frame != nullptr && get_frame_type (next_frame) == INLINE_FRAME
	  && !frame_id_computed_p (next_frame))
	{
	  /* The next frame is an inline frame and its frame id has not been
	     computed yet.  */
	  get_frame_register (this_frame, gdbarch_pc_regnum (prev_gdbarch),
			      (gdb_byte *) &prev_pc);
	  prev_pc = gdbarch_addr_bits_remove (prev_gdbarch, prev_pc);
	}
      else
	prev_pc = gdbarch_unwind_pc (prev_gdbarch, this_frame);

Now the point of this function is to decide if the frame we are in
_right_now_ was tail called into by it's "apparent" caller.  That is
when we unwind we have:

  main --> foo --> bar

And in foo we ask, can we build a tailcall chain that gets us from
main to foo.  So in this case, we get the answer yes, and the chain
returned represents:

  main --> test_func --> foo

However, my thinking is that we know we're not in the position of
'foo' (that is reached by a tail call) if, foo is an inline frame -
right?

So, I wonder if it's as simple as saying:

  /* We know that THIS_FRAME was not reached by a tail call if
     THIS_FRAME is an inline frame.  */
  if (get_frame_type (this_frame) == INLINE_FRAME)
    return;

That's totally untested, just a random thought....  I wonder if such a
change would fix the original failures you saw?

Thanks,
Andrew


More information about the Gdb-patches mailing list