16155 – Backtraces in threads don't work on AArch64

Summary: Backtraces in threads don't work on AArch64

Status:	RESOLVED FIXED

Alias:	None

Product:	gdb
Classification:	Unclassified
Component:	backtrace
Version:	7.6

Importance:	P2 critical
Target Milestone:	7.7
Assignee:	Tom Tromey

URL:
Keywords:

Depends on:
Blocks:

Reported:	2013-11-11 17:34 UTC by aph
Modified:	2013-11-22 18:12 UTC
CC List:	2 users

See Also:	16169
Host:
Target:	aarch64*
Build:
Last reconfirmed:

Description aph 2013-11-11 17:34:33 UTC

Create a thread.

Type "bt"

(gdb) bt
#0  PerfMemory::alloc (size=size@entry=56)
    at /scratch/rpmbuild/BUILD/java-1.7.0-openjdk-1.7.0.45-2.4.3.4.sa1.aarch64/openjdk/hotspot/src/share/vm/runtime/perfMemory.cpp:209
#1  0x0000007fb7e7c074 in create_entry (vlen=0, dsize=8, dtype=T_LONG, 
    this=0x7fb0009d50)
    at /scratch/rpmbuild/BUILD/java-1.7.0-openjdk-1.7.0.45-2.4.3.4.sa1.aarch64/openjdk/hotspot/src/share/vm/runtime/perfData.cpp:135
#2  PerfLong (v=PerfData::V_Monotonic, u=PerfData::U_Events, 
    namep=0x7fb7f5c168 "_sync_Inflations", ns=SUN_RT, 
    this=0x7fb0009d50)
    at /scratch/rpmbuild/BUILD/java-1.7.0-openjdk-1.7.0.45-2.4.3.4.sa1.aarch64/openjdk/hotspot/src/share/vm/runtime/perfData.cpp:191
#3  PerfLongVariant (initial_value=0, v=PerfData::V_Monotonic, 
    u=PerfData::U_Events, namep=0x7fb7f5c168 "_sync_Inflations", 
    ns=SUN_RT, this=0x7fb0009d50)
    at /scratch/rpmbuild/BUILD/java-1.7.0-openjdk-1.7.0.45-2.4.3.4.sa1.aarch64/openjdk/hotspot/src/share/vm/runtime/perfData.hpp:412
#4  PerfLongCounter (initial_value=0, u=PerfData::U_Events, 
    namep=0x7fb7f5c168 "_sync_Inflations", ns=SUN_RT, 
    this=0x7fb0009d50)
    at /scratch/rpmbuild/BUILD/java-1.7.0-openjdk-1.7.0.45-2.4.3.4.sa1.aarch64/openjdk/hotspot/src/share/vm/runtime/perfData.hpp:448
#5  PerfDataManager::create_long_counter (ns=ns@entry=SUN_RT, 
    name=name@entry=0x7fb7f5c168 "_sync_Inflations", 
    u=u@entry=PerfData::U_Events, ival=ival@entry=0, __the_thread__=
    0x7fb00073e0)
    at /scratch/rpmbuild/BUILD/java-1.7.0-openjdk-1.7.0.45-2.4.3.4.sa1.aarch64/openjdk/hotspot/src/share/vm/runtime/perfData.cpp:508
#6  0x0000007fb7e4fa74 in create_counter (
    __the_thread__=<optimized out>, u=PerfData::U_Events, 
    name=0x7fb7f5c168 "_sync_Inflations", ns=SUN_RT)
    at /scratch/rpmbuild/BUILD/java-1.7.0-openjdk-1.7.0.45-2.4.3.4.sa1.aarch64/openjdk/hotspot/src/share/vm/runtime/perfData.hpp:856
#7  ObjectMonitor::Initialize ()
    at /scratch/rpmbuild/BUILD/java-1.7.0-openjdk-1.7.0.45-2.4.3.4.sa1.aarch64/openjdk/hotspot/src/share/vm/runtime/objectMonitor.cpp:2383
#8  0x0000007fb7eeb928 in Threads::create_vm (args=<optimized out>, 
    canTryAgain=canTryAgain@entry=0x7fb7745860)
    at /scratch/rpmbuild/BUILD/java-1.7.0-openjdk-1.7.0.45-2.4.3.4.sa1.aarch64/openjdk/hotspot/src/share/vm/runtime/thread.cpp:3370
#9  0x0000007fb7d69628 in JNI_CreateJavaVM (vm=0x7fb7745998, 
    penv=0x7fb77459a0, args=<optimized out>)
    at /scratch/rpmbuild/BUILD/java-1.7.0-openjdk-1.7.0.45-2.4.3.4.sa1.aarch64/openjdk/hotspot/src/share/vm/prims/jni.cpp:5128
#10 0x00000000004048c0 in InitializeJVM (pvm=0x7fb7745998, 
    penv=0x7fb77459a0, ifn=0x7fb7745988)
    at /scratch/rpmbuild/BUILD/java-1.7.0-openjdk-1.7.0.45-2.4.3.4.sa1.aarch64/openjdk/hotspot/src/share/tools/launcher/java.c:1288
#11 0x000000000040364c in JavaMain (_args=0x7fffffed98)
    at /scratch/rpmbuild/BUILD/java-1.7.0-openjdk-1.7.0.45-2.4.3.4.sa1.aarch64/openjdk/hotspot/src/share/tools/launcher/java.c:423
#12 0x0000007fb7a18acc in start_thread (arg=0x7fb77461e0)
    at pthread_create.c:310
#13 0x0000007fb7974a2c in clone () from /lib64/libc.so.6

... and then GDB hangs.

Comment 1 Pedro Alves 2013-11-11 18:18:41 UTC

Sounds like something is wrong with outermost frame detection.  Is GDB in an infinite loop?

Comment 2 Tom Tromey 2013-11-12 20:37:14 UTC

gdb gets stuck in a loop in value_fetch_lazy.
At each step it tries to unwind a register.
But the unwinding takes this path in dwarf2-frame.c:

    case DWARF2_FRAME_REG_UNSPECIFIED:
      /* GCC, in its infinite wisdom decided to not provide unwind
	 information for registers that are "same value".  Since
	 DWARF2 (3 draft 7) doesn't define such behavior, said
	 registers are actually undefined (which is different to CFI
	 "undefined").  Code above issues a complaint about this.
	 Here just fudge the books, assume GCC, and that the value is
	 more inner on the stack.  */
      return frame_unwind_got_register (this_frame, regnum, regnum);

... and returns the same register in the same frame each time.
This makes for an infinite loop, sucking up memory on the value
chain besides.

Sticking a QUIT into this loop at least lets it be interruptible.

That's clearly a stopgap though.
I'm not certain yet what the correct fix may be.

Also I note that the AArch64 clone.S in glibc does not have CFI
information.  That is a contributing cause of this bug.
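To make the loop described in this comment concrete, here is a minimal toy model in C. It assumes a drastically simplified lazy register value (a frame level plus a register number); none of these types or functions are GDB's, and `unwind_reg` and `fetch_lazy` are hypothetical. The `buggy` flag mimics the quoted fallback that hands back the same register in the same frame, and `fetch_lazy` shows one possible guard: give up as soon as a step makes no progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not GDB's types: a lazy register value means
   "read register REGNUM as seen in frame LEVEL" (0 = innermost).  */
struct lazy_reg
{
  bool lazy;              /* true until the contents are known */
  int level;              /* frame the value must be read from */
  int regnum;
  unsigned long contents; /* meaningful only when !lazy */
};

/* Toy unwinder rule.  In the innermost frame the register is live.
   Otherwise an unsaved register is looked up in the next (inner) frame,
   unless BUGGY, which mimics the fallback quoted above by handing back
   the same register in the same frame.  */
static struct lazy_reg
unwind_reg (int level, int regnum, bool buggy)
{
  struct lazy_reg v = { true, level, regnum, 0 };

  if (level == 0)
    {
      v.lazy = false;
      v.contents = 0x1000ul + (unsigned long) regnum; /* fake live value */
    }
  else if (!buggy)
    v.level = level - 1;
  /* Otherwise v.level == level: no progress is made.  */
  return v;
}

/* Resolve V step by step, like the value_fetch_lazy loop, but return
   false as soon as a step does not move to a different frame, instead
   of looping forever.  */
static bool
fetch_lazy (struct lazy_reg v, bool buggy, unsigned long *out)
{
  while (v.lazy)
    {
      struct lazy_reg next = unwind_reg (v.level, v.regnum, buggy);

      if (next.lazy && next.level == v.level)
        return false; /* the hang reported in this bug */
      v = next;
    }
  *out = v.contents;
  return true;
}
```

With `buggy` false, resolving a register that starts in frame 3 walks inward to frame 0 and succeeds; with `buggy` true, the guard trips on the first step.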

Comment 3 Tom Tromey 2013-11-12 22:00:04 UTC

FWIW adding a QUIT there will cause an internal error
if it fires.  The unwind code gets upset.

My current fix is to add this code in dwarf2_frame_cache:

  else if (fs->retaddr_column >= fs->regs.num_regs
	  || (fs->regs.reg[fs->retaddr_column].how
	      == DWARF2_FRAME_REG_UNSPECIFIED))
    cache->undefined_retaddr = 1;

The frame in question had an unspecified return address column.
I couldn't think of a scenario in which this made sense, and
this lets dwarf2_frame_unwind_stop_reason make the right choice.

I wonder if unavailable_retaddr is the more correct choice.

With this addition, unwinding terminates properly:

(gdb) thr 6
[Switching to thread 6 (Thread 458)]
#0  0x0000007fb7ed485c in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x0000007fb7ed485c in nanosleep () from /lib64/libc.so.6
#1  0x0000007fb7ed4508 in sleep () from /lib64/libc.so.6
#2  0x00000000004008bc in thread_function (arg=0x4) at threadapply.c:73
#3  0x0000007fb7fad950 in start_thread () from /lib64/libpthread.so.0
#4  0x0000007fb7f0956c in clone () from /lib64/libc.so.6
(gdb)
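The decision this fix feeds into can be sketched in isolation. This is a hypothetical C model, not dwarf2-frame.c: the enums and `struct frame_state` are invented stand-ins for GDB's register rules and `fs->regs`. It assumes that the `if` preceding the quoted `else if` covers an explicitly undefined column, and that "the right choice" for dwarf2_frame_unwind_stop_reason is to treat such a frame as outermost.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GDB's register rules and stop reasons.  */
enum reg_how { REG_UNSPECIFIED, REG_UNDEFINED, REG_SAVED_OFFSET, REG_SAME_VALUE };
enum stop_reason { UNWIND_NO_REASON, UNWIND_OUTERMOST };

struct frame_state
{
  int num_regs;            /* number of register columns with a rule */
  int retaddr_column;      /* column holding the return address */
  const enum reg_how *how; /* one rule per column */
};

/* The condition from this comment, plus an explicitly undefined column
   (assumed here to be what the preceding "if" in the real code covers).  */
static bool
retaddr_undefined (const struct frame_state *fs)
{
  if (fs->retaddr_column >= fs->num_regs)
    return true;
  return (fs->how[fs->retaddr_column] == REG_UNDEFINED
          || fs->how[fs->retaddr_column] == REG_UNSPECIFIED);
}

/* Assumed policy: no usable return address means this is the outermost
   frame, so the backtrace ends here instead of unwinding further.  */
static enum stop_reason
frame_stop_reason (const struct frame_state *fs)
{
  return retaddr_undefined (fs) ? UNWIND_OUTERMOST : UNWIND_NO_REASON;
}
```

The range check comes first so that an out-of-range column is never used as an index.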

Comment 4 Tom Tromey 2013-11-13 16:21:59 UTC

Mine.

Comment 5 Sourceware Commits 2013-11-22 13:43:08 UTC

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gdb and binutils".

The branch, master has been updated
       via  1dc8686c48e72fc02723d44ee0fecde0d233c74e (commit)
       via  f5b0ed3c8ce42b0dd6b6caa0b3d7b7e734311afe (commit)
       via  be2c48b4d50b992ba83bc51f086e316621a03a14 (commit)
      from  5ed365b417ae675db9bd42c6920de83027edcc0c (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=1dc8686c48e72fc02723d44ee0fecde0d233c74e

commit 1dc8686c48e72fc02723d44ee0fecde0d233c74e
Author: Pedro Alves <palves@redhat.com>
Date:   Fri Nov 22 13:17:46 2013 +0000

    Eliminate dwarf2_frame_cache recursion, don't unwind from the dwarf2 sniffer (move dwarf2_tailcall_sniffer_first elsewhere).
    
    Two rationales, same patch.
    
    TL;DR 1:
    
     dwarf2_frame_cache recursion is evil.  dwarf2_frame_cache calls
     dwarf2_tailcall_sniffer_first which then recurses into
     dwarf2_frame_cache.
    
    TL;DR 2:
    
     An unwinder trying to unwind is evil.  dwarf2_frame_sniffer calls
     dwarf2_frame_cache which calls dwarf2_tailcall_sniffer_first which
     then tries to unwind the PC of the previous frame.
    
    Avoid all that by deferring dwarf2_tailcall_sniffer_first until it's
    really necessary.
    
    Rationale 1
    ===========
    
    A frame sniffer should not try to unwind, because that bypasses all
    the validation checks done by get_prev_frame.  The UNWIND_SAME_ID
    scenario is one such case where GDB is currently broken, in part
    because of this (the next patch adds a test that would fail without
    this change).
    
    GDB goes into an infinite loop in value_fetch_lazy, here:
    
          while (VALUE_LVAL (new_val) == lval_register && value_lazy (new_val))
    	{
    	  frame = frame_find_by_id (VALUE_FRAME_ID (new_val));
    ...
    	  new_val = get_frame_register_value (frame, regnum);
    	}
    
    (top-gdb) bt
    #0  value_fetch_lazy (val=0x11516d0) at ../../src/gdb/value.c:3510
    #1  0x0000000000584bd8 in value_optimized_out (value=0x11516d0) at ../../src/gdb/value.c:1096
    #2  0x00000000006fe7a1 in frame_register_unwind (frame=0x1492600, regnum=16, optimizedp=0x7fffffffcdec, unavailablep=0x7fffffffcde8, lvalp=0x7fffffffcdd8, addrp=
        0x7fffffffcde0, realnump=0x7fffffffcddc, bufferp=0x7fffffffce10 "@\316\377\377\377\177") at ../../src/gdb/frame.c:940
    #3  0x00000000006fea3a in frame_unwind_register (frame=0x1492600, regnum=16, buf=0x7fffffffce10 "@\316\377\377\377\177") at ../../src/gdb/frame.c:990
    #4  0x0000000000473b9b in i386_unwind_pc (gdbarch=0xf54660, next_frame=0x1492600) at ../../src/gdb/i386-tdep.c:1771
    #5  0x0000000000601dfa in gdbarch_unwind_pc (gdbarch=0xf54660, next_frame=0x1492600) at ../../src/gdb/gdbarch.c:2870
    #6  0x0000000000693db5 in dwarf2_tailcall_sniffer_first (this_frame=0x1492600, tailcall_cachep=0x14926f0, entry_cfa_sp_offsetp=0x7fffffffcf00)
        at ../../src/gdb/dwarf2-frame-tailcall.c:389
    #7  0x0000000000690928 in dwarf2_frame_cache (this_frame=0x1492600, this_cache=0x1492618) at ../../src/gdb/dwarf2-frame.c:1245
    #8  0x0000000000690f46 in dwarf2_frame_sniffer (self=0x8e4980, this_frame=0x1492600, this_cache=0x1492618) at ../../src/gdb/dwarf2-frame.c:1423
    #9  0x000000000070203b in frame_unwind_find_by_frame (this_frame=0x1492600, this_cache=0x1492618) at ../../src/gdb/frame-unwind.c:112
    #10 0x00000000006fd681 in get_frame_id (fi=0x1492600) at ../../src/gdb/frame.c:408
    #11 0x00000000007006c2 in get_prev_frame_1 (this_frame=0xdc1860) at ../../src/gdb/frame.c:1826
    #12 0x0000000000700b7a in get_prev_frame (this_frame=0xdc1860) at ../../src/gdb/frame.c:2056
    #13 0x0000000000514588 in frame_info_to_frame_object (frame=0xdc1860) at ../../src/gdb/python/py-frame.c:322
    #14 0x000000000051784c in bootstrap_python_frame_filters (frame=0xdc1860, frame_low=0, frame_high=-1) at ../../src/gdb/python/py-framefilter.c:1396
    #15 0x0000000000517a6f in apply_frame_filter (frame=0xdc1860, flags=7, args_type=CLI_SCALAR_VALUES, out=0xed7a90, frame_low=0, frame_high=-1)
        at ../../src/gdb/python/py-framefilter.c:1492
    #16 0x00000000005e77b0 in backtrace_command_1 (count_exp=0x0, show_locals=0, no_filters=0, from_tty=1) at ../../src/gdb/stack.c:1777
    #17 0x00000000005e7c0f in backtrace_command (arg=0x0, from_tty=1) at ../../src/gdb/stack.c:1891
    #18 0x00000000004e37a7 in do_cfunc (c=0xda4fa0, args=0x0, from_tty=1) at ../../src/gdb/cli/cli-decode.c:107
    #19 0x00000000004e683c in cmd_func (cmd=0xda4fa0, args=0x0, from_tty=1) at ../../src/gdb/cli/cli-decode.c:1882
    #20 0x00000000006f35ed in execute_command (p=0xcc66c2 "", from_tty=1) at ../../src/gdb/top.c:468
    #21 0x00000000005f8853 in command_handler (command=0xcc66c0 "bt") at ../../src/gdb/event-top.c:435
    #22 0x00000000005f8e12 in command_line_handler (rl=0xfe05f0 "@") at ../../src/gdb/event-top.c:632
    #23 0x000000000074d2c6 in rl_callback_read_char () at ../../src/readline/callback.c:220
    #24 0x00000000005f8375 in rl_callback_read_char_wrapper (client_data=0x0) at ../../src/gdb/event-top.c:164
    #25 0x00000000005f876a in stdin_event_handler (error=0, client_data=0x0) at ../../src/gdb/event-top.c:375
    #26 0x00000000005f72fa in handle_file_event (data=...) at ../../src/gdb/event-loop.c:768
    #27 0x00000000005f67a3 in process_event () at ../../src/gdb/event-loop.c:342
    #28 0x00000000005f686a in gdb_do_one_event () at ../../src/gdb/event-loop.c:406
    #29 0x00000000005f68bb in start_event_loop () at ../../src/gdb/event-loop.c:431
    #30 0x00000000005f83a7 in cli_command_loop (data=0x0) at ../../src/gdb/event-top.c:179
    #31 0x00000000005eeed3 in current_interp_command_loop () at ../../src/gdb/interps.c:327
    #32 0x00000000005ef8ff in captured_command_loop (data=0x0) at ../../src/gdb/main.c:267
    #33 0x00000000005ed2f6 in catch_errors (func=0x5ef8e4 <captured_command_loop>, func_args=0x0, errstring=0x8b6554 "", mask=RETURN_MASK_ALL)
        at ../../src/gdb/exceptions.c:524
    #34 0x00000000005f0d21 in captured_main (data=0x7fffffffd9e0) at ../../src/gdb/main.c:1067
    #35 0x00000000005ed2f6 in catch_errors (func=0x5efb9b <captured_main>, func_args=0x7fffffffd9e0, errstring=0x8b6554 "", mask=RETURN_MASK_ALL)
        at ../../src/gdb/exceptions.c:524
    #36 0x00000000005f0d57 in gdb_main (args=0x7fffffffd9e0) at ../../src/gdb/main.c:1076
    #37 0x000000000045bb6a in main (argc=4, argv=0x7fffffffdae8) at ../../src/gdb/gdb.c:34
    (top-gdb)
    
    GDB is trying to unwind the PC register of the previous frame (frame
    #5 above), starting from the frame being sniffed (the THIS frame).
    But the THIS frame's unwinder says the PC of the previous frame is
    actually the same as the previous frame's next frame (which is the
    same frame we started with, the THIS frame); therefore it returns an
    lval_register lazy value with frame set to THIS frame.  And so the
    value_fetch_lazy loop never ends.
    
    
    Rationale 2
    ===========
    
    As an experiment, I tried making dwarf2-frame.c:read_addr_from_reg use
    address_from_register.  That caused a bunch of regressions, but it
    actually took me a long while to figure out what was going on.  Turns
    out dwarf2-frame.c:read_addr_from_reg is called while computing the
    frame's CFA, from within dwarf2_frame_cache.  address_from_register
    wants to create a register value with frame_id set to the frame being
    constructed.  To create the frame id, we again call dwarf2_frame_cache,
    which given:
    
    static struct dwarf2_frame_cache *
    dwarf2_frame_cache (struct frame_info *this_frame, void **this_cache)
    {
    ...
      if (*this_cache)
        return *this_cache;
    
    returns an incomplete object to the caller:
    static void
    dwarf2_frame_this_id (struct frame_info *this_frame, void **this_cache,
    		      struct frame_id *this_id)
    {
      struct dwarf2_frame_cache *cache =
        dwarf2_frame_cache (this_frame, this_cache);
    ...
     (*this_id) = frame_id_build (cache->cfa, get_frame_func (this_frame));
    }
    
    As cache->cfa is still 0 (we were trying to compute it!), and
    get_frame_id recalls this id from here on, we end up with a broken
    frame id recorded for this frame.  Later, when inspecting locals,
    the dwarf machinery needs to know the selected frame's base, which
    calls get_frame_base:
    
    CORE_ADDR
    get_frame_base (struct frame_info *fi)
    {
      return get_frame_id (fi).stack_addr;
    }
    
    which as seen above then returns 0 ...
    
    So I gave up using address_from_register.
    
    But the pain of investigating this made me want to have GDB itself
    assert that recursion never happens here.  So I wrote a patch to do
    that.  But it triggers on current mainline, because
    dwarf2_tailcall_sniffer_first, called from dwarf2_frame_cache, unwinds
    this_frame.
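A toy model of the re-entrancy described above, assuming the cache is published through a pointer before it is fully initialized (as the early `if (*this_cache) return *this_cache;` suggests). All names are hypothetical. The nested call observes a CFA of 0, and the `building` flag is one way to detect the recursion; GDB-style code would assert at that point rather than count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_SP 0x8000ul /* hypothetical register contents */

/* Hypothetical per-frame unwind cache; not GDB's real struct.  */
struct unwind_cache
{
  bool building;     /* true while the cache is still being filled in */
  unsigned long cfa; /* 0 until computed */
};

static struct unwind_cache storage;
static struct unwind_cache *this_cache;  /* plays the role of *this_cache */
static int reentries;                    /* recursive entries detected */
static unsigned long cfa_seen_by_callee; /* what the recursive caller saw */

static struct unwind_cache *frame_cache (bool helper_recurses);

/* Stand-in for a helper such as address_from_register that needs the
   frame id, and therefore the cache, while the CFA is being computed.  */
static unsigned long
read_cfa_register (bool helper_recurses)
{
  if (helper_recurses)
    cfa_seen_by_callee = frame_cache (helper_recurses)->cfa;
  return FAKE_SP;
}

static struct unwind_cache *
frame_cache (bool helper_recurses)
{
  if (this_cache != NULL)
    {
      if (this_cache->building)
        reentries++;     /* GDB-style code would assert here instead */
      return this_cache; /* may be half-built */
    }

  this_cache = &storage;
  storage.building = true;
  storage.cfa = 0;
  storage.cfa = read_cfa_register (helper_recurses) + 16;
  storage.building = false;
  return this_cache;
}

static void
reset_cache (void)
{
  this_cache = NULL;
  reentries = 0;
  cfa_seen_by_callee = 0;
}
```

The outer call still ends up with the right CFA; the damage is done to whatever the nested caller computed from the zero it saw, which is how a frame id with a stack address of 0 gets recorded.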
    
    A sniffer shouldn't be trying to unwind, exactly because of this sort
    of tricky issue.  The patch defers calling
    dwarf2_tailcall_sniffer_first until it's really necessary, in
    dwarf2_frame_prev_register (thus actually outside the sniffer path).
    As this makes the call to dwarf2_frame_cache in dwarf2_frame_sniffer
    unnecessary again, the patch removes that too.
    
    Tested on x86_64 Fedora 17.
    
    gdb/
    2013-11-22  Pedro Alves  <palves@redhat.com>
    
    	PR 16155
    	* dwarf2-frame.c (struct dwarf2_frame_cache)
    	<checked_tailcall_bottom, entry_cfa_sp_offset,
    	entry_cfa_sp_offset_p>: New fields.
    	(dwarf2_frame_cache): Adjust to use the new cache fields instead
    	of locals.  Don't call dwarf2_tailcall_sniffer_first here.
    	(dwarf2_frame_prev_register): Call it here, but only once.
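The shape of this fix, moving the tail-call check out of cache construction and into the first prev_register request and running it only once, can be sketched as follows. Apart from the `checked_tailcall_bottom` flag named in the ChangeLog, every name below is hypothetical, and the stand-in check does no real unwinding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down cache.  The flag is named after the
   checked_tailcall_bottom field from the ChangeLog; everything else
   is invented for illustration.  */
struct frame_cache
{
  unsigned long cfa;
  bool checked_tailcall_bottom; /* has the tail-call check run yet? */
  int tailcall_frames;          /* result of that check */
};

static int tailcall_checks_run; /* counts calls to the check below */

/* Stand-in for dwarf2_tailcall_sniffer_first.  The real function may
   unwind the PC, so it must not run while the frame is still being
   identified.  This toy just records the call and finds nothing.  */
static int
tailcall_sniffer_first (const struct frame_cache *cache)
{
  (void) cache;
  tailcall_checks_run++;
  return 0;
}

/* Building the cache (reachable from the sniffer) computes only what the
   frame id needs; no unwinding happens here.  */
static void
build_cache (struct frame_cache *cache, unsigned long cfa)
{
  cache->cfa = cfa;
  cache->checked_tailcall_bottom = false;
  cache->tailcall_frames = 0;
}

/* The check is deferred to the first prev_register request, outside the
   sniffer path, and runs at most once per frame.  */
static unsigned long
prev_register (struct frame_cache *cache, int regnum)
{
  if (!cache->checked_tailcall_bottom)
    {
      cache->checked_tailcall_bottom = true;
      cache->tailcall_frames = tailcall_sniffer_first (cache);
    }
  return cache->cfa + (unsigned long) regnum; /* fake register contents */
}
```

Setting the flag before making the call means that a re-entrant request for another register of the same frame does not start the check a second time.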

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=f5b0ed3c8ce42b0dd6b6caa0b3d7b7e734311afe

commit f5b0ed3c8ce42b0dd6b6caa0b3d7b7e734311afe
Author: Pedro Alves <palves@redhat.com>
Date:   Thu Nov 21 15:20:09 2013 +0000

    Make use of the frame stash to detect wider stack cycles.
    
    Tested on x86_64 Fedora 17.
    
    gdb/
    2013-11-22  Pedro Alves  <palves@redhat.com>
    	    Tom Tromey  <tromey@redhat.com>
    
    	* frame.c (frame_stash_add): Now returns whether a frame with the
    	same ID was already known.
    	(compute_frame_id): New function, factored out from get_frame_id.
    	(get_frame_id): No longer lazily compute the frame id here.
    	(get_prev_frame_if_no_cycle): New function.  Detects wider stack
    	cycles.
    	(get_prev_frame_1): Use it instead of calling get_prev_frame_raw
    	directly and checking for stack cycles here.
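A minimal sketch of what detecting wider stack cycles with the stash amounts to, assuming frame ids can be compared field by field and that a linear array can stand in for the real stash. `stash_add` follows the new frame_stash_add contract from the ChangeLog (report whether a frame with the same id was already known); `walk` and the types are hypothetical. Unlike a check against the adjacent frame only, this also stops when a frame repeats an id from further down the chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame id; GDB's real struct frame_id has more fields.  */
struct frame_id { unsigned long stack_addr, code_addr; };

#define MAX_FRAMES 64

/* A tiny linear stand-in for the frame stash.  */
struct stash { struct frame_id ids[MAX_FRAMES]; size_t n; };

/* Returns true if ID was newly added, false if a frame with the same id
   was already known.  */
static bool
stash_add (struct stash *s, struct frame_id id)
{
  for (size_t i = 0; i < s->n; i++)
    if (s->ids[i].stack_addr == id.stack_addr
        && s->ids[i].code_addr == id.code_addr)
      return false;
  assert (s->n < MAX_FRAMES);
  s->ids[s->n++] = id;
  return true;
}

/* Walk the ids an unwinder would produce, innermost first, stopping at
   the first id already seen anywhere in the chain.  Returns the number
   of frames kept.  */
static size_t
walk (const struct frame_id *unwound, size_t count)
{
  struct stash s = { .n = 0 };
  size_t kept = 0;

  for (size_t i = 0; i < count; i++)
    {
      if (!stash_add (&s, unwound[i]))
        break; /* cycle: discard this frame and stop */
      kept++;
    }
  return kept;
}
```

For example, if frame #3 repeats the id of frame #1, an adjacent-only check would not notice, but `walk` stops there and keeps frames #0 to #2.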

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=be2c48b4d50b992ba83bc51f086e316621a03a14

commit be2c48b4d50b992ba83bc51f086e316621a03a14
Author: Pedro Alves <palves@redhat.com>
Date:   Fri Nov 22 11:51:59 2013 +0000

    Don't let two frames with the same id end up in the frame chain.
    
    The UNWIND_SAME_ID check is done between THIS_FRAME and the next frame
    when we try to unwind the previous frame.  But at this point, it's
    already too late -- we ended up with two frames with the same ID in
    the frame chain.  Each frame having its own ID is an invariant assumed
    throughout GDB.  This patch applies the UNWIND_SAME_ID detection
    earlier, right after the previous frame is unwound, discarding the dup
    frame if a cycle is detected.
    
    The patch includes a new test that fails before the change.  Before
    the patch, the test causes an infinite loop in GDB; after the patch,
    the UNWIND_SAME_ID logic kicks in and makes the backtrace stop with:
    
      Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    
    The test uses dwarf CFI to emulate a corrupted stack with a cycle.  It
    has a function with registers marked DW_CFA_same_value (most
    importantly RSP/RIP), so that GDB computes the same ID for that frame
    and its caller.  IOW, something like this:
    
     #0 - frame_id_1
     #1 - frame_id_2
     #2 - frame_id_3
     #3 - frame_id_4
     #4 - frame_id_4  <<<< outermost (UNWIND_SAME_ID).
    
    (The test's code is just a copy of dw2-reg-undefined.S /
    dw2-reg-undefined.c, adjusted to use DW_CFA_same_value instead of
    DW_CFA_undefined, and to mark a different set of registers.)
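The ordering change can be illustrated with a toy chain builder, assuming frame ids reduce to integers. This is a hypothetical sketch, not the frame.c code: each newly unwound previous frame is compared with this frame before it is linked, so the id sequence shown above (1, 2, 3, 4, 4) yields four frames plus the stop message quoted earlier, and the duplicate never enters the chain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: frame ids reduced to plain integers.  */
typedef unsigned long frame_id;

/* Link frames innermost first.  Each newly unwound previous frame is
   compared with THIS frame (the last one linked) before it is added, and
   discarded on a match, so CHAIN never holds two adjacent frames with the
   same id.  Returns the number of frames kept; *WHY gets the stop message,
   or NULL if the input simply ran out.  CHAIN must have room for COUNT
   entries.  */
static size_t
build_chain (const frame_id *unwound, size_t count, frame_id *chain,
             const char **why)
{
  size_t n = 0;

  *why = NULL;
  for (size_t i = 0; i < count; i++)
    {
      if (n > 0 && chain[n - 1] == unwound[i])
        {
          *why = "previous frame identical to this frame (corrupt stack?)";
          break;
        }
      chain[n++] = unwound[i];
    }
  return n;
}
```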
    
    The infinite loop is here, in value_fetch_lazy:
    
          while (VALUE_LVAL (new_val) == lval_register && value_lazy (new_val))
    	{
    	  frame = frame_find_by_id (VALUE_FRAME_ID (new_val));
    ...
    	  new_val = get_frame_register_value (frame, regnum);
    	}
    
    get_frame_register_value can return a lazy register value pointing to
    the next frame.  This means that the register wasn't clobbered by
    FRAME; the debugger should therefore retrieve its value from the next
    frame.
    
    To be clear, get_frame_register_value unwinds the value in question
    from the next frame:
    
     struct value *
     get_frame_register_value (struct frame_info *frame, int regnum)
     {
       return frame_unwind_register_value (frame->next, regnum);
                                           ^^^^^^^^^^^
     }
    
    In other words, if we get a lazy lval_register, it should have the
    frame ID of the _next_ frame, never of FRAME.
    
    At this point in value_fetch_lazy, the whole relevant chunk of the
    stack up to frame #4 has already been unwound.  The loop always
    "unlazies" lval_registers in the "next/innermost" direction, not in
    the "prev/unwind further/outermost" direction.
    
    So say we're looking at frame #4.  get_frame_register_value in frame
    #4 can return a lazy register value of frame #3.  So the next
    iteration, frame_find_by_id tries to read the register from frame #3.
    But, since frame #4 happens to have same id as frame #3,
    frame_find_by_id returns frame #4 instead.  Rinse, repeat, and we have
    an infinite loop.
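The same cycle can be reproduced in a few lines of C, assuming integer frame ids indexed by level and a lookup that, like the frame stash in this case, returns the outer of two frames sharing an id. All names are hypothetical: `find_by_id_linear` behaves like a walk from the current frame, `find_by_id_last` plays the stash's role, and `steps_to_innermost` is a bounded version of the value_fetch_lazy loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy: frame ids are integers, indexed by frame level.
   Levels 3 and 4 share an id, as in the dw2-dup-frame scenario.  */
static const unsigned long frame_ids[] = { 1, 2, 3, 4, 4 };
#define NFRAMES (sizeof frame_ids / sizeof frame_ids[0])

/* Walk from the current (innermost) frame outwards and return the first
   match, so id 4 resolves to level 3.  */
static int
find_by_id_linear (unsigned long id)
{
  for (size_t level = 0; level < NFRAMES; level++)
    if (frame_ids[level] == id)
      return (int) level;
  return -1;
}

/* A lookup that returns the outermost frame with ID, so id 4 resolves
   to level 4.  */
static int
find_by_id_last (unsigned long id)
{
  for (size_t level = NFRAMES; level-- > 0;)
    if (frame_ids[level] == id)
      return (int) level;
  return -1;
}

/* Resolve a register starting in frame START: each step asks for the
   value in the next (inner) frame, found by id.  Returns the number of
   steps to reach frame 0, or -1 if that does not happen within
   MAX_STEPS steps or an id is not found.  */
static int
steps_to_innermost (int start, int (*find) (unsigned long), int max_steps)
{
  int frame = start;

  for (int steps = 0; steps < max_steps; steps++)
    {
      if (frame == 0)
        return steps;
      if (frame < 0)
        return -1;
      /* The lazy value carries the id of the next (inner) frame.  */
      frame = find (frame_ids[frame - 1]);
    }
  return -1;
}
```

Starting at frame 4, the linear lookup reaches frame 0 in four steps, while the outermost-match lookup keeps returning frame 4 until the step limit is hit.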
    
    This is an old latent problem, exposed by the recent addition of the
    frame stash.  Before we had a stash, frame_find_by_id(frame_id_4)
    would walk over all frames starting at the current frame, and would
    always find #3 first.  The stash happens to return #4 instead:
    
    struct frame_info *
    frame_find_by_id (struct frame_id id)
    {
      struct frame_info *frame, *prev_frame;
    
    ...
      /* Try using the frame stash first.  Finding it there removes the need
         to perform the search by looping over all frames, which can be very
         CPU-intensive if the number of frames is very high (the loop is O(n)
         and get_prev_frame performs a series of checks that are relatively
         expensive).  This optimization is particularly useful when this function
         is called from another function (such as value_fetch_lazy, case
         VALUE_LVAL (val) == lval_register) which already loops over all frames,
         making the overall behavior O(n^2).  */
      frame = frame_stash_find (id);
      if (frame)
        return frame;
    
      for (frame = get_current_frame (); ; frame = prev_frame)
        {
    
    gdb/
    2013-11-22  Pedro Alves  <palves@redhat.com>
    
    	PR 16155
    	* frame.c (get_prev_frame_1): Do the UNWIND_SAME_ID check between
    	this frame and the new previous frame, not between this frame and
    	the next frame.
    
    gdb/testsuite/
    2013-11-22  Pedro Alves  <palves@redhat.com>
    
    	PR 16155
    	* gdb.dwarf2/dw2-dup-frame.S: New file.
    	* gdb.dwarf2/dw2-dup-frame.c: New file.
    	* gdb.dwarf2/dw2-dup-frame.exp: New file.

-----------------------------------------------------------------------

Summary of changes:
 gdb/ChangeLog                              |   10 +
 gdb/dwarf2-frame.c                         |   37 ++-
 gdb/frame.c                                |  142 +++++---
 gdb/testsuite/gdb.dwarf2/dw2-dup-frame.S   |  540 ++++++++++++++++++++++++++++
 gdb/testsuite/gdb.dwarf2/dw2-dup-frame.c   |   36 ++
 gdb/testsuite/gdb.dwarf2/dw2-dup-frame.exp |   44 +++
 6 files changed, 740 insertions(+), 69 deletions(-)
 create mode 100644 gdb/testsuite/gdb.dwarf2/dw2-dup-frame.S
 create mode 100644 gdb/testsuite/gdb.dwarf2/dw2-dup-frame.c
 create mode 100644 gdb/testsuite/gdb.dwarf2/dw2-dup-frame.exp

Comment 6 Sourceware Commits 2013-11-22 13:50:37 UTC

The branch, master has been updated
       via  1ec56e88aa9b052ab10b806d82fbdbc8d153d977 (commit)
      from  8ad6489081e0685755f03779fb26463b83add34c (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=1ec56e88aa9b052ab10b806d82fbdbc8d153d977

commit 1ec56e88aa9b052ab10b806d82fbdbc8d153d977
Author: Pedro Alves <palves@redhat.com>
Date:   Fri Nov 22 13:17:46 2013 +0000

    Eliminate dwarf2_frame_cache recursion, don't unwind from the dwarf2 sniffer (move dwarf2_tailcall_sniffer_first elsewhere).
    
    Two rationales, same patch.
    
    TL;DR 1:
    
     dwarf2_frame_cache recursion is evil.  dwarf2_frame_cache calls
     dwarf2_tailcall_sniffer_first which then recurses into
     dwarf2_frame_cache.
    
    TL;DR 2:
    
     An unwinder trying to unwind is evil.  dwarf2_frame_sniffer calls
     dwarf2_frame_cache which calls dwarf2_tailcall_sniffer_first which
     then tries to unwind the PC of the previous frame.
    
    Avoid all that by deferring dwarf2_tailcall_sniffer_first until it's
    really necessary.
    
    Rationale 1
    ===========
    
    A frame sniffer should not try to unwind, because that bypasses all
    the validation checks done by get_prev_frame.  The UNWIND_SAME_ID
    scenario is one such case where GDB is currently broken because (in
    part) of this (the next patch adds a test that would fail without
    this).
    
    GDB goes into an infinite loop in value_fetch_lazy, here:
    
          while (VALUE_LVAL (new_val) == lval_register && value_lazy (new_val))
    	{
    	  frame = frame_find_by_id (VALUE_FRAME_ID (new_val));
    ...
    	  new_val = get_frame_register_value (frame, regnum);
    	}
    
    (top-gdb) bt
    #0  value_fetch_lazy (val=0x11516d0) at ../../src/gdb/value.c:3510
    #1  0x0000000000584bd8 in value_optimized_out (value=0x11516d0) at ../../src/gdb/value.c:1096
    #2  0x00000000006fe7a1 in frame_register_unwind (frame=0x1492600, regnum=16, optimizedp=0x7fffffffcdec, unavailablep=0x7fffffffcde8, lvalp=0x7fffffffcdd8, addrp=
        0x7fffffffcde0, realnump=0x7fffffffcddc, bufferp=0x7fffffffce10 "@\316\377\377\377\177") at ../../src/gdb/frame.c:940
    #3  0x00000000006fea3a in frame_unwind_register (frame=0x1492600, regnum=16, buf=0x7fffffffce10 "@\316\377\377\377\177") at ../../src/gdb/frame.c:990
    #4  0x0000000000473b9b in i386_unwind_pc (gdbarch=0xf54660, next_frame=0x1492600) at ../../src/gdb/i386-tdep.c:1771
    #5  0x0000000000601dfa in gdbarch_unwind_pc (gdbarch=0xf54660, next_frame=0x1492600) at ../../src/gdb/gdbarch.c:2870
    #6  0x0000000000693db5 in dwarf2_tailcall_sniffer_first (this_frame=0x1492600, tailcall_cachep=0x14926f0, entry_cfa_sp_offsetp=0x7fffffffcf00)
        at ../../src/gdb/dwarf2-frame-tailcall.c:389
    #7  0x0000000000690928 in dwarf2_frame_cache (this_frame=0x1492600, this_cache=0x1492618) at ../../src/gdb/dwarf2-frame.c:1245
    #8  0x0000000000690f46 in dwarf2_frame_sniffer (self=0x8e4980, this_frame=0x1492600, this_cache=0x1492618) at ../../src/gdb/dwarf2-frame.c:1423
    #9  0x000000000070203b in frame_unwind_find_by_frame (this_frame=0x1492600, this_cache=0x1492618) at ../../src/gdb/frame-unwind.c:112
    #10 0x00000000006fd681 in get_frame_id (fi=0x1492600) at ../../src/gdb/frame.c:408
    #11 0x00000000007006c2 in get_prev_frame_1 (this_frame=0xdc1860) at ../../src/gdb/frame.c:1826
    #12 0x0000000000700b7a in get_prev_frame (this_frame=0xdc1860) at ../../src/gdb/frame.c:2056
    #13 0x0000000000514588 in frame_info_to_frame_object (frame=0xdc1860) at ../../src/gdb/python/py-frame.c:322
    #14 0x000000000051784c in bootstrap_python_frame_filters (frame=0xdc1860, frame_low=0, frame_high=-1) at ../../src/gdb/python/py-framefilter.c:1396
    #15 0x0000000000517a6f in apply_frame_filter (frame=0xdc1860, flags=7, args_type=CLI_SCALAR_VALUES, out=0xed7a90, frame_low=0, frame_high=-1)
        at ../../src/gdb/python/py-framefilter.c:1492
    #16 0x00000000005e77b0 in backtrace_command_1 (count_exp=0x0, show_locals=0, no_filters=0, from_tty=1) at ../../src/gdb/stack.c:1777
    #17 0x00000000005e7c0f in backtrace_command (arg=0x0, from_tty=1) at ../../src/gdb/stack.c:1891
    #18 0x00000000004e37a7 in do_cfunc (c=0xda4fa0, args=0x0, from_tty=1) at ../../src/gdb/cli/cli-decode.c:107
    #19 0x00000000004e683c in cmd_func (cmd=0xda4fa0, args=0x0, from_tty=1) at ../../src/gdb/cli/cli-decode.c:1882
    #20 0x00000000006f35ed in execute_command (p=0xcc66c2 "", from_tty=1) at ../../src/gdb/top.c:468
    #21 0x00000000005f8853 in command_handler (command=0xcc66c0 "bt") at ../../src/gdb/event-top.c:435
    #22 0x00000000005f8e12 in command_line_handler (rl=0xfe05f0 "@") at ../../src/gdb/event-top.c:632
    #23 0x000000000074d2c6 in rl_callback_read_char () at ../../src/readline/callback.c:220
    #24 0x00000000005f8375 in rl_callback_read_char_wrapper (client_data=0x0) at ../../src/gdb/event-top.c:164
    #25 0x00000000005f876a in stdin_event_handler (error=0, client_data=0x0) at ../../src/gdb/event-top.c:375
    #26 0x00000000005f72fa in handle_file_event (data=...) at ../../src/gdb/event-loop.c:768
    #27 0x00000000005f67a3 in process_event () at ../../src/gdb/event-loop.c:342
    #28 0x00000000005f686a in gdb_do_one_event () at ../../src/gdb/event-loop.c:406
    #29 0x00000000005f68bb in start_event_loop () at ../../src/gdb/event-loop.c:431
    #30 0x00000000005f83a7 in cli_command_loop (data=0x0) at ../../src/gdb/event-top.c:179
    #31 0x00000000005eeed3 in current_interp_command_loop () at ../../src/gdb/interps.c:327
    #32 0x00000000005ef8ff in captured_command_loop (data=0x0) at ../../src/gdb/main.c:267
    #33 0x00000000005ed2f6 in catch_errors (func=0x5ef8e4 <captured_command_loop>, func_args=0x0, errstring=0x8b6554 "", mask=RETURN_MASK_ALL)
        at ../../src/gdb/exceptions.c:524
    #34 0x00000000005f0d21 in captured_main (data=0x7fffffffd9e0) at ../../src/gdb/main.c:1067
    #35 0x00000000005ed2f6 in catch_errors (func=0x5efb9b <captured_main>, func_args=0x7fffffffd9e0, errstring=0x8b6554 "", mask=RETURN_MASK_ALL)
        at ../../src/gdb/exceptions.c:524
    #36 0x00000000005f0d57 in gdb_main (args=0x7fffffffd9e0) at ../../src/gdb/main.c:1076
    #37 0x000000000045bb6a in main (argc=4, argv=0x7fffffffdae8) at ../../src/gdb/gdb.c:34
    (top-gdb)
    
    GDB is trying to unwind the PC register of the previous frame (frame
    #5 above), starting from the frame being sniffed (the THIS frame).
    But the THIS frame's unwinder says the PC of the previous frame is
    actually the same as the previous's frame's next frame (which is the
    same frame we started with, the THIS frame), therefore it returns an
    lval_register lazy value with frame set to THIS frame.  And so the
    value_fetch_lazy loop never ends.
    
    
    Rationale 2
    ===========
    
    As an experiment, I tried making dwarf2-frame.c:read_addr_from_reg use
    address_from_register.  That caused a bunch of regressions, but it
    actually took me a long while to figure out what was going on.  Turns
    out dwarf2-frame.c:read_addr_from_reg is called while computing the
    frame's CFA, from within dwarf2_frame_cache.  address_from_register
    wants to create a register with frame_id set to the frame being
    constructed.  To create the frame id, we again call dwarf2_frame_cache,
    which given:
    
    static struct dwarf2_frame_cache *
    dwarf2_frame_cache (struct frame_info *this_frame, void **this_cache)
    {
    ...
      if (*this_cache)
        return *this_cache;
    
    returns an incomplete object to the caller:
    static void
    dwarf2_frame_this_id (struct frame_info *this_frame, void **this_cache,
    		      struct frame_id *this_id)
    {
      struct dwarf2_frame_cache *cache =
        dwarf2_frame_cache (this_frame, this_cache);
    ...
     (*this_id) = frame_id_build (cache->cfa, get_frame_func (this_frame));
    }
    
    As cache->cfa is still 0 (we were trying to compute it!), and
    get_frame_id recalls this id from here on, we end up with a broken
    frame id in recorded for this frame.  Later, when inspecting locals,
    the dwarf machinery needs to know the selected frame's base, which
    calls get_frame_base:
    
    CORE_ADDR
    get_frame_base (struct frame_info *fi)
    {
      return get_frame_id (fi).stack_addr;
    }
    
    which as seen above then returns 0 ...
    
    So I gave up using address_from_register.
    
    However, the pain of investigating this made me want GDB itself to
    assert that recursion never happens here, so I wrote a patch to do
    that.  But that assertion triggers on current mainline, because
    dwarf2_tailcall_sniffer_first, called from dwarf2_frame_cache, unwinds
    this_frame.
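An assertion of that kind can be sketched with a per-frame "under construction" flag. This is a hypothetical illustration: `struct frame`, `build_frame_cache` and the flag names are invented. GDB would also raise an internal error where this sketch just records a flag. The sketch shows why a construction step that unwinds `this_frame`, as `dwarf2_tailcall_sniffer_first` did, trips such a check.

```c
#include <assert.h>
#include <stddef.h>

struct frame
{
  int building;        /* set while the frame's cache is being built */
  int built;           /* cache is complete */
  int recursion_seen;  /* where GDB would assert / internal_error */
};

typedef void (*cache_step_fn) (struct frame *);

static void
build_frame_cache (struct frame *f, cache_step_fn step)
{
  if (f->built)
    return;
  if (f->building)
    {
      /* Re-entered while half-built: the situation the assertion is
         meant to catch.  The sketch records it instead of aborting.  */
      f->recursion_seen = 1;
      return;
    }

  f->building = 1;
  if (step != NULL)
    step (f);          /* e.g. computing the CFA, sniffing tail calls */
  f->building = 0;
  f->built = 1;
  assert (!f->building && f->built);
}

/* A construction step that does not unwind THIS frame.  */
static void
step_local_only (struct frame *f)
{
  (void) f;
}

/* A construction step that unwinds THIS frame while its cache is
   still being built.  */
static void
step_unwinds_this_frame (struct frame *f)
{
  build_frame_cache (f, NULL);
}
```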
    
    A sniffer shouldn't be trying to unwind, exactly because of this sort
    of tricky issue.  The patch defers calling
    dwarf2_tailcall_sniffer_first until it's really necessary, in
    dwarf2_frame_prev_register (thus actually outside the sniffer path).
    As this makes the call to dwarf2_frame_sniffer in dwarf2_frame_cache
    unnecessary again, the patch removes that too.
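The deferral can be sketched as a "check once, lazily" flag on the cache. The field names `checked_tailcall_bottom`, `entry_cfa_sp_offset` and `entry_cfa_sp_offset_p` are taken from the ChangeLog entry below. Everything else is hypothetical: `tailcall_sniff`, the counter and the placeholder register values. The flag is set before the check is called, so a re-entrant call cannot run the check twice.

```c
#include <assert.h>
#include <stddef.h>

struct frame_cache
{
  /* Saved while building the cache, instead of in locals, so the
     deferred check can still use it later.  */
  long entry_cfa_sp_offset;
  int entry_cfa_sp_offset_p;

  int checked_tailcall_bottom; /* deferred check already done? */
  int tailcall_checks;         /* instrumentation for the sketch */
};

/* Hypothetical stand-in for dwarf2_tailcall_sniffer_first.  */
static void
tailcall_sniff (struct frame_cache *cache, const long *entry_cfa_sp_offsetp)
{
  (void) entry_cfa_sp_offsetp;
  assert (cache->checked_tailcall_bottom); /* flag is set before we run */
  cache->tailcall_checks++;
}

/* Cache construction (sniffer path): records state, but no longer
   runs the tail-call check, which may unwind this frame.  */
static void
build_cache (struct frame_cache *cache, int have_offset, long offset)
{
  cache->entry_cfa_sp_offset = offset;
  cache->entry_cfa_sp_offset_p = have_offset;
  cache->checked_tailcall_bottom = 0;
  cache->tailcall_checks = 0;
}

/* Analogue of dwarf2_frame_prev_register: run the check on first use.  */
static long
prev_register (struct frame_cache *cache, int regnum)
{
  if (!cache->checked_tailcall_bottom)
    {
      cache->checked_tailcall_bottom = 1;  /* set first: no re-runs */
      tailcall_sniff (cache, cache->entry_cfa_sp_offset_p
                             ? &cache->entry_cfa_sp_offset : NULL);
    }
  return 8L * regnum;  /* placeholder register contents */
}
```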
    
    Tested on x86_64 Fedora 17.
    
    gdb/
    2013-11-22  Pedro Alves  <palves@redhat.com>
    
    	PR 16155
    	* dwarf2-frame.c (struct dwarf2_frame_cache)
    	<checked_tailcall_bottom, entry_cfa_sp_offset,
    	entry_cfa_sp_offset_p>: New fields.
    	(dwarf2_frame_cache): Adjust to use the new cache fields instead
    	of locals.  Don't call dwarf2_tailcall_sniffer_first here.
    	(dwarf2_frame_prev_register): Call it here, but only once.

-----------------------------------------------------------------------

Summary of changes:
 gdb/ChangeLog      |   10 ++++++++++
 gdb/dwarf2-frame.c |   37 ++++++++++++++++++++++---------------
 2 files changed, 32 insertions(+), 15 deletions(-)

Comment 7 Sourceware Commits 2013-11-22 13:53:28 UTC

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gdb and binutils".

The branch, master has been updated
       via  33f8fe58b9a55a0075a90cc9080a1716221a3f81 (commit)
      from  1ec56e88aa9b052ab10b806d82fbdbc8d153d977 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=33f8fe58b9a55a0075a90cc9080a1716221a3f81

commit 33f8fe58b9a55a0075a90cc9080a1716221a3f81
Author: Pedro Alves <palves@redhat.com>
Date:   Fri Nov 22 11:51:59 2013 +0000

    Don't let two frames with the same id end up in the frame chain.
    
    The UNWIND_SAME_ID check is done between THIS_FRAME and the next frame
    when we try to unwind the previous frame.  But at that point it's
    already too late: we have ended up with two frames with the same ID in
    the frame chain.  Each frame having its own ID is an invariant assumed
    throughout GDB.  This patch applies the UNWIND_SAME_ID detection
    earlier, right after the previous frame is unwound, discarding the
    duplicate frame if a cycle is detected.
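The reordering can be sketched on a toy frame chain. In this hypothetical model, which is not GDB's `get_prev_frame_1`, `ids[i]` stands for whatever id the unwinder would compute for frame #i. Each candidate previous frame is compared with THIS frame as soon as it is unwound. If the ids match, the candidate is dropped instead of being linked in.

```c
#include <assert.h>
#include <stddef.h>

struct frame_id
{
  unsigned long stack_addr;
  unsigned long code_addr;
};

enum unwind_stop_reason { UNWIND_NO_REASON, UNWIND_SAME_ID };

static int
frame_id_eq (struct frame_id a, struct frame_id b)
{
  return a.stack_addr == b.stack_addr && a.code_addr == b.code_addr;
}

/* Build a chain from the ids the unwinder would produce, innermost
   first.  Returns how many frames were linked into CHAIN.  */
static size_t
build_chain (const struct frame_id *ids, size_t n,
             struct frame_id *chain, enum unwind_stop_reason *why)
{
  size_t count = 0;

  *why = UNWIND_NO_REASON;
  for (size_t i = 0; i < n; i++)
    {
      struct frame_id prev = ids[i];   /* "unwind" the next outer frame */

      /* Check right away, before PREV joins the chain.  */
      if (count > 0 && frame_id_eq (prev, chain[count - 1]))
        {
          *why = UNWIND_SAME_ID;       /* discard the duplicate */
          break;
        }
      chain[count++] = prev;
    }

  /* Invariant the early check preserves: no two adjacent frames
     in the chain share an id.  */
  for (size_t i = 1; i < count; i++)
    assert (!frame_id_eq (chain[i], chain[i - 1]));
  return count;
}
```

With ids shaped like the #0..#4 example below, where #3 and #4 are equal, only four frames reach the chain and the stop reason is UNWIND_SAME_ID.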
    
    The patch includes a new test that fails before the change.  Before
    the patch, the test causes an infinite loop in GDB; after the patch,
    the UNWIND_SAME_ID logic kicks in and makes the backtrace stop with:
    
      Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    
    The test uses dwarf CFI to emulate a corrupted stack with a cycle.  It
    has a function with registers marked DW_CFA_same_value (most
    importantly RSP/RIP), so that GDB computes the same ID for that frame
    and its caller.  IOW, something like this:
    
     #0 - frame_id_1
     #1 - frame_id_2
     #2 - frame_id_3
     #3 - frame_id_4
     #4 - frame_id_4  <<<< outermost (UNWIND_SAME_ID).
    
    (The test's code is just a copy of dw2-reg-undefined.S /
    dw2-reg-undefined.c, adjusted to use DW_CFA_same_value instead of
    DW_CFA_undefined, and to mark a different set of registers.)
    
    The infinite loop is here, in value_fetch_lazy:
    
          while (VALUE_LVAL (new_val) == lval_register && value_lazy (new_val))
    	{
    	  frame = frame_find_by_id (VALUE_FRAME_ID (new_val));
    ...
    	  new_val = get_frame_register_value (frame, regnum);
    	}
    
    get_frame_register_value can return a lazy register value pointing to
    the next frame.  This means that the register wasn't clobbered by
    FRAME; the debugger should therefore retrieve its value from the next
    frame.
    
    To be clear, get_frame_register_value unwinds the value in question
    from the next frame:
    
     struct value *
     get_frame_register_value (struct frame_info *frame, int regnum)
     {
       return frame_unwind_register_value (frame->next, regnum);
                                           ^^^^^^^^^^^
     }
    
    In other words, if we get a lazy lval_register, it should have the
    frame ID of the _next_ frame, never of FRAME.
    
    At this point in value_fetch_lazy, the whole relevant chunk of the
    stack up to frame #4 has already been unwound.  The loop always
    "unlazies" lval_registers in the "next/innermost" direction, not in
    the "prev/unwind further/outermost" direction.
    
    So say we're looking at frame #4.  get_frame_register_value in frame
    #4 can return a lazy register value of frame #3.  So on the next
    iteration, frame_find_by_id is asked for frame #3, so that the
    register can be read from there.  But, since frame #4 happens to have
    the same id as frame #3, frame_find_by_id returns frame #4 instead.
    Rinse, repeat, and we have an infinite loop.
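The loop can be reproduced with frame indices standing in for frames. In this hypothetical sketch, frames #3 and #4 share an id, as in the test case. `find_by_walk` searches from the innermost frame outward. `find_by_stash` is an invented stand-in for a lookup that, like the stash here, returns #4. The step cap exists only so the sketch terminates; the real loop has none.

```c
#include <assert.h>
#include <stddef.h>

#define NFRAMES 5

/* Frame #i's id; #3 and #4 share an id, as in the test case.  */
static const int ids[NFRAMES] = { 1, 2, 3, 4, 4 };

/* Linear walk from the innermost frame: returns the first match.  */
static int
find_by_walk (int id)
{
  for (int i = 0; i < NFRAMES; i++)
    if (ids[i] == id)
      return i;
  return -1;
}

/* Hypothetical stash stand-in: returns the outermost match.  */
static int
find_by_stash (int id)
{
  for (int i = NFRAMES - 1; i >= 0; i--)
    if (ids[i] == id)
      return i;
  return -1;
}

/* Chase a lazy register from frame START toward frame #0, where the
   value is concrete.  Every other frame's value is lazy and names the
   id of the next (inner) frame.  Returns the number of steps taken,
   or -1 after MAX_STEPS (the real loop would spin forever).  */
static int
chase_register (int start, int (*find) (int), int max_steps)
{
  int frame = start;

  for (int steps = 0; steps < max_steps; steps++)
    {
      if (frame == 0)
        return steps;                 /* concrete value: done */
      frame = find (ids[frame - 1]);  /* lazy: go to the next frame */
      assert (frame >= 0);
    }
  return -1;
}
```

Starting from frame #4, the walk reaches #0 in four steps. The stash-like lookup keeps returning #4.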
    
    This is an old latent problem, exposed by the recent addition of the
    frame stash.  Before we had a stash, frame_find_by_id(frame_id_4)
    would walk over all frames starting at the current frame, and would
    always find #3 first.  The stash happens to return #4 instead:
    
    struct frame_info *
    frame_find_by_id (struct frame_id id)
    {
      struct frame_info *frame, *prev_frame;
    
    ...
      /* Try using the frame stash first.  Finding it there removes the need
         to perform the search by looping over all frames, which can be very
         CPU-intensive if the number of frames is very high (the loop is O(n)
         and get_prev_frame performs a series of checks that are relatively
         expensive).  This optimization is particularly useful when this function
         is called from another function (such as value_fetch_lazy, case
         VALUE_LVAL (val) == lval_register) which already loops over all frames,
         making the overall behavior O(n^2).  */
      frame = frame_stash_find (id);
      if (frame)
        return frame;
    
      for (frame = get_current_frame (); ; frame = prev_frame)
        {
    
    gdb/
    2013-11-22  Pedro Alves  <palves@redhat.com>
    
    	PR 16155
    	* frame.c (get_prev_frame_1): Do the UNWIND_SAME_ID check between
    	this frame and the new previous frame, not between this frame and
    	the next frame.
    
    gdb/testsuite/
    2013-11-22  Pedro Alves  <palves@redhat.com>
    
    	PR 16155
    	* gdb.dwarf2/dw2-dup-frame.S: New file.
    	* gdb.dwarf2/dw2-dup-frame.c: New file.
    	* gdb.dwarf2/dw2-dup-frame.exp: New file.

-----------------------------------------------------------------------

Summary of changes:
 gdb/ChangeLog                              |    7 +
 gdb/frame.c                                |   43 ++-
 gdb/testsuite/ChangeLog                    |    7 +
 gdb/testsuite/gdb.dwarf2/dw2-dup-frame.S   |  540 ++++++++++++++++++++++++++++
 gdb/testsuite/gdb.dwarf2/dw2-dup-frame.c   |   36 ++
 gdb/testsuite/gdb.dwarf2/dw2-dup-frame.exp |   44 +++
 6 files changed, 660 insertions(+), 17 deletions(-)
 create mode 100644 gdb/testsuite/gdb.dwarf2/dw2-dup-frame.S
 create mode 100644 gdb/testsuite/gdb.dwarf2/dw2-dup-frame.c
 create mode 100644 gdb/testsuite/gdb.dwarf2/dw2-dup-frame.exp

Comment 8 Sourceware Commits 2013-11-22 17:39:38 UTC

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gdb and binutils".

The branch, master has been updated
       via  6eeee81c8e59511962bdd83df5e7785bfdf871d2 (commit)
      from  0cb112f7400187275da81a05a9ad0534f1430139 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=6eeee81c8e59511962bdd83df5e7785bfdf871d2

commit 6eeee81c8e59511962bdd83df5e7785bfdf871d2
Author: Tom Tromey <tromey@redhat.com>
Date:   Fri Nov 22 17:38:44 2013 +0000

    Detect infinite loop in value_fetch_lazy's lval_register handling.
    
    If value_fetch_lazy loops infinitely while unwrapping lval_register
    values, it means we either somehow ended up with two frames with the
    same ID in the frame chain, or some code is trying to unwind behind
    get_prev_frame's back (e.g., a frame unwind sniffer trying to unwind).
    In any case, it should always be an internal error to end up in this
    situation.
    
    This patch adds a check and throws an internal error if the same frame
    is returned.
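The check amounts to noticing that resolving a lazy register did not move to a different frame. In this hypothetical sketch, `reg_value`, `fetch_lazy` and the two demo lookup functions are invented. GDB works with frame ids and calls internal_error rather than returning a status. Like the check described above, it catches only a value that names the very frame just asked for.

```c
#include <assert.h>
#include <stddef.h>

struct reg_value
{
  int lazy;        /* contents not fetched yet */
  int frame_id;    /* for lazy values: the frame to fetch from */
  long contents;   /* valid only when !lazy */
};

enum fetch_status { FETCH_OK, FETCH_INTERNAL_ERROR };

/* Stand-in for frame_find_by_id followed by get_frame_register_value.  */
typedef struct reg_value (*get_register_fn) (int frame_id);

static enum fetch_status
fetch_lazy (struct reg_value v, get_register_fn get, long *out)
{
  assert (out != NULL);
  while (v.lazy)
    {
      int asked = v.frame_id;
      struct reg_value next = get (asked);

      /* A lazy result must name the next frame, never the frame we
         just asked for; otherwise this loop would never end.  */
      if (next.lazy && next.frame_id == asked)
        return FETCH_INTERNAL_ERROR;
      v = next;
    }
  *out = v.contents;
  return FETCH_OK;
}

/* Demo: frame ids 1..4; frame 1 holds the concrete value 42.  */
static struct reg_value
demo_get_ok (int id)
{
  struct reg_value r = { 0, 0, 0 };

  if (id <= 1)
    r.contents = 42;
  else
    {
      r.lazy = 1;
      r.frame_id = id - 1;
    }
  return r;
}

/* Demo: two frames share id 4 and the lookup for id 4 lands on the
   outer one, whose lazy value again names id 4.  */
static struct reg_value
demo_get_cyclic (int id)
{
  struct reg_value r = demo_get_ok (id);

  if (id == 4)
    r.frame_id = 4;
  return r;
}
```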
    
    2013-11-22  Tom Tromey  <tromey@redhat.com>
    	    Pedro Alves  <palves@redhat.com>
    
    	PR backtrace/16155
    	* value.c (value_fetch_lazy): Internal error if
    	get_frame_register_value returns the same register.

-----------------------------------------------------------------------

Summary of changes:
 gdb/ChangeLog |    7 +++++++
 gdb/value.c   |   20 +++++++++++++++++++-
 2 files changed, 26 insertions(+), 1 deletions(-)

Comment 9 Tom Tromey 2013-11-22 17:46:11 UTC

Fixed.

Comment 10 Sourceware Commits 2013-11-22 18:12:25 UTC

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gdb and binutils".

The branch, master has been updated
       via  da2b2fdf57a96f7a5b6b153e94afb747e212b17f (commit)
      from  6eeee81c8e59511962bdd83df5e7785bfdf871d2 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=da2b2fdf57a96f7a5b6b153e94afb747e212b17f

commit da2b2fdf57a96f7a5b6b153e94afb747e212b17f
Author: Tom Tromey <tromey@redhat.com>
Date:   Wed Nov 13 11:10:55 2013 -0700

    handle an unspecified return address column
    
    Debugging PR 16155 further, I found that the DWARF unwinder found the
    function in question, but thought it had no registers saved
    (fs->regs.num_regs == 0).
    
    It seems to me that if a frame does not specify the return address
    column, or if the return address column is explicitly marked as
    DWARF2_FRAME_REG_UNSPECIFIED, then we should set the
    "undefined_retaddr" flag and let the DWARF unwinder gracefully stop.
    
    This patch implements that idea.
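The rule can be sketched as a predicate over a simplified frame-state structure. The struct and enum below are hypothetical simplifications, not GDB's DWARF frame state. Only the two conditions named in the commit message are modeled. An empty register table, as in this bug (fs->regs.num_regs == 0), counts as "not specified". The test uses column 30 because x30 is the link register on AArch64.

```c
#include <assert.h>

#define MAX_REGS 64

/* Hypothetical subset of register rules; RULE_UNSPECIFIED means the
   CFI said nothing about the register.  */
enum reg_rule { RULE_UNSPECIFIED = 0, RULE_SAVED_AT_OFFSET, RULE_SAME_VALUE };

struct frame_state
{
  int num_regs;                  /* entries of RULES the CFI filled in */
  enum reg_rule rules[MAX_REGS];
  int retaddr_column;            /* return address column from the CIE */
};

/* Nonzero if the return address should be treated as undefined, so
   the unwinder stops gracefully at this frame.  */
static int
retaddr_undefined_p (const struct frame_state *fs)
{
  assert (fs->num_regs >= 0 && fs->num_regs <= MAX_REGS);
  if (fs->retaddr_column >= fs->num_regs)
    return 1;                                  /* column never specified */
  return fs->rules[fs->retaddr_column] == RULE_UNSPECIFIED;
}
```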
    
    With this patch the backtrace works properly:
    
        (gdb) bt
        #0  0x0000007fb7ed485c in nanosleep () from /lib64/libc.so.6
        #1  0x0000007fb7ed4508 in sleep () from /lib64/libc.so.6
        #2  0x00000000004008bc in thread_function (arg=0x4) at threadapply.c:73
        #3  0x0000007fb7fad950 in start_thread () from /lib64/libpthread.so.0
        #4  0x0000007fb7f0956c in clone () from /lib64/libc.so.6
    
    2013-11-22  Tom Tromey  <tromey@redhat.com>
    
    	PR backtrace/16155:
    	* dwarf2-frame.c (dwarf2_frame_cache): Set undefined_retaddr if
    	the return address column is unspecified.
    
    2013-11-22  Tom Tromey  <tromey@redhat.com>
    
    	* gdb.dwarf2/dw2-bad-cfi.c: New file.
    	* gdb.dwarf2/dw2-bad-cfi.exp: New file.
    	* gdb.dwarf2/dw2-bad-cfi.S: New file.

-----------------------------------------------------------------------

Summary of changes:
 gdb/ChangeLog                            |    6 +
 gdb/dwarf2-frame.c                       |    4 +
 gdb/testsuite/ChangeLog                  |    6 +
 gdb/testsuite/gdb.dwarf2/dw2-bad-cfi.S   |  216 ++++++++++++++++++++++++++++++
 gdb/testsuite/gdb.dwarf2/dw2-bad-cfi.c   |   28 ++++
 gdb/testsuite/gdb.dwarf2/dw2-bad-cfi.exp |   42 ++++++
 6 files changed, 302 insertions(+), 0 deletions(-)
 create mode 100644 gdb/testsuite/gdb.dwarf2/dw2-bad-cfi.S
 create mode 100644 gdb/testsuite/gdb.dwarf2/dw2-bad-cfi.c
 create mode 100644 gdb/testsuite/gdb.dwarf2/dw2-bad-cfi.exp