17384 – android arm gdb "Cannot access memory at address" when I "stepi" over "blx"

Status:	NEW

Alias:	None

Product:	gdb
Classification:	Unclassified
Component:	gdb
Version:	unknown

Importance:	P2 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

Reported:	2014-09-12 13:38 UTC by molsson
Modified:	2016-05-16 18:07 UTC
CC List:	2 users

Attachments
use catch_exceptions() instead of catch_errors() (692 bytes, patch) 2014-09-17 07:57 UTC, molsson

Description molsson 2014-09-12 13:38:03 UTC

I'm debugging chromium on an android arm device (nexus5). There is a particular location in the code where, if I put a breakpoint there, hit it, and then run "next", gdb prints "Cannot access memory at address 0x1". If I do the same thing but use "stepi" instead of "next", the error happens when a "blx" instruction is executed.

Before the error it looks like this in gdb:
http://temp.minimum.se/screenshot_20140912_151327.png

...then I run "stepi", and afterwards it looks like this:
http://temp.minimum.se/screenshot_20140912_151801.png

...I was using the standard gdb 7.6 from the android ndk r10b-rc1 when I first ran into this issue. However, I also compiled an android arm gdb based on gdb master (3adc1a7 / 7.8.50.20140911-cvs to be specific) and I can repro the same bug there.

I did google a bit to see if someone else had run into this issue before me and I found this on stackoverflow:
http://stackoverflow.com/questions/15166351/cant-step-in-gdb-past-thumb-arm-transition-arm-breakpoints-are-not-set

...that bug looks quite similar to mine. However, I have that bugfix in both of the gdb versions I reproduced the bug in, so it is not the bugfix that fixes my issue. Note, however, that the bugfix speaks about "blx REG" and I suppose my issue is "blx ADDR". (Just guessing, I don't know assembly).

While running gdb built from gdb git master, I also sampled the backtrace leading up to the call to memory_error_message(). It looked like this:
http://temp.minimum.se/bt_for_memory_error.txt

The chromium binary I'm trying to debug was built by gcc 4.8 from the android ndk r10b-rc1.

Comment 1 molsson 2014-09-16 08:48:40 UTC

(gdb) disassemble /r $pc,+10
Dump of assembler code from 0x4def62a4 to 0x4def62ae:
=> 0x4def62a4 <blink::RenderFullScreen::createPlaceholder(WTF::PassRefPtr<blink::RenderStyle>, blink::LayoutRect const&)+12>:   b8 68   ldr r0, [r7, #8]
   0x4def62a6 <blink::RenderFullScreen::createPlaceholder(WTF::PassRefPtr<blink::RenderStyle>, blink::LayoutRect const&)+14>:   e6 f1 bc ec blx 0x4e0dcc20
   0x4def62aa <blink::RenderFullScreen::createPlaceholder(WTF::PassRefPtr<blink::RenderStyle>, blink::LayoutRect const&)+18>:   03 46   mov r3, r0
   0x4def62ac <blink::RenderFullScreen::createPlaceholder(WTF::PassRefPtr<blink::RenderStyle>, blink::LayoutRect const&)+20>:   18 46   mov r0, r3
End of assembler dump.
(gdb) disassemble /r 0x4e0dcc20,+10
Dump of assembler code from 0x4e0dcc20 to 0x4e0dcc2a:
   0x4e0dcc20:  04 c0 9f e5 ldr r12, [pc, #4]   ; 0x4e0dcc2c
   0x4e0dcc24:  0c c0 8f e0 add r12, pc, r12
   0x4e0dcc28:  1c ff 2f e1 bx  r12
End of assembler dump.
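
For reference, the stub at 0x4e0dcc20 is ARM (not Thumb) code, and its address arithmetic can be checked by hand. The following is a rough sketch, assuming the usual ARM-state rule that reading pc yields the address of the current instruction plus 8; the function names are made up for this illustration, and the literal's actual value is not shown in the dump, so it is left as a parameter:

```c
#include <assert.h>
#include <stdint.h>

/* Address of the literal loaded by "ldr r12, [pc, #4]" located at
   STUB_ADDR.  In ARM state, pc reads as the instruction address + 8.  */
static uint32_t
stub_literal_addr (uint32_t stub_addr)
{
  assert ((stub_addr & 3) == 0);   /* ARM instructions are word aligned */
  return (stub_addr + 8) + 4;
}

/* Value left in r12 by "add r12, pc, r12" located at STUB_ADDR + 4,
   given the LITERAL loaded by the preceding ldr.  "bx r12" then branches
   to that value; bit 0 selects Thumb (1) or ARM (0) state.  */
static uint32_t
stub_branch_value (uint32_t stub_addr, uint32_t literal)
{
  assert ((stub_addr & 3) == 0);
  uint32_t pc_at_add = (stub_addr + 4) + 8;
  return pc_at_add + literal;
}
```

For the stub above, `stub_literal_addr (0x4e0dcc20)` comes out as 0x4e0dcc2c, which matches the annotation gdb prints next to the ldr.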


When "stepi" runs the "blx 0x4e0dcc20" a call is made to thumb_get_next_pc_raw() and in there we enter the branch:

  else if (thumb_insn_size (inst1) == 4) /* 32-bit instruction */

...and continue into the branch:

  if ((inst1 & 0xf800) == 0xf000 && (inst2 & 0x8000) == 0x8000)

...and continue into the branch for B, BL, BLX etc:

  if ((inst2 & 0x1000) != 0 || (inst2 & 0xd001) == 0xc000)

...and at the end of thumb_get_next_pc_raw() nextpc will be 0x4e0dcc20, just as expected. So the thumb_get_next_pc_raw() part seems to work well, and this bug might not be that similar to the one fixed by the bugfix mentioned on stackoverflow (see the gerrit link on the stackoverflow page).
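
As a cross-check of that address, the BLX immediate can also be decoded by hand. This is a minimal sketch, assuming the standard Thumb-2 BL/BLX (immediate) encoding from the ARM architecture manual; `thumb2_bl_blx_target` is a hypothetical helper written for this illustration and is not a GDB function:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not GDB code): compute the branch target of a
   32-bit Thumb-2 BL or BLX (immediate) located at INSN_ADDR.  INST1 and
   INST2 are the first and second halfwords of the instruction.  */
static uint32_t
thumb2_bl_blx_target (uint32_t insn_addr, uint16_t inst1, uint16_t inst2)
{
  /* Only BL/BLX immediate: 11110xxxxxxxxxxx followed by 11xxxxxxxxxxxxxx.  */
  assert ((inst1 & 0xf800) == 0xf000 && (inst2 & 0xc000) == 0xc000);

  uint32_t s = (inst1 >> 10) & 1;
  uint32_t imm10 = inst1 & 0x3ff;
  uint32_t j1 = (inst2 >> 13) & 1;
  uint32_t j2 = (inst2 >> 11) & 1;
  uint32_t imm11 = inst2 & 0x7ff;
  uint32_t i1 = !(j1 ^ s);
  uint32_t i2 = !(j2 ^ s);

  uint32_t offset = (s << 24) | (i1 << 23) | (i2 << 22)
                    | (imm10 << 12) | (imm11 << 1);
  if (s)
    offset |= 0xfe000000u;          /* sign-extend from bit 24 */

  uint32_t pc = insn_addr + 4;      /* Thumb pc reads as insn address + 4 */

  if ((inst2 & 0x1000) == 0)
    /* BLX: the target is ARM code, so everything is word aligned.  */
    return ((pc & ~3u) + offset) & ~3u;

  return pc + offset;               /* BL: target stays in Thumb state */
}
```

With the bytes from the dump above (`e6 f1 bc ec` at 0x4def62a6, i.e. halfwords 0xf1e6 and 0xecbc), this yields 0x4e0dcc20, the same value as nextpc.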

Comment 2 Pedro Alves 2014-09-16 09:03:34 UTC

Pasting "http://temp.minimum.se/bt_for_memory_error.txt" here, for easier reference, and so we're protected against that resource disappearing in the future:

#0  memory_error_message (err=err@entry=TARGET_XFER_E_IO, gdbarch=0x1ed4cc0, memaddr=memaddr@entry=1) at corefile.c:196
#1  0x000000000058bd1d in memory_error (err=TARGET_XFER_E_IO, memaddr=memaddr@entry=1) at corefile.c:224
#2  0x000000000058be11 in read_memory (memaddr=1, myaddr=myaddr@entry=0x7fff5dee86d0 "\377\200\203R", len=len@entry=4) at corefile.c:261
#3  0x000000000058bec5 in read_memory_integer (memaddr=<optimized out>, len=4, byte_order=BFD_ENDIAN_LITTLE) at corefile.c:357
#4  0x000000000058bf02 in do_captured_read_memory_integer (data=data@entry=0x7fff5dee8770) at corefile.c:322
#5  0x0000000000513225 in catch_errors (func=func@entry=0x58bef0 <do_captured_read_memory_integer>, func_args=func_args@entry=0x7fff5dee8770, errstring=errstring@entry=0x6c20cd "", mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:237
#6  0x000000000058be8b in safe_read_memory_integer (memaddr=<optimized out>, len=len@entry=4, byte_order=byte_order@entry=BFD_ENDIAN_LITTLE, return_value=return_value@entry=0x7fff5dee87b8) at corefile.c:343
#7  0x000000000041122a in arm_scan_prologue (cache=0x1e51ec0, this_frame=0x1e51e00) at arm-tdep.c:1996
#8  arm_make_prologue_cache (this_frame=0x1e51e00) at arm-tdep.c:2022
#9  0x000000000041142a in arm_prologue_this_id (this_frame=0x1e51e00, this_cache=0x1e51e18, this_id=0x1e51e60) at arm-tdep.c:2052
#10 0x00000000005e4ea4 in compute_frame_id (fi=0x1e51e00) at frame.c:459
#11 get_prev_frame_if_no_cycle (this_frame=this_frame@entry=0x1e51470) at frame.c:1781
#12 0x00000000005e72ca in get_prev_frame_always_1 (this_frame=0x1e51470) at frame.c:1955
#13 get_prev_frame_always (this_frame=0x1e51470) at frame.c:1972
#14 0x00000000005e7754 in frame_unwind_caller_id (next_frame=<optimized out>) at frame.c:500
#15 0x0000000000503a07 in process_event_stop_test (ecs=ecs@entry=0x7fff5dee9070) at infrun.c:4738
#16 0x0000000000505478 in handle_inferior_event (ecs=0x7fff5dee9070) at infrun.c:3424
#17 0x00000000005079ba in fetch_inferior_event (client_data=client_data@entry=0x0) at infrun.c:2899
#18 0x000000000051dc73 in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at inf-loop.c:58
#19 0x0000000000428fce in run_async_handler_and_reschedule (scb=0x1ec5600) at ser-base.c:137
#20 0x000000000051bf73 in process_event () at event-loop.c:340
#21 0x000000000051c2c7 in gdb_do_one_event () at event-loop.c:404
#22 0x000000000051c4e7 in start_event_loop () at event-loop.c:429
#23 0x0000000000515b73 in captured_command_loop (data=data@entry=0x0) at main.c:322
#24 0x0000000000513225 in catch_errors (func=func@entry=0x515b60 <captured_command_loop>, func_args=func_args@entry=0x0, errstring=errstring@entry=0x6c20cd "", mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:237
#25 0x0000000000516a76 in captured_main (data=data@entry=0x7fff5dee93c0) at main.c:1142
#26 0x0000000000513225 in catch_errors (func=func@entry=0x516060 <captured_main>, func_args=func_args@entry=0x7fff5dee93c0, errstring=errstring@entry=0x6c20cd "", mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:237
#27 0x0000000000516f1b in gdb_main (args=args@entry=0x7fff5dee93c0) at main.c:1150
#28 0x0000000000407b65 in main (argc=<optimized out>, argv=<optimized out>) at gdb.c:32

Comment 3 Pedro Alves 2014-09-16 09:16:51 UTC

I'm not exactly sure what's going on here, but the backtrace shows a memory read coming from the prologue scanner, triggered probably from the code that tries to determine whether the program stepped into a function:

#2  0x000000000058be11 in read_memory (memaddr=1, 
...
#5  0x0000000000513225 in catch_errors (func=func@entry=0x58bef0 
...
#6  0x000000000058be8b in safe_read_memory_integer (memaddr=<optimized out>, 
#7  0x000000000041122a in arm_scan_prologue (cache=0x1e51ec0, 
#8  arm_make_prologue_cache (this_frame=0x1e51e00) at arm-tdep.c:2022
...
#12 0x00000000005e72ca in get_prev_frame_always_1 (this_frame=0x1e51470) at frame.c:1955
 ...
#15 0x0000000000503a07 in process_event_stop_test (ecs=ecs@entry=0x7fff5dee9070) at infrun.c:4738

So it kind of sounds like either the program's stack really is corrupted, or, GDB's heuristic prologue scanner gets this prologue wrong.

But the thing is that the error you've shown is caught and swallowed (frames #5/#6), so it looks like this particular memory_error call can't result in the user-visible error.  There's probably another one after it if you let GDB continue.

Comment 4 molsson 2014-09-16 13:41:40 UTC

You're right, there were in fact 5 calls to memory_error() during the relevant "stepi", and it was only after I "continue" past the 5th one that the error is printed. The bt to that last memory_error() call is:

#0  read_memory (memaddr=1, myaddr=0x7ffff71ed760 <incomplete sequence \360>, len=4) at corefile.c:247
#1  0x000000000060a81e in read_memory_integer (memaddr=1, len=4, byte_order=BFD_ENDIAN_LITTLE) at corefile.c:357
#2  0x000000000060a774 in do_captured_read_memory_integer (data=0x7ffff71ed850) at corefile.c:322
#3  0x0000000000564fae in catch_errors (func=0x60a72d <do_captured_read_memory_integer>, func_args=0x7ffff71ed850, errstring=0x848451 "", mask=RETURN_MASK_ALL) at exceptions.c:237
#4  0x000000000060a7cb in safe_read_memory_integer (memaddr=1, len=4, byte_order=BFD_ENDIAN_LITTLE, return_value=0x7ffff71ed8d0) at corefile.c:343
#5  0x000000000040e07c in arm_scan_prologue (this_frame=0x301f6c90, cache=0x301f6d50) at arm-tdep.c:1996
#6  0x000000000040e141 in arm_make_prologue_cache (this_frame=0x301f6c90) at arm-tdep.c:2022
#7  0x000000000040e230 in arm_prologue_this_id (this_frame=0x301f6c90, this_cache=0x301f6ca8, this_id=0x301f6cf0) at arm-tdep.c:2052
#8  0x00000000006870ce in compute_frame_id (fi=0x301f6c90) at frame.c:459
#9  0x0000000000689b2e in get_prev_frame_if_no_cycle (this_frame=0x301f6300) at frame.c:1781
#10 0x000000000068a1c5 in get_prev_frame_always_1 (this_frame=0x301f6300) at frame.c:1955
#11 0x000000000068a215 in get_prev_frame_always (this_frame=0x301f6300) at frame.c:1972
#12 0x000000000068728f in frame_unwind_caller_id (next_frame=0x301f6300) at frame.c:500
#13 0x0000000000553be1 in process_event_stop_test (ecs=0x7ffff71ee0e0) at infrun.c:4738
#14 0x0000000000553016 in handle_signal_stop (ecs=0x7ffff71ee0e0) at infrun.c:4304
#15 0x0000000000552041 in handle_inferior_event (ecs=0x7ffff71ee0e0) at infrun.c:3797
#16 0x000000000055051d in fetch_inferior_event (client_data=0x0) at infrun.c:2899
#17 0x0000000000572a6e in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at inf-loop.c:58
#18 0x000000000044e435 in remote_async_serial_handler (scb=0x1d2c610, context=0x1c46b90) at remote.c:11586
#19 0x0000000000436aba in run_async_handler_and_reschedule (scb=0x1d2c610) at ser-base.c:137
#20 0x0000000000436b8a in fd_event (error=0, context=0x1d2c610) at ser-base.c:182
#21 0x000000000057086f in handle_file_event (data=...) at event-loop.c:763
#22 0x000000000056fd44 in process_event () at event-loop.c:340
#23 0x000000000056fe0b in gdb_do_one_event () at event-loop.c:404
#24 0x000000000056fe5b in start_event_loop () at event-loop.c:429
#25 0x00000000005718d6 in cli_command_loop (data=0x0) at event-top.c:182
#26 0x00000000005681cd in current_interp_command_loop () at interps.c:318
#27 0x0000000000568d79 in captured_command_loop (data=0x0) at main.c:322
#28 0x0000000000564fae in catch_errors (func=0x568d5e <captured_command_loop>, func_args=0x0, errstring=0x81a8f5 "", mask=RETURN_MASK_ALL) at exceptions.c:237
#29 0x000000000056a21e in captured_main (data=0x7ffff71ee560) at main.c:1142
#30 0x0000000000564fae in catch_errors (func=0x569153 <captured_main>, func_args=0x7ffff71ee560, errstring=0x81a8f5 "", mask=RETURN_MASK_ALL) at exceptions.c:237
#31 0x000000000056a247 in gdb_main (args=0x7ffff71ee560) at main.c:1150
#32 0x00000000004062cd in main (argc=3, argv=0x7ffff71ee668) at gdb.c:32

Comment 5 molsson 2014-09-16 13:46:26 UTC

Sorry, that last comment didn't make a lot of sense; I used the wrong breakpoint.

If I instead put a breakpoint on memory_error_message(), which is the function that prints the actual error, then I get one hit only, and from this stack:

#0  memory_error_message (err=TARGET_XFER_E_IO, gdbarch=0x25a8a50, memaddr=1) at corefile.c:197
#1  0x000000000060a56c in memory_error (err=TARGET_XFER_E_IO, memaddr=1) at corefile.c:224
#2  0x000000000060a65a in read_memory (memaddr=1, myaddr=0x7fffcfde1b90 <incomplete sequence \360>, len=4) at corefile.c:261
#3  0x000000000060a81e in read_memory_integer (memaddr=1, len=4, byte_order=BFD_ENDIAN_LITTLE) at corefile.c:357
#4  0x000000000060a774 in do_captured_read_memory_integer (data=0x7fffcfde1c80) at corefile.c:322
#5  0x0000000000564fae in catch_errors (func=0x60a72d <do_captured_read_memory_integer>, func_args=0x7fffcfde1c80, errstring=0x848451 "", mask=RETURN_MASK_ALL) at exceptions.c:237
#6  0x000000000060a7cb in safe_read_memory_integer (memaddr=1, len=4, byte_order=BFD_ENDIAN_LITTLE, return_value=0x7fffcfde1d00) at corefile.c:343
#7  0x000000000040e07c in arm_scan_prologue (this_frame=0x30a79ad0, cache=0x30a79b90) at arm-tdep.c:1996
#8  0x000000000040e141 in arm_make_prologue_cache (this_frame=0x30a79ad0) at arm-tdep.c:2022
#9  0x000000000040e230 in arm_prologue_this_id (this_frame=0x30a79ad0, this_cache=0x30a79ae8, this_id=0x30a79b30) at arm-tdep.c:2052
#10 0x00000000006870ce in compute_frame_id (fi=0x30a79ad0) at frame.c:459
#11 0x0000000000689b2e in get_prev_frame_if_no_cycle (this_frame=0x30a79140) at frame.c:1781
#12 0x000000000068a1c5 in get_prev_frame_always_1 (this_frame=0x30a79140) at frame.c:1955
#13 0x000000000068a215 in get_prev_frame_always (this_frame=0x30a79140) at frame.c:1972
#14 0x000000000068728f in frame_unwind_caller_id (next_frame=0x30a79140) at frame.c:500
#15 0x0000000000553be1 in process_event_stop_test (ecs=0x7fffcfde2510) at infrun.c:4738
#16 0x0000000000553016 in handle_signal_stop (ecs=0x7fffcfde2510) at infrun.c:4304
#17 0x0000000000552041 in handle_inferior_event (ecs=0x7fffcfde2510) at infrun.c:3797
#18 0x000000000055051d in fetch_inferior_event (client_data=0x0) at infrun.c:2899
#19 0x0000000000572a6e in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at inf-loop.c:58
#20 0x000000000044e435 in remote_async_serial_handler (scb=0x259e610, context=0x24b8b90) at remote.c:11586
#21 0x0000000000436aba in run_async_handler_and_reschedule (scb=0x259e610) at ser-base.c:137
#22 0x0000000000436b8a in fd_event (error=0, context=0x259e610) at ser-base.c:182
#23 0x000000000057086f in handle_file_event (data=...) at event-loop.c:763
#24 0x000000000056fd44 in process_event () at event-loop.c:340
#25 0x000000000056fe0b in gdb_do_one_event () at event-loop.c:404
#26 0x000000000056fe5b in start_event_loop () at event-loop.c:429
#27 0x00000000005718d6 in cli_command_loop (data=0x0) at event-top.c:182
#28 0x00000000005681cd in current_interp_command_loop () at interps.c:318
#29 0x0000000000568d79 in captured_command_loop (data=0x0) at main.c:322
#30 0x0000000000564fae in catch_errors (func=0x568d5e <captured_command_loop>, func_args=0x0, errstring=0x81a8f5 "", mask=RETURN_MASK_ALL) at exceptions.c:237
#31 0x000000000056a21e in captured_main (data=0x7fffcfde2990) at main.c:1142
#32 0x0000000000564fae in catch_errors (func=0x569153 <captured_main>, func_args=0x7fffcfde2990, errstring=0x81a8f5 "", mask=RETURN_MASK_ALL) at exceptions.c:237
#33 0x000000000056a247 in gdb_main (args=0x7fffcfde2990) at main.c:1150
#34 0x00000000004062cd in main (argc=3, argv=0x7fffcfde2a98) at gdb.c:32

Comment 6 Pedro Alves 2014-09-16 15:10:16 UTC

> If I put a breakpoint on memory_error_message() instead, which is the function 
> that prints the actual error. Then I get one hit only, and from this stack: 

That doesn't print the error, it only builds the error string.

But I see where it's printed:

 #5  0x0000000000564fae in catch_errors (func=0x60a72d <do_captured_read_memory_integer>, func_args=0x7fffcfde1c80, errstring=0x848451 "", mask=RETURN_MASK_ALL) at exceptions.c:237
 #6  0x000000000060a7cb in safe_read_memory_integer (memaddr=1, len=4, byte_order=BFD_ENDIAN_LITTLE, return_value=0x7fffcfde1d00) at corefile.c:343

It's catch_errors itself:

int
catch_errors (catch_errors_ftype *func, void *func_args, char *errstring,
	      return_mask mask)
{
...
  exception_fprintf (gdb_stderr, exception, "%s", errstring);
  if (exception.reason != 0)
    return 0;
  return val;
}

Eh, that's unexpected.

This means that safe_read_memory_integer is not silent on error.  This seems bogus to me.  It should probably be using catch_exceptions instead.

Of course, that still doesn't explain why we try to read address 0x1 in the first place.

Comment 7 molsson 2014-09-17 07:57:23 UTC

Created attachment 7791
use catch_exceptions() instead of catch_errors()

FWIW, if I switch catch_errors() to catch_exceptions() using the attached patch, then I still get the error printed, this time via:

#0  fputs_maybe_filtered (linebuffer=0x146d710 "\n", stream=0x111d1e0, filter=1) at utils.c:2149
#1  0x0000000000682a44 in vfprintf_maybe_filtered (stream=0x111d1e0, format=0x819661 "\n", args=0x7fff5473a6c8, filter=1) at utils.c:2303
#2  0x0000000000682a7f in vfprintf_filtered (stream=0x111d1e0, format=0x819661 "\n", args=0x7fff5473a6c8) at utils.c:2311
#3  0x0000000000682caa in fprintf_filtered (stream=0x111d1e0, format=0x819661 "\n") at utils.c:2363
#4  0x0000000000564be6 in print_exception (file=0x111d1e0, e=...) at exceptions.c:93
#5  0x0000000000564c6f in exception_print (file=0x111d1e0, e=...) at exceptions.c:116
#6  0x0000000000564e74 in catch_exceptions_with_msg (func_uiout=0x1119830, func=0x60a6fd <do_captured_read_memory_integer>, func_args=0x7fff5473a8e0, gdberrmsg=0x0, mask=RETURN_MASK_ALL) at exceptions.c:202
#7  0x0000000000564d8d in catch_exceptions (uiout=0x1119830, func=0x60a6fd <do_captured_read_memory_integer>, func_args=0x7fff5473a8e0, mask=RETURN_MASK_ALL) at exceptions.c:169
#8  0x000000000060a7a1 in safe_read_memory_integer (memaddr=1, len=4, byte_order=BFD_ENDIAN_LITTLE, return_value=0x7fff5473a960) at corefile.c:343
#9  0x000000000040e07c in arm_scan_prologue (this_frame=0x10f6d90, cache=0x10f6e50) at arm-tdep.c:1996

Comment 8 molsson 2014-09-17 09:45:50 UTC

On the topic of "why is it reading from 0x1 in the first place", the code says:

  /* We have no symbol information.  Our only option is to assume this
   function has a standard stack frame and the normal frame register.
   Then, we can find the value of our frame pointer on entrance to
   the callee (or at the present moment if this is the innermost frame).
   The value stored there should be the address of the stmfd + 8.  */
  CORE_ADDR frame_loc;
  LONGEST return_value;

  frame_loc = get_frame_register_unsigned (this_frame, ARM_FP_REGNUM);
  if (!safe_read_memory_integer (frame_loc, 4, byte_order, &return_value))

It should be noted that I have symbols loaded for my own app, but I don't have debugging info loaded for the android libraries (this is not an AOSP phone, so for android .so files I have only the regular exported symbols and not full debug info). My "info sharedlibrary" output looks like this:
http://temp.minimum.se/sharedlibs.txt

The android system library .so files have been copied from the android device into /tmp/adb_gdb_libs_armv7/system/* host-side.

Also, I noted that if I put a breakpoint on "blink::RenderFullScreen::createPlaceholder" and then just do "bt" when the breakpoint hits, then it prints the same error after the bt is printed (so this has nothing to do with next/stepi specifically), like this:

...
#29 0x49bb355c in base::MessageLoop::Run (this=0x51406720) at ../../base/message_loop/message_loop.cc:308
#30 0x4ba00aac in content::RendererMain (parameters=...) at ../../content/renderer/renderer_main.cc:227
#31 0x4b28b3d8 in content::RunNamedProcessTypeMain (process_type=..., main_function_params=..., delegate=0x512f8750) at ../../content/app/content_main_runner.cc:486
#32 0x4b28bc00 in content::ContentMainRunnerImpl::Run (this=0x51409648) at ../../content/app/content_main_runner.cc:882
#33 0x4b2878aa in content::Start (Cannot access memory at address 0x1
env=0x512fc418, clazz=0x51406aec) at ../../content/app/android/content_main.cc:48
#34 0x48ee5c50 in ?? ()
Cannot access memory at address 0x1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

If I reformat my phone and install AOSP on it instead of the non-rooted production build I was using earlier, then if I run "bt" at the same breakpoint I see this:
#29 0x7624855c in base::MessageLoop::Run (this=0x7d9c7730) at ../../base/message_loop/message_loop.cc:308
#30 0x77fc3aac in content::RendererMain (parameters=...) at ../../content/renderer/renderer_main.cc:227
#31 0x7784e3d8 in content::RunNamedProcessTypeMain (process_type=..., main_function_params=..., delegate=0x7d8be550) at ../../content/app/content_main_runner.cc:486
#32 0x7784ec00 in content::ContentMainRunnerImpl::Run (this=0x7db542f0) at ../../content/app/content_main_runner.cc:882
#33 0x7784a8aa in content::Start (env=0x7d8c0980, clazz=0x2f400001) at ../../content/app/android/content_main.cc:48
#34 0x414f5bd0 in dvmPlatformInvoke () from /tmp/adb_gdb_libs_armv7/system/lib/libdvm.so
#35 0x41526126 in dvmCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*) () from /tmp/adb_gdb_libs_armv7/system/lib/libdvm.so
#36 0x41510828 in dvmCheckCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*) () from /tmp/adb_gdb_libs_armv7/system/lib/libdvm.so
#37 0x414fefe4 in dvmJitToInterpNoChain () from /tmp/adb_gdb_libs_armv7/system/lib/libdvm.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

...so when running AOSP I do _not_ get the error, but gdb still fails to understand the end of the stacktrace. FWIW, when running AOSP my "info sharedlibrary" output looks like this:
http://temp.minimum.se/sharedlibraries_aosp.txt
...i.e. I still have that *-marker on all the android system libraries. I don't know if the default AOSP build adds -g or if it just builds with exported-symbols-only anyway.

Comment 9 Pedro Alves 2014-09-17 10:20:20 UTC

> Also, I noted that if I put a breakpoint on 
> "blink::RenderFullScreen::createPlaceholder" and then just do "bt" when the 
> breakpoint hits, then it prints the same error after the bt is printed (so 
> this has nothing to do with next/stepi specifically) (...)

Yeah, this is an issue in the unwinder, which next/step use internally to detect when the program stepped into a function.  "bt" is the most direct way to trigger the unwinder.

If you have no (dwarf) debug/unwind info available, then GDB's fallback heuristic unwinders kick in, which, being heuristic, can fail to unwind in the presence of clever compiler optimizations that end up generating frames/prologues that gdb might not grok, etc.  Sometimes we may be able to improve the heuristics; oftentimes we won't, and debug info is the only salvation.

First, IMO, safe_read_memory_integer shouldn't ever print the error.  It's just too confusing when the unwinder kicks in for any reason other than "bt", and just throws us chasing red herrings.  I've raised this here now:

  https://sourceware.org/ml/gdb-patches/2014-09/msg00574.html

With that out of the way, this then boils down to just another case of either getting debug/unwind info for that code, or staring at the disassembly of the function and seeing whether GDB's fallback heuristic unwinder could be improved somehow.  (I'm no particular ARM expert, so I'll leave that to someone else).
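
To give a feel for what such a fallback heuristic does, here is a toy prologue scanner for 16-bit Thumb code. It is only a sketch and far simpler than GDB's actual ARM prologue analysis: it recognizes just `push {...}`, `sub sp, #imm` and `add r7, sp, #imm`, and `toy_scan_thumb_prologue` and `struct toy_prologue` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_prologue
{
  size_t insn_count;     /* number of halfwords recognized as prologue */
  uint32_t frame_size;   /* bytes subtracted from sp so far */
  int saves_lr;          /* nonzero if lr was pushed */
  int sets_r7;           /* nonzero if r7 was set up from sp */
};

static struct toy_prologue
toy_scan_thumb_prologue (const uint16_t *insns, size_t count)
{
  struct toy_prologue p = { 0, 0, 0, 0 };

  assert (insns != NULL || count == 0);

  for (size_t i = 0; i < count; i++)
    {
      uint16_t insn = insns[i];

      if ((insn & 0xfe00) == 0xb400)        /* push {reglist[, lr]} */
        {
          for (int bit = 0; bit < 9; bit++)
            if (insn & (1u << bit))
              p.frame_size += 4;
          if (insn & 0x0100)
            p.saves_lr = 1;
        }
      else if ((insn & 0xff80) == 0xb080)   /* sub sp, #imm7 * 4 */
        p.frame_size += (uint32_t) (insn & 0x7f) * 4;
      else if ((insn & 0xff00) == 0xaf00)   /* add r7, sp, #imm8 * 4 */
        p.sets_r7 = 1;
      else
        break;                              /* anything else ends the scan */

      p.insn_count = i + 1;
    }

  return p;
}
```

Fed the halfwords 0xb580 (`push {r7, lr}`), 0xb084 (`sub sp, #16`), 0xaf00 (`add r7, sp, #0`) and 0x68b8 (`ldr r0, [r7, #8]`), the scanner accepts the first three and stops at the fourth. As soon as the compiler emits something this kind of pattern matching does not expect, the scan ends early or the computed frame layout is wrong, which is the failure mode described above.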

Comment 10 molsson 2014-09-17 11:23:37 UTC

I put a breakpoint just below the comment "We have no symbol information" that I mentioned earlier, and then I ran the code on an AOSP device. And indeed that block runs even on AOSP. So the presence of debuginfo doesn't seem to be what makes the behavioral difference in AOSP vs non-AOSP, making what you said about optimization levels an even more likely explanation.

From a high-level perspective it feels wrong to force the user to recompile/reinstall his OS just to be able to debug his application; so it would be really nice if some ARM expert tagged along and asked me the right questions to allow us to tweak the heuristics sufficiently.

---

Also, regarding safe_read_memory_integer() printing errors I should say that there is more to that part of the bug than just the fact that the error is printed. This is because after I type "next" and get the error, I cannot just run "next" again; at that point gdb just constantly prints "Cannot find bounds of current function" and refuses to move forward; like this:

Breakpoint 1, blink::RenderFullScreen::createPlaceholder (this=0x34038100, style=..., frameRect=...) at ../../third_party/WebKit/Source/core/rendering/RenderFullScreen.cpp:188
188	    if (style->width().isAuto())
(gdb) n
Cannot access memory at address 0x1
0x4e128838 in ?? () from /media/ssd/src/opera/opera/chromium/src/out_generic_armv7/Debug/lib/libblink_web.cr.so
(gdb) n
Cannot find bounds of current function
(gdb)

...however, if I run "stepi" three times consecutively I can get out of this weird state and make gdb get "unstuck" (until the next function call where it typically gets "stuck" again, needing more "stepi" to continue). Like this:

Breakpoint 1, blink::RenderFullScreen::createPlaceholder (this=0x80238100, style=..., frameRect=...) at ../../third_party/WebKit/Source/core/rendering/RenderFullScreen.cpp:188
188	    if (style->width().isAuto())
(gdb) n
0x7a6c5838 in ?? () from /media/ssd/src/opera/opera/chromium/src/out_generic_armv7/Debug/lib/libblink_web.cr.so
(gdb) n
Cannot find bounds of current function
(gdb) n
Cannot find bounds of current function
(gdb) stepi
0x7a6c583c in ?? () from /media/ssd/src/opera/opera/chromium/src/out_generic_armv7/Debug/lib/libblink_web.cr.so
(gdb) stepi
0x7a6c5840 in ?? () from /media/ssd/src/opera/opera/chromium/src/out_generic_armv7/Debug/lib/libblink_web.cr.so
(gdb) stepi
WTF::RefPtr<WTF::StringImpl>::get (this=0x0) at ../../third_party/WebKit/Source/wtf/OwnPtr.h:71
71	        PtrType get() const { return m_ptr; }
(gdb) n
0x7a6bff00 in ?? () from /media/ssd/src/opera/opera/chromium/src/out_generic_armv7/Debug/lib/libblink_web.cr.so
(gdb) stepi
0x7a6bff04 in ?? () from /media/ssd/src/opera/opera/chromium/src/out_generic_armv7/Debug/lib/libblink_web.cr.so
(gdb) stepi
0x7a6bff08 in ?? () from /media/ssd/src/opera/opera/chromium/src/out_generic_armv7/Debug/lib/libblink_web.cr.so
(gdb) stepi
blink::Length::isAuto (this=0x37a90ca8) at ../../third_party/WebKit/Source/platform/Length.h:274
274	    bool isAuto() const { return type() == Auto; }
(gdb) n
blink::RenderFullScreen::createPlaceholder (this=0x80238100, style=..., frameRect=...) at ../../third_party/WebKit/Source/core/rendering/RenderFullScreen.cpp:189
189	        style->setWidth(Length(frameRect.width(), Fixed));
(gdb) n

Comment 11 Pedro Alves 2014-09-17 11:38:52 UTC

> Also, regarding safe_read_memory_integer() printing errors I should say that 
> there is more to that part of the bug than just the fact that the error is 
> printed. This is because after I type "next" and get the error, I cannot just 
> run "next" again; at that point gdb just constantly prints "Cannot find bounds 
> of current function" and refuses to move forward; like this:

Yeah, GDB isn't very clear here.  GDB is looking for the bounds of the function in order to do the:

	      printf_filtered (_("Single stepping until exit from function %s,"
				 "\nwhich has no line number information.\n"),
			       name);

bit, which you've probably seen trigger before.

If you do "set step-mode on", GDB will fall back to "stepi" instead of erroring out.

	  /* If we have no line info, switch to stepi mode.  */
	  if (tp->control.step_range_end == 0 && step_stop_if_no_debug)
	    {
	      tp->control.step_range_start = tp->control.step_range_end = 1;
	      tp->control.may_range_step = 0;
	    }
	  else if (tp->control.step_range_end == 0)
	    {
	      const char *name;

	      if (find_pc_partial_function (pc, &name,
					    &tp->control.step_range_start,
					    &tp->control.step_range_end) == 0)
		error (_("Cannot find bounds of current function"));

	      target_terminal_ours ();
	      printf_filtered (_("Single stepping until exit from function %s,"
				 "\nwhich has no line number information.\n"),
			       name);
	    }

IMO, instead of "error", when we can't find the bounds of the function, GDB should instead switch to stepi mode.  I think there's a specific bug open about this.
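
The decision in the excerpt above, plus the fallback suggested here, can be modelled in a few lines. This is a simplified sketch, not GDB source: `struct step_state`, its fields and `choose_step_action` are hypothetical stand-ins for the `step_range_end == 0` test, the step-mode setting and the result of `find_pc_partial_function`, and the `fallback_to_stepi` flag models the proposed behaviour, which the quoted code does not implement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum step_action
{
  STEP_LINE_RANGE,        /* normal case: line info is available */
  STEP_INSTRUCTION,       /* behave like stepi/nexti */
  STEP_UNTIL_FUNC_EXIT,   /* "Single stepping until exit from function" */
  STEP_ERROR_NO_BOUNDS    /* "Cannot find bounds of current function" */
};

struct step_state
{
  bool have_line_info;    /* false models step_range_end == 0 */
  bool step_mode_on;      /* models "set step-mode on" */
  bool have_func_bounds;  /* models find_pc_partial_function succeeding */
};

static enum step_action
choose_step_action (const struct step_state *st, bool fallback_to_stepi)
{
  assert (st != NULL);

  if (st->have_line_info)
    return STEP_LINE_RANGE;
  if (st->step_mode_on)
    return STEP_INSTRUCTION;
  if (st->have_func_bounds)
    return STEP_UNTIL_FUNC_EXIT;

  /* No line info, step-mode off, no function bounds: the quoted code
     errors out here; the proposal is to step by instruction instead.  */
  return fallback_to_stepi ? STEP_INSTRUCTION : STEP_ERROR_NO_BOUNDS;
}
```

With `fallback_to_stepi` false, the all-false state reproduces the "Cannot find bounds of current function" case from the session above; with it true, the same state degrades to instruction stepping.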

Comment 12 Sourceware Commits 2014-09-17 15:32:58 UTC

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gdb and binutils".

The branch, master has been updated
       via  5e43d46791c4c66fd83947a12d4f716b561a9103 (commit)
      from  2569ceb0b02cc5569af5f946d89b578510ac5ea1 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=5e43d46791c4c66fd83947a12d4f716b561a9103

commit 5e43d46791c4c66fd83947a12d4f716b561a9103
Author: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Date:   Wed Sep 17 17:29:27 2014 +0200

    PR gdb/17384: Do not print memory errors in safe_read_memory_integer
    
    If accessing memory via safe_read_memory_integer fails, that function
    used to print an error message even though callers were perfectly able
    to handle (and even expected!) failures.
    
    This patch removes the confusing message by changing the routine to
    directly use target_read_memory.
    
    gdb/ChangeLog:
    
    	PR gdb/17384
    	* corefile.c (struct captured_read_memory_integer_arguments): Remove.
    	(do_captured_read_memory_integer): Remove.
    	(safe_read_memory_integer): Use target_read_memory directly instead
    	of catching errors in do_captured_read_memory_integer.

-----------------------------------------------------------------------

Summary of changes:
 gdb/ChangeLog  |    8 ++++++++
 gdb/corefile.c |   50 +++++---------------------------------------------
 2 files changed, 13 insertions(+), 45 deletions(-)
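
To illustrate the shape of this change: a read helper that reports failure through its return value, and never prints, lets callers such as the prologue scanner probe questionable addresses quietly. The following is a self-contained mock, not the code from the commit; `mock_target_read_memory`, `sketch_safe_read_le32` and the fake memory layout are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fake inferior memory: 8 readable bytes starting at MOCK_BASE.  */
#define MOCK_BASE 0x1000u
static const uint8_t mock_memory[8] =
  { 0x78, 0x56, 0x34, 0x12, 0x01, 0x00, 0x00, 0x00 };

/* Stand-in for a target memory read: returns 0 on success and nonzero
   on failure, without printing or throwing anything.  */
static int
mock_target_read_memory (uint32_t addr, uint8_t *buf, size_t len)
{
  if (addr < MOCK_BASE
      || len > sizeof mock_memory
      || addr - MOCK_BASE > sizeof mock_memory - len)
    return -1;

  memcpy (buf, mock_memory + (addr - MOCK_BASE), len);
  return 0;
}

/* Read a little-endian 32-bit value.  Returns 1 and stores it in *VALUE
   on success; returns 0 and leaves *VALUE untouched on failure.  */
static int
sketch_safe_read_le32 (uint32_t addr, uint32_t *value)
{
  uint8_t buf[4];

  assert (value != NULL);

  if (mock_target_read_memory (addr, buf, sizeof buf) != 0)
    return 0;

  *value = (uint32_t) buf[0]
           | ((uint32_t) buf[1] << 8)
           | ((uint32_t) buf[2] << 16)
           | ((uint32_t) buf[3] << 24);
  return 1;
}
```

A caller can then treat a failed read at an address like 0x1 as "no usable frame pointer" and carry on, with nothing reaching the user's terminal.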

Comment 13 molsson 2014-09-18 07:15:42 UTC

I built master (with Weigand's patch) and I don't see the error message anymore, so that part of the fix works. Thanks.

Regarding gdb getting stuck in "Cannot find bounds of current function" errors I think there is some risk of confusion if gdb just switches to stepi silently because with "next" you expect it not to move into sub functions. It might be better to change the error message "Cannot find bounds of current function" into "Cannot find bounds of current function, use stepi to continue instruction by instruction".

Further, in my particular case I'm stepping inside blink::RenderFullScreen::createPlaceholder() which is located in libblink_web.cr.so and "info sharedlibrary" confirms that I do have debug symbols loaded for this .so file. I guess this might be because of the "stack frame detection heuristics" you mentioned?

What can I do to debug that part further?

Comment 14 Pedro Alves 2014-09-18 07:28:02 UTC

> Regarding gdb getting stuck in "Cannot find bounds of current function" errors 
> I think there is some risk of confusion if gdb just switches to stepi silently 
> because with "next" you expect it not to move into sub functions.

True.  Seems like we may already have that with "set step-mode on" (*).
Maybe we could do "next" -> "nexti"; "step" -> "stepi".

(*) gosh, we should rename that setting to something that actually has meaning.

Comment 15 Pedro Alves 2014-09-18 07:33:57 UTC

> Maybe we could do "next" -> "nexti"; "step" -> "stepi".

Actually, I haven't tried that, but looking at the code, I think that's already what happens (w/ "step-mode on").

Comment 16 Pedro Alves 2014-09-18 08:44:06 UTC

> Further, in my particular case I'm stepping inside 
> blink::RenderFullScreen::createPlaceholder() which is located in 
> libblink_web.cr.so and "info sharedlibrary" confirms that I do have debug 
> symbols loaded for this .so file. I guess this might be because of the "stack 
> frame detection heuristics" you mentioned?
> What can I do to debug that part further?

Let me try to clear terminology a bit.

GDB has several unwinders installed.  If there's debug/unwind/dwarf info, we'll pick that one, and that should fully describe how to unwind correctly at each instruction's address.  If GDB can't unwind correctly with dwarf info available, then either we have a bug in the dwarf unwinder, or the debug/unwind info is incomplete or incorrect (the latter is more often the case, though reader bugs do happen).  The "heuristics" part kicks in when no debug/unwind info is available.  GDB then has to fall back to parsing the function prologue's instructions, looking for something that looks like a new frame being set up, and tries to figure out where in memory that starts, where on the stack each register has been saved, etc.  It'll break down if the compiler did something too clever.  This is the part that is heuristic, as it's impossible to handle all possible input.

Because we saw this:

#7  0x000000000041122a in arm_scan_prologue (cache=0x1e51ec0, this_frame=0x1e51e00) at arm-tdep.c:1996
#8  arm_make_prologue_cache (this_frame=0x1e51e00) at arm-tdep.c:2022
#9  0x000000000041142a in arm_prologue_this_id (this_frame=0x1e51e00, this_cache=0x1e51e18, this_id=0x1e51e60) at arm-tdep.c:2052
#10 0x00000000005e4ea4 in compute_frame_id (fi=0x1e51e00) at frame.c:459
#11 get_prev_frame_if_no_cycle (this_frame=this_frame@entry=0x1e51470) at frame.c:1781
#12 0x00000000005e72ca in get_prev_frame_always_1 (this_frame=0x1e51470) at frame.c:1955

that is, you're reaching arm_scan_prologue, we can tell that GDB fell back to the ARM's heuristic unwinder.  arm_analyze_prologue, its callee, is where the prologue is parsed and where GDB works out where registers have been saved.  But in order for that to be reached, GDB needs to at least know where the function starts/ends.  And it does not.  We can tell, because you're reaching this part:

  else
    {
      /* We have no symbol information.  Our only option is to assume this
	 function has a standard stack frame and the normal frame register.
	 Then, we can find the value of our frame pointer on entrance to
	 the callee (or at the present moment if this is the innermost frame).
	 The value stored there should be the address of the stmfd + 8.  */
      CORE_ADDR frame_loc;
      LONGEST return_value;

      frame_loc = get_frame_register_unsigned (this_frame, ARM_FP_REGNUM);
      if (!safe_read_memory_integer (frame_loc, 4, byte_order, &return_value))
        return;  <<<<<<<<<<<<<<<<<<<
      else
        {
          prologue_start = gdbarch_addr_bits_remove
			     (gdbarch, return_value) - 8;
          prologue_end = prologue_start + 64;	/* See above.  */
        }
    }

and hitting that early return, which means that GDB thinks you have _no_ symbol information whatsoever for that address, not even minimal elf info, and then the function doesn't appear to have been built with a frame pointer (-fomit-frame-pointer), as the frame pointer points at 1.
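To make the fallback above concrete, here is a minimal sketch of the same arithmetic outside of GDB. It is not GDB code: `struct fake_memory`, `read_u32` and `guess_prologue_range` are hypothetical names, a flat little-endian buffer stands in for target memory, and clearing bit 0 is only a simplified stand-in for what gdbarch_addr_bits_remove does on ARM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for target memory: SIZE bytes mapped at BASE. */
struct fake_memory {
    uint32_t base;
    const uint8_t *bytes;
    uint32_t size;
};

/* Read a little-endian 32-bit word.  Returns false when the address is
   not mapped instead of reporting an error, which is the contract the
   caller of safe_read_memory_integer relies on in the excerpt above.  */
bool read_u32(const struct fake_memory *mem, uint32_t addr, uint32_t *out)
{
    assert(mem != NULL && out != NULL);
    if (mem->size < 4 || addr < mem->base
        || addr - mem->base > mem->size - 4)
        return false;

    const uint8_t *p = mem->bytes + (addr - mem->base);
    *out = (uint32_t) p[0] | ((uint32_t) p[1] << 8)
           | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
    return true;
}

/* Guess the prologue range from the word stored at the frame pointer.
   Assumption: clearing bit 0 approximates gdbarch_addr_bits_remove.  */
bool guess_prologue_range(const struct fake_memory *mem, uint32_t fp,
                          uint32_t *start, uint32_t *end)
{
    uint32_t saved;

    if (!read_u32(mem, fp, &saved))
        return false;               /* the early return marked above */

    *start = (saved & ~UINT32_C(1)) - 8;
    *end = *start + 64;
    return true;
}
```

With a frame pointer of 1, as in this report, the read fails and the function returns without producing any range, which matches the early return in the excerpt.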

We also know that GDB believes you don't have debug info for that function by another mechanism -- "next" complains about it (the "Cannot find bounds of current function" error).

So it all indicates GDB doesn't believe you have any debug info for that address.  Is libblink_web.cr.so fully stripped?

What does "info symbol $pc" say in the problem case?

In order to see the unwinder selection process, you can set a breakpoint at frame_unwind_try_unwinder, and see GDB trying the dwarf2 unwinder first, and check why that unwinder refuses the frame.  There's also "set debug frame", but unfortunately that doesn't show anything about the unwinder selection process.  But that seems irrelevant, as we can tell from other means that you have no debug info for that function.

Comment 17 molsson 2014-09-18 14:56:29 UTC

The things that make me believe that I do have symbols available are: A) it says "Yes" in the "Syms" column of the libblink_web.cr.so line in "info sharedlibrary", and B) if I trace all executed commands during the build process I can see that RenderFullscreen.cpp is built with the "-g" parameter.

FWIW, the list of CFLAGS (I'm skipping a metric ton of -D and -I switches) is:

-fstack-protector --param=ssp-buffer-size=4 -Werror -fno-exceptions -fno-strict-aliasing -Wall -Wno-unused-parameter -Wno-missing-field-initializers -fvisibility=hidden -pipe -fPIC -Wno-unused-local-typedefs -march=armv7-a -mtune=generic-armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp -mthumb -fno-tree-sra -fno-partial-inlining -fno-early-inlining -fno-tree-copy-prop -fno-tree-loop-optimize -fno-move-loop-invariants -fno-caller-saves -Wno-psabi -mthumb-interwork -ffunction-sections -funwind-tables -g -fstack-protector -fno-short-enums -finline-limit=64 -Wa,--noexecstack --sysroot=..../third_party/android_tools/ndk/platforms/android-14/arch-arm -isystem..../third_party/android_tools/ndk/sources/cxx-stl/stlport/stlport -O0 -g -funwind-tables -fdiagnostics-color=always -fno-rtti -fno-threadsafe-statics -fvisibility-inlines-hidden -Wsign-compare -Wno-c++0x-compat -Wno-abi -std=gnu++11 -Wno-narrowing -Wno-literal-suffix

Notably, I'm also not passing -fomit-frame-pointer for this file at least (if I grep for "omit-frame-pointer" I see several hits in the chromium tree but I'm not sure which platforms or files etc use it).

Also, I can put a breakpoint on certain other functions like for example base::debug::TaskAnnotator::RunTask() and there I can use "next" without problems so there seems to be something special with blink::RenderFullScreen::createPlaceholder() or possibly with one of the functions it calls. Maybe something with inlining, or maybe because it's static, not sure.

I will try to put some breakpoints near the unwind selection tomorrow and get back with more info on that.

Comment 18 Pedro Alves 2014-09-18 17:40:14 UTC

I'd play with "objdump -h", "nm -A", "readelf --symbols", "readelf -dw" etc., to check if there are symbols that cover the missing range.  Somehow sounds like there's a hole (or gdb thinks so).  The mention of BLX makes me wonder if this is somehow a thumb-bit thing, like the thumb bit set or missing on some address, confusing bfd/gdb.

Comment 19 molsson 2014-09-19 14:54:23 UTC

This is what "info symbol $pc" says before and after running the blx instruction:

(gdb) info symbol $pc
blink::RenderFullScreen::createPlaceholder(WTF::PassRefPtr<blink::RenderStyle>, blink::LayoutRect const&) + 14 in section .text of /media/ssd/src/opera/opera/chromium/src/out_generic_armv7/Debug/lib/libblink_web.cr.so
(gdb) stepi
0x7a6f2ac0 in ?? () from /media/ssd/src/opera/opera/chromium/src/out_generic_armv7/Debug/lib/libblink_web.cr.so
(gdb) info symbol $pc
No symbol matches $pc.

The range listed in "info sharedlibrary" for libblink_web.cr.so is from 0x7909aa90 to 0x7a70bae8, so the pc is still within the bounds of that .so file after the blx.

Running "file libblink_web.cr.so" on the exact file that is listed under "info sharedlibrary" says "not stripped", and running "objdump --debugging libblink_web.cr.so" on it prints lots of stuff.

Here is what some of the other tools say about this file:
http://temp.minimum.se/objdump_-h.txt
http://temp.minimum.se/nm_-A_libblink_web.cr.so.txt
http://temp.minimum.se/readelf_--symbols_libblink_web.cr.so.txt
The output of "readelf -dw" was 7GB when I CTRL-C'd it so I can't upload it.

FWIW, the size of this .so file is 955M so it's quite large (it contains all of the blink web engine). Here is the full file btw:
http://temp.minimum.se/libblink_web.cr.so

I wasn't able to experiment with the unwinder selection breakpoint today, hopefully I can do it on Monday.

Comment 20 Pedro Alves 2014-09-19 15:25:39 UTC

> (gdb) disassemble /r 0x4e0dcc20,+10
> Dump of assembler code from 0x4e0dcc20 to 0x4e0dcc2a:
>    0x4e0dcc20:  04 c0 9f e5 ldr r12, [pc, #4]   ; 0x4e0dcc2c
>    0x4e0dcc24:  0c c0 8f e0 add r12, pc, r12
>    0x4e0dcc28:  1c ff 2f e1 bx  r12
> End of assembler dump.

I wonder whether that's some sort of trampoline the compiler/linker is generating
and gdb is not grokking.

See arm_stub_unwind_sniffer and arm_skip_bx_reg.

Comment 21 molsson 2014-09-22 09:12:43 UTC

If I run with a breakpoint on frame_unwind_find_by_frame() as I step over the "blx" instruction, I see this happening:

It doesn't exit early when checking "target_get_unwinder" and "target_get_tailcall_unwinder", instead it goes into the "unwinders loop"; these unwinders seem to be:

$1 = {
  type = DUMMY_FRAME, 
  stop_reason = 0x68dfa1 <default_frame_unwind_stop_reason>, 
  this_id = 0x4f6e66 <dummy_frame_this_id>, 
  prev_register = 0x4f6db9 <dummy_frame_prev_register>, 
  unwind_data = 0x0, 
  sniffer = 0x4f6c83 <dummy_frame_sniffer>, 
  dealloc_cache = 0x0, 
  prev_arch = 0x0
}
$2 = {
  type = INLINE_FRAME, 
  stop_reason = 0x68dfa1 <default_frame_unwind_stop_reason>, 
  this_id = 0x6905f2 <inline_frame_this_id>, 
  prev_register = 0x69075d <inline_frame_prev_register>, 
  unwind_data = 0x0, 
  sniffer = 0x690783 <inline_frame_sniffer>, 
  dealloc_cache = 0x0, 
  prev_arch = 0x0
}
$3 = {
  type = NORMAL_FRAME, 
  stop_reason = 0x68dfa1 <default_frame_unwind_stop_reason>, 
  this_id = 0x69ea4e <tramp_frame_this_id>, 
  prev_register = 0x69ea8e <tramp_frame_prev_register>, 
  unwind_data = 0x1802db0, 
  sniffer = 0x69ec2f <tramp_frame_sniffer>, 
  dealloc_cache = 0x0, 
  prev_arch = 0x0
}
$4 = {
  type = NORMAL_FRAME, 
  stop_reason = 0x68dfa1 <default_frame_unwind_stop_reason>, 
  this_id = 0x69ea4e <tramp_frame_this_id>, 
  prev_register = 0x69ea8e <tramp_frame_prev_register>, 
  unwind_data = 0x1802d50, 
  sniffer = 0x69ec2f <tramp_frame_sniffer>, 
  dealloc_cache = 0x0, 
  prev_arch = 0x0
}
$5 = {
  type = SIGTRAMP_FRAME, 
  stop_reason = 0x68dfa1 <default_frame_unwind_stop_reason>, 
  this_id = 0x69ea4e <tramp_frame_this_id>, 
  prev_register = 0x69ea8e <tramp_frame_prev_register>, 
  unwind_data = 0x1802cf0, 
  sniffer = 0x69ec2f <tramp_frame_sniffer>, 
  dealloc_cache = 0x0, 
  prev_arch = 0x0
}
$6 = {
  type = SIGTRAMP_FRAME, 
  stop_reason = 0x68dfa1 <default_frame_unwind_stop_reason>, 
  this_id = 0x69ea4e <tramp_frame_this_id>, 
  prev_register = 0x69ea8e <tramp_frame_prev_register>, 
  unwind_data = 0x1802c90, 
  sniffer = 0x69ec2f <tramp_frame_sniffer>, 
  dealloc_cache = 0x0, 
  prev_arch = 0x0
}
$7 = {
  type = SIGTRAMP_FRAME, 
  stop_reason = 0x68dfa1 <default_frame_unwind_stop_reason>, 
  this_id = 0x69ea4e <tramp_frame_this_id>, 
  prev_register = 0x69ea8e <tramp_frame_prev_register>, 
  unwind_data = 0x1802c30, 
  sniffer = 0x69ec2f <tramp_frame_sniffer>, 
  dealloc_cache = 0x0, 
  prev_arch = 0x0
}
$8 = {
  type = SIGTRAMP_FRAME, 
  stop_reason = 0x68dfa1 <default_frame_unwind_stop_reason>, 
  this_id = 0x69ea4e <tramp_frame_this_id>, 
  prev_register = 0x69ea8e <tramp_frame_prev_register>, 
  unwind_data = 0x1802bd0, 
  sniffer = 0x69ec2f <tramp_frame_sniffer>, 
  dealloc_cache = 0x0, 
  prev_arch = 0x0
}
$9 = {
  type = SIGTRAMP_FRAME, 
  stop_reason = 0x68dfa1 <default_frame_unwind_stop_reason>, 
  this_id = 0x69ea4e <tramp_frame_this_id>, 
  prev_register = 0x69ea8e <tramp_frame_prev_register>, 
  unwind_data = 0x1802b70, 
  sniffer = 0x69ec2f <tramp_frame_sniffer>, 
  dealloc_cache = 0x0, 
  prev_arch = 0x0
}
$10 = {
  type = SIGTRAMP_FRAME, 
  stop_reason = 0x68dfa1 <default_frame_unwind_stop_reason>, 
  this_id = 0x69ea4e <tramp_frame_this_id>, 
  prev_register = 0x69ea8e <tramp_frame_prev_register>, 
  unwind_data = 0x1802ae0, 
  sniffer = 0x69ec2f <tramp_frame_sniffer>, 
  dealloc_cache = 0x0, 
  prev_arch = 0x0
}
$11 = {
  type = NORMAL_FRAME, 
  stop_reason = 0x68dfa1 <default_frame_unwind_stop_reason>, 
  this_id = 0x410457 <arm_stub_this_id>, 
  prev_register = 0x40e2e1 <arm_prologue_prev_register>, 
  unwind_data = 0x0, 
  sniffer = 0x4104e8 <arm_stub_unwind_sniffer>, 
  dealloc_cache = 0x0, 
  prev_arch = 0x0
}
$12 = {
  type = TAILCALL_FRAME, 
  stop_reason = 0x68dfa1 <default_frame_unwind_stop_reason>, 
  this_id = 0x61f731 <tailcall_frame_this_id>, 
  prev_register = 0x61fa95 <tailcall_frame_prev_register>, 
  unwind_data = 0x0, 
  sniffer = 0x61fb29 <tailcall_frame_sniffer>, 
  dealloc_cache = 0x61fe9c <tailcall_frame_dealloc_cache>, 
  prev_arch = 0x61fec2 <tailcall_frame_prev_arch>
}
$13 = {
  type = NORMAL_FRAME, 
  stop_reason = 0x61c7fd <dwarf2_frame_unwind_stop_reason>, 
  this_id = 0x61c84f <dwarf2_frame_this_id>, 
  prev_register = 0x61c925 <dwarf2_frame_prev_register>, 
  unwind_data = 0x0, 
  sniffer = 0x61cde1 <dwarf2_frame_sniffer>, 
  dealloc_cache = 0x61cd90 <dwarf2_frame_dealloc_cache>, 
  prev_arch = 0x0
}
$14 = {
  type = SIGTRAMP_FRAME, 
  stop_reason = 0x61c7fd <dwarf2_frame_unwind_stop_reason>, 
  this_id = 0x61c84f <dwarf2_frame_this_id>, 
  prev_register = 0x61c925 <dwarf2_frame_prev_register>, 
  unwind_data = 0x0, 
  sniffer = 0x61cde1 <dwarf2_frame_sniffer>, 
  dealloc_cache = 0x0, 
  prev_arch = 0x0
}
$15 = {
  type = NORMAL_FRAME, 
  stop_reason = 0x68dfa1 <default_frame_unwind_stop_reason>, 
  this_id = 0x40e204 <arm_prologue_this_id>, 
  prev_register = 0x40e2e1 <arm_prologue_prev_register>, 
  unwind_data = 0x0, 
  sniffer = 0x410197 <arm_exidx_unwind_sniffer>, 
  dealloc_cache = 0x0, 
  prev_arch = 0x0
}

This last unwinder (labelled $15 above) is the first one where I hit the "return;" part of the loop and thereby exit frame_unwind_find_by_frame().
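For readers who have not looked at that loop, the selection is essentially "try each sniffer in order and stop at the first one that accepts the frame". A minimal sketch of that pattern follows. It is not the GDB implementation; `toy_frame`, `toy_unwinder` and the two toy sniffers are hypothetical and far simpler than the real structures dumped above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified frame description. */
struct toy_frame {
    uint32_t pc;
    int has_unwind_info;   /* nonzero if debug/unwind info covers pc */
};

typedef int (*toy_sniffer)(const struct toy_frame *frame);

struct toy_unwinder {
    const char *name;
    toy_sniffer sniffer;
};

/* Accepts only frames whose pc is covered by unwind info. */
int toy_dwarf2_sniffer(const struct toy_frame *frame)
{
    return frame->has_unwind_info;
}

/* Catch-all fallback: accepts every frame. */
int toy_prologue_sniffer(const struct toy_frame *frame)
{
    (void) frame;
    return 1;
}

/* Return the first unwinder whose sniffer accepts FRAME, or NULL. */
const struct toy_unwinder *
toy_find_unwinder(const struct toy_unwinder *table, size_t count,
                  const struct toy_frame *frame)
{
    assert(table != NULL && frame != NULL);
    for (size_t i = 0; i < count; i++)
        if (table[i].sniffer(frame))
            return &table[i];
    return NULL;
}
```

In this toy model, a frame without unwind info falls through the first entry and lands on the catch-all. That is the shape of what the dump shows: every earlier sniffer refuses the frame and the last one takes it.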

Comment 22 Yao Qi 2014-09-22 13:12:24 UTC

(In reply to Pedro Alves from comment #20)
> > (gdb) disassemble /r 0x4e0dcc20,+10
> > Dump of assembler code from 0x4e0dcc20 to 0x4e0dcc2a:
> >    0x4e0dcc20:  04 c0 9f e5 ldr r12, [pc, #4]   ; 0x4e0dcc2c
> >    0x4e0dcc24:  0c c0 8f e0 add r12, pc, r12
> >    0x4e0dcc28:  1c ff 2f e1 bx  r12
> > End of assembler dump.
> 
> I wonder whether that's some sort of trampoline the compiler/linker is
> generating
> and gdb is not grokking.
> 

Yes, it is a trampoline and GDB isn't aware of that.

See bfd/elf32-arm.c:

static const insn_sequence elf32_arm_stub_long_branch_any_thumb_pic[] =
{
  ARM_INSN (0xe59fc004),             /* ldr   ip, [pc, #4] */
  ARM_INSN (0xe08fc00c),             /* add   ip, pc, ip */
  ARM_INSN (0xe12fff1c),             /* bx    ip */
  DATA_WORD (0, R_ARM_REL32, 0),     /* dcd   R_ARM_REL32(X) */
};

We need to teach GDB to understand it in both arm_stub_unwind_sniffer and arm_skip_stub.  I'll take a look.
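As a rough illustration of what recognizing this stub involves, the sketch below matches the three instruction words quoted above and computes where the final bx lands. It is not the eventual GDB fix: `load_le32` and `decode_long_branch_pic_stub` are hypothetical helpers, a little-endian target is assumed, and the caller is assumed to have already read the 16 stub bytes from the inferior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Instruction words of elf32_arm_stub_long_branch_any_thumb_pic. */
#define STUB_LDR_IP UINT32_C(0xe59fc004)   /* ldr ip, [pc, #4] */
#define STUB_ADD_IP UINT32_C(0xe08fc00c)   /* add ip, pc, ip   */
#define STUB_BX_IP  UINT32_C(0xe12fff1c)   /* bx  ip           */

/* Assemble a little-endian 32-bit word from four bytes. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
           | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* BUF holds the 16 bytes found at STUB_ADDR.  On a match, store the
   branch destination in *TARGET (bit 0 cleared) and whether the bx
   switches to Thumb state in *TO_THUMB.  */
bool decode_long_branch_pic_stub(const uint8_t *buf, uint32_t stub_addr,
                                 uint32_t *target, bool *to_thumb)
{
    assert(buf != NULL && target != NULL && to_thumb != NULL);

    if (load_le32(buf) != STUB_LDR_IP
        || load_le32(buf + 4) != STUB_ADD_IP
        || load_le32(buf + 8) != STUB_BX_IP)
        return false;

    /* The ldr at stub_addr reads pc as stub_addr + 8, so [pc, #4] is
       the data word at stub_addr + 12.  */
    uint32_t offset = load_le32(buf + 12);

    /* The add at stub_addr + 4 reads pc as stub_addr + 12.  */
    uint32_t ip = (stub_addr + 12) + offset;

    *to_thumb = (ip & 1u) != 0;
    *target = ip & ~UINT32_C(1);
    return true;
}
```

The pc arithmetic agrees with the disassembly in comment #20, where the ldr at 0x4e0dcc20 is annotated as loading from 0x4e0dcc2c. Bit 0 of the computed value is what makes bx switch to Thumb, which matters here since the library is built with -mthumb.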

Comment 23 Yao Qi 2014-09-23 09:23:56 UTC

(In reply to molsson from comment #17)

> Also, I can put a breakpoint on certain other functions like for example
> base::debug::TaskAnnotator::RunTask() and there I can use "next" without
> problems so there seems to be something special with
> blink::RenderFullScreen::createPlaceholder() or possibly with one of the
> functions it calls. Maybe something with inlining, or maybe because it's
> static, not sure.
> 

molsson,
Is your problem that you can't use "next" in function blink::RenderFullScreen::createPlaceholder()?  What is the output and message of GDB?  You mixed your analysis and the problem report together, which makes it harder to understand what the problem is.

Comment 24 molsson 2014-09-23 12:50:31 UTC

At first I had this error printed saying "Cannot access memory at address 0x1" but Ulrich Weigand's fix makes sure that this error is no longer printed. AFAIK that fix did so by suppressing the printout, and I don't know if gdb is meant to attempt reading at 0x1 or not (sounds unlikely?), so I guess this attempted read is a side effect of incorrect assumptions about the prologue it is trying to analyze?

The main remaining bug for me is, as you say, that when I try to move through this function using "next" commands, gdb gets stuck in a mode where it just prints "Cannot find bounds of current function" without moving forward:

Breakpoint 1, blink::RenderFullScreen::createPlaceholder (this=0x7f038100, style=..., frameRect=...) at ../../third_party/WebKit/Source/core/rendering/RenderFullScreen.cpp:188
188	    if (style->width().isAuto())
(gdb) n
0x7a6f4ac0 in ?? () from /media/ssd/src/opera/opera/chromium/src/out_generic_armv7/Debug/lib/libblink_web.cr.so
(gdb) n
Cannot find bounds of current function
(gdb) n
Cannot find bounds of current function
(gdb)

Also when I'm in this mode the "bt" looks like this:

(gdb) bt
#0  0x7a6f4ac0 in ?? () from /media/ssd/src/opera/opera/chromium/src/out_generic_armv7/Debug/lib/libblink_web.cr.so
#1  0x7f038100 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Comment 25 Yao Qi 2014-09-24 11:57:17 UTC

An update here...

What I am doing right now is trying to get a minimal C file and a set of compilation options with which I can reproduce this trampoline.  I dug into the ld source to see when such a long branch stub is used, but the ld source is new to me, so it is taking time to get a minimal case.

The GDB fix should be straightforward once the case is ready.

Comment 26 Yao Qi 2014-10-09 08:10:29 UTC

(In reply to molsson from comment #24)

I've got a small case which has such a stub,

   0x20011ce <main+33558534>:   blx     0x20011d8
   0x020011d2 <main+33558538>:  00 23   movs    r3, #0
   ...
   ...
   0x20011d8:   ldr     r12, [pc, #4]   ; 0x20011e4
   0x20011dc:   add     r12, pc, r12
   0x20011e0:   bx      r12

 but GDB behaves correctly.

infrun: stop_pc = 0x20011d8
infrun: stepped into subroutine
infrun: inserting step-resume breakpoint at 0x20011d2
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 1] at 0x20011d8
infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   42000 [Thread 1],
infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x20011d2
infrun: BPSTAT_WHAT_STEP_RESUME
infrun: stepped to a different line
infrun: stop_waiting
14        return 0;

so I want to see your log.

> 
> Breakpoint 1, blink::RenderFullScreen::createPlaceholder (this=0x7f038100,
> style=..., frameRect=...) at
> ../../third_party/WebKit/Source/core/rendering/RenderFullScreen.cpp:188
> 188	    if (style->width().isAuto())
> (gdb) n
> 0x7a6f4ac0 in ?? () from
> /media/ssd/src/opera/opera/chromium/src/out_generic_armv7/Debug/lib/
> libblink_web.cr.so

Before typing the command "n", please do "set debug infrun 1".  Copy the output here.

Comment 27 molsson 2014-10-14 08:32:11 UTC

This is using the bog standard gdb 7.6 that Google ships in the Android NDK:

Breakpoint 1, blink::RenderFullScreen::createPlaceholder (this=0x7f638100, style=..., frameRect=...) at ../../third_party/WebKit/Source/core/rendering/RenderFullScreen.cpp:188
188	    if (style->width().isAuto())
(gdb) set debug infrun 1
(gdb) n
infrun: clear_proceed_status_thread (Thread 7128)
infrun: clear_proceed_status_thread (Thread 7106)
infrun: proceed (addr=0xffffffff, signal=144, step=1)
infrun: resume (step=1, signal=0), trap_expected=1, current thread [Thread 7128] at 0x7a5b1170
infrun: wait_for_inferior ()
infrun: target_wait (-1, status) =
infrun:   7106 [Thread 7128],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x7a5b1172
infrun: software single step trap for Thread 7128
infrun: stepping inside range [0x7a5b1170-0x7a5b118c]
infrun: resume (step=1, signal=0), trap_expected=0, current thread [Thread 7128] at 0x7a5b1172
infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   7106 [Thread 7128],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x7a797ac0
infrun: software single step trap for Thread 7128
infrun: no line number info
infrun: stop_stepping
0x7a797ac0 in ?? () from /media/ssd/src/opera/opera/chromium/src/out_generic_armv7/Debug/lib/libblink_web.cr.so
(gdb) n
infrun: clear_proceed_status_thread (Thread 7128)
infrun: clear_proceed_status_thread (Thread 7106)
Cannot find bounds of current function
(gdb) 


I spent a few hours yesterday trying to build gdb git master for android but I couldn't get gdbserver to build, so I wasn't able to test with that version.

Comment 28 molsson 2014-10-14 11:01:19 UTC

Here is an alternative test run; for this one I used gdb 7.8.50.20141014-cvs built from c50415e, but since I couldn't build the corresponding gdbserver I used the arm gdbserver binary from android NDK r10b instead (i.e. their android-ndk-r10b/prebuilt/android-arm/gdbserver/gdbserver ARM binary). Since there was no warning, I am _guessing_ that this gdbserver is compatible with this gdb version.

Breakpoint 1, blink::RenderFullScreen::createPlaceholder (this=0x80238100, style=..., frameRect=...) at ../../third_party/WebKit/Source/core/rendering/RenderFullScreen.cpp:188
188	    if (style->width().isAuto())
(gdb) set debug infrun 1
(gdb) n
infrun: clear_proceed_status_thread (Thread 9311)
infrun: clear_proceed_status_thread (Thread 9310)
infrun: clear_proceed_status_thread (Thread 9305)
infrun: clear_proceed_status_thread (Thread 9304)
infrun: clear_proceed_status_thread (Thread 9303)
infrun: clear_proceed_status_thread (Thread 9302)
infrun: clear_proceed_status_thread (Thread 9301)
infrun: clear_proceed_status_thread (Thread 9300)
infrun: clear_proceed_status_thread (Thread 9299)
infrun: clear_proceed_status_thread (Thread 9297)
infrun: clear_proceed_status_thread (Thread 9296)
infrun: clear_proceed_status_thread (Thread 9295)
infrun: clear_proceed_status_thread (Thread 9294)
infrun: clear_proceed_status_thread (Thread 9293)
infrun: clear_proceed_status_thread (Thread 9292)
infrun: clear_proceed_status_thread (Thread 9291)
infrun: clear_proceed_status_thread (Thread 9290)
infrun: clear_proceed_status_thread (Thread 9289)
infrun: clear_proceed_status_thread (Thread 9285)
infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT, step=1)
infrun: skipping breakpoint: stepping past insn at: 0x7a5b1170
infrun: skipping breakpoint: stepping past insn at: 0x7a5b1170
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 9299] at 0x7a5b1170
infrun: target_wait (-1, status) =
infrun:   9285 [Thread 9299],
infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x7a5b1172
infrun: stepping inside range [0x7a5b1170-0x7a5b118c]
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 9299] at 0x7a5b1172
infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   9285 [Thread 9299],
infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x7a797ac0
infrun: no line number info
infrun: stop_waiting
0x7a797ac0 in ?? () from /media/ssd/src/opera/opera/chromium/src/out_generic_armv7/Debug/lib/libblink_web.cr.so
(gdb) n
infrun: clear_proceed_status_thread (Thread 9311)
infrun: clear_proceed_status_thread (Thread 9310)
infrun: clear_proceed_status_thread (Thread 9305)
infrun: clear_proceed_status_thread (Thread 9304)
infrun: clear_proceed_status_thread (Thread 9303)
infrun: clear_proceed_status_thread (Thread 9302)
infrun: clear_proceed_status_thread (Thread 9301)
infrun: clear_proceed_status_thread (Thread 9300)
infrun: clear_proceed_status_thread (Thread 9299)
infrun: clear_proceed_status_thread (Thread 9297)
infrun: clear_proceed_status_thread (Thread 9296)
infrun: clear_proceed_status_thread (Thread 9295)
infrun: clear_proceed_status_thread (Thread 9294)
infrun: clear_proceed_status_thread (Thread 9293)
infrun: clear_proceed_status_thread (Thread 9292)
infrun: clear_proceed_status_thread (Thread 9291)
infrun: clear_proceed_status_thread (Thread 9290)
infrun: clear_proceed_status_thread (Thread 9289)
infrun: clear_proceed_status_thread (Thread 9285)
Cannot find bounds of current function
(gdb)