This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.
Infinite Stack Unwinding ARM
- From: Johannes Stoelp <Johannes dot Stoelp at synopsys dot com>
- To: "gdb at sourceware dot org" <gdb at sourceware dot org>
- Cc: Andreas Ropers <Andreas dot Ropers at synopsys dot com>, Marc Mones <Marc dot Mones at synopsys dot com>, Kai Schuetz <Kai dot Schuetz at synopsys dot com>, Johannes Stoelp <Johannes dot Stoelp at synopsys dot com>
- Date: Tue, 4 Apr 2017 12:54:09 +0000
- Subject: Infinite Stack Unwinding ARM
Hi folks,
While debugging Linux kernel code on an ARMv8 target, we ended up in an "infinite" loop when printing
the backtrace. We tracked it down: this happens whenever we break in any function called from
el[01]_irq (those are just the ones we ran into).
Note: These functions (el[01]_irq) are handwritten assembly functions, and therefore gdb uses the
prologue analyzer (gdb/aarch64-tdep.c:aarch64_analyze_prologue(...)) to unwind the previous frame.
We analyzed the stack unwinding further and found the root cause of this infinite loop.
Let's look at el1_irq as an example to see what happens in detail. While unwinding the stack,
gdb reaches the 'el1_irq' function and starts analyzing its prologue to unwind the stack
further.
gdb calculates the start and end of the function prologue as follows:
start_prologue = 0xffffffc000083c80
end_prologue = 0xffffffc000083cd4
When the analyzer hits the first MRS instruction at 0xffffffc000083cc4, it stops, because the analyzer does not support MRS.
/* el1_irq disassembly start */
Dump of assembler code for function el1_irq (arch/arm64/kernel/entry.S):
start_prologue --> 0xffffffc000083c80 <+0>: sub sp, sp, #0x30
0xffffffc000083c84 <+4>: stp x28, x29, [sp,#-16]!
0xffffffc000083c88 <+8>: stp x26, x27, [sp,#-16]!
0xffffffc000083c8c <+12>: stp x24, x25, [sp,#-16]!
0xffffffc000083c90 <+16>: stp x22, x23, [sp,#-16]!
0xffffffc000083c94 <+20>: stp x20, x21, [sp,#-16]!
0xffffffc000083c98 <+24>: stp x18, x19, [sp,#-16]!
0xffffffc000083c9c <+28>: stp x16, x17, [sp,#-16]!
0xffffffc000083ca0 <+32>: stp x14, x15, [sp,#-16]!
0xffffffc000083ca4 <+36>: stp x12, x13, [sp,#-16]!
0xffffffc000083ca8 <+40>: stp x10, x11, [sp,#-16]!
0xffffffc000083cac <+44>: stp x8, x9, [sp,#-16]!
0xffffffc000083cb0 <+48>: stp x6, x7, [sp,#-16]!
0xffffffc000083cb4 <+52>: stp x4, x5, [sp,#-16]!
0xffffffc000083cb8 <+56>: stp x2, x3, [sp,#-16]!
0xffffffc000083cbc <+60>: stp x0, x1, [sp,#-16]!
0xffffffc000083cc0 <+64>: add x21, sp, #0x120
analyze_stop --> 0xffffffc000083cc4 <+68>: mrs x22, elr_el1
0xffffffc000083cc8 <+72>: mrs x23, spsr_el1
0xffffffc000083ccc <+76>: stp x30, x21, [sp,#240]
0xffffffc000083cd0 <+80>: stp x22, x23, [sp,#256]
end_prologue --> 0xffffffc000083cd4 <+84>: msr daifclr, #0x8
0xffffffc000083cd8 <+88>: ldr x1, 0xffffffc000084210 <handle_arch_irq>
0xffffffc000083cdc <+92>: mov x0, sp
0xffffffc000083ce0 <+96>: blr x1
0xffffffc000083ce4 <+100>: mov x28, sp
...
/* el1_irq disassembly end */
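To make the effect of stopping at the first MRS concrete, here is a small, self-contained C sketch that replays the el1_irq prologue above through a toy symbolic scanner. The instruction model (insn_kind, scan_prologue, scan_el1_irq) is hypothetical and deliberately simplified; it is not gdb's decoder, it only tracks the SP offset and whether x30 was stored, and it gives up on the first instruction it does not model, as the real analyzer does with MRS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified instruction model (not gdb's decoder). */
enum insn_kind { SUB_SP_IMM, STP_PRE_SP, STP_OFF_SP, ADD_IMM_SP, MRS_SYSREG };

struct insn {
    enum insn_kind kind;
    int rt, rt2;   /* registers written or stored */
    int64_t imm;   /* immediate or offset in bytes */
};

struct scan_result {
    int64_t sp_offset;  /* SP relative to its value on function entry */
    bool lr_saved;      /* was a store of x30 seen? */
    int stopped_at;     /* index of first unsupported insn, n if none */
};

static bool stores_lr(const struct insn *i)
{
    return i->rt == 30 || i->rt2 == 30;
}

/* Track SP symbolically; stop at the first instruction not modelled. */
static struct scan_result scan_prologue(const struct insn *code, int n)
{
    struct scan_result r = { 0, false, n };
    for (int i = 0; i < n; i++) {
        switch (code[i].kind) {
        case SUB_SP_IMM:                 /* sub sp, sp, #imm */
            r.sp_offset -= code[i].imm;
            break;
        case STP_PRE_SP:                 /* stp xA, xB, [sp, #imm]! */
            r.sp_offset += code[i].imm;  /* pre-index writes SP back */
            r.lr_saved |= stores_lr(&code[i]);
            break;
        case STP_OFF_SP:                 /* stp xA, xB, [sp, #imm] */
            r.lr_saved |= stores_lr(&code[i]);
            break;
        case ADD_IMM_SP:                 /* add xN, sp, #imm: SP itself unchanged */
            break;
        default:                         /* e.g. MRS: not modelled, give up */
            r.stopped_at = i;
            return r;
        }
    }
    return r;
}

/* The el1_irq prologue (+0 .. +80) from the disassembly above. */
static struct scan_result scan_el1_irq(void)
{
    struct insn code[21];
    int n = 0;
    code[n++] = (struct insn){ SUB_SP_IMM, 0, 0, 0x30 };
    for (int reg = 28; reg >= 0; reg -= 2)   /* stp x28, x29 ... stp x0, x1 */
        code[n++] = (struct insn){ STP_PRE_SP, reg, reg + 1, -16 };
    code[n++] = (struct insn){ ADD_IMM_SP, 21, 0, 0x120 };
    code[n++] = (struct insn){ MRS_SYSREG, 22, 0, 0 };   /* mrs x22, elr_el1 */
    code[n++] = (struct insn){ MRS_SYSREG, 23, 0, 0 };   /* mrs x23, spsr_el1 */
    code[n++] = (struct insn){ STP_OFF_SP, 30, 21, 240 };
    code[n++] = (struct insn){ STP_OFF_SP, 22, 23, 256 };
    assert(n == (int)(sizeof code / sizeof code[0]));
    return scan_prologue(code, n);
}
```

In this model the scan stops at index 17 (+68, i.e. 17 × 4 bytes), with SP at entry - 0x30 - 15 × 16 = entry - 0x120, and the store of x30 at index 19 (+76) is never reached.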
Because the analysis stops before the store of the link register x30 (at 0xffffffc000083ccc), it
assumes that x30 was never saved and that its current value is still the return address.
The link register actually holds
x30 = 0xffffffc000083ce4 (pointing into el1_irq, right after the blr)
and the unwinder then uses this value to determine the caller of el1_irq, which, oh wonder, is again el1_irq.
The other problem is that gdb doesn't detect this situation and stop with "Backtrace stopped: previous frame
identical to this frame (corrupt stack?)". This happens because, after the prologue analyzer stops,
the symbolic value of the stack pointer register looks as follows:
(gdb) p/x regs[AARCH64_SP_REGNUM]
$3 = {
kind = 0x2, // pv_register
reg = 0x1f, // 31
k = 0xfffffffffffffee0 // #-0x120
}
gdb then uses this stack pointer as the frame register:
/* gdb/aarch64-tdep.c:aarch64_analyze_prologue(...) start */
...
if (pv_is_register (regs[AARCH64_FP_REGNUM], AARCH64_SP_REGNUM)) {
  /* Frame pointer is fp. Frame size is constant. */
  cache->framereg = AARCH64_FP_REGNUM;
  cache->framesize = -regs[AARCH64_FP_REGNUM].k;
}
else if (pv_is_register (regs[AARCH64_SP_REGNUM], AARCH64_SP_REGNUM)) {
  /* Try the stack pointer. */
  cache->framesize = -regs[AARCH64_SP_REGNUM].k;
  cache->framereg = AARCH64_SP_REGNUM;
}
else {
  /* We're just out of luck. We don't know where the frame is. */
  cache->framereg = -1;
  cache->framesize = 0;
}
...
/* gdb/aarch64-tdep.c:aarch64_analyze_prologue(...) end */
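The branch selection in this excerpt can be replayed in isolation. The sketch below uses simplified stand-ins for gdb's prologue-value type (pv_model, pv_model_is_register and choose_frame are hypothetical names, not gdb code); the kinds are ordered so that PV_REGISTER is 2, matching the kind = 0x2 printed above. At the point where the scan stops, x29 has only been stored, never written, so it still holds its own entry value, the FP test fails, and the SP branch is taken with framesize = 0x120.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for gdb's prologue values (hypothetical names). */
enum pv_kind_model { PV_UNKNOWN, PV_CONSTANT, PV_REGISTER };

struct pv_model {
    enum pv_kind_model kind;
    int reg;      /* register the value is relative to (its entry value) */
    int64_t k;    /* constant offset added to that register */
};

enum { FP_REGNUM = 29, SP_REGNUM = 31 };

static bool pv_model_is_register(struct pv_model v, int reg)
{
    return v.kind == PV_REGISTER && v.reg == reg;
}

struct frame_choice { int framereg; int64_t framesize; };

/* Same decision order as the excerpt: FP first, then SP, else give up. */
static struct frame_choice choose_frame(struct pv_model fp, struct pv_model sp)
{
    struct frame_choice c;
    if (pv_model_is_register(fp, SP_REGNUM)) {
        c.framereg = FP_REGNUM;          /* fp = entry SP + k */
        c.framesize = -fp.k;
    } else if (pv_model_is_register(sp, SP_REGNUM)) {
        c.framereg = SP_REGNUM;          /* sp = entry SP + k */
        c.framesize = -sp.k;
    } else {
        c.framereg = -1;                 /* frame location unknown */
        c.framesize = 0;
    }
    return c;
}
```

Note that the printed k = 0xfffffffffffffee0 is simply -0x120 viewed as an unsigned 64-bit value, so framesize = -k = 0x120.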
The frame register and frame size are then used to calculate the previous stack pointer:
/* gdb/aarch64-tdep.c:aarch64_make_prologue_cache_1 (...) start */
...
aarch64_scan_prologue (this_frame, cache);
if (cache->framereg == -1)
  return;
unwound_fp = get_frame_register_unsigned (this_frame, cache->framereg);
if (unwound_fp == 0)
  return;
cache->prev_sp = unwound_fp + cache->framesize;
...
/* gdb/aarch64-tdep.c:aarch64_make_prologue_cache_1 (...) end */
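As a result, gdb interprets the situation as recursion: every unwound frame gets the same PC
(0xffffffc000083ce4, inside el1_irq) but a stack pointer 0x120 higher than the previous one, so the
frames never compare as identical and the "previous frame identical to this frame" check never fires.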
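A minimal model of why the loop is never caught, using the values from this report. It assumes the frame ID is formed from the computed prev_sp and the function start address; treat that, and all names below (frame_model, unwind_el1_irq, saw_identical_frames), as hypothetical illustrations rather than gdb's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from this report; the model itself is hypothetical. */
#define EL1_IRQ_START     UINT64_C(0xffffffc000083c80)
#define EL1_IRQ_STALE_LR  UINT64_C(0xffffffc000083ce4) /* x30, never seen saved */
#define EL1_IRQ_FRAMESIZE UINT64_C(0x120)              /* -k of the SP value */

struct frame_model { uint64_t pc, sp; };
struct frame_id_model { uint64_t stack_addr, code_addr; };

/* One prologue-based unwind step out of an el1_irq frame. */
static struct frame_model unwind_el1_irq(struct frame_model f)
{
    struct frame_model caller;
    caller.sp = f.sp + EL1_IRQ_FRAMESIZE; /* prev_sp = SP + framesize */
    caller.pc = EL1_IRQ_STALE_LR;         /* unsaved x30 reused: lands in el1_irq again */
    return caller;
}

/* Unwind up to max_frames, building each frame ID from (prev_sp, function
   start), and report whether two consecutive IDs were ever identical. */
static bool saw_identical_frames(uint64_t sp0, int max_frames)
{
    struct frame_model f = { EL1_IRQ_STALE_LR, sp0 };
    struct frame_id_model last = { 0, 0 };
    for (int i = 0; i < max_frames; i++) {
        struct frame_model caller = unwind_el1_irq(f);
        struct frame_id_model id = { caller.sp, EL1_IRQ_START };
        if (i > 0 && id.stack_addr == last.stack_addr &&
            id.code_addr == last.code_addr)
            return true;
        last = id;
        f = caller;
    }
    return false;
}
```

Because the stack address grows by 0x120 on every step while the PC stays fixed, no two IDs ever match, and the walk only ends when something else stops it.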
* Has anyone run into similar situations with the ARM prologue analyzer?
* Has anyone worked on extending the prologue analyzer to support system registers (SysRegs), and
therefore instructions like MRS?
* A defensive workaround I'm currently using is to stop the stack unwinding when the analyzer hits
an unsupported instruction.
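As a rough illustration of the two options in the questions above, here is a hypothetical hook (handle_unsupported and scan_action are invented names, not gdb functions). MRS Xt, <sysreg> is encoded with bits[31:20] equal to 0xd53, so it is easy to recognize. Since MRS only writes Xt, an extended analyzer could mark Xt as unknown and keep scanning, which would let it reach the stp x30, x21 at +76. Otherwise it falls back to the defensive workaround and asks the caller to stop unwinding. The encodings in the usage below (0xd5384036 for mrs x22, elr_el1 and 0xd5384017 for mrs x23, spsr_el1) are derived from the MRS field layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* AArch64 MRS Xt, <sysreg>: bits[31:20] are 0b1101_0101_0011 (0xd53). */
static bool is_mrs(uint32_t insn)
{
    return (insn & 0xfff00000u) == 0xd5300000u;
}

enum scan_action { SCAN_CONTINUE, SCAN_STOP_UNWIND };

/* Hypothetical policy for an instruction the scanner would otherwise reject.
   - extend_mrs: MRS only writes Xt, so forget what we knew about Xt and keep
     scanning.
   - otherwise: the defensive workaround, i.e. stop unwinding instead of
     trusting a partial analysis. */
static enum scan_action handle_unsupported(uint32_t insn, bool extend_mrs,
                                           bool reg_unknown[32])
{
    if (extend_mrs && is_mrs(insn)) {
        unsigned rt = insn & 0x1fu;
        if (rt != 31)            /* Rt == 31 encodes XZR here, not SP */
            reg_unknown[rt] = true;
        return SCAN_CONTINUE;
    }
    return SCAN_STOP_UNWIND;
}
```

Continuing past MRS only helps if the analyzer also handles the later stores; stopping is the simpler choice when it cannot.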
-Johannes