Summary: | panic when sampling backtrace() in timer.profile | ||
---|---|---|---|
Product: | systemtap | Reporter: | brendan.gregg |
Component: | runtime | Assignee: | Mark Wielaard <mark> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | mark |
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Host: | | Target: | |
Build: | | Last reconfirmed: | |
Attachments: | Make sure REG_STATE.cfa_is_expr is always set correctly add more sanity checks; x86 entry_64.S cfi fixup | ||
Description
brendan.gregg
2012-02-20 06:29:39 UTC
After some poking (I also needed to increase MAXMAPENTRIES) I could finally replicate it on 3.3.0-0.rc1.git6.1.fc17.x86_64 with systemtap-devel-1.7-1.fc17.x86_64:

[ 1305.976762] stap_e93c7971f27dbaecad41f45add08f2ea_2319: systemtap: 1.7/0.152, base: ffffffffa0662000, memory: 4291data/40text/86ctx/2058net/244893alloc kb, probes: 2
[ 1309.808090] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
[ 1309.809029] IP: [<ffffffffa06620c4>] get_uleb128+0x54/0x80 [stap_e93c7971f27dbaecad41f45add08f2ea_2319]

Found the root cause of this issue. It happens when a DW_CFA operation that defines the CFA as a DWARF expression is followed by a DW_CFA operation that (re)defines the CFA as register+offset. In that case we forgot to reset the REG_STATE.cfa_is_expr flag, which made compute_expr() interpret the reg/offset as an expr pointer (because the two share storage in a union).

While adding more sanity checks to make sure we catch such issues, I found what looks like bad CFI in the x86_64 kernel in common_interrupt (arch/x86/kernel/entry_64.S), which defines CFI "by hand" and has a CFI_DEF_CFA_REGISTER following a def_cfa_expression, which is invalid.

Created attachment 6233 [details]
Make sure REG_STATE.cfa_is_expr is always set correctly add more sanity checks
Testing the following patch on the systemtap runtime side.
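The root cause described above can be illustrated with a minimal sketch. This is not the code from runtime/unwind.c: the struct layout and the function names (`cfa_state`, `def_cfa_expression`, `def_cfa_buggy`, `def_cfa_fixed`, `would_read_expr`) are hypothetical and only mirror the idea of a flag that selects which member of a union is valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the unwinder's CFA state.
 * The register+offset rule and the expression pointer share storage. */
struct cfa_state {
    union {
        struct { unsigned long reg; long offset; } ro; /* register+offset rule */
        const uint8_t *expr;                           /* DWARF expression rule */
    } u;
    int cfa_is_expr; /* nonzero: u.expr is valid; zero: u.ro is valid */
};

/* DW_CFA_def_cfa_expression: the CFA is computed by a DWARF expression. */
static void def_cfa_expression(struct cfa_state *s, const uint8_t *expr)
{
    assert(s != NULL);
    s->u.expr = expr;
    s->cfa_is_expr = 1;
}

/* Buggy variant of DW_CFA_def_cfa: overwrites the union but leaves the
 * flag stale, so a later reader still treats the storage as a pointer. */
static void def_cfa_buggy(struct cfa_state *s, unsigned long reg, long offset)
{
    assert(s != NULL);
    s->u.ro.reg = reg;
    s->u.ro.offset = offset;
    /* missing: s->cfa_is_expr = 0; */
}

/* Fixed variant: always set the flag together with the union member. */
static void def_cfa_fixed(struct cfa_state *s, unsigned long reg, long offset)
{
    assert(s != NULL);
    s->u.ro.reg = reg;
    s->u.ro.offset = offset;
    s->cfa_is_expr = 0;
}

/* What a consumer would do next: nonzero means "evaluate u.expr". */
static int would_read_expr(const struct cfa_state *s)
{
    assert(s != NULL);
    return s->cfa_is_expr;
}
```

After `def_cfa_expression()` followed by `def_cfa_buggy()`, `would_read_expr()` still returns nonzero, so the register number and offset would be read as if they were an expression pointer. With `def_cfa_fixed()` the flag and the union member stay in sync. On x86_64, DWARF register 7 is rsp, which would be consistent with the fault address 0x7 in the oops above, although that connection is an inference and not something stated in this report.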
Created attachment 6234 [details]
x86 entry_64.S cfi fixup
This is the kernel patch that fixes the cfi for common_interrupt on x86_64.
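The invalid sequence described above (a CFI_DEF_CFA_REGISTER following a def_cfa_expression) can be detected with a small state check. The sketch below is an assumption about how such a sanity check might look, not the check added to processCFI. The helper name `first_bad_cfa_op` is hypothetical, it works on already-decoded opcodes and ignores operands, and it leaves out the `_sf` opcode variants. The opcode values are the standard DWARF ones.

```c
#include <assert.h>
#include <stddef.h>

/* Standard DWARF call frame opcode values. */
enum {
    DW_CFA_def_cfa            = 0x0c,
    DW_CFA_def_cfa_register   = 0x0d,
    DW_CFA_def_cfa_offset     = 0x0e,
    DW_CFA_def_cfa_expression = 0x0f
};

/* Which kind of CFA rule is currently in effect. */
enum cfa_rule { CFA_NONE, CFA_REG_OFFSET, CFA_EXPR };

/* Hypothetical helper: scan a list of decoded CFA opcodes and return the
 * index of the first one that is invalid for the current CFA rule, or -1
 * if the sequence is sane. DW_CFA_def_cfa_register and
 * DW_CFA_def_cfa_offset only modify an existing register+offset rule, so
 * they are rejected when the rule is undefined or is an expression. */
static int first_bad_cfa_op(const unsigned char *ops, size_t n)
{
    enum cfa_rule rule = CFA_NONE;
    size_t i;

    assert(ops != NULL || n == 0);
    for (i = 0; i < n; i++) {
        switch (ops[i]) {
        case DW_CFA_def_cfa:
            rule = CFA_REG_OFFSET;
            break;
        case DW_CFA_def_cfa_expression:
            rule = CFA_EXPR;
            break;
        case DW_CFA_def_cfa_register:
        case DW_CFA_def_cfa_offset:
            if (rule != CFA_REG_OFFSET)
                return (int)i;
            break;
        default:
            /* Opcodes that do not touch the CFA rule are ignored here. */
            break;
        }
    }
    return -1;
}
```

For the sequence def_cfa, def_cfa_expression, def_cfa_register the helper reports index 2. Inserting another def_cfa before the def_cfa_register makes the sequence valid again.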
commit 64b0cff3bee6b00cb4193ed887439c66055f85b4
Author: Mark Wielaard <mjw@redhat.com>
Date: Tue Feb 21 15:08:58 2012 +0100

    PR13714 - Make sure REG_STATE.cfa_is_expr is always set correctly.

    runtime/unwind.c (processCFI): Always set REG_STATE.cfa_is_expr and add new sanity checks to make sure the cfa definition rules are sane. Since the cfa expr pointer and cfa register/offset rule shared a union not setting REG_STATE.cfa_is_expr could result in compute_expr () wrongly being called and using the register/offset as expr pointer.

Kernel patch sent upstream: https://lkml.org/lkml/2012/2/21/154

Note that with the systemtap runtime patch any wrong DW_CFA_def sequence is detected, so you don't need an updated kernel. You do, however, need the systemtap runtime patch even with a kernel that has the kernel-cfi-fix patch applied, because there could be other (correct) CFI sequences that also trigger the bug.

Great - thanks for the quick fix!

*** Bug 260998 has been marked as a duplicate of this bug. ***