The following program:

  int func (void)
  {
    int *foo = (void *) 0x1234;
    *foo = 0x12345;
    return 0;
  }

  int main (void)
  {
    return func ();
  }

compiled with

  gcc -o bad_code bad_code.c

and the following stap script:

  probe kernel.function("show_signal_msg")
  {
    /* (PF_USER | PF_WRITE) */
    if (execname() == "bad_code")
      {
        if ($error_code & 0x6)
          {
            printf ("\nUser mode process %s [pid: %d] received a SIGSEGV - error_code: 0x%x\n",
                    execname(), pid(), $error_code)
            print_ubacktrace()
          }
      }
  }

run with:

  stap -d ./bad_code --ldd show_signal_msg.stp -c ./bad_code

produces the following (correct) user backtrace on 3.3.5-2.fc16.x86_64:

  User mode process bad_code [pid: 18431] received a SIGSEGV - error_code: 0x6
   0x400484 : func+0x10/0x1d [/usr/local/build/systemtap-obj/bad_code]
   0x40049a : main+0x9/0xf [/usr/local/build/systemtap-obj/bad_code]
   0x7fd419d1069d : __libc_start_main+0xed/0x1c0 [/lib64/libc-2.14.90.so]
   0x4003b9 : _start+0x29/0x2c [/usr/local/build/systemtap-obj/bad_code]

But on some other x86_64 kernels it produces:

  WARNING: _stp_read_address failed to access memory location

  User mode process bad_code [pid: 12152] received a SIGSEGV - error_code: 0x6
   0x400484 : func+0x10/0x1d [/home/mark/build/systemtap-obj/bad_code]
  Warning: child process exited with signal 11 (Segmentation fault)
  WARNING: Number of errors: 0, skipped probes: 1
  WARNING: /usr/local/install/systemtap/bin/staprun exited with status: 1
The issue is that on x86_64 (it doesn't happen on i686) stap tries to recover the user space registers by unwinding the kernel stack. This succeeds on the f16 kernel, and the unwinder then uses those recovered registers to do the user space unwind. But it fails on the rhel6 kernel.

See -DDEBUG_UNWIND=99 output:

  _stp_get_uregs:194: unwind levels: 15, ret: -5, pc=0xffffffff814ef8f5
  _stp_get_uregs:209: failed to recover user reg state

The unwinder then has to make do with partial register values for the user space unwind, and fails...
(In reply to comment #1)
> See -DDEBUG_UNWIND=99 output:
>
> _stp_get_uregs:194: unwind levels: 15, ret: -5, pc=0xffffffff814ef8f5
> _stp_get_uregs:209: failed to recover user reg state
>
> The unwinder then has to make do with partial register values for the user
> space unwind, and fails...

According to /proc/kallsyms:

  ffffffff814ef8d0 T page_fault
  ffffffff814ef900 T machine_check
And we do actually go through do_page_fault just before this frame:

  _stp_get_uregs:194: unwind levels: 17, ret: 0, pc=0xffffffff814f253e
  unwind:1452: pc=ffffffff814f253d, ffffffff814f253e
  unwind:1492: trying debug_frame
  set_no_state_rule:375: reg=10, where=1
  _stp_search_unwind_hdr:777: binary search for ffffffff814f253d
  _stp_search_unwind_hdr:839: fde off=26520
  _stp_search_unwind_hdr:849: returning fde=ffffffffa14be360 startLoc=ffffffff814f2500
  unwind_frame:1184: kernel: fde=ffffffffa14be360
  unwind_frame:1189: kernel: cie=ffffffffa14bde28
  parse_fde_cie:282: map retAddrReg value 16 to reg_info idx 16
  unwind_frame:1203: startLoc: ffffffff814f2500, endLoc: ffffffff814f2597
  unwind_frame:1251: cie=ffffffffa14bde28 fde=ffffffffa14be360 startLoc=ffffffff814f2500 endLoc=ffffffff814f2597, pc=ffffffff814f253d
  unwind_frame:1271: processCFI for CIE
  [...]
  unwind_frame:1426: returning 0 (ffffffff814ef8f5)
  _stp_get_uregs:194: unwind levels: 16, ret: 0, pc=0xffffffff814ef8f5
  unwind:1452: pc=ffffffff814ef8f4, ffffffff814ef8f5
  unwind:1492: trying debug_frame
  set_no_state_rule:375: reg=10, where=1
  _stp_search_unwind_hdr:777: binary search for ffffffff814ef8f4
  _stp_search_unwind_hdr:839: fde off=113238
  _stp_search_unwind_hdr:849: returning fde=ffffffffa15ab078 startLoc=ffffffff814ef680
  unwind_frame:1184: kernel: fde=ffffffffa15ab078
  unwind_frame:1189: kernel: cie=ffffffffa15aafb0
  parse_fde_cie:282: map retAddrReg value 16 to reg_info idx 16
  unwind_frame:1203: startLoc: ffffffff814ef680, endLoc: ffffffff814ef707
  unwind_frame:1205: pc (ffffffff814ef8f4) > endLoc(ffffffff814ef707)
  unwind:1496: debug_frame failed: 1, trying eh_frame
  unwind_frame:1168: Module kernel: no unwind frame data
  _stp_get_uregs:194: unwind levels: 15, ret: -5, pc=0xffffffff814ef8f5
  _stp_get_uregs:209: failed to recover user reg state

Since do_page_fault is the actual errorentry for page_fault, it looks like the CFI for do_page_fault is wrong, or we don't process it correctly.
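To make the failing step in the log easier to follow, here is a minimal sketch of an FDE lookup of the kind the log suggests: a binary search for the last entry starting at or below the pc, followed by a range check. The struct and function names are hypothetical and this is not the actual systemtap runtime code. It only illustrates why a pc that lies past the end of the nearest preceding FDE yields no unwind data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified FDE record: only the covered address range. */
struct fde_range {
    uint64_t start_loc;  /* first address covered */
    uint64_t end_loc;    /* one past the last address covered */
};

/*
 * Return the index of the FDE covering pc, or -1 if none does.
 * The table must be sorted by start_loc.
 */
int find_fde(const struct fde_range *table, size_t n, uint64_t pc)
{
    size_t lo = 0, hi = n;

    assert(table != NULL || n == 0);

    /* Binary search for the number of entries with start_loc <= pc. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].start_loc <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1;  /* pc is below every entry */

    /* The candidate only counts if pc falls inside its range. */
    if (pc >= table[lo - 1].end_loc)
        return -1;
    return (int)(lo - 1);
}
```

With a table holding just the two ranges from the log (ffffffff814ef680..ffffffff814ef707 and ffffffff814f2500..ffffffff814f2597), looking up ffffffff814f253d finds the second entry, while ffffffff814ef8f4 finds nothing because it lies beyond the end of the first.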
The CFI for do_page_fault looks as follows for 2.6.32-220.7.1.el6.x86_64:

 [ 25fe8] CIE length=20
   CIE_id:                   18446744073709551615
   version:                  3
   augmentation:             ""
   code_alignment_factor:    1
   data_alignment_factor:    -8
   return_address_register:  16

   Program:
     def_cfa r7 (rsp) at offset 8
     offset_extended_sf r16 (rip) at cfa-8
     nop
     nop
     nop
     nop
     nop

 [ 26520] FDE length=76 cie=[ 25fe8]
   CIE_pointer:              155624
   initial_location:         0xffffffff814f2500 <do_page_fault>
   address_range:            0x97

   Program:
     advance_loc4 1 to 0x1
     def_cfa_offset 16
     offset_extended_sf r6 (rbp) at cfa-16
     advance_loc4 3 to 0x4
     def_cfa_register r6 (rbp)
     advance_loc4 23 to 0x1b
     offset_extended_sf r14 (r14) at cfa-24
     offset_extended_sf r13 (r13) at cfa-32
     offset_extended_sf r12 (r12) at cfa-40
     offset_extended_sf r3 (rbx) at cfa-48
     advance_loc4 83 to 0x6e
     remember_state
     restore r6 (rbp)
     def_cfa r7 (rsp) at offset 8
     restore r14 (r14)
     restore r13 (r13)
     restore r12 (r12)
     restore r3 (rbx)
     advance_loc4 1 to 0x6f
     restore_state
     nop
     nop
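As a sanity check on that listing, the following is a rough sketch of how the CIE and FDE programs could be evaluated for a pc offset inside do_page_fault. It is a hypothetical, heavily reduced interpreter (the type and function names are made up, only the prologue instructions are modelled, and the remember_state/restore epilogue is left out), not the unwinder's actual code. Offsets are stored with the data alignment factor already applied, as eu-readelf prints them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced instruction set: just enough for the prologue. */
enum cfi_op { ADVANCE_LOC, DEF_CFA, DEF_CFA_OFFSET, DEF_CFA_REGISTER, OFFSET };

struct cfi_insn {
    enum cfi_op op;
    int reg;     /* DWARF register number (unused by some ops) */
    long value;  /* location delta, CFA offset, or offset from the CFA */
};

#define NREGS 17 /* DWARF x86_64 registers 0..16; 16 is rip */

struct cfi_row {
    int cfa_reg;       /* CFA = value of cfa_reg + cfa_off */
    long cfa_off;
    int saved[NREGS];  /* non-zero: register was saved at CFA + off[reg] */
    long off[NREGS];
};

/* Apply instructions; return 1 once the next advance would pass target. */
static int cfi_run(const struct cfi_insn *p, size_t n, unsigned long *loc,
                   unsigned long target, struct cfi_row *row)
{
    for (size_t i = 0; i < n; i++) {
        switch (p[i].op) {
        case ADVANCE_LOC:
            if (*loc + (unsigned long)p[i].value > target)
                return 1;  /* the row covering target is complete */
            *loc += (unsigned long)p[i].value;
            break;
        case DEF_CFA:
            row->cfa_reg = p[i].reg;
            row->cfa_off = p[i].value;
            break;
        case DEF_CFA_OFFSET:
            row->cfa_off = p[i].value;
            break;
        case DEF_CFA_REGISTER:
            row->cfa_reg = p[i].reg;
            break;
        case OFFSET:
            assert(p[i].reg >= 0 && p[i].reg < NREGS);
            row->saved[p[i].reg] = 1;
            row->off[p[i].reg] = p[i].value;
            break;
        }
    }
    return 0;
}

static const struct cfi_insn cie_prog[] = {
    { DEF_CFA, 7, 8 },    /* def_cfa r7 (rsp) at offset 8 */
    { OFFSET, 16, -8 },   /* rip at cfa-8 */
};

static const struct cfi_insn fde_prog[] = {
    { ADVANCE_LOC, 0, 1 },
    { DEF_CFA_OFFSET, 0, 16 },
    { OFFSET, 6, -16 },          /* rbp at cfa-16 */
    { ADVANCE_LOC, 0, 3 },
    { DEF_CFA_REGISTER, 6, 0 },  /* CFA now based on rbp */
    { ADVANCE_LOC, 0, 23 },
    { OFFSET, 14, -24 },
    { OFFSET, 13, -32 },
    { OFFSET, 12, -40 },
    { OFFSET, 3, -48 },          /* rbx at cfa-48 */
    { ADVANCE_LOC, 0, 83 },      /* epilogue ops after this are omitted */
};

/* Compute the unwind row for an offset into do_page_fault (< 0x6e). */
void do_page_fault_row(unsigned long pc_off, struct cfi_row *row)
{
    unsigned long loc = 0;

    assert(row != NULL);
    assert(pc_off < 0x6e);  /* the omitted epilogue starts at 0x6e */
    memset(row, 0, sizeof(*row));
    if (!cfi_run(cie_prog, sizeof(cie_prog) / sizeof(cie_prog[0]),
                 &loc, pc_off, row))
        cfi_run(fde_prog, sizeof(fde_prog) / sizeof(fde_prog[0]),
                &loc, pc_off, row);
}
```

For pc=ffffffff814f253d (offset 0x3d) this gives CFA = rbp + 16 with the return address stored at CFA - 8, which is an ordinary frame-pointer layout and is consistent with the unwinder stepping out of do_page_fault successfully.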
The problem isn't the CFI for do_page_fault, but that there is no CFI for page_fault. Nor does there seem to be any CFI for any assembly symbol defined in entry_64.S. That explains why unwinding to the kernel/user space boundary just fails.

No idea yet why the CFI isn't included in /usr/lib/debug/lib/modules/*/vmlinux for the RHEL6 kernel; it certainly is there in the entry_64.S source code. It is also present in the Fedora version:

  $ eu-readelf --debug-dump=frames /usr/lib/debug/lib/modules/3.3.5-2.fc16.x86_64/vmlinux | grep -B2 -A1 page_fault
   [  7ae0] FDE length=68 cie=[  6da8]
     CIE_pointer:              28072
     initial_location:         0xffffffff815f4850 <page_fault>
     address_range:            0x2a
Looks like the RHEL6 kernel is missing this:

commit 9e565292270a2d55524be38835104c564ac8f795
Author: Roland McGrath <roland@redhat.com>
Date:   Thu May 13 21:43:03 2010 -0700

    x86: Use .cfi_sections for assembly code

    The newer assemblers support the .cfi_sections directive so we can put
    the CFI from .S files into the .debug_frame section that is preserved
    in unstripped vmlinux and in separate debuginfo, rather than the
    .eh_frame section that is now discarded by vmlinux.lds.S.

    Signed-off-by: Roland McGrath <roland@redhat.com>
    LKML-Reference: <20100514044303.A6FE7400BE@magilla.sf.frob.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Not really "fixed", but with the proper kernel patch, see comment #5, this should just work. Added a testcase to check the system is behaving properly.

commit 07c9d78ebb28b888f01aed9c206e724f0e72db25
Author: Mark Wielaard <mjw@redhat.com>
Date:   Mon May 21 12:57:41 2012 +0200

    Add testcase for PR14107 Bad user unwinding from kernel fatal signal handler

    This is really a kernel bug, see bug report, when the CFI for the
    assembly code is missing we cannot properly recover the register
    state for the user process and might give a bad/missing user
    backtrace.