Bug 14107 - Bad user unwinding from kernel fatal signal handler for some x86_64 kernels
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: runtime
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-05-14 15:39 UTC by Mark Wielaard
Modified: 2012-05-21 11:01 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:


Description Mark Wielaard 2012-05-14 15:39:55 UTC
The following program:

int
func (void)
{
        int *foo = (void *) 0x1234;
        *foo = 0x12345;
        return 0;
}

int
main (void)
{
  return func ();
}

compiled with gcc -o bad_code bad_code.c and the following stap script:

probe kernel.function("show_signal_msg") {
        /*(PF_USER | PF_WRITE) */
        if (execname() == "bad_code") {
                if ($error_code & 0x6) {
                        printf ("\nUser mode process %s [pid: %d] received a SIGSEGV - error_code: 0x%x\n", execname(), pid(), $error_code)
                        print_ubacktrace()
                }
        }
}


run with: stap -d ./bad_code --ldd show_signal_msg.stp -c ./bad_code

produces the following (correct) user backtrace on 3.3.5-2.fc16.x86_64:

User mode process bad_code [pid: 18431] received a SIGSEGV - error_code: 0x6
 0x400484 : func+0x10/0x1d [/usr/local/build/systemtap-obj/bad_code]
 0x40049a : main+0x9/0xf [/usr/local/build/systemtap-obj/bad_code]
 0x7fd419d1069d : __libc_start_main+0xed/0x1c0 [/lib64/libc-2.14.90.so]
 0x4003b9 : _start+0x29/0x2c [/usr/local/build/systemtap-obj/bad_code]

But on some other x86_64 kernels it produces:

WARNING: _stp_read_address failed to access memory location

User mode process bad_code [pid: 12152] received a SIGSEGV - error_code: 0x6
 0x400484 : func+0x10/0x1d [/home/mark/build/systemtap-obj/bad_code]
Warning: child process exited with signal 11 (Segmentation fault)
WARNING: Number of errors: 0, skipped probes: 1
WARNING: /usr/local/install/systemtap/bin/staprun exited with status: 1
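
For reference, the error_code value 0x6 in the output above is the combination of the write and user-mode bits of the x86 page-fault error code. The following is a minimal sketch of decoding it; the bit names follow the kernel's arch/x86/mm/fault.c of that era, while the helper name is_user_write_fault is hypothetical and not part of SystemTap or the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (names as used in arch/x86/mm/fault.c). */
enum {
    PF_PROT  = 1 << 0,  /* 0: page not present, 1: protection violation */
    PF_WRITE = 1 << 1,  /* 0: read access,      1: write access */
    PF_USER  = 1 << 2   /* 0: kernel mode,      1: user mode */
};

/* Hypothetical helper: true only when BOTH the user and write bits are set. */
bool is_user_write_fault(uint64_t error_code)
{
    const uint64_t mask = PF_USER | PF_WRITE;   /* 0x6 */
    return (error_code & mask) == mask;
}
```

Note that the script's test, $error_code & 0x6, is true when either bit is set; comparing the masked value against 0x6, as in the sketch, would require both.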
Comment 1 Mark Wielaard 2012-05-14 15:47:52 UTC
The issue is that on x86_64 (it doesn't happen on i686) stap tries to recover the user space registers by unwinding the kernel stack. This succeeds on the f16 kernel and then the unwinder takes those recovered registers to do the user space unwind. But it fails on the rhel6 kernel. See -DDEBUG_UNWIND=99 output:

_stp_get_uregs:194: unwind levels: 15, ret: -5, pc=0xffffffff814ef8f5
_stp_get_uregs:209: failed to recover user reg state

And then for user space the unwinder has to make do with partial register values and fails...
Comment 2 Mark Wielaard 2012-05-14 22:16:13 UTC
(In reply to comment #1)
> See -DDEBUG_UNWIND=99 output:
> 
> _stp_get_uregs:194: unwind levels: 15, ret: -5, pc=0xffffffff814ef8f5
> _stp_get_uregs:209: failed to recover user reg state
> 
> And then for user space the unwinder has to make do with partial register
> values and fails...

According to /proc/kallsyms:

ffffffff814ef8d0 T page_fault
ffffffff814ef900 T machine_check
Comment 3 Mark Wielaard 2012-05-14 22:22:51 UTC
And we do actually go through do_page_fault just before this frame:

_stp_get_uregs:194: unwind levels: 17, ret: 0, pc=0xffffffff814f253e
unwind:1452: pc=ffffffff814f253d, ffffffff814f253e
unwind:1492: trying debug_frame
set_no_state_rule:375: reg=10, where=1
_stp_search_unwind_hdr:777: binary search for ffffffff814f253d
_stp_search_unwind_hdr:839: fde off=26520
_stp_search_unwind_hdr:849: returning fde=ffffffffa14be360 startLoc=ffffffff814f2500
unwind_frame:1184: kernel: fde=ffffffffa14be360
unwind_frame:1189: kernel: cie=ffffffffa14bde28
parse_fde_cie:282: map retAddrReg value 16 to reg_info idx 16
unwind_frame:1203: startLoc: ffffffff814f2500, endLoc: ffffffff814f2597
unwind_frame:1251: cie=ffffffffa14bde28 fde=ffffffffa14be360 startLoc=ffffffff814f2500 endLoc=ffffffff814f2597, pc=ffffffff814f253d
unwind_frame:1271: processCFI for CIE
[...]
unwind_frame:1426: returning 0 (ffffffff814ef8f5)
_stp_get_uregs:194: unwind levels: 16, ret: 0, pc=0xffffffff814ef8f5
unwind:1452: pc=ffffffff814ef8f4, ffffffff814ef8f5
unwind:1492: trying debug_frame
set_no_state_rule:375: reg=10, where=1
_stp_search_unwind_hdr:777: binary search for ffffffff814ef8f4
_stp_search_unwind_hdr:839: fde off=113238
_stp_search_unwind_hdr:849: returning fde=ffffffffa15ab078 startLoc=ffffffff814ef680
unwind_frame:1184: kernel: fde=ffffffffa15ab078
unwind_frame:1189: kernel: cie=ffffffffa15aafb0
parse_fde_cie:282: map retAddrReg value 16 to reg_info idx 16
unwind_frame:1203: startLoc: ffffffff814ef680, endLoc: ffffffff814ef707
unwind_frame:1205: pc (ffffffff814ef8f4) > endLoc(ffffffff814ef707)
unwind:1496: debug_frame failed: 1, trying eh_frame
unwind_frame:1168: Module kernel: no unwind frame data
_stp_get_uregs:194: unwind levels: 15, ret: -5, pc=0xffffffff814ef8f5
_stp_get_uregs:209: failed to recover user reg state

Since do_page_fault is the actual errorentry for page_fault it looks like the CFI for do_page_fault is wrong, or we don't process it correctly.
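
The "pc > endLoc" line in the log reads like a coverage check: the binary search lands on the nearest FDE that starts at or below the pc, and that FDE is then rejected because the pc lies beyond its range. A minimal sketch of that kind of lookup follows. It is not the SystemTap runtime code; the names are hypothetical, only the two ranges seen in the log are included, and treating endLoc as exclusive is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

struct fde_range {
    uint64_t start;  /* startLoc from the log */
    uint64_t end;    /* endLoc from the log, assumed exclusive */
};

/* The two FDE ranges that appear in the debug output, sorted by start. */
const struct fde_range fde_table[] = {
    { 0xffffffff814ef680ULL, 0xffffffff814ef707ULL },
    { 0xffffffff814f2500ULL, 0xffffffff814f2597ULL },  /* do_page_fault */
};

/* Return the FDE covering pc, or NULL if no FDE covers it.
   Assumes tab is sorted by ascending start address. */
const struct fde_range *fde_find(const struct fde_range *tab, size_t n,
                                 uint64_t pc)
{
    size_t lo = 0, hi = n;

    /* Binary search for the first entry whose start is > pc. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].start <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;            /* pc is below every FDE */

    /* The candidate is the last entry with start <= pc; it must also
       contain pc, otherwise there is simply no CFI for this address. */
    const struct fde_range *f = &tab[lo - 1];
    return pc < f->end ? f : NULL;
}
```

With these inputs, 0xffffffff814f253d resolves to the do_page_fault range, while 0xffffffff814ef8f4 finds the range starting at 0xffffffff814ef680 as the nearest candidate and is rejected, matching the two outcomes in the log.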

The CFI for do_page_fault looks as follows for 2.6.32-220.7.1.el6.x86_64:

 [ 25fe8] CIE length=20
   CIE_id:                   18446744073709551615
   version:                  3
   augmentation:             ""
   code_alignment_factor:    1
   data_alignment_factor:    -8
   return_address_register:  16

   Program:
     def_cfa r7 (rsp) at offset 8
     offset_extended_sf r16 (rip) at cfa-8
     nop
     nop
     nop
     nop
     nop

 [ 26520] FDE length=76 cie=[ 25fe8]
   CIE_pointer:              155624
   initial_location:         0xffffffff814f2500 <do_page_fault>
   address_range:            0x97

   Program:
     advance_loc4 1 to 0x1
     def_cfa_offset 16
     offset_extended_sf r6 (rbp) at cfa-16
     advance_loc4 3 to 0x4
     def_cfa_register r6 (rbp)
     advance_loc4 23 to 0x1b
     offset_extended_sf r14 (r14) at cfa-24
     offset_extended_sf r13 (r13) at cfa-32
     offset_extended_sf r12 (r12) at cfa-40
     offset_extended_sf r3 (rbx) at cfa-48
     advance_loc4 83 to 0x6e
     remember_state
     restore r6 (rbp)
     def_cfa r7 (rsp) at offset 8
     restore r14 (r14)
     restore r13 (r13)
     restore r12 (r12)
     restore r3 (rbx)
     advance_loc4 1 to 0x6f
     restore_state
     nop
     nop
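
To read the dump above: the pc being unwound, 0xffffffff814f253d, is at offset 0x3d into do_page_fault, so the rows up to the advance to 0x6e apply. The sketch below replays only the CFA-related instructions of this CIE and FDE to find the CFA rule at a given offset. It is a simplified illustration with made-up type and function names, not the SystemTap unwinder: register save rules are omitted, remember_state uses a single slot instead of a stack, and restore_state is assumed to restore the CFA rule as well.

```c
#include <stddef.h>
#include <stdint.h>

enum cfi_op {
    OP_ADVANCE_TO,        /* arg: new location (offset into the function) */
    OP_DEF_CFA,           /* reg, arg: CFA = reg + arg */
    OP_DEF_CFA_OFFSET,    /* arg: keep register, change offset */
    OP_DEF_CFA_REGISTER,  /* reg: keep offset, change register */
    OP_REMEMBER_STATE,
    OP_RESTORE_STATE
};

struct cfi_insn { enum cfi_op op; unsigned reg; long arg; };
struct cfa_rule { unsigned reg; long off; };

enum { REG_RBP = 6, REG_RSP = 7 };  /* DWARF register numbers from the dump */

/* CFA-related instructions transcribed from the CIE and FDE above. */
const struct cfi_insn do_page_fault_cfi[] = {
    { OP_DEF_CFA, REG_RSP, 8 },           /* CIE: def_cfa r7 (rsp) at offset 8 */
    { OP_ADVANCE_TO, 0, 0x1 },
    { OP_DEF_CFA_OFFSET, 0, 16 },
    { OP_ADVANCE_TO, 0, 0x4 },
    { OP_DEF_CFA_REGISTER, REG_RBP, 0 },
    { OP_ADVANCE_TO, 0, 0x1b },           /* register saves omitted */
    { OP_ADVANCE_TO, 0, 0x6e },
    { OP_REMEMBER_STATE, 0, 0 },
    { OP_DEF_CFA, REG_RSP, 8 },
    { OP_ADVANCE_TO, 0, 0x6f },
    { OP_RESTORE_STATE, 0, 0 },
};

/* Replay the program and return the CFA rule in effect at offset target. */
struct cfa_rule cfa_rule_at(const struct cfi_insn *prog, size_t n, long target)
{
    struct cfa_rule cur = { 0, 0 }, saved = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        switch (prog[i].op) {
        case OP_ADVANCE_TO:
            /* Instructions after this point describe later code. */
            if (prog[i].arg > target)
                return cur;
            break;
        case OP_DEF_CFA:
            cur.reg = prog[i].reg;
            cur.off = prog[i].arg;
            break;
        case OP_DEF_CFA_OFFSET:
            cur.off = prog[i].arg;
            break;
        case OP_DEF_CFA_REGISTER:
            cur.reg = prog[i].reg;
            break;
        case OP_REMEMBER_STATE:
            saved = cur;
            break;
        case OP_RESTORE_STATE:
            cur = saved;
            break;
        }
    }
    return cur;
}
```

For offset 0x3d this yields CFA = rbp + 16, so per the dump the return address sits at cfa-8 and the caller's rbp at cfa-16. That is consistent with the log, where unwinding do_page_fault succeeds and returns pc ffffffff814ef8f5.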
Comment 4 Mark Wielaard 2012-05-15 14:07:27 UTC
The problem isn't the CFI for do_page_fault, but that there is no CFI for page_fault. Nor does there seem to be any CFI for any assembly symbol defined in entry_64.S. Which explains why unwinding to the kernel/user space barrier just fails.

No idea yet why the CFI isn't included in /usr/lib/debug/lib/modules/*/vmlinux for the RHEL6 kernel; it certainly is there in the entry_64.S source code. And it is also in the Fedora version:
$ eu-readelf --debug-dump=frames /usr/lib/debug/lib/modules/3.3.5-2.fc16.x86_64/vmlinux | grep -B2 -A1 page_fault
 [  7ae0] FDE length=68 cie=[  6da8]
   CIE_pointer:              28072
   initial_location:         0xffffffff815f4850 <page_fault>
   address_range:            0x2a
Comment 5 Mark Wielaard 2012-05-15 14:14:41 UTC
Looks like the RHEL6 kernel is missing this:

commit 9e565292270a2d55524be38835104c564ac8f795
Author: Roland McGrath <roland@redhat.com>
Date:   Thu May 13 21:43:03 2010 -0700

    x86: Use .cfi_sections for assembly code
    
    The newer assemblers support the .cfi_sections directive so we can put
    the CFI from .S files into the .debug_frame section that is preserved
    in unstripped vmlinux and in separate debuginfo, rather than the
    .eh_frame section that is now discarded by vmlinux.lds.S.
    
    Signed-off-by: Roland McGrath <roland@redhat.com>
    LKML-Reference: <20100514044303.A6FE7400BE@magilla.sf.frob.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Comment 6 Mark Wielaard 2012-05-21 11:01:19 UTC
Not really "fixed", but with the proper kernel patch, see comment #5, this should just work. Added a testcase to check the system is behaving properly.

commit 07c9d78ebb28b888f01aed9c206e724f0e72db25
Author: Mark Wielaard <mjw@redhat.com>
Date:   Mon May 21 12:57:41 2012 +0200

    Add testcase for PR14107 Bad user unwinding from kernel fatal signal handler
    
    This is really a kernel bug, see bug report, when the CFI for the assembly
    code is missing we cannot properly recover the register state for the user
    process and might give a bad/missing user backtrace.