Panicking the system from SystemTap

Problem

Sometimes it's useful to cause the system to panic when a particular event happens. This can be used to obtain a vmcore file via netdump, diskdump or kdump in order to carry out post-mortem debugging using a tool like crash.

Scripts

# Include the header that declares panic()
%{
#include <linux/kernel.h>
%}

# Wrap panic() in stap (SystemTap 1.8 and later spell the argument STAP_ARG_msg)
function panic(msg:string) %{
        panic("%s", THIS->msg);
%}

# Tell the user what we're doing
probe begin {
        printf("panic on OOM enabled\n")
}

probe end {
        printf("panic on OOM disabled\n")
}

# Just probe __oom_kill_task - it's after sysctl etc. checks in oom_kill
probe kernel.function("__oom_kill_task") {
        panic("__oom_kill_task called\n")
}

Output

This script must be run in guru mode (-g), since it uses embedded C to call the kernel's panic() routine.

# stap -g panic-on-oom.stp
panic on OOM enabled

When an OOM kill occurs:
oom-killer: gfp_mask=0xd0
Mem-info:
[SNIP]
0 bounce buffer pages
Free swap:            0kB
523914 pages of RAM
294538 pages of HIGHMEM
5594 reserved pages
264 pages shared
0 pages swap cached
Kernel panic - not syncing: __oom_kill_task called

------------[ cut here ]------------
kernel BUG at kernel/panic.c:75!
invalid operand: 0000 [#1]
SMP 
Modules linked in: netconsole netdump stap_a48a9d50ed21c03a01970dd07bd4b2f2_392(U) md5 ipv6 parport_pc lp parport autofs4 sunrpc loop dm_multipath usb_storage button battery ac uhci_hcd ehci_hcd hw_random snd_azx snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata sd_mod scsi_mod
CPU:    1
EIP:    0060:[<c0122106>]    Not tainted VLI
EFLAGS: 00010086   (2.6.9-55.ELsmp) 
EIP is at panic+0x47/0x147
eax: 00000043   ebx: f401a200   ecx: f60b8cf0   edx: c02e774b
esi: f60b8dc4   edi: f60b8dc4   ebp: c2022120   esp: f60b8cf8
ds: 007b   es: 007b   ss: 0068
Process sshd (pid: 4265, threadinfo=f60b8000 task=f66ca330)
Stack: f401a200 f8aa2596 f8aa332b f401a2b4 f8aa25df f8aa7120 f8aa26da 00000000 
       00000000 00000000 cc9867cb 00000155 00000096 f401a200 c2022100 f8aa7120 
       f60b8dc4 c2022120 c011947b f89d3da0 f60b8000 c0143427 00000000 c032ae3c 
[SNIP]

Lessons

Sometimes it's useful to be able to panic a box when a particular event happens or some condition becomes true. Post-mortem debugging from a memory image can be a powerful way to understand a problem, but triggering a crash at just the right moment can be difficult or may require a custom kernel patch. SystemTap allows this functionality to be added on the fly. Although this example hooks into the OOM killer routines, the same basic idea can be adapted to many different problems.
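
As one illustration of panicking only when "some condition becomes true", the sketch below reuses the panic() wrapper from the script above and adds a filter on the current process name. The process name "sshd" is only a placeholder and the filter itself is an assumption made for the example, not part of the original script. Like the original, it needs guru mode (-g) and will bring the machine down when the condition matches, so it should only be tried on a box that is set up to capture a vmcore.

```systemtap
# Sketch only: same panic() wrapper as the script above, plus a
# hypothetical condition on the current process name.
%{
#include <linux/kernel.h>
%}

function panic(msg:string) %{
        /* THIS->msg follows the script above; SystemTap 1.8 and
           later use STAP_ARG_msg instead */
        panic("%s", THIS->msg);
%}

probe kernel.function("__oom_kill_task") {
        # "sshd" is a placeholder. execname() names the task running
        # when the probe fires, which need not be the task the OOM
        # killer selected as its victim.
        if (execname() == "sshd")
                panic("__oom_kill_task called in sshd context\n")
}
```

Any expression that can be evaluated in the probe handler could replace the execname() test, for example a global counter that panics only after the probe has fired a given number of times.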


WarStories

None: WSPanicOnOom (last edited 2008-03-19 17:47:15 by nat-pool-fab)