Panicking the system from systemtap
Problem
Sometimes it's useful to cause the system to panic when a particular event happens. This can be used to obtain a vmcore file via netdump, diskdump or kdump in order to carry out post-mortem debugging using a tool like crash.
Scripts
# Include the header that declares panic()
%{
#include <linux/kernel.h>
%}
# Wrap the kernel's panic() in a stap function.
# THIS->msg refers to the function's msg argument.
function panic(msg:string) %{
    panic("%s", THIS->msg);
%}
# Tell the user what we're doing
probe begin {
    printf("panic on OOM enabled\n")
}
probe end {
    printf("panic on OOM disabled\n")
}
# Probe __oom_kill_task: it runs after the sysctl and other checks in
# the OOM killer, so it fires only when a task is actually being killed
probe kernel.function("__oom_kill_task") {
    panic("__oom_kill_task called\n")
}
Output
This script must be run in guru mode (-g), since it uses embedded C to call the kernel's panic() routine.
# stap -g panic-on-oom.stp
panic on OOM enabled
When an OOM kill occurs:
oom-killer: gfp_mask=0xd0
Mem-info:
[SNIP]
0 bounce buffer pages
Free swap: 0kB
523914 pages of RAM
294538 pages of HIGHMEM
5594 reserved pages
264 pages shared
0 pages swap cached
Kernel panic - not syncing: __oom_kill_task called
------------[ cut here ]------------
kernel BUG at kernel/panic.c:75!
invalid operand: 0000 [#1]
SMP
Modules linked in: netconsole netdump stap_a48a9d50ed21c03a01970dd07bd4b2f2_392(U) md5 ipv6 parport_pc lp parport autofs4 sunrpc loop dm_multipath usb_storage button battery ac uhci_hcd ehci_hcd hw_random snd_azx snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata sd_mod scsi_mod
CPU: 1
EIP: 0060:[<c0122106>] Not tainted VLI
EFLAGS: 00010086 (2.6.9-55.ELsmp)
EIP is at panic+0x47/0x147
eax: 00000043 ebx: f401a200 ecx: f60b8cf0 edx: c02e774b
esi: f60b8dc4 edi: f60b8dc4 ebp: c2022120 esp: f60b8cf8
ds: 007b es: 007b ss: 0068
Process sshd (pid: 4265, threadinfo=f60b8000 task=f66ca330)
Stack: f401a200 f8aa2596 f8aa332b f401a2b4 f8aa25df f8aa7120 f8aa26da 00000000
00000000 00000000 cc9867cb 00000155 00000096 f401a200 c2022100 f8aa7120
f60b8dc4 c2022120 c011947b f89d3da0 f60b8000 c0143427 00000000 c032ae3c
[SNIP]
Lessons
Sometimes it's useful to be able to panic a box when a particular event happens or some condition becomes true. Post-mortem debugging from a memory image can be a powerful way to understand a problem, but capturing the image at just the right moment can be difficult and may require custom kernel patches to trigger the crash. Systemtap allows this functionality to be added on the fly, without rebuilding the kernel. Although this example hooks into the OOM killer, the same basic idea can be adapted to many other problems.
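As a sketch of triggering on a condition rather than on a single event, the variant below lets the first few OOM kills proceed normally and panics only on the third one. The threshold of 3 and the global name oom_kills are illustrative assumptions, not part of the original script. It reuses the __oom_kill_task probe point from above, which may not exist under that name on every kernel version. Like the original, it needs guru mode (-g). Newer Systemtap releases prefer STAP_ARG_msg over THIS->msg for referring to an embedded-C function argument.

```systemtap
# Sketch: panic on the 3rd OOM kill instead of the 1st.
# The threshold (3) is an arbitrary, illustrative value.
%{
#include <linux/kernel.h>
%}

# Wrap the kernel's panic(); requires guru mode (-g).
function panic(msg:string) %{
    panic("%s", THIS->msg);
%}

global oom_kills  # number of __oom_kill_task calls seen so far

probe begin {
    printf("panic on 3rd OOM kill enabled\n")
}

probe kernel.function("__oom_kill_task") {
    oom_kills++
    if (oom_kills < 3) {
        # Let the OOM killer proceed, but report that it ran.
        printf("OOM kill %d observed, not panicking yet\n", oom_kills)
    } else {
        panic("third __oom_kill_task call\n")
    }
}
```

It is started the same way as the original, e.g. stap -g panic-on-3rd-oom.stp (a hypothetical filename). The same pattern of keeping state in a global and calling panic() only once a test passes works with any probe point and condition that Systemtap can express.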
