Using the latest git (HEAD e8402528a) with latest available elfutils on F9 (0.133-3) on an x86 (32bit) machine running 2.6.26-rc5 kernel, I see the following Oops on every script executed: BUG: unable to handle kernel NULL pointer dereference at 00000000 IP: [<00000000>] *pde = 00000000 Oops: 0000 [#1] Modules linked in: stap_25635fa2767efdc6a286d47c0380042a_404(+) iTCO_wdt iTCO_vendor_support sg intel_agp floppy e100 i2c_i801 agpgart rng_core button pcspkr mii sr_mod cdrom i2c_i810 i2c_algo_bit i2c_core ata_generic ata_piix libata dock sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ehci_hcd [last unloaded: microcode] Pid: 1830, comm: staprun Not tainted (2.6.26-rc5 #1) EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0 EIP is at 0x0 EAX: 00000000 EBX: df778e88 ECX: 00000000 EDX: df778e88 ESI: e0973568 EDI: c11a0e04 EBP: e0973700 ESP: df778df0 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 Process staprun (pid: 1830, ti=df778000 task=df79dc80 task.ti=df778000) Stack: e092b053 ffffffe1 74737973 61746d65 df400070 00000005 c04ac269 df778e58 00001360 00000000 00001709 00000005 df40a6c0 df407000 00000002 c13dce98 c13dce80 00000672 70617473 3635325f 61663533 37363732 63646665 38326136 Call Trace: [<e092b053>] _stp_transport_init+0x2b7/0x43b [stap_25635fa2767efdc6a286d47c0380042a_404] [<c04ac269>] ida_get_new_above+0xc2/0x17d [<c0433103>] sys_init_module+0x1599/0x16fb [<c04665b2>] do_path_lookup+0x154/0x199 [<c04af285>] simple_strtoull+0x0/0xf0 [<c040371e>] syscall_call+0x7/0xb ======================= Code: Bad EIP value. EIP: [<00000000>] 0x0 SS:ESP 0068:df778df0 ---[ end trace df61255cfec2f882 ]--- The machine is usable, but the stap modules seem to be still inserted - lsmod lists them. A forced rmmod doesn't work either as the module refcount is 1. The .config does show that required options are 'y': [ananth@... linux-2.6.26-rc5]$ cat .config|grep PROBE ... CONFIG_KPROBES=y CONFIG_KRETPROBES=y CONFIG_HAVE_KPROBES=y CONFIG_HAVE_KRETPROBES=y CONFIG_KPROBES_SANITY_TEST=y # CONFIG_SAMPLE_KPROBES is not set [ananth@... linux-2.6.26-rc5]$ cat .config|grep RELAY CONFIG_RELAY=y [ananth@... linux-2.6.26-rc5]$ cat .config|grep DEBUG_INFO CONFIG_DEBUG_INFO=y
Subject: Re: New: SystemTap scripts fail with an Oops on x86 Is that relocatable kernel? ananth at in dot ibm dot com wrote: > Using the latest git (HEAD e8402528a) with latest available elfutils on F9 > (0.133-3) on an x86 (32bit) machine running 2.6.26-rc5 kernel, I see the > following Oops on every script executed: > > BUG: unable to handle kernel NULL pointer dereference at 00000000 > IP: [<00000000>] > *pde = 00000000 > Oops: 0000 [#1] > Modules linked in: stap_25635fa2767efdc6a286d47c0380042a_404(+) iTCO_wdt > iTCO_vendor_support sg intel_agp floppy e100 i2c_i801 agpgart rng_core button > pcspkr mii sr_mod cdrom i2c_i810 i2c_algo_bit i2c_core ata_generic ata_piix > libata dock sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ehci_hcd [last unloaded: > microcode] > > Pid: 1830, comm: staprun Not tainted (2.6.26-rc5 #1) > EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0 > EIP is at 0x0 > EAX: 00000000 EBX: df778e88 ECX: 00000000 EDX: df778e88 > ESI: e0973568 EDI: c11a0e04 EBP: e0973700 ESP: df778df0 > DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 > Process staprun (pid: 1830, ti=df778000 task=df79dc80 task.ti=df778000) > Stack: e092b053 ffffffe1 74737973 61746d65 df400070 00000005 c04ac269 df778e58 > 00001360 00000000 00001709 00000005 df40a6c0 df407000 00000002 c13dce98 > c13dce80 00000672 70617473 3635325f 61663533 37363732 63646665 38326136 > Call Trace: > [<e092b053>] _stp_transport_init+0x2b7/0x43b > [stap_25635fa2767efdc6a286d47c0380042a_404] > [<c04ac269>] ida_get_new_above+0xc2/0x17d > [<c0433103>] sys_init_module+0x1599/0x16fb > [<c04665b2>] do_path_lookup+0x154/0x199 > [<c04af285>] simple_strtoull+0x0/0xf0 > [<c040371e>] syscall_call+0x7/0xb > ======================= > Code: Bad EIP value. > EIP: [<00000000>] 0x0 SS:ESP 0068:df778df0 > ---[ end trace df61255cfec2f882 ]--- > > The machine is usable, but the stap modules seem to be still inserted - lsmod > lists them. A forced rmmod doesn't work either as the module refcount is 1. > > The .config does show that required options are 'y': > > [ananth@... linux-2.6.26-rc5]$ cat .config|grep PROBE > ... > CONFIG_KPROBES=y > CONFIG_KRETPROBES=y > CONFIG_HAVE_KPROBES=y > CONFIG_HAVE_KRETPROBES=y > CONFIG_KPROBES_SANITY_TEST=y > # CONFIG_SAMPLE_KPROBES is not set > > [ananth@... linux-2.6.26-rc5]$ cat .config|grep RELAY > CONFIG_RELAY=y > > [ananth@... linux-2.6.26-rc5]$ cat .config|grep DEBUG_INFO > CONFIG_DEBUG_INFO=y >
Created attachment 2778 [details] kernel .config that fails Here is the config that fails. CONFIG_RELOCATABLE=y But that shouldn't matter as stap works fine with stock F9 kernels that also have this set.
Created attachment 2779 [details] Diff between the failing and working configs Here is the diff between the failing and working configs. Lots of driver stuff, but its mostly benign.
Other options toggled and tried unsuccessfully - CONFIG_MODVERSIONS, CONFIG_4KSTACKS
Try to decode the oops (disassemble the module_init function and figure out what c-level statement dies).
The NULL pointer deref is coming from the modules_op->start() call in runtime/transport/symbols.c: 606 static int _stp_init_modules(void) .... 611 const struct seq_operations *modules_op = (const struct seq_oper ations *)_stp_kallsyms_lookup_name("modules_op"); 612 ...... 617 618 /* Use the seq_file interface to safely get a list of installed modules */ 619 res = modules_op->start(NULL, &pos); <-- here! 620 while (res) { The reason is that _stp_kallsyms_lookup_name returns an incorrect address (it returns c11a0e0c while the actual address is c05a0e0c): [root@jataayu mod]# cat /proc/kallsyms |grep modules_op c048eee9 t modules_open c05a0e0c R modules_op c05a40d8 r proc_modules_operations I've verified that *(c11a0e0c) is NULL causing the BUG.
Yeah, I have been suspicious of that code (it came in with the new unwind/symbol code). I'm worried even about the way it performs _stext relocation. We'll have to revisit this area too.
One further observation from the different runs is that its only the second and third nibbles from MSB that are corrupt (changed to 10 or 11 from the correct values). Wierd!
To reproduce the problem, just make a little kernel config change and compile it again. Then execute any stap script, it will fail in _stp_transport_init (kernel oops). Both _stp_kallsyms_lookup_name and _stp_init_kernel_symbols depend on _stp_kernel_symbols which is from debuginfo file or /proc/kallsyms. I believe runtime facility such as transport shouldn't trust debuginfo too much . The incorrect debuginfo would result in kprobe register/trigger error, not earlier stp_transport_init. The origin code about retrieving kernel symbols is to read /sys/modules/ which looks like safer although not very graceful.
dwfl_module_getsym() is returning the incorrect address: dwfl_module_getsym: name is modules_op, st_value is c11d1580 Given this is extracted from the debuginfo, is the debuginfo creation buggy? Further, distro shipped kernels work, while custom built kernels don't, which gives credence to the possibility of the debuginfo creation itself being buggy.
None of the addresses resolve correctly when using the debuginfo: ... control symbols: kts: 0xc11c39a0 kte: 0xc11c6ca8 stext: 0xc1001000 ... pattern 'copy_process' matches function 'copy_process' selected function copy_process probe copy_process@kernel/fork.c:878 kernel section=.text pc=0xc101cb10 [root@jataayu ~]# cat /proc/kallsyms |grep copy_process c041cb10 t copy_process Seems like all the addresses are ANDed with a 0x01100000.
The mask in comment #11 should be 0xf11fffff
Do scripts work with --ignore-vmlinux --kmap=/proc/kallsyms? How about --ignore-dwarf --kelf? Both those invocations should avoid using the suspect dwarf info.
I am missing any indications that the dwarf data was *bad*, only that it contains pre-relocation numbers as it always does. (And it has to: relocation of the kernel is done at run time at an unpredictable offset.) This is what the _stext-based relocation logic for bug #4224 was for. What seems to be wrong here is that the unrelocated addresses are used by the runtime directly. It should have never done that, but I wonder why it has appeared to work at all!
To test it, need to make a little change: 4419 // FIXME for non-debuginfo use. 4420 if (true) { 4421 return emit_symbol_data_from_debuginfo(s, kallsyms_out); 4422 } else { 4423 // For symbol-table only operation, we don't have debuginfo, ---> s/true/false/ And the scripts work with --ignore-vmlinux --kmap=/proc/kallsyms, also --ignore-dwarf --kelf even in case of incorrect vmlinux.
Re: comment #13, with --ignore-dwarf --ignore-vmlinux --kmap=/proc/kallsyms, the script is able to resolve the probe address correctly, but it still crashes 'cos of the incorrect modules_op address. [ananth@jataayu ~]$ stap -vv -k --ignore-dwarf --ignore-vmlinux --kmap=/proc/kallsyms current-parent.stp ... Pass 1: parsed user script and 42 library script(s) in 600usr/10sys/628real ms. probe sys_open@?:-1 kernel pc=0xc04678b0 Pass 2: analyzed script: 1 probe(s), 3 function(s), 1 embed(s), 0 global(s) in 90usr/70sys/1902real ms. probe_1365 locks nothing dwfl_module_getsym: name is modules_op, st_value is c11d1580 Re comment #14, I don't understand that. We don't see the problem with distro shipped kernels: a printf in dwfl_module_getsym() shows st_value has the correct (relocated?) address => runtime already gets the correct address from libdwfl; in the failing case, this is the 'unrelocated' address shown above.
Created attachment 2786 [details] patch for checking addr This patch adds validating addr vs name mapping in runtime for several symbols which are used in initialing process. So kernel oops could be early found in case of incorrect vmlinux or wrong address. Sanity check based on build-id will be in systemtap_module_init.
Reran the test with a pre-unwind SystemTap build from the source rpm (0.6.2-1) using the current F9 toolchain. This rpm works fine on 2.6.24 and a more recent (2.6.26-rc6) kernel without problems. One observation though... with any custom built kernel, the address reported by stap on a -vv is still the non-relocated one: Pass 1: parsed user script and 38 library script(s) in 390usr/20sys/490real ms. probe sys_open@fs/open.c:1100 kernel section=.text pc=0xc10678b0 [...]# cat /proc/kallsyms |grep sys_open c04678b0 T sys_open [...]# uname -r 2.6.26-rc6 However, with a distro built kernel, the stap -vv output is the same as the /proc/kallsyms one: Pass 1: parsed user script and 38 library script(s) in 380usr/20sys/482real ms. probe sys_open@fs/open.c:1062 kernel section=.text pc=0xc0480bbd [...]# cat /proc/kallsyms |grep sys_open c04680bbd T sys_open [...]# uname -r 2.6.25.6-55.fc9.i686 Wierd!
Recent overnight tests on RHEL5 and F8 seem to have a problem with the following patch checked in: 2008-06-23 Wenji Huang <wenji.huang@oracle.com> PR 6646 * symbols.c (_stp_validate_addr): Add validating address in runtime. There are many like the following in /var/log/message on the f8 (x86_64) and rhel5 (x86_64, i686, and ia64): Jun 24 15:08:49 dhcp231-142 kernel: Systemtap Error at _stp_validate_addr:191 Incorrect mapping 0xffffffff81292480 vs _etext. Jun 24 15:08:49 dhcp231-142 kernel: Systemtap Error at _stp_transport_init:277 failed to initialize modules
Sorry for that. Seems the address resolved by stap varies in self-built and ditro-built kernels. And there are some aliased symbols sharing same address. I will revert it and try to figure out better solution.
I've finally managed to narrow down the problem to the CONFIG_PHYSICAL_START option. The diff between a working and a failing config is: CONFIG_SCHED_HRTICK=y CONFIG_KEXEC=y CONFIG_CRASH_DUMP=y -CONFIG_PHYSICAL_START=0x100000 +CONFIG_PHYSICAL_START=0x1000000 CONFIG_RELOCATABLE=y CONFIG_PHYSICAL_ALIGN=0x400000 CONFIG_COMPAT_VDSO=y
It turns out that the kernel symbol table passing mechanism broke the old relocatability support, by pretending that the compile-time /proc/kallsyms will match run-time addresses. I'm working a series of fixes on the pr6429-comp-unwindsyms branch.
*** This bug has been marked as a duplicate of 6736 ***
Though this bug is marked a duplicate, this update is to confirm that the problem has been fixed -- tested on git HEAD 94687c8d Thanks Frank!