Bug 6646 - SystemTap scripts fail with an Oops on x86
Summary: SystemTap scripts fail with an Oops on x86
Status: RESOLVED DUPLICATE of bug 6736
Alias: None
Product: systemtap
Classification: Unclassified
Component: runtime (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Frank Ch. Eigler
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-06-16 05:30 UTC by Ananth Mavinakayanahalli
Modified: 2008-07-11 15:00 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
kernel .config that fails (10.63 KB, text/plain)
2008-06-16 10:27 UTC, Ananth Mavinakayanahalli
Details
Diff between the failing and working configs (11.36 KB, patch)
2008-06-16 10:34 UTC, Ananth Mavinakayanahalli
Details | Diff
patch for checking addr (832 bytes, patch)
2008-06-19 09:37 UTC, Wenji Huang
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ananth Mavinakayanahalli 2008-06-16 05:30:45 UTC
Using the latest git (HEAD e8402528a) with latest available elfutils on F9
(0.133-3) on an x86 (32bit) machine running 2.6.26-rc5 kernel, I see the
following Oops on every script executed:

BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<00000000>]
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: stap_25635fa2767efdc6a286d47c0380042a_404(+) iTCO_wdt
iTCO_vendor_support sg intel_agp floppy e100 i2c_i801 agpgart rng_core button
pcspkr mii sr_mod cdrom i2c_i810 i2c_algo_bit i2c_core ata_generic ata_piix
libata dock sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ehci_hcd [last unloaded:
microcode]

Pid: 1830, comm: staprun Not tainted (2.6.26-rc5 #1)
EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
EIP is at 0x0
EAX: 00000000 EBX: df778e88 ECX: 00000000 EDX: df778e88
ESI: e0973568 EDI: c11a0e04 EBP: e0973700 ESP: df778df0
 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process staprun (pid: 1830, ti=df778000 task=df79dc80 task.ti=df778000)
Stack: e092b053 ffffffe1 74737973 61746d65 df400070 00000005 c04ac269 df778e58
       00001360 00000000 00001709 00000005 df40a6c0 df407000 00000002 c13dce98
       c13dce80 00000672 70617473 3635325f 61663533 37363732 63646665 38326136
Call Trace:
 [<e092b053>] _stp_transport_init+0x2b7/0x43b
[stap_25635fa2767efdc6a286d47c0380042a_404]
 [<c04ac269>] ida_get_new_above+0xc2/0x17d
 [<c0433103>] sys_init_module+0x1599/0x16fb
 [<c04665b2>] do_path_lookup+0x154/0x199
 [<c04af285>] simple_strtoull+0x0/0xf0
 [<c040371e>] syscall_call+0x7/0xb
 =======================
Code:  Bad EIP value.
EIP: [<00000000>] 0x0 SS:ESP 0068:df778df0
---[ end trace df61255cfec2f882 ]---

The machine is usable, but the stap modules seem to be still inserted - lsmod
lists them. A forced rmmod doesn't work either as the module refcount is 1.

The .config does show that required options are 'y':

[ananth@... linux-2.6.26-rc5]$ cat .config|grep PROBE
...
CONFIG_KPROBES=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_KPROBES_SANITY_TEST=y
# CONFIG_SAMPLE_KPROBES is not set

[ananth@... linux-2.6.26-rc5]$ cat .config|grep RELAY
CONFIG_RELAY=y

[ananth@... linux-2.6.26-rc5]$ cat .config|grep DEBUG_INFO
CONFIG_DEBUG_INFO=y
Comment 1 Wenji Huang 2008-06-16 09:58:18 UTC
Subject: Re:  New: SystemTap scripts fail with an Oops on
 x86

Is that relocatable kernel?

ananth at in dot ibm dot com wrote:
> Using the latest git (HEAD e8402528a) with latest available elfutils on F9
> (0.133-3) on an x86 (32bit) machine running 2.6.26-rc5 kernel, I see the
> following Oops on every script executed:
> 
> BUG: unable to handle kernel NULL pointer dereference at 00000000
> IP: [<00000000>]
> *pde = 00000000
> Oops: 0000 [#1]
> Modules linked in: stap_25635fa2767efdc6a286d47c0380042a_404(+) iTCO_wdt
> iTCO_vendor_support sg intel_agp floppy e100 i2c_i801 agpgart rng_core button
> pcspkr mii sr_mod cdrom i2c_i810 i2c_algo_bit i2c_core ata_generic ata_piix
> libata dock sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ehci_hcd [last unloaded:
> microcode]
> 
> Pid: 1830, comm: staprun Not tainted (2.6.26-rc5 #1)
> EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
> EIP is at 0x0
> EAX: 00000000 EBX: df778e88 ECX: 00000000 EDX: df778e88
> ESI: e0973568 EDI: c11a0e04 EBP: e0973700 ESP: df778df0
>  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> Process staprun (pid: 1830, ti=df778000 task=df79dc80 task.ti=df778000)
> Stack: e092b053 ffffffe1 74737973 61746d65 df400070 00000005 c04ac269 df778e58
>        00001360 00000000 00001709 00000005 df40a6c0 df407000 00000002 c13dce98
>        c13dce80 00000672 70617473 3635325f 61663533 37363732 63646665 38326136
> Call Trace:
>  [<e092b053>] _stp_transport_init+0x2b7/0x43b
> [stap_25635fa2767efdc6a286d47c0380042a_404]
>  [<c04ac269>] ida_get_new_above+0xc2/0x17d
>  [<c0433103>] sys_init_module+0x1599/0x16fb
>  [<c04665b2>] do_path_lookup+0x154/0x199
>  [<c04af285>] simple_strtoull+0x0/0xf0
>  [<c040371e>] syscall_call+0x7/0xb
>  =======================
> Code:  Bad EIP value.
> EIP: [<00000000>] 0x0 SS:ESP 0068:df778df0
> ---[ end trace df61255cfec2f882 ]---
> 
> The machine is usable, but the stap modules seem to be still inserted - lsmod
> lists them. A forced rmmod doesn't work either as the module refcount is 1.
> 
> The .config does show that required options are 'y':
> 
> [ananth@... linux-2.6.26-rc5]$ cat .config|grep PROBE
> ...
> CONFIG_KPROBES=y
> CONFIG_KRETPROBES=y
> CONFIG_HAVE_KPROBES=y
> CONFIG_HAVE_KRETPROBES=y
> CONFIG_KPROBES_SANITY_TEST=y
> # CONFIG_SAMPLE_KPROBES is not set
> 
> [ananth@... linux-2.6.26-rc5]$ cat .config|grep RELAY
> CONFIG_RELAY=y
> 
> [ananth@... linux-2.6.26-rc5]$ cat .config|grep DEBUG_INFO
> CONFIG_DEBUG_INFO=y
> 

Comment 2 Ananth Mavinakayanahalli 2008-06-16 10:27:48 UTC
Created attachment 2778 [details]
kernel .config that fails

Here is the config that fails. CONFIG_RELOCATABLE=y

But that shouldn't matter as stap works fine with stock F9 kernels that also
have this set.
Comment 3 Ananth Mavinakayanahalli 2008-06-16 10:34:18 UTC
Created attachment 2779 [details]
Diff between the failing and working configs

Here is the diff between the failing and working configs. Lots of driver stuff,
but its mostly benign.
Comment 4 Ananth Mavinakayanahalli 2008-06-16 10:52:21 UTC
Other options toggled and tried unsuccessfully - CONFIG_MODVERSIONS, CONFIG_4KSTACKS
Comment 5 Frank Ch. Eigler 2008-06-16 12:59:40 UTC
Try to decode the oops (disassemble the module_init function and figure out
what c-level statement dies).
Comment 6 Ananth Mavinakayanahalli 2008-06-17 07:12:52 UTC
The NULL pointer deref is coming from the modules_op->start() call in
runtime/transport/symbols.c:

   606  static int _stp_init_modules(void)
     ....
   611          const struct seq_operations *modules_op = (const struct seq_oper
ations *)_stp_kallsyms_lookup_name("modules_op");
   612
   ......
   617
   618          /* Use the seq_file interface to safely get a list of installed
modules */
   619          res = modules_op->start(NULL, &pos); <-- here!
   620          while (res) {

The reason is that _stp_kallsyms_lookup_name returns an incorrect address (it
returns c11a0e0c while the actual address is c05a0e0c):

[root@jataayu mod]# cat /proc/kallsyms |grep modules_op
c048eee9 t modules_open
c05a0e0c R modules_op
c05a40d8 r proc_modules_operations

I've verified that *(c11a0e0c) is NULL causing the BUG.
Comment 7 Frank Ch. Eigler 2008-06-17 12:16:23 UTC
Yeah, I have been suspicious of that code (it came in with the new
unwind/symbol code).  I'm worried even about the way it performs
_stext relocation.  We'll have to revisit this area too.
Comment 8 Ananth Mavinakayanahalli 2008-06-17 15:20:36 UTC
One further observation from the different runs is that its only the second and
third nibbles from MSB that are corrupt (changed to 10 or 11 from the correct
values). Wierd!
Comment 9 Wenji Huang 2008-06-18 07:22:16 UTC
To reproduce the problem,  just make  a little kernel config change and compile
it again.
Then execute any stap script,  it will fail in _stp_transport_init (kernel oops).

Both  _stp_kallsyms_lookup_name and _stp_init_kernel_symbols depend on 
_stp_kernel_symbols which is from debuginfo file or /proc/kallsyms.   I believe
 runtime facility such as transport shouldn't trust debuginfo too much .  The
incorrect debuginfo would result in kprobe register/trigger error,  not earlier
stp_transport_init.

The origin code about retrieving kernel symbols is to read /sys/modules/ which
looks like safer although not very graceful.

Comment 10 Ananth Mavinakayanahalli 2008-06-18 09:48:14 UTC
dwfl_module_getsym() is returning the incorrect address:
dwfl_module_getsym: name is modules_op, st_value is c11d1580

Given this is extracted from the debuginfo, is the debuginfo creation buggy?

Further, distro shipped kernels work, while custom built kernels don't, which
gives credence to the possibility of the debuginfo creation itself being buggy.
Comment 11 Ananth Mavinakayanahalli 2008-06-18 10:11:03 UTC
None of the addresses resolve correctly when using the debuginfo:
...
control symbols: kts: 0xc11c39a0 kte: 0xc11c6ca8 stext: 0xc1001000
...
pattern 'copy_process' matches function 'copy_process'
selected function copy_process
probe copy_process@kernel/fork.c:878 kernel section=.text pc=0xc101cb10

[root@jataayu ~]# cat /proc/kallsyms |grep copy_process
c041cb10 t copy_process

Seems like all the addresses are ANDed with a 0x01100000.
Comment 12 Ananth Mavinakayanahalli 2008-06-18 15:09:54 UTC
The mask in comment #11 should be 0xf11fffff
Comment 13 Jim Keniston 2008-06-18 23:06:10 UTC
Do scripts work with --ignore-vmlinux --kmap=/proc/kallsyms?  How about
--ignore-dwarf --kelf?  Both those invocations should avoid using the suspect
dwarf info.
Comment 14 Frank Ch. Eigler 2008-06-18 23:55:56 UTC
I am missing any indications that the dwarf data was *bad*, only
that it contains pre-relocation numbers as it always does.  (And
it has to: relocation of the kernel is done at run time at an
unpredictable offset.)  This is what the _stext-based relocation
logic for bug #4224 was for.

What seems to be wrong here is that the unrelocated addresses are
used by the runtime directly.  It should have never done that, but
I wonder why it has appeared to work at all!
Comment 15 Wenji Huang 2008-06-19 02:43:16 UTC
To test it, need to make a little change:

4419   // FIXME for non-debuginfo use.

4420   if (true) {

4421       return emit_symbol_data_from_debuginfo(s, kallsyms_out);

4422   } else {

4423     // For symbol-table only operation, we don't have debuginfo,

---> s/true/false/

And the scripts work with --ignore-vmlinux --kmap=/proc/kallsyms, also
--ignore-dwarf --kelf even in case of incorrect vmlinux.


Comment 16 Ananth Mavinakayanahalli 2008-06-19 03:07:09 UTC
Re: comment #13, with --ignore-dwarf --ignore-vmlinux --kmap=/proc/kallsyms, the
script is able to resolve the probe address correctly, but it still crashes 'cos
of the incorrect modules_op address.

[ananth@jataayu ~]$ stap -vv -k --ignore-dwarf --ignore-vmlinux
--kmap=/proc/kallsyms current-parent.stp
...
Pass 1: parsed user script and 42 library script(s) in 600usr/10sys/628real ms.
probe sys_open@?:-1 kernel pc=0xc04678b0
Pass 2: analyzed script: 1 probe(s), 3 function(s), 1 embed(s), 0 global(s) in
90usr/70sys/1902real ms.
probe_1365 locks nothing
dwfl_module_getsym: name is modules_op, st_value is c11d1580

Re comment #14, I don't understand that. We don't see the problem with distro
shipped kernels: a printf in dwfl_module_getsym() shows st_value has the correct
(relocated?) address => runtime already gets the correct address from libdwfl;
in the failing case, this is the 'unrelocated' address shown above.
Comment 17 Wenji Huang 2008-06-19 09:37:20 UTC
Created attachment 2786 [details]
patch for checking addr

This patch adds validating addr  vs name mapping in runtime  for several 
symbols which are used in initialing process.	So kernel oops could be early
found in case of incorrect vmlinux or wrong address.

Sanity check based on build-id will be in systemtap_module_init.
Comment 18 Ananth Mavinakayanahalli 2008-06-23 06:34:16 UTC
Reran the test with a pre-unwind SystemTap build from the source rpm (0.6.2-1)
using the current F9 toolchain. This rpm works fine on 2.6.24 and a more recent
(2.6.26-rc6) kernel without problems.

One observation though... with any custom built kernel, the address reported by
stap on a -vv is still the non-relocated one:

Pass 1: parsed user script and 38 library script(s) in 390usr/20sys/490real ms.
probe sys_open@fs/open.c:1100 kernel section=.text pc=0xc10678b0

[...]# cat /proc/kallsyms |grep sys_open
c04678b0 T sys_open

[...]# uname -r
2.6.26-rc6

However, with a distro built kernel, the stap -vv output is the same as the
/proc/kallsyms one:

Pass 1: parsed user script and 38 library script(s) in 380usr/20sys/482real ms.
probe sys_open@fs/open.c:1062 kernel section=.text pc=0xc0480bbd

[...]# cat /proc/kallsyms |grep sys_open
c04680bbd T sys_open

[...]# uname -r
2.6.25.6-55.fc9.i686

Wierd!
Comment 19 William Cohen 2008-06-24 19:42:58 UTC
Recent overnight tests on RHEL5 and F8 seem to have a problem with the following
patch checked in:

 2008-06-23  Wenji Huang  <wenji.huang@oracle.com>
	PR 6646
        * symbols.c (_stp_validate_addr): Add validating address in runtime.

There are many like the following in /var/log/message on the f8 (x86_64) and
rhel5 (x86_64, i686, and ia64):

Jun 24 15:08:49 dhcp231-142 kernel: Systemtap Error at _stp_validate_addr:191
Incorrect mapping 0xffffffff81292480 vs _etext.
Jun 24 15:08:49 dhcp231-142 kernel: Systemtap Error at _stp_transport_init:277
failed to initialize modules

Comment 20 Wenji Huang 2008-06-25 01:58:12 UTC
Sorry for that.

Seems the address resolved by stap varies in self-built and ditro-built kernels.
And there are some aliased symbols sharing same address. 

I will revert it and try to figure out better solution.
Comment 21 Ananth Mavinakayanahalli 2008-06-27 15:00:11 UTC
I've finally managed to narrow down the problem to the CONFIG_PHYSICAL_START
option. The diff between a working and a failing config is:

 CONFIG_SCHED_HRTICK=y
 CONFIG_KEXEC=y
 CONFIG_CRASH_DUMP=y
-CONFIG_PHYSICAL_START=0x100000
+CONFIG_PHYSICAL_START=0x1000000
 CONFIG_RELOCATABLE=y
 CONFIG_PHYSICAL_ALIGN=0x400000
 CONFIG_COMPAT_VDSO=y
Comment 22 Frank Ch. Eigler 2008-07-03 16:30:20 UTC
It turns out that the kernel symbol table passing mechanism broke the
old relocatability support, by pretending that the compile-time
/proc/kallsyms will match run-time addresses.  I'm working a series
of fixes on the pr6429-comp-unwindsyms branch.
Comment 23 Frank Ch. Eigler 2008-07-10 15:00:17 UTC

*** This bug has been marked as a duplicate of 6736 ***
Comment 24 Ananth Mavinakayanahalli 2008-07-11 15:00:07 UTC
Though this bug is marked a duplicate, this update is to confirm that the
problem has been fixed -- tested on git HEAD 94687c8d

Thanks Frank!