Bug 4531 - systemtap tests fail produce kernel oops on some configs of 2.6.22-rc2 kernel
Summary: systemtap tests fail produce kernel oops on some configs of 2.6.22-rc2 kernel
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: kprobes (show other bugs)
Version: unspecified
: P2 critical
Target Milestone: ---
Assignee: Prasanna S Panchamukhi
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-05-22 16:21 UTC by William Cohen
Modified: 2007-06-08 16:47 UTC (History)
4 users (show)

See Also:
Host: x86_64
Target:
Build:
Last reconfirmed:


Attachments
oops output from config_x86_64_fc7 (295.42 KB, image/jpeg)
2007-05-22 16:25 UTC, William Cohen
Details
problem kernel config file (17.43 KB, application/octet-stream)
2007-05-22 16:27 UTC, William Cohen
Details
kernel config that works. (14.81 KB, text/plain)
2007-05-22 16:29 UTC, William Cohen
Details

Note You need to log in before you can comment on or make changes to this bug.
Description William Cohen 2007-05-22 16:21:40 UTC
I was testing systemtap snapshot on x86_64 with the 2.6.22 kernel. I usually
start with the existing config file for the distro's kernel config on the
machine, e.g. /boot/config-2.6.9-55.EL for RHEL4 machine. The 2.6.22-rc2 kernel
on x86_64 machine that was built starting with rhel4 config file work. The
2.6.22-rc2 kernel on x86_64 machine starting with the fc7 config oopsed. The
oops showed it dying in the kprobes arc_arm_kprobe, the instruction writing the
actual break point when attempting place a probe for the following test:

Running
/home/wcohen/stap_snap_200705221118/src/testsuite/systemtap.base/equal.exp ...


The point it is dying is arch_arm_kprobe+0x4/0x8 in the function below:

ffffffff804400cf <arch_arm_kprobe>:
ffffffff804400cf:       48 8b 47 30             mov    0x30(%rdi),%rax
ffffffff804400d3:       c6 00 cc                movb   $0xcc,(%rax)
ffffffff804400d6:       c3                      retq

Unfortunately, this laptop machine does not have a serial port, so it isn't easy
to get complete back trace.

-Will
Comment 1 William Cohen 2007-05-22 16:25:58 UTC
Created attachment 1853 [details]
oops output from config_x86_64_fc7

This is the output from the oops that occurred for the problem configuration.
Comment 2 William Cohen 2007-05-22 16:27:12 UTC
Created attachment 1854 [details]
problem kernel config file

This is the configuration used for the kernel that oopsed.
Comment 3 William Cohen 2007-05-22 16:29:24 UTC
Created attachment 1855 [details]
kernel config that works.

The kernel built with the config_x86_64_rhel4 works on the same machine.
Comment 4 William Cohen 2007-05-22 19:43:35 UTC
The following option in the .config looks like a possible cause:

CONFIG_DEBUG_RODATA=y

Shouldn't CONFIG_KPROBES disable this?

Comment 5 Roland McGrath 2007-05-22 20:00:36 UTC
CONFIG_DEBUG_RODATA=y has been there in Fedora kernels for some time.
AFAICT, because CONFIG_HOTPLUG_CPU is set it's only getting applied to
_etext..__end_rodata, which does not include the code sections.  It still seems
odd, since I would expect num_cpus_possible to return 1 on some machines anyway.
 Also, the i386 code is similar and I would expect it to be write-protecting the
text for num_possible_cpus()==1 too.  So there is something else going on here.
Comment 6 William Cohen 2007-05-22 20:06:55 UTC
I realized after submitting that comment CONFIG_DEBUG_RODATA is set on the
working f7 kernels. Is there anything other configuration that would prevent the
actual writing of the breakpoint into the kernel text section?
Comment 7 Frank Ch. Eigler 2007-05-22 20:08:21 UTC
From your systemtap .ko cache, you should be able to find the C code, which in
turn identifies the address at which the probe was attempted to be placed.
Comment 8 William Cohen 2007-05-22 21:08:51 UTC
I added a diagnostic print to the C code, manually compiled, then run the
module. The machine remained up to get get oops from the /var/log/messages.
Looking at the  disassembled kernel image it appears to be trying to put the
probe in the correct place for 'probe kernel.function("schedule")':

ffffffff8043ccc0 <schedule>:
ffffffff8043ccc0:       55                      push   %rbp
ffffffff8043ccc1:       48 89 e5                mov    %rsp,%rbp
ffffffff8043ccc4:       41 57                   push   %r15

The output in /var/log/messages:

May 22 15:27:43 dhcp231-176 kernel: orig address = 0000000000235cc0, relocated
address = ffffffff8043ccc0
May 22 15:27:43 dhcp231-176 kernel: Unable to handle kernel paging request at
ffffffff8043ccc0 RIP: 
May 22 15:27:43 dhcp231-176 kernel:  [<ffffffff804400d3>] arch_arm_kprobe+0x4/0x8
May 22 15:27:43 dhcp231-176 kernel: PGD 203067 PUD 205063 PMD 37ce5163 PTE 43c161
May 22 15:27:43 dhcp231-176 kernel: Oops: 0003 [1] SMP 
May 22 15:27:43 dhcp231-176 kernel: CPU 0 
May 22 15:27:43 dhcp231-176 kernel: Modules linked in: stap_2727 autofs4 hidp
rfcomm l2cap bluetooth ipv6 nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4
xt_state nf_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables
cpufreq_ondemand video sbs button dock battery ac lp loop snd_intel8x0
snd_seq_dummy snd_intel8x0m snd_ac97_codec ac97_bus snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss 8139cp
fw_ohci bcm43xx ieee80211softmac ieee80211 fw_core 8139too mii ieee80211_crypt
snd_pcm snd_timer parport_pc crc_itu_t parport snd serio_raw soundcore
snd_page_alloc sr_mod k8temp cdrom shpchp i2c_nforce2 i2c_core hwmon joydev sg
dm_snapshot dm_zero dm_mirror dm_mod sata_nv ata_generic pata_amd libata sd_mod
scsi_mod ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
May 22 15:27:43 dhcp231-176 kernel: Pid: 2781, comm: staprun Not tainted
2.6.22-rc2 #1
May 22 15:27:43 dhcp231-176 kernel: RIP: 0010:[<ffffffff804400d3>] 
[<ffffffff804400d3>] arch_arm_kprobe+0x4/0x8
May 22 15:27:43 dhcp231-176 kernel: RSP: 0018:ffff810030b59dd0  EFLAGS: 00010212
May 22 15:27:43 dhcp231-176 kernel: RAX: ffffffff8043ccc0 RBX: 0000000000000000
RCX: 0000000000000282
May 22 15:27:43 dhcp231-176 kernel: RDX: 0000000000000000 RSI: 0000000000000212
RDI: ffffffff88408e60
May 22 15:27:43 dhcp231-176 kernel: RBP: ffffffff88408e60 R08: ffffffff8055fa18
R09: 0000000000000010
May 22 15:27:43 dhcp231-176 kernel: R10: ffff810002031d00 R11: ffffffff80323fd7
R12: 0000000000000000
May 22 15:27:43 dhcp231-176 kernel: R13: ffff810030b59f50 R14: 0000000000000000
R15: 0000000000000000
May 22 15:27:43 dhcp231-176 kernel: FS:  00002b470d28c8f0(0000)
GS:ffffffff805a4000(0000) knlGS:0000000000000000
May 22 15:27:43 dhcp231-176 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
000000008005003b
May 22 15:27:43 dhcp231-176 kernel: CR2: ffffffff8043ccc0 CR3: 0000000030b1d000
CR4: 00000000000006e0
May 22 15:27:43 dhcp231-176 kernel: Process staprun (pid: 2781, threadinfo
ffff810030b58000, task ffff8100322250c0)
May 22 15:27:43 dhcp231-176 kernel: Stack:  ffffffff804417f3 ffffffff8043ccc0
0000000000000000 ffffffff8840478c
May 22 15:27:43 dhcp231-176 kernel:  ffffffff88402f6b 0000000000000292
0000000000000292 ffff810030b59e58
May 22 15:27:43 dhcp231-176 kernel:  0000000000000008 00007fff9d833c74
ffffffff88402ff2 00007fff9d833c74
May 22 15:27:43 dhcp231-176 kernel: Call Trace:
May 22 15:27:43 dhcp231-176 kernel:  [<ffffffff804417f3>]
__register_kprobe+0x28a/0x2ed
May 22 15:27:43 dhcp231-176 kernel:  [<ffffffff8043ccc0>]
__sched_text_start+0x0/0x8d7
May 22 15:27:43 dhcp231-176 kernel:  [<ffffffff88402f6b>]
:stap_2727:systemtap_module_init+0x260/0x2de
May 22 15:27:43 dhcp231-176 kernel:  [<ffffffff88402ff2>]
:stap_2727:probe_start+0x9/0x13
May 22 15:27:43 dhcp231-176 kernel:  [<ffffffff8840301d>]
:stap_2727:_stp_handle_start+0x21/0x7c
May 22 15:27:43 dhcp231-176 kernel:  [<ffffffff884031f0>]
:stap_2727:_stp_ctl_write_cmd+0x178/0x6ee
May 22 15:27:43 dhcp231-176 kernel:  [<ffffffff802e7016>] file_has_perm+0x94/0xa3
May 22 15:27:43 dhcp231-176 kernel:  [<ffffffff80231798>] do_fork+0x11f/0x1a8
May 22 15:27:43 dhcp231-176 kernel:  [<ffffffff8028e3ad>] vfs_write+0xce/0x177
May 22 15:27:43 dhcp231-176 kernel:  [<ffffffff8028e970>] sys_write+0x45/0x6e
May 22 15:27:43 dhcp231-176 kernel:  [<ffffffff80209b9e>] system_call+0x7e/0x83
May 22 15:27:43 dhcp231-176 kernel:  [<ffffffff80209b9e>] system_call+0x7e/0x83
May 22 15:27:43 dhcp231-176 kernel: 
May 22 15:27:43 dhcp231-176 kernel: 
May 22 15:27:43 dhcp231-176 kernel: Code: c6 00 cc c3 48 8b 57 30 8a 47 68 88 02
c3 48 8b 97 08 01 00 
May 22 15:27:43 dhcp231-176 kernel: RIP  [<ffffffff804400d3>]
arch_arm_kprobe+0x4/0x8
May 22 15:27:43 dhcp231-176 kernel:  RSP <ffff810030b59dd0>
May 22 15:27:43 dhcp231-176 kernel: CR2: ffffffff8043ccc0




Comment 9 William Cohen 2007-05-22 23:10:12 UTC
Put im some diagnostic printk in mark_rodata_ro() for the 2.6.22-rc2. Below is
the output:

May 22 17:31:36 dhcp231-176 kernel: num_possible_cpus()= 1
May 22 17:31:36 dhcp231-176 kernel: start= ffffffff80207000  end= ffffffff80556000
May 22 17:31:36 dhcp231-176 kernel: Write protecting the kernel read-only data:
3388k

Looking up the address from the nm:

ffffffff80207000 T _stext
ffffffff80556090 R __end_rodata

Looks like all the kprobe address falls between those two.
ffffffff8043cd00 T schedule

This is uniprocessor machine so the process is marking the text as read only
too. Thus, the following start point is not used:
ffffffff804449a8 A _etext



Comment 10 William Cohen 2007-05-22 23:33:36 UTC
Looks like some changes at the beginning of May expanded the range of data that
CONFIG_DEBUG_RODATA protected:

http://www.linux-mips.org/git?p=linux.git;a=commitdiff;h=6fb14755a676282a4e6caa05a08c92db8e45cfff;hp=d01ad8dd56527be72947b4b9997bb2c05783c3ed#patch2

This isn't going to work well with things like kprobes that expect to be able to
put software break points in code.
Comment 11 Josh Stone 2007-05-23 01:05:53 UTC
(In reply to comment #10)
> This isn't going to work well with things like kprobes that expect to be able to
> put software break points in code.

Perhaps kprobes can add code that overrides the page protection when adding a
probe?  It may be able to add write access the page long enough to write the
breakpoint, and then turn the write back off...
Comment 12 Roland McGrath 2007-05-23 04:30:18 UTC
It is certainly possible to change the page protections on the fly as needed,
but it may make probe insertion and removal too costly.  kprobes maintainers
should experiment and measure that overhead.  The other option is to change the
CONFIG_DEBUG_RODATA code to start protecting only from _etext and not _stext
when #ifdef CONFIG_KPROBES.
Comment 13 Frank Ch. Eigler 2007-05-26 02:39:47 UTC
Some entertaining links:
http://www.archivum.info/linux.kernel/2005-11/msg04473.html
... according to which, RODATA should not be on by default, and is expected to
hurt kprobes

http://lkml.org/lkml/2007/5/25/155
... according to which, kprobes should work around RODATA
Comment 14 Prasanna S Panchamukhi 2007-06-08 08:09:00 UTC
Fix for i386 and x86_64 architectures has been posted to lkml.

http://lkml.org/lkml/2007/6/7/2
http://lkml.org/lkml/2007/6/7/3

Thanks
Prasanna
Comment 15 Jim Keniston 2007-06-08 16:47:45 UTC
The fix (both patches) has been pulled into the -mm tree, but the bug is in
Linus's 2.6.22-rc* kernels.  Thanks, Prasanna.  Please update this bugzilla when
the patches make it into Linus's kernel.