Summary: | 2.6.29-rc7 - kernel crash with sharedbuf.exp stap test script | ||
---|---|---|---|
Product: | systemtap | Reporter: | Mahesh J Salgaonkar <mahesh> |
Component: | runtime | Assignee: | Unassigned <systemtap> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Host: | | Target: | |
Build: | | Last reconfirmed: | |
Description
Mahesh J Salgaonkar
2009-03-06 08:19:10 UTC
If some symbol is unresolved in an incoming module, none of its code is actually supposed to be executed. So the crash you see is probably not systemtap-generated code crashing, but the kernel doing it to itself. Perhaps you can reproduce it with some tiny hand-made module with an unresolved extern function reference.

If you have CONFIG_MODVERSIONS=y, try again with "# CONFIG_MODVERSIONS is not set" to see whether the result differs. Otherwise, it seems a new problem was introduced in the new kernel.

The problem is reproducible even after disabling CONFIG_MODVERSIONS.

$ cat /lib/modules/2.6.29-rc7-git2/build/.config|grep MODVERSION
# CONFIG_MODVERSIONS is not set

This was reproduced on 2.6.29-rc7/i386, but not on 2.6.29-rc6/i386. So it seems to be a kernel bug introduced between them. Maybe we need to do a git-bisect?

BUG: unable to handle kernel paging request at f9ac9218
IP: [<c045b3cb>] load_module+0x14d9/0x164a
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/module/xt_state/sections/.text
Modules linked in: stap_869395e8a8c1abc66e314b36f05020cd_202 sco bridge stp bnep l2cap bluetooth sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 cpufreq_ondemand powernow_k8 dm_mirror dm_region_hash dm_log dm_multipath uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore rtc_cmos pcspkr dcdbas snd_page_alloc k8temp rtc_core tg3 hwmon i2c_nforce2 rtc_lib i2c_core libphy ata_generic pata_acpi sata_nv [last unloaded: stap_987bd8a88d00f68d84ccb5d3c698cdcb_414]

Pid: 9875, comm: staprun Not tainted (2.6.29-rc7 #1) OptiPlex 740
EIP: 0060:[<c045b3cb>] EFLAGS: 00210246 CPU: 0
EIP is at load_module+0x14d9/0x164a
EAX: 00200286 EBX: f9ac90e0 ECX: 00000003 EDX: 00000001
ESI: f9ac90e0 EDI: fffffffe EBP: f28c7fa0 ESP: f28c7ea4
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process staprun (pid: 9875, ti=f28c7000 task=f3523e40 task.ti=f28c7000)
Stack:
 0000002a f3523e40 f28c7f14 00200046 00023b59 00000162 f28c7f5c b95bb088
 f9ab703c f9ab735c c08689c4 f3523e40 f3524314 f9a93000 f9ab6cf4 f9ab6b59
 f2874aa0 f9ac195c 00000029 00000010 00000000 f9ac90e0 00000000 f28c7f1c
Call Trace:
 [<c0452602>] ? mark_lock+0x1e/0x349
 [<c06f3dff>] ? __mutex_lock_common+0x2d4/0x329
 [<c06f3e82>] ? mutex_lock_interruptible_nested+0x2e/0x35
 [<c045b658>] ? sys_init_module+0x41/0x18c
 [<c040846b>] ? sysenter_do_call+0x12/0x3f
Code: 58 ff ff ff 8b 93 d4 00 00 00 89 d8 e8 84 0e fc ff 83 bd 5c ff ff ff 00 74 0b 8b 85 5c ff ff ff e8 ac e4 ff ff 8b b5 58 ff ff ff <8b> 86 38 01 00 00 e8 9b e4 ff ff 8b 85 44 ff ff ff e8 25 5f 04
EIP: [<c045b3cb>] load_module+0x14d9/0x164a SS:ESP 0068:f28c7ea4
---[ end trace 2a7ec952fee814db ]---
Kernel panic - not syncing: Fatal exception

$ eu-addr2line -e vmlinux 0xc045b3cb
/home/mhiramat/ksrc/linux-2.6/kernel/module.c:2298

2292 free_core:
2293         module_free(mod, mod->module_core);
2294 free_percpu:
2295         if (percpu)
2296                 percpu_modfree(percpu);
2297 #if defined(CONFIG_MODULE_UNLOAD) && defined(CONFIG_SMP)
2298         percpu_modfree(mod->refptr);
2299 #endif
2300 free_mod:
2301         kfree(args);

It seems mod->refptr is invalid.

Perhaps the commit below caused this issue.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=720eba31f47aeade8ec130ca7f4353223c49170f

Interesting... I don't know a whole lot about percpu_mod*, but looking at the use of mod->refptr, it's just allocated and freed -- no other operations are performed on it, or so it seems. It's allocated in load_module() and freed either in load_module() itself on error, or in free_module(). However, free_module() is invoked either from delete_module() or if the module init function returns an error (much after load_module() has run). That leads one to wonder if percpu_modalloc() itself returned an invalid pointer? If so, why is it happening only in this case?
Or there may be some kind of data corruption in struct module (refptr is the last element of the structure).

Further datapoint -- on a RHEL5.3 system running rc8, insmod just fails to load a module with an unknown symbol -- so, no crash. However, Fedora's module-init-tools is probably more tolerant?

> Further datapoint -- on a RHEL5.3 system running rc8, insmod just fails to load
> a module with an unknown symbol -- so, no crash. However, Fedora's
> module-init-tools is probably more tolerant?
I see no reason to think that this is a userspace issue.
Subject: Re: 2.6.29-rc7 - kernel crash with
sharedbuf.exp stap test script
> I see no reason to think that this is a userspace issue.
Sure, I'm not suggesting that it is the main issue, however. We need
someone with good knowledge of the module loading subsystem as well as
the per_cpu allocators to take a closer look at the issue at hand.
From my investigation, this problem is just a kernel bug. The mod pointer is moved to point into mod->module_core after all sections are copied. So, after freeing module_core, you must not access mod->(members), because that memory has already been freed. I'll report this and post a fix patch to LKML.

Thanks,

The fix hit Linus' tree (commit 6e2b757).
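
The use-after-free described above can be modelled in user space. What follows is a minimal sketch, not the kernel patch itself: struct module_model, model_load(), model_unwind() and model_demo() are hypothetical names, and malloc()/free() stand in for the kernel's module and per-cpu allocators. It only illustrates the rule stated in the analysis: once the descriptor lives inside module_core, anything still needed from it has to be copied out before module_core is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for the kernel's struct module. */
struct module_model {
    void *module_core;  /* allocation that also holds this descriptor */
    int  *refptr;       /* separately allocated reference counter */
};

static int cores_freed;
static int refptrs_freed;

/* Build a "loaded" module whose descriptor sits at the start of its core. */
static struct module_model *model_load(void)
{
    void *core = malloc(4096);
    struct module_model *mod = core;

    if (!core)
        return NULL;
    mod->module_core = core;
    mod->refptr = malloc(sizeof *mod->refptr);
    if (!mod->refptr) {
        free(core);
        return NULL;
    }
    *mod->refptr = 0;
    return mod;
}

/* Error-path teardown that never dereferences mod after freeing the core. */
static void model_unwind(struct module_model *mod)
{
    /* Model invariant: the descriptor lives inside its own core area. */
    assert(mod->module_core == (void *)mod);

    /* Copy out everything still needed while mod is valid. */
    int *refptr = mod->refptr;
    void *core = mod->module_core;

    free(core);          /* mod is dangling from this point on */
    cores_freed++;

    free(refptr);        /* uses the saved copy, not mod->refptr */
    refptrs_freed++;
}

/* Load and immediately unwind one model module; returns 0 on success. */
int model_demo(void)
{
    struct module_model *mod = model_load();

    if (!mod)
        return -1;
    model_unwind(mod);
    return 0;
}
```

Calling model_demo() from a main() exercises the safe ordering. Swapping the two free() calls back to the crashing order, that is, reading mod->refptr after free(core), reproduces the class of bug seen at module.c:2298, and a tool such as valgrind or AddressSanitizer should flag it.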