This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
[Bug lket/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
- From: "prasanna at in dot ibm dot com" <sourceware-bugzilla at sourceware dot org>
- To: systemtap at sources dot redhat dot com
- Date: 5 Jul 2007 10:52:56 -0000
- Subject: [Bug lket/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
- References: <20070424200707.4420.wcohen@redhat.com>
- Reply-to: sourceware-bugzilla at sourceware dot org
------- Additional Comments From prasanna at in dot ibm dot com 2007-07-05 10:52 -------
From the traces below, it looks like two nested page faults are generated while
executing the registered pre_handler().
This happens consistently because of the probe handler registered on
__switch_to().
First the registered stap pre_handler() (enter_kprobe_probe()) runs and
generates a page fault at address 0xd0c73b8b, which ends up calling
fixup_exception()->search_exception_tables()->search_extable()->search_module_extables().
search_module_extables() takes modlist_lock, and then a second page fault
happens at address 0xc01e9a0d. Its fixup_exception() calls
search_module_extables() again, which tries to grab modlist_lock while it is
already held. Since this is a uniprocessor kernel, where spinlocks compile to
no-ops, with SPINLOCK_DEBUG enabled, it panics with the messages below.
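
The double acquisition described above can be mimicked in user space. The
following is only a rough sketch, not kernel code: struct dbg_spinlock,
dbg_spin_lock(), dbg_spin_unlock() and search_tables() are hypothetical names
made up for illustration, and the "second fault" is modelled as a plain
recursive call. It shows why a debug-checked lock reports "already locked" on
the nested entry and then "not locked" when the outer caller unlocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a uniprocessor spinlock with debug checks:
 * the lock itself does no spinning, only the bookkeeping can notice
 * a second acquisition. */
struct dbg_spinlock {
    bool locked;
    const char *owner;          /* where the lock was taken */
};

/* Returns true on success, false if the lock was already held. */
static bool dbg_spin_lock(struct dbg_spinlock *l, const char *where)
{
    assert(l != NULL);
    if (l->locked) {
        fprintf(stderr, "spin_lock(%p) already locked by %s\n",
                (void *)l, l->owner);
        return false;
    }
    l->locked = true;
    l->owner = where;
    return true;
}

/* Returns true on success, false if the lock was not held. */
static bool dbg_spin_unlock(struct dbg_spinlock *l)
{
    assert(l != NULL);
    if (!l->locked) {
        fprintf(stderr, "spin_unlock(%p) not locked\n", (void *)l);
        return false;
    }
    l->locked = false;
    l->owner = NULL;
    return true;
}

/* Simulated exception-table search. At depth 0 it "faults" while holding
 * the lock and re-enters itself, like the nested page fault. Returns the
 * number of lock-debug errors seen. */
static int search_tables(struct dbg_spinlock *l, int depth)
{
    int errors = 0;

    if (!dbg_spin_lock(l, "search_tables"))
        errors++;                       /* nested acquisition detected */
    if (depth == 0)
        errors += search_tables(l, 1);  /* second fault, lock still held */
    if (!dbg_spin_unlock(l))
        errors++;                       /* inner call already released it */
    return errors;
}
```

Calling search_tables() with depth 0 on an unlocked lock reports two errors in
the same order as the first trace: the nested spin_lock() complaint, then the
outer spin_unlock() finding the lock already released.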
-------------> 1st trace
Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
Kernel 2.6.9-prep on an i686
k50wks273993wss.in.ibm.com login: fixup exception c010478d, pid = 2755
modlock d0c73b8b
fixup exception c010478d, pid = 2755
modlock c01e9a0d
kernel/module.c:2115: spin_lock(kernel/module.c:c0370280) already
locked by kernel/module.c/2115
modunlock c01e9a0d
Kernel panic - not syncing: kernel/module.c:2126:
spin_unlock(kernel/module.c:c0370280) not locked
<3>kernel/sched.c:2430: spin_lock(kernel/sched.c:c040b5a0) already
locked by kernel/sched.c/2685
-----------------------> 2nd trace
fixup exception c010478d, pid = 3205
modlock d0c73b8b
fixup exception c010478d, pid = 3205
modlock c01e9a31
[<c01e9a31>] search_extable+0x1f/0x36
[<c0141fc6>] search_module_extables+0x23/0x17d
[<c01e9a31>] search_extable+0x1f/0x36
[<c013a653>] search_exception_tables+0x1f/0x21
[<c011e3b3>] fixup_exception+0xb/0x20
[<c011c481>] kprobe_exceptions_notify+0x1a9/0x1bd
[<c0135031>] notifier_call_chain+0x17/0x2e
[<c011d9b5>] do_page_fault+0x0/0x4dc
[<c011da07>] do_page_fault+0x52/0x4dc
[<c027b3d1>] ata_output_data+0x60/0x66
[<c01ed14d>] __delay+0x9/0xa
[<c02542c4>] serial8250_console_write+0x16c/0x1b2
[<c0254158>] serial8250_console_write+0x0/0x1b2
[<c011d9b5>] do_page_fault+0x0/0x4dc
[<c031e6b7>] error_code+0x2f/0x38
Possible solutions:
1. Don't allow probes on __switch_to(), but only on uniprocessor machines.
2. Don't allow page faults; just recover using a setjmp()/longjmp() mechanism
(posted earlier on the systemtap mailing list).
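
For readers unfamiliar with the second idea, here is a minimal user-space
sketch of setjmp()/longjmp() recovery. It is not the patch posted to the list
and makes no claim about how the kernel side was written: run_handler(),
simulated_fault(), recovery_point and the two example handlers are all
hypothetical names, and the fault is simulated by a function call rather than
a real page fault. The point is only that the fault path jumps straight back
to a saved context instead of walking the exception tables.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf recovery_point;  /* hypothetical: one saved context */
static int handler_active;      /* nonzero while a handler is running */
static int steps_done;          /* progress marker for the example handler */

/* Stand-in for the fault path: if a handler is running, unwind to the
 * recovery point instead of searching exception tables. */
static void simulated_fault(void)
{
    if (handler_active)
        longjmp(recovery_point, 1);
}

typedef void (*probe_handler_t)(void);

/* Runs the handler. Returns 0 if it completed, 1 if it faulted and was
 * abandoned at the point of the fault. */
static int run_handler(probe_handler_t handler)
{
    assert(handler != NULL);
    if (setjmp(recovery_point) != 0) {
        /* We arrive here via longjmp() from simulated_fault(). */
        handler_active = 0;
        return 1;
    }
    handler_active = 1;
    handler();
    handler_active = 0;
    return 0;
}

/* Example handler that never faults. */
static void good_handler(void)
{
    steps_done = 1;
}

/* Example handler that faults part way through. */
static void bad_handler(void)
{
    steps_done = 1;
    simulated_fault();
    steps_done = 2;             /* never reached */
}
```

With this shape, run_handler(bad_handler) returns 1 and leaves steps_done at 1,
so the work after the fault is skipped and no lock is taken on the recovery
path.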
Any other possible solutions/suggestions?
--
http://sourceware.org/bugzilla/show_bug.cgi?id=4420