This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


[Bug lket/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine

From: "prasanna at in dot ibm dot com" <sourceware-bugzilla at sourceware dot org>
To: systemtap at sources dot redhat dot com
Date: 5 Jul 2007 10:52:56 -0000
Subject: [Bug lket/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
References: <20070424200707.4420.wcohen@redhat.com>
Reply-to: sourceware-bugzilla at sourceware dot org

------- Additional Comments From prasanna at in dot ibm dot com  2007-07-05 10:52 -------
From the traces below, it looks like two nested page faults are generated while
executing the registered pre_handler().

This happens consistently when a probe handler is registered on __switch_to().

First, the registered stap pre_handler(), enter_kprobe_probe(), runs and
generates a page fault at address 0xd0c73b8b, which ends up calling
fixup_exception()->search_exception_tables()->search_extable()->search_module_extables().
search_module_extables() takes modlist_lock. Another page fault then occurs at
address 0xc01e9a0d while that lock is still held, and the
search_module_extables() call made by fixup_exception() for this second fault
tries to grab modlist_lock again. Since this is a uniprocessor kernel, where
spinlocks are otherwise no-ops, with SPINLOCK_DEBUG enabled, the debug checks
panic with the message below.
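
The re-entry described above can be modelled in user space. The sketch below is
only an illustration, not kernel code: struct dbg_lock, lookup() and
run_nested_lookup() are hypothetical names, and the "lock" is just a flag, which
is roughly what a debug-checked spinlock amounts to on a uniprocessor build. It
counts the two violations that the first trace reports ("already locked" on the
nested acquire, "not locked" on the outer release).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a uniprocessor debug spinlock: there is no
 * spinning, only a flag that the debug checks inspect. */
struct dbg_lock {
    bool locked;
    int errors;     /* number of debug-check violations seen */
};

static void dbg_lock_acquire(struct dbg_lock *l)
{
    if (l->locked)
        l->errors++;        /* models "spin_lock ... already locked" */
    l->locked = true;
}

static void dbg_lock_release(struct dbg_lock *l)
{
    if (!l->locked)
        l->errors++;        /* models "spin_unlock ... not locked" */
    l->locked = false;
}

/* Hypothetical model of the exception-table lookup: it takes the lock and,
 * while holding it, may be re-entered by a nested fault. */
static void lookup(struct dbg_lock *l, int nested_faults)
{
    dbg_lock_acquire(l);
    if (nested_faults > 0)
        lookup(l, nested_faults - 1);   /* nested fault re-enters the lookup */
    dbg_lock_release(l);
}

/* Runs one lookup with the given nesting depth and returns how many
 * debug-check violations were recorded. */
static int run_nested_lookup(int nested_faults)
{
    struct dbg_lock l = { false, 0 };

    lookup(&l, nested_faults);
    assert(!l.locked);      /* the flag is always clear once we unwind */
    return l.errors;
}
```

In this model run_nested_lookup(0) records no violations, while
run_nested_lookup(1) records two: the inner acquire sees the flag already set,
and the outer release finds it already cleared by the inner release.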


-------------> 1st trace
Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
Kernel 2.6.9-prep on an i686
k50wks273993wss.in.ibm.com login: fixup exception c010478d, pid = 2755
modlock d0c73b8b 
fixup exception c010478d, pid = 2755
modlock c01e9a0d
kernel/module.c:2115: spin_lock(kernel/module.c:c0370280) already
 locked by kernel/module.c/2115
modunlock c01e9a0d
Kernel panic - not syncing: kernel/module.c:2126:
spin_unlock(kernel/module.c:c0370280) not locked
 <3>kernel/sched.c:2430: spin_lock(kernel/sched.c:c040b5a0) already
 locked by kernel/sched.c/2685

-----------------------> 2nd trace
fixup exception c010478d, pid = 3205
modlock d0c73b8b
fixup exception c010478d, pid = 3205
modlock c01e9a31
 [<c01e9a31>] search_extable+0x1f/0x36
 [<c0141fc6>] search_module_extables+0x23/0x17d
 [<c01e9a31>] search_extable+0x1f/0x36
 [<c013a653>] search_exception_tables+0x1f/0x21
 [<c011e3b3>] fixup_exception+0xb/0x20
 [<c011c481>] kprobe_exceptions_notify+0x1a9/0x1bd
 [<c0135031>] notifier_call_chain+0x17/0x2e
 [<c011d9b5>] do_page_fault+0x0/0x4dc
 [<c011da07>] do_page_fault+0x52/0x4dc
 [<c027b3d1>] ata_output_data+0x60/0x66
 [<c01ed14d>] __delay+0x9/0xa
 [<c02542c4>] serial8250_console_write+0x16c/0x1b2
 [<c0254158>] serial8250_console_write+0x0/0x1b2
 [<c011d9b5>] do_page_fault+0x0/0x4dc
 [<c031e6b7>] error_code+0x2f/0x38

Possible solutions:

                                                                               
1. Disallow probes on __switch_to(), but only on uniprocessor machines.
2. Don't allow page faults in the handler; just recover using a
   setjmp()/longjmp() mechanism (posted earlier on the systemtap mailing list).
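
As a rough illustration of the idea behind option 2, the user-space sketch below
recovers from a simulated fault with setjmp()/longjmp() instead of going through
an exception-table search. It is not the patch posted to the list: guarded_read(),
simulated_fault(), recovery_point and fault_armed are hypothetical names, and the
"fault" is an explicit check rather than a real page fault.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf recovery_point;  /* hypothetical recovery context */
static int fault_armed;         /* non-zero while a guarded access is running */

/* Hypothetical stand-in for the fault path: rather than searching exception
 * tables (and taking a lock), jump straight back to the recovery point. */
static void simulated_fault(void)
{
    if (fault_armed)
        longjmp(recovery_point, 1);
}

/* Reads *p into *out. Returns 0 on success, or -1 if the access "faulted"
 * and control came back through longjmp(). */
static int guarded_read(const int *p, int *out)
{
    if (setjmp(recovery_point) != 0) {
        fault_armed = 0;        /* we arrived here via longjmp() */
        return -1;
    }

    fault_armed = 1;
    if (p == NULL)
        simulated_fault();      /* does not return while armed */
    *out = *p;
    fault_armed = 0;
    return 0;
}
```

With these definitions, guarded_read() on a valid pointer returns 0 and stores
the value, while guarded_read(NULL, &v) returns -1 and leaves v untouched. The
point of the design is that the recovery path takes no locks, so a nested fault
cannot re-enter a lock that is already held.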
Any other possible solutions/suggestions?

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=4420

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
