2071 – Probes on ISR with probes on task thread's prehandler crash the system

Status:	RESOLVED FIXED

Alias:	None

Product:	systemtap
Classification:	Unclassified
Component:	kprobes
Version:	unspecified

Importance:	P1 normal
Target Milestone:	---
Assignee:	Masami Hiramatsu

Reported:	2005-12-20 07:31 UTC by Anil S Keshavamurthy
Modified:	2008-03-13 14:59 UTC
CC List:	3 users

Attachments
test modules (1.60 KB, application/x-gzip-compressed) 2005-12-20 07:34 UTC, Anil S Keshavamurthy
[PATCH] please review and provide comments (1.01 KB, patch) 2006-01-13 02:14 UTC, Anil S Keshavamurthy
updated test modules (1.69 KB, application/x-gzip-compressed) 2006-12-08 16:41 UTC, Anil S Keshavamurthy
This patch seemed to work on both i386 and x86_64. (1.92 KB, patch) 2006-12-08 16:44 UTC, Anil S Keshavamurthy
Fix the order of atomic operations in restore_previous_kprobe() on ia64 (497 bytes, patch) 2008-01-17 21:58 UTC, Masami Hiramatsu

Description Anil S Keshavamurthy 2005-12-20 07:31:12 UTC

I was testing reentrant probes by calling a probed routine from inside a task
thread's prehandler. This reentrancy test worked fine without any problem.

However, when I inserted another test module which put probes on the ISR
routine (__do_IRQ), I saw a system crash.

Here is what I think is happening.
Our current kprobes design supports reentrancy only from one thread. If, while
a reentrant probe is being handled and before its single-stepping completes,
another probe on the ISR fires, then we lose or overwrite the previous kprobes
state and eventually crash the system.
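
One way to picture the lost state is a toy model with a single save slot. This is only an illustrative userspace sketch: struct ctlblk, enter_nested(), leave_nested() and current_after_unwind() are hypothetical names, and the single-slot layout is an assumption made for the example, not code taken from the kernel.

```c
#include <assert.h>

/* Assumed model: one "current" probe plus a single slot for the previous one. */
struct ctlblk {
    int current_kp;  /* probe being handled right now */
    int prev_kp;     /* the only saved outer probe */
};

/* A probe fires while another one is being handled. */
static void enter_nested(struct ctlblk *c, int kp)
{
    c->prev_kp = c->current_kp;  /* an older saved value is silently lost */
    c->current_kp = kp;
}

/* The nested probe finishes and the outer one is restored. */
static void leave_nested(struct ctlblk *c)
{
    c->current_kp = c->prev_kp;
}

/*
 * Start in probe 1, nest `depth` further probe hits (numbered 2, 3, ...),
 * unwind them all, and report which probe looks current afterwards.
 * A correct design would always report 1.
 */
static int current_after_unwind(int depth)
{
    struct ctlblk c = { 1, 0 };
    int i;

    assert(depth >= 0);
    for (i = 0; i < depth; i++)
        enter_nested(&c, 2 + i);
    for (i = 0; i < depth; i++)
        leave_nested(&c);
    return c.current_kp;
}
```

In this model one level of nesting (task probe, then a probe hit from its prehandler) unwinds back to probe 1, but a second level (an ISR probe firing on top) leaves probe 2 as the apparent current probe, because the second hit overwrote the only saved value.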

Will disabling interrupts while servicing the reentrant probes solve the 
problem? Need to try.


The attached test case has
1) probes on my_test_reentrant_export_function().
2) probes on schedule(); the pre_handler for schedule() calls
my_test_reentrant_export_function().
3) probes on __do_IRQ(); the pre_handler for __do_IRQ() calls
my_test_reentrant_export_function().

Here is the stack backtrace of the system crash while executing the above test
on IA64. I think this problem should exist on PPC64 too. I am not sure about
IA32, as IA32 disables interrupts while servicing the breakpoint handler.

[<a0000001000122a0>] show_stack+0x80/0xa0
                                sp=e000000001feed10 bsp=e000000001fe9360
 [<a000000100012bb0>] show_regs+0x890/0x8c0
                                sp=e000000001feeee0 bsp=e000000001fe9318
 [<a00000010003a560>] die+0x1a0/0x2a0
                                sp=e000000001feef00 bsp=e000000001fe92c8
 [<a00000010003a6a0>] die_if_kernel+0x40/0x60
                                sp=e000000001feef20 bsp=e000000001fe9298
 [<a000000100736a10>] ia64_bad_break+0x550/0x6c0
                                sp=e000000001feef20 bsp=e000000001fe9270
 [<a00000010000c520>] ia64_leave_kernel+0x0/0x280
                                sp=e000000001feeff0 bsp=e000000001fe9270
 [<a000000100739780>] kprobe_exceptions_notify+0x8a0/0x900
                                sp=e000000001fef1c0 bsp=e000000001fe91c0
 [<a00000010073a560>] notifier_call_chain+0x80/0xe0
                                sp=e000000001fef1d0 bsp=e000000001fe9188
 [<a000000100736b50>] ia64_bad_break+0x690/0x6c0
                                sp=e000000001fef1d0 bsp=e000000001fe9160
 [<a00000010000c520>] ia64_leave_kernel+0x0/0x280
                                sp=e000000001fef2a0 bsp=e000000001fe9160
 [<a0000001000ec220>] __do_IRQ+0x0/0x440
                                sp=e000000001fef470 bsp=e000000001fe9150
 [<a0000001000112e0>] indle_irq+0xa0/0x140
                                sp=e000000001fef470 bsp=e000000001fe9118
 [<a00000010000c520>] ia64_leave_kernel+0x0/0x280
                                sp=e00000000fe9118
 [<a00000010073aca0>] kprobes_inc_nmissed_count+0x0/0x120
                                sp=e000000001fef640 bsp=e000000001fe9100
 [<a0000001007392e0>] kprobe_exceptions_notify+0x                         
sp=e000000001fef640 bsp=e000000001fe9070
 [<a00000010073a560>] notifier_call_chain+0x80/0xe0
                                sp=e000000001fef650 bsp=e000000001fe900>] 
ia64_bad_break+0x690/0x6c0
                                sp=e000000001fef650 bsp=e000000001fe9010
 [<a00000010000c520>] ia64_leave_kernel+0x0/0x280
                                s=e000000001fe9010
 [<a00000020008c000>] my_test_reentrant_export_function+0x0/0x40 [mon_dummy]
                                sp=e000000001fef8f0 bsp=e000000001fe9010
 [<a0000002000e4140on_sched]
                                sp=e000000001fef8f0 bsp=e000000001fe8ff0
 [<a00000010073a840>] aggr_pre_handler+0x180/0x1c0
                                sp=e000000001fef8f0 b8
 [<a000000100739570>] kprobe_exceptions_notify+0x690/0x900
                                sp=e000000001fef8f0 bsp=e000000001fe8f18
 [<a00000010073a560>] notifier_call_chain+0x80/0xe0
          sp=e000000001fef900 bsp=e000000001fe8ee0
 [<a000000100736b50>] ia64_bad_break+0x690/0x6c0
                                sp=e000000001fef900 bsp=e000000001fe8eb8
 [<a0000001000nel+0x0/0x280
                                sp=e000000001fef9d0 bsp=e000000001fe8eb8
 [<a0000001007312e0>] schedule+0x0/0x15c0
                                sp=e000000001fefba0 bsp=e0<a00000010005d420>] 
kretprobe_trampoline+0x0/0x20
                                sp=e000000001fefba0 bsp=e000000001fe8e68
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!

Comment 1 Anil S Keshavamurthy 2005-12-20 07:34:45 UTC

Created attachment 808 [details]
test modules

Attaching a test case.
1) Untar
2) cd reent_test; make
3) ./please_load_me
4) do some make -jx

You should see a system crash in a few minutes.

Comment 2 Hien Nguyen 2005-12-21 18:11:07 UTC

Anil, I ran your test on ppc64; the system gave an oops after building the
kernel with make -j8 for a while. Here's the trace:

kernel BUG in do_exit at kernel/exit.c:880!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=128 NUMA PSERIES LPAR
Modules linked in: mon_sched mon_sched_4 mon_sched_3 mon_sched_2 mon_sched_1
mon_irq mon_irq_4 mon_irq_3 mon_irq_2 mon_irq_1 mon_reent mon_dummy ipv6
parport_pc lp parport sg autofs4 binfmt_misc dm_multipath dm_mod pdc202xx_new
e1000 ipr firmware_class sd_mod scsi_mod
NIP: C000000000059694 LR: C00000000002C9F8 CTR: C00000000004CFCC
REGS: c000000038847a40 TRAP: 0700   Not tainted  (2.6.15-rc5cel)
MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 24004422  XER: 00000010
TASK = c000000041c447e0[32713] 'as' THREAD: c000000038844000 CPU: 1
GPR00: 0000000000000000 C000000038847CC0 C0000000005BF790 0000000000000000
GPR04: 8000000000001032 C0000000417FDC80 00000000283B9E5D C000000001F34DA0
GPR08: 0000000000000000 0000000000000004 C000000038847D30 C0000000005BF790
GPR12: 0000000024004482 C00000000048D400 00000000100F39F8 0000000010030000
GPR16: 0000000010030000 0000000010020000 00000000FFFF9008 0000000010050000
GPR20: 0000000010050000 0000000000000002 C000000041C44908 C000000038847D30
GPR24: C000000041C44970 C000000041C44890 C000000041C44890 C0000000028C47E0
GPR28: 0000000000000010 C000000041C447E0 C0000000004FCBB0 C000000038847D30
NIP [C000000000059694] .do_exit+0xa9c/0xda4
LR [C00000000002C9F8] kretprobe_trampoline+0x0/0x8
Call Trace:
[C000000038847CC0] [C00000000002C9F8] kretprobe_trampoline+0x0/0x8 (unreliable)
[C000000038847D90] [C000000000059A2C] .do_group_exit+0x50/0xe4
[C000000038847E30] [C000000000008600] syscall_exit+0x0/0x18
Instruction dump:
480586a1 60000000 e81d0018 39200000 f93d0788 70000008 0b000000 e93d0018
61290008 f93d0018 48348ea1 60000000 <0fe00000> 48000000 39200001 4bfffda8
 <1>Fixing recursive fault but reboot is needed!

Comment 3 Hien Nguyen 2005-12-22 01:17:09 UTC

On ppc64, I tried disabling interrupts in the kprobe handler in the reentry
case and re-enabling them on the way out of the handler, and it seems to
*WORK*. I was able to complete my kernel build (make -j8), where it gave an oops
before.

Comment 4 bibo,mao 2005-12-26 09:15:02 UTC

I tested on EM64T with Linux 2.6.9 plus the RCU patch, and it does not crash.
But when I tested it on Linux 2.6.15-rc5-mm3, it crashed.

Comment 5 bibo,mao 2005-12-27 03:41:21 UTC

In Linux 2.6.15-rc5-mm3, it does not crash on IA32. When running on EM64T, I
find that when an int3 instruction is hit inside the first int3 handler
function, the first int3 handler can continue to execute, but when this
function returns, the system crashes.
For example, when kp_pre() in mon_sched.c calls
my_test_reentrant_export_function(), which has been probed,
my_test_reentrant_export_function() can continue to execute, but when it
returns the system crashes.
I think it may be a problem with the trap stack on EM64T; I do not know how
the trap stack is set up when a trap happens in user/kernel mode or while
another trap is being handled.

Comment 6 bibo,mao 2005-12-29 08:53:38 UTC

I think this problem only happens in 2.6.15-rc5-mm3. There is one patch,
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15-rc5/2.6.15-rc5-mm3/broken-out/x86_64-debug-stack.patch,
which changed the int3 handler stack: debug and int3 share the same stack
(DEBUG_STACK), and this stack is saved in the TSS structure (64-Bit Extension
Technology Software Developer's Guide, Volume 1 of 2, section 1.6.10.5).
With this patch, every time a debug/int3 exception happens, the CPU switches to
DEBUG_STACK, so when a reentrant int3 happens, the later int3 handler
overwrites the previous DEBUG_STACK contents and the system crashes.
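
The overwrite can be sketched with a small userspace model. Everything here is hypothetical and simplified: struct cpu_model, enter_exception() and outer_return_after_nested() are made-up names, an exception frame is reduced to a single return address, and the stack grows upward for readability. The only behaviour it relies on is that an IST-style gate reloads the same fixed stack top on every entry.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_WORDS 8

/* Simplified CPU: one exception stack and an index of its next free word. */
struct cpu_model {
    long stack[STACK_WORDS];
    size_t sp;
};

/*
 * Enter an exception handler and push a one-word frame (the return address).
 * With use_ist set, the stack pointer is first reset to the fixed top, as a
 * gate configured with a dedicated stack would do on every entry.
 * Returns the slot that holds this frame.
 */
static size_t enter_exception(struct cpu_model *c, long return_addr, int use_ist)
{
    if (use_ist)
        c->sp = 0;
    assert(c->sp < STACK_WORDS);
    c->stack[c->sp] = return_addr;
    return c->sp++;
}

/* Return address the outer int3 handler finds after a nested int3 was taken. */
static long outer_return_after_nested(int use_ist)
{
    struct cpu_model c = { {0}, 0 };
    size_t outer = enter_exception(&c, 0x1000, use_ist); /* first int3 */

    (void)enter_exception(&c, 0x2000, use_ist);          /* int3 inside the handler */
    return c.stack[outer];
}
```

With the fixed-top behaviour the outer frame ends up holding the nested frame's return address, which matches the observation that the nested handler runs but the crash comes when the outer one returns. Without it, the nested frame is stacked next to the outer one and the outer frame survives.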

Comment 7 Anil S Keshavamurthy 2006-01-13 02:08:32 UTC

(In reply to comment #3)
> On ppc64, I tried to disable the interrupt in the kprobe handler in the case
> of reentry and re-enable interrupt when it came out of the handler and it
> seems to *WORK*. I was able to complete my kernel build (make -j8), where it
> gave an oops before.
Hien, I tried your patch (ported to IA64) and it did not work for me. Also,
I see that the solution you mentioned might not work for x86_64 either.
On x86_64 especially, even when you disable interrupts, NMIs can still happen
and any probes on that path will cause the problem again.

Comment 8 Anil S Keshavamurthy 2006-01-13 02:14:54 UTC

Created attachment 827 [details]
[PATCH] please review and provide comments

I am attaching a patch which has worked for me on IA64. Can someone port it
to PPC64 and x86_64 and give it a try? Porting it to other architectures
should be very easy. I am also looking for review comments on the patch
itself.

Thanks,
Anil

Comment 9 bibo,mao 2006-01-13 04:42:28 UTC

(In reply to comment #7)
> Hien, I tried your patch (porting onto IA64) and it did not work for me. Also
> I see the solution you have mentioned might not work for x86_64 too.
> Especially on x86_64 even when you disable interrupt, NMI's can still happen
> and any probes on that path will cause the problem again.
On x86_64 Linux 2.6.15-git8, the system crashes when an INT3 trap happens
recursively, but on x86_64 Linux v2.6.15 it does not. If you change
arch/x86_64/kernel/traps.c:969 from set_system_gate_ist(3,&int3,DEBUG_STACK) to
set_system_gate(3,&int3), the system will not crash.
Currently on IA32 and x86_64, the INT3 vector uses GATE_INTERRUPT, so the
hardware clears the interrupt flag automatically when the trap happens. So I
think this bug is architecture-specific; on IA64 it actually crashed.

Comment 10 Hien Nguyen 2006-01-13 17:37:08 UTC

(In reply to comment #8)
> Created an attachment (id=827)
> [PATCH] please review and provide comments
> 
> I am attaching a patch which has worked for me on IA64. Can someone port the
> same onto PPC64 & x86_64 and give it a try. Porting this onto other
> architecture should be very easy. Also I am looking for review comments on the
> patch itself.
> 
Anil,
Do you still disable IRQ in the handler in the case of re-entrance with this patch?

I could test this patch with ppc64 after you verify this. Let me know.
Thanks, Hien.

Comment 11 Anil S Keshavamurthy 2006-01-13 19:23:11 UTC

> Anil,
> Do you still disable IRQ in the handler in the case of re-entrance with this
> patch?
No, you don't have to disable IRQs with my new approach, which has worked on
IA64.

> I could test this patch with ppc64 after you verify this. Let me know.
You have to port the patch to ppc64 and test it.

Comment 12 Hien Nguyen 2006-01-13 23:00:31 UTC

Anil,

It works for ppc64. I ported your patch to ppc64, ran the test and built the
kernel (make -j8). The kernel build completed with no crash.

I am going to look for an x86_64 box and try the same patch on that platform.

Hien.

Comment 13 Hien Nguyen 2006-01-14 00:55:52 UTC

Anil,

It works on x86_64 too. Ported and tested with kernel v2.6.15. I've just done my
kernel build  (make -j4)  with the test running on x86_64. Great job.

Comment 14 Jim Keniston 2006-01-16 21:14:27 UTC

Assigning this to Anil, since he's been coordinating the work on this.

Comment 15 Jim Keniston 2006-04-11 00:15:18 UTC

Anil thinks that this is NOT fixed for x86_64:
"I guess with patch you can get through make -j's for some time but won't
run for overnight."

Comment 16 Anil S Keshavamurthy 2006-12-08 16:41:59 UTC

Created attachment 1450 [details]
updated test modules

Removes kallsyms_lookup_name() and instead uses kp.symbol_name.

Comment 17 Anil S Keshavamurthy 2006-12-08 16:44:33 UTC

Created attachment 1451 [details]
This patch seemed to work on both i386 and x86_64.

The calls to save_previous_kprobe() and set_current_kprobe() are now enclosed
between local_irq_save() and local_irq_restore().

Comment 18 Anil S Keshavamurthy 2006-12-08 16:54:21 UTC

Can anyone please test the above new patch and get back to me? I tested the
above patch on both x86_64 and i386 and was NOT able to crash the system. The
patch applies to 2.6.19-git11 or the latest mm.

Here is the test procedure in case you want to know.

1) Build and boot the kernel with the isr_reent.patch
2) untar the reent_test.tgz
3) cd reent_test
4) make
5) ./please_load_me (all the modules get loaded)
6) cd to_some_kernel_source_directory
7) while true; do make -j8; make clean; done

Happy testing!!

-Anil

Comment 19 Frank Ch. Eigler 2006-12-11 18:30:45 UTC

This patch has *amazing* results.  All my private kernel.function("*") tests are
now passing (on i686; testing on x86-64 in progress).  Please let's push it
upstream immediately.

Comment 20 Srikar Dronamraju 2007-06-11 12:07:46 UTC

Anil, 

I ran your test on ppc64, on 2.6.21-rc6-mm1 for around 3 days. However I
couldn't reproduce the problem. Hence I don't think that the problem affects ppc64. 
Please do push your patch and close the bug. If ever the problem is seen on
ppc64 later we can reopen this bug.

Comment 21 Jim Keniston 2007-06-11 16:14:11 UTC

Subject: Re:  Probes on ISR with probes on task thread's
	prehandler crash the system

On Mon, 2007-06-11 at 12:07 +0000, srikar at linux dot vnet dot ibm dot
com wrote:
> ------- Additional Comments From srikar at linux dot vnet dot ibm dot com  2007-06-11 12:07 -------
> Anil, 
> 
> I ran your test on ppc64, on 2.6.21-rc6-mm1 for around 3 days. However I
> couldn't reproduce the problem. Hence I don't think that the problem affects ppc64. 
> Please do push your patch and close the bug. If ever the problem is seen on
> ppc64 later we can reopen this bug.
> 

Thanks, Srikar.
Jim

Comment 22 Anil S Keshavamurthy 2007-06-11 16:43:47 UTC

(In reply to comment #21)
> Subject: Re:  Probes on ISR with probes on task thread's
> 	prehandler crash the system
> On Mon, 2007-06-11 at 12:07 +0000, srikar at linux dot vnet dot ibm dot
> com wrote:
> > ------- Additional Comments From srikar at linux dot vnet dot ibm dot com  2007-06-11 12:07 -------
> > Anil, 
> > 
> > I ran your test on ppc64, on 2.6.21-rc6-mm1 for around 3 days. However I
> > couldn't reproduce the problem. Hence I don't think that the problem affects ppc64. 
> > Please do push your patch and close the bug. If ever the problem is seen on
> > ppc64 later we can reopen this bug.
> > 
I already pushed this for IA64 and it has made it into Linus's kernel.
I will close this bug, as we have covered all architectures.


-Anil

Comment 23 Masami Hiramatsu 2008-01-17 14:53:36 UTC

I found that this bug still exists in the latest kernel.
I'm investigating it.

Here is the kernel bug message:
------
probes registered
probes registered
probes registered
probes registered
probes registered
probes registered
probes registered
probes registered
probes registered
probes registered
probes registered
kernel BUG at /home/mhiramat/ksrc/linux-2.6.24-rc7/kernel/exit.c:1050!
mv[32200]: bugcheck! 0 [1]
Modules linked in: mon_sched mon_sched_4 mon_sched_3 mon_sched_2 mon_sched_1
mon_irq mon_irq_4 mon_irq_3 mon_irq_2 mon_irq_1 mon_reent mon_dummy sunrpc
binfmt_misc dm_multipath fan sg thermal processor button container dm_snapshot
dm_zero dm_mirror dm_mod usb_storage megaraid_mbox megaraid_mm ehci_hcd ohci_hcd
uhci_hcd usbcore

Pid: 32200, CPU 1, comm:                   mv
psr : 0000101008526030 ifs : 800000000000040c ip  : [<a00000010009def0>]    Not tainted (2.6.24-rc7)
ip is at do_exit+0x11b0/0x11c0
unat: 0000000000000000 pfs : 000000000000040c rsc : 0000000000000003
rnat: 0000000000000400 bsps: 0000000000000400 pr  : 0000000000556959
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a00000010009def0 b6  : a00000010008d300 b7  : a00000010000f9f0
f6  : 1003e000000000001c000 f7  : 1003e0000000000000400
f8  : 1003e0000000000000070 f9  : 0ffff8000000000000000
f10 : 10008fffffffff0000000 f11 : 1003e0000000000000400
r1  : a000000100e60c60 r2  : e00000071eab0024 r3  : a000000100c05348
r8  : 000000000000004a r9  : a000000100c05348 r10 : 0000000000000007
r11 : 0000000000004000 r12 : e0000005a639fe20 r13 : e0000005a6390000
r14 : 0000000000004000 r15 : a000000100c05348 r16 : a000000100c05330
r17 : 0000000000004000 r18 : 0000001516f675a7 r19 : e000000001129460
r20 : e000000001129460 r21 : e0000000011257d8 r22 : e000000707dea6d8
r23 : 0000001513fb8527 r24 : e000000707dea6c0 r25 : e0000000011257c0
r26 : a000000100c79e4c r27 : 0000000000000400 r28 : 0000000000000400
r29 : 0000000000001000 r30 : 0000000000000070 r31 : e00000071eab0048

Call Trace:
 [<a000000100015340>] show_stack+0x40/0xa0
                                sp=e0000005a639f9f0 bsp=e0000005a6390e78
 [<a000000100015c50>] show_regs+0x850/0x8a0
                                sp=e0000005a639fbc0 bsp=e0000005a6390e20
 [<a000000100038d60>] die+0x1a0/0x2a0
                                sp=e0000005a639fbc0 bsp=e0000005a6390dd0
 [<a000000100038eb0>] die_if_kernel+0x50/0x80
                                sp=e0000005a639fbc0 bsp=e0000005a6390da0
 [<a0000001007627c0>] ia64_bad_break+0x240/0x440
                                sp=e0000005a639fbc0 bsp=e0000005a6390d78
 [<a00000010000b9a0>] ia64_leave_kernel+0x0/0x270
                                sp=e0000005a639fc50 bsp=e0000005a6390d78
 [<a00000010009def0>] do_exit+0x11b0/0x11c0
                                sp=e0000005a639fe20 bsp=e0000005a6390d18
 [<a00000010009e050>] do_group_exit+0x150/0x160
                                sp=e0000005a639fe30 bsp=e0000005a6390ce0
 [<a00000010009e080>] sys_exit_group+0x20/0x40
                                sp=e0000005a639fe30 bsp=e0000005a6390c88
 [<a00000010000b800>] ia64_ret_from_syscall+0x0/0x20
                                sp=e0000005a639fe30 bsp=e0000005a6390c88
 [<a000000000010720>] __kernel_syscall_via_break+0x0/0x20
                                sp=e0000005a63a0000 bsp=e0000005a6390c88
Fixing recursive fault but reboot is needed!
------

Comment 24 Masami Hiramatsu 2008-01-17 21:56:03 UTC

While investigating, I found a bug in restore_previous_kprobe().
This function and save_previous_kprobe() together perform FILO (stack) operations.
They work as shown below:

save_previous_kprobe() // this pushes a value to stack
{
i = ++index;
stack[i-1] = val;
}

restore_previous_kprobe() // this pops a value from stack
{
i = --index;    // (a)
val = stack[i]; // (b)
}

However, if an interrupt occurs between (a) and (b), and a kprobe
is hit in that interrupt, the nested push overwrites the stack[] entry that
has not been read yet.

restore_previous_kprobe() // this pops a value from stack
{
i = --index;    // (a) (i == 0, index == 0)
--(interrupt)
	save_previous_kprobe() // this pushes a value to stack
	{
	i = ++index; (i == 1, index == 1)
	stack[i-1] = val2; (!!overwrite stack[0]!!)
	}
	restore_previous_kprobe() // this pops a value from stack
	{
	i = --index; (i == 0, index == 0)
	val2 = stack[i]; (stack[0] == val2)
	}
--
val = stack[i]; // (b) (val = val2)
}

Thus, the index must be decremented AFTER reading the value.

restore_previous_kprobe() // this pops a value from stack
{
i = index;
val = stack[i-1];
--index;
}
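
The ordering problem above can be replayed in a small userspace model. This is a sketch under stated assumptions, not the ia64 code: struct kp_stack, push(), nested_probe(), pop_buggy() and pop_fixed() are hypothetical names, and the "interrupt" is simulated by calling nested_probe() at the point where it would arrive (a non-zero interrupt_val means an interrupt hits a probe there).

```c
#include <assert.h>

#define DEPTH 4

/* Model of the saved-kprobe stack: index counts the entries in use. */
struct kp_stack {
    int index;
    int slot[DEPTH];
};

/* Push, in the same order as the save pseudocode above. */
static void push(struct kp_stack *s, int val)
{
    int i = ++s->index;

    assert(i <= DEPTH);
    s->slot[i - 1] = val;
}

/* A probe hit from an interrupt: it pushes its own value, then pops it. */
static void nested_probe(struct kp_stack *s, int val2)
{
    push(s, val2);
    --s->index;
}

/* Buggy pop: the slot is released at (a) before it is read at (b). */
static int pop_buggy(struct kp_stack *s, int interrupt_val)
{
    int i = --s->index;                 /* (a) slot i now looks free */

    if (interrupt_val)
        nested_probe(s, interrupt_val); /* nested push lands in slot i */
    return s->slot[i];                  /* (b) reads the clobbered value */
}

/* Fixed pop: read first, release the slot afterwards. */
static int pop_fixed(struct kp_stack *s, int interrupt_val)
{
    int i = s->index;
    int val;

    if (interrupt_val)
        nested_probe(s, interrupt_val); /* nested push lands in slot i, above ours */
    val = s->slot[i - 1];               /* our slot was still reserved */
    --s->index;
    return val;
}
```

Pushing 11 and then popping with a simulated interrupt that carries 22 returns 22 from pop_buggy() and 11 from pop_fixed(); in both cases index is back at 0 afterwards, which is why the corruption is silent.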

Comment 25 Masami Hiramatsu 2008-01-17 21:58:15 UTC

Created attachment 2203 [details]
Fix the order of atomic operations in restore_previous_kprobe() on ia64

Comment 26 Shaohua Li 2008-01-18 03:29:52 UTC

Re comment #25: the patch looks good; please send it to the ia64 mailing list.

Comment 27 Masami Hiramatsu 2008-01-21 22:46:29 UTC

(In reply to comment #26)
> re comments #25: patch looks good, please send to ia64 maillist.

Thank you for the review.
I sent this patch to the linux-ia64 mailing list.
Here is the title:
[PATCH]Fix the order of atomic operations in restore_previous_kprobes on ia64

Could you give me your Ack on the mailing list?

Comment 28 Masami Hiramatsu 2008-03-13 14:59:36 UTC

The patch was merged into Linus's tree (2.6.25-rc1).