I was testing reentrant probes by calling a routine that has a probe on it from inside a task thread's pre-handler, and this reentrancy test worked without any problem. However, when I inserted another test module that places probes on the ISR routine (__do_IRQ), the system crashed.

Here is what I think is happening. Our current kprobes design supports reentrancy only from one thread. If another probe fires in an ISR while a reentrant probe is being serviced and before its single-stepping completes, we lose or overwrite the previous kprobes state and eventually crash the system. Will disabling interrupts while servicing the reentrant probes solve the problem? Need to try.

The attached test case has:
1) probes on my_test_reentrant_export_function()
2) probes on schedule(), where the pre_handler for schedule() calls my_test_reentrant_export_function()
3) probes on __do_IRQ(), where the pre_handler for __do_IRQ() calls my_test_reentrant_export_function()

Here is the stack backtrace from the system crash while executing the above test on IA64. I think this problem should exist on PPC64 too; I am not sure about IA32, since IA32 disables interrupts while servicing the break fault handler.
[<a0000001000122a0>] show_stack+0x80/0xa0 sp=e000000001feed10 bsp=e000000001fe9360
[<a000000100012bb0>] show_regs+0x890/0x8c0 sp=e000000001feeee0 bsp=e000000001fe9318
[<a00000010003a560>] die+0x1a0/0x2a0 sp=e000000001feef00 bsp=e000000001fe92c8
[<a00000010003a6a0>] die_if_kernel+0x40/0x60 sp=e000000001feef20 bsp=e000000001fe9298
[<a000000100736a10>] ia64_bad_break+0x550/0x6c0 sp=e000000001feef20 bsp=e000000001fe9270
[<a00000010000c520>] ia64_leave_kernel+0x0/0x280 sp=e000000001feeff0 bsp=e000000001fe9270
[<a000000100739780>] kprobe_exceptions_notify+0x8a0/0x900 sp=e000000001fef1c0 bsp=e000000001fe91c0
[<a00000010073a560>] notifier_call_chain+0x80/0xe0 sp=e000000001fef1d0 bsp=e000000001fe9188
[<a000000100736b50>] ia64_bad_break+0x690/0x6c0 sp=e000000001fef1d0 bsp=e000000001fe9160
[<a00000010000c520>] ia64_leave_kernel+0x0/0x280 sp=e000000001fef2a0 bsp=e000000001fe9160
[<a0000001000ec220>] __do_IRQ+0x0/0x440 sp=e000000001fef470 bsp=e000000001fe9150
[<a0000001000112e0>] indle_irq+0xa0/0x140 sp=e000000001fef470 bsp=e000000001fe9118
[<a00000010000c520>] ia64_leave_kernel+0x0/0x280 sp=e00000000fe9118
[<a00000010073aca0>] kprobes_inc_nmissed_count+0x0/0x120 sp=e000000001fef640 bsp=e000000001fe9100
[<a0000001007392e0>] kprobe_exceptions_notify+0x sp=e000000001fef640 bsp=e000000001fe9070
[<a00000010073a560>] notifier_call_chain+0x80/0xe0 sp=e000000001fef650 bsp=e000000001fe900>] ia64_bad_break+0x690/0x6c0 sp=e000000001fef650 bsp=e000000001fe9010
[<a00000010000c520>] ia64_leave_kernel+0x0/0x280 s=e000000001fe9010
[<a00000020008c000>] my_test_reentrant_export_function+0x0/0x40 [mon_dummy] sp=e000000001fef8f0 bsp=e000000001fe9010
[<a0000002000e4140on_sched] sp=e000000001fef8f0 bsp=e000000001fe8ff0
[<a00000010073a840>] aggr_pre_handler+0x180/0x1c0 sp=e000000001fef8f0 b8
[<a000000100739570>] kprobe_exceptions_notify+0x690/0x900 sp=e000000001fef8f0 bsp=e000000001fe8f18
[<a00000010073a560>] notifier_call_chain+0x80/0xe0 sp=e000000001fef900 bsp=e000000001fe8ee0
[<a000000100736b50>] ia64_bad_break+0x690/0x6c0 sp=e000000001fef900 bsp=e000000001fe8eb8
[<a0000001000nel+0x0/0x280 sp=e000000001fef9d0 bsp=e000000001fe8eb8
[<a0000001007312e0>] schedule+0x0/0x15c0 sp=e000000001fefba0 bsp=e0<a00000010005d420>] kretprobe_trampoline+0x0/0x20 sp=e000000001fefba0 bsp=e000000001fe8e68
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
Created attachment 808 [details] test modules

Attaching a test case.
1) Untar
2) cd reent_test; make
3) ./please_load_me
4) Run some make -jx

You should see a system crash within a few minutes.
Anil, I ran your test on ppc64; the system gave an oops after building the kernel with make -j8 for a while. Here's the trace:

kernel BUG in do_exit at kernel/exit.c:880!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=128 NUMA PSERIES LPAR
Modules linked in: mon_sched mon_sched_4 mon_sched_3 mon_sched_2 mon_sched_1 mon_irq mon_irq_4 mon_irq_3 mon_irq_2 mon_irq_1 mon_reent mon_dummy ipv6 parport_pc lp parport sg autofs4 binfmt_misc dm_multipath dm_mod pdc202xx_new e1000 ipr firmware_class sd_mod scsi_mod
NIP: C000000000059694 LR: C00000000002C9F8 CTR: C00000000004CFCC
REGS: c000000038847a40 TRAP: 0700 Not tainted (2.6.15-rc5cel)
MSR: 8000000000029032 <EE,ME,IR,DR> CR: 24004422 XER: 00000010
TASK = c000000041c447e0[32713] 'as' THREAD: c000000038844000 CPU: 1
GPR00: 0000000000000000 C000000038847CC0 C0000000005BF790 0000000000000000
GPR04: 8000000000001032 C0000000417FDC80 00000000283B9E5D C000000001F34DA0
GPR08: 0000000000000000 0000000000000004 C000000038847D30 C0000000005BF790
GPR12: 0000000024004482 C00000000048D400 00000000100F39F8 0000000010030000
GPR16: 0000000010030000 0000000010020000 00000000FFFF9008 0000000010050000
GPR20: 0000000010050000 0000000000000002 C000000041C44908 C000000038847D30
GPR24: C000000041C44970 C000000041C44890 C000000041C44890 C0000000028C47E0
GPR28: 0000000000000010 C000000041C447E0 C0000000004FCBB0 C000000038847D30
NIP [C000000000059694] .do_exit+0xa9c/0xda4
LR [C00000000002C9F8] kretprobe_trampoline+0x0/0x8
Call Trace:
[C000000038847CC0] [C00000000002C9F8] kretprobe_trampoline+0x0/0x8 (unreliable)
[C000000038847D90] [C000000000059A2C] .do_group_exit+0x50/0xe4
[C000000038847E30] [C000000000008600] syscall_exit+0x0/0x18
Instruction dump:
480586a1 60000000 e81d0018 39200000 f93d0788 70000008 0b000000 e93d0018
61290008 f93d0018 48348ea1 60000000 <0fe00000> 48000000 39200001 4bfffda8
<1>Fixing recursive fault but reboot is needed!
On ppc64, I tried to disable the interrupt in the kprobe handler in the case of reentry and re-enable interrupt when it came out of the handler and it seems to *WORK*. I was able to complete my kernel build (make -j8), where it gave an oops before.
I tested on EM64T with Linux 2.6.9 plus the RCU patch, and it does not crash. But when I tested with Linux 2.6.15-rc5-mm3, it crashed.
With Linux 2.6.15-rc5-mm3 it does not crash on IA32. On EM64T, I find that when an int3 instruction is hit inside the first int3 handler function, that handler can continue to execute, but when it returns the system crashes. For example, when kp_pre() in mon_sched.c calls my_test_reentrant_export_function(), which has been probed, my_test_reentrant_export_function() can continue to execute, but when it returns the system crashes. I think it may be a problem with the trap stack on EM64T; I do not know how the trap stack is set up when a trap happens in user/kernel mode or while another trap is being handled.
I think this problem only happens in 2.6.15-rc5-mm3. There is one patch, http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15-rc5/2.6.15-rc5-mm3/broken-out/x86_64-debug-stack.patch, which changes the int3 handler stack: debug and int3 now share the same stack (DEBUG_STACK), and a pointer to this stack is stored in the TSS structure (64-Bit Extension Technology Software Developer's Guide Volume 1 of 2, section 1.6.10.5). With this patch, every time a debug/int3 exception happens, the CPU switches to DEBUG_STACK, so when a reentrant int3 happens, the later int3 handler overwrites the previous contents of DEBUG_STACK, and the system crashes.
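To make the overwrite easier to picture, here is a minimal user-space sketch in C. It is only a model under stated assumptions, not kernel code: the array debug_stack, the helpers enter_ist(), enter_plain() and nested_int3(), and the values 0x1111 and 0x2222 are all made-up names for illustration. The one hardware fact it relies on is that a gate with an IST entry makes the CPU reload the stack pointer from the TSS slot on every delivery, even when it is already running on that stack.

```c
#include <assert.h>

#define STACK_WORDS 16

static long debug_stack[STACK_WORDS];   /* stands in for DEBUG_STACK */
static long *sp;                        /* simulated stack pointer */

/* IST-style entry (hypothetical model): the stack pointer is reloaded
 * from a fixed slot on every delivery, so each entry starts at the top. */
long *enter_ist(long return_addr)
{
    sp = &debug_stack[STACK_WORDS];
    *--sp = return_addr;
    return sp;                          /* where this trap frame lives */
}

/* Plain gate (hypothetical model): a nested trap keeps pushing below
 * whatever the current stack pointer is. */
long *enter_plain(long return_addr)
{
    assert(sp > debug_stack);
    *--sp = return_addr;
    return sp;
}

/* Take one int3, then a second int3 from inside the first handler, and
 * report what the outer frame holds when the outer handler returns. */
long nested_int3(long *(*enter)(long))
{
    long *outer;

    sp = &debug_stack[STACK_WORDS];
    outer = enter(0x1111);              /* first int3 */
    (void)enter(0x2222);                /* int3 hit inside the handler */
    return *outer;
}
```

In this model nested_int3(enter_ist) comes back with 0x2222, i.e. the outer frame has been replaced by the nested one, while nested_int3(enter_plain) still returns 0x1111. The nested handler itself runs normally in both cases; the damage only shows when the outer handler tries to return.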
(In reply to comment #3)
> On ppc64, I tried to disable the interrupt in the kprobe handler in the case of
> reentry and re-enable interrupt when it came out of the handler and it seems to
> *WORK*. I was able to complete my kernel build (make -j8), where it gave an oops
> before.

Hien, I tried your patch (ported to IA64) and it did not work for me. I also think the solution you mentioned might not work for x86_64. On x86_64 in particular, even when you disable interrupts, NMIs can still happen, and any probes on that path will cause the problem again.
Created attachment 827 [details] [PATCH] please review and provide comments

I am attaching a patch which has worked for me on IA64. Can someone port it to PPC64 and x86_64 and give it a try? Porting it to other architectures should be very easy. I am also looking for review comments on the patch itself.

Thanks, Anil
(In reply to comment #7)
> Hien, I tried your patch (porting onto IA64) and it did not work for me. Also
> I see the solution you have mentioned might now work for x86_64 too.
> Especially on x86_64 even when you disable interrupt, NMI's can still happen
> and any probes on that path will cause the problem again.

On x86_64 Linux 2.6.15-git8, the system crashes when an INT3 trap happens recursively, but on x86_64 Linux v2.6.15 it does not crash. If you change arch/x86_64/kernel/traps.c:969 from set_system_gate_ist(3,&int3,DEBUG_STACK) to set_system_gate(3,&int3), the system does not crash. Currently on IA32 and x86_64 the INT3 vector uses GATE_INTERRUPT, so when the trap happens the hardware clears the interrupt flag automatically. So I think this bug is architecture-dependent; on IA64 it actually crashed.
(In reply to comment #8) > Created an attachment (id=827) > [PATCH] please review and provide comments > > I am attaching a patch which has worked for me on IA64. Can someone port the > same onto PPC64 & x86_64 and give it a try. Porting this onto other > architecture should be very easy. Also I am looking for review comments on the > patch itself. > Anil, Do you still disable IRQ in the handler in the case of re-entrance with this patch? I could test this patch with ppc64 after you verify this. Let me know. Thanks, Hien.
> Anil,
> Do you still disable IRQ in the handler in the case of re-entrance with this patch?

No, you don't have to disable IRQs with my new approach, which has worked on IA64.

> I could test this patch with ppc64 after you verify this. Let me know.

You have to port the patch to ppc64 and test it.
Anil, It works for ppc64. I ported your patch to ppc64, ran the test, and built the kernel (make -j8). The kernel build completed with no crash. I am going to look for an x86_64 box and try the same patch on that platform. Hien.
Anil, It works on x86_64 too. Ported and tested with kernel v2.6.15. I've just done my kernel build (make -j4) with the test running on x86_64. Great job.
Assigning this to Anil, since he's been coordinating the work on this.
Anil thinks that this is NOT fixed for x86_64: "I guess with patch you can get through make -j's for some time but won't run for overnight."
Created attachment 1450 [details] updated test modules

Removes kallsyms_lookup_name() and instead uses kp.symbol_name.
Created attachment 1451 [details] This patch seemed to work on both i386 and x86_64.

The calls to save_previous_kprobe() and set_current_kprobe() are now enclosed between local_irq_save() and local_irq_restore().
Can anyone please test the above new patch and get back to me? I tested the patch on both x86_64 and i386 and was NOT able to crash the system. The patch applies to 2.6.19-git11 or the latest mm.

Here is the test procedure in case you want to know:
1) Build and boot the kernel with the isr_reent.patch
2) Untar reent_test.tgz
3) cd reent_test
4) make
5) ./please_load_me (all the modules get loaded)
6) cd to_some_kernel_source_directory
7) while true; do make -j8; make clean; done

Happy testing!! -Anil
This patch has *amazing* results. All my private kernel.function("*") tests are now passing (on i686; testing on x86-64 in progress). Please let's push it upstream immediately.
Anil, I ran your test on ppc64, on 2.6.21-rc6-mm1 for around 3 days. However I couldn't reproduce the problem. Hence I don't think that the problem affects ppc64. Please do push your patch and close the bug. If ever the problem is seen on ppc64 later we can reopen this bug.
Subject: Re: Probes on ISR with probes on task thread's prehandler crash the system

On Mon, 2007-06-11 at 12:07 +0000, srikar at linux dot vnet dot ibm dot com wrote:
> Anil,
>
> I ran your test on ppc64, on 2.6.21-rc6-mm1 for around 3 days. However I
> couldn't reproduce the problem. Hence I don't think that the problem affects ppc64.
> Please do push your patch and close the bug. If ever the problem is seen on
> ppc64 later we can reopen this bug.

Thanks, Srikar.
Jim
(In reply to comment #21)
> Subject: Re: Probes on ISR with probes on task thread's
> prehandler crash the system
> On Mon, 2007-06-11 at 12:07 +0000, srikar at linux dot vnet dot ibm dot
> com wrote:
> > Anil,
> >
> > I ran your test on ppc64, on 2.6.21-rc6-mm1 for around 3 days. However I
> > couldn't reproduce the problem. Hence I don't think that the problem affects ppc64.
> > Please do push your patch and close the bug. If ever the problem is seen on
> > ppc64 later we can reopen this bug.

I have already pushed this for IA64 and it has made it into Linus's kernel. Will close this bug as we have covered all architectures. -Anil
I found that this bug still exists in the latest kernel. I'm investigating it. Here is the kernel bug message:
------
probes registered
probes registered
probes registered
probes registered
probes registered
probes registered
probes registered
probes registered
probes registered
probes registered
probes registered
kernel BUG at /home/mhiramat/ksrc/linux-2.6.24-rc7/kernel/exit.c:1050!
mv[32200]: bugcheck! 0 [1]
Modules linked in: mon_sched mon_sched_4 mon_sched_3 mon_sched_2 mon_sched_1 mon_irq mon_irq_4 mon_irq_3 mon_irq_2 mon_irq_1 mon_reent mon_dummy sunrpc binfmt_misc dm_multipath fan sg thermal processor button container dm_snapshot dm_zero dm_mirror dm_mod usb_storage megaraid_mbox megaraid_mm ehci_hcd ohci_hcd uhci_hcd usbcore

Pid: 32200, CPU 1, comm: mv
psr : 0000101008526030 ifs : 800000000000040c ip : [<a00000010009def0>] Not tainted (2.6.24-rc7)
ip is at do_exit+0x11b0/0x11c0
unat: 0000000000000000 pfs : 000000000000040c rsc : 0000000000000003
rnat: 0000000000000400 bsps: 0000000000000400 pr : 0000000000556959
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a00000010009def0 b6 : a00000010008d300 b7 : a00000010000f9f0
f6 : 1003e000000000001c000 f7 : 1003e0000000000000400
f8 : 1003e0000000000000070 f9 : 0ffff8000000000000000
f10 : 10008fffffffff0000000 f11 : 1003e0000000000000400
r1 : a000000100e60c60 r2 : e00000071eab0024 r3 : a000000100c05348
r8 : 000000000000004a r9 : a000000100c05348 r10 : 0000000000000007
r11 : 0000000000004000 r12 : e0000005a639fe20 r13 : e0000005a6390000
r14 : 0000000000004000 r15 : a000000100c05348 r16 : a000000100c05330
r17 : 0000000000004000 r18 : 0000001516f675a7 r19 : e000000001129460
r20 : e000000001129460 r21 : e0000000011257d8 r22 : e000000707dea6d8
r23 : 0000001513fb8527 r24 : e000000707dea6c0 r25 : e0000000011257c0
r26 : a000000100c79e4c r27 : 0000000000000400 r28 : 0000000000000400
r29 : 0000000000001000 r30 : 0000000000000070 r31 : e00000071eab0048

Call Trace:
[<a000000100015340>] show_stack+0x40/0xa0 sp=e0000005a639f9f0 bsp=e0000005a6390e78
[<a000000100015c50>] show_regs+0x850/0x8a0 sp=e0000005a639fbc0 bsp=e0000005a6390e20
[<a000000100038d60>] die+0x1a0/0x2a0 sp=e0000005a639fbc0 bsp=e0000005a6390dd0
[<a000000100038eb0>] die_if_kernel+0x50/0x80 sp=e0000005a639fbc0 bsp=e0000005a6390da0
[<a0000001007627c0>] ia64_bad_break+0x240/0x440 sp=e0000005a639fbc0 bsp=e0000005a6390d78
[<a00000010000b9a0>] ia64_leave_kernel+0x0/0x270 sp=e0000005a639fc50 bsp=e0000005a6390d78
[<a00000010009def0>] do_exit+0x11b0/0x11c0 sp=e0000005a639fe20 bsp=e0000005a6390d18
[<a00000010009e050>] do_group_exit+0x150/0x160 sp=e0000005a639fe30 bsp=e0000005a6390ce0
[<a00000010009e080>] sys_exit_group+0x20/0x40 sp=e0000005a639fe30 bsp=e0000005a6390c88
[<a00000010000b800>] ia64_ret_from_syscall+0x0/0x20 sp=e0000005a639fe30 bsp=e0000005a6390c88
[<a000000000010720>] __kernel_syscall_via_break+0x0/0x20 sp=e0000005a63a0000 bsp=e0000005a6390c88
Fixing recursive fault but reboot is needed!
------
After investigating, I found a bug in restore_previous_kprobe(). This function and save_previous_kprobe() perform FILO (stack) operations. They work as shown below:

save_previous_kprobe() // this pushes a value to stack
{
    i = ++index;
    stack[i-1] = val;
}

restore_previous_kprobe() // this pops a value from stack
{
    i = --index;      // (a)
    val = stack[i];   // (b)
}

However, if an interrupt occurs between (a) and (b), and a kprobe is hit in that interrupt, the nested push overwrites the stack[] entry that has not been read yet:

restore_previous_kprobe() // this pops a value from stack
{
    i = --index;      // (a) (i == 0, index == 0)
    --(interrupt)
    save_previous_kprobe() // this pushes a value to stack
    {
        i = ++index;          (i == 1, index == 1)
        stack[i-1] = val2;    (!!overwrite stack[0]!!)
    }
    restore_previous_kprobe() // this pops a value from stack
    {
        i = --index;          (i == 0, index == 0)
        val2 = stack[i];      (stack[0] == val2)
    }
    --
    val = stack[i];   // (b) (val = val2)
}

Thus, the index must be decremented AFTER reading the value:

restore_previous_kprobe() // this pops a value from stack
{
    i = index;
    val = stack[i-1];
    --index;
}
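For anyone who wants to step through the ordering problem outside the kernel, here is a small user-space sketch in C. It is only an illustration under assumptions, not the ia64 code: the names slots, top, irq_hook, fire_irq(), nested_probe() and run_once() are hypothetical, the "interrupt" is just a function call placed in the window between reading the index and reading the slot, and 111/222 stand in for the saved kprobe state.

```c
#include <assert.h>
#include <stddef.h>

#define DEPTH 4

static int slots[DEPTH];        /* saved values, like stack[] above */
static int top;                 /* like index above: entries in use */
static void (*irq_hook)(void);  /* hypothetical one-shot "interrupt" */

/* Run the armed interrupt handler once, if there is one. */
static void fire_irq(void)
{
    void (*handler)(void) = irq_hook;

    irq_hook = NULL;
    if (handler)
        handler();
}

/* Push, in the same order as the save side in the comment above. */
static void push(int val)
{
    int i = ++top;

    assert(i <= DEPTH);
    slots[i - 1] = val;
}

/* Buggy pop: the slot is released at (a) before it is read at (b). */
int pop_buggy(void)
{
    int i = --top;      /* (a) */

    fire_irq();         /* interrupt lands between (a) and (b) */
    return slots[i];    /* (b) */
}

/* Fixed pop: read the slot first, decrement the index afterwards. */
int pop_fixed(void)
{
    int i = top;
    int val;

    fire_irq();         /* same interrupt window as above */
    val = slots[i - 1];
    --top;
    return val;
}

/* A probe hit inside the interrupt: saves and restores its own value. */
static void nested_probe(void)
{
    push(222);
    (void)pop_fixed();  /* no further interrupt is armed here */
}

/* Save 111, then pop it while one nested probe hit interrupts the pop. */
int run_once(int (*pop)(void))
{
    top = 0;
    push(111);
    irq_hook = nested_probe;
    return pop();
}
```

Called from a main(), run_once(pop_buggy) hands back the nested value 222 in place of the saved 111, while run_once(pop_fixed) returns 111, because with the fixed order the nested push lands in the next slot up and never touches the entry that is still being read.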
Created attachment 2203 [details] Fix the order of atomic operations in restore_previous_kprobe() on ia64
re comments #25: patch looks good, please send to ia64 maillist.
(In reply to comment #26)
> re comments #25: patch looks good, please send to ia64 maillist.

Thank you for the review. I sent this patch to the linux-ia64 ML. Here is the title:
[PATCH]Fix the order of atomic operations in restore_previous_kprobes on ia64
Could you give me your Ack on the ML?
The patch was merged into Linus's tree (2.6.25-rc1).