1229 – system hangs when "schedule"/"do_IRQ" probed

Bug 1229 - system hangs when "schedule"/"do_IRQ" probed

Summary: system hangs when "schedule"/"do_IRQ" probed

Status:	RESOLVED FIXED

Alias:	None

Product:	systemtap
Classification:	Unclassified
Component:	kprobes (show other bugs)
Version:	unspecified

Importance:	P2 normal
Target Milestone:	---
Assignee:	Ananth N Mavinakayanahalli

URL:
Keywords:

Depends on:
Blocks:

Reported:	2005-08-22 21:30 UTC by Kevin Stafford
Modified:	2005-09-06 14:08 UTC (History)
CC List:	7 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:

Attachments
this is the module that monitors calls kernel calls to schedule (530 bytes, text/plain) 2005-08-22 21:32 UTC, Kevin Stafford	Details
this is the module that monitors calls kernel calls to do_IRQ (519 bytes, text/plain) 2005-08-22 21:32 UTC, Kevin Stafford	Details
this Makefile will build them... (158 bytes, text/plain) 2005-08-22 21:33 UTC, Kevin Stafford	Details
possible fix for issue with probing both do_irq and schedule (739 bytes, patch) 2005-08-26 21:58 UTC, Ananth N Mavinakayanahalli	Details \| Diff
kpobe_cpu is updated while interrupts disabled (336 bytes, patch) 2005-08-29 19:36 UTC, Anil S Keshavamurthy	Details \| Diff
another attempt to fix irq-sched issue (1007 bytes, patch) 2005-08-29 20:57 UTC, Ananth N Mavinakayanahalli	Details \| Diff
changing preempt_enable to preempt_enable_no_resched (689 bytes, patch) 2005-08-31 00:58 UTC, Anil S Keshavamurthy	Details \| Diff
use preempt_enable_no_resched() instead of preempt_enable() on ppc64 (187 bytes, patch) 2005-08-31 13:12 UTC, Ananth N Mavinakayanahalli	Details \| Diff
Mother of all patches to fix irq-sched issue on RHEL4 U2 (2.12 KB, patch) 2005-09-01 22:26 UTC, Anil S Keshavamurthy	Details \| Diff
Final - Mother of all patches to fix irq-sched issue on RHEL4 U2 (2.25 KB, patch) 2005-09-01 23:59 UTC, Anil S Keshavamurthy	Details \| Diff
Show Obsolete (6) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Kevin Stafford 2005-08-22 21:30:47 UTC

I have two bare-bones C modules: The first places a kprobe, jprobe, & kretprobe
on "schedule" function, the second does the same to "do_IRQ." The handlers all
simply increment counters (no printk's). Both modules can be used independently
of one another without problems however running both modules together causes the
system to eventually hang. The results are sporadic, sometimes significant
execution time is required to cause a problem, other times the system will
freeze immediately.

Comment 1 Kevin Stafford 2005-08-22 21:32:09 UTC

Created attachment 606 [details]
this is the module that monitors calls kernel calls to schedule

Comment 2 Kevin Stafford 2005-08-22 21:32:35 UTC

Created attachment 607 [details]
this is the module that monitors calls kernel calls to do_IRQ

Comment 3 Kevin Stafford 2005-08-22 21:33:00 UTC

Created attachment 608 [details]
this Makefile will build them...

Comment 4 Kevin Stafford 2005-08-22 21:35:51 UTC

Here is some information about the machine I used to test these modules:

Linux elm3b90 2.6.9-15.EL #1 SMP Fri Aug 5 18:57:49 EDT 2005 ppc64 ppc64 ppc64
GNU/Linux

Comment 5 Ananth N Mavinakayanahalli 2005-08-24 20:49:12 UTC

Here is the xmon log from one recreate:

cpu 0x1: Vector: 700 (Program Check) at [c0000000009d73e0]
    pc: c000000000013c7c: .do_IRQ+0x0/0x100
    lr: c00000000000ae34: HardwareInterrupt_entry+0x8/0x54
    sp: c0000000009d7660
   msr: 8000000000029032
  current = 0xc0000000146468b0
  paca    = 0xc0000000003eec00
    pid   = 0, comm = swapper
enter ? for help
1:mon> t
[link register   ] c00000000000ae34 HardwareInterrupt_entry+0x8/0x54
[c0000000009d7660] c000000000047dac .kprobe_exceptions_notify+0x3bc/0x4b0
(unreliable)
--- Exception: 501 (Hardware Interrupt) at c00000000008da24 .get_kprobe+0x58/0x7c
[link register   ] c000000000047b88 .kprobe_exceptions_notify+0x198/0x4b0
[c0000000009d7950] c000000000047b7c .kprobe_exceptions_notify+0x18c/0x4b0
(unreliable)
[c0000000009d79f0] c000000000070984 .notifier_call_chain+0x68/0x98
[c0000000009d7a80] c0000000000127b0 .ProgramCheckException+0x80/0x17c
[c0000000009d7b20] c00000000000b048 ProgramCheck_common+0xc8/0x100
--- Exception: 700 (Program Check) at c0000000002f588c .schedule+0x0/0xaa8
[c0000000009d7e10] c00000000001479c .shared_idle+0xa8/0xe8 (unreliable)
[c0000000009d7e90] c000000000014854 .cpu_idle+0x38/0x50
[c0000000009d7f00] c00000000003f1fc .start_secondary+0x15c/0x178
[c0000000009d7f90] c00000000000bd84 .enable_64b_mode+0x0/0x28
1:mon> r
R00 = 0000000000000108   R16 = 0000000000000000
R01 = c0000000009d7660   R17 = 0000000000000000
R02 = c0000000004e8550   R18 = 0000000000000000
R03 = c0000000009d76d0   R19 = 0000000000000000
R04 = 0000000000000004   R20 = 0000000000000000
R05 = c0000000009d7af0   R21 = 0000000000000000
R06 = 0000000000000006   R22 = 0000000000000000
R07 = 0000000000000000   R23 = 0000000000000001
R08 = 0000000000000014   R24 = 0000000000000001
R09 = 0000000000000501   R25 = 0000000000000000
R10 = 0000000000000000   R26 = 0000000000000000
R11 = 7265677368657265   R27 = 0000000000000000
R12 = 8000000000009032   R28 = 0000000000000000
R13 = c0000000003eec00   R29 = c0000000009d7b90
R14 = 0000000000000000   R30 = c00000000043a420
R15 = 0000000007a8dcb0   R31 = c0000000002f588c
pc  = c000000000013c7c .do_IRQ+0x0/0x100
lr  = c00000000000ae34 HardwareInterrupt_entry+0x8/0x54
msr = 8000000000029032   cr  = 28000088
ctr = c0000000000479f0   xer = 0000000000000004   trap =      700
1:mon>

System is a 2proc Power5 LPAR. Problem occurs only if _both_ the schedule() and
do_IRQ() probes are active at the same time. At this time, I am not able to
explain why we got into xmon even though we have a kprobe notifier before the
xmon call. Investigating....

Comment 6 Kevin Stafford 2005-08-24 21:48:36 UTC

This problem lives in x86_64 as well. 
-------------------------------------------------------------------------------
[krstaffo@elm3b30 ~]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 5
model name      : Opteron MP w/ 1MB
stepping        : 0
cpu MHz         : 1597.929
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
bogomips        : 3145.72
TLB size        : 1088 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts ttp

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 5
model name      : Opteron MP w/ 1MB
stepping        : 0
cpu MHz         : 1597.929
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
bogomips        : 3186.68
TLB size        : 1088 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts ttp
-------------------------------------------------------------------------------
I am unfamiliar with xmon debugging stuff.

Comment 7 Ananth N Mavinakayanahalli 2005-08-26 00:45:24 UTC

The crash is due to a hardware interrupt occuring between the time the
pre_handler for the schedule probe is run (kprobe_status = KPROBE_HIT_SS) and
the time the original instruction is single-stepped (kprobe_status =
KPROBE_HIT_SSDONE). In this case, we ignore the do_IRQ() probe due to the
hardware interrupt - snippet from kprobes_handler:

                p = get_kprobe(addr);
                if (p) {
                        if (kprobe_status == KPROBE_HIT_SS) {
                                regs->msr &= ~MSR_SE;
                                regs->msr |= kprobe_saved_msr;
                                unlock_kprobes();
                                goto no_kprobe;
                        }

Looks like the kprobes code is not geared to handle asynchronous interrupts and
can possibly be triggered on any arch that doesn't disable local irqs during the
 processing of the probe.

Comment 8 Prasanna S Panchamukhi 2005-08-26 15:23:50 UTC

I get the following xmon trace when the above modules are inserted on power4
LPAR. Looks like the hardware interrupt occured while within retprobe_trampoline().

llm16.in.ibm.com login: cpu 0x3: Vector: 700 (Program Check) at [c000000004293880]
    pc: c00000000000f57c: .do_IRQ+0x0/0xe4
    lr: c00000000000ae34: hardware_interrupt_entry+0x8/0x54
    sp: c000000004293b00
   msr: 9000000000029032
  current = 0xc00000000ffc0030
  paca    = 0xc0000000005d0400
    pid   = 0, comm = swapper
enter ? for help
3:mon> t
[link register   ] c00000000000ae34 hardware_interrupt_entry+0x8/0x54
[c000000004293b00] c00000000ffc0330 (unreliable)
--- Exception: 501 (Hardware Interrupt) at d000000000012020
[c000000004293df0] c00000000003a3bc kretprobe_trampoline+0x0/0x8 (unreliable)
[c000000004293e90] c00000000000fa7c .cpu_idle+0x4c/0x60
[c000000004293f00] c000000000031890 .start_secondary+0x108/0x144
[c000000004293f90] c00000000000bd28 .enable_64b_mode+0x0/0x28
3:mon> r
R00 = 0000000000000001   R16 = 0000000000000000
R01 = c000000004293b00   R17 = 0000000000000000
R02 = c000000000727150   R18 = 0000000000000000
R03 = c000000004293b70   R19 = 0000000000000000
R04 = c00000000ffc0330   R20 = 0000000000000000
R05 = c0000000fc323ea0   R21 = 0000000000c00000
R06 = 0000000022ff9f82   R22 = 0000000000000000
R07 = c00000000001133c   R23 = 0000000000000001
R08 = c000000004290080   R24 = 0000000000000003
R09 = 0000000000000501   R25 = 0000000000000190
R10 = 0000000000000000   R26 = 0000000000000568
R11 = 7265677368657265   R27 = 0000000000000010
R12 = 9000000000009432   R28 = 0000000000000008
R13 = c0000000005d0400   R29 = c000000004290000
R14 = 0000000000000000   R30 = c000000004290000
R15 = 0000000000000000   R31 = c000000004290080
pc  = c00000000000f57c .do_IRQ+0x0/0xe4
lr  = c00000000000ae34 hardware_interrupt_entry+0x8/0x54
msr = 9000000000029032   cr  = 28ff9f88
ctr = c0000000000c23e4   xer = 000000000000ff7f   trap =      700

Comment 9 Anil S Keshavamurthy 2005-08-26 19:16:48 UTC

Reproducible on IA64 too and the problems happens _only_ when both the modules 
are loaded. On IA64 the system freezes within few seconds of second modules 
getting loaded. Preliminary debug showed that all CPU's are stuck in 
ia64_spinlock_contention. Test configuration was Tiger4 with 4 CPU's running 
RHEL4 U2 Beta kernel.

Comment 10 Ananth N Mavinakayanahalli 2005-08-26 20:05:19 UTC

My understanding is that we run exception handlers with h/w interrupts enabled
and hence this problem. However, I am not sure if we can have a "completely
sane" solution here as there will still be a window (switching from task to
exception context) when a hardware interrupt can occur, where things can (and
probably will) go haywire. Asynchronous non-software events are the culprit :-)

Comment 11 Anil S Keshavamurthy 2005-08-26 20:34:04 UTC

(In reply to comment #10)
> My understanding is that we run exception handlers with h/w interrupts 
enabled
> and hence this problem. 

Yup, I agree with you.
And with the current kprobe design, I guess it is very dangerous to insert
probes on both ISR routines and non-ISR routines together. However having 
probes on just one type is fine.

Here is what happens if we have kprobes on both ISR and non-ISR routines 
together.
1) Non-ISR routine probe gets hit and we take the lock_kprobes()
2) And before we release the lock taken above, hardware ISR fires and 
we enter the exception handler(since we have probe on ISR routine) and
here the cpu can never get the lock and hence we have a dead lock( which
is what I am seeing on IA64).

>However, I am not sure if we can have a "completely
> sane" solution here as there will still be a window (switching from task to
> exception context) when a hardware interrupt can occur, where things can (and
> probably will) go haywire. Asynchronous non-software events are the 
culprit :-)

We need to redesing kprobes slightly where in we do not hold the lock_kprobes
() during exception handling. I guess it can be done. Thoughts??

Comment 12 Ananth N Mavinakayanahalli 2005-08-26 21:29:49 UTC

Well Anil, its not a deadlock and its not an issue with lock_kprobes(). The
problem is the h/w interrupt occurs in the window between the time you set
KPROBE_HIT_SS in the pre_handler and the time you actually single-step. The case
you describe is fine since the hardware interrupt entry isn't through the
exception handlers (I presume its the same in ia64) and hence you will spin till
you get the lock.

I can say this with certainity 'cos, on ppc64 atleast, we have a kernel debugger
(xmon) shipped with upstream kernels. What I am seeing is that we enter the
debugger (whose hook is after the kprobes call) due to a legitimate breakpoint
(kprobe on do_IRQ()) but it occurs in the window I described above, during which
time, we don't _expect_ any other kprobe hits. Its an atomic software only
sequence (set trap/step flag and single step once) which is disrupted by the
occurance of a hardware interrupt, which in turn causes another trap.

In my opinion, this issue needs handling at a higher (more generic) level than
kprobes => we'll need to figure a way to safely disable interrupts before _any_
exception code is run (arch/xxx/kernel/traps.c). That is easier said than done :)

In any case, lock_kprobes() is going away with the locking change patches. I am
able to reproduce the problem even without lock_kprobes()

Comment 13 Ananth N Mavinakayanahalli 2005-08-26 21:58:36 UTC

Created attachment 624 [details]
possible fix for issue with probing both do_irq and schedule

Ok, here is a possible solution to the issue. I can't think of a bug that could
be introduced by this change, but Prasanna will know better. However, I have
been able to run the two probes (schedule and do_irq) simultaneously with this
patch applied.

Prasanna, Anil, Jim, what are your thoughts?

Comment 14 Anil S Keshavamurthy 2005-08-26 22:32:27 UTC

(In reply to comment #12)
> Well Anil, its not a deadlock and its not an issue with lock_kprobes(). The
> problem is the h/w interrupt occurs in the window between the time you set
> KPROBE_HIT_SS in the pre_handler and the time you actually single-step. 
Ananth,
    The issues at least I am seeing is a deadlock due to lock_kprobes() and 
h/w interrupt. Say we hit a probe on non-ISR routione sys_mkdir() and in the 
execption handler we take a lock_kprobe() and before we get to release this 
lock, say a H/W interrupt occurs on this very same CPU where you have taken a 
lock. As a result of H/W ISR the ISR routine(say do_ISR() ) get executed. 
Suppose if we have a probe on do_ISR() routine, then we enter exception handler
(2nd time, recursively) where in we again try to take the same lock_kprobes() 
and spin here for ever since the lock is already take by this very same CPU.

> In my opinion, this issue needs handling at a higher (more generic) level 
than
> kprobes => we'll need to figure a way to safely disable interrupts before 
_any_
> exception code is run (arch/xxx/kernel/traps.c). That is easier said than 
done :)
I don;t disagree with you. This is the the ideal fix for this bug:-)


> In any case, lock_kprobes() is going away with the locking change patches. I 
am
> able to reproduce the problem even without lock_kprobes()

True, this is because when we recurse due to h/w interrupt occuring while we 
are in the middle of kprobes exception we might over write the per cpu info's.

Comment 15 Prasanna S Panchamukhi 2005-08-29 13:03:48 UTC

Ananth,

>                        if (kprobe_status == KPROBE_HIT_SS) {
>                                regs->msr &= ~MSR_SE;
>                                regs->msr |= kprobe_saved_msr;
>                                unlock_kprobes();
>                                goto no_kprobe;
>                        }

AFAIK the above code was added to allow probes on "int3/trap" instructions and
allow them to single step __inline___. As a result of single stepping original
instruction i.e, "int3/trap" the control was given back to other's registered in
the exception notify list.

More over in ppc64, kprobes handlers are executed without disabling local 
interrupts...Can't we execute the kprobe handler's with local interrupts
disabled? Are there any issues involved in executing the kprobe handlers with
local interrupts disabled?
 
I have tested your patch on ppc64 LPAR box, the system seems to hang with your
patch on 2.6.13-rc7 kernel.
Below is the trace after I do a soft reset on the hung system.
  
cpu 0x0: Vector: 100 (System Reset) at [c0000000005afac0]
    pc: c00000000000f8d8: .default_idle+0x88/0x154
    lr: c00000000003a3bc: kretprobe_trampoline+0x0/0x8
    sp: c0000000005afd40
   msr: 9000000000009032
  current = 0xc0000000005e3ea0
  paca    = 0xc0000000005cec00
    pid   = 0, comm = swapper
cenntteerr  ??  ff0V0ec (toSrys:t 10:00mo n(S> ystem Reset) at [c00000000428fb70]
    pc: c00000000000f8f4: .default_idle+0xa4/0x154
    lr: c00000000003a3bc: kretprobe_trampoline+0x0/0x8
    sp: c00000000428fdf0
   msr: 9000000000009032
  current = 0xc00000000ffc07a0
  paca    = 0xc0000000005cfc00
    pid   = 0, comm = swapper
cpu 0x3: Vector: 100 (System Reset) at [c000000004293b70]
    pc: c00000000000f8ec: .default_idle+0x9c/0x154
    lr: c00000000003a3bc: kretprobe_trampoline+0x0/0x8
    sp: c000000004293df0
   msr: 9000000000009032
  current = 0xc00000000ffc0030
  paca    = 0xc0000000005d0400
    pid   = 0, comm = swapper
cpu 0x1: Vector: 100 (System Reset) at [c00000000428bb70]
    pc: c00000000000f8f4: .default_idle+0xa4/0x154
    lr: c00000000003a3bc: kretprobe_trampoline+0x0/0x8
    sp: c00000000428bdf0
   msr: 9000000000009032
  current = 0xc00000000ffc4030
  paca    = 0xc0000000005cf400
    pid   = 0, comm = swapper
0:mon> t
[c0000000005afd40] c00000000003a3bc kretprobe_trampoline+0x0/0x8 (unreliable)
[c0000000005afde0] c00000000000fa7c .cpu_idle+0x4c/0x60
[c0000000005afe50] c00000000000c03c .rest_init+0x3c/0x54
[c0000000005afed0] c000000000555888 .start_kernel+0x294/0x310
[c0000000005aff90] c00000000000bfb4 .__setup_cpu_power3+0x0/0x4
0:mon> r
R00 = 0000000000000010   R16 = 0000000000000000
R01 = c0000000005afd40   R17 = 0000000000000000
R02 = c000000000727150   R18 = 0000000003a10000
R03 = c00000010381b240   R19 = 000000000291fa74
R04 = c0000000005e41a0   R20 = 0000000003f90340
R05 = 0000000000000004   R21 = 0000000000000038
R06 = 0000000022000082   R22 = bffffffffc5f0000
R07 = c00000000001133c   R23 = 0000000200000000
R08 = c0000000005ac080   R24 = c0000000005cec00
R09 = 0000000000000000   R25 = c000000000725208
R10 = c0000000005ac000   R26 = c0000000005a8b88
R11 = c0000000007288e8   R27 = 0000000000000010
R12 = 0000000048000028   R28 = 0000000000000008
R13 = c0000000005cec00   R29 = c0000000005ac000
R14 = 0000000000000000   R30 = c0000000005ac000
R15 = 0000000000000000   R31 = c0000000005ac080
pc  = c00000000000f8d8 .default_idle+0x88/0x154
lr  = c00000000003a3bc kretprobe_trampoline+0x0/0x8
msr = 9000000000009032   cr  = 22000088
ctr = c0000000002257fc   xer = 000000002000ff7f   trap =      100

0:mon>

                                                                               
                          Thanks
Prasanna

Comment 16 Ananth N Mavinakayanahalli 2005-08-29 13:54:48 UTC

Ok, I now see the issue with the patch. However, wouldn't it be better to have a
tighter if () condition than just the kprobe_status check? Something along the
lines of

if ((kprobe_status == KPROBE_HIT_SS) && (p->opcode == BREAKPOINT_INSTRUCTION))
          ....

Also, except for i386, all other archs run the exception handlers with
interrupts enabled. So, this isn't an issue just with ppc64. I believe its too
big a penalty to disable interrupts just to handle the unique case such as the
one that is causing the current problem. More than anything, I think it is a
performance hit to disable interrupts. (I could see nmissed = 13 with the
testcase running just for about 15 seconds). Also, I am not sure if upstream
maintainers would be ok with such a major change, given that we see this problem
on ia64 and x86_64 too.

I will now try reproduce the problem with retprobes - the xmon trace is clearly
due to system reset - it doesn't look like anything related to the current problem.

We should try and resolve this issue within the kprobes framework as far as
possible as the overhead on the system (disabling interrupts during exception
handling) would be unnecessarily high even during normal operations.

Comment 17 Prasanna S Panchamukhi 2005-08-29 14:41:05 UTC

if ((kprobe_status == KPROBE_HIT_SS) && (p->opcode == BREAKPOINT_INSTRUCTION))
This look ok, but still the system hang does not disappear.. on power 4 Lpar

In i386 architecture interrupts are disabled before you enter kprobe_handler()..
I am not sure if disabling the interrupts on other architectures will be a big
issues, if it really solves some major problem. Please check that...
I tried disabling the interrupt on ppc64 which seem to work ..offcourse with a
panelty of missing few probes during that period...


Thanks
Prasanna

Comment 18 Kevin Stafford 2005-08-29 17:18:16 UTC

I applied Ananth's x86_64 kprobes patch to kernel 2.6.13-rc5. It survived a
kernel build with the schedule and do_IRQ calls instrumented. Previously this
scenario caused a system crash.

Comment 19 Anil S Keshavamurthy 2005-08-29 17:39:45 UTC

(In reply to comment #18)
> I applied Ananth's x86_64 kprobes patch to kernel 2.6.13-rc5. It survived a
> kernel build with the schedule and do_IRQ calls instrumented. Previously this
> scenario caused a system crash.

Ananth's patch has not solved the system freeze completly. I am still seeing 
the system freeze with Ananth's patch on IA64. After much more deep debug, 
here is what I have found that is causing the deadlock due to Hardware 
interrupt and the the way we are trying to hold the lock.

The below code flow assumes that a probe has been registered on 
schedule() and on do_IRQ();

---code flow snip------

schedule()
probe_hit
kprobe_exceptions_notify()
pre_kprobes_handler()
lock_kprobes() {
   spin_lock(&kprobe_lock);
>>>>>>>>LOCAL INTERRUPT HAPPENS HERE >>>>>> Processor starts executing do_IRQ
   kprobe_cpu = smp_processor_id();

}


do_IRQ()
probe_hit
kprobe_exceptions_notify()
pre_kprobes_handler()
kprobe_running() {
	return kprobe_cpu == smp_processor_id();  
/* Returns FALSE here since LOCAL
INTERRUPT happened before kprobe_cpu got set */
}
lock_kprobes() {
   spin_lock(&kprobe_lock); /****** DEADLOCK HERE *******/
   kprobe_cpu = smp_processor_id();
}
---code flow snip end------

Since kprobe_running() failed, we try to take the lock again and end up in 
deadlock. The only way to address this deadlock is to change the lock to 
irqsave lock, but this will cause a huge performance penalty and since U2 is 
so close to code freeze, I suggest we document this as know issues where in 
say we can not insert probes on ISR routines.

Commments?

Comment 20 Ananth N Mavinakayanahalli 2005-08-29 17:47:17 UTC

As Anil has now characterized, there are multiple ways this problem can manifest
itself, I see a trap on ppc64 while its a hang on ia64. I tend agree with Anil
that its a little late in the game to introduce the irqsave approach. The
changes required to share the irq flags across handlers in itself is quite involved.

Given that this issue will be moot when the per_cpu changes are introduced, I
vote for documenting this "feature" - probing irq handlers in conjunction with
task state routines is currently vulnarable.

Comment 21 Anil S Keshavamurthy 2005-08-29 19:36:48 UTC

Created attachment 628 [details]
kpobe_cpu is updated while interrupts disabled

Made a quick patch, where in kprobe_cpu is now updated while interrupts being
disabled. This patch seems to work on IA64. Can you try this on PPC64.

thanks,
-Anil

Comment 22 Ananth N Mavinakayanahalli 2005-08-29 20:57:54 UTC

Created attachment 629 [details]
another attempt to fix irq-sched issue

Here is the updated patch to handler the irq-sched issue as I see on ppc64.
This, in conjunction with Anil's fix above allows my ppc64 lpar to run without
any issues.

Anil, request you make the changes for ia64 as appropriate.

Ananth

Comment 23 Kevin Stafford 2005-08-29 23:26:52 UTC

Hello,
 
I applied fix_1229.patch and x86_64 kp-sched-irq.patch to kernel 2.6.13-rc6. The
machine survived 3 (concurrent) kernel builds with do_IRQ and schedule
instrumented. Previously this scenario would freeze the system. 

[root@elm3b30 kernels] uname -a
Linux elm3b30 2.6.13-rc6 #1 SMP Mon Aug 29 15:07:42 PDT 2005 x86_64 x86_64
x86_64 GNU/Linux
[root@elm3b30 kernels]# cat /etc/*-release
Red Hat Enterprise Linux ES release 4 (Nahant Update 2)

Comment 24 Prasanna S Panchamukhi 2005-08-30 10:59:13 UTC

Combination of Anil's patch and Ananth's patch fixes the problem on my power4
lpar box.

-Prasanna

Comment 25 Prasanna S Panchamukhi 2005-08-30 13:21:55 UTC

Ananth, Anil,

The kernel crashed after running for almost 2hrs, the trace shows that 
hardware interrupt occured while in the kretprobe_trampoline() even after using
both the fixes on power4 lpar box.

Thanks
Prasanna

cpu 0x0: Vector: 700 (Program Check) at [c00000000474f510]
    pc: c00000000000f57c: .do_IRQ+0x0/0xe4
    lr: c00000000000ae34: hardware_interrupt_entry+0x8/0x54
    sp: c00000000474f790
   msr: 9000000000029032
  current = 0xc0000000ff97d7a0
  paca    = 0xc0000000005cec00
    pid   = 3694, comm = nifd
enter ? for help
0:mon> bt
   type            address
0:mon> t
[link register   ] c00000000000ae34 hardware_interrupt_entry+0x8/0x54
[c00000000474f790] 9000000000009032 (unreliable)
--- Exception: 501 (Hardware Interrupt) at d000000000012004
[c00000000474fa80] c00000000003a3bc kretprobe_trampoline+0x0/0x8 (unreliable)
[c00000000474fb60] c0000000000c27f0 .do_select+0x278/0x4c4
[c00000000474fcb0] c0000000000e1344 .compat_sys_select+0x390/0x690
[c00000000474fdc0] c000000000019770 .ppc32_select+0x14/0x28
[c00000000474fe30] c00000000000d600 syscall_exit+0x0/0x18
--- Exception: c01 (System Call) at 000000000fe29470
SP (ffd5b910) is in userspace
0:mon> r
R00 = c0000000004a37c4   R16 = 0000000000000005
R01 = c00000000474f790   R17 = c0000000046262e0
R02 = c000000000727150   R18 = c0000000046262e8
R03 = c00000000474f800   R19 = c0000000046262f0
R04 = 9000000000009032   R20 = 0000000000000000
R05 = c0000000000c27f0   R21 = c0000000046262c8
R06 = 0000000000000000   R22 = c0000000046262d0
R07 = c00000010381c580   R23 = c0000000046262d8
R08 = 0000000100691cb7   R24 = 0000000000000000
R09 = 0000000000000501   R25 = 0000000000000010
R10 = 0000000000000000   R26 = 0000000000000000
R11 = 7265677368657265   R27 = c00000000474faf0
R12 = 9000000000009432   R28 = c0000000005b4000
R13 = c0000000005cec00   R29 = 000000010069209e
R14 = 0000000000000000   R30 = c00000000060aa90
R15 = 0000000000000010   R31 = 00000000000003e8
pc  = c00000000000f57c .do_IRQ+0x0/0xe4
lr  = c00000000000ae34 hardware_interrupt_entry+0x8/0x54
msr = 9000000000029032   cr  = 28444428
ctr = c0000000000c243c   xer = 0000000020000000   trap =      700
0:mon>

Comment 26 Ananth N Mavinakayanahalli 2005-08-30 13:37:51 UTC

Hmm, I left the system to run overnight tests in a tight loop and it was still
continuing without failures when I came in today.

Comment 27 Frank Ch. Eigler 2005-08-30 16:57:20 UTC

Would a call to in_sched_function() come in handy someplace?

Comment 28 Ananth N Mavinakayanahalli 2005-08-30 17:59:19 UTC

Nope. Theoritically, you could trigger this bug with a probe on do_IRQ in
conjunction with a probe on any other task state function. Its totally dependent
on when the hardware interrupt occurs.

Comment 29 Ananth N Mavinakayanahalli 2005-08-30 21:11:26 UTC

Prasanna,

What is the testcase you are using to trigger the issue on a power4? Are they
just the mon_* files or something more?

Ananth

Comment 30 Anil S Keshavamurthy 2005-08-31 00:58:36 UTC

Created attachment 630 [details]
changing preempt_enable to preempt_enable_no_resched

Prasanna,
  On you Power4 box, can you apply yet another patch along with the other *two*
patches to see if crash you are seeing goes away. I hope the attached patch
compiles for ppc64 as I have not tested the attached patch.

Thanks,
Anil

Comment 31 Prasanna S Panchamukhi 2005-08-31 13:02:12 UTC

Anil,

Yes, using preempt_enable_no_resched() instead of preempt_enable() avoids the 
system crash. My Power4 lpar system has been running since 4hrs. 

Thanks
Prasanna

Comment 32 Ananth N Mavinakayanahalli 2005-08-31 13:12:54 UTC

Created attachment 631 [details]
use preempt_enable_no_resched() instead of preempt_enable() on ppc64

Anil,

Just took a look at your patch. We introduced the preempt_(dis/en)able in
kprobe_exceptions_notify so we avoid calling smp_processor_id() with preemption
on, in which case, we end up getting a flood of warnings if the kernel is
compiled with DEBUG_SPINLOCK and DEBUG_SPINLOCK_SLEEP.

So, here is a patch that addresses the issue with less changes, while also
taking care of the above case.

Prasanna,

Request you to please test with this patch instead.

Ananth

Comment 33 Kevin Stafford 2005-08-31 16:04:44 UTC

I patched 2.6.13-rc6 with fix_1229.patch & kp-sched-irq.patch (not
kp-ppc-preempt-noresched.patch) last night. These changes ran on elm3b90 SMP
ppc64 machine all night without a problem.

Comment 34 Anil S Keshavamurthy 2005-08-31 16:38:09 UTC

(In reply to comment #32)
> Created an attachment (id=631)
> use preempt_enable_no_resched() instead of preempt_enable() on ppc64
This simple changes should be fine for now though we endup disabling and 
enabling preempt one extra time for kprobes which not a big of a deal. 

By the way IA64 is happy too and I have not seen System feeze even under heavy 
make -j's and ./please_crash_me going on for past 16 hours, of course I have 
loaded both mon_sched.ko and mon_irq.ko :-)

-Anil

Comment 35 Kevin Stafford 2005-09-01 17:34:33 UTC

As Ananth requested:

I applied the 3 patches to SMP ppc64 running 2.6.13-rc6. I loaded both
mon_irq.ko and mon_sched.ko, and then executed a script which does make -j16 /
make clean in tight loop on a kernel. This was 20 hours (90 builds) ago, and the
machine is still alive. I think this one is looking good.

Comment 36 Anil S Keshavamurthy 2005-09-01 22:26:23 UTC

Created attachment 643 [details]
Mother of all patches to fix irq-sched issue on RHEL4 U2

Patch against RHEL4 U2 beta to fix #1229 on i386, x86_64, ppc64 and IA64.
This patch includes all the previous patches and applies cleanly on RHEL4 U2
kernel.

A patch against latest kernel 2.6.13-mm1 has been posted onto lkml
and has been queued into mm.
Here is the posting to lkml 
http://lkml.org/lkml/2005/9/1/295

Comment 37 Anil S Keshavamurthy 2005-09-01 23:59:22 UTC

Created attachment 644 [details]
Final - Mother of all patches to fix irq-sched issue on RHEL4 U2

Now inclues comments around lock_kprobes() and unlock_kprobes() and again
this patch is against RHEL4 U2 beta to fix #1229 on i386, x86_64, ppc64 and
IA64. This patch includes all the previous patches and applies cleanly on RHEL4
U2 kernel.

A patch against latest kernel 2.6.13-mm1 has been posted onto lkml
http://lkml.org/lkml/2005/9/1/295
http://lkml.org/lkml/2005/9/1/380

Please give your final spin on this patch before we ask Red Hat to include this
patch in their upcoming RHEL 4 U2 release.

Comment 38 Ananth N Mavinakayanahalli 2005-09-06 14:08:56 UTC

Patch Anil attached above is in -mm. Closing this bug