This is the mail archive of the
systemtap@sources.redhat.com
mailing list for the systemtap project.
RE: Hitachi djprobe mechanism
- From: "Keshavamurthy, Anil S" <anil dot s dot keshavamurthy at intel dot com>
- To: "Mathieu Desnoyers" <compudj at krystal dot dyndns dot org>
- Cc: "Andi Kleen" <ak at suse dot de>, "Karim Yaghmour" <karim at opersys dot com>, "Masami Hiramatsu" <masami dot hiramatsu at gmail dot com>, "Masami Hiramatsu" <hiramatu at sdl dot hitachi dot co dot jp>, "Roland McGrath" <roland at redhat dot com>, "Richard J Moore" <richardj_moore at uk dot ibm dot com>, <systemtap at sources dot redhat dot com>, <sugita at sdl dot hitachi dot co dot jp>, "Satoshi Oshima" <soshima at redhat dot com>, <michel dot dagenais at polymtl dot ca>
- Date: Mon, 1 Aug 2005 09:13:51 -0700
- Subject: RE: Hitachi djprobe mechanism
>
>* Keshavamurthy, Anil S (anil.s.keshavamurthy@intel.com) wrote:
>> Andi and others,
>> Sending an IPI to each of the other CPUs (all but self) and making them
>> *spin on a lock* during the modification will *freeze* the system.
>> Please do not *spin* inside an IPI.
>>
>> My observation:
>> Here is what I discovered: CPU2 had taken
>> read_lock(&tasklist_lock), had then entered the IPI, and is now busy
>> *spinning on a lock*.
>> CPU3 had called write_lock_irq(&tasklist_lock), where CPU3 first
>> disables the local irq and disables preemption and then tries to
>> acquire the lock, which is already held by CPU2. Since CPU2 never
>> releases this lock, as it is busy spin-waiting, CPU3 never enters
>> the IPI :-(
>>
>
>Yep, I see the problem: you cannot control other locks that would have
>been taken by other CPUs with interrupts disabled.
>
>Is there any way to send a non-maskable IPI? This could solve this
>problem.
The only way I can think of is to use stop_machine_run(fn, data, cpu),
which freezes the machine on all CPUs and runs fn() on the given cpu,
which is what we want.
This is slower than the IPI approach but definitely much safer.
The only drawback is that it is a very heavyweight operation, and I am
not sure of its impact on a busy production system.
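
To make the dependency concrete, here is a minimal user-space sketch (not
kernel code) that models the scenario as a "who waits for whom" table and
checks it for a cycle. The names has_deadlock(), ipi_spin_deadlocks() and
stop_machine_deadlocks() are hypothetical, made up for this illustration.
The second scenario rests on an assumption: that the stop_machine work runs
from a scheduled thread, so a CPU holding tasklist_lock (with preemption
disabled) releases it before it is stopped.

```c
#include <assert.h>   /* only needed for the usage example below */
#include <stdbool.h>

#define NCPU   4
#define NOBODY (-1)

/* waits_for[i] is the CPU that CPU i is blocked on, or NOBODY.
 * Returns true if following the chain from some CPU leads back to it. */
bool has_deadlock(const int waits_for[NCPU])
{
    for (int start = 0; start < NCPU; start++) {
        int cur = start;
        for (int step = 0; step < NCPU && cur != NOBODY; step++) {
            cur = waits_for[cur];
            if (cur == start)
                return true;
        }
    }
    return false;
}

/* Spinning inside the IPI handler, as observed above. */
bool ipi_spin_deadlocks(void)
{
    int waits_for[NCPU] = { NOBODY, NOBODY, NOBODY, NOBODY };

    /* CPU2 holds the read lock and spins in the IPI handler until
     * every other CPU, including CPU3, has entered the IPI. */
    waits_for[2] = 3;
    /* CPU3 has irqs disabled and spins in write_lock_irq() on the
     * lock held by CPU2, so it never takes the IPI. */
    waits_for[3] = 2;
    return has_deadlock(waits_for);
}

/* Same lock state, but under the assumption stated above. */
bool stop_machine_deadlocks(void)
{
    int waits_for[NCPU] = { NOBODY, NOBODY, NOBODY, NOBODY };

    /* Assumption: CPU2 is not stopped while it holds the lock, so it
     * waits for nobody. CPU3 still waits for CPU2 to release it. */
    waits_for[3] = 2;
    return has_deadlock(waits_for);
}
```

Under this model, assert(ipi_spin_deadlocks()) and
assert(!stop_machine_deadlocks()) both hold: the cycle CPU2 -> CPU3 -> CPU2
exists only when a CPU can start spinning while it still holds a lock.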
Thanks,
-Anil