This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: [RFC][Patch 2/2][take2]kprobe: kprobe-booster against 2.6.14-mm1 for i386
Hi, Ananth, Anil, and Satoshi
Ananth N Mavinakayanahalli wrote:
> On Tue, Dec 20, 2005 at 11:01:39AM -0800, Keshavamurthy Anil S wrote:
>
>>On Tue, Dec 20, 2005 at 10:21:14PM +0900, Masami Hiramatsu wrote:
>>
>>>Hi, Anil
>>>
>>>Masami Hiramatsu wrote:
>>>
>>>>>Issue No 1:
>>>>> In the above code patch, once you call reset_current_kprobes() and
>>>>>preempt_enable_no_resched() ( I see preeempt_enable_no_resched()
>>>>>in your fix to this code), you are not suppose to relay on
>>>>>&p->ainsn.insn, the reason is that, on the other cpu technically
>>>>>this memory can get freed due to unregister_kprobe() call.
>>>>>In other words, by the time this cpu tries to execte the instruction
>>>>>at regs->eip, that memory might have gone and system is bound to crash.
>>>>
>>>>I think old djprobe has the same issue. So, it will be solved by using
>>>>safety check routine after removing breakpoint. Current kprobe uses
>>>>synchronize_sched() for waiting rcu processing in unregister_kprobe().
>>>>In my opinion, the solution is that introducing safety check same as
>>>>djprobe instead of synchronize_sched() in unregister_kprobe().
>>>
>>>I found the rcu_barrier() function that works similar to djprobe's
>>>safety check routine.
>>>
>>>
>>>>The safety check routine of djprobe waits until all cpus execute the works,
>>>>and the works are executed from the keventd.
>>>
>>>The rcu_barrier() function waits until all cpus have gone through a
>>>quiescent state.
>>>
>>>- The CPU performs a process switch.
>>>- The CPU starts executing in User Mode.
>>>- The CPU executes the idle loop
>>>In each of these cases, We say that the CPU has gone through
>>>a quiescent state.
>>>
>>>If we call rcu_barrier() after "int3" is removed, we can ensure
>>>followings after the rcu_barrier() called.
>>>- interrupt handlers are finished on all CPUs.
>>>- p->ainsn.insn is not executed on all CPUs.
>>>And we can remove a boosted kprobe safely.
>>>
>>>Current kprobes uses synchronize_sched(). I think this function
>>>checks safety of only current CPU. So, it may not ensure above
>>>conditions. But rcu_barrier() checks safety of all CPUs. So, it
>>>can ensure above conditions.
>>>How would you think about this?
>>
>>You are correct, we need to ensure safety of all CPUs.
>>rcu_barrier() will work.
>
> Unless I've understood RCU incorrectly, rcu_barrier() is not needed in
> the kprobes case. As I read the code, you'll need to use rcu_barrier()
> in cases where you have to run the completion routine on each online cpu.
>
> All we care about in kprobes case is that the other cpus don't hold a
> reference to the just removed kprobe, so we can complete the kprobe
> unregistration. Return from synchronize_sched() ensures that all the
> other cpus have passed through a quiescent state (one of the conditions
> Masami has listed above).
You are correct. I had misunderstood the synchronize_sched().
I read the RCU code again, and I understood that the synchronize_sched()
is enough to check safety of the kprobe-booster.
So, previous kprobe-booster patch is safe, but it is for old -mm kernel.
I update and post it again soon.
> Also remember we don't define our own callback function either.
>
> Ananth
>
Thanks,
--
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: hiramatu@sdl.hitachi.co.jp