This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: Questions about kprobes implementation on SMP and preemptible kernels
On Mon, Jan 29, 2007 at 09:51:40AM +0530, Ananth N Mavinakayanahalli wrote:
On Sun, Jan 28, 2007 at 04:31:51PM -0600, Quentin Barnes wrote:
The point I was raising specifically about kprobe_exceptions_notify()
is special. Preemption must be held disabled from the point of the
exception to when kprobe_exceptions_notify() is entered. If it is
not held disabled, kprobe_running() can return the wrong processor.
A very serious kprobes bug!
The macro kprobe_running() invokes __get_cpu_var() which invokes
smp_processor_id() which can invoke debug_smp_processor_id(). The
function debug_smp_processor_id() has sanity checks to ensure that
the CPU thread is bound in various ways including making sure that
preempt_count is non-zero or that interrupts are disabled. If the
CPU isn't bound, debug_smp_processor_id declares a bug and logs
the state pointing out the offending function.
What I think was going on was that kprobe_exceptions_notify() was
causing these log messages on some architectures. What I suspect
is that rather than fixing the real bug by ensuring that preemption
was properly held disabled in a continuous chain from the point of
the exception through to invoking kprobe_exceptions_notify(), the
person buried the bug by adding the calls to preempt_disable()/
preempt_enable() in kprobe_exceptions_notify() to defeat the checks
in debug_smp_processor_id() -- and the original SMP bug is still there!
Does my explanation make sense? Is this what could have happened?
It does, but you are missing the more important point. If you notice the
placement of the notify_page_fault() (notify_die() before the page fault
notifier was introduced) in do_page_fault(), the notifier would get
invoked on _every_ page fault, irrespective of whether the fault was
caused due to a faulty kprobe handler or not.
I follow that. It's a general notification mechanism which would
require that placement.
If the page fault was due
to a kprobe handler, we have already disabled preemption and hence we
wouldn't have needed the preempt_(dis/en)able pair in
kprobe_exceptions_notify().
I follow that too.
However, if the fault was a legitimate,
non-kprobes induced one, you'd still enter kprobes_exceptions_notify()
and there is no way to tell if the fault happened in a preempt disabled
section or not.
I do agree with your reasoning *only if* you can ensure that the callout
from the page fault code to kprobes_exceptions_notify() happens only on
account of a page fault triggered by a buggy kprobe handler. In the
absence of such a guarantee, we have the preempt_* calls.
Ah, I think I see your point now. It's very subtle.
My concern was a fault occurring due to a non-kprobes event could be
incorrectly picked up as a kprobes event because we could end up executing
on another random processor by the time we enter kprobe_exceptions_notify()
which might just have a kprobes active on it. _But that can never
happen_. Whenever a kprobe is active, preemption is always disabled
throughout the life of that kprobe on that processor which as a side-effect
takes that processor out of the pool of available processors for
scheduling. Whatever processor we might get cannot be one that can
ever return a positive result from kprobe_running(). Got it! Thanks!
In my port, to make the code behavior of kprobe_exceptions_notify()
not so subtle, I just changed it to check preemptibility before calling
kprobe_running():
====
int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
unsigned long val, void *data)
{
struct die_args *args = (struct die_args *)data;
struct pt_regs *regs = args->regs;
int ret = NOTIFY_DONE;
switch (val) {
case DIE_TRANS_FAULT:
case DIE_PAGE_FAULT:
/* To be potentially processing a kprobe fault, we have
* to be non-preemptible, not in user space, and not
* executing thumb code. */
if (!preemptible() &&
regs && !user_mode(regs) && !thumb_mode(regs) &&
kprobe_running() &&
kprobe_fault_handler(regs, args->fsr)) {
ret = NOTIFY_STOP;
}
break;
default:
break;
}
return ret;
}
====
I think this algorithm is clearer than bumping "preempt_count" up
and down.
Ananth
Quentin