This is the mail archive of the mailing list for the systemtap project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Questions about kprobes implementation on SMP and preemptible kernels

On Sun, Jan 28, 2007 at 04:31:51PM -0600, Quentin Barnes wrote:
> On Thu, Jan 18, 2007 at 11:01:45PM +0530, Abhishek Sagar wrote:
> >>It seems to me that the preempt_disable()/preempt_enable() calls in
> >>kprobe_exceptions_notify() are either at best inert, or possibly
> >>hiding an existing bug.
> >
> >The decision to sandwich kprobe_exceptions_notify between
> >preempt_disable()/preempt_enable() is to 'explicitly' disable
> >preemption.
> Let's set aside for the moment the general point you raise below.
> The point I was raising specifically about kprobe_exceptions_notify()
> is special.  Preemption must be held disabled from the point of the
> exception to when kprobe_exceptions_notify() is entered.  If it is
> not held disabled, kprobe_running() can return the wrong processor.
> A very serious kprobes bug!
> The macro kprobe_running() invokes __get_cpu_var() which invokes
> smp_processor_id() which can invoke debug_smp_processor_id().  The
> function debug_smp_processor_id() has sanity checks to ensure that
> the CPU thread is bound in various ways including making sure that
> preempt_count is non-zero or that interrupts are disabled.  If the
> CPU isn't bound, debug_smp_processor_id declares a bug and logs
> the state pointing out the offending function.
> What I think was going on was that kprobe_exceptions_notify() was
> causing these log messages on some architectures.  What I suspect
> is that rather than fixing the real bug by ensuring that preemption
> was properly held disabled in a continuous chain from the point of
> the exception through to invoking kprobe_exceptions_notify(), the
> person buried the bug by adding the calls to preempt_disable()/
> preempt_enable() in kprobe_exceptions_notify() to defeat the checks
> in debug_smp_processor_id() -- and the original SMP bug is still there!
> Does my explanation make sense?  Is this what could have happened?

It does, but you are missing the more important point. If you notice the
placement of the notify_page_fault() (notify_die() before the page fault
notifier was introduced) in do_page_fault(), the notifier would get
invoked on _every_ page fault, irrespective of whether the fault was
caused due to a faulty kprobe handler or not. If the page fault was due
to a kprobe handler, we have already disabled preemption and hence we
wouldn't have needed the preempt_(dis/en)able pair in
kprobe_exceptions_notify(). However, if the fault was a legitimate,
non-kprobes induced one, you'd still enter kprobes_exceptions_notify()
and there is no way to tell if the fault happened in a preempt disabled
section or not.

I do agree with your reasoning *only if* you can ensure that the callout
from the page fault code to kprobes_exceptions_notify() happens only on
account of a page fault triggered by a buggy kprobe handler. In the
absence of such a guarantee, we have the preempt_* calls.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]