This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Questions about kprobes implementation on SMP and preemptible kernels

From: Quentin Barnes <qbarnes at urbana dot css dot mot dot com>
To: Abhishek Sagar <sagar dot abhishek at gmail dot com>
Cc: systemtap at sources dot redhat dot com, dwilder at us dot ibm dot com, grundym at us dot ibm dot com, ananth at in dot ibm dot com
Date: Sun, 28 Jan 2007 16:31:51 -0600
Subject: Re: Questions about kprobes implementation on SMP and preemptible kernels
References: <20070116034826.GA22002@urbana.css.mot.com> <20070116053955.GA14386@in.ibm.com> <20070118042921.GA26784@urbana.css.mot.com> <863e9df20701180931y3dad878va44c0f73173a665a@mail.gmail.com>

On Thu, Jan 18, 2007 at 11:01:45PM +0530, Abhishek Sagar wrote:

It seems to me that the preempt_disable()/preempt_enable() calls in
kprobe_exceptions_notify() are either at best inert, or possibly
hiding an existing bug.


The decision to sandwich kprobe_exceptions_notify between
preempt_disable()/preempt_enable() is to 'explicitly' disable
preemption.

Let's set aside for the moment the general point you raise below.

The point I was raising specifically about kprobe_exceptions_notify()
is special.  Preemption must be held disabled from the point of the
exception to when kprobe_exceptions_notify() is entered.  If it is
not held disabled, kprobe_running() can return the wrong processor.
A very serious kprobes bug!

The macro kprobe_running() invokes __get_cpu_var() which invokes
smp_processor_id() which can invoke debug_smp_processor_id().  The
function debug_smp_processor_id() has sanity checks to ensure that
the CPU thread is bound in various ways including making sure that
preempt_count is non-zero or that interrupts are disabled.  If the
CPU isn't bound, debug_smp_processor_id declares a bug and logs
the state pointing out the offending function.

What I think was going on was that kprobe_exceptions_notify() was
causing these log messages on some architectures.  What I suspect
is that rather than fixing the real bug by ensuring that preemption
was properly held disabled in a continuous chain from the point of
the exception through to invoking kprobe_exceptions_notify(), the
person buried the bug by adding the calls to preempt_disable()/
preempt_enable() in kprobe_exceptions_notify() to defeat the checks
in debug_smp_processor_id() -- and the original SMP bug is still there!

Does my explanation make sense? Is this what could have happened?

[...]

Seems like even though there are sections in the kernel which now seem
like non-preemptible (because they run with interrupts disabled), they
can be turned into preemptible sections in the future. The sections
which explicitly want to run in a preemption safe environment must
take proper preemption locks to safeguard themselves permanently.
kprobe_exceptions_notify is such a function because of its use of
per-cpu variables. So if in the future, the ARM undef exception
handling code becomes preemptibe, the per-cpu variable dependent
code-path in it (like kprobe_handler) would remain safe even then.

The confusion which now arises, is that the quote suggests that even
with local_irq_disable'd the current process can be rescheduled. But
having a look at preempt_schedule(), it shows the following check
right at the beginning:

if (likely(ti->preempt_count || irqs_disabled()))        /* --->
notice the 'likely' optimization */
               return;

It doesn't seem like preemption can be triggered in scenario quoted
from preempt-locking.txt ...can somebody explain this?


Sure looks like the documentation doesn't match the code.  I went
back to the oldest 2.6 kernel I had around (2.6.6) and it had the
same code as above.  As long as interrupts are disabled, a CPU
can't be rescheduled no matter what.

It would be nice to have some concrete clarification on this point.
Is the docs to be trusted regardless of the code, or are the docs
out of date?

I'm wondering if I found a bug on the ARM implementation of prefetch
and data abort exception handlers for SMP platforms with kernel
preemption enabled.  Immediately after switching to SVC mode in
__pabt_svc and __dabt_svc, the handler re-enables IRQs (interrupts)
if they were enabled prior to the exception.  If it re-enables
interrupts at this point, it seems to me that a preemptive kernel
(CONFIG_PREEMPT defined) could switch execution to another processor
breaking the chain of execution before it has a chance to note which
processor triggered the exception.  Any ARM Linux kernel experts on
this list to comment, or do I need to bounce this to another list?

In the __irq_svc path (IRQ interrupt handling), the handler
increments the preempt count before reenabling IRQs (and later
restores its previous value after servicing the interrupt) when
built for a preemptive kernel.


The __irq_svc_ path explicitly disables preemption because
rescheduling is not an option in hard irq context.


Right.  To allow nestable interrupts in the system, interrupts
must be reenabled.  To prevent preemption in an interrupt handler,
preempt_count is incremented while interrupts are enabled.

The abort
mode handling code-path however seems to have been opened up for
preemption (upon entry into SVC mode). In case of prefetch abort,
the processor sensitive information has already been gathered
in registers r0 and r1 before which interrupts are enabled.


No, it hasn't.  Not all of the processor sensitive information has
been gathered.  That's my point.  There is other information in the
case of an exception that exists outside of the CPU register file
(r0-r14).  That information may reside in coprocessor registers
like the MMU or other system state, or in our case, in kprobes data
structures.

So preempting them seems safe.


I don't think so, not until their individual handlers have a chance
to say so.

Similarly, in __dabt_svc, the processor specific data gathering
routines are called before interrupts are enabled. So preemption
seems safe there as well.  I've just skimmed through that part
of entry-armv.S, so kindly correct me if I've missed something
apparent.


I still don't think this works right on SMP ARM systems.  There is
other CPU-sensitive context information that can be lost on a task
switch.  Whether or not it break the general case, it does break
our use in the kprobes case.  If a reschedule to another processor
happens between the enable in __pabt_svc and the call to
notify_page_fault() call in do_translation_fault(), it's game over.

Does my explanation and scenario make sense?

Regards
Abhishek Sagar

Quentin

Follow-Ups:
- Re: Questions about kprobes implementation on SMP and preemptible kernels
  - From: Ananth N Mavinakayanahalli
- Re: Questions about kprobes implementation on SMP and preemptible kernels
  - From: Abhishek Sagar

References:
- Questions about kprobes implementation on SMP and preemptible kernels
  - From: Quentin Barnes
- Re: Questions about kprobes implementation on SMP and preemptible kernels
  - From: Ananth N Mavinakayanahalli
- Re: Questions about kprobes implementation on SMP and preemptible kernels
  - From: Quentin Barnes
- Re: Questions about kprobes implementation on SMP and preemptible kernels
  - From: Abhishek Sagar

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]