This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Questions about kprobes implementation on SMP and preemptible kernels


On Thu, Jan 18, 2007 at 11:01:45PM +0530, Abhishek Sagar wrote:
It seems to me that the preempt_disable()/preempt_enable() calls in
kprobe_exceptions_notify() are either at best inert, or possibly
hiding an existing bug.

The decision to sandwich kprobe_exceptions_notify between preempt_disable()/preempt_enable() is to 'explicitly' disable preemption.

Let's set aside for the moment the general point you raise below.


The point I was raising specifically about kprobe_exceptions_notify()
is special.  Preemption must be held disabled from the point of the
exception to when kprobe_exceptions_notify() is entered.  If it is
not held disabled, kprobe_running() can return the wrong processor.
A very serious kprobes bug!

The macro kprobe_running() invokes __get_cpu_var() which invokes
smp_processor_id() which can invoke debug_smp_processor_id().  The
function debug_smp_processor_id() has sanity checks to ensure that
the CPU thread is bound in various ways including making sure that
preempt_count is non-zero or that interrupts are disabled.  If the
CPU isn't bound, debug_smp_processor_id declares a bug and logs
the state pointing out the offending function.

What I think was going on was that kprobe_exceptions_notify() was
causing these log messages on some architectures.  What I suspect
is that rather than fixing the real bug by ensuring that preemption
was properly held disabled in a continuous chain from the point of
the exception through to invoking kprobe_exceptions_notify(), the
person buried the bug by adding the calls to preempt_disable()/
preempt_enable() in kprobe_exceptions_notify() to defeat the checks
in debug_smp_processor_id() -- and the original SMP bug is still there!

Does my explanation make sense? Is this what could have happened?

[...]
Seems like even though there are sections in the kernel which now seem
like non-preemptible (because they run with interrupts disabled), they
can be turned into preemptible sections in the future. The sections
which explicitly want to run in a preemption safe environment must
take proper preemption locks to safeguard themselves permanently.
kprobe_exceptions_notify is such a function because of its use of
per-cpu variables. So if in the future, the ARM undef exception
handling code becomes preemptibe, the per-cpu variable dependent
code-path in it (like kprobe_handler) would remain safe even then.

The confusion which now arises, is that the quote suggests that even
with local_irq_disable'd the current process can be rescheduled. But
having a look at preempt_schedule(), it shows the following check
right at the beginning:

if (likely(ti->preempt_count || irqs_disabled()))        /* --->
notice the 'likely' optimization */
               return;

It doesn't seem like preemption can be triggered in scenario quoted
from preempt-locking.txt ...can somebody explain this?

Sure looks like the documentation doesn't match the code. I went back to the oldest 2.6 kernel I had around (2.6.6) and it had the same code as above. As long as interrupts are disabled, a CPU can't be rescheduled no matter what.

It would be nice to have some concrete clarification on this point.
Is the docs to be trusted regardless of the code, or are the docs
out of date?

I'm wondering if I found a bug on the ARM implementation of prefetch
and data abort exception handlers for SMP platforms with kernel
preemption enabled.  Immediately after switching to SVC mode in
__pabt_svc and __dabt_svc, the handler re-enables IRQs (interrupts)
if they were enabled prior to the exception.  If it re-enables
interrupts at this point, it seems to me that a preemptive kernel
(CONFIG_PREEMPT defined) could switch execution to another processor
breaking the chain of execution before it has a chance to note which
processor triggered the exception.  Any ARM Linux kernel experts on
this list to comment, or do I need to bounce this to another list?

In the __irq_svc path (IRQ interrupt handling), the handler
increments the preempt count before reenabling IRQs (and later
restores its previous value after servicing the interrupt) when
built for a preemptive kernel.

The __irq_svc_ path explicitly disables preemption because rescheduling is not an option in hard irq context.

Right. To allow nestable interrupts in the system, interrupts must be reenabled. To prevent preemption in an interrupt handler, preempt_count is incremented while interrupts are enabled.

The abort
mode handling code-path however seems to have been opened up for
preemption (upon entry into SVC mode). In case of prefetch abort,
the processor sensitive information has already been gathered
in registers r0 and r1 before which interrupts are enabled.

No, it hasn't. Not all of the processor sensitive information has been gathered. That's my point. There is other information in the case of an exception that exists outside of the CPU register file (r0-r14). That information may reside in coprocessor registers like the MMU or other system state, or in our case, in kprobes data structures.

So preempting them seems safe.

I don't think so, not until their individual handlers have a chance to say so.

Similarly, in __dabt_svc, the processor specific data gathering
routines are called before interrupts are enabled. So preemption
seems safe there as well.  I've just skimmed through that part
of entry-armv.S, so kindly correct me if I've missed something
apparent.

I still don't think this works right on SMP ARM systems. There is other CPU-sensitive context information that can be lost on a task switch. Whether or not it break the general case, it does break our use in the kprobes case. If a reschedule to another processor happens between the enable in __pabt_svc and the call to notify_page_fault() call in do_translation_fault(), it's game over.

Does my explanation and scenario make sense?

Regards
Abhishek Sagar

Quentin



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]