This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Re: notify_page_fault() problem

From: Quentin Barnes <qbarnes at urbana dot css dot mot dot com>
To: Andi Kleen <andi at firstfloor dot org>
Cc: systemtap at sources dot redhat dot com
Date: Mon, 30 Apr 2007 22:24:00 -0500
Subject: Re: notify_page_fault() problem
References: <20070430201931.GA7328@urbana.css.mot.com> <p73fy6hzhdh.fsf@bingen.suse.de> <20070430211537.GA7723@urbana.css.mot.com> <20070501025659.GA16649@one.firstfloor.org>

On Tue, May 01, 2007 at 04:57:00AM +0200, Andi Kleen wrote:

However, vmalloc_sync_all() is i386 and x86_64 specific, as is their change to register_page_fault_notifier(). I don't see any other platform doing anything special in its register_page_fault_notifier().
They probably just haven't tested this particular case yet.


I got it to happen most often when running the syscall.exp test,
but even then it was very intermittent.  I'm guessing it depends on
what else got placed on the same page as the kprobe/kretprobe data
structure (so it would occasionally get coincidentally loaded and
work), on whether the system is running preempt-enabled, and on
whether it is busy enough for another page fault to occur before
the kprobes data structure can take its own translation fault.  If
the system is quiescent, this bug's not going to show up either.
I'm currently running my lowly 64MB ARM board with network boot
_and_ swap drives, so a lot of system pounding is going on almost
all the time.

x86 also did it originally to handle NMI notifiers, which is an
x86 special (a nested page fault in an NMI can lead to stack
corruption because NMIs are only blocked until the next IRET)

Ah, ok.

I have trouble believing that x86
and ARM are unique somehow with needing to address this problem.
Why doesn't anyone else hit this?  Is it a lurking problem or are
there other fixes in other forms out there?

The standard kprobes notifier is not modular so it won't hit this.


Are you saying the code in arch/*/kernel/kprobes.c and kernel/kprobes.c
is not built as a module, so it won't hit this problem?

That doesn't matter.  Just having the kprobe and kretprobe data
structures in module memory is all that matters.  What happens is
that get_kprobe() and aggr_pre_handler() walk the kprobe_table[]
list and stumble across a kprobe or kretprobe data structure
referenced in the table that's not mapped in hardware yet.  That's
what's generating the recursive faults I was seeing.
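
To make the failure mode above concrete, here is a minimal user-space
sketch of it.  It is a toy model, not kernel code: every toy_* name,
the "mapped" flag standing in for a missing hardware translation, and
the depth cap are assumptions made for illustration.  A lookup touches
an unmapped entry, the simulated fault path notifies the probe layer,
and the probe layer repeats the same lookup before the mapping has
been fixed up.  toy_sync_all() plays the role that vmalloc_sync_all()
appears to play at registration time on i386 and x86_64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_TABLE_SIZE 4
#define TOY_MAX_DEPTH  8   /* cap the simulated recursion; a real system has no such cap */

struct toy_kprobe {
    unsigned long addr;        /* probed address */
    struct toy_kprobe *next;   /* hash chain link */
    bool mapped;               /* false: touching this entry "faults" */
};

static struct toy_kprobe *toy_table[TOY_TABLE_SIZE];
static int toy_fault_depth;      /* current nesting of simulated faults */
static int toy_max_depth_seen;   /* deepest nesting observed */

static struct toy_kprobe *toy_get_kprobe(unsigned long addr);

/* Simulated fault path: the notifier runs first and repeats the lookup,
 * which touches the same unmapped entry and faults again. */
static void toy_page_fault(struct toy_kprobe *p, unsigned long addr)
{
    if (++toy_fault_depth > toy_max_depth_seen)
        toy_max_depth_seen = toy_fault_depth;
    if (toy_fault_depth < TOY_MAX_DEPTH)
        toy_get_kprobe(addr);    /* notifier-side lookup, mapping not fixed yet */
    p->mapped = true;            /* translation finally installed */
    toy_fault_depth--;
}

/* Walk one hash chain; dereferencing an unmapped entry triggers a fault. */
static struct toy_kprobe *toy_get_kprobe(unsigned long addr)
{
    struct toy_kprobe *p;

    for (p = toy_table[addr % TOY_TABLE_SIZE]; p; p = p->next) {
        if (!p->mapped)
            toy_page_fault(p, addr);
        if (p->addr == addr)
            return p;
    }
    return NULL;
}

/* Stand-in for syncing all mappings up front, before any lookup. */
static void toy_sync_all(void)
{
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++)
        for (struct toy_kprobe *p = toy_table[i]; p; p = p->next)
            p->mapped = true;
}

/* Register one unmapped probe, optionally sync, then look it up.
 * Returns the deepest fault nesting that the lookup caused. */
int toy_lookup_depth(bool sync_first)
{
    static struct toy_kprobe probe;

    probe = (struct toy_kprobe){ .addr = 0x1001, .next = NULL, .mapped = false };
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++)
        toy_table[i] = NULL;
    toy_table[probe.addr % TOY_TABLE_SIZE] = &probe;
    toy_fault_depth = toy_max_depth_seen = 0;

    if (sync_first)
        toy_sync_all();

    struct toy_kprobe *found = toy_get_kprobe(probe.addr);
    assert(found == &probe && toy_fault_depth == 0);
    return toy_max_depth_seen;
}
```

In this model toy_lookup_depth(false) nests until it hits the cap,
while toy_lookup_depth(true) never faults at all.  The cap exists only
so the sketch terminates; it does not correspond to anything in the
kernel.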

I guess part of the answer has to do with what people's expectations
are for intercepting faults with their kprobes fault handler though.


Yes, some have pretty broad exceptions.  It might be possible
to move it to a kernel address only path, but then some debuggers
seem to want to debug user mode too.


It would be nice to move those debugger hooks out of there and have
the debuggers use kprobes so their needs don't negatively impact the
system as a whole even when they're not in use.

But you're right, there has been grumbling about the overhead
of the notifier call in the hot path.


I wasn't aware of any grumbling.  I'm not on the main kernel mailing
list.  I was just disappointed though to see that the fault handlers
are being notified for every single fault in the system, user or
kernel space.  While tracking down this bug, my debug logs were huge
just from having a user land app fault in some shared libraries.
Just a few instructions in a system's fault handler path can have
noticeable performance repercussions.
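
As a rough illustration of the "kernel address only path" idea
mentioned earlier in this thread, the sketch below counts how often a
notifier chain is invoked with and without an address filter in front
of it.  It is a user-space toy under stated assumptions: the toy_*
names, the chain layout, and the 0xC0000000 user/kernel split are all
hypothetical and are not taken from any kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_KERNEL_BASE 0xC0000000UL   /* hypothetical user/kernel split */
#define TOY_CHAIN_MAX   4

typedef void (*toy_notifier_fn)(unsigned long fault_addr);

static toy_notifier_fn toy_chain[TOY_CHAIN_MAX];
static size_t toy_chain_len;
static unsigned long toy_calls;   /* total notifier invocations */

/* A notifier that only counts how often it is called. */
static void toy_count_notifier(unsigned long fault_addr)
{
    (void)fault_addr;
    toy_calls++;
}

/* Append a notifier to the chain; returns 0 on success, -1 if full. */
static int toy_register_notifier(toy_notifier_fn fn)
{
    if (toy_chain_len == TOY_CHAIN_MAX)
        return -1;
    toy_chain[toy_chain_len++] = fn;
    return 0;
}

/* Fault hot path: with kernel_only set, user-space faults skip the chain. */
static void toy_notify(unsigned long fault_addr, bool kernel_only)
{
    if (kernel_only && fault_addr < TOY_KERNEL_BASE)
        return;
    for (size_t i = 0; i < toy_chain_len; i++)
        toy_chain[i](fault_addr);
}

/* Replay a list of fault addresses and report how many notifier calls
 * the hot path made. */
unsigned long toy_count_notifications(const unsigned long *faults, size_t n,
                                      bool kernel_only)
{
    toy_chain_len = 0;
    toy_calls = 0;
    if (toy_register_notifier(toy_count_notifier) != 0)
        return 0;
    for (size_t i = 0; i < n; i++)
        toy_notify(faults[i], kernel_only);
    return toy_calls;
}
```

The filter is a single compare in the hot path, which is the appeal.
The cost, as noted above, is that anything wanting to see user-mode
faults (some debuggers do) would no longer be notified.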

-Andi

Quentin

Follow-Ups:
- Re: notify_page_fault() problem
  - From: Frank Ch. Eigler

References:
- notify_page_fault() problem
  - From: Quentin Barnes
- Re: notify_page_fault() problem
  - From: Andi Kleen
- Re: notify_page_fault() problem
  - From: Quentin Barnes
- Re: notify_page_fault() problem
  - From: Andi Kleen
