This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: kprobe fault handling
- From: Jim Keniston <jkenisto at us dot ibm dot com>
- To: Martin Hunt <hunt at redhat dot com>
- Cc: SystemTAP <systemtap at sources dot redhat dot com>
- Date: 06 Feb 2006 16:50:52 -0800
- Subject: Re: kprobe fault handling
- Organization:
- References: <1139255423.4689.10.camel@monkey2>
Hi, Martin. This is the subject of bugzilla #1303, which is assigned to
Prasanna. I believe that Prasanna is also the author of the original
kprobe_fault_handler code in kprobes, so he may be able to provide
insight that I don't have.
However, I've thought about this some, so here goes...
On Mon, 2006-02-06 at 11:50, Martin Hunt wrote:
> I've been trying to understand how kprobes fault handling is supposed to
> work and why it isn't doing what I thought it did.
>
> When page faults happen, do_page_fault() almost immediately calls
> notify_die(DIE_PAGE_FAULT,...) This calls the notifier chain which calls
> kprobe_exceptions_notify(). This calls kprobe_fault_handler().
>
> kprobe_fault_handler() checkes to see if there is a specific fault
> fandler for that kprobe, and if there is, it calls it.
I believe that your analysis is correct.
> Question: What
> do we imagine a probe-specific page fault handler would do? Why is it
> useful?
If an attempted memory access failed, the fault handler might be able to
clean up what the pre_handler or post_handler was trying to do.
For example, consider user-space return probes (which is currently hung
up on a couple of issues, including user-space access). When we enter
the probed function, the uretprobe version of arch_prepare_kretprobe()
has to do the following:
1. Reserve a kretprobe_instance.
2. Save a copy of the return address (a read from user space).
3. Replace the return address with the uretprobe trampoline address (a
write to user space).
4. Hash the kretprobe_instance.
If the page containing the return address is not in memory (a very
unlikely scenario, but one which I believe we must handle) then we'll
get a page fault on #2 or #3. The corresponding fault handler could
free up the kretprobe_instance reserved in step #1.
Of course, the obvious question is, what does the fault handler do
then? We don't want to return as if the memory access were successful,
because it wasn't. We don't want to let the page be demand-paged back
in, because that might cause us to sleep -- a definite no-no. But those
seem to be the only two possible outcomes, given the way kprobes
currently works.
The "right" thing to do, perhaps, is for kprobe_handler to notice that
the kprobe in question has a fault handler is associated with it, and do
a sort of setjmp before calling the pre_handler. Then the fault handler
could end in a corresponding longjmp. (The longjmp would bypass several
stack frames, including those of the fault handler,
kprobe_fault_handler, kprobe_exceptions_notify, notify_die and friends,
do_page_fault, and the pre_handler.)
I understand what fixup_exceptions() does, but it's not clear to me how
we can use it here. I guess if all user-space access by kprobes
handlers used the same user-read and user-write functions, then we'd
have a fixup for the user-read function and one for the user-write, and
each of these could vector into the aforementioned longjmp.
Do you envision that all user-space accesses would be via functions like
*_from_user() in the SystemTap runtime?
>
> Then there is this code, which I don't understand
> if (kcb->kprobe_status & KPROBE_HIT_SS) {
> resume_execution(cur, regs, kcb);
> regs->eflags |= kcb->kprobe_old_eflags;
>
> reset_current_kprobe();
> preempt_enable_no_resched();
> }
>
I think KPROBE_HIT_SS means that the fault happened while the probed
instruction was being single-stepped. Beyond that, I'm not sure what's
going on.
> And that's it. kprobe_fault_handler returns 0. No call to
> fixup_exceptions()! So do_page_fault() will have to do the fixups, but
> first it will print nasty might_sleep warnings and maybe actually sleep!
>
> I could have sworn this was not the case previously but it has been a
> very long time since I have looked at the code at this level. Anyway,
> this MUST be fixed.
>
> Martin
I agree that we need to resolve this. Thanks for getting this
discussion rolling again.
Jim