Summary: | skipped probes in FC6 | ||
---|---|---|---|
Product: | systemtap | Reporter: | Martin Hunt <hunt> |
Component: | runtime | Assignee: | Martin Hunt <hunt> |
Status: | RESOLVED FIXED | ||
Severity: | critical | ||
Priority: | P1 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Attachments: | removes the nmissed count usage if handler's access to user address space fails |
Description
Martin Hunt
2006-05-02 20:13:12 UTC
FC6 kernels have the new kprobes page fault handling and some other changes including some "boost" code. Another data point: Eliminating the userspace copy results in no skipped probes. So the attempt to do the userspace copy causes the skipped probe count to be incremented, which seems wrong to me. How about incrementing the nmissed count only if fixup_exception() fails ? As shown in the patch below. Thanks Prasanna Kprobes increments the nmissed count, even if user-specified pre/post handler page faults while access data from user address space gets fixed up. This patch removes the incrementing of nmissed count if fixup_exception() succeeds on page faults and increments the nmissed count only if fixup_exception() fails. Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> arch/i386/kernel/kprobes.c | 14 +++++++------- 1 files changed, 7 insertions(+), 7 deletions(-) diff -puN arch/i386/kernel/kprobes.c~kprobes-increment-nmissed-on-failed-pagefault-handling arch/i386/kernel/kprobes.c --- linux-2.6.17-rc3-mm1/arch/i386/kernel/kprobes.c~kprobes-increment-nmissed-on-failed-pagefault-handling 2006-05-03 18:20:59.000000000 +0530 +++ linux-2.6.17-rc3-mm1-prasanna/arch/i386/kernel/kprobes.c 2006-05-03 18:23:39.000000000 +0530 @@ -605,13 +605,6 @@ static int __kprobes kprobe_fault_handle case KPROBE_HIT_ACTIVE: case KPROBE_HIT_SSDONE: /* - * We increment the nmissed count for accounting, - * we can also use npre/npostfault count for accouting - * these specific fault cases. - */ - kprobes_inc_nmissed_count(cur); - - /* * We come here because instructions in the pre/post * handler caused the page_fault, this could happen * if handler tries to access user space by @@ -629,6 +622,13 @@ static int __kprobes kprobe_fault_handle return 1; /* + * We increment the nmissed count for accounting, + * we can also use npre/npostfault count for accouting + * these specific fault cases. + */ + kprobes_inc_nmissed_count(cur); + + /* * fixup_exception() could not handle it, * Let do_page_fault() fix it. */ While this should make the "skipped" count go away, those accesses would still fault. The next step is to figure out why. I verified Prasanna's patch fixed the skipped count. I did another test to confirm the bad userspace copies were caused by kprobes. First I build the stock 2.6.16.11 kernel. It does not have any problems. Then I copied over the new kprobes code from the FC6 kernels and rebuilt 2.6.16.11. This time it did have the userspace copy failures in the same 3 places in my test case. The example given by Martin, accesses the data in user address space. Most likely the data accessed from user address space may not be present in memory. It is known that such accesses causes page_faults, where in the fixup_exception() may not succeed most of the times, hence nmissed count goes up (even with the current above patch). Should we remove incrementing the nmissed count even if fixup_exception() fails? Most of the system calls access user address space to copy the agruments and they may experience the similar page_faults as above. Is it acceptable, if the probe handler fails most of the time, while accessing arguments from user address space. OR do we need to find some other way to get the agruments of the syscalls? Any thoughts? Thanks Prasanna (In reply to comment #6) > The example given by Martin, accesses the data in user address space. Most > likely the data accessed from user address space may not be present in memory. Quite possible - the faulting strings appear to come from the ps/sh executables' data pages. Maybe, under new kernels, those are faulted in later than before. This would be difficult to work around. Some possibilities: - Teach the translator to look for user_string($targetvar) constructs. Find a way of "prefetching" the requested area of user memory, before the probe handler gets under way. - Defer the user_string() calls in tapset script code to the .return handler, or some other point *after* the kernel's syscall routine ran (and presumably faulted in all needed pages). - Reconsider the atomic probe handler model. >[...] Should we remove incrementing the nmissed count even if fixup_exception() fails? "nmissed" is a misnomer in this case, since the fault occurred once the probe handler was already running, so may have done work. But for systemtap purposes, "missed" ideally means "never started; no side-effects occurred". Maybe we can choose to interpret kprobes.nmissed changing as an error, and only kretprobes.nmissed as a genuine "missed" event. Subject: Re: skipped probes in FC6 On Fri, 2006-05-05 at 06:56 +0000, prasanna at in dot ibm dot com wrote: > ------- Additional Comments From prasanna at in dot ibm dot com 2006-05-05 06:56 ------- > The example given by Martin, accesses the data in user address space. Most > likely the data accessed from user address space may not be present in memory. True. But such situations have in the past been extremely rare. What is it about the new kprobes that makes it now so common and predictable? We need to understand this better. > It is known that such accesses causes page_faults, where in the > fixup_exception() may not succeed most of the times, hence nmissed count goes up > (even with the current above patch). Could you explain what would cause such an event? It does not happen in my experience and I don't see how it could. If there is an entry in the fixup table, which there always should be if you use a proper userspace copy function, fixup exception will work. > Is it acceptable, if the probe handler fails most of the time, while accessing > arguments from user address space. OR do we need to find some other way to > get the agruments of the syscalls The current situation fails too often and is completely unacceptable. I cannot even run my simple demo programs. If we absolutely cannot fix the problem, then we must as a minimum change user_string() to just return "" or "<unknown>" and not generate errors or warnings when copies fail. Martin Subject: Re: skipped probes in FC6 On Fri, 2006-05-05 at 13:49 +0000, fche at redhat dot com wrote: > Quite possible - the faulting strings appear to come from the ps/sh executables' > data pages. Maybe, under new kernels, those are faulted in later than before. See my comment #5. The kernel is the same. Only a fairly small number of kprobes changes were different between the working and non-working versions of 2.6.16.11. > True. But such situations have in the past been extremely rare. What is
> it about the new kprobes that makes it now so common and predictable? We
> need to understand this better.
In the past, we did not had the fixup_exception() call from
kprobes_exception_handler() code path and due to which any page_fault
generated by accessing user_data from pre/post handler used to
fall back to do_page_fault() function and hence you were
not seeing failure that often.
Again when we fall back to do_page_fault() function
we will technically be in preempt_disable() state which
is also wrong and potentially we could see a hang the system
on UP system.
Hence the right thing to do here is to catch that page_fault exceptions and
try fixing up the exceptions without falling back on do_page_fault() code path.
With this simple fix, you would now experience any copy_from_user() calls from
pre/post hanlder to fail if that page is not in memory.
Subject: Re: skipped probes in FC6
On Fri, 2006-05-05 at 21:33 +0000, anil dot s dot keshavamurth
> Hence the right thing to do here is to catch that page_fault exceptions and
> try fixing up the exceptions without falling back on do_page_fault() code path.
> With this simple fix, you would now experience any copy_from_user() calls from
> pre/post hanlder to fail if that page is not in memory.
I agree. I have been looking into this and could use some help.
(In reply to comment #11) > Subject: Re: skipped probes in FC6 > On Fri, 2006-05-05 at 21:33 +0000, anil dot s dot keshavamurth > > Hence the right thing to do here is to catch that page_fault exceptions and > > try fixing up the exceptions without falling back on do_page_fault() code path. > > With this simple fix, you would now experience any copy_from_user() calls from > > pre/post hanlder to fail if that page is not in memory. > I agree. I have been looking into this and could use some help. Martin, What exact help are you looking here? Are you worried failing of copy_from_users() if the user pages are not in memory? Created attachment 1024 [details]
removes the nmissed count usage if handler's access to user address space fails
Martin,
This patch removes the usage of nmissed count if probe handler's access to user
address space fails. Also if accounting of such user address space is required,
we can add another counter later.
Thanks
Prasanna
I believe that we have concluded that this is NOT a kprobes bug, and in fact reflects a fix in kprobes's fault handling. See Anil's analysis in Comment #10. Kprobes handlers aren't supposed to sleep. When called from a handler, a function that accesses a non-resident user page must fail. That's what's happening now. What was happening before was that handler slept (BAD, BAD) while the non-resident page was brought it. So the memory access succeeded, but at the risk of hanging the system. The real bug, as I see it, is that the script now terminates when the memory access fails. I'm not sure whether this is a runtime bug or a translator bug. I'm reassigning this as a runtime bug because most of the proposed solutions involve at least some modification of the runtime. See http://sources.redhat.com/ml/systemtap/2006-q2/msg00493.html One possibly useful insight is that one way to force paging in of all relevant chunks of user-space code is to defer dereferencing of user_strings etc. until *after* the actual system call completes. The kernel will page them all in for us during the routine evaluation of the syscall. This could be either moving the probe point itself, or producing a partial argstr (and a fuller one at .return time; made easier with bug #1382). Closing this. There were multiple issues here. We fixed the skipped probe count problem and explained the increase in page fault errors. Remaining issue is what to do with user_string(). I have opened 2861 to focus on that issue. (In reply to comment #8) > Subject: Re: skipped probes in FC6 > > On Fri, 2006-05-05 at 06:56 +0000, prasanna at in dot ibm dot com wrote: > > ------- Additional Comments From prasanna at in dot ibm dot com 2006-05-05 06:56 ------- > > The example given by Martin, accesses the data in user address space. Most > > likely the data accessed from user address space may not be present in memory. > > True. But such situations have in the past been extremely rare. What is > it about the new kprobes that makes it now so common and predictable? We > need to understand this better. I found it happen when applications call open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3. All skip probe is because of openning the local-archive from some library. Maybe we need analyze the library's code. > > > It is known that such accesses causes page_faults, where in the > > fixup_exception() may not succeed most of the times, hence nmissed count goes up > > (even with the current above patch). > > Could you explain what would cause such an event? It does not happen in > my experience and I don't see how it could. If there is an entry in the > fixup table, which there always should be if you use a proper userspace > copy function, fixup exception will work. > > > Is it acceptable, if the probe handler fails most of the time, while accessing > > arguments from user address space. OR do we need to find some other way to > > get the agruments of the syscalls > > The current situation fails too often and is completely unacceptable. I > cannot even run my simple demo programs. > > If we absolutely cannot fix the problem, then we must as a minimum > change user_string() to just return "" or "<unknown>" and not generate > errors or warnings when copies fail. > > Martin > > > |