Bug 2637

Summary:	skipped probes in FC6
Product:	systemtap	Reporter:	Martin Hunt <hunt>
Component:	runtime	Assignee:	Martin Hunt <hunt>
Status:	RESOLVED FIXED
Severity:	critical
Priority:	P1
Version:	unspecified
Target Milestone:	---
Host:		Target:
Build:		Last reconfirmed:
Attachments:	removes the nmissed count usage if handler's access to user address space fails

Description Martin Hunt 2006-05-02 20:13:12 UTC

Testing using:
>stap -e 'probe kernel.function("sys_open"){printf("%s: %s\n", execname(),
user_string($filename))}'  -c ps

On all FC5 kernels, it works fine.
On all FC6 kernels, I get
sh: /etc/ld.so.cache
sh: /lib/libtermcap.so.2
sh: /lib/libdl.so.2
sh: /lib/libc.so.6
sh:
ERROR: user string copy fault at 0x080cf85a near identifier '$filename' at
<input>:1:78
  PID TTY          TIME CMD
 2437 pts/0    00:00:00 bash
 3095 pts/0    00:00:00 stap
 3194 pts/0    00:00:00 ps
WARNING: Number of errors: 1, skipped probes: 1

Changing user_string() in tapset/conversions.stp to ignore copy errors, I still
get skipped probes.
> WARNING: Number of errors: 0, skipped probes: 3

Comment 1 Martin Hunt 2006-05-02 20:18:19 UTC

FC6 kernels have the new kprobes page fault handling and some other changes
including some "boost" code.

Comment 2 Martin Hunt 2006-05-02 20:25:10 UTC

Another data point:

Eliminating the userspace copy results in no skipped probes. So the attempt to
do the userspace copy causes the skipped probe count to be incremented, which
seems wrong to me.

Comment 3 Prasanna S Panchamukhi 2006-05-03 13:05:08 UTC

How about incrementing the nmissed count only if fixup_exception()
fails ? As shown in the patch below.

Thanks
Prasanna

Kprobes increments the nmissed count, even if user-specified pre/post
handler page faults while access data from user address space gets
fixed up. This patch removes the incrementing of nmissed count if
fixup_exception() succeeds on page faults and increments the nmissed
count only if fixup_exception() fails.
                                                                               
                                                               
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
                                                                               
                                                               
 arch/i386/kernel/kprobes.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)
                                                                               
                                                               
diff -puN
arch/i386/kernel/kprobes.c~kprobes-increment-nmissed-on-failed-pagefault-handling
arch/i386/kernel/kprobes.c
---
linux-2.6.17-rc3-mm1/arch/i386/kernel/kprobes.c~kprobes-increment-nmissed-on-failed-pagefault-handling
     2006-05-03 18:20:59.000000000 +0530
+++ linux-2.6.17-rc3-mm1-prasanna/arch/i386/kernel/kprobes.c    2006-05-03
18:23:39.000000000 +0530
@@ -605,13 +605,6 @@ static int __kprobes kprobe_fault_handle
        case KPROBE_HIT_ACTIVE:
        case KPROBE_HIT_SSDONE:
                /*
-                * We increment the nmissed count for accounting,
-                * we can also use npre/npostfault count for accouting
-                * these specific fault cases.
-                */
-               kprobes_inc_nmissed_count(cur);
-
-               /*
                 * We come here because instructions in the pre/post
                 * handler caused the page_fault, this could happen
                 * if handler tries to access user space by
@@ -629,6 +622,13 @@ static int __kprobes kprobe_fault_handle
                        return 1;
                                                                               
                                                               
                /*
+                * We increment the nmissed count for accounting,
+                * we can also use npre/npostfault count for accouting
+                * these specific fault cases.
+                */
+               kprobes_inc_nmissed_count(cur);
+
+               /*
                 * fixup_exception() could not handle it,
                 * Let do_page_fault() fix it.
                 */

Comment 4 Frank Ch. Eigler 2006-05-03 13:08:29 UTC

While this should make the "skipped" count go away, those accesses would still
fault.  The next step is to figure out why.

Comment 5 Martin Hunt 2006-05-04 04:22:14 UTC

I verified Prasanna's patch fixed the skipped count.

I did another test to confirm the bad userspace copies were caused by kprobes.
First I build the stock 2.6.16.11 kernel. It does not have any problems. Then I
copied over the new kprobes code from the FC6 kernels and rebuilt 2.6.16.11.
This time it did have the userspace copy failures in the same 3 places in my
test case.

Comment 6 Prasanna S Panchamukhi 2006-05-05 06:56:35 UTC

The example given by Martin, accesses the data in user address space. Most
likely the data accessed from user address space may not be present in memory.
It is known that such accesses causes page_faults, where in the
fixup_exception() may not succeed most of the times, hence nmissed count goes up
 (even with the current above patch).

Should we remove incrementing the nmissed count even if fixup_exception() fails?
Most of the system calls access user address space to copy the agruments and
they may experience the similar page_faults as above. 

Is it acceptable, if the probe handler fails most of the time, while accessing
arguments from user address space. OR do we need to find some other way to
get the agruments of the syscalls?

Any thoughts?

Thanks
Prasanna

Comment 7 Frank Ch. Eigler 2006-05-05 13:49:34 UTC

(In reply to comment #6)

> The example given by Martin, accesses the data in user address space. Most
> likely the data accessed from user address space may not be present in memory.

Quite possible - the faulting strings appear to come from the ps/sh executables'
data pages.  Maybe, under new kernels, those are faulted in later than before.
This would be difficult to work around.  Some possibilities:

- Teach the translator to look for user_string($targetvar) constructs.  Find a
way of "prefetching" the requested area of user memory, before the probe handler
gets under way.

- Defer the user_string() calls in tapset script code to the .return handler, or
some other point *after* the kernel's syscall routine ran (and presumably
faulted in all needed pages).

- Reconsider the atomic probe handler model.


>[...]  Should we remove incrementing the nmissed count even if
fixup_exception() fails?

"nmissed" is a misnomer in this case, since the fault occurred once the probe
handler was already running, so may have done work.  But for systemtap purposes,
"missed" ideally means "never started; no side-effects occurred".  Maybe we can
choose to interpret kprobes.nmissed changing as an error, and only
kretprobes.nmissed as a genuine "missed" event.

Comment 8 Martin Hunt 2006-05-05 16:42:42 UTC

Subject: Re:  skipped probes in FC6

On Fri, 2006-05-05 at 06:56 +0000, prasanna at in dot ibm dot com wrote:
> ------- Additional Comments From prasanna at in dot ibm dot com  2006-05-05 06:56 -------
> The example given by Martin, accesses the data in user address space. Most
> likely the data accessed from user address space may not be present in memory.

True. But such situations have in the past been extremely rare. What is
it about the new kprobes that makes it now so common and predictable? We
need to understand this better. 

> It is known that such accesses causes page_faults, where in the
> fixup_exception() may not succeed most of the times, hence nmissed count goes up
>  (even with the current above patch).

Could you explain what would cause such an event? It does not happen in
my experience and I don't see how it could. If there is an entry in the
fixup table, which there always should be if you use a proper userspace
copy function, fixup exception will work.

> Is it acceptable, if the probe handler fails most of the time, while accessing
> arguments from user address space. OR do we need to find some other way to
> get the agruments of the syscalls

The current situation fails too often and is completely unacceptable. I
cannot even run my simple demo programs.

If we absolutely cannot fix the problem, then we must as a minimum
change user_string() to just return "" or "<unknown>" and not generate
errors or warnings when copies fail.

Martin

Comment 9 Martin Hunt 2006-05-05 16:45:13 UTC

Subject: Re:  skipped probes in FC6

On Fri, 2006-05-05 at 13:49 +0000, fche at redhat dot com wrote:

> Quite possible - the faulting strings appear to come from the ps/sh executables'
> data pages.  Maybe, under new kernels, those are faulted in later than before.

See my comment #5. The kernel is the same. Only a fairly small number of
kprobes changes were different between the working and non-working
versions of 2.6.16.11.

Comment 10 Anil S Keshavamurthy 2006-05-05 21:33:45 UTC

> True. But such situations have in the past been extremely rare. What is
> it about the new kprobes that makes it now so common and predictable? We
> need to understand this better. 

In the past, we did not had the fixup_exception() call from 
kprobes_exception_handler() code path and due to which any page_fault 
generated by accessing user_data from pre/post handler used to
fall back to do_page_fault() function and hence you were 
not seeing failure that often. 

Again when we fall back to do_page_fault() function 
we will technically be in preempt_disable() state which 
is also wrong and potentially we could see a hang  the system
on UP system.

Hence the right thing to do here is to catch that page_fault exceptions and
try fixing up the exceptions without falling back on do_page_fault() code path.
With this simple fix, you would now experience any copy_from_user() calls from
pre/post hanlder to fail if that page is not in memory.

Comment 11 Martin Hunt 2006-05-11 16:30:39 UTC

Subject: Re:  skipped probes in FC6

On Fri, 2006-05-05 at 21:33 +0000, anil dot s dot keshavamurth
> Hence the right thing to do here is to catch that page_fault exceptions and
> try fixing up the exceptions without falling back on do_page_fault() code path.
> With this simple fix, you would now experience any copy_from_user() calls from
> pre/post hanlder to fail if that page is not in memory.

I agree. I have been looking into this and could use some help.

Comment 12 Anil S Keshavamurthy 2006-05-11 17:01:05 UTC

(In reply to comment #11)
> Subject: Re:  skipped probes in FC6
> On Fri, 2006-05-05 at 21:33 +0000, anil dot s dot keshavamurth
> > Hence the right thing to do here is to catch that page_fault exceptions and
> > try fixing up the exceptions without falling back on do_page_fault() code 
path.
> > With this simple fix, you would now experience any copy_from_user() calls 
from
> > pre/post hanlder to fail if that page is not in memory.
> I agree. I have been looking into this and could use some help.
Martin, What exact help are you looking here? Are you worried failing of 
copy_from_users() if the user pages are not in memory?

Comment 13 Prasanna S Panchamukhi 2006-05-15 06:52:30 UTC

Created attachment 1024 [details]
removes the nmissed count usage if handler's access to user address space fails

Martin,

This patch removes the usage of nmissed count if probe handler's access to user
address space fails. Also if accounting of such user address space is required,
we can add another counter later.

Thanks
Prasanna

Comment 14 Jim Keniston 2006-05-31 21:58:42 UTC

I believe that we have concluded that this is NOT a kprobes bug, and in fact
reflects a fix in kprobes's fault handling.  See Anil's analysis in Comment #10.
Kprobes handlers aren't supposed to sleep.  When called from a handler, a
function that accesses a non-resident user page must fail.  That's what's
happening now.  What was happening before was that handler slept (BAD, BAD)
while the non-resident page was brought it.  So the memory access succeeded, but
at the risk of hanging the system.

The real bug, as I see it, is that the script now terminates when the memory
access fails.

I'm not sure whether this is a runtime bug or a translator bug.  I'm reassigning
this as a runtime bug because most of the proposed solutions involve at least
some modification of the runtime.  See
http://sources.redhat.com/ml/systemtap/2006-q2/msg00493.html

Comment 15 Frank Ch. Eigler 2006-06-06 18:57:17 UTC

One possibly useful insight is that one way to force paging in of all relevant
chunks of user-space code is to defer dereferencing of user_strings etc. until
*after* the actual system call completes.  The kernel will page them all in for
us during the routine evaluation of the syscall.  This could be either moving the
probe point itself, or producing a partial argstr (and a fuller one at .return
time; made easier with bug #1382).

Comment 16 Martin Hunt 2006-06-28 16:52:40 UTC

Closing this. There were multiple issues here. We fixed the skipped probe count
problem and explained the increase in page fault errors. Remaining issue is what
to do with user_string(). I have opened 2861 to focus on that issue.

Comment 17 Huang Peng 2006-09-30 09:29:55 UTC

(In reply to comment #8)
> Subject: Re:  skipped probes in FC6
> 
> On Fri, 2006-05-05 at 06:56 +0000, prasanna at in dot ibm dot com wrote:
> > ------- Additional Comments From prasanna at in dot ibm dot com  2006-05-05
06:56 -------
> > The example given by Martin, accesses the data in user address space. Most
> > likely the data accessed from user address space may not be present in memory.
> 
> True. But such situations have in the past been extremely rare. What is
> it about the new kprobes that makes it now so common and predictable? We
> need to understand this better. 
I found it happen when applications call open("/usr/lib/locale/locale-archive",
O_RDONLY|O_LARGEFILE) = 3. All skip probe is because of openning the
local-archive from some library. Maybe we need analyze the library's code.

> 
> > It is known that such accesses causes page_faults, where in the
> > fixup_exception() may not succeed most of the times, hence nmissed count goes up
> >  (even with the current above patch).
> 
> Could you explain what would cause such an event? It does not happen in
> my experience and I don't see how it could. If there is an entry in the
> fixup table, which there always should be if you use a proper userspace
> copy function, fixup exception will work.
> 
> > Is it acceptable, if the probe handler fails most of the time, while accessing
> > arguments from user address space. OR do we need to find some other way to
> > get the agruments of the syscalls
> 
> The current situation fails too often and is completely unacceptable. I
> cannot even run my simple demo programs.
> 
> If we absolutely cannot fix the problem, then we must as a minimum
> change user_string() to just return "" or "<unknown>" and not generate
> errors or warnings when copies fail.
> 
> Martin
> 
> 
>