Bug 18642 - Probes not always invoked when probed functions are called
Summary: Probes not always invoked when probed functions are called
Status: RESOLVED WORKSFORME
Alias: None
Product: systemtap
Classification: Unclassified
Component: runtime (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-08 08:08 UTC by Izi Anavi
Modified: 2016-05-24 15:29 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
stap-report output (9.89 KB, text/plain)
2015-07-08 08:08 UTC, Izi Anavi
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Izi Anavi 2015-07-08 08:08:29 UTC
Created attachment 8421 [details]
stap-report output

Systemtap probes are invoked only some of the time when the probed function is called.

This happens in two scenarios which may or may not be related:

1. Scenario I:
System calls such as read, write or bind using probes syscall.* (e.g.: syscall.read, syscall.write, syscall.bind) are not called consistently.

In contrast, when using process.syscall probe, the probes are called consistently when the probed functions are called.

We can't understand the discrepancy between the probe types behavior.

2. Scenario II:
Probe for process creation using probe kernel.function("sys_execve") and user space probes for the created process (and its subprocesses) are not called consistently. In our system the created process is usually bash, and the user space probes on bash -
probe process("/bin/bash").function("pre_process_line")
probe process("/bin/bash").function("read_command")

are not invoked either when process creation probe isn't invoked.

--

It is worth mentioning that in these scenarios we run systemtap over an lxc container and all the relevant processes run in this container.

Attached is the output from stap-report. We run systemtap 2.7 over ubuntu 14.04.2 (kernel 3.13.0-24-generic).
Comment 1 David Smith 2015-07-08 14:12:46 UTC
(In reply to Izi Anavi from comment #0)
> Created attachment 8421 [details]
> stap-report output
> 
> Systemtap probes are invoked only some of the time when the probed function
> is called.
> 
> This happens in two scenarios which may or may not be related:
> 
> 1. Scenario I:
> System calls such as read, write or bind using probes syscall.* (e.g.:
> syscall.read, syscall.write, syscall.bind) are not called consistently.
> 
> In contrast, when using process.syscall probe, the probes are called
> consistently when the probed functions are called.
> 
> We can't understand the discrepancy between the probe types behavior.

I'm not sure this is 100% of your problem, but certainly glibc can be a culprit. Take open() for example. Based on how you call it and what architecture you are on, a user program's call to open() might end up actually calling (one or more) of the following syscalls: creat, open, or openat.

Try comparing systemtap's output to strace's output to take glibc out of the picture. Do something like this (once you've installed the systemtap testsuite - otherwise get sys.stp from here <https://www.sourceware.org/git/gitweb.cgi?p=systemtap.git;a=blob_plain;f=testsuite/systemtap.syscall/sys.stp;h=79c7ff57613e1561b72a10a4e2a859c0f425baa7;hb=HEAD>):

# strace foo.exe
# stap /usr/share/systemtap/testsuite/systemtap.syscall/sys.stp -c foo.exe

The syscalls seen by strace and the ones seen by systemtap should match up (once the executable gets going - systemtap sees earlier syscalls than strace does).

> 2. Scenario II:
> Probe for process creation using probe kernel.function("sys_execve") and
> user space probes for the created process (and its subprocesses) are not
> called consistently. In our system the created process is usually bash, and
> the user space probes on bash -
> probe process("/bin/bash").function("pre_process_line")
> probe process("/bin/bash").function("read_command")
> 
> are not invoked either when process creation probe isn't invoked.

If I'm reading this one correctly, you aren't probing widely enough. Instead of probing kernel.function("sys_execve"), you should be probing syscall.execve, syscall.compat_execve, syscall.execveat, and syscall.compat_execveat.

Once again compare strace's output to systemtap's on a target executable. Another thing to try would be a process.begin probe and see if that is missing process creation also. Are you sure the target process is doing an exec in all cases?

As far as missing process.function user-space probes go, I'm unsure what is going on. We probably need to clear up some of the earlier errors first.

> It is worth mentioning that in these scenarios we run systemtap over an lxc
> container and all the relevant processes run in this container.

Hmm. If possible, try to run the same scenarios on the host system and see what you get.

> Attached is the output from stap-report. We run systemtap 2.7 over ubuntu
> 14.04.2 (kernel 3.13.0-24-generic).

Those are recent enough versions of systemtap and kernel where this should work correctly.
Comment 2 Frank Ch. Eigler 2016-05-24 15:29:06 UTC
Testing your first scenario, I don't see a mismatch:
# stap -e 'probe process.syscall, nd_syscall.* { 
              if (pid() == target()) println(pp())
           }' -c /bin/PROGRAM

Hypothetical mismatches could occur because of the different underlying probing mechanisms (kprobes versus tracepoints), which can incur temporary failures in different ways.  (Use "stap -t" for more precise accounting of skipped probes.)


The second part of your report seems similar to bug #19812, which interfered with intra-container targeted user-space probing.  It was fixed in systemtap 3.0.