This is the mail archive of the
mailing list for the systemtap project.
Re: User-space probes: Plan B+
- From: "James Dickens" <jamesd dot wi at gmail dot com>
- To: "Jim Keniston" <jkenisto at us dot ibm dot com>
- Cc: SystemTAP <systemtap at sources dot redhat dot com>
- Date: Fri, 25 Aug 2006 03:11:48 -0500
- Subject: Re: User-space probes: Plan B+
On 24 Aug 2006 18:13:24 -0700, Jim Keniston <email@example.com> wrote:
Here's where we stand on user-space probes (uprobes). The intent of
uprobes is to enable application developers to create low-overhead,
dynamic instrumentation for their apps, with uprobes-based
instrumentation interoperating usefully, as needed, with kprobes-based
instrumentation. Comments are welcome.
Last spring, Prasanna Panchamukhi offered up a kernel-only approach,
where instrumentation would be coded as a kernel module, a la kprobes.
This performed well (e.g. 1 usec per probepoint hit on my Pentium M),
but we got bad reviews on such things as the kernel-only approach
and the per-executable tracing (e.g., hooking read_page(s)).
I tried an approach based on ptrace, with no kernel enhancements, but
it lacked certain necessary features (e.g., #2-5 below), probe overhead
was 12-15x worse than Prasanna's approach, and I couldn't get it to
work when probing multiple processes. (Frank Eigler independently
suggested this approach and termed it "Plan B from outer space.")
Is that 12-15x worse than the current solution used in strace?
While I was stumped trying to make Plan B work, Roland McGrath made
utrace available to us. We looked this over as we found the time,
and it looked promising.
There has been much debate within the kprobes teams about the proper
programming model to support. Discussions at OLS didn't yield many
new ideas, let alone consensus.
The Current Approach: Overview
The approach we are now coding can be summarized as follows. (Okay,
it's not much like Plan B, but B+ sounds better than C.)
a. A system-call API that is an alternative to ptrace, provides
better support for probepoints and return probes, and exploits all
the process-lifetime events made accessible by utrace.
b. The "tracer" process detects events (e.g., probe hits) by polling
rather than catching SIGCHLD signals.
c. Hooks to allow kernel-mode instrumentation to cooperate with
user-mode "tracer" processes.
Here are the requirements we will satisfy with this approach.
0. Per-process (not per-executable) tracing.
1. Instrumentation can be coded entirely as a user-space app...
Sounds like a nightmare waiting to happen. If I want to trace
something from userland into the kernel and back, I start writing
userland code, then kernel code, quite possibly with the kernel
code accessing variables and statistics stored in userland. That means
lots of checks that the user remembers to call the routines that
safely move data back and forth between the two.
How is this better than just enhancing a debugger such as gdb? How are
stacks dealt with? Since you quite possibly have one process
investigating another, if you don't get everything perfect, can't the
program being watched corrupt the data of the second?
2. ... but in situations where performance is critical, uprobes can
run a named kernel handler without waking up the tracer process.
Now if we start out coding our script to only work in userland, and
then all of a sudden we decide we need better performance, we have to
go back and recode parts to work in kernel land, and quite possibly
break our algorithms that were talking to kernel land, or probes in
the kernel that accessed userland data that just moved back into the
3. A user-mode tracer can invoke a previously registered kernel-mode
handler, so we have simple and efficient communication between user-
and kernel-mode instrumentation.
How do you keep a userland program from exploiting systemtap's
architecture and executing kernel probes from other active systemtap
scripts? Isn't this a huge back door for rootkits, especially once
people start using systemtap's methods for monitoring systems?
4. Multiple tracer processes can trace the same tracee.
5. As needed, we can "pre-define" a set of useful kernel handlers.
6. Uprobes can be easily extended (exploiting utrace) to support
notifying the tracer of non-probepoint events in the probee,
such as signals and system calls.
7. The user API should be easier to use than the ptrace API.
8. Handlers run in process context -- the tracee's context (see
requirement 2) or the tracer's context while the tracee is stopped
(see requirement 3).
Stack corruption, or even slight stack placement differences, would
severely limit the usefulness of the solution. It will have the same
effect as debugging an app in gdb: the app only breaks when the
userland debugger is not running.
A typical tracer app would do the following:
- Call uprobe_register() to establish a probepoint and be notified
(or run a kernel handler) when the probepoint is hit.
- Call uprobe_poll() repeatedly to poll for, and handle, events.
(A tracing app would have to spawn multiple threads to trace
- Whenever appropriate, call uprobe_run_khandler() to interoperate
with kernel-side instrumentation.
- Call uprobe_unregister() to cancel uprobes.
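The tracer workflow listed above can be sketched roughly as follows.
This is only an illustration: the attached API summary is not
reproduced here, so every signature, the uprobe_event struct, the
simulate_hit() helper, and the "count_hits" handler name are
assumptions, and the four uprobe_*() calls are user-space stand-ins
rather than real system calls.

```c
#include <stdio.h>

/* HYPOTHETICAL stand-ins for the proposed calls. Signatures are
 * guesses made for illustration; nothing here talks to a kernel. */
#define MAX_PROBES 8

struct uprobe_event {
    int probe_id;          /* which registered probe fired */
    unsigned long vaddr;   /* address of the probepoint */
};

struct probe_slot {
    int in_use;
    int pid;
    unsigned long vaddr;
    int pending;           /* simulated hits not yet polled */
};

static struct probe_slot slots[MAX_PROBES];

/* Establish a probepoint; returns a probe id, or -1 if the table is full. */
int uprobe_register(int pid, unsigned long vaddr)
{
    for (int i = 0; i < MAX_PROBES; i++) {
        if (!slots[i].in_use) {
            slots[i].in_use = 1;
            slots[i].pid = pid;
            slots[i].vaddr = vaddr;
            slots[i].pending = 0;
            return i;
        }
    }
    return -1;
}

/* Cancel a probe; returns 0 on success, -1 for an unknown id. */
int uprobe_unregister(int id)
{
    if (id < 0 || id >= MAX_PROBES || !slots[id].in_use)
        return -1;
    slots[id].in_use = 0;
    slots[id].pending = 0;
    return 0;
}

/* Test helper (not part of any proposed API): pretend the tracee hit a probe. */
void simulate_hit(int id)
{
    if (id >= 0 && id < MAX_PROBES && slots[id].in_use)
        slots[id].pending++;
}

/* Poll for one event; returns 1 and fills *ev if one is pending, else 0. */
int uprobe_poll(struct uprobe_event *ev)
{
    for (int i = 0; i < MAX_PROBES; i++) {
        if (slots[i].in_use && slots[i].pending > 0) {
            slots[i].pending--;
            ev->probe_id = i;
            ev->vaddr = slots[i].vaddr;
            return 1;
        }
    }
    return 0;
}

/* Stand-in for handing an event to a named kernel-side handler. */
int uprobe_run_khandler(const char *name, const struct uprobe_event *ev)
{
    printf("khandler %s: probe %d at 0x%lx\n", name, ev->probe_id, ev->vaddr);
    return 0;
}

/* Register, poll until drained, unregister. Returns events handled. */
int run_tracer_demo(void)
{
    struct uprobe_event ev;
    int handled = 0;
    int id = uprobe_register(1234, 0x400123UL);  /* made-up pid and address */

    if (id < 0)
        return -1;
    for (int i = 0; i < 3; i++)
        simulate_hit(id);
    while (uprobe_poll(&ev) == 1) {
        uprobe_run_khandler("count_hits", &ev);
        handled++;
    }
    uprobe_unregister(id);
    return handled;
}
```

Calling run_tracer_demo() from a main() walks through the four steps
once. A real tracer would presumably block or sleep in the poll step
instead of returning as soon as no events are pending.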
Apart from implementing kernel-side support for uprobes, the only
addition to the kernel API is a register_khandler() function that takes
a name, handler, and access-permission info. (The handler takes,
as optional args, pointers to a uprobe object and an arbitrary,
user-defined data area.)
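As a rough illustration of the register_khandler() idea described
above, here is a user-space mock of a name-keyed handler table. The
message only says the function takes a name, a handler, and
access-permission info, so the rest is assumed: the exact signature,
modelling the permission info as a single allowed uid, and the
run_khandler_as() lookup helper are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct uprobe;  /* opaque here; the real layout is not given in the thread */

/* Handler takes an optional uprobe pointer and an optional data area. */
typedef int (*khandler_fn)(struct uprobe *u, void *data);

struct khandler {
    const char *name;
    khandler_fn fn;
    unsigned int allowed_uid;  /* ASSUMPTION: one possible form of access info */
};

#define MAX_KHANDLERS 16
static struct khandler khandlers[MAX_KHANDLERS];
static size_t nr_khandlers;

/* Register a named handler; returns 0, or -1 on bad args, duplicate, or full table. */
int register_khandler(const char *name, khandler_fn fn, unsigned int allowed_uid)
{
    if (name == NULL || fn == NULL || nr_khandlers == MAX_KHANDLERS)
        return -1;
    for (size_t i = 0; i < nr_khandlers; i++) {
        if (strcmp(khandlers[i].name, name) == 0)
            return -1;
    }
    khandlers[nr_khandlers].name = name;
    khandlers[nr_khandlers].fn = fn;
    khandlers[nr_khandlers].allowed_uid = allowed_uid;
    nr_khandlers++;
    return 0;
}

/* Hypothetical lookup: run a handler by name on behalf of a caller uid.
 * Returns -1 if the name is unknown or the caller is neither root (uid 0)
 * nor the allowed uid; otherwise returns the handler's result. */
int run_khandler_as(unsigned int uid, const char *name,
                    struct uprobe *u, void *data)
{
    for (size_t i = 0; i < nr_khandlers; i++) {
        if (strcmp(khandlers[i].name, name) != 0)
            continue;
        if (uid != 0 && uid != khandlers[i].allowed_uid)
            return -1;
        return khandlers[i].fn(u, data);
    }
    return -1;
}

/* Example handler: bump a counter passed through the data area. */
static int count_hit(struct uprobe *u, void *data)
{
    (void)u;
    if (data != NULL)
        ++*(int *)data;
    return 0;
}
```

Checking permissions at lookup time is one way the access info could
address the concern about one user's tracer running another script's
handlers; the thread does not say how the real implementation does it.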
A summary of the user-side API is attached.