This is the mail archive of the mailing list for the systemtap project.

Re: User-space probes: Plan B+

On 25 Aug 2006 11:22:51 -0700, Jim Keniston <> wrote:
On Fri, 2006-08-25 at 01:11, James Dickens wrote:
> On 24 Aug 2006 18:13:24 -0700, Jim Keniston <> wrote:
> >
> > I tried an approach based on ptrace, with no kernel enhancements, but
> > it lacked certain necessary features (e.g., #2-5 below), probe overhead
> > was 12-15x worse than Prasanna's approach, and I couldn't get it to
> > work when probing multiple processes.  (Frank Eigler independently
> > suggested this approach and termed it "Plan B from outer space.")
> >
> is 12-15x worse than the current solution used in strace?

Slightly worse.  When just counting the occurrences of 1 system call, I
clocked strace at about 10 usec/hit.  See
And some folks reportedly consider strace too slow.

> > 1. Instrumentation can be coded entirely as a user-space app...
> sounds like a nightmare waiting to happen, if i want to trace
> something from userland into the kernel and back, i start writing
> userland code, then into kernel code, and quite possibly having kernel
> code access variables and statistics stored in userland, meaning lots
> of checks that the user remembers to call the routines that safely
> move data back and forth between the two?

Well, sure, users could get confused and do things wrong.  And your
scenario below where you migrate a piece of instrumentation from user
space to kernel space would have to be managed carefully, just like any
other design change.

If you do it entirely in the kernel, you don't have to deal with
design changes based on how busy the target system is, so we can use
the same script the developer used for analysis during debugging even
when it's in production with 1000 times the workload.

Probing a function that is called often would cause a major slowdown.
As soon as you fire a probe, the entire application stops.
Instrumenting something like malloc creates a huge slowdown, because
your process goes to the kernel, then back to userland to run the
script, and then back again, even if the probe wasn't interested in
that particular event.

It gets worse with a multithreaded task. Not only do you have the
probe firing more often, but the application becomes serialized, so
the whole process slows down tremendously, making it unusable in a
production environment. Serialization would also eliminate races. So
users will say either that performance died once they turned on the
probes, or that the problem disappeared because the race is gone. The
more scalable the application, the worse the slowdown.
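
A rough way to see why the slowdown grows with scalability is a
back-of-the-envelope model. The sketch below is only an illustration
under assumptions that do not come from this thread: the work divides
evenly across threads, and each probe hit stops the whole process for a
fixed cost. The function names and workload numbers are hypothetical;
the 10 usec/hit figure is borrowed from the strace measurement quoted
earlier.

```python
def wall_time(work_s, threads, hits, usec_per_hit, serialized):
    """Estimated wall-clock seconds for a perfectly parallel job.

    work_s       total CPU seconds of useful work, split evenly
    hits         total probe hits across all threads
    serialized   True: every hit stops the whole process, so hits are
                 paid for one at a time; False: each thread pays only
                 for its own share of the hits
    """
    probe_s = hits * usec_per_hit / 1_000_000
    if serialized:
        return work_s / threads + probe_s
    return (work_s + probe_s) / threads


def slowdown(work_s, threads, hits, usec_per_hit):
    """Ratio of serialized-probe wall time to unprobed wall time."""
    baseline = work_s / threads
    return wall_time(work_s, threads, hits, usec_per_hit, True) / baseline


# Hypothetical workload: 8 CPU-seconds of work, one million probe hits.
for threads in (1, 2, 4, 8):
    factor = slowdown(8.0, threads, 1_000_000, 10.0)
    print(f"{threads} threads: {factor:.2f}x slower")
```

With these made-up numbers the probe cost stays fixed at 10 seconds
while the useful work shrinks with each added thread, so the slowdown
factor climbs from 2.25x on one thread to 11x on eight.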

But I think it's better to provide a feature for which a need has been
identified -- even if the feature requires careful use and a few minutes
to understand -- than to withhold the feature to protect people from
failing.  (I consider asm statements in gcc an extreme example of this
philosophy. :-))

It's better to design the system with safety and security in mind.
This can be done, and has been done. The result was a solution that
works for the expert programmer and the overworked system
administrator, as well as for the weekend home user just hoping to
help a project find a bottleneck.

> how is this better than just enhancing a debugger such as gdb?

Among other things, gdb -batch is relatively slow (I measured 111 usec
per hit just to count breakpoint hits) and has no facility for
interacting with kernel-space instrumentation.
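
To put the per-hit figures quoted in this thread side by side, here is
a small arithmetic sketch. It uses only the two measurements given
above (about 10 usec/hit for strace, 111 usec/hit for gdb -batch); the
function name and the workload of one million hits are hypothetical.

```python
# Per-hit costs quoted in this thread, in microseconds.
USEC_PER_HIT = {
    "strace": 10.0,       # "about 10 usec/hit"
    "gdb -batch": 111.0,  # "111 usec per hit"
}


def added_seconds(tool, hits):
    """Extra wall-clock seconds spent on probe overhead for `hits` hits."""
    return USEC_PER_HIT[tool] * hits / 1_000_000


# Hypothetical workload: one million probe hits.
for tool in USEC_PER_HIT:
    print(f"{tool}: {added_seconds(tool, 1_000_000):.1f} s of overhead")
```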

> how are
> stacks dealt with, since you quite possibly having one process
> investigate another, if you don't get everything perfect the program
> being watched can corrupt the data of the second?

Well, somebody with root privileges could register a handler that
scribbles just about anywhere, as is the case currently with kprobes.
But there's no reason to expect that there's any danger of the
particular problems you mention.

> >
> > 2. ... but in situations where performance is critical, uprobes can
> > run a named kernel handler without waking up the tracer process.
> >

To avoid the aforementioned multithreaded problem, we have to resort
to counting probe fires without any intelligence about when we record
information or what information we store when we are called. Yet it
may be beneficial to do time-expensive things like a stack trace when
certain criteria are met, or to slow down one thread occasionally to
look for races.
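
The kind of handler described here, one that counts every hit cheaply
and does the expensive work only when a criterion is met, might look
roughly like the sketch below. It is a plain user-space Python model,
not uprobes or systemtap code; the handler name, its arguments, and the
size threshold are all hypothetical.

```python
import traceback
from collections import Counter

hit_counts = Counter()  # cheap: updated on every probe hit
slow_samples = []       # expensive: filled only when the filter matches


def on_probe_hit(name, size, threshold=1 << 20):
    """Hypothetical handler for a probe on an allocator-like function."""
    hit_counts[name] += 1  # always do the cheap bookkeeping
    if size >= threshold:  # criterion for the expensive path
        # Capturing a stack trace is costly, so do it only for
        # the hits we actually care about.
        slow_samples.append((name, size, traceback.extract_stack()))


# Simulate three hits; only the last one crosses the threshold.
for size in (16, 4096, 2 << 20):
    on_probe_hit("malloc", size)

print(hit_counts["malloc"], "hits,", len(slow_samples), "stack trace(s)")
```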

James Dickens

> now if we start out coding our script to only work in userland, then
> all of a sudden we decide we need better performance, we have to go
> back and recode parts to work in kernel land and quite possibly break
> our algorithms that were talking to kernel land, or probes in the
> kernel that accessed userland data that just moved back into the
> kernel?

See above.

> > 3. A user-mode tracer can invoke a previously registered kernel-mode
> > handler, so we have simple and efficient communication between user-
> > and kernel-mode instrumentation.
> how do you keep a userland program from exploiting systemtap's
> architecture and executing kernel probes from other active systemtap
> scripts, isn't this a huge back door for rootkits especially once
> people start using systemtap's methods for monitoring systems
> continuously?

I've certainly thought about the potential for abuse via
uprobe_run_khandler().  If you had the connivance of somebody with root
privileges who installed a pernicious handler, you could do all sorts of
bad stuff (and make it relatively hard to track).  That's a big if,
though.  If a bad guy has root privileges, you're toast anyway.

And if you're worried about the handler reading/writing the wrong
process's address space, you can specify when you register the handler
that it can apply only to the process in the caller-provided uprobe
object -- and only when the caller has permission to trace that process.

> >
> > 8. Handlers run in process context -- the tracee's context (see
> > requirement 2) or the tracer's context while the tracee is stopped
> > (see requirement 3).
> >
> stack corruption or even slight stack placement differences, would
> severely limit the usefulness of the solution,

Well, yes, both we and the user will have to be careful.  That's the
nature of programming.

> it will have the same
> effect as debugging an app in gdb, the app only breaks when the
> userland debugger is not running.

That (minimizing probe overhead) is one of the points of being able to
avoid unnecessary context switches, by just running a handler in the
kernel.  (See requirement #2.)

> James Dickens

