This is the mail archive of the
systemtap@sources.redhat.com
mailing list for the systemtap project.
RE: cost of kprobe and jprobe operations
- From: "Chen, Brad" <brad dot chen at intel dot com>
- To: "William Cohen" <wcohen at redhat dot com>
- Cc: <systemtap at sources dot redhat dot com>
- Date: Tue, 22 Feb 2005 13:29:32 -0800
- Subject: RE: cost of kprobe and jprobe operations
Very interesting. It looks as though your test executes
probes in a tight loop, hence best possible behavior with
respect to caches and such. On a busy server we would
expect more time between probes and the possibility of
interference in the caches, right?
These performance discussions might be more meaningful if
we had explicit performance goals for systemtap. For the
purpose of discussion let me suggest a few possible goals
for statistical flat PC profiling:
Very Good: 5 KHz @ 2% overhead
Good: 1KHz @ 1% overhead
Okay: 1KHz @ 3% overhead
The "Very Good" goal would match DCPI performance on
Alpha/Digital UNIX. I believe the "Good" goal is in the
neighborhood of OProfile performance; you likely have
better numbers. I'm assuming the profiler would use
built-in systemtap "histogram" variables.
With these in mind, 1% overhead on a 1.7 GHz machine is
17M cycles. At 1KHz sampling, that's 17K cycles per sample,
average. From your numbers below, it looks like at least
1686 cycles would go for interception, although a regular
pattern of cache and TLB misses could make things much
worse. So the "Good" goal looks like it might be within
reach.
The "Very Good" goal is less clear. 5KHz sampling with 2%
overhead would mean only about 6800 cycles per sample.
Flat profiles are nice; call graph profiles are nicer.
An appropriate goal for a call graph profiler might be
1KHz @ 3% overhead.
Any thoughts on defining performance goals for systemtap?
Brad
-----Original Message-----
From: systemtap-owner@sources.redhat.com
[mailto:systemtap-owner@sources.redhat.com] On Behalf Of William Cohen
Sent: Tuesday, February 22, 2005 11:17 AM
To: systemtap@sources.redhat.com
Subject: cost of kprobe and jprobe operations
I did some experiments this morning to get an idea of the cost of kprobe
and jprobe operations. The attached gzipped tarball has the simple
module that was used to get some measurements. It does a couple millions
jprobes and kprobes. The module was made with:
make -C /lib/modules/`uname -r`/build M=`pwd` modules
Insert and remove the module (measurements happen when module is
removed):
/sbin/insmod kprobe-timing.ko
/sbin/rmmod kprobe-timing
look at /var/log/messages to get the amount of time spent doing the
repeated calls. divide the number of cycles by count. This overhead
includes the loop overhead. I ran this on a couple of machines.
cycles per iteration
AMD Athlon 1.7GH Pentium III 860MHz
kprobe 1686 (0.99 microsec) 816 (0.95 microsec)
jprobe 1395 (0.82 microsec) 1382 (1.61 microsec)
On the Pentium III machine I was also able to get some profiling data
with oprofile. I probably should have done them as separate
microbenchmarks so know which things are jprobe and which are kprobe.
$ sudo opcontrol --reset
$ sudo /sbin/insmod kprobe-timing.ko
$ sudo opcontrol --start
$ sudo /sbin/rmmod kprobe-timing
$ sudo opcontrol --shutdown
$ opreport --long-filenames
/usr/lib/debug/lib/modules/2.6.10-1.1149_FC4smp/vmlinux -l
CPU: PIII, speed 861.246 MHz (estimated)
Counted DATA_MEM_REFS events (all memory references, cachable and non)
with a unit mask of 0x00 (No unit mask) count 1000000
samples % symbol name
156 11.2069 memcpy
142 10.2011 restore_all
127 9.1236 _spin_lock
119 8.5489 kprobe_handler
110 7.9023 do_debug
105 7.5431 kprobe_exceptions_notify
93 6.6810 setjmp_pre_handler
90 6.4655 longjmp_break_handler
77 5.5316 __module_text_address
58 4.1667 resume_execution
51 3.6638 _spin_unlock
43 3.0891 notifier_call_chain
39 2.8017 int3
37 2.6580 get_kprobe
26 1.8678 do_int3
22 1.5805 debug_stack_correct
17 1.2213 debug
12 0.8621 _spin_lock_irqsave
9 0.6466 lock_kprobes
4 0.2874 _spin_unlock_irqrestore
4 0.2874 poison_obj
4 0.2874 unlock_kprobes
-Will