This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

RE: cost of kprobe and jprobe operations

From: "Chen, Brad" <brad dot chen at intel dot com>
To: "William Cohen" <wcohen at redhat dot com>
Cc: <systemtap at sources dot redhat dot com>
Date: Tue, 22 Feb 2005 13:29:32 -0800
Subject: RE: cost of kprobe and jprobe operations

Very interesting. It looks as though your test executes
probes in a tight loop, hence best possible behavior with 
respect to caches and such. On a busy server we would 
expect more time between probes and the possibility of 
interference in the caches, right?

These performance discussions might be more meaningful if 
we had explicit performance goals for systemtap. For the 
purpose of discussion let me suggest a few possible goals 
for statistical flat PC profiling:
	Very Good: 5 KHz @ 2% overhead
	Good: 1KHz @ 1% overhead
	Okay: 1KHz @ 3% overhead
The "Very Good" goal would match DCPI performance on 
Alpha/Digital UNIX. I believe the "Good" goal is in the 
neighborhood of OProfile performance; you likely have
better numbers. I'm assuming the profiler would use
built-in systemtap "histogram" variables.

With these in mind, 1% overhead on a 1.7 GHz machine is
17M cycles. At 1KHz sampling, that's 17K cycles per sample,
average. From your numbers below, it looks like at least 
1686 cycles would go for interception, although a regular 
pattern of cache and TLB misses could make things much
worse. So the "Good" goal looks like it might be within
reach.

The "Very Good" goal is less clear. 5KHz sampling with 2% 
overhead would mean only about 6800 cycles per sample.

Flat profiles are nice; call graph profiles are nicer.
An appropriate goal for a call graph profiler might be
1KHz @ 3% overhead.

Any thoughts on defining performance goals for systemtap?

Brad

-----Original Message-----
From: systemtap-owner@sources.redhat.com
[mailto:systemtap-owner@sources.redhat.com] On Behalf Of William Cohen
Sent: Tuesday, February 22, 2005 11:17 AM
To: systemtap@sources.redhat.com
Subject: cost of kprobe and jprobe operations

I did some experiments this morning to get an idea of the cost of kprobe

and jprobe operations. The attached gzipped tarball has the simple 
module that was used to get some measurements. It does a couple millions

jprobes and kprobes. The module was made with:

make -C /lib/modules/`uname -r`/build M=`pwd` modules

Insert and remove the module (measurements happen when module is
removed):

/sbin/insmod kprobe-timing.ko
/sbin/rmmod kprobe-timing

look at /var/log/messages to get the amount of time spent doing the 
repeated calls. divide the number of cycles by count. This overhead 
includes the loop overhead. I ran this on a couple of machines.

cycles per iteration
	AMD Athlon 1.7GH	Pentium III 860MHz
kprobe	1686	(0.99 microsec)	816	(0.95 microsec)
jprobe	1395	(0.82 microsec)	1382	(1.61 microsec)


On the Pentium III machine I was also able to get some profiling data 
with oprofile. I probably should have done them as separate 
microbenchmarks so know which things are jprobe and which are kprobe.

$ sudo  opcontrol --reset
$ sudo /sbin/insmod kprobe-timing.ko
$ sudo  opcontrol --start
$ sudo /sbin/rmmod kprobe-timing
$ sudo  opcontrol --shutdown
$ opreport --long-filenames 
/usr/lib/debug/lib/modules/2.6.10-1.1149_FC4smp/vmlinux -l
CPU: PIII, speed 861.246 MHz (estimated)
Counted DATA_MEM_REFS events (all memory references, cachable and non) 
with a unit mask of 0x00 (No unit mask) count 1000000
samples  %        symbol name
156      11.2069  memcpy
142      10.2011  restore_all
127       9.1236  _spin_lock
119       8.5489  kprobe_handler
110       7.9023  do_debug
105       7.5431  kprobe_exceptions_notify
93        6.6810  setjmp_pre_handler
90        6.4655  longjmp_break_handler
77        5.5316  __module_text_address
58        4.1667  resume_execution
51        3.6638  _spin_unlock
43        3.0891  notifier_call_chain
39        2.8017  int3
37        2.6580  get_kprobe
26        1.8678  do_int3
22        1.5805  debug_stack_correct
17        1.2213  debug
12        0.8621  _spin_lock_irqsave
9         0.6466  lock_kprobes
4         0.2874  _spin_unlock_irqrestore
4         0.2874  poison_obj
4         0.2874  unlock_kprobes

-Will

Follow-Ups:
- Re: cost of kprobe and jprobe operations
  - From: Frank Ch. Eigler

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]