This is the mail archive of the
systemtap@sources.redhat.com
mailing list for the systemtap project.
Re: Experiences with kprobes
- From: William Cohen <wcohen at redhat dot com>
- To: Baruch Even <baruch at ev-en dot org>
- Cc: systemtap at sources dot redhat dot com
- Date: Tue, 22 Mar 2005 09:48:54 -0500
- Subject: Re: Experiences with kprobes
- References: <4240000A.9090603@ev-en.org>
Baruch Even wrote:
Hello,
Just thought I'd share my current experience with kprobes; it might
interest some of you.
I'm trying to improve the performance of the Linux TCP stack (as an
end-host not a router), as such I need to measure the current
performance in order to search for bottlenecks.
I had a first version where I simply wrapped the calls I needed with
rdtsc calls inline and added other measurements (number of packets acked
for each ACK packet and such). This worked beautifully, and I got some
nice results and some pretty good improvements as well.
They say "if it ain't broken don't fix it", but if it's not broken it's
no fun[0], so I tried to use kprobes as a way to get the measurement
code out of my current code patches. The thinking was that it will be a
lot easier to maintain the patches ready for LKML submission.
I also ported my code to 2.6.11 (since that's where kprobes is
available, I was on 2.6.6 before and no kprobes there[1]), and got
abysmal performance. After a bit of digging, the overhead of the kprobes
approach was the only possible culprit: with the old method I got a
timing of about 3000 clocks on my machine[2], while with the new one I
got at least 10000 with about 3 kprobes and 3 jprobes.
I ported kprobes to 2.6.6 and the same performance patterns appeared on
the formerly working code, leaving the only conclusion that kprobes is
not suitable for this kind of performance measurement under very high
loads.
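As a rough reading of the numbers quoted above, the extra cost can be divided across the probes. This is only a sketch: it assumes all six probes (3 kprobes plus 3 jprobes) fire once per measured interval and that the cost is spread evenly between them, neither of which is stated in the message. The function name is hypothetical.

```python
def per_probe_overhead(baseline_cycles, probed_cycles, n_probes):
    """Average extra cycles per probe.

    Assumes every probe fires once per measured interval and that the
    cost is spread evenly, which is an illustrative simplification.
    """
    if n_probes <= 0:
        raise ValueError("n_probes must be positive")
    return (probed_cycles - baseline_cycles) / n_probes


# Figures from the message: about 3000 clocks with inline rdtsc,
# at least 10000 clocks with about 3 kprobes and 3 jprobes.
extra = per_probe_overhead(3000, 10000, 6)
print(f"at least ~{extra:.0f} extra cycles per probe")
```

Since the 10000 figure is given as a minimum, the result is a lower bound. A jprobe may well cost more than a plain kprobe, so the even split is only a first approximation.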
Hi,
I wrote some simple tests to check the overhead of kprobes and jprobes.
I have run them on Athlon and Pentium III machines, but I haven't run
them on a Pentium IV. The costs could be higher on the Pentium IV.
Could you give these a try on your Pentium IV machine? The following URL
has an attachment with software for measuring overhead:
http://sources.redhat.com/ml/systemtap/current/msg00093.html
Also, is an SMP kernel or preemption being used? The current locking
mechanism in kprobes serializes multiple kprobes. Is it possible
that some of the overhead could be due to serialization of the probes?
Another thing to consider is to use OProfile to get a better picture of
where the overhead is. OProfile uses NMI interrupts and will be able to
collect samples on where the processor is spending time in handling the
kprobe. The tarball in the archived email above can collect oprofile
data. However, you will want to have a kernel that supports OProfile
collecting data using the processor's performance monitoring unit (PMU):
either an SMP kernel or a UP kernel with CONFIG_X86_LOCAL_APIC set. The
fallback timer interrupt mechanism used in UP kernels isn't going to get
enough samples to get a good picture of what is going on.
The specifics for me are that the tests are running over a dummynet
network simulating a very high speed long distance link (about 300ms
RTT and 300Mbit/s bandwidth), so the packet rates are very high, with a
BDP of about 8000 packets (i.e. lots of ACK packets to process).
What kind of rate are the probes firing at? n*8000 probe firings per
second? Could the delay introduced by the probes be affecting behavior?
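
For reference, the BDP and a ballpark probe firing rate can be worked out from the quoted link parameters. This is a back-of-the-envelope sketch, not a measurement: the 1500-byte packet size, the one-ACK-per-two-segments ratio, and the idea that all six probes fire on every ACK are assumptions not stated in the thread, and the function names are hypothetical.

```python
def bdp_packets(bandwidth_bps, rtt_s, packet_bytes):
    """Bandwidth-delay product expressed in packets."""
    return bandwidth_bps * rtt_s / (8 * packet_bytes)


def packets_per_second(bandwidth_bps, packet_bytes):
    """Packet rate when the link is fully utilised."""
    return bandwidth_bps / (8 * packet_bytes)


def probe_firings_per_second(packet_rate, segments_per_ack, probes_per_ack):
    """Probe firings per second if every ACK hits probes_per_ack probes."""
    return packet_rate / segments_per_ack * probes_per_ack


BANDWIDTH_BPS = 300e6    # 300 Mbit/s, from the message
RTT_S = 0.3              # 300 ms, from the message
PACKET_BYTES = 1500      # assumption: Ethernet-MTU-sized packets
SEGMENTS_PER_ACK = 2     # assumption: delayed ACKs, one ACK per two segments
PROBES_PER_ACK = 6       # assumption: all 3 kprobes and 3 jprobes fire per ACK

window = bdp_packets(BANDWIDTH_BPS, RTT_S, PACKET_BYTES)
rate = packets_per_second(BANDWIDTH_BPS, PACKET_BYTES)
firings = probe_firings_per_second(rate, SEGMENTS_PER_ACK, PROBES_PER_ACK)

print(f"BDP: about {window:.0f} packets in flight")
print(f"packet rate: about {rate:.0f} packets/s")
print(f"probe firings: about {firings:.0f} per second")
```

Under these assumptions the window comes out near 7500 packets, in line with the "about 8000" quoted, and that window is in flight per round trip rather than per second.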
-Will
Baruch
[0] As a grad student, at least part of the idea is to have fun :-)
Even if you are not a graduate student the previous line holds. :)