This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: [PATCH] Linux Kernel Markers
- From: Martin Bligh <mbligh at google dot com>
- To: prasanna at in dot ibm dot com
- Cc: Andrew Morton <akpm at osdl dot org>, "Frank Ch. Eigler" <fche at redhat dot com>, Ingo Molnar <mingo at elte dot hu>, Mathieu Desnoyers <mathieu dot desnoyers at polymtl dot ca>, Paul Mundt <lethal at linux-sh dot org>, linux-kernel <linux-kernel at vger dot kernel dot org>, Jes Sorensen <jes at sgi dot com>, Tom Zanussi <zanussi at us dot ibm dot com>, Richard J Moore <richardj_moore at uk dot ibm dot com>, Michel Dagenais <michel dot dagenais at polymtl dot ca>, Christoph Hellwig <hch at infradead dot org>, Greg Kroah-Hartman <gregkh at suse dot de>, Thomas Gleixner <tglx at linutronix dot de>, William Cohen <wcohen at redhat dot com>, ltt-dev at shafik dot org, systemtap at sources dot redhat dot com, Alan Cox <alan at lxorguk dot ukuu dot org dot uk>
- Date: Tue, 19 Sep 2006 10:17:53 -0700
- Subject: Re: [PATCH] Linux Kernel Markers
- Domainkey-signature: a=rsa-sha1; s=beta; d=google.com; c=nofws; q=dns; h=received:message-id:date:from:user-agent: x-accept-language:mime-version:to:cc:subject:references:in-reply-to: content-type:content-transfer-encoding; b=oxrBR8cUO7TrvHXNqUZekSnt1AXBJTCM0K3047i8Y24RZG3qrSpndo3Y4O1UBrnJu ODI6KdsBba77cDa4Knp3A==
- References: <20060918234502.GA197@Krystal> <20060919081124.GA30394@elte.hu> <451008AC.6030006@google.com> <20060919154612.GU3951@redhat.com> <4510151B.5070304@google.com> <20060919093935.4ddcefc3.akpm@osdl.org> <45101DBA.7000901@google.com> <20060919063821.GB23836@in.ibm.com>
It seems like all we'd need to do
is "list all references to function, freeze kernel, update all
references, continue"
"overwrite first 5 bytes of old function with `jmp new_function'".
Yes, that's simple. but slower, as you have a double jump. Probably
a damned sight faster than int3 though.
The advantage of using int3 over jmp to launch the instrumented
module is that int3 (or breakpoint in most architectures) is an
atomic operation to insert.
Ah, good point. Though ... how much do we care what the speed of
insertion/removal actually is? If we can tolerate it being slow,
then just sync everyone up in an IPI to freeze them out whilst
doing the insert.
I am getting some more ideas...
1. Copy the original functions, instrument them and insert them as
a part of kernel module with different name prefix.
2. Insert breakpoint only on those routines at runtime.
3. When the breakpoint gets hit, change the instruction pointer to
the instrumented routine. No need to single step at all.
Surely this still carries the overhead of doing the breakpoint,
which was part of what we were trying to get away from? I suppose
we get more flexibility this way. Or does the slowness not actually
come from the int3, but only the single-stepping?
How about we combine all three ideas together ...
1. Load modified copy of the function in question.
2. overwrite the first instruction of the routine with an int3 that
does what you say (atomically)
3. Then overwrite the second instruction with a jump that's faster
4. Now atomically overwrite the int3 with a nop, and let the jump
take over.
Adv:
Can be enabled/disabled dynamically by inserting/removing
breakpoints. No overhead of single stepping.
No restriction of running the handler in interrupt context.
You can have pre-compiled instrumented routines.
This mechanism can be used for pre-defined set of routines and for
arbiratory probe points, you can use kprobes/jprobes/systemtap.
No need to be super-user for predefined breakpoints.
Dis:
Maintainence of the code, since it can code base need to be
duplicated and instrumented.
CONFIG_FOO_BAR .... turn it on or off to turn on the instrumentation.
compiled out by default. Compiled in when making the tracing functions.
The above idea is similar to runtime or dynamic patching, but here we
use int3(breakpoint) rather than jump instruction.
Depends what we're trying to fix. I was trying to fix two things:
1. Flexibility - kprobes seem unable to access all local variables etc
easily, and go anywhere inside the function. Plus keeping low overhead
for doing things like keeping counters in a function (see previous
example I mentioned for counting pages in shrink_list).
2. Overhead of the int3, which was allegedly 1000 cycles or so, though
faster after Ingo had played with it, it's still significant.
M.