This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Re: [PATCH] Linux Kernel Markers

From: Martin Bligh <mbligh at google dot com>
To: prasanna at in dot ibm dot com
Cc: Andrew Morton <akpm at osdl dot org>, "Frank Ch. Eigler" <fche at redhat dot com>, Ingo Molnar <mingo at elte dot hu>, Mathieu Desnoyers <mathieu dot desnoyers at polymtl dot ca>, Paul Mundt <lethal at linux-sh dot org>, linux-kernel <linux-kernel at vger dot kernel dot org>, Jes Sorensen <jes at sgi dot com>, Tom Zanussi <zanussi at us dot ibm dot com>, Richard J Moore <richardj_moore at uk dot ibm dot com>, Michel Dagenais <michel dot dagenais at polymtl dot ca>, Christoph Hellwig <hch at infradead dot org>, Greg Kroah-Hartman <gregkh at suse dot de>, Thomas Gleixner <tglx at linutronix dot de>, William Cohen <wcohen at redhat dot com>, ltt-dev at shafik dot org, systemtap at sources dot redhat dot com, Alan Cox <alan at lxorguk dot ukuu dot org dot uk>
Date: Tue, 19 Sep 2006 10:17:53 -0700
Subject: Re: [PATCH] Linux Kernel Markers
References: <20060918234502.GA197@Krystal> <20060919081124.GA30394@elte.hu> <451008AC.6030006@google.com> <20060919154612.GU3951@redhat.com> <4510151B.5070304@google.com> <20060919093935.4ddcefc3.akpm@osdl.org> <45101DBA.7000901@google.com> <20060919063821.GB23836@in.ibm.com>

It seems like all we'd need to do
is "list all references to function, freeze kernel, update all
references, continue"

"overwrite first 5 bytes of old function with `jmp new_function'".


Yes, that's simple, but slower, as you have a double jump. Probably
a damned sight faster than int3 though.
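
For illustration only, here is how the 5 bytes of `jmp new_function'
could be built on x86. This is a user-space sketch, not code from the
patch: the helper name encode_jmp_rel32 and the use of plain integers
for addresses are assumptions made so it can run outside the kernel.
The x86 near jump is opcode 0xE9 followed by a 32-bit displacement
measured from the end of the jump instruction.

```c
#include <assert.h>
#include <stdint.h>

#define JMP_REL32_OPCODE 0xE9  /* x86 near jump with 32-bit relative displacement */
#define JMP_REL32_LEN    5     /* 1 opcode byte + 4 displacement bytes */

/*
 * Hypothetical helper: fill out[] with the bytes of "jmp new_addr" as they
 * would be placed at old_addr. Addresses are plain integers here so the
 * sketch runs in user space; nothing is actually patched.
 */
static void encode_jmp_rel32(uint8_t out[JMP_REL32_LEN],
                             uint64_t old_addr, uint64_t new_addr)
{
    /* The displacement is relative to the instruction after the jump. */
    int64_t disp = (int64_t)(new_addr - (old_addr + JMP_REL32_LEN));

    /* The target must be reachable with a signed 32-bit displacement. */
    assert(disp >= INT32_MIN && disp <= INT32_MAX);

    uint32_t u = (uint32_t)(int32_t)disp;

    out[0] = JMP_REL32_OPCODE;
    /* x86 stores the displacement little-endian. */
    out[1] = (uint8_t)(u & 0xff);
    out[2] = (uint8_t)((u >> 8) & 0xff);
    out[3] = (uint8_t)((u >> 16) & 0xff);
    out[4] = (uint8_t)((u >> 24) & 0xff);
}
```

Callers still call the old address and are then bounced to the new
one, which is the double jump mentioned above.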

The advantage of using int3 over jmp to launch the instrumented
module is that int3 (or breakpoint in most architectures) is an
atomic operation to insert.


Ah, good point. Though ... how much do we care what the speed of
insertion/removal actually is? If we can tolerate it being slow,
then just sync everyone up in an IPI to freeze them out whilst
doing the insert.

I am getting some more ideas...

1. Copy the original functions, instrument them and insert them as
part of a kernel module with a different name prefix.
2. Insert a breakpoint only on those routines at runtime.
3. When the breakpoint gets hit, change the instruction pointer to the
instrumented routine. No need to single step at all.
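
A rough user-space sketch of step 3, assuming x86 and a small lookup
table from original routine to instrumented copy. struct fake_regs,
struct redirect and redirect_on_breakpoint are hypothetical names
invented for this illustration; a real handler would work on the
kernel's saved register frame instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the saved register frame a breakpoint handler receives. */
struct fake_regs {
    uintptr_t ip;
};

/* One entry per routine that has a pre-built instrumented copy. */
struct redirect {
    uintptr_t original;      /* address where the int3 was inserted */
    uintptr_t instrumented;  /* entry point of the instrumented copy */
};

/*
 * Returns 1 and rewrites regs->ip if the breakpoint belongs to one of the
 * routines in the table, 0 otherwise. Because execution resumes at the
 * start of the instrumented copy, the displaced original instruction never
 * has to be single-stepped.
 */
static int redirect_on_breakpoint(struct fake_regs *regs,
                                  const struct redirect *table, size_t n)
{
    assert(regs != NULL);

    /* On x86 the saved ip after an int3 trap points one byte past the int3. */
    uintptr_t hit = regs->ip - 1;

    for (size_t i = 0; i < n; i++) {
        if (table[i].original == hit) {
            regs->ip = table[i].instrumented;
            return 1;
        }
    }
    return 0;  /* not ours: leave the registers alone */
}
```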


Surely this still carries the overhead of doing the breakpoint,
which was part of what we were trying to get away from? I suppose
we get more flexibility this way. Or does the slowness not actually
come from the int3, but only the single-stepping?

How about we combine all three ideas together ...

1. Load modified copy of the function in question.
2. Overwrite the first instruction of the routine with an int3 that
does what you say (atomically)
3. Then overwrite the second instruction with a jump that's faster
4. Now atomically overwrite the int3 with a nop, and let the jump
take over.
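
The ordering in steps 2 to 4 can be sketched on a plain byte buffer.
This is only a model under stated assumptions: the first instruction
is taken to be one byte long, so the jump starts at offset 1, and the
function names are made up for the example. It shows the order of the
writes and ignores what a CPU already executing inside those bytes
would see.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC  /* x86 one-byte breakpoint */
#define NOP_OPCODE  0x90  /* x86 one-byte nop */
#define JMP_OPCODE  0xE9  /* x86 jmp rel32 */
#define PATCH_LEN   6     /* 1 byte for int3/nop + 5 bytes for the jmp */

typedef _Atomic uint8_t text_byte;

/* Step 2: one single-byte store, so the entry byte is either old or int3. */
static void arm_breakpoint(text_byte *entry)
{
    assert(entry != NULL);
    atomic_store(&entry[0], INT3_OPCODE);
}

/*
 * Step 3: while the int3 guards the entry, write the 5-byte jump behind it.
 * rel32 is relative to the end of the jump, i.e. to entry + PATCH_LEN.
 */
static void write_jump_behind_breakpoint(text_byte *entry, int32_t rel32)
{
    uint32_t u = (uint32_t)rel32;

    atomic_store(&entry[1], JMP_OPCODE);
    for (int i = 0; i < 4; i++)
        atomic_store(&entry[2 + i], (uint8_t)(u >> (8 * i)));
}

/* Step 4: one single-byte store replaces the int3 with a nop. */
static void release_breakpoint(text_byte *entry)
{
    atomic_store(&entry[0], NOP_OPCODE);
}
```

Only steps 2 and 4 touch the byte that callers land on, and each is a
single-byte write; the multi-byte jump is written while the breakpoint
is in place.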

Adv: Can be enabled/disabled dynamically by inserting/removing
breakpoints. No overhead of single stepping. No restriction of running
the handler in interrupt context. You can have pre-compiled
instrumented routines. This mechanism can be used for a pre-defined
set of routines, and for arbitrary probe points you can use
kprobes/jprobes/systemtap. No need to be super-user for predefined
breakpoints.

Dis: Maintenance of the code, since the code base needs to be
duplicated and instrumented.


CONFIG_FOO_BAR .... turn it on or off to turn on the instrumentation.
compiled out by default. Compiled in when making the tracing functions.
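
As a sketch of what "compiled out by default" could look like, using
the placeholder CONFIG_FOO_BAR from above. The macro names, the
pages_scanned counter and the scan_pages loop are all hypothetical
stand-ins (loosely modelled on the page-counting example), not code
from any patch in this thread.

```c
#include <assert.h>

/* Build with -DCONFIG_FOO_BAR to compile the instrumentation in. */
#ifdef CONFIG_FOO_BAR
static unsigned long pages_scanned;              /* hypothetical counter */
#define TRACE_COUNT_PAGE()    (pages_scanned++)
#define TRACE_PAGES_SCANNED() (pages_scanned)
#else
/* Default: the instrumentation expands to nothing at all. */
#define TRACE_COUNT_PAGE()    do { } while (0)
#define TRACE_PAGES_SCANNED() (0UL)
#endif

/* Hypothetical stand-in for a loop one might want to count pages in. */
static int scan_pages(int nr_pages)
{
    int reclaimed = 0;

    assert(nr_pages >= 0);
    for (int i = 0; i < nr_pages; i++) {
        TRACE_COUNT_PAGE();          /* free when CONFIG_FOO_BAR is off */
        if (i % 2 == 0)              /* pretend every other page is freed */
            reclaimed++;
    }
    return reclaimed;
}
```

The same source then builds both the normal routine and the
instrumented copy, which eases the duplication concern.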

The above idea is similar to runtime or dynamic patching, but here we
use int3 (breakpoint) rather than a jump instruction.

Depends what we're trying to fix. I was trying to fix two things:

1. Flexibility - kprobes seem unable to access all local variables etc
easily, and go anywhere inside the function. Plus keeping low overhead
for doing things like keeping counters in a function (see previous
example I mentioned for counting pages in shrink_list).

2. Overhead of the int3, which was allegedly 1000 cycles or so. Though
it's faster after Ingo had played with it, it's still significant.

M.

