Differences between revisions 24 and 25
Revision 24 as of 2008-02-04 21:27:31
Size: 6043
Editor: DavidSmith
Comment:
Revision 25 as of 2008-02-20 21:15:59
Size: 7169
Editor: DavidSmith
Comment: added info on the ".format(FORMAT)" specifier

Using Markers

What are markers?

Here is some text taken from the kernel documentation that describes markers:

  • A marker placed in code provides a hook to call a function (probe) that you can provide at runtime. A marker can be "on" (a probe is connected to it) or "off" (no probe is attached). When a marker is "off" it has no effect, except for adding a tiny time penalty (checking a condition for a branch) and space penalty (adding a few bytes for the function call at the end of the instrumented function and adds a data structure in a separate section). When a marker is "on", the function you provide is called each time the marker is executed, in the execution context of the caller. When the function provided ends its execution, it returns to the caller (continuing from the marker site).
  • You can put markers at important locations in the code. Markers are lightweight hooks that can pass an arbitrary number of parameters, described in a printk-like format string, to the attached probe function.
  • They can be used for tracing and performance accounting.
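
The on/off behaviour described above can be pictured with a small user-space analogy in C. This is only a sketch: the names `sketch_marker`, `sketch_trace_mark`, `sketch_marker_attach`, `sketch_marker_detach` and `sketch_count_probe` are hypothetical and are not the kernel's marker API, and the real implementation differs considerably in how it stores and patches markers.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical probe signature: receives the marker's format string
 * and the arguments passed at the marker site. */
typedef void (*probe_fn)(const char *fmt, va_list args);

/* Hypothetical user-space stand-in for a marker (not the kernel's struct). */
struct sketch_marker {
    const char *name;    /* e.g. "kfunc_entry" */
    const char *format;  /* printk-like description of the arguments */
    probe_fn probe;      /* NULL means the marker is "off" */
};

/* Connect a probe: the marker is now "on". */
static void sketch_marker_attach(struct sketch_marker *m, probe_fn fn)
{
    assert(m != NULL);
    m->probe = fn;
}

/* Disconnect the probe: the marker is "off" again. */
static void sketch_marker_detach(struct sketch_marker *m)
{
    assert(m != NULL);
    m->probe = NULL;
}

/* Marker site: when "off" this costs one branch; when "on" the probe
 * runs in the caller's context and control then returns to the caller. */
static void sketch_trace_mark(struct sketch_marker *m, ...)
{
    va_list args;

    if (m->probe == NULL)
        return;
    va_start(args, m);
    m->probe(m->format, args);
    va_end(args);
}

/* Example probe that only counts how often the marker was hit. */
static int sketch_hits;

static void sketch_count_probe(const char *fmt, va_list args)
{
    (void)fmt;
    (void)args;
    sketch_hits++;
}
```

Calling `sketch_trace_mark()` on a marker with no probe attached does nothing; after `sketch_marker_attach(&m, sketch_count_probe)` each call increments `sketch_hits`.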

What do markers in kernel code look like?

#include <linux/marker.h>
//...
int kfunc(struct inode *i, int op)
{
        int rc = 0;        // return code
        trace_mark(kfunc_entry, "inode %p op %d", i, op);

        //... bulk of kfunc() here...

        trace_mark(kfunc_exit, "rc %d", rc);
        return(rc);
}

This hypothetical function, 'kfunc', contains two markers. The first one has a subsystem_event of "kfunc_entry" and the second has a subsystem_event of "kfunc_exit". The kernel documentation suggests treating the first argument of trace_mark() as having two parts: a 'subsystem' and an 'event'. In our example, the name of the function is used as the subsystem, and 'entry' and 'exit' are used as the events. The second argument to trace_mark() is a format string (similar to one used by printk()), and the remaining arguments are the values that the format string describes.

How do I turn on marker support in my kernel?

The marker subsystem and the kernel markers themselves must be compiled into your kernel. Initial marker support is present in 2.6.24, but at this time you must also add 3 patches from the -mm tree to get full marker functionality.

linux-kernel-markers-support-multiple-probes.patch: adds support for multiple probes per marker.

linux-kernel-markers-support-multiple-probes-update.patch: updates the previous patch.

linux-kernel-markers-create-modpost-file.patch: adds support for creating a file called Module.markers which lists all markers present in a kernel and its modules (similar to the Module.symvers file).

After applying those patches to your kernel source, run "make menuconfig" and, in addition to the normal options needed by systemtap, enable markers:

Instrumentation Support  --->
    [*] Activate markers

Selecting this option turns on the CONFIG_MARKERS define.

How do I use markers in systemtap?

To hook up a systemtap probe to a kernel marker, your systemtap script would use the 'kernel.mark("NAME")' facility. Using the example from above:

probe kernel.mark("kfunc_entry") { printf("kfunc_entry marker hit\n") }
probe kernel.mark("kfunc_exit") { printf("kfunc_exit marker hit\n") }

When kfunc() gets called by the kernel, both systemtap probes would be hit and you would see the appropriate output.

How do I access the marker format string?

The handler associated with a marker-based probe may read the format string specified at the marker call site. The format string is named $format.

Using the example from above:

probe kernel.mark("kfunc_entry") { printf("kfunc_entry marker hit: %s, %p, %d\n", $format, $arg1, $arg2) }

How do I access marker arguments?

The handler associated with a marker-based probe may read the optional parameters specified at the marker call site. These are named $arg1 through $argNN, where 'NN' is the number of parameters supplied by the marker. Number and string parameters are handled in a type-safe manner.

Using the example from above:

probe kernel.mark("kfunc_entry") { printf("kfunc_entry marker hit: %p, %d\n", $arg1, $arg2) }
probe kernel.mark("kfunc_exit") { printf("kfunc_exit marker hit: %d\n", $arg1) }
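
The marker's format string is the only description of the arguments, so it is what decides whether each $argN is treated as a number or as a string. A minimal sketch in C of that kind of classification follows. The names `sketch_arg_kind` and `sketch_classify_args` are hypothetical, only a subset of printk-style conversions is handled, and this is not systemtap's actual parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification: systemtap distinguishes numbers and strings. */
enum sketch_arg_kind { SKETCH_ARG_NUMBER, SKETCH_ARG_STRING };

/* Walk a printk-like format string and record the kind of each argument.
 * At most 'max' kinds are stored; the total number of conversions found
 * is returned. Simplification: everything except %s counts as a number,
 * and a '*' width or precision is not supported. */
static size_t sketch_classify_args(const char *fmt,
                                   enum sketch_arg_kind *kinds, size_t max)
{
    size_t n = 0;

    assert(fmt != NULL);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')          /* "%%" is a literal percent sign */
            continue;
        /* Skip flags, width, precision and length modifiers. */
        while (*p != '\0' && strchr("-+ #0123456789.hlLqjzt", *p) != NULL)
            p++;
        if (*p == '\0')         /* truncated conversion at end of string */
            break;
        if (n < max)
            kinds[n] = (*p == 's') ? SKETCH_ARG_STRING : SKETCH_ARG_NUMBER;
        n++;
    }
    return n;
}
```

For the format string "inode %p op %d" this reports two numeric arguments, matching the $arg1 and $arg2 used in the kfunc_entry probe above.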

How are marker arguments handled that are structure pointers?

Not well. Because systemtap has no DWARF information for marker arguments, it really doesn't know what type they are. In a DWARF-based probe, you could write something like this:

probe kernel.function("kfunc") { printf("inode number: %d\n", $i->i_ino) }

However, if you try something similar with a marker-based probe, you'll get an error because systemtap doesn't know that:

  • $arg1 is a pointer
  • the type of what $arg1 points to

So, to work around this problem, you'll have to write an access function and use the '-g' (guru mode) systemtap option. The script to access the i_ino field out of struct inode would look like:

function inode_get_i_ino:long (i:long) %{ /* pure */
        struct inode *inode = (struct inode *)(long)THIS->i;
        THIS->__retvalue = kread(&(inode->i_ino));
        CATCH_DEREF_FAULT();
%}
probe kernel.mark("kfunc_entry") { printf("inode number: %d\n", inode_get_i_ino($arg1)) }

A simpler solution to this problem would be to go ahead and put the interesting structure fields in the marker itself. If we were only interested in the i_ino field, this would change the marker code to look like:

#include <linux/marker.h>
//...
int kfunc(struct inode *i, int op)
{
        int rc = 0;        // return code
        trace_mark(kfunc_entry, "inode %p i_ino %lu op %d", i, i->i_ino, op);

        //... bulk of kfunc() here...

        trace_mark(kfunc_exit, "rc %d", rc);
        return(rc);
}

How does systemtap handle markers that have the same name but different format strings?

It is possible (although not recommended) for two (or more) markers to have the same name but different format strings. Assume the following kernel function:

#include <linux/marker.h>
//...
int kfunc2(struct inode *i, int op)
{
        int rc = 0;        // return code
        char msg[100];

        trace_mark(kfunc2, "inode %p op %d", i, op);

        //... bulk of kfunc2() here...

        trace_mark(kfunc2, "msg %s rc %d", msg, rc);
        return(rc);
}

There are two instances of a marker named 'kfunc2', each with a different format string. A probe that accesses '$arg1' as a number will fail for the second instance of 'kfunc2', because '$arg1' is a string in that instance of the marker.

To uniquely specify either marker, use the optional marker probe '.format(FORMAT)' specifier.

global x = 0
probe kernel.mark("kfunc2").format("inode*") { x += $arg1 }
probe kernel.mark("kfunc2").format("msg*") { printf("msg %s\n", $arg1) }
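
The patterns "inode*" and "msg*" above select a marker by matching its format string against a wildcard. As an illustration only, the same selection can be sketched in C with the POSIX fnmatch() function. The helper name `sketch_format_matches` is hypothetical, and the assumption that the '.format()' specifier follows shell-style glob rules is ours rather than something stated on this page.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Return 1 if the marker's format string matches the glob-style pattern,
 * 0 otherwise. fnmatch() itself returns 0 on a match. */
static int sketch_format_matches(const char *pattern, const char *format)
{
    assert(pattern != NULL && format != NULL);
    return fnmatch(pattern, format, 0) == 0;
}
```

With this helper, "inode*" matches the format string "inode %p op %d" but not "msg %s rc %d", which is how the two 'kfunc2' probes end up attached to different marker instances.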

None: UsingMarkers (last edited 2009-02-21 01:51:00 by JoshStone)