The following are all syntactically valid probe points.
kernel.function("foo")
kernel.function("foo").return
module{"ext3"}.function("ext3_*")
kernel.function("no_such_function") ?
syscall.*
end
timer.ms(5000)
kernel.syscall.*
kernel.function("sys_*)
The following is the general syntax.
kernel.function("no_such_function") ?
Points in a kernel are identified by module, source file, line number, function name or some combination of these.
Here is a list of probe point specifications currently supported:
kernel.function(PATTERN) kernel.function(PATTERN).call kernel.function(PATTERN).return kernel.function(PATTERN).return.maxactive(VALUE) kernel.function(PATTERN).inline kernel.function(PATTERN).label(LPATTERN) module(MPATTERN).function(PATTERN) module(MPATTERN).function(PATTERN).call module(MPATTERN).function(PATTERN).return.maxactive(VALUE) module(MPATTERN).function(PATTERN).inline kernel.statement(PATTERN) kernel.statement(ADDRESS).absolute module(MPATTERN).statement(PATTERN)
The .function variant places a probe near the beginning of the named function, so that parameters are available as context variables.
The .return variant places a probe at the moment of return from the named function, so the return value is available as the $return context variable. The entry parameters are also available, though the function may have changed their values. Return probes may be further qualified with .maxactive, which specifies how many instances of the specified function can be probed simultaneously. You can leave off .maxactive in most cases, as the default should be sufficient. However, if you notice an excessive number of skipped probes, try setting .maxactive to incrementally higher values to see if the number of skipped probes decreases.
The .inline modifier for .function filters the results to include only instances of inlined functions. The .call modifier selects the opposite subset. Inline functions do not have an identifiable return point, so .return is not supported on .inline probes.
The .statement variant places a probe at the exact spot, exposing those local variables that are visible there.
In the above probe descriptions, MPATTERN stands for a string literal that identifies the loaded kernel module of interest and LPATTERN stands for a source program label. Both MPATTERN and LPATTERN may include asterisk (*), square brackets "[]", and question mark (?) wildcards.
PATTERN stands for a string literal that identifies a point in the program. It is composed of three parts:
Some of the source-level variables, such as function parameters, locals, or globals visible in the compilation unit, are visible to probe handlers. Refer to these variables by prefixing their name with a dollar sign within the scripts. In addition, a special syntax allows limited traversal of structures, pointers, and arrays.
$var refers to an in-scope variable var. If it is a type similar to an integer, it will be cast to a 64-bit integer for script use. Pointers similar to a string (char *) are copied to SystemTap string values by the kernel_string() or user_string() functions.
$var->field traverses a structure's field. The indirection operator may be repeated to follow additional levels of pointers.
$var[N] indexes into an array. The index is given with a literal number.
$$vars expands to a character string that is equivalent to sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x", $parm1, ..., $parmN, $var1, ..., $varN)
$$locals expands to a character string that is equivalent to sprintf("var1=%x ... varN=%x", $var1, ..., $varN)
$$parms expands to a character string that is equivalent to sprintf("parm1=%x ... parmN=%x", $parm1, ..., $parmN)
General syntax:
kernel.function("func[@file]"
module("modname").function("func[@file]"
# Refers to all kernel functions with "init" or "exit"
# in the name:
kernel.function("*init*"), kernel.function("*exit*")
# Refers to any functions within the "kernel/sched.c"
# file that span line 240:
kernel.function("*@kernel/sched.c:240")
# Refers to all functions in the ext3 module:
module("ext3").function("*")
General syntax:
kernel.statement("func@file:linenumber")
module("modname").statement("func@file:linenumber")
# Refers to the statement at line 2917 within the
# kernel/sched.c file:
kernel.statement("*@kernel/sched.c:2917")
# Refers to the statement at line bio_init+3 within the fs/bio.c file:
kernel.statement("bio_init@fs/bio.c+3")
In the absence of debugging information, you can still use the kprobe family of probes to examine the entry and exit points of kernel and module functions. You cannot look up the arguments or local variables of a function using these probes. However, you can access the parameters by following this procedure:
When you're stopped at the entry to a function, you can refer to the function's arguments by number. For example, when probing the function declared:
asmlinkage ssize_t sys_read(unsigned int fd, char __user * buf, size_t count)
You can obtain the values of fd, buf, and count, respectively, as uint_arg(1), pointer_arg(2), and ulong_arg(3). In this case, your probe code must first call asmlinkage(), because on some architectures the asmlinkage attribute affects how the function's arguments are passed.
When you're in a return probe, $return isn't supported without DWARF, but you can call returnval() to get the value of the register in which the function value is typically returned, or call returnstr() to get a string version of that value.
And at any code probepoint, you can call register("regname") to get the value of the specified CPU register when the probe point was hit. u_register("regname") is like register("regname"), but interprets the value as an unsigned integer.
SystemTap supports the following constructs:
kprobe.function(FUNCTION) kprobe.function(FUNCTION).return kprobe.module(NAME).function(FUNCTION) kprobe.module(NAME).function(FUNCTION).return kprobe.statement(ADDRESS).absolute
Use .function probes for kernel functions and .module probes for probing functions of a specified module. If you do not know the absolute address of a kernel or module function, use .statement probes. Do not use wildcards in FUNCTION and MODULE names. Wildcards cause the probe to not register. Also, run statement probes in guru mode only.
These probe points allow procfs pseudo-files in /proc/systemtap/MODNAME to be created, read and written. Specify the name of the systemtap module as MODNAME. There are four probe point variants supported by the translator:
procfs("PATH").read
procfs("PATH").write
procfs.read
procfs.write
PATH is the file name to be created, relative to /proc/systemtap/MODNAME. If no PATH is specified (as in the last two variants in the previous list), PATH defaults to "command".
When a user reads /proc/systemtap/MODNAME/PATH, the corresponding procfs read probe is triggered. Assign the string data to be read to a variable named $value, as follows:
procfs("PATH").read { $value = "100\n" }
When a user writes into /proc/systemtap/MODNAME/PATH, the corresponding procfs write probe is triggered. The data the user wrote is available in the string variable named $value, as follows:
procfs("PATH").write { printf("User wrote: %s", $value) }
Marker probe points begin with a kernel prefix which identifies the source of the symbol table used for finding markers. The suffix names the marker itself: mark.("MARK"). The marker name string, which can contain wildcard characters, is matched against the names given to the marker macros when the kernel or module is compiled. Optionally, you can specify format("FORMAT"). Specifying the marker format string allows differentiation between two markers with the same name but different marker format strings.
The handler associated with a marker probe reads any optional parameters specified at the macro call site named $arg1 through $argNN, where NN is the number of parameters supplied by the macro. Number and string parameters are passed in a type-safe manner.
The marker format string associated with a marker is available in $format. The marker name string is available in $name.
Here are the marker probe constructs:
kernel.mark("MARK")
kernel.mark("MARK").format("FORMAT")
For more information about marker probes, see http://sourceware.org/systemtap/wiki/UsingMarkers.
syscall.NAME syscall.NAME.return
Generally, two probes are defined for each normal system call as listed in the syscalls(2) manual page: one for entry and one for return. System calls that never return do not have a corresponding .return probe.
Each probe alias defines a variety of variables. Look at the tapset source code to find the most reliable source of variable definitions. Generally, each variable listed in the standard manual page is available as a script-level variable. For example, syscall.open exposes file name, flags, and mode. In addition, a standard suite of variables is available at most aliases, as follows:
Not all probe aliases obey all of these general guidelines. Please report exceptions that you encounter as a bug.
This family of probe points hooks to static probing tracepoints inserted into the kernel or kernel modules. As with marker probes, these tracepoints are special macro calls inserted by kernel developers to make probing faster and more reliable than with DWARF-based probes. DWARF debugging information is not required to probe tracepoints. Tracepoints have more strongly-typed parameters than marker probes.
Tracepoint probes begin with kernel. The next part names the tracepoint itself: trace("name"). The tracepoint name string, which can contain wildcard characters, is matched against the names defined by the kernel developers in the tracepoint header files.
The handler associated with a tracepoint-based probe can read the optional parameters specified at the macro call site. These parameters are named according to the declaration by the tracepoint author. For example, the tracepoint probe kernel.trace("sched_switch") provides the parameters $rq, $prev, and $next. If the parameter is a complex type such as a struct pointer, then a script can access fields with the same syntax as DWARF $target variables. Tracepoint parameters cannot be modified; however, in guru mode a script can modify fields of parameters.
The name of the tracepoint is available in $$name, and a string of name=value pairs for all parameters of the tracepoint is available in $$vars or $$parms.
timer.jiffies(N) timer.jiffies(N).randomize(M)
Intervals may be specified in units of time. There are two probe point variants similar to the jiffies timer:
timer.ms(N) timer.ms(N).randomize(M)
The resolution of the timers depends on the target kernel. For kernels prior to 2.6.17, timers are limited to jiffies resolution, so intervals are rounded up to the nearest jiffies interval. After 2.6.17, the implementation uses hrtimers for tighter precision, though the resulting resolution will be dependent upon architecture. In either case, if the randomize component is given, then the random value will be added to the interval before any rounding occurs.
Profiling timers are available to provide probes that execute on all CPUs at each system tick. This probe takes no parameters, as follows.
timer.profile
The following is an example of timer usage.
# Refers to a periodic interrupt, every 1000 jiffies: timer.jiffies(1000) # Fires every 5 seconds: timer.sec(5) # Refers to a periodic interrupt, every 1000 +/- 200 jiffies: timer.jiffies(1000).randomize(200)
The probe points begin and end are defined by the translator to refer to the time of session startup and shutdown. There are no target variables available in either context.
Here is a simple example:
probe error { println ("Oops, errors occurred. Here's a report anyway.")
foreach (coin in mint) { println (coin) } }
# In a tapset file:
probe begin(-1000) { ... }
# In a user script:
probe begin { ... }
.
Typecasting is supported using the @cast() operator. A script can define a pointer type for a long value, then access type members using the same syntax as with $target variables. After a pointer is saved into a script integer variable, the translator loses the necessary type information to access members from that pointer. The @cast() operator tells the translator how to read a pointer.
The following statement interprets p as a pointer to a struct or union named type_name and dereferences the member value:
@cast(p, "type_name"[, "module"])->member
The optional module parameter tells the translator where to look for information about that type. You can specify multiple modules as a list with colon (:) separators. If you do not specify the module parameter, the translator defaults to either the probe module for dwarf probes or to kernel for functions and all other probe types.
The following statement retrieves the parent PID from a kernel task_struct:
@cast(pointer, "task_struct", "kernel")->parent->tgid
The translator can create its own module with type information from a header surrounded by angle brackets (< >) if normal debugging information is not available. For kernel headers, prefix it with kernel to use the appropriate build system. All other headers are built with default GCC parameters into a user module. The following statements are examples.
@cast(tv, "timeval", "<sys/time.h>")->tv_sec @cast(task, "task_struct", "kernel<linux/sched.h>")->tgid
In guru mode, the translator allows scripts to assign new values to members of typecasted pointers.
Typecasting is also useful in the case of void* members whose type might be determinable at run time.
probe foo {
if ($var->type == 1) {
value = @cast($var->data, "type1")->bar
} else {
value = @cast($var->data, "type2")->baz
}
print(value)
}