2 .TH STAPPROBES 3stap @DATE@ "Red Hat"
4 stapprobes \- systemtap probe points
20 The following sections enumerate the variety of probe points supported
21 by the systemtap translator, and additional aliases defined by
22 standard tapset scripts.
24 The general probe point syntax is a dotted-symbol sequence. This
25 allows a breakdown of the event namespace into parts, somewhat like
26 the Domain Name System does on the Internet. Each component
27 identifier may be parametrized by a string or number literal, with a
28 syntax like a function call. A component may include a "*" character,
29 to expand to a set of matching probe points. Probe aliases likewise
30 expand to other probe points. Each and every resulting probe point is
31 normally resolved to some low-level system instrumentation facility
32 (e.g., a kprobe address, marker, or a timer configuration), otherwise
33 the elaboration phase will fail.
35 However, a probe point may be followed by a "?" character, to indicate
36 that it is optional, and that no error should result if it fails to
37 resolve. Optionalness passes down through all levels of
38 alias/wildcard expansion. Alternately, a probe point may be followed
39 by a "!" character, to indicate that it is both optional and
40 sufficient. (Think vaguely of the prolog cut operator.) If it does
41 resolve, then no further probe points in the same comma-separated list
42 will be resolved. Therefore, the "!" sufficiency mark only makes
43 sense in a list of probe point alternatives.
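As a rough illustration of the two suffixes, the sketch below lists two alternatives for one handler.  Both kernel function names are placeholders invented for this example, not names taken from this page; substitute functions that exist in the target kernel.

```systemtap
# "!" marks the first alternative as optional and sufficient: if it
# resolves, the second alternative is not resolved at all.
# "?" marks the second alternative as merely optional.
probe kernel.function("example_new_name") !,
      kernel.function("example_old_name") ?
{
  printf("%s reached by %s\n", pp(), execname())
}

# Keep one probe that always resolves, so the script still has
# something to run if neither alternative exists.
probe begin { println("probing started") }
```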
45 Additionally, a probe point may be followed by a "if (expr)" statement, in
46 order to enable/disable the probe point on-the-fly. With the "if" statement,
47 if the "expr" is false when the probe point is hit, the whole probe body
48 including alias's body is skipped. The condition is stacked up through
49 all levels of alias/wildcard expansion. So the final condition becomes
50 the logical-and of conditions of all expanded alias/wildcard.
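A minimal sketch of a conditional probe follows.  The global variable name and the choice of vfs_read as the probed function are assumptions made for the example.

```systemtap
global tracing = 0

# Flip the switch every five seconds.
probe timer.s(5) { tracing = !tracing }

# The handler body is skipped whenever "tracing" is zero.
probe kernel.function("vfs_read") if (tracing)
{
  printf("vfs_read called by %s\n", execname())
}
```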
These are all
.B syntactically
valid probe points.  (They are generally
.B semantically
invalid, depending on the contents of the tapsets, and the versions of
kernel/user software installed.)
60 kernel.function("foo").return
61 process("/bin/vi").statement(0x2222)
64 kernel.function("no_such_function") ?
65 module("awol").function("no_such_function") !
67 kprobe.function("foo")
71 Probes may be broadly classified into "synchronous" and
72 "asynchronous". A "synchronous" event is deemed to occur when any
73 processor executes an instruction matched by the specification. This
74 gives these probes a reference point (instruction address) from which
75 more contextual data may be available. Other families of probe points
76 refer to "asynchronous" events such as timers/counters rolling over,
77 where there is no fixed reference point that is related. Each probe
78 point specification may match multiple locations (for example, using
wildcards or aliases), and all of them are then probed.  A probe
80 declaration may also contain several comma-separated specifications,
81 all of which are probed.
The probe points
.IR begin " and " end
are defined by the translator to refer to the time of session startup
88 and shutdown. All "begin" probe handlers are run, in some sequence,
89 during the startup of the session. All global variables will have
90 been initialized prior to this point. All "end" probes are run, in
some sequence, during the
.I normal
shutdown of a session, such as in the aftermath of an
.IR exit ()
function call, or an interruption from the user.  In the case of an
96 error-triggered shutdown, "end" probes are not run. There are no
97 target variables available in either context.
If the order of execution among "begin" or "end" probes is significant,
then an optional sequence number N may be provided, as in
.IR begin(N) " or " end(N) .
107 The number N may be positive or negative. The probe handlers are run in
108 increasing order, and the order between handlers with the same sequence
109 number is unspecified. When "begin" or "end" are given without a
110 sequence, they are effectively sequence zero.
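The ordering rules can be sketched as follows.  The sequence numbers are arbitrary values picked for the example.

```systemtap
probe begin(-1) { println("first: lowest sequence number") }
probe begin     { println("second: implicit sequence zero") }
probe begin(1)  { println("third"); exit() }

probe end       { println("end probe at sequence zero") }
probe end(10)   { println("end probe at sequence 10, run after the one above") }
```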
The
.I error
probe point is similar to the
.I end
probe, except that each such probe handler is run when the session ends
after errors have occurred.  In such cases, "end" probes are skipped,
but each "error" probe is still attempted.  This kind of probe can be
119 used to clean up or emit a "final gasp". It may also be numerically
120 parametrized to set a sequence.
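A small sketch of an error-triggered shutdown is shown below.  It assumes the error() function from the standard tapset and the default error limit, under which the first error ends the session.

```systemtap
probe begin {
  error("simulated failure")   # raises a run-time error
}

probe end   { println("skipped: the session ended with an error") }
probe error { println("final gasp: cleaning up") }
```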
The probe point
.I never
is specially defined by the translator to mean "never".  Its probe
126 handler is never run, though its statements are analyzed for symbol /
127 type correctness as usual. This probe point may be useful in
128 conjunction with optional probes.
The
.IR syscall.*
aliases define several hundred probes, too many to
summarize here.  They take the forms
.IR syscall.NAME " and " syscall.NAME.return .
.PP
Generally, two probes are defined for each normal system call as listed in the
.IR syscalls (2)
manual page, one for entry and one for return.  Those system calls that never
return do not have a corresponding
.I .return
probe.
Each probe alias defines a variety of variables.  Looking at the tapset source
code is the most reliable way to find out which ones.  Generally, each variable
listed in the standard manual page is made available as a script-level
variable, so
.I syscall.open
exposes
.IR filename ", " flags ", and " mode .
In addition, a standard suite of variables is available at most aliases:
.TP
.I argstr
A pretty-printed form of the entire argument list, without parentheses.
.TP
.I name
The name of the system call.
.TP
.I retstr
For return probes, a pretty-printed form of the system-call result.
.PP
167 Not all probe aliases obey all of these general guidelines. Please report
168 any bothersome ones you encounter as a bug.
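For instance, an entry/return pair for one system call might be traced as sketched here.  The variable names name, argstr, and retstr are assumed to be those provided by the standard syscall tapset.

```systemtap
probe syscall.open
{
  # name and argstr are set by the alias on entry
  printf("%s: %s(%s)\n", execname(), name, argstr)
}

probe syscall.open.return
{
  # retstr is the pretty-printed result
  printf("%s: %s = %s\n", execname(), name, retstr)
}
```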
173 Intervals defined by the standard kernel "jiffies" timer may be used
174 to trigger probe handlers asynchronously. Two probe point variants
175 are supported by the translator:
timer.jiffies(N)
timer.jiffies(N).randomize(M)
182 The probe handler is run every N jiffies (a kernel-defined unit of
183 time, typically between 1 and 60 ms). If the "randomize" component is
given, a uniformly distributed random value in the range [\-M..+M] is
185 added to N every time the handler is run. N is restricted to a
186 reasonable range (1 to around a million), and M is restricted to be
187 smaller than N. There are no target variables provided in either
188 context. It is possible for such probes to be run concurrently on
189 a multi-processor computer.
191 Alternatively, intervals may be specified in units of time.
192 There are two probe point variants similar to the jiffies timer:
timer.ms(N)
timer.ms(N).randomize(M)
199 Here, N and M are specified in milliseconds, but the full options for units
200 are seconds (s/sec), milliseconds (ms/msec), microseconds (us/usec),
nanoseconds (ns/nsec), and hertz (hz).  Randomization is not supported for
hertz timers.
204 The actual resolution of the timers depends on the target kernel. For
205 kernels prior to 2.6.17, timers are limited to jiffies resolution, so
206 intervals are rounded up to the nearest jiffies interval. After 2.6.17,
207 the implementation uses hrtimers for tighter precision, though the actual
208 resolution will be arch-dependent. In either case, if the "randomize"
209 component is given, then the random value will be added to the interval
210 before any rounding occurs.
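The two timer styles can be combined as in this sketch.  The interval values and the variable name are arbitrary choices for the example.

```systemtap
global ticks

# Fires roughly every 500 ms, jittered by up to 100 ms either way.
probe timer.ms(500).randomize(100) { ticks++ }

# Stop after ten seconds and report.
probe timer.s(10)
{
  printf("%d randomized ticks in 10 seconds\n", ticks)
  exit()
}
```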
Profiling timers are also available to provide probes that execute on all
CPUs at the rate of the system tick (CONFIG_HZ).
This probe,
.IR timer.profile ,
takes no parameters.
220 Full context information of the interrupted process is available, making
221 this probe suitable for a time-based sampling profiler.
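A very small sampling profiler along these lines might look like the sketch below.  It assumes the profiling timer is spelled timer.profile; the ten-second run time and the top-ten cutoff are arbitrary.

```systemtap
global samples

# One sample per tick per CPU, attributed to the interrupted process.
probe timer.profile { samples[execname()]++ }

probe timer.s(10) { exit() }

probe end
{
  # Print the ten most frequently sampled process names.
  foreach (name in samples- limit 10)
    printf("%-16s %d\n", name, samples[name])
}
```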
225 This family of probe points uses symbolic debugging information for
226 the target kernel/module/program, as may be found in unstripped
executables, or the separate
.I debuginfo
229 packages. They allow placement of probes logically into the execution
230 path of the target program, by specifying a set of points in the
231 source or object code. When a matching statement executes on any
232 processor, the probe handler is run in that context.
Probe points in a kernel are identified by
module, source file, line number, function name, or some
combination of these.
Here is a list of probe point families currently supported.  The
.B .function
variant places a probe near the beginning of the named function, so that
parameters are available as context variables.  The
.B .return
variant places a probe at the moment
.B after
the return from the named function, so the return value is available
as the "$return" context variable.  The
.B .inline
modifier for
.B .function
filters the results to include only instances of inlined functions.
The
.B .call
modifier selects the opposite subset.  Inline functions do not have an
identifiable return point, so
.B .return
is not supported on
.B .inline
probes.  The
.B .statement
variant places a probe at the exact spot, exposing those local variables
that are visible there.
264 kernel.function(PATTERN)
266 kernel.function(PATTERN).call
268 kernel.function(PATTERN).return
270 kernel.function(PATTERN).inline
272 kernel.function(PATTERN).label(LPATTERN)
274 module(MPATTERN).function(PATTERN)
276 module(MPATTERN).function(PATTERN).call
278 module(MPATTERN).function(PATTERN).return
280 module(MPATTERN).function(PATTERN).inline
283 kernel.statement(PATTERN)
285 kernel.statement(ADDRESS).absolute
287 module(MPATTERN).statement(PATTERN)
In the above list, MPATTERN stands for a string literal that aims to
identify the loaded kernel module of interest and LPATTERN stands for
a source program label.  Both MPATTERN and LPATTERN may include the "*",
"[]", and "?" wildcards.
PATTERN stands for a string literal that
aims to identify a point in the program.  It is made up of three
parts:
The first part is the name of a function, as would appear in the
.I nm
program's output.  This part may use the "*" and "?" wildcarding
301 operators to match multiple names.
303 The second part is optional and begins with the "@" character.
304 It is followed by the path to the source file containing the function,
305 which may include a wildcard pattern, such as mm/slab*.
If it does not match as is, an implicit "*/" is optionally added
.I before
the pattern, so that a script need only name the last few components
309 of a possibly long source directory path.
311 Finally, the third part is optional if the file name part was given,
312 and identifies the line number in the source file preceded by a ":"
313 or a "+". The line number is assumed to be an
314 absolute line number if preceded by a ":", or relative to the entry of
315 the function if preceded by a "+".
316 All the lines in the function can be matched with ":*".
317 A range of lines x through y can be matched with ":x-y".
319 As an alternative, PATTERN may be a numeric constant, indicating an
320 address. Such an address may be found from symbol tables of the
321 appropriate kernel / module object file. It is verified against
known statement code boundaries, and will be relocated for use at
run time.
325 In guru mode only, absolute kernel-space addresses may be specified with
326 the ".absolute" suffix. Such an address is considered already relocated,
329 so it cannot be checked against statement/instruction boundaries.
331 Some of the source-level context variables, such as function parameters,
332 locals, globals visible in the compilation unit, may be visible to
333 probe handlers. They may refer to these variables by prefixing their
334 name with "$" within the scripts. In addition, a special syntax
335 allows limited traversal of structures, pointers, and arrays.
.TP
\fB$var\fR
refers to an in-scope variable "var".  If it's an integer-like type,
it will be cast to a 64-bit int for systemtap script use.  String-like
pointers (char *) may be copied to systemtap string values using the
.IR kernel_string " or " user_string
functions.
.TP
\fB$var\->field\fR
traversal to a structure's field.  The indirection operator
may be repeated to follow more levels of pointers.
.TP
\fB$return\fR
is available in return probes only for functions that are declared
with a return value.
.TP
\fB$var[N]\fR
indexes into an array.  The index is given with a
literal number.
.TP
\fB$$vars\fR
expands to a character string that is equivalent to
sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x", parm1, ..., parmN,
var1, ..., varN)
.TP
\fB$$locals\fR
expands to a subset of $$vars for only local variables.
.TP
\fB$$parms\fR
expands to a subset of $$vars for only function parameters.
.TP
\fB$$return\fR
is available in return probes only.  It expands to a string that
is equivalent to sprintf("return=%x", $return)
if the probed function has a return value, or else an empty string.
.PP
373 For ".return" probes, context variables other than the "$return"
374 value itself are only available for the function call parameters.
The expressions evaluate to the
.I entry-time
values of those variables, since that is when a snapshot is taken.
378 Other local variables are not generally accessible, since by the time
379 a ".return" probe hits, the probed function will have already returned.
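The sketch below reads context variables at function entry and at return.  It assumes a kernel whose vfs_read function has parameters named file and count, and it uses the $$parms summary string; adjust the names to match the kernel source actually being probed.

```systemtap
probe kernel.function("vfs_read")
{
  # $$parms summarizes all parameters; "->" follows the struct pointer.
  printf("entry:  %s flags=0x%x\n", $$parms, $file->f_flags)
}

probe kernel.function("vfs_read").return
{
  # $count here is the value captured when the function was entered.
  printf("return: %d (requested %d bytes)\n", $return, $count)
}
```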
In the absence of debugging information, entry and exit points of kernel and
module functions can be probed using the "kprobe" family of probes.
However, these do not permit looking up the arguments or local variables
of the function.
The following constructs are supported:
389 kprobe.function(FUNCTION)
390 kprobe.function(FUNCTION).return
391 kprobe.module(NAME).function(FUNCTION)
392 kprobe.module(NAME).function(FUNCTION).return
kprobe.statement(ADDRESS).absolute
Probes of type
.B function
are recommended for kernel functions, whereas probes of type
.B module
are recommended for probing functions of the specified module.
In case the absolute address of a kernel or module function is known,
.B statement
probes can be utilized.
.PP
Note that
.I FUNCTION
and module
.I NAME
arguments
.B must not
contain wildcards, or the probe will not be registered.
Also, statement probes may be used in guru mode only.
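A debuginfo-free call counter could be sketched as follows.  The function name vfs_read is an assumption; any exported kernel symbol name, written without wildcards, would do.

```systemtap
global calls, returns

# No $variables are available here, so the handlers only count events.
probe kprobe.function("vfs_read")        { calls++ }
probe kprobe.function("vfs_read").return { returns++ }

probe timer.s(5)
{
  printf("%d calls, %d returns\n", calls, returns)
  exit()
}
```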
416 Support for user-space probing is available for kernels
417 that are configured with the utrace extensions. See
419 http://people.redhat.com/roland/utrace/
422 There are several forms. First, a non-symbolic probe point:
424 process(PID).statement(ADDRESS).absolute
is analogous to
.IR kernel.statement(ADDRESS).absolute
429 in that both use raw (unverified) virtual addresses and provide
430 no $variables. The target PID parameter must identify a running
431 process, and ADDRESS should identify a valid instruction address.
432 All threads of that process will be probed.
434 Second, non-symbolic user-kernel interface events handled by
435 utrace may be probed:
438 process("PATH").begin
440 process(PID).thread.begin
441 process("PATH").thread.begin
446 process(PID).thread.end
447 process("PATH").thread.end
450 process("PATH").syscall
452 process(PID).syscall.return
453 process("PATH").syscall.return
454 process.syscall.return
457 process(PID).insn.block
458 process("PATH").insn.block
A
.B .begin
probe gets called when a new process described by PID or PATH gets created.
A
.B .thread.begin
probe gets called when a new thread described by PID or PATH gets created.
A
.B .end
probe gets called when a process described by PID or PATH dies.
A
.B .thread.end
probe gets called when a thread described by PID or PATH dies.
A
.B .syscall
probe gets called when a thread described by PID or PATH makes a
system call.  The system call number is available in the
.B $syscall
context variable, and the first 6 arguments of the system call
are available in the
.B $argN
(e.g., $arg1, $arg2, ...) context variables.
A
.B .syscall.return
probe gets called when a thread described by PID or PATH returns from a
system call.  The system call number is available in the
.B $syscall
context variable, and the return value of the system call is available
in the
.B $return
context variable.
A
.B .insn
probe gets called for every single-stepped instruction of the process described by PID or PATH.
A
.B .insn.block
probe gets called for every block-stepped instruction of the process described by PID or PATH.
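The utrace-based events might be combined as in the sketch below.  The path /bin/ls is only an example target, and the context variable names $syscall and $return are assumed to be those provided by the translator for these probes.

```systemtap
probe process("/bin/ls").begin
{
  printf("ls started, pid %d\n", pid())
}

probe process("/bin/ls").syscall
{
  printf("syscall %d, arg1=0x%x\n", $syscall, $arg1)
}

probe process("/bin/ls").syscall.return
{
  printf("syscall %d returned %d\n", $syscall, $return)
}

probe process("/bin/ls").end
{
  printf("ls exited\n")
}
```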
499 Third, symbolic static instrumentation compiled into programs and
shared libraries may be
probed:
503 process("PATH").mark("LABEL")
A
.B .mark
probe gets called via a static probe which is defined in the
application by
STAP_PROBE1(handle,LABEL,arg1), which is defined in sdt.h.  The handle is an application handle,
511 LABEL corresponds to the .mark argument, and arg1 is the argument.
512 STAP_PROBE1 is used for probes with 1 argument, STAP_PROBE2 is used
513 for probes with 2 arguments, and so on.
514 The arguments of the probe are available in the context variables
515 $arg1, $arg2, ... An alternative to using the STAP_PROBE macros is to
516 use the dtrace script to create custom macros.
519 Finally, full symbolic source-level probes in user-space programs
520 and shared libraries are supported. These are exactly analogous
521 to the symbolic DWARF-based kernel/module probes described above,
522 and expose similar contextual $-variables.
524 process("PATH").function("NAME")
525 process("PATH").statement("*@FILE.c:123")
526 process("PATH").function("*").return
527 process("PATH").function("myfun").label("foo")
Note that for all process probes,
.I PATH
names refer to executables that are searched the same way shells do: relative
to the working directory if they contain a "/" character, otherwise in
.BR $PATH .
If a process probe is specified without a PID or PATH, all user
threads are probed.  PATH may sometimes name a shared library,
in which case all processes that map that shared library may be
probed.
These probe points allow procfs "files" in
/proc/systemtap/MODNAME to be created, read and written
(\fIMODNAME\fR
is the name of the systemtap module).  The
.I proc
filesystem is a pseudo-filesystem which is used as an interface to
kernel data structures.  There are four probe point variants supported
by the translator:
.PP
procfs("PATH").read
.br
procfs("PATH").write
.br
procfs.read
.br
procfs.write
.PP
.I PATH
is the file name (relative to /proc/systemtap/MODNAME) to be created.
If no
.I PATH
is specified (as in the last two variants above),
.I PATH
defaults to "command".
When a user reads /proc/systemtap/MODNAME/PATH, the corresponding
procfs
.I read
probe is triggered.  The string data to be read should be assigned to
a variable named
.IR $value ,
like this:
576 procfs("PATH").read { $value = "100\\n" }
When a user writes into /proc/systemtap/MODNAME/PATH, the
corresponding procfs
.I write
probe is triggered.  The data the user wrote is available in the
string variable named
.IR $value ,
like this:
588 procfs("PATH").write { printf("user wrote: %s", $value) }
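Read and write probes on the same file can expose a tunable, roughly as sketched here.  The file name "threshold" and the global variable are invented for the example; strtol() is assumed to be available from the standard tapset.

```systemtap
global threshold = 100

# cat /proc/systemtap/MODNAME/threshold
probe procfs("threshold").read
{
  $value = sprintf("%d\n", threshold)
}

# echo 250 > /proc/systemtap/MODNAME/threshold
probe procfs("threshold").write
{
  threshold = strtol($value, 10)
  printf("threshold is now %d\n", threshold)
}
```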
593 This family of probe points hooks up to static probing markers
594 inserted into the kernel or modules. These markers are special macro
595 calls inserted by kernel developers to make probing faster and more
reliable than with DWARF-based probes.  Further, DWARF debugging
information is
.I not
required to probe markers.
Marker probe points begin with
.BR kernel .
The next part names the marker itself:
.BR mark("name") .
605 The marker name string, which may contain the usual wildcard characters,
606 is matched against the names given to the marker macros when the kernel
607 and/or module was compiled. Optionally, you can specify
608 .BR format("format") .
Specifying the marker format string allows differentiation between two
610 markers with the same name but different marker format strings.
612 The handler associated with a marker-based probe may read the
optional parameters specified at the macro call site.  These are
named
615 .BR $arg1 " through " $argNN ,
616 where NN is the number of parameters supplied by the macro. Number
617 and string parameters are passed in a type-safe manner.
The marker format string associated with a marker is available in
.BR $format .
The marker name string is also available in
.BR $name .
626 This family of probe points hooks up to static probing tracepoints
627 inserted into the kernel or modules. As with markers, these
628 tracepoints are special macro calls inserted by kernel developers to
629 make probing faster and more reliable than with DWARF-based probes,
630 and DWARF debugging information is not required to probe tracepoints.
Tracepoints have an extra advantage of more strongly-typed parameters
than markers.
Tracepoint probes begin with
.BR kernel .
The next part names the tracepoint itself:
.BR trace("name") .
638 The tracepoint name string, which may contain the usual wildcard
639 characters, is matched against the names defined by the kernel
640 developers in the tracepoint header files.
642 The handler associated with a tracepoint-based probe may read the
643 optional parameters specified at the macro call site. These are
644 named according to the declaration by the tracepoint author. For
645 example, the tracepoint probe
646 .BR kernel.trace("sched_switch")
647 provides the parameters
648 .BR $rq ", " $prev ", and " $next .
649 If the parameter is a complex type, as in a struct pointer, then a
650 script can access fields with the same syntax as DWARF $target
651 variables. Also, tracepoint parameters cannot be modified, but in
652 guru-mode a script may modify fields of parameters.
The name of the tracepoint is available in
.BR $$name ,
and a string of name=value pairs for all parameters of the tracepoint
is available in
.BR $$vars " or " $$parms .
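A tracepoint handler for the sched_switch example might be sketched as below.  The task_execname() and task_pid() helpers are assumed to come from the standard task tapset, and the parameter names depend on the kernel version.

```systemtap
probe kernel.trace("sched_switch")
{
  # $prev and $next are task_struct pointers declared by the tracepoint.
  printf("%s(%d) -> %s(%d)\n",
         task_execname($prev), task_pid($prev),
         task_execname($next), task_pid($next))
}

# Limit the run so the output stays manageable.
probe timer.s(1) { exit() }
```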
660 .SS PERFORMANCE MONITORING HARDWARE
662 The perfmon family of probe points is used to access the performance
monitoring hardware available in modern processors.  This family of
probe points needs the perfmon2 support in the kernel to access the
665 performance monitoring hardware.
Performance monitor hardware points begin with
.BR perfmon .
The next part names the event being counted:
.BR counter("event") .
The event names are processor implementation specific, with the
exception of the generic
.BR cycles " and " instructions
events, which are available on all processors.  This sets up a counter
on the processor to count the number of events occurring on the
processor.  For more details on the performance monitoring events
available on a specific processor, use the perfmon2 command:
.PP
pfmon \-l
.TP
.I $counter
is a handle used in the body of the probe for operations
involving the counter associated with the probe.
.TP
.I read_counter
is a function that is passed the handle for the perfmon probe and returns
the current count for the event.
693 Here are some example probe points, defining the associated events.
.TP
begin, end, end
refers to the startup and normal shutdown of the session.  In this
case, the handler would run once during startup and twice during
shutdown.
700 timer.jiffies(1000).randomize(200)
701 refers to a periodic interrupt, every 1000 +/\- 200 jiffies.
703 kernel.function("*init*"), kernel.function("*exit*")
704 refers to all kernel functions with "init" or "exit" in the name.
706 kernel.function("*@kernel/sched.c:240")
refers to any functions within the "kernel/sched.c" file that span
line 240.
710 kernel.mark("getuid")
711 refers to an STAP_MARK(getuid, ...) macro call in the kernel.
713 module("usb*").function("*sync*").return
714 refers to the moment of return from all functions with "sync" in the
715 name in any of the USB drivers.
717 kernel.statement(0xc0044852)
718 refers to the first byte of the statement whose compiled instructions
719 include the given address in the kernel.
721 kernel.statement("*@kernel/sched.c:2917")
722 refers to the statement of line 2917 within "kernel/sched.c".
724 kernel.statement("bio_init@fs/bio.c+3")
725 refers to the statement at line bio_init+3 within "fs/bio.c".
.TP
syscall.*.return
refers to the group of probe aliases with any name in the second position.
732 .IR stapprobes.iosched (3stap),
733 .IR stapprobes.netdev (3stap),
734 .IR stapprobes.nfs (3stap),
735 .IR stapprobes.nfsd (3stap),
736 .IR stapprobes.pagefault (3stap),
737 .IR stapprobes.process (3stap),
738 .IR stapprobes.rpc (3stap),
739 .IR stapprobes.scsi (3stap),
740 .IR stapprobes.signal (3stap),
741 .IR stapprobes.socket (3stap),
742 .IR stapprobes.tcp (3stap),
743 .IR stapprobes.udp (3stap),