.\" t
.TH STAPPROBES 3stap
.SH NAME
stapprobes \- systemtap probe points

.\" macros
.de SAMPLE
.br
.RS
.nf
.nh
..
.de ESAMPLE
.hy
.fi
.RE
..

.SH DESCRIPTION
The following sections enumerate the variety of probe points supported
by the systemtap translator, and some of the additional aliases defined by
standard tapset scripts. Many are individually documented in the
.IR 3stap
manual section, with the
.IR probe::
prefix.

.SH SYNTAX

.PP
.SAMPLE
.BR probe " PROBEPOINT [" , " PROBEPOINT] " { " [STMT ...] " }
.ESAMPLE
.PP
A probe declaration may list multiple comma-separated probe points in
order to attach a handler to all of the named events. Normally, the
handler statements are run whenever any of the events occur.
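.PP
As a minimal sketch of a multi-point declaration (the two events and the
message text are illustrative choices, not taken from any tapset), a single
handler can be attached to both session startup and a periodic timer:
.SAMPLE
probe begin, timer.s(5) {
  printf("handler ran\\n")
}
.ESAMPLE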
.PP
The syntax of a single probe point is a general dotted-symbol
sequence. This allows a breakdown of the event namespace into parts,
somewhat like the Domain Name System does on the Internet. Each
component identifier may be parametrized by a string or number
literal, with a syntax like a function call. A component may include
a "*" character, to expand to a set of matching probe points. It may
also include "**" to match multiple sequential components at once.
Probe aliases likewise expand to other probe points.
.PP
Probe aliases can be given on their own, or with a suffix. The suffix
attaches to the underlying probe point that the alias is expanded
to. For example,
.SAMPLE
syscall.read.return.maxactive(10)
.ESAMPLE
expands to
.SAMPLE
kernel.function("sys_read").return.maxactive(10)
.ESAMPLE
with the component
.IR maxactive(10)
being recognized as a suffix.
.PP
Normally, each and every probe point resulting from wildcard- and
alias-expansion must be resolved to some low-level system
instrumentation facility (e.g., a kprobe address, marker, or a timer
configuration), otherwise the elaboration phase will fail.
.PP
However, a probe point may be followed by a "?" character, to indicate
that it is optional, and that no error should result if it fails to
resolve. Optionalness passes down through all levels of
alias/wildcard expansion. Alternately, a probe point may be followed
by a "!" character, to indicate that it is both optional and
sufficient. (Think vaguely of the Prolog cut operator.) If it does
resolve, then no further probe points in the same comma-separated list
will be resolved. Therefore, the "!" sufficiency mark only makes
sense in a list of probe point alternatives.
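.PP
For illustration only, a list of alternatives might be written as below.
The function names are placeholders, and whether any of them resolves
depends on the target kernel. If the first point resolves, the later
ones are not resolved; the trailing
.IR never
is assumed here as a fallback that always resolves.
.SAMPLE
probe kernel.function("no_such_function") !,
      kernel.function("sys_read") ?,
      never
{
  printf("probe hit in %s\\n", execname())
}
.ESAMPLE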
.PP
Additionally, a probe point may be followed by an "if (expr)" statement, in
order to enable/disable the probe point on-the-fly. With the "if" statement,
if the "expr" is false when the probe point is hit, the whole probe body,
including any alias's body, is skipped. The condition is stacked up through
all levels of alias/wildcard expansion, so the final condition becomes
the logical-and of the conditions of all expanded aliases/wildcards. The expressions
are necessarily restricted to global variables.
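.PP
A short sketch of an on-the-fly condition follows. The global name
.IR enabled
and the ten-second cutoff are arbitrary assumptions made for this example.
.SAMPLE
global enabled = 1

probe syscall.open if (enabled) {
  printf("%s(%s)\\n", name, argstr)
}

probe timer.s(10) {
  enabled = 0
}
.ESAMPLE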
.PP
These are all
.B syntactically
valid probe points. (Whether they are also
.B semantically
valid depends on the contents of the tapsets, and the versions of
kernel/user software installed.)

.SAMPLE
kernel.function("foo").return
process("/bin/vi").statement(0x2222)
end
syscall.*
syscall.*.return.maxactive(10)
sys**open
kernel.function("no_such_function") ?
module("awol").function("no_such_function") !
signal.*? if (switch)
kprobe.function("foo")
.ESAMPLE

Probes may be broadly classified into "synchronous" and
"asynchronous". A "synchronous" event is deemed to occur when any
processor executes an instruction matched by the specification. This
gives these probes a reference point (instruction address) from which
more contextual data may be available. Other families of probe points
refer to "asynchronous" events such as timers/counters rolling over,
where there is no fixed reference point that is related. Each probe
point specification may match multiple locations (for example, using
wildcards or aliases), and all of them are then probed. A probe
declaration may also contain several comma-separated specifications,
all of which are probed.

.SH DWARF DEBUGINFO

Resolving some probe points requires DWARF debuginfo or "debug
symbols" for the specific part being instrumented. For some others,
DWARF is automatically synthesized on the fly from source code header
files. For others, it is not needed at all. Since a systemtap script
may use any mixture of probe points together, the union of their DWARF
requirements has to be met on the computer where script compilation
occurs. (See the \fI\-\-use\-server\fR option and the \fBstap-server\
(8)\fR man page for information about the remote compilation facility,
which allows these requirements to be met on a different machine.)
.PP
The following table lists many of the available probe point families,
to classify them with respect to their need for DWARF debuginfo.

.TS
l l l.
\fBDWARF	NON-DWARF\fP

kernel.function, .statement	kernel.mark
module.function, .statement	process.mark, process.plt
process.function, .statement	begin, end, error, never
process.mark \fI(backup)\fP	timer
	perf
	procfs
\fBAUTO-DWARF\fP	kernel.statement.absolute
	kernel.data
kernel.trace	kprobe.function
	process.statement.absolute
	process.begin, .end, .error
.TE

.SH PROBE POINT FAMILIES

.SS BEGIN/END/ERROR

The probe points
.IR begin " and " end
are defined by the translator to refer to the time of session startup
and shutdown. All "begin" probe handlers are run, in some sequence,
during the startup of the session. All global variables will have
been initialized prior to this point. All "end" probes are run, in
some sequence, during the
.I normal
shutdown of a session, such as in the aftermath of an
.I exit ()
function call, or an interruption from the user. In the case of an
error-triggered shutdown, "end" probes are not run. There are no
target variables available in either context.
.PP
If the order of execution among "begin" or "end" probes is significant,
then an optional sequence number may be provided:

.SAMPLE
begin(N)
end(N)
.ESAMPLE

The number N may be positive or negative. The probe handlers are run in
increasing order, and the order between handlers with the same sequence
number is unspecified. When "begin" or "end" are given without a
sequence, they are effectively sequence zero.
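.PP
As a sketch of sequencing (the numbers and messages are arbitrary), the
following handlers would print their lines in the order shown:
.SAMPLE
probe begin(\-1) { println("first") }
probe begin     { println("second, at sequence zero") }
probe begin(1)  { println("third"); exit() }
probe end       { println("last, after exit()") }
.ESAMPLE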

The
.IR error
probe point is similar to the
.IR end
probe, except that each such probe handler is run when the session ends
after errors have occurred. In such cases, "end" probes are skipped,
but each "error" probe is still attempted. This kind of probe can be
used to clean up or emit a "final gasp". It may also be numerically
parametrized to set a sequence.
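.PP
The difference can be sketched as follows, assuming the tapset function
.IR error ()
is used to force a failure and the default error limit is in effect.
Only the
.IR error
handler is expected to print.
.SAMPLE
probe begin { error("forced failure") }
probe end   { println("skipped after an error") }
probe error { println("final gasp") }
.ESAMPLE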

.SS NEVER
The probe point
.IR never
is specially defined by the translator to mean "never". Its probe
handler is never run, though its statements are analyzed for symbol /
type correctness as usual. This probe point may be useful in
conjunction with optional probes.

.SS SYSCALL

The
.IR syscall.*
aliases define several hundred probes, too many to
detail here. They are of the general form:

.SAMPLE
syscall.NAME
.br
syscall.NAME.return
.ESAMPLE

Generally, two probes are defined for each normal system call as listed in the
.IR syscalls(2)
manual page, one for entry and one for return. Those system calls that never
return do not have a corresponding
.IR .return
probe.
.PP
Each probe alias provides a variety of variables. Looking at the tapset source
code is the most reliable way to find them. Generally, each variable listed in the standard
manual page is made available as a script-level variable, so
.IR syscall.open
exposes
.IR filename ", " flags ", and " mode .
In addition, a standard suite of variables is available at most aliases:
.TP
.IR argstr
A pretty-printed form of the entire argument list, without parentheses.
.TP
.IR name
The name of the system call.
.TP
.IR retstr
For return probes, a pretty-printed form of the system-call result.
.PP
As usual for probe aliases, these variables are all simply initialized
once from the underlying $context variables, so that later changes to
$context variables are not automatically reflected. Not all probe
aliases obey all of these general guidelines. Please report any
bothersome ones you encounter as a bug.
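.PP
A minimal sketch using the standard variables described above, with
.IR syscall.open
chosen only as an example alias:
.SAMPLE
probe syscall.open {
  printf("%s(%s)\\n", name, argstr)
}

probe syscall.open.return {
  printf("%s = %s\\n", name, retstr)
}
.ESAMPLE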


.SS TIMERS

Intervals defined by the standard kernel "jiffies" timer may be used
to trigger probe handlers asynchronously. Two probe point variants
are supported by the translator:

.SAMPLE
timer.jiffies(N)
timer.jiffies(N).randomize(M)
.ESAMPLE

The probe handler is run every N jiffies (a kernel-defined unit of
time, typically between 1 and 60 ms). If the "randomize" component is
given, a uniformly distributed random value in the range [\-M..+M] is
added to N every time the handler is run. N is restricted to a
reasonable range (1 to around a million), and M is restricted to be
smaller than N. There are no target variables provided in either
context. It is possible for such probes to be run concurrently on
a multi-processor computer.
.PP
Alternatively, intervals may be specified in units of time.
There are two probe point variants similar to the jiffies timer:

.SAMPLE
timer.ms(N)
timer.ms(N).randomize(M)
.ESAMPLE

Here, N and M are specified in milliseconds, but the full options for units
are seconds (s/sec), milliseconds (ms/msec), microseconds (us/usec),
nanoseconds (ns/nsec), and hertz (hz). Randomization is not supported for
hertz timers.

The actual resolution of the timers depends on the target kernel. For
kernels prior to 2.6.17, timers are limited to jiffies resolution, so
intervals are rounded up to the nearest jiffies interval. After 2.6.17,
the implementation uses hrtimers for tighter precision, though the actual
resolution will be arch-dependent. In either case, if the "randomize"
component is given, then the random value will be added to the interval
before any rounding occurs.
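.PP
The timer variants can be combined as in this sketch; the intervals and
the global name
.IR ticks
are arbitrary assumptions.
.SAMPLE
global ticks

probe timer.ms(100).randomize(10) {
  ticks++
}

probe timer.s(5) {
  printf("%d ticks\\n", ticks)
  exit()
}
.ESAMPLE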
.PP
Profiling timers are also available to provide probes that execute on
all CPUs at the rate of the system tick (CONFIG_HZ). This probe takes
no parameters. On some kernels, this is a one-concurrent-user-only or
disabled facility, resulting in error -16 (EBUSY) during probe
registration.

.SAMPLE
timer.profile
.ESAMPLE

Full context information of the interrupted process is available, making
this probe suitable for a time-based sampling profiler.
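.PP
A rough sketch of such a profiler is shown below. It only counts samples
per executable name; the global
.IR hits ,
the ten-second duration, and the ten-entry limit are assumptions for
illustration.
.SAMPLE
global hits

probe timer.profile {
  hits[execname()]++
}

probe timer.s(10) {
  exit()
}

probe end {
  foreach (name in hits\- limit 10)
    printf("%s %d\\n", name, hits[name])
}
.ESAMPLE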

.SS DWARF

This family of probe points uses symbolic debugging information for
the target kernel/module/program, as may be found in unstripped
executables, or the separate
.I debuginfo
packages. They allow placement of probes logically into the execution
path of the target program, by specifying a set of points in the
source or object code. When a matching statement executes on any
processor, the probe handler is run in that context.
.PP
Points in a kernel are identified by
module, source file, line number, function name, or some
combination of these.
.PP
Here is a list of probe point families currently supported. The
.B .function
variant places a probe near the beginning of the named function, so that
parameters are available as context variables. The
.B .return
variant places a probe at the moment
.B after
the return from the named function, so the return value is available
as the "$return" context variable. The
.B .inline
modifier for
.B .function
filters the results to include only instances of inlined functions.
The
.B .call
modifier selects the opposite subset. The
.B .exported
modifier
filters the results to include only exported functions. Inline
functions do not have an identifiable return point, so
.B .return
is not supported on
.B .inline
probes. The
.B .statement
variant places a probe at the exact spot, exposing those local variables
that are visible there.

.SAMPLE
kernel.function(PATTERN)
.br
kernel.function(PATTERN).call
.br
kernel.function(PATTERN).return
.br
kernel.function(PATTERN).inline
.br
kernel.function(PATTERN).label(LPATTERN)
.br
module(MPATTERN).function(PATTERN)
.br
module(MPATTERN).function(PATTERN).call
.br
module(MPATTERN).function(PATTERN).return
.br
module(MPATTERN).function(PATTERN).inline
.br
module(MPATTERN).function(PATTERN).label(LPATTERN)
.br
.br
kernel.statement(PATTERN)
.br
kernel.statement(ADDRESS).absolute
.br
module(MPATTERN).statement(PATTERN)
.br
process("PATH").function("NAME")
.br
process("PATH").statement("*@FILE.c:123")
.br
process("PATH").library("PATH").function("NAME")
.br
process("PATH").library("PATH").statement("*@FILE.c:123")
.br
process("PATH").function("*").return
.br
process("PATH").function("myfun").label("foo")
.br
process(PID).statement(ADDRESS).absolute
.ESAMPLE

(See the USER-SPACE section below for more information on the process
probes.)

In the above list, MPATTERN stands for a string literal that aims to
identify the loaded kernel module of interest and LPATTERN stands for
a source program label. Both MPATTERN and LPATTERN may include the "*",
"[]", and "?" wildcards.
PATTERN stands for a string literal that
aims to identify a point in the program. It is made up of three
parts:
.IP \(bu 4
The first part is the name of a function, as would appear in the
.I nm
program's output. This part may use the "*" and "?" wildcarding
operators to match multiple names.
.IP \(bu 4
The second part is optional and begins with the "@" character.
It is followed by the path to the source file containing the function,
which may include a wildcard pattern, such as mm/slab*.
If it does not match as is, an implicit "*/" is optionally added
.I before
the pattern, so that a script need only name the last few components
of a possibly long source directory path.
.IP \(bu 4
Finally, the third part is optional if the file name part was given,
and identifies the line number in the source file preceded by a ":"
or a "+". The line number is assumed to be an
absolute line number if preceded by a ":", or relative to the entry of
the function if preceded by a "+".
All the lines in the function can be matched with ":*".
A range of lines x through y can be matched with ":x\-y".
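.PP
The following probe points sketch how the three parts combine. The
function, module, and file names are assumptions about a particular kernel
source tree and may not resolve on a given system.
.SAMPLE
kernel.function("vfs_*")
kernel.function("*@fs/open.c")
kernel.statement("do_sys_open@fs/open.c:*")
kernel.statement("do_sys_open@fs/open.c+2")
module("ext4").function("ext4_*@fs/ext4/inode.c")
.ESAMPLE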
.PP
As an alternative, PATTERN may be a numeric constant, indicating an
address. Such an address may be found from symbol tables of the
appropriate kernel / module object file. It is verified against
known statement code boundaries, and will be relocated for use at
run time.
.PP
In guru mode only, absolute kernel-space addresses may be specified with
the ".absolute" suffix. Such an address is considered already relocated,
as if it came from
.BR /proc/kallsyms ,
so it cannot be checked against statement/instruction boundaries.

.SS CONTEXT VARIABLES

.PP
Many of the source-level context variables, such as function parameters,
locals, and globals visible in the compilation unit, may be visible to
probe handlers. Handlers may refer to these variables by prefixing their
names with "$" within the scripts. In addition, a special syntax
allows limited traversal of structures, pointers, and arrays. More
syntax allows pretty-printing of individual variables or their groups.
See also
.BR @cast .

.TP
$var
refers to an in-scope variable "var". If it's an integer-like type,
it will be cast to a 64-bit int for systemtap script use. String-like
pointers (char *) may be copied to systemtap string values using the
.IR kernel_string " or " user_string
functions.
.TP
@var("varname")
an alternative syntax for
.IR $varname .
.TP
@var("varname@src/file.c")
refers to the global (either file local or external) variable
.IR varname
defined when the file
.IR src/file.c
was compiled. The CU in which the variable is resolved is the first CU
in the module of the probe point which matches the given file name at
the end and has the shortest file name path (e.g. given
.IR @var("foo@bar/baz.c")
and CUs with file name paths
.IR src/sub/module/bar/baz.c
and
.IR src/bar/baz.c
the second CU will be chosen to resolve the (file) global variable
.IR foo ).
.TP
$var\->field
traversal via a structure's or a pointer's field. This
generalized indirection operator may be repeated to follow more
levels. Note that the
.IR .
operator is not used for plain structure
members, only
.IR \->
for both purposes. (This is because "." is reserved for string
concatenation.)
.TP
$return
is available in return probes only for functions that are declared
with a return value.
.TP
$var[N]
indexes into an array. The index may be given as a literal number or even
an arbitrary numeric expression.
.PP
A number of operators exist for such basic context variable expressions:
.TP
$$vars
expands to a character string that is equivalent to
.SAMPLE
sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x",
        parm1, ..., parmN, var1, ..., varN)
.ESAMPLE
for each variable in scope at the probe point. Some values may be
printed as
.IR =?
if their run-time location cannot be found.
.TP
$$locals
expands to a subset of $$vars for only local variables.
.TP
$$parms
expands to a subset of $$vars for only function parameters.
.TP
$$return
is available in return probes only. It expands to a string that
is equivalent to sprintf("return=%x", $return)
if the probed function has a return value, or else an empty string.
.TP
& $EXPR
expands to the address of the given context variable expression, if it
is addressable.
.TP
@defined($EXPR)
expands to 1 if the given context variable expression is resolvable,
and to 0 otherwise, for use in conditionals such as
.SAMPLE
@defined($foo\->bar) ? $foo\->bar : 0
.ESAMPLE
.TP
$EXPR$
expands to a string with all of $EXPR's members, equivalent to
.SAMPLE
sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}",
        $EXPR\->a, $EXPR\->b)
.ESAMPLE
.TP
$EXPR$$
expands to a string with all of $EXPR's members and submembers, equivalent to
.SAMPLE
sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}",
        $EXPR\->a, $EXPR\->b, $EXPR\->c\->x, $EXPR\->c\->y, $EXPR\->d[0])
.ESAMPLE
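.PP
Several of these forms are combined in the sketch below. The probed
function
.IR vfs_read
and its parameter
.IR count
are assumptions about the target kernel; the
.IR @defined
test guards against the parameter being unavailable.
.SAMPLE
probe kernel.function("vfs_read") {
  printf("%s %s\\n", execname(), $$parms)
  if (@defined($count))
    printf("count=%d\\n", $count)
}
.ESAMPLE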

.SS MORE ON RETURN PROBES

.PP
For the kernel ".return" probes, only a certain fixed number of
returns may be outstanding. The default is a relatively small number,
on the order of a few times the number of physical CPUs. If many
different threads concurrently call the same blocking function, such
as futex(2) or read(2), this limit could be exceeded, and skipped
"kretprobes" would be reported by "stap -t". To work around this,
specify a
.SAMPLE
probe FOO.return.maxactive(NNN)
.ESAMPLE
suffix, with a large enough NNN to cover all expected concurrently blocked
threads. Alternately, use the
.SAMPLE
stap -DKRETACTIVE=NNNN
.ESAMPLE
stap command line macro setting to override the default for all
".return" probes.

.PP
For ".return" probes, context variables other than the "$return" may
be accessible, as a convenience for a script programmer wishing to
access function parameters. These values are \fBsnapshots\fP
taken at the time of function entry. Local variables within the
function are \fBnot\fP generally accessible, since those variables did
not exist in allocated/initialized form at the snapshot moment.
.PP
In addition, arbitrary entry-time expressions can also be saved for
".return" probes using the
.IR @entry(expr)
operator. For example, one can compute the elapsed time of a function:
.SAMPLE
probe kernel.function("do_filp_open").return {
    println( get_timeofday_us() \- @entry(get_timeofday_us()) )
}
.ESAMPLE

.PP
The following table summarizes how values related to a function
parameter context variable, a pointer named \fBaddr\fP, may be
accessed from a
.IR .return
probe.
.\" summarized from http://sourceware.org/ml/systemtap/2012-q1/msg00025.html
.TS
l l l.
\fBat-entry value	past-exit value\fP

$addr	\fInot available\fP
$addr->x->y	@cast(@entry($addr),"struct zz")->x->y
$addr[0]	{kernel,user}_{char,int,...}(& $addr[0])
.TE


.SS DWARFLESS
In the absence of debugging information, entry & exit points of kernel & module
functions can be probed using the "kprobe" family of probes.
However, these do not permit looking up the arguments / local variables
of the function.
The following constructs are supported:
.SAMPLE
kprobe.function(FUNCTION)
kprobe.function(FUNCTION).return
kprobe.module(NAME).function(FUNCTION)
kprobe.module(NAME).function(FUNCTION).return
kprobe.statement(ADDRESS).absolute
.ESAMPLE
.PP
Probes of type
.B function
are recommended for kernel functions, whereas probes of type
.B module
are recommended for probing functions of the specified module.
In case the absolute address of a kernel or module function is known,
.B statement
probes can be utilized.
.PP
Note that
.I FUNCTION
and
.I MODULE
names
.B must not
contain wildcards, or the probe will not be registered.
Also, statement probes may be run only in guru mode.
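.PP
A minimal sketch, assuming that
.IR vfs_read
names a function in the running kernel's symbol table:
.SAMPLE
probe kprobe.function("vfs_read") {
  printf("%s entered vfs_read\\n", execname())
}

probe kprobe.function("vfs_read").return {
  printf("%s left vfs_read\\n", execname())
}
.ESAMPLE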


.SS USER-SPACE
Support for user-space probing is available for kernels that are
configured with the utrace extensions, or have the uprobes facility in
Linux 3.5. (Various kernel build configuration options need to be
enabled; systemtap will advise if these are missing.)

.PP
There are several forms. First, a non-symbolic probe point:
.SAMPLE
process(PID).statement(ADDRESS).absolute
.ESAMPLE
is analogous to
.IR kernel.statement(ADDRESS).absolute
in that both use raw (unverified) virtual addresses and provide
no $variables. The target PID parameter must identify a running
process, and ADDRESS should identify a valid instruction address.
All threads of that process will be probed.
.PP
Second, non-symbolic user-kernel interface events handled by
utrace may be probed:
.SAMPLE
process(PID).begin
process("FULLPATH").begin
process.begin
process(PID).thread.begin
process("FULLPATH").thread.begin
process.thread.begin
process(PID).end
process("FULLPATH").end
process.end
process(PID).thread.end
process("FULLPATH").thread.end
process.thread.end
process(PID).syscall
process("FULLPATH").syscall
process.syscall
process(PID).syscall.return
process("FULLPATH").syscall.return
process.syscall.return
process(PID).insn
process("FULLPATH").insn
process(PID).insn.block
process("FULLPATH").insn.block
.ESAMPLE
.PP
A
.B .begin
probe gets called when a new process described by PID or FULLPATH gets created.
A
.B .thread.begin
probe gets called when a new thread described by PID or FULLPATH gets created.
A
.B .end
probe gets called when a process described by PID or FULLPATH dies.
A
.B .thread.end
probe gets called when a thread described by PID or FULLPATH dies.
A
.B .syscall
probe gets called when a thread described by PID or FULLPATH makes a
system call. The system call number is available in the
.BR $syscall
context variable, and the first 6 arguments of the system call
are available in the
.BR $argN
(e.g. $arg1, $arg2, ...) context variables.
A
.B .syscall.return
probe gets called when a thread described by PID or FULLPATH returns from a
system call. The system call number is available in the
.BR $syscall
context variable, and the return value of the system call is available
in the
.BR $return
context variable.
A
.B .insn
probe gets called for every single-stepped instruction of the process described by PID or FULLPATH.
A
.B .insn.block
probe gets called for every block-stepped instruction of the process described by PID or FULLPATH.
.PP
If a process probe is specified without a PID or FULLPATH, all user
threads will be probed. However, if systemtap was invoked with the
.IR \-c " or " \-x
options, then process probes are restricted to the process
hierarchy associated with the target process. If a process probe is
specified without a PID or FULLPATH, but with the
.IR \-c
option, the PATH of the
.IR \-c
cmd will be heuristically filled into the process PATH.
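.PP
As an illustrative sketch, the following script traces system calls made
by one program. The path
.IR /bin/ls
is only an example target.
.SAMPLE
probe process("/bin/ls").syscall {
  printf("syscall %d arg1=0x%x\\n", $syscall, $arg1)
}

probe process("/bin/ls").syscall.return {
  printf("syscall %d returned %d\\n", $syscall, $return)
}
.ESAMPLE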

.PP
Third, symbolic static instrumentation compiled into programs and
shared libraries may be
probed:
.SAMPLE
process("PATH").mark("LABEL")
process("PATH").provider("PROVIDER").mark("LABEL")
.ESAMPLE
.PP
A
.B .mark
probe gets called via a static probe which is defined in the
application by STAP_PROBE1(PROVIDER,LABEL,arg1), a macro defined in
.BR sys/sdt.h .
The PROVIDER is an arbitrary application identifier, LABEL is the
marker site identifier, and arg1 is the integer-typed argument.
STAP_PROBE1 is used for probes with 1 argument, STAP_PROBE2 is used
for probes with 2 arguments, and so on. The arguments of the probe
are available in the context variables $arg1, $arg2, ... An
alternative to using the STAP_PROBE macros is to use the dtrace script
to create custom macros. Additionally, the variables $$name and
$$provider are available as parts of the probe point name. The
.B sys/sdt.h
macro names DTRACE_PROBE* are available as aliases for STAP_PROBE*.
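.PP
On the script side, such a marker might be probed as sketched below. The
executable path
.IR /usr/bin/myapp
and the label
.IR request_start
are hypothetical, and the marker is assumed to carry one integer argument.
.SAMPLE
probe process("/usr/bin/myapp").mark("request_start") {
  printf("%s:%s arg1=%d\\n", $$provider, $$name, $arg1)
}
.ESAMPLE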

.PP
Finally, full symbolic source-level probes in user-space programs and
shared libraries are supported. These are exactly analogous to the
symbolic DWARF-based kernel/module probes described above. They
expose the same sorts of context $variables for function parameters,
local variables, and so on.
.SAMPLE
process("PATH").function("NAME")
process("PATH").statement("*@FILE.c:123")
process("PATH").plt("NAME")
process("PATH").library("PATH").plt("NAME")
process("PATH").library("PATH").function("NAME")
process("PATH").library("PATH").statement("*@FILE.c:123")
process("PATH").function("*").return
process("PATH").function("myfun").label("foo")
.ESAMPLE

.PP
Note that for all process probes,
.I PATH
names refer to executables that are searched for the same way shells do: relative
to the working directory if they contain a "/" character, otherwise in
.BR $PATH .
If PATH names refer to scripts, the actual interpreters (specified in the
script in the first line after the #! characters) are probed.
If PATH is a process component parameter referring to shared libraries
then all processes that map it at runtime would be selected for
probing. If PATH is a library component parameter referring to shared
libraries then the process specified by the process component would be
selected.

.PP
A .plt probe will probe functions in the program linkage table
corresponding to the rest of the probe point. .plt can be specified
as a shorthand for .plt("*"). The symbol name is available as a
$$name context variable; function arguments are not available, since
PLTs are processed without debuginfo.

.PP
If the PATH string contains wildcards as in the MPATTERN case, then
standard globbing is performed to find all matching paths. In this
case, the
.BR $PATH
environment variable is not used.

.PP
If systemtap was invoked with the
.IR \-c " or " \-x
options, then process probes are restricted to the process
hierarchy associated with the target process.

9cb48751
DS
794.SS PROCFS
795
796These probe points allow procfs "files" in
c243f608
LB
797/proc/systemtap/MODNAME to be created, read and written using a
798permission that may be modified using the proper umask value. Default permissions are 0400 for read
799probes, and 0200 for write probes. If both a read and write probe are being
800used on the same file, a default permission of 0600 will be used.
801Using procfs.umask(0040).read would
802result in a 0404 permission set for the file.
9cb48751
DS
803.RI ( MODNAME
804is the name of the systemtap module). The
805.I proc
806filesystem is a pseudo-filesystem which is used an an interface to
c243f608 807kernel data structures. There are several probe point variants supported
9cb48751 808by the translator:
ca88561f 809
9cb48751
DS
810.SAMPLE
811procfs("PATH").read
c243f608 812procfs("PATH").umask(UMASK).read
38975255 813procfs("PATH").read.maxsize(MAXSIZE)
c243f608 814procfs("PATH").umask(UMASK).read.maxsize(MAXSIZE)
9cb48751 815procfs("PATH").write
c243f608 816procfs("PATH").umask(UMASK).write
9cb48751 817procfs.read
c243f608 818procfs.umask(UMASK).read
38975255 819procfs.read.maxsize(MAXSIZE)
c243f608 820procfs.umask(UMASK).read.maxsize(MAXSIZE)
9cb48751 821procfs.write
c243f608 822procfs.umask(UMASK).write
9cb48751 823.ESAMPLE
ca88561f 824
9cb48751
DS
825.I PATH
826is the file name (relative to /proc/systemtap/MODNAME) to be created.
827If no
828.I PATH
829is specified (as in the last six variants above),
830.I PATH
831defaults to "command".
832.PP
833When a user reads /proc/systemtap/MODNAME/PATH, the corresponding
834procfs
835.I read
836probe is triggered. The string data to be read should be assigned to
837a variable named
838.IR $value ,
839like this:
ca88561f 840
9cb48751
DS
841.SAMPLE
842probe procfs("PATH").read { $value = "100\\n" }
843.ESAMPLE
844.PP
845When a user writes into /proc/systemtap/MODNAME/PATH, the
846corresponding procfs
847.I write
848probe is triggered. The data the user wrote is available in the
849string variable named
850.IR $value ,
851like this:
ca88561f 852
9cb48751
DS
853.SAMPLE
854probe procfs("PATH").write { printf("user wrote: %s", $value) }
855.ESAMPLE
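.PP
The two directions can be combined on one file. The following is a
sketch only; the file name "threshold" and the global variable are
illustrative assumptions:
.SAMPLE
global threshold = 100

# reading the file reports the current value
probe procfs("threshold").read {
  $value = sprintf("%d\\n", threshold)
}

# writing a decimal number to the file updates it
probe procfs("threshold").write {
  threshold = strtol($value, 10)
}
.ESAMPLE
Because both a read and a write probe exist for the file, it would get
the default 0600 permission described above.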
38975255
DS
856.PP
857.I MAXSIZE
858is the size of the procfs read buffer. Specifying
859.I MAXSIZE
860allows larger procfs output. If no
861.I MAXSIZE
862is specified, the procfs read buffer defaults to
863.I STP_PROCFS_BUFSIZE
864(which defaults to
865.IR MAXSTRINGLEN ,
866the maximum length of a string).
867If setting the procfs read buffers for more than one file is needed,
868it may be easiest to override the
869.I STP_PROCFS_BUFSIZE
870definition.
871Here's an example of using
872.IR MAXSIZE :
873
874.SAMPLE
875probe procfs.read.maxsize(1024) {
876 $value = "long string..."
877 $value .= "another long string..."
878 $value .= "another long string..."
879 $value .= "another long string..."
880}
881.ESAMPLE
9cb48751 882
da00b50e
SM
883.SS NETFILTER HOOKS
884
885These probe points allow observation of network packets using the
886netfilter mechanism. A netfilter probe in systemtap corresponds to a
887netfilter hook function in the original netfilter probes API. It is
888probably more convenient to use
889.IR tapset::netfilter (3stap),
890which wraps the primitive netfilter hooks and does the work of
891extracting useful information from the context variables.
892
893.PP
894There are several probe point variants supported by the translator:
895
896.SAMPLE
897netfilter.hook("HOOKNAME").pf("PROTOCOL_F")
898netfilter.pf("PROTOCOL_F").hook("HOOKNAME")
899netfilter.hook("HOOKNAME").pf("PROTOCOL_F").priority("PRIORITY")
900netfilter.pf("PROTOCOL_F").hook("HOOKNAME").priority("PRIORITY")
901.ESAMPLE
902
903.PP
904.I PROTOCOL_F
905is the protocol family to listen for, currently one of
906.I NFPROTO_IPV4,
907.I NFPROTO_IPV6,
908.I NFPROTO_ARP,
909or
910.I NFPROTO_BRIDGE.
911
912.PP
913.I HOOKNAME
914is the point, or 'hook', in the protocol stack at which to intercept
915the packet. The available hook names for each protocol family are
916taken from the kernel header files <linux/netfilter_ipv4.h>,
917<linux/netfilter_ipv6.h>, <linux/netfilter_arp.h> and
918<linux/netfilter_bridge.h>. For instance, allowable hook names for
919.I NFPROTO_IPV4
920are
921.I NF_INET_PRE_ROUTING,
922.I NF_INET_LOCAL_IN,
923.I NF_INET_FORWARD,
924.I NF_INET_LOCAL_OUT,
925and
926.I NF_INET_POST_ROUTING.
927
928.PP
929.I PRIORITY
930is an integer priority giving the order in which the probe point
931should be triggered relative to any other netfilter hook functions
932which trigger on the same packet. Hook functions execute on each
933packet in order from smallest priority number to largest priority number. If no
934.I PRIORITY
935is specified (as in the first two probe point variants above),
936.I PRIORITY
937defaults to "0".
938
939There are a number of predefined priority names of the form
940.I NF_IP_PRI_*
941and
942.I NF_IP6_PRI_*
943which are defined in the kernel header files <linux/netfilter_ipv4.h> and <linux/netfilter_ipv6.h> respectively. The script is permitted to use these
944instead of specifying an integer priority. (The probe points for
945.I NFPROTO_ARP
946and
947.I NFPROTO_BRIDGE
948currently do not expose any named hook priorities to the script writer.)
949Thus, allowable ways to specify the priority include:
950
951.SAMPLE
952priority("255")
953priority("NF_IP_PRI_SELINUX_LAST")
954.ESAMPLE
955
956A script using guru mode is permitted to specify any identifier or
957number as the parameter for hook, pf, and priority. This feature
958should be used with caution, as the parameter is inserted verbatim into
959the C code generated by systemtap.
960
961The netfilter probe points define the following context variables:
962.TP
963.IR $skb
964The address of the sk_buff struct representing the packet. See
965<linux/skbuff.h> for details on how to use this struct, or
966alternatively use the tapset
967.IR tapset::netfilter (3stap)
968for easy access to key information.
969
970.TP
971.IR $in
972The address of the net_device struct representing the network device
973on which the packet was received (if any). May be 0 if the device is
974unknown or undefined at that stage in the protocol stack.
975
976.TP
977.IR $out
978The address of the net_device struct representing the network device
979on which the packet will be sent (if any). May be 0 if the device is
980unknown or undefined at that stage in the protocol stack.
981
982.TP
983.IR $verdict
984(Guru mode only.) Assigning one of the verdict values defined in
985<linux/netfilter.h> to this variable alters the further progress of
986the packet through the protocol stack. For instance, the following
987guru mode script forces all IPv6 network packets to be dropped:
988
989.SAMPLE
990probe netfilter.pf("NFPROTO_IPV6").hook("NF_IP6_PRE_ROUTING") {
c49ffe6c 991 $verdict = 0 /* nf_drop */
da00b50e
SM
992}
993.ESAMPLE
994
c49ffe6c
SM
995For convenience, unlike the primitive probe points discussed here, the
996probes defined in
997.IR tapset::netfilter (3stap)
998export the lowercase names of the verdict constants (e.g. NF_DROP
999becomes nf_drop) as local variables.
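.PP
As an illustration of the primitive probe points, a non-guru script
might simply log each incoming IPv4 packet. This is a sketch; the
choice of hook and priority is arbitrary:
.SAMPLE
probe netfilter.pf("NFPROTO_IPV4").hook("NF_INET_PRE_ROUTING").priority("NF_IP_PRI_SELINUX_LAST") {
  printf("skb=%p in=%p out=%p\\n", $skb, $in, $out)
}
.ESAMPLE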
1000
6f05b6ab
FCE
1001.SS MARKERS
1002
1003This family of probe points hooks up to static probing markers
1004inserted into the kernel or modules. These markers are special macro
1005calls inserted by kernel developers to make probing faster and more
1006reliable than with DWARF-based probes. Further, DWARF debugging
1007information is
1008.I not
1009required to probe markers.
1010
1011Marker probe points begin with
f781f849
DS
1012.BR kernel .
1013The next part names the marker itself:
6f05b6ab
FCE
1014.BR mark("name") .
1015The marker name string, which may contain the usual wildcard characters,
1016is matched against the names given to the marker macros when the kernel
eb973c2a
DS
1017and/or module was compiled. Optionally, you can specify
1018.BR format("format") .
37f6433e 1019Specifying the marker format string allows differentiation between two
eb973c2a 1020markers with the same name but different marker format strings.
6f05b6ab
FCE
1021
1022The handler associated with a marker-based probe may read the
1023optional parameters specified at the macro call site. These are
1024named
1025.BR $arg1 " through " $argNN ,
1026where NN is the number of parameters supplied by the macro. Number
1027and string parameters are passed in a type-safe manner.
1028
eb973c2a
DS
1029The marker format string associated with a marker is available in
1030.BR $format .
37f6433e 1031The marker name string is likewise available in
bc54e71c 1032.BR $name .
eb973c2a 1033
bc724b8b
JS
1034.SS TRACEPOINTS
1035
1036This family of probe points hooks up to static probing tracepoints
1037inserted into the kernel or modules. As with markers, these
1038tracepoints are special macro calls inserted by kernel developers to
1039make probing faster and more reliable than with DWARF-based probes,
1040and DWARF debugging information is not required to probe tracepoints.
1041Tracepoints have an extra advantage of more strongly-typed parameters
1042than markers.
1043
1044Tracepoint probes begin with
1045.BR kernel .
1046The next part names the tracepoint itself:
1047.BR trace("name") .
1048The tracepoint name string, which may contain the usual wildcard
1049characters, is matched against the names defined by the kernel
1050developers in the tracepoint header files.
1051
1052The handler associated with a tracepoint-based probe may read the
1053optional parameters specified at the macro call site. These are
1054named according to the declaration by the tracepoint author. For
1055example, the tracepoint probe
1056.BR kernel.trace("sched_switch")
1057provides the parameters
1058.BR $rq ", " $prev ", and " $next .
1059If the parameter is a complex type, as in a struct pointer, then a
1060script can access fields with the same syntax as DWARF $target
1061variables. Also, tracepoint parameters cannot be modified, but in
1062guru-mode a script may modify fields of parameters.
1063
1064The name of the tracepoint is available in
1065.BR $$name ,
1066and a string of name=value pairs for all parameters of the tracepoint
1067is available in
046e7190 1068.BR $$vars " or " $$parms .
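.PP
For instance, a sketch of a context-switch tracer, assuming the running
kernel's sched_switch tracepoint declares prev and next as task_struct
pointers:
.SAMPLE
probe kernel.trace("sched_switch") {
  printf("%s: %d \-> %d\\n", $$name, $prev\->pid, $next\->pid)
}
.ESAMPLE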
bc724b8b 1069
dd225250
PS
1070.SS HARDWARE BREAKPOINTS
1071This family of probes is used to set hardware watchpoints for a given
1072(global) kernel symbol. The probes take three components as inputs:
1073
10741. The
1075.B "virtual address / name"
1076of the kernel symbol to be traced is supplied as argument to this class
1077of probes. (Only variables in the data segment can be probed;
1078local variables of a function cannot.)
1079
10802. Nature of access to be probed:
1081a.
1082.I .write
1083probe is triggered when a write happens at the specified address/symbol
1084name.
1085b.
1086.I .rw
1087probe is triggered when either a read or write happens.
1088
10893.
1090.BR .length
1091(optional)
1092Users have the option of specifying the address interval to be probed
1093using "length" constructs. The user-specified length gets approximated
1094to the closest possible address length that the architecture can
1095support. If the specified length exceeds the limits imposed by
1096architecture, an error message is flagged and probe registration fails.
1097Wherever 'length' is not specified, the translator requests a hardware
1098breakpoint probe of length 1. It should be noted that the "length"
1099construct is not valid with symbol names.
1100
1101The following constructs are supported:
1102.SAMPLE
1103probe kernel.data(ADDRESS).write
1104probe kernel.data(ADDRESS).rw
1105probe kernel.data(ADDRESS).length(LEN).write
1106probe kernel.data(ADDRESS).length(LEN).rw
1107probe kernel.data("SYMBOL_NAME").write
1108probe kernel.data("SYMBOL_NAME").rw
1109.ESAMPLE
1110
1111This set of probes makes use of the debug registers of the processor,
1112which are a scarce resource (4 on x86, 1 on powerpc). The script
1113translation flags a warning if a user requests more hardware breakpoint probes
1114than the limits set by architecture. For example, a pass-2 warning is issued
1115when an input script requests 5 hardware breakpoint probes on an x86
1116system while x86 architecture supports a maximum of 4 breakpoints.
1117Users are cautioned to set probes judiciously.
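.PP
A sketch of a complete probe, using the kernel's pid_max variable as the
watched symbol:
.SAMPLE
probe kernel.data("pid_max").write {
  printf("pid_max written by %s (pid %d)\\n", execname(), pid())
}
.ESAMPLE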
1118
9becfcef
MW
1119.SS PERF
1120
1121This
1122.IR prototype
1123family of probe points interfaces to the kernel "perf event"
cb7d3cd8 1124infrastructure for controlling hardware performance counters.
9becfcef
MW
1125The events being attached to are described by the "type" and
1126"config" fields of the
1127.IR perf_event_attr
1128structure, and are sampled at an interval governed by the
1129"sample_period" field.
1130
1131These fields are made available to systemtap scripts using
1132the following syntax:
1133.SAMPLE
1134probe perf.type(NN).config(MM).sample(XX)
1135probe perf.type(NN).config(MM)
dbdab5c8
SC
1136probe perf.type(NN).config(MM).process("PROC")
1137probe perf.type(NN).config(MM).counter("COUNTER")
1138probe perf.type(NN).config(MM).process("PROC").counter("COUNTER")
9becfcef
MW
1139.ESAMPLE
1140The systemtap probe handler is called once per XX increments
1141of the underlying performance counter. The default sampling
1142count is 1000000.
1143The range of valid type/config is described by the
1144.IR perf_event_open (2)
1145system call, and/or the
1146.IR linux/perf_event.h
1147file. Invalid combinations or exhausted hardware counter resources
1148result in errors during systemtap script startup. Systemtap does
1149not sanity-check the values: it merely passes them through to
6a8fe809
SC
1150the kernel for error- and safety-checking. By default the perf event
1151probe is systemwide unless .process is specified, which will bind the
fce2c5df 1152probe to a specific task. If the name is omitted then it
dbdab5c8 1153is inferred from the stap -c argument. A perf event can be read on
75cd04ca
SC
1154demand using .counter. The body of the perf probe handler will not be
1155invoked for a .counter probe; instead, the counter is read in a user
1156space probe via:
dbdab5c8
SC
1157.TP
1158 process("PROCESS").statement("func@file") {stat <<< @perf("NAME")}
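.PP
As a sketch of the sampling form, assuming the usual numbering in
linux/perf_event.h where type 0 is PERF_TYPE_HARDWARE and config 0 is
PERF_COUNT_HW_CPU_CYCLES, the following counts samples per process name
until the script is interrupted, then prints the ten busiest names:
.SAMPLE
global samples

probe perf.type(0).config(0).sample(1000000) {
  samples[execname()]++
}

probe end {
  foreach (name in samples\- limit 10)
    printf("%s %d\\n", name, samples[name])
}
.ESAMPLE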
1159
fce2c5df 1160
ba4a90fd
FCE
1161.SH EXAMPLES
1162.PP
1163Here are some example probe points, defining the associated events.
1164.TP
1165begin, end, end
1166refers to the startup and normal shutdown of the session. In this
1167case, the handler would run once during startup and twice during
1168shutdown.
1169.TP
1170timer.jiffies(1000).randomize(200)
13d2ecdb 1171refers to a periodic interrupt, every 1000 +/\- 200 jiffies.
ba4a90fd
FCE
1172.TP
1173kernel.function("*init*"), kernel.function("*exit*")
1174refers to all kernel functions with "init" or "exit" in the name.
1175.TP
199d126d
MW
1176kernel.function("*@kernel/time.c:240")
1177refers to any functions within the "kernel/time.c" file that span
6ff00e1d
FCE
1178line 240.
1179.B
1180Note
1181that this is
1182.BR not
1183a probe at the statement at that line number. Use the
1184.I
1185kernel.statement
1186probe instead.
ba4a90fd 1187.TP
6f05b6ab
FCE
1188kernel.mark("getuid")
1189refers to an STAP_MARK(getuid, ...) macro call in the kernel.
1190.TP
ba4a90fd
FCE
1191module("usb*").function("*sync*").return
1192refers to the moment of return from all functions with "sync" in the
1193name in any of the USB drivers.
1194.TP
1195kernel.statement(0xc0044852)
1196refers to the first byte of the statement whose compiled instructions
1197include the given address in the kernel.
b4ceace2 1198.TP
199d126d
MW
1199kernel.statement("*@kernel/time.c:296")
1200refers to the statement of line 296 within "kernel/time.c".
1bd128a3
SC
1201.TP
1202kernel.statement("bio_init@fs/bio.c+3")
1203refers to the statement at line bio_init+3 within "fs/bio.c".
a5ae3f3d 1204.TP
dd225250 1205kernel.data("pid_max").write
cb7d3cd8 1206refers to a hardware breakpoint of type "write" set on pid_max.
dd225250 1207.TP
729286d8 1208syscall.*.return
b4ceace2 1209refers to the return probes of all syscall probe aliases; the wildcard matches any name in the second position.
ba4a90fd
FCE
1210
1211.SH SEE ALSO
78db65bd 1212.IR stap (1),
89965a32
FCE
1213.IR probe::* (3stap),
1214.IR tapset::* (3stap)
1c0b8e23
FCE
1215
1216.\" Local Variables:
1217.\" mode: nroff
1218.\" End: