sourceware.org Git - systemtap.git/blame - man/stapprobes.3stap
PR21020: reorganize data passing abi for java method parameters
Commit Line Data
5f92f126 1.\" t
ec1a2239 2.TH STAPPROBES 3stap
ba4a90fd
FCE
3.SH NAME
4stapprobes \- systemtap probe points
5
6.\" macros
7.de SAMPLE
fc6851a6
JL
8
9.nr oldin \\n(.i
ba4a90fd
FCE
10.br
11.RS
12.nf
13.nh
14..
15.de ESAMPLE
16.hy
17.fi
18.RE
fc6851a6
JL
19.in \\n[oldin]u
20
ba4a90fd
FCE
21..
22
23.SH DESCRIPTION
24The following sections enumerate the variety of probe points supported
89965a32
FCE
25by the systemtap translator, and some of the additional aliases defined by
26standard tapset scripts. Many are individually documented in the
27.IR 3stap
28manual section, with the
29.IR probe::
30prefix.
67d1ed18
FCE
31
32.SH SYNTAX
33
34.PP
35.SAMPLE
36.BR probe " PROBEPOINT [" , " PROBEPOINT] " { " [STMT ...] " }
37.ESAMPLE
38.PP
39A probe declaration may list multiple comma-separated probe points in
40order to attach a handler to all of the named events. Normally, the
f5dfa571
FCE
41handler statements are run whenever any of the events occur. Depending on
42the type of probe point, the handler statements may refer to context
43variables (denoted with a dollar-sign prefix like $foo) to read or
44write state. This may include function parameters for function
45probes, or local variables for statement probes.
ba4a90fd 46.PP
67d1ed18
FCE
47The syntax of a single probe point is a general dotted-symbol
48sequence. This allows a breakdown of the event namespace into parts,
49somewhat like the Domain Name System does on the Internet. Each
50component identifier may be parametrized by a string or number
51literal, with a syntax like a function call. A component may include
52a "*" character, to expand to a set of matching probe points. It may
53also include "**" to match multiple sequential components at once.
54Probe aliases likewise expand to other probe points.
2f5bbffa 55.PP
67d1ed18
FCE
56Probe aliases can be given on their own, or with a suffix. The suffix
57attaches to the underlying probe point that the alias is expanded
58to. For example,
2f5bbffa
SM
59.SAMPLE
60syscall.read.return.maxactive(10)
61.ESAMPLE
62expands to
63.SAMPLE
64kernel.function("sys_read").return.maxactive(10)
65.ESAMPLE
66with the component
67.IR maxactive(10)
68being recognized as a suffix.
69.PP
67d1ed18
FCE
70Normally, each and every probe point resulting from wildcard- and
71alias-expansion must be resolved to some low-level system
72instrumentation facility (e.g., a kprobe address, marker, or a timer
73configuration), otherwise the elaboration phase will fail.
d898100a
FCE
74.PP
75However, a probe point may be followed by a "?" character, to indicate
76that it is optional, and that no error should result if it fails to
77resolve. Optionalness passes down through all levels of
78alias/wildcard expansion. Alternately, a probe point may be followed
79by a "!" character, to indicate that it is both optional and
37f6433e 80sufficient. (Think vaguely of the Prolog cut operator.) If it does
d898100a
FCE
81resolve, then no further probe points in the same comma-separated list
82will be resolved. Therefore, the "!" sufficiency mark only makes
83sense in a list of probe point alternatives.
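.PP
As an illustrative sketch of the two markers (the function names below
are placeholders, not real kernel symbols), the following probe uses
whichever alternative resolves first. The trailing "never" is assumed
here to give the declaration something to resolve to if neither does:
.SAMPLE
probe kernel.function("new_name") !,
      kernel.function("old_name") !,
      never
{
    println(pp())
}
.ESAMPLE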
dfd11cc3
MH
84.PP
85Additionally, a probe point may be followed by a "if (expr)" statement, in
86order to enable/disable the probe point on-the-fly. With the "if" statement,
87if the "expr" is false when the probe point is hit, the whole probe body
88including alias's body is skipped. The condition is stacked up through
89all levels of alias/wildcard expansion. So the final condition becomes
67d1ed18
FCE
90the logical-and of conditions of all expanded alias/wildcard. The expressions
91are necessarily restricted to global variables.
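.PP
A minimal sketch of a probe point condition, assuming a script-defined
global named tracing (a hypothetical name) that another probe toggles:
.SAMPLE
global tracing = 0

probe syscall.open if (tracing) {
    println(name . "(" . argstr . ")")
}

probe timer.s(5) {
    tracing = !tracing
}
.ESAMPLE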
92.PP
e904ad95
FCE
93These are all
94.B syntactically
95valid probe points. (They are generally
96.B semantically
97invalid, depending on the contents of the tapsets, and the versions of
98kernel/user software installed.)
ca88561f 99
ba4a90fd
FCE
100.SAMPLE
101kernel.function("foo").return
e904ad95 102process("/bin/vi").statement(0x2222)
ba4a90fd 103end
729286d8 104syscall.*
2f5bbffa 105syscall.*.return.maxactive(10)
380d759b 106syscall.{open,close}
649260f3 107sys**open
6e3347a9 108kernel.function("no_such_function") ?
d898100a 109module("awol").function("no_such_function") !
dfd11cc3 110signal.*? if (switch)
94c3c803 111kprobe.function("foo")
ba4a90fd
FCE
112.ESAMPLE
113
6f05b6ab
FCE
114Probes may be broadly classified into "synchronous" and
115"asynchronous". A "synchronous" event is deemed to occur when any
116processor executes an instruction matched by the specification. This
117gives these probes a reference point (instruction address) from which
118more contextual data may be available. Other families of probe points
119refer to "asynchronous" events such as timers/counters rolling over,
120where there is no fixed reference point that is related. Each probe
121point specification may match multiple locations (for example, using
122wildcards or aliases), and all of them are then probed. A probe
123declaration may also contain several comma-separated specifications,
124all of which are probed.
125
380d759b
FL
126Brace expansion is a mechanism which allows a list of probe points to be
127generated. It is very similar to shell expansion. A component may be surrounded
128by a pair of curly braces to indicate that the comma-separated sequence of
129one or more subcomponents will each constitute a new probe point. The braces
130may be arbitrarily nested. The ordering of expanded results is based on
131product order.
132
133The question mark (?), exclamation mark (!) indicators and probe point conditions
134may not be placed in any expansions that are before the last component.
135
136The following is an example of brace expansion.
137
138.SAMPLE
139syscall.{write,read}
140# Expands to
141syscall.write, syscall.read
142
143{kernel,module("nfs")}.function("nfs*")!
144# Expands to
145kernel.function("nfs*")!, module("nfs").function("nfs*")!
146.ESAMPLE
147
5f92f126
FCE
148.SH DWARF DEBUGINFO
149
150Resolving some probe points requires DWARF debuginfo or "debug
c5ae4566 151symbols" for the \fIspecific program\fR being instrumented. For some others,
5f92f126
FCE
152DWARF is automatically synthesized on the fly from source code header
153files. For others, it is not needed at all. Since a systemtap script
154may use any mixture of probe points together, the union of their DWARF
155requirements has to be met on the computer where script compilation
156occurs. (See the \fI\-\-use\-server\fR option and the \fBstap-server\
157(8)\fR man page for information about the remote compilation facility,
158which allows these requirements to be met on a different machine.)
159.PP
160The following table lists many of the available probe point families,
c5ae4566
FCE
161to classify them with respect to their need for DWARF debuginfo for
162the specific program for that probe point.
5f92f126
FCE
163
164.TS
165l l l.
c899cb78 166\fBDWARF NON-DWARF SYMBOL-TABLE\fP
5f92f126 167
c899cb78
FCE
168kernel.function, .statement kernel.mark kernel.function\fI*\fP
169module.function, .statement process.mark, process.plt module.function\fI*\fP
170process.function, .statement begin, end, error, never process.function\fI*\fP
171process.mark\fI*\fP timer
172\.function.callee perf
7bfd1083 173 procfs
30b02257 174\fBAUTO-GENERATED-DWARF\fP kernel.statement.absolute
7bfd1083
TJL
175 kernel.data
176kernel.trace kprobe.function
177 process.statement.absolute
94fe8dd0 178 process.begin, .end
30b02257
FCE
179 netfilter
180 java
5f92f126
FCE
181.TE
182
c899cb78
FCE
183.PP
184The probe types marked with \fI*\fP asterisks are fallbacks, where
185systemtap can sometimes infer subset or substitute information. In
186general, the more symbolic / debugging information available, the
187higher quality probing will be available.
188
189
94fe8dd0
FCE
190.SH ON-THE-FLY ARMING
191
192The following types of probe points may be armed/disarmed on-the-fly
193to save overheads during uninteresting times. Arming conditions may
194also be added to other types of probes, but will be treated as a
195wrapping conditional and won't benefit from overhead savings.
196
197.TS
198l l.
199\fBDISARMABLE exceptions\fP
94fe8dd0
FCE
200kernel.function, kernel.statement
201module.function, module.statement
202process.*.function, process.*.statement
203process.*.plt, process.*.mark
85176706 204timer. timer.profile
94fe8dd0
FCE
205java
206.TE
207
5f92f126
FCE
208.SH PROBE POINT FAMILIES
209
65aeaea0 210.SS BEGIN/END/ERROR
ba4a90fd
FCE
211
212The probe points
213.IR begin " and " end
214are defined by the translator to refer to the time of session startup
215and shutdown. All "begin" probe handlers are run, in some sequence,
216during the startup of the session. All global variables will have
217been initialized prior to this point. All "end" probes are run, in
218some sequence, during the
219.I normal
220shutdown of a session, such as in the aftermath of an
221.I exit ()
222function call, or an interruption from the user. In the case of an
223error-triggered shutdown, "end" probes are not run. There are no
224target variables available in either context.
6a256b03
JS
225.PP
226If the order of execution among "begin" or "end" probes is significant,
227then an optional sequence number may be provided:
ca88561f 228
6a256b03
JS
229.SAMPLE
230begin(N)
231end(N)
232.ESAMPLE
ca88561f 233
6a256b03
JS
234The number N may be positive or negative. The probe handlers are run in
235increasing order, and the order between handlers with the same sequence
236number is unspecified. When "begin" or "end" are given without a
237sequence, they are effectively sequence zero.
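.PP
For instance, a sketch of ordering handlers with sequence numbers (the
messages are illustrative only):
.SAMPLE
probe begin(\-1) { println("runs before unnumbered begin probes") }
probe begin      { println("sequence zero") }
probe end(1)     { println("runs after unnumbered end probes") }
.ESAMPLE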
ba4a90fd 238
65aeaea0
FCE
239The
240.IR error
241probe point is similar to the
242.IR end
d898100a
FCE
243probe, except that each such probe handler is run when the session ends
244after errors have occurred. In such cases, "end" probes are skipped,
37f6433e 245but each "error" probe is still attempted. This kind of probe can be
d898100a
FCE
246used to clean up or emit a "final gasp". It may also be numerically
247parametrized to set a sequence.
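.PP
A sketch of an "error" probe used for a final message (the message text
is illustrative):
.SAMPLE
probe error {
    println("session ending after an error")
}
.ESAMPLE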
65aeaea0 248
6e3347a9
FCE
249.SS NEVER
250The probe point
251.IR never
252is specially defined by the translator to mean "never". Its probe
253handler is never run, though its statements are analyzed for symbol /
254type correctness as usual. This probe point may be useful in
255conjunction with optional probes.
256
bafd76f1 257.SS SYSCALL and ND_SYSCALL
1027502b
FCE
258
259The
bafd76f1 260.IR syscall.* " and " nd_syscall.*
1027502b 261aliases define several hundred probes, too many to
56bd0316 262detail here. They are of the general form:
1027502b
FCE
263
264.SAMPLE
265syscall.NAME
266.br
bafd76f1
FCE
267nd_syscall.NAME
268.br
1027502b 269syscall.NAME.return
bafd76f1
FCE
270.br
271nd_syscall.NAME.return
1027502b
FCE
272.ESAMPLE
273
bafd76f1 274Generally, a pair of probes are defined for each normal system call as listed in the
1027502b
FCE
275.IR syscalls(2)
276manual page, one for entry and one for return. Those system calls that never
277return do not have a corresponding
278.IR .return
bafd76f1
FCE
279probe. The nd_* family of probes are about the same, except it uses
280.B non-DWARF
281based searching mechanisms, which may result in a lower quality of symbolic
282context data (parameters), and may miss some system calls. You may want to
283try them first, in case kernel debugging information is not immediately available.
1027502b 284.PP
df7f3a01 285Each probe alias provides a variety of variables. Looking at the tapset source
1027502b
FCE
286code is the most reliable way to find them. Generally, each variable listed in the standard
287manual page is made available as a script-level variable, so
288.IR syscall.open
289exposes
290.IR filename ", " flags ", and " mode .
291In addition, a standard suite of variables is available at most aliases:
292.TP
293.IR argstr
294A pretty-printed form of the entire argument list, without parentheses.
295.TP
296.IR name
297The name of the system call.
298.TP
299.IR retstr
300For return probes, a pretty-printed form of the system-call result.
301.PP
08d1d743
FCE
302As usual for probe aliases, these variables are all initialized once
303from the underlying $context variables, so that later changes to
df7f3a01
FCE
304$context variables are not automatically reflected. Not all probe
305aliases obey all of these general guidelines. Please report any
08d1d743
FCE
306bothersome ones you encounter as a bug. Note that on some
307kernel/userspace architecture combinations (e.g., 32-bit userspace on
30864-bit kernel), the underlying $context variables may need explicit
309sign extension / masking. When this is an issue, consider using the
310tapset-provided variables instead of raw $context variables.
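.PP
As a sketch of the standard variables described above, the following
pair of probes prints each call and its result. The syscall.open alias
is used only because it is the example named earlier, and it may not
exist on every kernel:
.SAMPLE
probe syscall.open {
    println(name . "(" . argstr . ")")
}
probe syscall.open.return {
    println(name . " = " . retstr)
}
.ESAMPLE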
c34eceea
FCE
311.PP
312If debuginfo availability is a problem, you may try using the
313non-DWARF syscall probe aliases instead. Use the
314.IR nd_syscall.
315prefix instead of
316.IR syscall.
317The same context variables are available, as far as possible.
1027502b 318
ba4a90fd
FCE
319.SS TIMERS
320
42b97387
DS
321There are two main types of timer probes: "jiffies" timer probes and
322time interval timer probes.
323
ba4a90fd
FCE
324Intervals defined by the standard kernel "jiffies" timer may be used
325to trigger probe handlers asynchronously. Two probe point variants
326are supported by the translator:
ca88561f 327
ba4a90fd
FCE
328.SAMPLE
329timer.jiffies(N)
330timer.jiffies(N).randomize(M)
331.ESAMPLE
ca88561f 332
ba4a90fd
FCE
333The probe handler is run every N jiffies (a kernel-defined unit of
334time, typically between 1 and 60 ms). If the "randomize" component is
13d2ecdb 335given, a uniformly distributed random value in the range [\-M..+M] is
ba4a90fd
FCE
336added to N every time the handler is run. N is restricted to a
337reasonable range (1 to around a million), and M is restricted to be
338smaller than N. There are no target variables provided in either
339context. It is possible for such probes to be run concurrently on
340a multi-processor computer.
422d1ceb 341.PP
197a4d62 342Alternatively, intervals may be specified in units of time.
422d1ceb 343There are two probe point variants similar to the jiffies timer:
ca88561f 344
422d1ceb
FCE
345.SAMPLE
346timer.ms(N)
347timer.ms(N).randomize(M)
348.ESAMPLE
ca88561f 349
197a4d62
JS
350Here, N and M are specified in milliseconds, but the full options for units
351are seconds (s/sec), milliseconds (ms/msec), microseconds (us/usec),
352nanoseconds (ns/nsec), and hertz (hz). Randomization is not supported for
353hertz timers.
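.PP
A sketch combining several of these units (the intervals are arbitrary
choices for illustration):
.SAMPLE
probe timer.ms(500).randomize(50) { println("about every half second") }
probe timer.hz(10)                { println("ten times per second") }
probe timer.s(30)                 { exit() }
.ESAMPLE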
354
355The actual resolution of the timers depends on the target kernel. For
356kernels prior to 2.6.17, timers are limited to jiffies resolution, so
357intervals are rounded up to the nearest jiffies interval. After 2.6.17,
358the implementation uses hrtimers for tighter precision, though the actual
359resolution will be arch-dependent. In either case, if the "randomize"
360component is given, then the random value will be added to the interval
361before any rounding occurs.
39e57ce0 362.PP
ab8b5560 363Profiling timers are also available to provide probes that execute on
07c818a0
FL
364all CPUs at the rate of the system tick (CONFIG_HZ) or at a given
365frequency (hz). On some kernels, this is a one-concurrent-user-only or
e996e76a 366disabled facility, resulting in error \-16 (EBUSY) during probe
ab8b5560 367registration.
ca88561f 368
39e57ce0 369.SAMPLE
acf7bde9 370timer.profile.tick
07c818a0 371timer.profile.freq.hz(N)
39e57ce0 372.ESAMPLE
ca88561f 373
39e57ce0
FCE
374Full context information of the interrupted process is available, making
375this probe suitable for a time-based sampling profiler.
acf7bde9
SM
376.PP
377It is recommended to use the tapset probe
378.IR timer.profile
379rather than timer.profile.tick. This probe point behaves identically
380to timer.profile.tick when the underlying functionality is available,
381and falls back to using perf.sw.cpu_clock on some recent kernels which
382lack the corresponding profile timer facility.
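.PP
A minimal sketch of such a sampling profiler, assuming a script-defined
array named hits (a hypothetical name) keyed by executable name:
.SAMPLE
global hits

probe timer.profile {
    hits[execname()]++
}

probe end {
    # Print the ten most frequently sampled executables.
    foreach (exe in hits\- limit 10)
        println(exe . " " . sprint(hits[exe]))
}
.ESAMPLE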
07c818a0
FL
383.PP
384Profiling timers with specified frequencies are only accurate up to around
385100 hz. You may need to provide a larger value to achieve the desired
386rate.
42b97387
DS
387.PP
388Note that if a timer probe is set to fire at a very high rate
389and if the probe body is complex, succeeding timer probes can get
390skipped, since the time for them to run has already passed. Normally
391systemtap reports missed probes, but it will not report these skipped
392probes.
ba4a90fd
FCE
393
394.SS DWARF
395
396This family of probe points uses symbolic debugging information for
397the target kernel/module/program, as may be found in unstripped
398executables, or the separate
399.I debuginfo
400packages. They allow placement of probes logically into the execution
401path of the target program, by specifying a set of points in the
402source or object code. When a matching statement executes on any
403processor, the probe handler is run in that context.
404.PP
7c86df9f
JL
405Probe points in the DWARF family can be identified by the target kernel
406module (or user process), source file, line number, function name, or
407some combination of these.
408.PP
409Here is a list of DWARF probe points currently supported:
ba4a90fd
FCE
410.SAMPLE
411kernel.function(PATTERN)
b8da0ad1 412kernel.function(PATTERN).call
7c86df9f 413kernel.function(PATTERN).callee(PATTERN)
f0e06c0d
FL
414kernel.function(PATTERN).callee(PATTERN).return
415kernel.function(PATTERN).callee(PATTERN).call
7c86df9f 416kernel.function(PATTERN).callees(DEPTH)
ba4a90fd 417kernel.function(PATTERN).return
b8da0ad1 418kernel.function(PATTERN).inline
592470cd 419kernel.function(PATTERN).label(LPATTERN)
ba4a90fd 420module(MPATTERN).function(PATTERN)
b8da0ad1 421module(MPATTERN).function(PATTERN).call
7c86df9f 422module(MPATTERN).function(PATTERN).callee(PATTERN)
f0e06c0d
FL
423module(MPATTERN).function(PATTERN).callee(PATTERN).return
424module(MPATTERN).function(PATTERN).callee(PATTERN).call
7c86df9f 425module(MPATTERN).function(PATTERN).callees(DEPTH)
ba4a90fd 426module(MPATTERN).function(PATTERN).return
b8da0ad1 427module(MPATTERN).function(PATTERN).inline
2cab6244 428module(MPATTERN).function(PATTERN).label(LPATTERN)
ba4a90fd 429kernel.statement(PATTERN)
5e758862 430kernel.statement(PATTERN).nearest
37ebca01 431kernel.statement(ADDRESS).absolute
ba4a90fd 432module(MPATTERN).statement(PATTERN)
6f017dee 433process("PATH").function("NAME")
6f017dee 434process("PATH").statement("*@FILE.c:123")
b73a1293 435process("PATH").library("PATH").function("NAME")
b73a1293 436process("PATH").library("PATH").statement("*@FILE.c:123")
5e758862 437process("PATH").library("PATH").statement("*@FILE.c:123").nearest
6f017dee 438process("PATH").function("*").return
6f017dee 439process("PATH").function("myfun").label("foo")
7c86df9f 440process("PATH").function("foo").callee("bar")
f0e06c0d
FL
441process("PATH").function("foo").callee("bar").return
442process("PATH").function("foo").callee("bar").call
7c86df9f 443process("PATH").function("foo").callees(DEPTH)
af127b9f
AJ
444process(PID).function("NAME")
445process(PID).function("myfun").label("foo")
2e96714f
SC
446process(PID).plt("NAME")
447process(PID).plt("NAME").return
af127b9f 448process(PID).statement("*@FILE.c:123")
5e758862 449process(PID).statement("*@FILE.c:123").nearest
5fa99496 450process(PID).statement(ADDRESS).absolute
ba4a90fd 451.ESAMPLE
6f017dee
FCE
452(See the USER-SPACE section below for more information on the process
453probes.)
7c86df9f
JL
454.PP
455The list above includes multiple variants and modifiers which provide
456additional functionality or filters. They are:
457.RS
458.TP
459\fB.function\fR
460Places a probe near the beginning of the named function, so that
461parameters are available as context variables.
462.TP
463\fB.return\fR
464Places a probe at the moment \fBafter\fR the return from the named
465function, so the return value is available as the "$return" context
466variable.
467.TP
468\fB.inline\fR
469Filters the results to include only instances of inlined functions. Note
7f357865 470that inlined functions do not have an identifiable return point, so
7c86df9f
JL
471\fB.return\fR is not supported on \fB.inline\fR probes.
472.TP
473\fB.call\fR
474Filters the results to include only non-inlined functions (the opposite
475set of \fB.inline\fR).
476.TP
477\fB.exported\fR
478Filters the results to include only exported functions.
479.TP
7c86df9f 480\fB.statement\fR
7f357865 481Places a probe at the exact spot, exposing those local variables that
7c86df9f
JL
482are visible there.
483.TP
5e758862 484\fB.statement.nearest\fR
18abf95e
JL
485Places a probe at the nearest available line number for each line number
486given in the statement.
5e758862 487.TP
7c86df9f
JL
488\fB.callee\fR
489Places a probe on the callee function given in the \fB.callee\fR
490modifier, where the callee must be a function called by the target
491function given in \fB.function\fR. The advantage of doing this over
492directly probing the callee function is that this probe point is run
493only when the callee is called from the target function (add the
494-DSTAP_CALLEE_MATCHALL directive to override this when calling
495\fBstap\fR(1)).
496
497Note that only callees that can be statically determined are available.
498For example, calls through function pointers are not available.
499Additionally, calls to functions located in other objects (e.g.
074c54b6
JL
500libraries) are not available (instead use another probe point). This
501feature will only work for code compiled with GCC 4.7+.
7c86df9f
JL
502.TP
503\fB.callees\fR
504Shortcut for \fB.callee("*")\fR, which places a probe on all callees of
505the function.
506.TP
507\fB.callees\fR(DEPTH)
508Recursively places probes on callees. For example, \fB.callees(2)\fR
509will probe both callees of the target function, as well as callees of
510those callees. Likewise, \fB.callees(3)\fR goes one level deeper, and so on.
511A callee probe at depth N is only triggered when the N callers in the
512callstack match those that were statically determined during analysis
69b254f2 513(this also may be overridden using -DSTAP_CALLEE_MATCHALL).
7c86df9f
JL
514.RE
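.PP
As a sketch of the callee modifiers, assuming a locally built program
./a.out with debuginfo and a function named helper (both hypothetical):
.SAMPLE
# Run only when helper is called directly from main.
probe process("./a.out").function("main").callee("helper") {
    println(ppfunc())
}
# Probe callees of main, and callees of those callees.
probe process("./a.out").function("main").callees(2) {
    println(ppfunc())
}
.ESAMPLE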
515.PP
516In the above list of probe points, MPATTERN stands for a string literal
00468ace
JL
517that aims to identify the loaded kernel module of interest. For in-tree
518kernel modules, the name suffices (e.g. "btrfs"). The name may also
519include the "*", "[]", and "?" wildcards to match multiple in-tree
520modules. Out-of-tree modules are also supported by specifying the full
521path to the .ko file. Wildcards are not supported in such paths. The file must follow
522the convention of being named <module_name>.ko (characters ',' and '-'
523are replaced by '_').
524.PP
525LPATTERN stands for a source program label. It may also contain "*",
526"[]", and "?" wildcards. PATTERN stands for a string literal that aims
527to identify a point in the program. It is made up of three parts:
ca88561f
MM
528.IP \(bu 4
529The first part is the name of a function, as would appear in the
ba4a90fd
FCE
530.I nm
531program's output. This part may use the "*" and "?" wildcarding
ca88561f
MM
532operators to match multiple names.
533.IP \(bu 4
534The second part is optional and begins with the "@" character.
535It is followed by the path to the source file containing the function,
536which may include a wildcard pattern, such as mm/slab*.
79640c29 537If it does not match as is, an implicit "*/" is optionally added
ea384b8c 538.I before
79640c29
FCE
539the pattern, so that a script need only name the last few components
540of a possibly long source directory path.
ca88561f 541.IP \(bu 4
ba4a90fd 542Finally, the third part is optional if the file name part was given,
1bd128a3
SC
543and identifies the line number in the source file preceded by a ":"
544or a "+". The line number is assumed to be an
775ccddf
JL
545absolute line number if preceded by a ":", or relative to the
546declaration line of the function if preceded by a "+".
99a5f9cf 547All the lines in the function can be matched with ":*".
354ddff1
JL
548A range of lines x through y can be matched with ":x\-y". Ranges and
549specific lines can be mixed using commas, e.g. ":x,y\-z".
ca88561f 550.PP
ba4a90fd 551As an alternative, PATTERN may be a numeric constant, indicating an
ea384b8c
FCE
552address. Such an address may be found from symbol tables of the
553appropriate kernel / module object file. It is verified against
554known statement code boundaries, and will be relocated for use at
555run time.
556.PP
557In guru mode only, absolute kernel-space addresses may be specified with
558the ".absolute" suffix. Such an address is considered already relocated,
559as if it came from
560.BR /proc/kallsyms ,
561so it cannot be checked against statement/instruction boundaries.
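.PP
The PATTERN forms above can be sketched as follows. The function and
file names are illustrative and depend on the kernel being probed:
.SAMPLE
# Function name with a source file wildcard
probe kernel.function("*@mm/slab*") { println(ppfunc()) }
# Every line of one function
probe kernel.statement("vfs_read@fs/read_write.c:*") { println(pp()) }
# Two lines past the function's declaration line
probe kernel.statement("vfs_read@fs/read_write.c+2") { println(pp()) }
.ESAMPLE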
6f017dee
FCE
562.SS CONTEXT VARIABLES
563
ba4a90fd 564.PP
6f017dee 565Many of the source-level context variables, such as function parameters,
ba4a90fd
FCE
566locals, globals visible in the compilation unit, may be visible to
567probe handlers. They may refer to these variables by prefixing their
568name with "$" within the scripts. In addition, a special syntax
6f017dee
FCE
569allows limited traversal of structures, pointers, and arrays. More
570syntax allows pretty-printing of individual variables or their groups.
571See also
572.BR @cast .
f8b9be91
FCE
573Note that variables may be inaccessible due to them being paged out,
574or for a few other reasons. See also man
575.IR error::fault (7stap).
6f017dee 576
ba4a90fd
FCE
577.TP
578$var
579refers to an in-scope variable "var". If it's an integer-like type,
7b9361d5
FCE
580it will be cast to a 64-bit int for systemtap script use. String-like
581pointers (char *) may be copied to systemtap string values using the
582.IR kernel_string " or " user_string
583functions.
ba4a90fd 584.TP
179a00c3
MW
585@var("varname")
586an alternative syntax for
587.IR $varname
588.
589.TP
590@var("varname@src/file.c")
591refers to the global (either file local or external) variable
592.IR varname
593defined when the file
594.IR src/file.c
595was compiled. The CU in which the variable is resolved is the first CU
596in the module of the probe point which matches the given file name at
597the end and has the shortest file name path (e.g. given
598.IR @var("foo@bar/baz.c")
599and CUs with file name paths
600.IR src/sub/module/bar/baz.c
601and
602.IR src/bar/baz.c
603the second CU will be chosen to resolve the (file) global variable
604.IR foo
605.
606.TP
ab5e90c2
FCE
607$var\->field traversal via a structure's or a pointer's field. This
608generalized indirection operator may be repeated to follow more
609levels. Note that the
610.IR .
611operator is not used for plain structure
612members, only
613.IR \->
614for both purposes. (This is because "." is reserved for string
615concatenation.)
ba4a90fd 616.TP
a43ba433
FCE
617$return
618is available in return probes only for functions that are declared
462a0d51 619with a return value, which can be determined using @defined($return).
a43ba433 620.TP
ba4a90fd 621$var[N]
33b081c5
JS
622indexes into an array. The index may be given as a literal number or even
623an arbitrary numeric expression.
6f017dee
FCE
624.PP
625A number of operators exist for such basic context variable expressions:
34af38db 626.TP
2cb3fe26
SC
627$$vars
628expands to a character string that is equivalent to
6f017dee
FCE
629.SAMPLE
630sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x",
631 parm1, ..., parmN, var1, ..., varN)
632.ESAMPLE
633for each variable in scope at the probe point. Some values may be
634printed as
635.IR =?
636if their run-time location cannot be found.
2cb3fe26
SC
637.TP
638$$locals
a43ba433 639expands to a subset of $$vars for only local variables.
2cb3fe26
SC
640.TP
641$$parms
a43ba433
FCE
642expands to a subset of $$vars for only function parameters.
643.TP
644$$return
645is available in return probes only. It expands to a string that
fd574705 646is equivalent to sprintf("return=%x", $return)
a43ba433 647if the probed function has a return value, or else an empty string.
6f017dee
FCE
648.TP
649& $EXPR
650expands to the address of the given context variable expression, if it
651is addressable.
652.TP
653@defined($EXPR)
654expands to 1 if the given context variable expression is resolvable, or 0 otherwise,
655for use in conditionals such as
656.SAMPLE
f7470174 657@defined($foo\->bar) ? $foo\->bar : 0
6f017dee
FCE
658.ESAMPLE
659.TP
660$EXPR$
661expands to a string with all of $EXPR's members, equivalent to
662.SAMPLE
663sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}",
664 $EXPR\->a, $EXPR\->b)
665.ESAMPLE
666.TP
667$EXPR$$
668expands to a string with all of $EXPR's members and submembers, equivalent to
669.SAMPLE
670sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}",
671 $EXPR\->a, $EXPR\->b, $EXPR\->c\->x, $EXPR\->c\->y, $EXPR\->d[0])
672.ESAMPLE
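.PP
A sketch using several of these operators together. It assumes the
probed function has parameters named file and count, which depends on
the kernel version:
.SAMPLE
probe kernel.function("vfs_read") {
    println($$parms)
    if (@defined($file\->f_flags))
        println("flags=" . sprint($file\->f_flags))
    println("count=" . sprint(@defined($count) ? $count : 0))
}
.ESAMPLE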
673
3f5a5bb1
FCE
674.SS MORE ON RETURN PROBES
675
676.PP
677For the kernel ".return" probes, only a certain fixed number of
678returns may be outstanding. The default is a relatively small number,
679on the order of a few times the number of physical CPUs. If many
680different threads concurrently call the same blocking function, such
681as futex(2) or read(2), this limit could be exceeded, and skipped
e996e76a 682"kretprobes" would be reported by "stap \-t". To work around this,
3f5a5bb1
FCE
683specify a
684.SAMPLE
685probe FOO.return.maxactive(NNN)
686.ESAMPLE
687suffix, with a large enough NNN to cover all expected concurrently blocked
688threads. Alternately, use the
689.SAMPLE
e996e76a 690stap \-DKRETACTIVE=NNNN
3f5a5bb1
FCE
691.ESAMPLE
692stap command line macro setting to override the default for all
693".return" probes.
1c0b8e23 694
39e3139a 695.PP
1c0b8e23
FCE
696For ".return" probes, context variables other than the "$return" may
697be accessible, as a convenience for a script programmer wishing to
698access function parameters. These values are \fBsnapshots\fP
d1025fe6 699taken at the time of function entry. (Local variables within the
1c0b8e23 700function are \fBnot\fP generally accessible, since those variables did
d1025fe6
FCE
701not exist in allocated/initialized form at the snapshot moment.)
702These entry-snapshot variables should be accessed via
703.IR @entry($var) .
8cc799a5 704.PP
1c0b8e23
FCE
705In addition, arbitrary entry-time expressions can also be saved for
706".return" probes using the
8cc799a5
JS
707.IR @entry(expr)
708operator. For example, one can compute the elapsed time of a function:
709.SAMPLE
710probe kernel.function("do_filp_open").return {
711 println( get_timeofday_us() \- @entry(get_timeofday_us()) )
712}
713.ESAMPLE
39e3139a 714
1c0b8e23
FCE
715.PP
716The following table summarizes how values related to a function
717parameter context variable, a pointer named \fBaddr\fP, may be
718accessed from a
719.IR .return
720probe.
721.\" summarized from http://sourceware.org/ml/systemtap/2012-q1/msg00025.html
722.TS
723l l l.
724\fBat-entry value past-exit value\fP
725
726$addr \fInot available\fP
727$addr->x->y @cast(@entry($addr),"struct zz")->x->y
728$addr[0] {kernel,user}_{char,int,...}(& $addr[0])
729.TE
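.PP
For example, a sketch that reads an entry-time parameter in a return
probe, assuming the probed function has a parameter named count:
.SAMPLE
probe kernel.function("vfs_read").return {
    println(sprint(@entry($count)) . " requested, " .
            sprint($return) . " returned")
}
.ESAMPLE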
730
ba4a90fd 731
94c3c803
AM
732.SS DWARFLESS
733In the absence of debugging information, entry and exit points of kernel and module
734functions can be probed using the "kprobe" family of probes.
735However, these do not permit looking up the arguments / local variables
736of the function.
737The following constructs are supported:
738.SAMPLE
739kprobe.function(FUNCTION)
3c57fe1f 740kprobe.function(FUNCTION).call
94c3c803
AM
741kprobe.function(FUNCTION).return
742kprobe.module(NAME).function(FUNCTION)
3c57fe1f 743kprobe.module(NAME).function(FUNCTION).call
94c3c803 744kprobe.module(NAME).function(FUNCTION).return
6448e5a5 745kprobe.statement(ADDRESS).absolute
94c3c803
AM
746.ESAMPLE
747.PP
Probes of type
.B function
are recommended for kernel functions, whereas probes of type
.B module
are recommended for probing functions of the specified module.
If the absolute address of a kernel or module function is known,
.B statement
probes can be used.
.PP
Note that
.I FUNCTION
and
.I MODULE
names
.B must not
contain wildcards, or the probe will not be registered.
Also, statement probes are permitted only in guru mode.

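.PP
A minimal sketch of a DWARF-less probe, assuming the running kernel
has a probe-able function named
.I vfs_read
(an illustrative choice).  Because no $variables are available, the
handler relies only on tapset context functions:
.SAMPLE
global calls

# Assumption: vfs_read is a probe-able kernel function on this system.
probe kprobe.function("vfs_read").call {
    calls[execname()]++
}

# After 5 seconds, print the five busiest callers and stop.
probe timer.s(5) {
    foreach (name in calls\- limit 5)
        printf("%\-16s %d\\n", name, calls[name])
    exit()
}
.ESAMPLE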
.SS USER-SPACE
Support for user-space probing is available for kernels that are
configured with the utrace extensions, or that have the uprobes facility
introduced in Linux 3.5.  (Various kernel build configuration options
need to be enabled; systemtap will advise if these are missing.)

.PP
There are several forms.  First, a non-symbolic probe point:
.SAMPLE
process(PID).statement(ADDRESS).absolute
.ESAMPLE
is analogous to
.I kernel.statement(ADDRESS).absolute
in that both use raw (unverified) virtual addresses and provide
no $variables.  The target PID parameter must identify a running
process, and ADDRESS should identify a valid instruction address.
All threads of that process will be probed.
.PP
Second, non-symbolic user-kernel interface events handled by
utrace may be probed:
.SAMPLE
process(PID).begin
process("FULLPATH").begin
process.begin
process(PID).thread.begin
process("FULLPATH").thread.begin
process.thread.begin
process(PID).end
process("FULLPATH").end
process.end
process(PID).thread.end
process("FULLPATH").thread.end
process.thread.end
process(PID).syscall
process("FULLPATH").syscall
process.syscall
process(PID).syscall.return
process("FULLPATH").syscall.return
process.syscall.return
process(PID).insn
process("FULLPATH").insn
process(PID).insn.block
process("FULLPATH").insn.block
.ESAMPLE
.PP
A
.B .begin
probe gets called when a new process described by PID or FULLPATH gets created.
A
.B .thread.begin
probe gets called when a new thread described by PID or FULLPATH gets created.
A
.B .end
probe gets called when a process described by PID or FULLPATH dies.
A
.B .thread.end
probe gets called when a thread described by PID or FULLPATH dies.
A
.B .syscall
probe gets called when a thread described by PID or FULLPATH makes a
system call.  The system call number is available in the
.BR $syscall
context variable, and the first 6 arguments of the system call
are available in the
.BR $argN
(e.g. $arg1, $arg2, ...) context variables.
A
.B .syscall.return
probe gets called when a thread described by PID or FULLPATH returns from a
system call.  The system call number is available in the
.BR $syscall
context variable, and the return value of the system call is available
in the
.BR $return
context variable.
A
.B .insn
probe gets called for every single-stepped instruction of the process described by PID or FULLPATH.
A
.B .insn.block
probe gets called for every block-stepped instruction of the process described by PID or FULLPATH.
.PP
If a process probe is specified without a PID or FULLPATH, all user
threads will be probed.  However, if systemtap was invoked with the
.IR \-c " or " \-x
options, then process probes are restricted to the process
hierarchy associated with the target process.  If a process probe is
given without a PID or FULLPATH but systemtap was invoked with the
.I \-c
option, the path of the
.I \-c
command is heuristically filled in as the process PATH.  In that case,
the \fI\-c\fR command may consist only of a program name and its
parameters (i.e. no command substitution, and none of the
characters '|&;<>(){}').

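.PP
The utrace-style probes can be sketched as follows.  The script is
meant to be run as
.IR "stap \-c CMD script.stp" ,
so that the bare process probes are limited to the target command;
the script name and the output format are arbitrary choices.
.SAMPLE
probe process.syscall {
    # $syscall is the system call number, $arg1 its first argument
    printf("%s[%d] syscall %d arg1=0x%x\\n",
           execname(), tid(), $syscall, $arg1)
}

probe process.syscall.return {
    printf("%s[%d] syscall %d returned %d\\n",
           execname(), tid(), $syscall, $return)
}
.ESAMPLE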
.PP
Third, symbolic static instrumentation compiled into programs and
shared libraries may be
probed:
.SAMPLE
process("PATH").mark("LABEL")
process("PATH").provider("PROVIDER").mark("LABEL")
process(PID).mark("LABEL")
process(PID).provider("PROVIDER").mark("LABEL")
.ESAMPLE
.PP
A
.B .mark
probe gets called via a static probe which is defined in the
application by STAP_PROBE1(PROVIDER,LABEL,arg1), one of a family of macros defined in
.BR sys/sdt.h .
The PROVIDER is an arbitrary application identifier, LABEL is the
marker site identifier, and arg1 is the integer-typed argument.
STAP_PROBE1 is used for probes with 1 argument, STAP_PROBE2 is used
for probes with 2 arguments, and so on.  The arguments of the probe
are available in the context variables $arg1, $arg2, ...  An
alternative to using the STAP_PROBE macros is to use the dtrace script
to create custom macros.  Additionally, the variables $$name and
$$provider are available as parts of the probe point name.  The
.B sys/sdt.h
macro names DTRACE_PROBE* are available as aliases for STAP_PROBE*.

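.PP
A sketch of a script consuming such a marker.  The executable path,
provider name and label below are hypothetical, and assume the program
contains STAP_PROBE1(myapp, request_start, id):
.SAMPLE
# Hypothetical: /usr/bin/myapp, provider "myapp", label "request_start".
probe process("/usr/bin/myapp").provider("myapp").mark("request_start") {
    printf("%s:%s id=%d\\n", $$provider, $$name, $arg1)
}
.ESAMPLE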
.PP
Finally, full symbolic source-level probes in user-space programs and
shared libraries are supported.  These are exactly analogous to the
symbolic DWARF-based kernel/module probes described above.  They
expose the same sorts of context $variables for function parameters,
local variables, and so on.
.SAMPLE
process("PATH").function("NAME")
process("PATH").statement("*@FILE.c:123")
process("PATH").plt("NAME")
process("PATH").library("PATH").plt("NAME")
process("PATH").library("PATH").function("NAME")
process("PATH").library("PATH").statement("*@FILE.c:123")
process("PATH").function("*").return
process("PATH").function("myfun").label("foo")
process("PATH").function("foo").callee("bar")
process("PATH").plt("NAME").return
process(PID).function("NAME")
process(PID).statement("*@FILE.c:123")
process(PID).plt("NAME")
.ESAMPLE

.PP
Note that for all process probes,
.I PATH
names refer to executables that are looked up the same way shells do: relative
to the working directory if they contain a "/" character, otherwise in
.BR $PATH .
If PATH names refer to scripts, the actual interpreters (specified in the
script in the first line after the #! characters) are probed.

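.PP
For instance, assuming debugging information is available for a
hypothetical executable named
.I myprog
that is found in $PATH, entry to and return from one of its functions
can be traced like this:
.SAMPLE
# Hypothetical: "myprog" and its function "compute" are placeholders.
probe process("myprog").function("compute") {
    printf("enter %s(%s)\\n", ppfunc(), $$parms)
}

probe process("myprog").function("compute").return {
    printf("leave %s %s\\n", ppfunc(), $$return)
}
.ESAMPLE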
.PP
Tapset process probes placed in the special directory
$prefix/share/systemtap/tapset/PATH/ with relative paths will have their
process parameter prefixed with the location of the tapset.  For example,
.SAMPLE
process("foo").function("NAME")
.ESAMPLE
expands to
.SAMPLE
process("/usr/bin/foo").function("NAME")
.ESAMPLE
when placed in $prefix/share/systemtap/tapset/PATH/usr/bin/

.PP
If PATH is a process component parameter referring to shared libraries
then all processes that map it at runtime would be selected for probing.
If PATH is a library component parameter referring to shared libraries
then the process specified by the process component would be selected.
Note that the PATH pattern in a library component will always apply to
libraries statically determined to be in use by the process.  However,
you may also specify the full path to any library file even if not
statically needed by the process.

.PP
A .plt probe will probe functions in the procedure linkage table
corresponding to the rest of the probe point.  .plt can be specified
as a shorthand for .plt("*").  The symbol name is available as a
$$name context variable; function arguments are not available, since
PLTs are processed without debuginfo.  A .plt.return probe places a
probe at the moment \fBafter\fR the return from the named
function.

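.PP
A sketch that counts calls made through the PLT of a hypothetical
executable, using only the $$name context variable:
.SAMPLE
global plt_calls

# Hypothetical target path; the bare plt component matches every entry.
probe process("/usr/bin/myapp").plt {
    plt_calls[$$name]++
}

# At session end, print the ten most frequently called symbols.
probe end {
    foreach (sym in plt_calls\- limit 10)
        printf("%\-24s %d\\n", sym, plt_calls[sym])
}
.ESAMPLE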
.PP
If the PATH string contains wildcards as in the MPATTERN case, then
standard globbing is performed to find all matching paths.  In this
case, the
.BR $PATH
environment variable is not used.

.SS JAVA
Support for probing Java methods is available using Byteman as a
backend.  Byteman is an instrumentation tool from the JBoss project
which systemtap can use to monitor invocations for a specific method
or line in a Java program.
.PP
Systemtap does so by generating a Byteman script listing the probes to
instrument and then invoking the Byteman
.IR bminstall
utility.
.PP
This Java instrumentation support is currently a prototype feature
with major limitations.  Moreover, Java probing currently does not
work across users; the stap script must run (with appropriate
permissions) as the same user as the Java process being
probed.  (Thus a stap script under root currently cannot probe Java
methods in a non-root-user Java process.)

.PP
The first probe type refers to Java processes by the name of the Java process:
.SAMPLE
java("PNAME").class("CLASSNAME").method("PATTERN")
java("PNAME").class("CLASSNAME").method("PATTERN").return
.ESAMPLE
The PNAME argument must name an already-running JVM process, and be
identifiable via a jps listing.
.PP
The PATTERN parameter specifies the signature of the Java method to
probe.  The signature must consist of the exact name of the method,
followed by a parenthesized list of the types of the arguments, for
instance "myMethod(int,double,Foo)".  Wildcards are not supported.
.PP
The probe can be set to trigger at a specific line within the method
by appending a line number with colon, just as in other types of
probes: "myMethod(int,double,Foo):245".
.PP
The CLASSNAME parameter identifies the Java class the method belongs
to, either with or without the package qualification.  By default, the
probe only triggers on descendants of the class that do not override
the method definition of the original class.  However, CLASSNAME can
take an optional caret prefix, as in
.IR ^org.my.MyClass ,
which specifies that the probe should also trigger on all descendants
of MyClass that override the original method.  For instance, every method
with signature foo(int) in program org.my.MyApp can be probed at once using
.SAMPLE
java("org.my.MyApp").class("^java.lang.Object").method("foo(int)")
.ESAMPLE
.PP
The second probe type works analogously, but refers to Java processes by PID:
.SAMPLE
java(PID).class("CLASSNAME").method("PATTERN")
java(PID).class("CLASSNAME").method("PATTERN").return
.ESAMPLE
(PIDs for an already running process can be obtained using the
.IR jps (1)
utility.)
.PP
Context variables defined within java probes include
.IR $arg1
through
.IR $arg10
(for up to the first 10 arguments of a method), represented as character-pointers
for the
.B toString()
form of each actual argument.
The
.IR arg1
through
.IR arg10
script variables provide access to these as ordinary strings, fetched via
.IR user_string_warn() .
.PP
Prior to systemtap version 3.1,
.IR $arg1
through
.IR $arg10
could contain either integers or character pointers, depending on the types of the
objects being passed to each particular java method.  This previous behaviour may
be restored with the
.I stap \-\-compatible=3.0
flag.

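.PP
A minimal sketch of a Java probe handler, reusing the placeholder names
org.my.MyApp and foo(int) from the caret-prefix example earlier in this
section; they stand in for a real application and method:
.SAMPLE
probe java("org.my.MyApp").class("^java.lang.Object").method("foo(int)") {
    # arg1 is the toString() form of the first method argument
    printf("foo called with %s\\n", arg1)
}
.ESAMPLE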
.SS PROCFS

These probe points allow procfs "files" in
/proc/systemtap/MODNAME to be created, read and written
.RI ( MODNAME
is the name of the systemtap module).  The
.I proc
filesystem is a pseudo-filesystem which is used as an interface to
kernel data structures.
.PP
File permissions may be adjusted with a umask value.  Default
permissions are 0400 for read probes, and 0200 for write probes.  If both
a read and write probe are being used on the same file, a default
permission of 0600 will be used.  Using procfs.umask(0040).read would
result in a 0404 permission set for the file.
.PP
There are several probe point variants supported by the translator:

.SAMPLE
procfs("PATH").read
procfs("PATH").umask(UMASK).read
procfs("PATH").read.maxsize(MAXSIZE)
procfs("PATH").umask(UMASK).read.maxsize(MAXSIZE)
procfs("PATH").write
procfs("PATH").umask(UMASK).write
procfs.read
procfs.umask(UMASK).read
procfs.read.maxsize(MAXSIZE)
procfs.umask(UMASK).read.maxsize(MAXSIZE)
procfs.write
procfs.umask(UMASK).write
.ESAMPLE

.I PATH
is the file name (relative to /proc/systemtap/MODNAME) to be created.
If no
.I PATH
is specified (as in the last six variants above),
.I PATH
defaults to "command".
.PP
When a user reads /proc/systemtap/MODNAME/PATH, the corresponding
procfs
.I read
probe is triggered.  The string data to be read should be assigned to
a variable named
.IR $value ,
like this:

.SAMPLE
procfs("PATH").read { $value = "100\\n" }
.ESAMPLE
.PP
When a user writes into /proc/systemtap/MODNAME/PATH, the
corresponding procfs
.I write
probe is triggered.  The data the user wrote is available in the
string variable named
.IR $value ,
like this:

.SAMPLE
procfs("PATH").write { printf("user wrote: %s", $value) }
.ESAMPLE
.PP
.I MAXSIZE
is the size of the procfs read buffer.  Specifying
.I MAXSIZE
allows larger procfs output.  If no
.I MAXSIZE
is specified, the procfs read buffer defaults to
.I STP_PROCFS_BUFSIZE
(which defaults to
.IR MAXSTRINGLEN ,
the maximum length of a string).
If setting the procfs read buffers for more than one file is needed,
it may be easiest to override the
.I STP_PROCFS_BUFSIZE
definition.
Here's an example of using
.IR MAXSIZE :

.SAMPLE
procfs.read.maxsize(1024) {
    $value = "long string..."
    $value .= "another long string..."
    $value .= "another long string..."
    $value .= "another long string..."
}
.ESAMPLE

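.PP
A sketch combining a read and a write probe on one file, so that a
value written by the user is returned on the next read.  The file name
"threshold" and the global variable are illustrative:
.SAMPLE
global threshold = "100\\n"

# Creates /proc/systemtap/MODNAME/threshold; with both a read and a
# write probe on the file, the default permission is 0600.
probe procfs("threshold").write {
    threshold = $value
}

probe procfs("threshold").read {
    $value = threshold
}
.ESAMPLE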
.SS NETFILTER HOOKS

These probe points allow observation of network packets using the
netfilter mechanism.  A netfilter probe in systemtap corresponds to a
netfilter hook function in the original netfilter probes API.  It is
probably more convenient to use
.IR tapset::netfilter (3stap),
which wraps the primitive netfilter hooks and does the work of
extracting useful information from the context variables.

.PP
There are several probe point variants supported by the translator:

.SAMPLE
netfilter.hook("HOOKNAME").pf("PROTOCOL_F")
netfilter.pf("PROTOCOL_F").hook("HOOKNAME")
netfilter.hook("HOOKNAME").pf("PROTOCOL_F").priority("PRIORITY")
netfilter.pf("PROTOCOL_F").hook("HOOKNAME").priority("PRIORITY")
.ESAMPLE

.PP
.I PROTOCOL_F
is the protocol family to listen for, currently one of
.IR NFPROTO_IPV4 ,
.IR NFPROTO_IPV6 ,
.IR NFPROTO_ARP ,
or
.IR NFPROTO_BRIDGE .

.PP
.I HOOKNAME
is the point, or 'hook', in the protocol stack at which to intercept
the packet.  The available hook names for each protocol family are
taken from the kernel header files <linux/netfilter_ipv4.h>,
<linux/netfilter_ipv6.h>, <linux/netfilter_arp.h> and
<linux/netfilter_bridge.h>.  For instance, allowable hook names for
.I NFPROTO_IPV4
are
.IR NF_INET_PRE_ROUTING ,
.IR NF_INET_LOCAL_IN ,
.IR NF_INET_FORWARD ,
.IR NF_INET_LOCAL_OUT ,
and
.IR NF_INET_POST_ROUTING .

.PP
.I PRIORITY
is an integer priority giving the order in which the probe point
should be triggered relative to any other netfilter hook functions
which trigger on the same packet.  Hook functions execute on each
packet in order from smallest priority number to largest priority number.  If no
.I PRIORITY
is specified (as in the first two probe point variants above),
.I PRIORITY
defaults to "0".

There are a number of predefined priority names of the form
.I NF_IP_PRI_*
and
.I NF_IP6_PRI_*
which are defined in the kernel header files <linux/netfilter_ipv4.h>
and <linux/netfilter_ipv6.h> respectively.  The script is permitted to use these
instead of specifying an integer priority.  (The probe points for
.I NFPROTO_ARP
and
.I NFPROTO_BRIDGE
currently do not expose any named hook priorities to the script writer.)
Thus, allowable ways to specify the priority include:

.SAMPLE
priority("255")
priority("NF_IP_PRI_SELINUX_LAST")
.ESAMPLE

A script using guru mode is permitted to specify any identifier or
number as the parameter for hook, pf, and priority.  This feature
should be used with caution, as the parameter is inserted verbatim into
the C code generated by systemtap.

The netfilter probe points define the following context variables:
.TP
.IR $hooknum
The hook number.
.TP
.IR $skb
The address of the sk_buff struct representing the packet.  See
<linux/skbuff.h> for details on how to use this struct, or
alternatively use the tapset
.IR tapset::netfilter (3stap)
for easy access to key information.

.TP
.IR $in
The address of the net_device struct representing the network device
on which the packet was received (if any).  May be 0 if the device is
unknown or undefined at that stage in the protocol stack.

.TP
.IR $out
The address of the net_device struct representing the network device
on which the packet will be sent (if any).  May be 0 if the device is
unknown or undefined at that stage in the protocol stack.

.TP
.IR $verdict
(Guru mode only.)  Assigning one of the verdict values defined in
<linux/netfilter.h> to this variable alters the further progress of
the packet through the protocol stack.  For instance, the following
guru mode script forces all ipv6 network packets to be dropped:

.SAMPLE
probe netfilter.pf("NFPROTO_IPV6").hook("NF_IP6_PRE_ROUTING") {
    $verdict = 0 /* nf_drop */
}
.ESAMPLE

For convenience, unlike the primitive probe points discussed here, the
probes defined in
.IR tapset::netfilter (3stap)
export the lowercase names of the verdict constants (e.g. NF_DROP
becomes nf_drop) as local variables.

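.PP
A sketch of a read-only netfilter probe that reports the context
variables listed above; the choice of protocol family and hook is
arbitrary, and no guru mode is needed because $verdict is not assigned:
.SAMPLE
probe netfilter.pf("NFPROTO_IPV4").hook("NF_INET_PRE_ROUTING") {
    # $skb, $in and $out are raw kernel addresses; $in or $out may be 0
    printf("hook=%d skb=0x%x in=0x%x out=0x%x\\n",
           $hooknum, $skb, $in, $out)
}
.ESAMPLE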
.SS KERNEL TRACEPOINTS

This family of probe points hooks up to static probing tracepoints
inserted into the kernel or modules.  As with markers, these
tracepoints are special macro calls inserted by kernel developers to
make probing faster and more reliable than with DWARF-based probes,
and DWARF debugging information is not required to probe tracepoints.
Tracepoints have an extra advantage of more strongly-typed parameters
than markers.

Tracepoint probes look like:
.BR kernel.trace("name") .
The tracepoint name string, which may contain the usual wildcard
characters, is matched against the names defined by the kernel
developers in the tracepoint header files.  To restrict the search to
specific subsystems (e.g. sched, ext3), the following syntax
can be used:
.BR kernel.trace("system:name") .
The tracepoint system string may also contain the usual wildcard
characters.

The handler associated with a tracepoint-based probe may read the
optional parameters specified at the macro call site.  These are
named according to the declaration by the tracepoint author.  For
example, the tracepoint probe
.BR kernel.trace("sched:sched_switch")
provides the parameters
.BR $prev " and " $next .
If the parameter is a complex type, such as a struct pointer, then a
script can access fields with the same syntax as DWARF $target
variables.  Tracepoint parameters themselves cannot be modified, but in
guru mode a script may modify fields of parameters.

The subsystem and name of the tracepoint are available in
.BR $$system " and " $$name ,
and a string of name=value pairs for all parameters of the tracepoint
is available in
.BR $$vars " or " $$parms .

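.PP
A sketch of a tracepoint handler based on the sched_switch example
above.  It assumes that $prev and $next are task pointers with a
.I pid
field, as on typical kernels:
.SAMPLE
probe kernel.trace("sched:sched_switch") {
    # Assumption: $prev and $next point to structs with a "pid" field.
    printf("%s:%s %d to %d\\n",
           $$system, $$name, $prev->pid, $next->pid)
}
.ESAMPLE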
.SS KERNEL MARKERS (OBSOLETE)

This family of probe points hooks up to an older style of static
probing markers inserted into older kernels or modules.  These markers
are special STAP_MARK macro calls inserted by kernel developers to
make probing faster and more reliable than with DWARF-based probes.
Further, DWARF debugging information is
.I not
required to probe markers.

Marker probe points begin with
.BR kernel .
The next part names the marker itself:
.BR mark("name") .
The marker name string, which may contain the usual wildcard characters,
is matched against the names given to the marker macros when the kernel
and/or module was compiled.  Optionally, you can specify
.BR format("format") .
Specifying the marker format string allows differentiation between two
markers with the same name but different marker format strings.

The handler associated with a marker-based probe may read the
optional parameters specified at the macro call site.  These are
named
.BR $arg1 " through " $argNN ,
where NN is the number of parameters supplied by the macro.  Number
and string parameters are passed in a type-safe manner.

The marker format string associated with a marker is available in
.BR $format ,
and the marker name string is available in
.BR $name .

.SS HARDWARE BREAKPOINTS
This family of probes is used to set hardware watchpoints for a given
(global) kernel symbol.  The probes take three components as inputs:

1. The
.B virtual address / name
of the kernel symbol to be traced is supplied as argument to this class
of probes.  (Only data segment variables are supported; local variables
of a function cannot be probed.)

2. Nature of access to be probed:
a.
.I .write
probe gets triggered when a write happens at the specified address/symbol
name.
b.
.I .rw
probe is triggered when either a read or write happens.

3.
.BR .length
(optional)
Users have the option of specifying the address interval to be probed
using "length" constructs.  The user-specified length gets approximated
to the closest possible address length that the architecture can
support.  If the specified length exceeds the limits imposed by the
architecture, an error message is flagged and probe registration fails.
Wherever 'length' is not specified, the translator requests a hardware
breakpoint probe of length 1.  It should be noted that the "length"
construct is not valid with symbol names.

The following constructs are supported:
.SAMPLE
probe kernel.data(ADDRESS).write
probe kernel.data(ADDRESS).rw
probe kernel.data(ADDRESS).length(LEN).write
probe kernel.data(ADDRESS).length(LEN).rw
probe kernel.data("SYMBOL_NAME").write
probe kernel.data("SYMBOL_NAME").rw
.ESAMPLE

This set of probes makes use of the debug registers of the processor,
which are a scarce resource (4 on x86, 1 on powerpc).  The script
translation flags a warning if a user requests more hardware breakpoint probes
than the limits set by the architecture.  For example, a pass-2 warning is issued
when an input script requests 5 hardware breakpoint probes on an x86
system, since the x86 architecture supports a maximum of 4 breakpoints.
Users are cautioned to set probes judiciously.

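.PP
A sketch of a complete watchpoint handler, assuming the kernel exposes
the global data symbol
.I pid_max
(the same symbol used in the EXAMPLES section):
.SAMPLE
probe kernel.data("pid_max").write {
    # Report which task wrote to the watched symbol.
    printf("pid_max written by %s[%d]\\n", execname(), pid())
}
.ESAMPLE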
.SS PERF

This family of probe points interfaces to the kernel "perf event"
infrastructure for controlling hardware performance counters.
The events being attached to are described by the "type" and
"config" fields of the
.IR perf_event_attr
structure, and are sampled at an interval governed by the
"sample_period" and "sample_freq" fields.

These fields are made available to systemtap scripts using
the following syntax:
.SAMPLE
probe perf.type(NN).config(MM).sample(XX)
probe perf.type(NN).config(MM).hz(XX)
probe perf.type(NN).config(MM)
probe perf.type(NN).config(MM).process("PROC")
probe perf.type(NN).config(MM).counter("COUNTER")
probe perf.type(NN).config(MM).process("PROC").counter("COUNTER")
.ESAMPLE
The systemtap probe handler is called once per XX increments
of the underlying performance counter when using the .sample field
or at a frequency in hertz when using the .hz field.  When not specified,
the default behavior is to sample at a count of 1000000.
The range of valid type/config is described by the
.IR perf_event_open (2)
system call, and/or the
.IR linux/perf_event.h
file.  Invalid combinations or exhausted hardware counter resources
result in errors during systemtap script startup.  Systemtap does
not sanity-check the values: it merely passes them through to
the kernel for error- and safety-checking.  By default the perf event
probe is systemwide unless .process is specified, which will bind the
probe to a specific task.  If the process name is omitted then it
is inferred from the stap \-c argument.  A perf event can be read on
demand using .counter.  The body of the perf probe handler will not be
invoked for a .counter probe; instead, the counter is read in a user
space probe via:
.SAMPLE
process("PROCESS").statement("func@file") {stat <<< @perf("NAME")}
.ESAMPLE

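.PP
A sketch of a system-wide sampling probe.  It assumes that type 0 and
config 0 select the hardware CPU cycle counter (PERF_TYPE_HARDWARE and
PERF_COUNT_HW_CPU_CYCLES in linux/perf_event.h) and that the hardware
provides one:
.SAMPLE
global samples

# Assumption: type 0 / config 0 is the hardware CPU-cycles event.
probe perf.type(0).config(0).sample(1000000) {
    samples[execname()]++
}

# After 10 seconds, print the ten most-sampled programs and stop.
probe timer.s(10) {
    foreach (name in samples\- limit 10)
        printf("%\-16s %d\\n", name, samples[name])
    exit()
}
.ESAMPLE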
.SH EXAMPLES
.PP
Here are some example probe points, defining the associated events.
.TP
begin, end, end
refers to the startup and normal shutdown of the session.  In this
case, the handler would run once during startup and twice during
shutdown.
.TP
timer.jiffies(1000).randomize(200)
refers to a periodic interrupt, every 1000 +/\- 200 jiffies.
.TP
kernel.function("*init*"), kernel.function("*exit*")
refers to all kernel functions with "init" or "exit" in the name.
.TP
kernel.function("*@kernel/time.c:240")
refers to any functions within the "kernel/time.c" file that span
line 240.
.B Note
that this is
.B not
a probe at the statement at that line number.  Use the
.I kernel.statement
probe instead.
.TP
kernel.trace("sched_*")
refers to all scheduler-related (really, sched_-prefixed) tracepoints in
the kernel.
.TP
kernel.mark("getuid")
refers to an obsolete STAP_MARK(getuid, ...) macro call in the kernel.
.TP
module("usb*").function("*sync*").return
refers to the moment of return from all functions with "sync" in the
name in any of the USB drivers.
.TP
kernel.statement(0xc0044852)
refers to the first byte of the statement whose compiled instructions
include the given address in the kernel.
.TP
kernel.statement("*@kernel/time.c:296")
refers to the statement of line 296 within "kernel/time.c".
.TP
kernel.statement("bio_init@fs/bio.c+3")
refers to the statement 3 lines past the start of the bio_init function
within "fs/bio.c".
.TP
kernel.data("pid_max").write
refers to a hardware breakpoint of type "write" set on pid_max.
.TP
syscall.*.return
refers to the group of probe aliases with any name in the second position.

.SH SEE ALSO
.nh
.nf
.IR stap (1),
.IR probe::* (3stap),
.IR tapset::* (3stap)

.\" Local Variables:
.\" mode: nroff
.\" End: