Commit | Line | Data |
---|---|---|
5f92f126 | 1 | .\" t |
ec1a2239 | 2 | .TH STAPPROBES 3stap |
ba4a90fd FCE |
3 | .SH NAME |
4 | stapprobes \- systemtap probe points | |
5 | ||
6 | .\" macros | |
7 | .de SAMPLE | |
8 | .br | |
9 | .RS | |
10 | .nf | |
11 | .nh | |
12 | .. | |
13 | .de ESAMPLE | |
14 | .hy | |
15 | .fi | |
16 | .RE | |
17 | .. | |
18 | ||
19 | .SH DESCRIPTION | |
20 | The following sections enumerate the variety of probe points supported | |
89965a32 FCE |
21 | by the systemtap translator, and some of the additional aliases defined by |
22 | standard tapset scripts. Many are individually documented in the | |
23 | .IR 3stap | |
24 | manual section, with the | |
25 | .IR probe:: | |
26 | prefix. | |
67d1ed18 FCE |
27 | |
28 | .SH SYNTAX | |
29 | ||
30 | .PP | |
31 | .SAMPLE | |
32 | .BR probe " PROBEPOINT [" , " PROBEPOINT] " { " [STMT ...] " } | |
33 | .ESAMPLE | |
34 | .PP | |
35 | A probe declaration may list multiple comma-separated probe points in | |
36 | order to attach a handler to all of the named events. Normally, the | |
37 | handler statements are run whenever any of the named events occurs. | |
ba4a90fd | 38 | .PP |
67d1ed18 FCE |
39 | The syntax of a single probe point is a general dotted-symbol |
40 | sequence. This allows a breakdown of the event namespace into parts, | |
41 | somewhat like the Domain Name System does on the Internet. Each | |
42 | component identifier may be parametrized by a string or number | |
43 | literal, with a syntax like a function call. A component may include | |
44 | a "*" character, to expand to a set of matching probe points. It may | |
45 | also include "**" to match multiple sequential components at once. | |
46 | Probe aliases likewise expand to other probe points. | |
2f5bbffa | 47 | .PP |
67d1ed18 FCE |
48 | Probe aliases can be given on their own, or with a suffix. The suffix |
49 | attaches to the underlying probe point that the alias is expanded | |
50 | to. For example, | |
2f5bbffa SM |
51 | .SAMPLE |
52 | syscall.read.return.maxactive(10) | |
53 | .ESAMPLE | |
54 | expands to | |
55 | .SAMPLE | |
56 | kernel.function("sys_read").return.maxactive(10) | |
57 | .ESAMPLE | |
58 | with the component | |
59 | .IR maxactive(10) | |
60 | being recognized as a suffix. | |
61 | .PP | |
67d1ed18 FCE |
62 | Normally, each and every probe point resulting from wildcard- and |
63 | alias-expansion must be resolved to some low-level system | |
64 | instrumentation facility (e.g., a kprobe address, marker, or a timer | |
65 | configuration), otherwise the elaboration phase will fail. | |
d898100a FCE |
66 | .PP |
67 | However, a probe point may be followed by a "?" character, to indicate | |
68 | that it is optional, and that no error should result if it fails to | |
69 | resolve. Optionalness passes down through all levels of | |
70 | alias/wildcard expansion. Alternately, a probe point may be followed | |
71 | by a "!" character, to indicate that it is both optional and | |
37f6433e | 72 | sufficient. (Think vaguely of the Prolog cut operator.) If it does |
d898100a FCE |
73 | resolve, then no further probe points in the same comma-separated list |
74 | will be resolved. Therefore, the "!" sufficiency mark only makes | |
75 | sense in a list of probe point alternatives. | |
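 | .PP | |
 | As a minimal sketch of the sufficiency mark, the following declaration lists | |
 | two alternatives; the second is tried only if the first fails to resolve. | |
 | The function names are assumptions about the target kernel, not guaranteed | |
 | to exist on any given version. | |
 | .SAMPLE | |
 | # do_sys_open and sys_open are assumed names; adjust for the target kernel | |
 | probe kernel.function("do_sys_open") !, | |
 |       kernel.function("sys_open") ? | |
 | { | |
 |   println(execname() . " opened a file") | |
 | } | |
 | .ESAMPLE | |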
dfd11cc3 MH |
76 | .PP |
77 | Additionally, a probe point may be followed by an "if (expr)" statement, in | |
78 | order to enable/disable the probe point on-the-fly. With the "if" statement, | |
79 | if the "expr" is false when the probe point is hit, the whole probe body, | |
80 | including the alias's body, is skipped. The condition is stacked up through | |
81 | all levels of alias/wildcard expansion. So the final condition becomes | |
67d1ed18 FCE |
82 | the logical-and of the conditions of all expanded aliases and wildcards. The expressions |
83 | are necessarily restricted to global variables. | |
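 | .PP | |
 | A conditional probe might be sketched as follows, assuming the | |
 | .IR syscall.open | |
 | alias resolves on the target system. The global variable name is | |
 | purely illustrative. | |
 | .SAMPLE | |
 | global enabled = 0 | |
 |  | |
 | # the handler body is skipped while "enabled" is zero | |
 | probe syscall.open if (enabled) { | |
 |   println(execname() . ": " . argstr) | |
 | } | |
 |  | |
 | # toggle the condition every five seconds | |
 | probe timer.s(5) { | |
 |   enabled = !enabled | |
 | } | |
 | .ESAMPLE | |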
84 | .PP | |
e904ad95 FCE |
85 | These are all |
86 | .B syntactically | |
87 | valid probe points. (They may be | |
88 | .B semantically | |
89 | invalid, depending on the contents of the tapsets, and the versions of | |
90 | kernel/user software installed.) | |
ca88561f | 91 | |
ba4a90fd FCE |
92 | .SAMPLE |
93 | kernel.function("foo").return | |
e904ad95 | 94 | process("/bin/vi").statement(0x2222) |
ba4a90fd | 95 | end |
729286d8 | 96 | syscall.* |
2f5bbffa | 97 | syscall.*.return.maxactive(10) |
649260f3 | 98 | sys**open |
6e3347a9 | 99 | kernel.function("no_such_function") ? |
d898100a | 100 | module("awol").function("no_such_function") ! |
dfd11cc3 | 101 | signal.*? if (switch) |
94c3c803 | 102 | kprobe.function("foo") |
ba4a90fd FCE |
103 | .ESAMPLE |
104 | ||
6f05b6ab FCE |
105 | Probes may be broadly classified into "synchronous" and |
106 | "asynchronous". A "synchronous" event is deemed to occur when any | |
107 | processor executes an instruction matched by the specification. This | |
108 | gives these probes a reference point (instruction address) from which | |
109 | more contextual data may be available. Other families of probe points | |
110 | refer to "asynchronous" events such as timers/counters rolling over, | |
111 | where there is no fixed reference point that is related. Each probe | |
112 | point specification may match multiple locations (for example, using | |
113 | wildcards or aliases), and all of them are then probed. A probe | |
114 | declaration may also contain several comma-separated specifications, | |
115 | all of which are probed. | |
116 | ||
5f92f126 FCE |
117 | .SH DWARF DEBUGINFO |
118 | ||
119 | Resolving some probe points requires DWARF debuginfo or "debug | |
120 | symbols" for the specific part being instrumented. For some others, | |
121 | DWARF is automatically synthesized on the fly from source code header | |
122 | files. For others, it is not needed at all. Since a systemtap script | |
123 | may use any mixture of probe points together, the union of their DWARF | |
124 | requirements has to be met on the computer where script compilation | |
125 | occurs. (See the \fI\-\-use\-server\fR option and the \fBstap-server\ | |
126 | (8)\fR man page for information about the remote compilation facility, | |
127 | which allows these requirements to be met on a different machine.) | |
128 | .PP | |
129 | The following table lists many of the available probe point families, | |
130 | to classify them with respect to their need for DWARF debuginfo. | |
131 | ||
132 | .TS | |
133 | l l l. | |
7bfd1083 | 134 | \fBDWARF NON-DWARF\fP |
5f92f126 | 135 | |
7bfd1083 | 136 | kernel.function, .statement kernel.mark |
79dc1dee | 137 | module.function, .statement process.mark, process.plt |
7bfd1083 TJL |
138 | process.function, .statement begin, end, error, never |
139 | process.mark \fI(backup)\fP timer | |
140 | perf | |
141 | procfs | |
142 | \fBAUTO-DWARF\fP kernel.statement.absolute | |
143 | kernel.data | |
144 | kernel.trace kprobe.function | |
145 | process.statement.absolute | |
146 | process.begin, .end, .error | |
5f92f126 FCE |
147 | .TE |
148 | ||
149 | .SH PROBE POINT FAMILIES | |
150 | ||
65aeaea0 | 151 | .SS BEGIN/END/ERROR |
ba4a90fd FCE |
152 | |
153 | The probe points | |
154 | .IR begin " and " end | |
155 | are defined by the translator to refer to the time of session startup | |
156 | and shutdown. All "begin" probe handlers are run, in some sequence, | |
157 | during the startup of the session. All global variables will have | |
158 | been initialized prior to this point. All "end" probes are run, in | |
159 | some sequence, during the | |
160 | .I normal | |
161 | shutdown of a session, such as in the aftermath of an | |
162 | .I exit () | |
163 | function call, or an interruption from the user. In the case of an | |
164 | error-triggered shutdown, "end" probes are not run. There are no | |
165 | target variables available in either context. | |
6a256b03 JS |
166 | .PP |
167 | If the order of execution among "begin" or "end" probes is significant, | |
168 | then an optional sequence number may be provided: | |
ca88561f | 169 | |
6a256b03 JS |
170 | .SAMPLE |
171 | begin(N) | |
172 | end(N) | |
173 | .ESAMPLE | |
ca88561f | 174 | |
6a256b03 JS |
175 | The number N may be positive or negative. The probe handlers are run in |
176 | increasing order, and the order between handlers with the same sequence | |
177 | number is unspecified. When "begin" or "end" are given without a | |
178 | sequence, they are effectively sequence zero. | |
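 | .PP | |
 | For illustration, a small script along the following lines would exercise | |
 | the ordering rules; the messages are arbitrary. | |
 | .SAMPLE | |
 | probe begin(\-1) { println("runs first") } | |
 | probe begin      { println("runs second, as sequence zero") } | |
 | probe begin(1)   { println("runs third"); exit() } | |
 | probe end        { println("runs during normal shutdown") } | |
 | .ESAMPLE | |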
ba4a90fd | 179 | |
65aeaea0 FCE |
180 | The |
181 | .IR error | |
182 | probe point is similar to the | |
183 | .IR end | |
d898100a FCE |
184 | probe, except that each such probe handler is run when the session ends |
185 | after errors have occurred. In such cases, "end" probes are skipped, | |
37f6433e | 186 | but each "error" probe is still attempted. This kind of probe can be |
d898100a FCE |
187 | used to clean up or emit a "final gasp". It may also be numerically |
188 | parametrized to set a sequence. | |
65aeaea0 | 189 | |
6e3347a9 FCE |
190 | .SS NEVER |
191 | The probe point | |
192 | .IR never | |
193 | is specially defined by the translator to mean "never". Its probe | |
194 | handler is never run, though its statements are analyzed for symbol / | |
195 | type correctness as usual. This probe point may be useful in | |
196 | conjunction with optional probes. | |
197 | ||
1027502b FCE |
198 | .SS SYSCALL |
199 | ||
200 | The | |
201 | .IR syscall.* | |
202 | aliases define several hundred probes, too many to | |
56bd0316 | 203 | detail here. They are of the general form: |
1027502b FCE |
204 | |
205 | .SAMPLE | |
206 | syscall.NAME | |
207 | .br | |
208 | syscall.NAME.return | |
209 | .ESAMPLE | |
210 | ||
211 | Generally, two probes are defined for each normal system call as listed in the | |
212 | .IR syscalls(2) | |
213 | manual page, one for entry and one for return. Those system calls that never | |
214 | return do not have a corresponding | |
215 | .IR .return | |
216 | probe. | |
217 | .PP | |
df7f3a01 | 218 | Each probe alias provides a variety of variables. Looking at the tapset source |
1027502b FCE |
219 | code is the most reliable way to find them. Generally, each variable listed in the standard |
220 | manual page is made available as a script-level variable, so | |
221 | .IR syscall.open | |
222 | exposes | |
223 | .IR filename ", " flags ", and " mode . | |
224 | In addition, a standard suite of variables is available at most aliases: | |
225 | .TP | |
226 | .IR argstr | |
227 | A pretty-printed form of the entire argument list, without parentheses. | |
228 | .TP | |
229 | .IR name | |
230 | The name of the system call. | |
231 | .TP | |
232 | .IR retstr | |
233 | For return probes, a pretty-printed form of the system-call result. | |
234 | .PP | |
df7f3a01 FCE |
235 | As usual for probe aliases, these variables are all simply initialized |
236 | once from the underlying $context variables, so that later changes to | |
237 | $context variables are not automatically reflected. Not all probe | |
238 | aliases obey all of these general guidelines. Please report any | |
239 | bothersome ones you encounter as a bug. | |
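 | .PP | |
 | As a rough sketch of how the standard variables might be used, assuming the | |
 | .IR syscall.open | |
 | alias is available for the target kernel: | |
 | .SAMPLE | |
 | # entry probe: name and argstr are set by the alias | |
 | probe syscall.open { | |
 |   println(execname() . ": " . name . "(" . argstr . ")") | |
 | } | |
 |  | |
 | # return probe: retstr holds the pretty-printed result | |
 | probe syscall.open.return { | |
 |   println(execname() . ": " . name . " = " . retstr) | |
 | } | |
 | .ESAMPLE | |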
c34eceea FCE |
240 | .PP |
241 | If debuginfo availability is a problem, you may try using the | |
242 | non-DWARF syscall probe aliases instead. Use the | |
243 | .IR nd_syscall. | |
244 | prefix instead of | |
245 | .IR syscall. | |
246 | The same context variables are available, as far as possible. | |
1027502b | 247 | |
ba4a90fd FCE |
248 | .SS TIMERS |
249 | ||
250 | Intervals defined by the standard kernel "jiffies" timer may be used | |
251 | to trigger probe handlers asynchronously. Two probe point variants | |
252 | are supported by the translator: | |
ca88561f | 253 | |
ba4a90fd FCE |
254 | .SAMPLE |
255 | timer.jiffies(N) | |
256 | timer.jiffies(N).randomize(M) | |
257 | .ESAMPLE | |
ca88561f | 258 | |
ba4a90fd FCE |
259 | The probe handler is run every N jiffies (a kernel-defined unit of |
260 | time, typically between 1 and 60 ms). If the "randomize" component is | |
13d2ecdb | 261 | given, a uniformly distributed random value in the range [\-M..+M] is |
ba4a90fd FCE |
262 | added to N every time the handler is run. N is restricted to a |
263 | reasonable range (1 to around a million), and M is restricted to be | |
264 | smaller than N. There are no target variables provided in either | |
265 | context. It is possible for such probes to be run concurrently on | |
266 | a multi-processor computer. | |
422d1ceb | 267 | .PP |
197a4d62 | 268 | Alternatively, intervals may be specified in units of time. |
422d1ceb | 269 | There are two probe point variants similar to the jiffies timer: |
ca88561f | 270 | |
422d1ceb FCE |
271 | .SAMPLE |
272 | timer.ms(N) | |
273 | timer.ms(N).randomize(M) | |
274 | .ESAMPLE | |
ca88561f | 275 | |
197a4d62 JS |
276 | Here, N and M are specified in milliseconds, but the full options for units |
277 | are seconds (s/sec), milliseconds (ms/msec), microseconds (us/usec), | |
278 | nanoseconds (ns/nsec), and hertz (hz). Randomization is not supported for | |
279 | hertz timers. | |
280 | ||
281 | The actual resolution of the timers depends on the target kernel. For | |
282 | kernels prior to 2.6.17, timers are limited to jiffies resolution, so | |
283 | intervals are rounded up to the nearest jiffies interval. After 2.6.17, | |
284 | the implementation uses hrtimers for tighter precision, though the actual | |
285 | resolution will be arch-dependent. In either case, if the "randomize" | |
286 | component is given, then the random value will be added to the interval | |
287 | before any rounding occurs. | |
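 | .PP | |
 | A short sketch combining a randomized millisecond timer with a plain | |
 | seconds timer; the interval values and the variable name are arbitrary | |
 | choices for illustration. | |
 | .SAMPLE | |
 | global ticks | |
 |  | |
 | # fires about every 100 ms, with up to 10 ms of random jitter either way | |
 | probe timer.ms(100).randomize(10) { ticks++ } | |
 |  | |
 | # report and stop after five seconds | |
 | probe timer.s(5) { | |
 |   println(sprintf("%d timer events seen", ticks)) | |
 |   exit() | |
 | } | |
 | .ESAMPLE | |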
39e57ce0 | 288 | .PP |
ab8b5560 FCE |
289 | Profiling timers are also available to provide probes that execute on |
290 | all CPUs at the rate of the system tick (CONFIG_HZ). This probe takes | |
291 | no parameters. On some kernels, this is a one-concurrent-user-only or | |
e996e76a | 292 | disabled facility, resulting in error \-16 (EBUSY) during probe |
ab8b5560 | 293 | registration. |
ca88561f | 294 | |
39e57ce0 FCE |
295 | .SAMPLE |
296 | timer.profile | |
297 | .ESAMPLE | |
ca88561f | 298 | |
39e57ce0 FCE |
299 | Full context information of the interrupted process is available, making |
300 | this probe suitable for a time-based sampling profiler. | |
ba4a90fd FCE |
301 | |
302 | .SS DWARF | |
303 | ||
304 | This family of probe points uses symbolic debugging information for | |
305 | the target kernel/module/program, as may be found in unstripped | |
306 | executables, or the separate | |
307 | .I debuginfo | |
308 | packages. They allow placement of probes logically into the execution | |
309 | path of the target program, by specifying a set of points in the | |
310 | source or object code. When a matching statement executes on any | |
311 | processor, the probe handler is run in that context. | |
312 | .PP | |
313 | Points in a kernel are identified by | |
ca88561f | 314 | module, source file, line number, function name, or some |
6f05b6ab | 315 | combination of these. |
ba4a90fd FCE |
316 | .PP |
317 | Here is a list of probe point families currently supported. The | |
318 | .B .function | |
319 | variant places a probe near the beginning of the named function, so that | |
320 | parameters are available as context variables. The | |
321 | .B .return | |
39e3139a FCE |
322 | variant places a probe at the moment |
323 | .B after | |
324 | the return from the named function, so the return value is available | |
325 | as the "$return" context variable. The | |
54efe513 | 326 | .B .inline |
b8da0ad1 | 327 | modifier for |
54efe513 | 328 | .B .function |
b8da0ad1 FCE |
329 | filters the results to include only instances of inlined functions. |
330 | The | |
331 | .B .call | |
736d8a14 SC |
332 | modifier selects the opposite subset. The |
333 | .B .exported | |
334 | modifier | |
4bda987e SC |
335 | filters the results to include only exported functions. Inline |
336 | functions do not have an identifiable return point, so | |
54efe513 GH |
337 | .B .return |
338 | is not supported on | |
339 | .B .inline | |
340 | probes. The | |
ba4a90fd FCE |
341 | .B .statement |
342 | variant places a probe at the exact spot, exposing those local variables | |
343 | that are visible there. | |
ca88561f | 344 | |
ba4a90fd FCE |
345 | .SAMPLE |
346 | kernel.function(PATTERN) | |
347 | .br | |
b8da0ad1 FCE |
348 | kernel.function(PATTERN).call |
349 | .br | |
ba4a90fd FCE |
350 | kernel.function(PATTERN).return |
351 | .br | |
b8da0ad1 | 352 | kernel.function(PATTERN).inline |
54efe513 | 353 | .br |
592470cd SC |
354 | kernel.function(PATTERN).label(LPATTERN) |
355 | .br | |
ba4a90fd FCE |
356 | module(MPATTERN).function(PATTERN) |
357 | .br | |
b8da0ad1 FCE |
358 | module(MPATTERN).function(PATTERN).call |
359 | .br | |
ba4a90fd FCE |
360 | module(MPATTERN).function(PATTERN).return |
361 | .br | |
b8da0ad1 FCE |
362 | module(MPATTERN).function(PATTERN).inline |
363 | .br | |
2cab6244 JS |
364 | module(MPATTERN).function(PATTERN).label(LPATTERN) |
365 | .br | |
54efe513 | 366 | .br |
ba4a90fd FCE |
367 | kernel.statement(PATTERN) |
368 | .br | |
37ebca01 FCE |
369 | kernel.statement(ADDRESS).absolute |
370 | .br | |
ba4a90fd | 371 | module(MPATTERN).statement(PATTERN) |
6f017dee FCE |
372 | .br |
373 | process("PATH").function("NAME") | |
374 | .br | |
375 | process("PATH").statement("*@FILE.c:123") | |
376 | .br | |
b73a1293 SC |
377 | process("PATH").library("PATH").function("NAME") |
378 | .br | |
379 | process("PATH").library("PATH").statement("*@FILE.c:123") | |
380 | .br | |
6f017dee FCE |
381 | process("PATH").function("*").return |
382 | .br | |
383 | process("PATH").function("myfun").label("foo") | |
5fa99496 FCE |
384 | .br |
385 | process(PID).statement(ADDRESS).absolute | |
ba4a90fd | 386 | .ESAMPLE |
ca88561f | 387 | |
6f017dee FCE |
388 | (See the USER-SPACE section below for more information on the process |
389 | probes.) | |
390 | ||
ba4a90fd | 391 | In the above list, MPATTERN stands for a string literal that aims to |
592470cd SC |
392 | identify the loaded kernel module of interest and LPATTERN stands for |
393 | a source program label. Both MPATTERN and LPATTERN may include the "*", | |
394 | "[]", and "?" wildcards. | |
395 | PATTERN stands for a string literal that | |
6f05b6ab | 396 | aims to identify a point in the program. It is made up of three |
ca88561f MM |
397 | parts: |
398 | .IP \(bu 4 | |
399 | The first part is the name of a function, as would appear in the | |
ba4a90fd FCE |
400 | .I nm |
401 | program's output. This part may use the "*" and "?" wildcarding | |
ca88561f MM |
402 | operators to match multiple names. |
403 | .IP \(bu 4 | |
404 | The second part is optional and begins with the "@" character. | |
405 | It is followed by the path to the source file containing the function, | |
406 | which may include a wildcard pattern, such as mm/slab*. | |
79640c29 | 407 | If it does not match as is, an implicit "*/" is optionally added |
ea384b8c | 408 | .I before |
79640c29 FCE |
409 | the pattern, so that a script need only name the last few components |
410 | of a possibly long source directory path. | |
ca88561f | 411 | .IP \(bu 4 |
ba4a90fd | 412 | Finally, the third part is optional if the file name part was given, |
1bd128a3 SC |
413 | and identifies the line number in the source file preceded by a ":" |
414 | or a "+". The line number is assumed to be an | |
415 | absolute line number if preceded by a ":", or relative to the entry of | |
99a5f9cf SC |
416 | the function if preceded by a "+". |
417 | All the lines in the function can be matched with ":*". | |
f7470174 | 418 | A range of lines x through y can be matched with ":x\-y". |
ca88561f | 419 | .PP |
ba4a90fd | 420 | As an alternative, PATTERN may be a numeric constant, indicating an |
ea384b8c FCE |
421 | address. Such an address may be found from symbol tables of the |
422 | appropriate kernel / module object file. It is verified against | |
423 | known statement code boundaries, and will be relocated for use at | |
424 | run time. | |
425 | .PP | |
426 | In guru mode only, absolute kernel-space addresses may be specified with | |
427 | the ".absolute" suffix. Such an address is considered already relocated, | |
428 | as if it came from | |
429 | .BR /proc/kallsyms , | |
430 | so it cannot be checked against statement/instruction boundaries. | |
6f017dee FCE |
431 | |
432 | .SS CONTEXT VARIABLES | |
433 | ||
ba4a90fd | 434 | .PP |
6f017dee | 435 | Many of the source-level context variables, such as function parameters, |
ba4a90fd FCE |
436 | locals, globals visible in the compilation unit, may be visible to |
437 | probe handlers. They may refer to these variables by prefixing their | |
438 | name with "$" within the scripts. In addition, a special syntax | |
6f017dee FCE |
439 | allows limited traversal of structures, pointers, and arrays. More |
440 | syntax allows pretty-printing of individual variables or their groups. | |
441 | See also | |
442 | .BR @cast . | |
443 | ||
ba4a90fd FCE |
444 | .TP |
445 | $var | |
446 | refers to an in-scope variable "var". If it's an integer-like type, | |
7b9361d5 FCE |
447 | it will be cast to a 64-bit int for systemtap script use. String-like |
448 | pointers (char *) may be copied to systemtap string values using the | |
449 | .IR kernel_string " or " user_string | |
450 | functions. | |
ba4a90fd | 451 | .TP |
179a00c3 MW |
452 | @var("varname") |
453 | an alternative syntax for | |
454 | .IR $varname . | |
455 | . | |
456 | .TP | |
457 | @var("varname@src/file.c") | |
458 | refers to the global (either file local or external) variable | |
459 | .IR varname | |
460 | defined when the file | |
461 | .IR src/file.c | |
462 | was compiled. The CU in which the variable is resolved is the first CU | |
463 | in the module of the probe point which matches the given file name at | |
464 | the end and has the shortest file name path (e.g. given | |
465 | .IR @var("foo@bar/baz.c") | |
466 | and CUs with file name paths | |
467 | .IR src/sub/module/bar/baz.c | |
468 | and | |
469 | .IR src/bar/baz.c | |
470 | the second CU will be chosen to resolve the (file) global variable | |
471 | .IR foo ). | |
472 | . | |
473 | .TP | |
ab5e90c2 FCE |
474 | $var\->field traversal via a structure's or a pointer's field. This |
475 | generalized indirection operator may be repeated to follow more | |
476 | levels. Note that the | |
477 | .IR . | |
478 | operator is not used for plain structure | |
479 | members, only | |
480 | .IR \-> | |
481 | for both purposes. (This is because "." is reserved for string | |
482 | concatenation.) | |
ba4a90fd | 483 | .TP |
a43ba433 FCE |
484 | $return |
485 | is available in return probes only for functions that are declared | |
486 | with a return value. | |
487 | .TP | |
ba4a90fd | 488 | $var[N] |
33b081c5 JS |
489 | indexes into an array. The index may be given as a literal number or even |
490 | an arbitrary numeric expression. | |
6f017dee FCE |
491 | .PP |
492 | A number of operators exist for such basic context variable expressions: | |
34af38db | 493 | .TP |
2cb3fe26 SC |
494 | $$vars |
495 | expands to a character string that is equivalent to | |
6f017dee FCE |
496 | .SAMPLE |
497 | sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x", | |
498 | parm1, ..., parmN, var1, ..., varN) | |
499 | .ESAMPLE | |
500 | for each variable in scope at the probe point. Some values may be | |
501 | printed as | |
502 | .IR =? | |
503 | if their run-time location cannot be found. | |
2cb3fe26 SC |
504 | .TP |
505 | $$locals | |
a43ba433 | 506 | expands to a subset of $$vars for only local variables. |
2cb3fe26 SC |
507 | .TP |
508 | $$parms | |
a43ba433 FCE |
509 | expands to a subset of $$vars for only function parameters. |
510 | .TP | |
511 | $$return | |
512 | is available in return probes only. It expands to a string that | |
fd574705 | 513 | is equivalent to sprintf("return=%x", $return) |
a43ba433 | 514 | if the probed function has a return value, or else an empty string. |
6f017dee FCE |
515 | .TP |
516 | & $EXPR | |
517 | expands to the address of the given context variable expression, if it | |
518 | is addressable. | |
519 | .TP | |
520 | @defined($EXPR) | |
521 | expands to 1 if the given context variable expression is resolvable, and to 0 otherwise, | |
522 | for use in conditionals such as | |
523 | .SAMPLE | |
f7470174 | 524 | @defined($foo\->bar) ? $foo\->bar : 0 |
6f017dee FCE |
525 | .ESAMPLE |
526 | .TP | |
527 | $EXPR$ | |
528 | expands to a string with all of $EXPR's members, equivalent to | |
529 | .SAMPLE | |
530 | sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}", | |
531 | $EXPR\->a, $EXPR\->b) | |
532 | .ESAMPLE | |
533 | .TP | |
534 | $EXPR$$ | |
535 | expands to a string with all of $EXPR's members and submembers, equivalent to | |
536 | .SAMPLE | |
537 | sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}", | |
538 | $EXPR\->a, $EXPR\->b, $EXPR\->c\->x, $EXPR\->c\->y, $EXPR\->d[0]) | |
539 | .ESAMPLE | |
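 | .PP | |
 | The operators above might be combined as in the following sketch. The | |
 | function name vfs_read and its parameter count are assumptions about the | |
 | target kernel, and kernel debuginfo is needed for the probe to resolve. | |
 | .SAMPLE | |
 | # vfs_read and $count are assumed names; adjust for the target kernel | |
 | probe kernel.function("vfs_read") { | |
 |   println(pp() . " " . $$parms) | |
 |   println(sprintf("count=%d", @defined($count) ? $count : 0)) | |
 | } | |
 |  | |
 | probe kernel.function("vfs_read").return { | |
 |   println(pp() . " " . $$return) | |
 | } | |
 | .ESAMPLE | |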
540 | ||
3f5a5bb1 FCE |
541 | .SS MORE ON RETURN PROBES |
542 | ||
543 | .PP | |
544 | For the kernel ".return" probes, only a certain fixed number of | |
545 | returns may be outstanding. The default is a relatively small number, | |
546 | on the order of a few times the number of physical CPUs. If many | |
547 | different threads concurrently call the same blocking function, such | |
548 | as futex(2) or read(2), this limit could be exceeded, and skipped | |
e996e76a | 549 | "kretprobes" would be reported by "stap \-t". To work around this, |
3f5a5bb1 FCE |
550 | specify a |
551 | .SAMPLE | |
552 | probe FOO.return.maxactive(NNN) | |
553 | .ESAMPLE | |
554 | suffix, with a large enough NNN to cover all expected concurrently blocked | |
555 | threads. Alternately, use the | |
556 | .SAMPLE | |
e996e76a | 557 | stap \-DKRETACTIVE=NNNN |
3f5a5bb1 FCE |
558 | .ESAMPLE |
559 | stap command line macro setting to override the default for all | |
560 | ".return" probes. | |
1c0b8e23 | 561 | |
39e3139a | 562 | .PP |
1c0b8e23 FCE |
563 | For ".return" probes, context variables other than the "$return" may |
564 | be accessible, as a convenience for a script programmer wishing to | |
565 | access function parameters. These values are \fBsnapshots\fP | |
566 | taken at the time of function entry. Local variables within the | |
567 | function are \fBnot\fP generally accessible, since those variables did | |
568 | not exist in allocated/initialized form at the snapshot moment. | |
8cc799a5 | 569 | .PP |
1c0b8e23 FCE |
570 | In addition, arbitrary entry-time expressions can also be saved for |
571 | ".return" probes using the | |
8cc799a5 JS |
572 | .IR @entry(expr) |
573 | operator. For example, one can compute the elapsed time of a function: | |
574 | .SAMPLE | |
575 | probe kernel.function("do_filp_open").return { | |
576 | println( get_timeofday_us() \- @entry(get_timeofday_us()) ) | |
577 | } | |
578 | .ESAMPLE | |
39e3139a | 579 | |
1c0b8e23 FCE |
580 | .PP |
581 | The following table summarizes how values related to a function | |
582 | parameter context variable, a pointer named \fBaddr\fP, may be | |
583 | accessed from a | |
584 | .IR .return | |
585 | probe. | |
586 | .\" summarized from http://sourceware.org/ml/systemtap/2012-q1/msg00025.html | |
587 | .TS | |
588 | l l l. | |
589 | \fBat-entry value past-exit value\fP | |
590 | ||
591 | $addr \fInot available\fP | |
592 | $addr->x->y @cast(@entry($addr),"struct zz")->x->y | |
593 | $addr[0] {kernel,user}_{char,int,...}(& $addr[0]) | |
594 | .TE | |
595 | ||
ba4a90fd | 596 | |
94c3c803 AM |
597 | .SS DWARFLESS |
598 | In the absence of debugging information, entry and exit points of kernel and module | |
599 | functions can be probed using the "kprobe" family of probes. | |
600 | However, these do not permit looking up the arguments or local variables | |
601 | of the function. | |
602 | The following constructs are supported: | |
603 | .SAMPLE | |
604 | kprobe.function(FUNCTION) | |
3c57fe1f | 605 | kprobe.function(FUNCTION).call |
94c3c803 AM |
606 | kprobe.function(FUNCTION).return |
607 | kprobe.module(NAME).function(FUNCTION) | |
3c57fe1f | 608 | kprobe.module(NAME).function(FUNCTION).call |
94c3c803 AM |
609 | kprobe.module(NAME).function(FUNCTION).return |
610 | kprobe.statement(ADDRESS).absolute | |
611 | .ESAMPLE | |
612 | .PP | |
613 | Probes of type | |
614 | .B function | |
615 | are recommended for kernel functions, whereas probes of type | |
616 | .B module | |
617 | are recommended for probing functions of the specified module. | |
618 | In case the absolute address of a kernel or module function is known, | |
619 | .B statement | |
620 | probes can be utilized. | |
621 | .PP | |
622 | Note that | |
623 | .I FUNCTION | |
624 | and | |
625 | .I MODULE | |
626 | names | |
627 | .B must not | |
628 | contain wildcards, or the probe will not be registered. | |
629 | Also, statement probes may be used only in guru mode. | |
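 | .PP | |
 | A minimal sketch of DWARF-less entry and return probes follows. The function | |
 | name is an assumption about the target kernel and must be spelled out in | |
 | full, since wildcards are not accepted here. | |
 | .SAMPLE | |
 | # do_sys_open is an assumed function name | |
 | probe kprobe.function("do_sys_open") { | |
 |   println(execname() . " entered " . pp()) | |
 | } | |
 |  | |
 | probe kprobe.function("do_sys_open").return { | |
 |   println(execname() . " returned") | |
 | } | |
 | .ESAMPLE | |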
630 | ||
631 | ||
1ada6f08 | 632 | .SS USER-SPACE |
38e96af8 FCE |
633 | Support for user-space probing is available for kernels that are |
634 | configured with the utrace extensions, or have the uprobes facility in | |
635 | Linux 3.5 or later. (Various kernel build configuration options need to be | |
636 | enabled; systemtap will advise if these are missing.) | |
637 | ||
0a1c696d FCE |
638 | .PP |
639 | There are several forms. First, a non-symbolic probe point: | |
1ada6f08 FCE |
640 | .SAMPLE |
641 | process(PID).statement(ADDRESS).absolute | |
642 | .ESAMPLE | |
643 | is analogous to | |
644 | .IR | |
645 | kernel.statement(ADDRESS).absolute | |
646 | in that both use raw (unverified) virtual addresses and provide | |
647 | no $variables. The target PID parameter must identify a running | |
648 | process, and ADDRESS should identify a valid instruction address. | |
649 | All threads of that process will be probed. | |
29cb9b42 | 650 | .PP |
0a1c696d FCE |
651 | Second, non-symbolic user-kernel interface events handled by |
652 | utrace may be probed: | |
29cb9b42 | 653 | .SAMPLE |
dd078c96 | 654 | process(PID).begin |
82f0e81b | 655 | process("FULLPATH").begin |
986e98de | 656 | process.begin |
dd078c96 | 657 | process(PID).thread.begin |
82f0e81b | 658 | process("FULLPATH").thread.begin |
986e98de | 659 | process.thread.begin |
dd078c96 | 660 | process(PID).end |
82f0e81b | 661 | process("FULLPATH").end |
986e98de | 662 | process.end |
dd078c96 | 663 | process(PID).thread.end |
82f0e81b | 664 | process("FULLPATH").thread.end |
986e98de | 665 | process.thread.end |
29cb9b42 | 666 | process(PID).syscall |
82f0e81b | 667 | process("FULLPATH").syscall |
986e98de | 668 | process.syscall |
29cb9b42 | 669 | process(PID).syscall.return |
82f0e81b | 670 | process("FULLPATH").syscall.return |
986e98de | 671 | process.syscall.return |
0afb7073 | 672 | process(PID).insn |
82f0e81b | 673 | process("FULLPATH").insn |
0afb7073 | 674 | process(PID).insn.block |
82f0e81b | 675 | process("FULLPATH").insn.block |
29cb9b42 DS |
676 | .ESAMPLE |
677 | .PP | |
678 | A | |
dd078c96 | 679 | .B .begin |
82f0e81b | 680 | probe gets called when a new process described by PID or FULLPATH gets created. |
29cb9b42 | 681 | A |
dd078c96 | 682 | .B .thread.begin |
82f0e81b | 683 | probe gets called when a new thread described by PID or FULLPATH gets created. |
159cb109 | 684 | A |
dd078c96 | 685 | .B .end |
82f0e81b | 686 | probe gets called when a process described by PID or FULLPATH dies. |
dd078c96 DS |
687 | A |
688 | .B .thread.end | |
82f0e81b | 689 | probe gets called when a thread described by PID or FULLPATH dies. |
29cb9b42 DS |
690 | A |
691 | .B .syscall | |
82f0e81b | 692 | probe gets called when a thread described by PID or FULLPATH makes a |
6270adc1 MH |
693 | system call. The system call number is available in the |
694 | .BR $syscall | |
695 | context variable, and the first 6 arguments of the system call | |
696 | are available in the | |
697 | .BR $argN | |
698 | (e.g. $arg1, $arg2, ...) context variables. | |
29cb9b42 DS |
699 | A |
700 | .B .syscall.return | |
82f0e81b | 701 | probe gets called when a thread described by PID or FULLPATH returns from a |
5d67b47c MH |
702 | system call. The system call number is available in the |
703 | .BR $syscall | |
704 | context variable, and the return value of the system call is available | |
705 | in the | |
706 | .BR $return | |
29cb9b42 | 707 | context variable. |
a96d1db0 | 708 | A |
0afb7073 | 709 | .B .insn |
82f0e81b | 710 | probe gets called for every single-stepped instruction of the process described by PID or FULLPATH. |
0afb7073 FCE |
711 | A |
712 | .B .insn.block | |
82f0e81b FCE |
713 | probe gets called for every block-stepped instruction of the process described by PID or FULLPATH. |
714 | .PP | |
715 | If a process probe is specified without a PID or FULLPATH, all user | |
716 | threads will be probed. However, if systemtap was invoked with the | |
f7470174 | 717 | .IR \-c " or " \-x |
82f0e81b | 718 | options, then process probes are restricted to the process |
6d5d594e LB |
719 | hierarchy associated with the target process. If a process probe is |
720 | specified without a PID or FULLPATH, but with the | |
721 | .IR \-c |
722 | option, the PATH of the |
723 | .IR \-c |
724 | command is heuristically filled in as the process PATH. |
0a1c696d FCE |
725 | |
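The .syscall and .syscall.return variants described above can be illustrated with a minimal sketch. It assumes the standard tapset functions execname() and pid() are available; the output format is an arbitrary choice.

```systemtap
# Trace system call entry: number and first argument.
probe process.syscall {
  printf("%s[%d] syscall %d arg1=0x%x\n", execname(), pid(), $syscall, $arg1)
}

# Trace system call exit: number and return value.
probe process.syscall.return {
  printf("%s[%d] syscall %d returned %d\n", execname(), pid(), $syscall, $return)
}
```

Because neither probe names a PID or FULLPATH, running the script with the \-c or \-x option restricts it to the process hierarchy of the target process, as described above.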
726 | .PP | |
727 | Third, symbolic static instrumentation compiled into programs and | |
728 | shared libraries may be | |
729 | probed: | |
730 | .SAMPLE | |
731 | process("PATH").mark("LABEL") | |
a794dbeb | 732 | process("PATH").provider("PROVIDER").mark("LABEL") |
0a1c696d FCE |
733 | .ESAMPLE |
734 | .PP | |
f28a8c28 SC |
735 | A |
736 | .B .mark | |
737 | probe gets called via a static probe which is defined in the | |
38e96af8 FCE |
738 | application by STAP_PROBE1(PROVIDER,LABEL,arg1), which is a macro defined in |
739 | .BR sys/sdt.h . | |
740 | The PROVIDER is an arbitrary application identifier, LABEL is the | |
741 | marker site identifier, and arg1 is the integer-typed argument. | |
742 | STAP_PROBE1 is used for probes with 1 argument, STAP_PROBE2 is used | |
743 | for probes with 2 arguments, and so on. The arguments of the probe | |
744 | are available in the context variables $arg1, $arg2, ... An | |
745 | alternative to using the STAP_PROBE macros is to use the dtrace script | |
746 | to create custom macros. Additionally, the variables $$name and | |
747 | $$provider are available as parts of the probe point name. The | |
748 | .B sys/sdt.h | |
749 | macro names DTRACE_PROBE* are available as aliases for STAP_PROBE*. | |
0a1c696d | 750 | |
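A hedged sketch of a handler for such a static marker follows. The executable path, the provider name "myapp", and the label "request_start" are hypothetical; it assumes the application contains a one-argument STAP_PROBE1(myapp, request_start, arg1) call site.

```systemtap
# Hypothetical application path, provider, and marker label.
probe process("/usr/local/bin/myapp").provider("myapp").mark("request_start") {
  # $$provider and $$name come from the probe point name;
  # $arg1 is the single integer-typed marker argument.
  printf("%s:%s arg1=%d\n", $$provider, $$name, $arg1)
}
```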
29cb9b42 | 751 | .PP |
38e96af8 FCE |
752 | Finally, full symbolic source-level probes in user-space programs and |
753 | shared libraries are supported. These are exactly analogous to the | |
754 | symbolic DWARF-based kernel/module probes described above. They | |
755 | expose the same sorts of context $variables for function parameters, | |
756 | local variables, and so on. | |
0a1c696d FCE |
757 | .SAMPLE |
758 | process("PATH").function("NAME") | |
759 | process("PATH").statement("*@FILE.c:123") | |
4d0fcb93 SC |
760 | process("PATH").plt("NAME") |
761 | process("PATH").library("PATH").plt("NAME") | |
b73a1293 SC |
762 | process("PATH").library("PATH").function("NAME") |
763 | process("PATH").library("PATH").statement("*@FILE.c:123") | |
0a1c696d FCE |
764 | process("PATH").function("*").return |
765 | process("PATH").function("myfun").label("foo") | |
766 | .ESAMPLE | |
767 | ||
768 | .PP | |
769 | Note that for all process probes, | |
29cb9b42 | 770 | .I PATH |
ea384b8c FCE |
771 | names refer to executables that are searched for the same way shells do: relative |
772 | to the working directory if they contain a "/" character, otherwise in | |
773 | .BR $PATH . | |
d1bcbe71 RH |
774 | If PATH names refer to scripts, the actual interpreters (specified in the |
775 | script in the first line after the #! characters) are probed. | |
b73a1293 SC |
776 | If PATH is a process component parameter referring to shared libraries |
777 | then all processes that map it at runtime would be selected for | |
778 | probing. If PATH is a library component parameter referring to shared | |
779 | libraries then the process specified by the process component would be | |
79dc1dee FCE |
780 | selected. |
781 | ||
782 | .PP | |
783 | A .plt probe will probe functions in the procedure linkage table |
4d0fcb93 | 784 | corresponding to the rest of the probe point. .plt can be specified |
79dc1dee FCE |
785 | as a shorthand for .plt("*"). The symbol name is available as a |
786 | $$name context variable; function arguments are not available, since | |
787 | PLTs are processed without debuginfo. | |
788 | ||
789 | .PP | |
82f0e81b FCE |
790 | If the PATH string contains wildcards as in the MPATTERN case, then |
791 | standard globbing is performed to find all matching paths. In this | |
792 | case, the | |
793 | .BR $PATH | |
794 | environment variable is not used. | |
795 | ||
796 | .PP | |
153e7a22 FCE |
797 | If systemtap was invoked with the |
798 | .IR \-c " or " \-x | |
760695db FCE |
799 | options, then process probes are restricted to the process |
800 | hierarchy associated with the target process. | |
1ada6f08 | 801 | |
982026f1 SM |
802 | .SS JAVA |
803 | Support for probing Java methods is available using Byteman as a | |
804 | backend. Byteman is an instrumentation tool from the JBoss project | |
805 | which systemtap can use to monitor invocations for a specific method | |
806 | or line in a Java program. | |
807 | .PP | |
808 | Systemtap does so by generating a Byteman script listing the probes to | |
809 | instrument and then invoking the Byteman | |
810 | .IR bminstall | |
811 | utility. A custom option "\-D OPTION" (see the Byteman documentation | |
812 | for more details) can be passed to bminstall by invoking "stap \-J | |
813 | OPTION". The systemtap option "\-j" is also provided as a shorthand for | |
814 | "\-J org.jboss.byteman.compile.to.bytecode". | |
815 | .PP | |
816 | This Java instrumentation support currently has a major limitation: java | |
817 | probes attach only to one Java process at a time; other Java processes | |
818 | beyond the first one to be observed are ignored. | |
819 | ||
820 | .PP | |
821 | The first probe type refers to Java processes by the name of the Java process: | |
822 | .SAMPLE | |
823 | java("PNAME").class("CLASSNAME").method("PATTERN") | |
824 | java("PNAME").class("CLASSNAME").method("PATTERN").return | |
825 | .ESAMPLE | |
826 | The PATTERN parameter specifies the signature of the Java method to | |
827 | probe. The signature must consist of the exact name of the method, | |
828 | followed by a bracketed list of the types of the arguments, for | |
829 | instance "myMethod(int,double,Foo)". Wildcards are not supported. | |
830 | .PP | |
831 | The probe can be set to trigger at a specific line within the method | |
832 | by appending a line number with a colon, just as in other types of |
833 | probes: "myMethod(int,double,Foo):245". | |
834 | .PP | |
835 | The CLASSNAME parameter identifies the Java class the method belongs | |
836 | to, either with or without the package qualification. By default, the | |
837 | probe only triggers on descendants of the class that do not override | |
838 | the method definition of the original class. However, CLASSNAME can | |
839 | take an optional caret prefix, as in | |
840 | .IR ^org.my.MyClass, | |
841 | which specifies that the probe should also trigger on all descendants | |
842 | of MyClass that override the original method. For instance, every method | |
843 | with signature foo(int) in program org.my.MyApp can be probed at once using | |
844 | .SAMPLE | |
845 | java("org.my.MyApp").class("^java.lang.Object").method("foo(int)") | |
846 | .ESAMPLE | |
847 | .PP | |
848 | The second probe type works analogously, but refers to Java processes by PID: | |
849 | .SAMPLE | |
850 | java(PID).class("CLASSNAME").method("PATTERN") | |
851 | java(PID).class("CLASSNAME").method("PATTERN").return | |
852 | .ESAMPLE | |
853 | (PIDs for an already running process can be obtained using the | |
854 | .IR jps (1) | |
855 | utility.) | |
856 | ||
9cb48751 DS |
857 | .SS PROCFS |
858 | ||
859 | These probe points allow procfs "files" in | |
c243f608 LB |
860 | /proc/systemtap/MODNAME to be created, read and written using a |
861 | permission that may be modified using the proper umask value. Default permissions are 0400 for read | |
862 | probes, and 0200 for write probes. If both a read and write probe are being | |
863 | used on the same file, a default permission of 0600 will be used. | |
864 | Using procfs.umask(0040).read would | |
865 | result in a 0404 permission set for the file. | |
9cb48751 DS |
866 | .RI ( MODNAME |
867 | is the name of the systemtap module). The | |
868 | .I proc | |
869 | filesystem is a pseudo-filesystem which is used as an interface to |
c243f608 | 870 | kernel data structures. There are several probe point variants supported |
9cb48751 | 871 | by the translator: |
ca88561f | 872 | |
9cb48751 DS |
873 | .SAMPLE |
874 | procfs("PATH").read | |
c243f608 | 875 | procfs("PATH").umask(UMASK).read |
38975255 | 876 | procfs("PATH").read.maxsize(MAXSIZE) |
c243f608 | 877 | procfs("PATH").umask(UMASK).read.maxsize(MAXSIZE) |
9cb48751 | 878 | procfs("PATH").write |
c243f608 | 879 | procfs("PATH").umask(UMASK).write |
9cb48751 | 880 | procfs.read |
c243f608 | 881 | procfs.umask(UMASK).read |
38975255 | 882 | procfs.read.maxsize(MAXSIZE) |
c243f608 | 883 | procfs.umask(UMASK).read.maxsize(MAXSIZE) |
9cb48751 | 884 | procfs.write |
c243f608 | 885 | procfs.umask(UMASK).write |
9cb48751 | 886 | .ESAMPLE |
ca88561f | 887 | |
9cb48751 DS |
888 | .I PATH |
889 | is the file name (relative to /proc/systemtap/MODNAME) to be created. | |
890 | If no | |
891 | .I PATH | |
892 | is specified (as in the last six variants above), |
893 | .I PATH | |
894 | defaults to "command". | |
895 | .PP | |
896 | When a user reads /proc/systemtap/MODNAME/PATH, the corresponding | |
897 | procfs | |
898 | .I read | |
899 | probe is triggered. The string data to be read should be assigned to | |
900 | a variable named | |
901 | .IR $value , | |
902 | like this: | |
ca88561f | 903 | |
9cb48751 DS |
904 | .SAMPLE |
905 | procfs("PATH").read { $value = "100\\n" } | |
906 | .ESAMPLE | |
907 | .PP | |
908 | When a user writes into /proc/systemtap/MODNAME/PATH, the | |
909 | corresponding procfs | |
910 | .I write | |
911 | probe is triggered. The data the user wrote is available in the | |
912 | string variable named | |
913 | .IR $value , | |
914 | like this: | |
ca88561f | 915 | |
9cb48751 DS |
916 | .SAMPLE |
917 | procfs("PATH").write { printf("user wrote: %s", $value) } | |
918 | .ESAMPLE | |
38975255 DS |
919 | .PP |
920 | .I MAXSIZE | |
921 | is the size of the procfs read buffer. Specifying | |
922 | .I MAXSIZE | |
923 | allows larger procfs output. If no | |
924 | .I MAXSIZE | |
925 | is specified, the procfs read buffer defaults to | |
926 | .I STP_PROCFS_BUFSIZE | |
927 | (which defaults to | |
928 | .IR MAXSTRINGLEN , | |
929 | the maximum length of a string). | |
930 | If setting the procfs read buffers for more than one file is needed, | |
931 | it may be easiest to override the | |
932 | .I STP_PROCFS_BUFSIZE | |
933 | definition. | |
934 | Here's an example of using | |
935 | .IR MAXSIZE : | |
936 | ||
937 | .SAMPLE | |
938 | procfs.read.maxsize(1024) { | |
939 | $value = "long string..." | |
940 | $value .= "another long string..." | |
941 | $value .= "another long string..." | |
942 | $value .= "another long string..." | |
943 | } | |
944 | .ESAMPLE | |
9cb48751 | 945 | |
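The read and write probes can be combined on one file to expose a tunable value. The following is a minimal sketch; the file name "threshold" and the global variable are hypothetical, and it assumes the standard tapset functions sprintf() and strtol() are available.

```systemtap
global threshold = 100   # hypothetical tunable, initial value chosen arbitrarily

# Reading /proc/systemtap/MODNAME/threshold returns the current value.
probe procfs("threshold").read {
  $value = sprintf("%d\n", threshold)
}

# Writing a decimal number into the same file updates the value.
probe procfs("threshold").write {
  threshold = strtol($value, 10)
}
```

Since both a read and a write probe are attached to the same file, the default permission described above (0600) would apply.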
da00b50e SM |
946 | .SS NETFILTER HOOKS |
947 | ||
948 | These probe points allow observation of network packets using the | |
949 | netfilter mechanism. A netfilter probe in systemtap corresponds to a | |
950 | netfilter hook function in the original netfilter probes API. It is | |
951 | probably more convenient to use | |
952 | .IR tapset::netfilter (3stap), | |
953 | which wraps the primitive netfilter hooks and does the work of | |
954 | extracting useful information from the context variables. | |
955 | ||
956 | .PP | |
957 | There are several probe point variants supported by the translator: | |
958 | ||
959 | .SAMPLE | |
960 | netfilter.hook("HOOKNAME").pf("PROTOCOL_F") | |
961 | netfilter.pf("PROTOCOL_F").hook("HOOKNAME") | |
962 | netfilter.hook("HOOKNAME").pf("PROTOCOL_F").priority("PRIORITY") | |
963 | netfilter.pf("PROTOCOL_F").hook("HOOKNAME").priority("PRIORITY") | |
964 | .ESAMPLE | |
965 | ||
966 | .PP | |
967 | .I PROTOCOL_F | |
968 | is the protocol family to listen for, currently one of | |
969 | .I NFPROTO_IPV4, | |
970 | .I NFPROTO_IPV6, | |
971 | .I NFPROTO_ARP, | |
972 | or | |
973 | .I NFPROTO_BRIDGE. | |
974 | ||
975 | .PP | |
976 | .I HOOKNAME | |
977 | is the point, or 'hook', in the protocol stack at which to intercept | |
978 | the packet. The available hook names for each protocol family are | |
979 | taken from the kernel header files <linux/netfilter_ipv4.h>, | |
980 | <linux/netfilter_ipv6.h>, <linux/netfilter_arp.h> and | |
981 | <linux/netfilter_bridge.h>. For instance, allowable hook names for | |
982 | .I NFPROTO_IPV4 | |
983 | are | |
984 | .I NF_INET_PRE_ROUTING, | |
985 | .I NF_INET_LOCAL_IN, | |
986 | .I NF_INET_FORWARD, | |
987 | .I NF_INET_LOCAL_OUT, | |
988 | and | |
989 | .I NF_INET_POST_ROUTING. | |
990 | ||
991 | .PP | |
992 | .I PRIORITY | |
993 | is an integer priority giving the order in which the probe point | |
994 | should be triggered relative to any other netfilter hook functions | |
995 | which trigger on the same packet. Hook functions execute on each | |
996 | packet in order from smallest priority number to largest priority number. If no | |
997 | .I PRIORITY | |
998 | is specified (as in the first two probe point variants above), | |
999 | .I PRIORITY | |
1000 | defaults to "0". | |
1001 | ||
1002 | There are a number of predefined priority names of the form | |
1003 | .I NF_IP_PRI_* | |
1004 | and | |
1005 | .I NF_IP6_PRI_* | |
1006 | which are defined in the kernel header files <linux/netfilter_ipv4.h> and <linux/netfilter_ipv6.h> respectively. The script is permitted to use these | |
1007 | instead of specifying an integer priority. (The probe points for | |
1008 | .I NFPROTO_ARP | |
1009 | and | |
1010 | .I NFPROTO_BRIDGE | |
1011 | currently do not expose any named hook priorities to the script writer.) | |
1012 | Thus, allowable ways to specify the priority include: | |
1013 | ||
1014 | .SAMPLE | |
1015 | priority("255") | |
1016 | priority("NF_IP_PRI_SELINUX_LAST") | |
1017 | .ESAMPLE | |
1018 | ||
1019 | A script using guru mode is permitted to specify any identifier or | |
1020 | number as the parameter for hook, pf, and priority. This feature | |
1021 | should be used with caution, as the parameter is inserted verbatim into | |
1022 | the C code generated by systemtap. | |
1023 | ||
1024 | The netfilter probe points define the following context variables: | |
1025 | .TP | |
1026 | .IR $skb | |
1027 | The address of the sk_buff struct representing the packet. See | |
1028 | <linux/skbuff.h> for details on how to use this struct, or | |
1029 | alternatively use the tapset | |
1030 | .IR tapset::netfilter (3stap) | |
1031 | for easy access to key information. | |
1032 | ||
1033 | .TP | |
1034 | .IR $in | |
1035 | The address of the net_device struct representing the network device | |
1036 | on which the packet was received (if any). May be 0 if the device is | |
1037 | unknown or undefined at that stage in the protocol stack. | |
1038 | ||
1039 | .TP | |
1040 | .IR $out | |
1041 | The address of the net_device struct representing the network device | |
1042 | on which the packet will be sent (if any). May be 0 if the device is | |
1043 | unknown or undefined at that stage in the protocol stack. | |
1044 | ||
1045 | .TP | |
1046 | .IR $verdict | |
1047 | (Guru mode only.) Assigning one of the verdict values defined in | |
1048 | <linux/netfilter.h> to this variable alters the further progress of | |
1049 | the packet through the protocol stack. For instance, the following | |
1050 | guru mode script forces all ipv6 network packets to be dropped: | |
1051 | ||
1052 | .SAMPLE | |
1053 | probe netfilter.pf("NFPROTO_IPV6").hook("NF_IP6_PRE_ROUTING") { | |
c49ffe6c | 1054 | $verdict = 0 /* nf_drop */ |
da00b50e SM |
1055 | } |
1056 | .ESAMPLE | |
1057 | ||
c49ffe6c SM |
1058 | For convenience, unlike the primitive probe points discussed here, the |
1059 | probes defined in | |
1060 | .IR tapset::netfilter (3stap) | |
1061 | export the lowercase names of the verdict constants (e.g. NF_DROP | |
1062 | becomes nf_drop) as local variables. | |
1063 | ||
6f05b6ab FCE |
1064 | .SS MARKERS |
1065 | ||
1066 | This family of probe points hooks up to static probing markers | |
1067 | inserted into the kernel or modules. These markers are special macro | |
1068 | calls inserted by kernel developers to make probing faster and more | |
1069 | reliable than with DWARF-based probes. Further, DWARF debugging | |
1070 | information is | |
1071 | .I not | |
1072 | required to probe markers. | |
1073 | ||
1074 | Marker probe points begin with | |
f781f849 DS |
1075 | .BR kernel . |
1076 | The next part names the marker itself: | |
6f05b6ab FCE |
1077 | .BR mark("name") . |
1078 | The marker name string, which may contain the usual wildcard characters, | |
1079 | is matched against the names given to the marker macros when the kernel | |
eb973c2a DS |
1080 | and/or module was compiled. Optionally, you can specify |
1081 | .BR format("format") . | |
37f6433e | 1082 | Specifying the marker format string allows differentiation between two |
eb973c2a | 1083 | markers with the same name but different marker format strings. |
6f05b6ab FCE |
1084 | |
1085 | The handler associated with a marker-based probe may read the | |
1086 | optional parameters specified at the macro call site. These are | |
1087 | named | |
1088 | .BR $arg1 " through " $argNN , | |
1089 | where NN is the number of parameters supplied by the macro. Number | |
1090 | and string parameters are passed in a type-safe manner. | |
1091 | ||
eb973c2a DS |
1092 | The marker format string associated with a marker is available in |
1093 | .BR $format . | |
37f6433e | 1094 | The marker name string is also available in |
bc54e71c | 1095 | .BR $name . |
eb973c2a | 1096 | |
bc724b8b JS |
1097 | .SS TRACEPOINTS |
1098 | ||
1099 | This family of probe points hooks up to static probing tracepoints | |
1100 | inserted into the kernel or modules. As with markers, these | |
1101 | tracepoints are special macro calls inserted by kernel developers to | |
1102 | make probing faster and more reliable than with DWARF-based probes, | |
1103 | and DWARF debugging information is not required to probe tracepoints. | |
1104 | Tracepoints have an extra advantage of more strongly-typed parameters | |
1105 | than markers. | |
1106 | ||
1107 | Tracepoint probes begin with | |
1108 | .BR kernel . | |
1109 | The next part names the tracepoint itself: | |
1110 | .BR trace("name") . | |
1111 | The tracepoint name string, which may contain the usual wildcard | |
1112 | characters, is matched against the names defined by the kernel | |
1113 | developers in the tracepoint header files. | |
1114 | ||
1115 | The handler associated with a tracepoint-based probe may read the | |
1116 | optional parameters specified at the macro call site. These are | |
1117 | named according to the declaration by the tracepoint author. For | |
1118 | example, the tracepoint probe | |
1119 | .BR kernel.trace("sched_switch") | |
1120 | provides the parameters | |
1121 | .BR $rq ", " $prev ", and " $next . | |
1122 | If the parameter is a complex type, as in a struct pointer, then a | |
1123 | script can access fields with the same syntax as DWARF $target | |
1124 | variables. Also, tracepoint parameters cannot be modified, but in | |
1125 | guru-mode a script may modify fields of parameters. | |
1126 | ||
1127 | The name of the tracepoint is available in | |
1128 | .BR $$name , | |
1129 | and a string of name=value pairs for all parameters of the tracepoint | |
1130 | is available in | |
046e7190 | 1131 | .BR $$vars " or " $$parms . |
bc724b8b | 1132 | |
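As a minimal sketch of a tracepoint handler, the following prints the tracepoint name and its parameters on every context switch. The output format is an arbitrary choice, and the parameters actually shown depend on how the running kernel declares the tracepoint.

```systemtap
probe kernel.trace("sched_switch") {
  # $$name is the tracepoint name; $$parms is a string of
  # name=value pairs for all of its parameters.
  printf("%s: %s\n", $$name, $$parms)
}
```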
dd225250 PS |
1133 | .SS HARDWARE BREAKPOINTS |
1134 | This family of probes is used to set hardware watchpoints for a given | |
1135 | (global) kernel symbol. The probes take three components as inputs: |
1136 | ||
1137 | 1. The | |
1138 | .B "virtual address / name" |
1139 | of the kernel symbol to be traced is supplied as argument to this class | |
1140 | of probes. (Only data segment variables can be probed; probing |
1141 | local variables of a function is not supported.) |
1142 | ||
1143 | 2. The nature of the access to be probed: |
1144 | a. | |
1145 | .I .write | |
1146 | probe gets triggered when a write happens at the specified address/symbol | |
1147 | name. | |
1148 | b. | |
1149 | .I .rw |
1150 | probe is triggered when either a read or write happens. | |
1151 | ||
1152 | 3. | |
1153 | .BR .length | |
1154 | (optional) | |
1155 | Users have the option of specifying the address interval to be probed | |
1156 | using "length" constructs. The user-specified length gets approximated | |
1157 | to the closest possible address length that the architecture can | |
1158 | support. If the specified length exceeds the limits imposed by | |
1159 | architecture, an error message is flagged and probe registration fails. | |
1160 | Wherever 'length' is not specified, the translator requests a hardware | |
1161 | breakpoint probe of length 1. It should be noted that the "length" | |
1162 | construct is not valid with symbol names. | |
1163 | ||
1164 | The following constructs are supported: |
1165 | .SAMPLE | |
1166 | probe kernel.data(ADDRESS).write | |
1167 | probe kernel.data(ADDRESS).rw | |
1168 | probe kernel.data(ADDRESS).length(LEN).write | |
1169 | probe kernel.data(ADDRESS).length(LEN).rw | |
1170 | probe kernel.data("SYMBOL_NAME").write | |
1171 | probe kernel.data("SYMBOL_NAME").rw | |
1172 | .ESAMPLE | |
1173 | ||
1174 | This set of probes makes use of the debug registers of the processor, |
1175 | which are a scarce resource (4 on x86, 1 on powerpc). The script |
1176 | translator issues a warning if a user requests more hardware breakpoint probes |
1177 | than the architecture supports. For example, a pass-2 warning is issued |
1178 | when an input script requests 5 hardware breakpoint probes on an x86 |
1179 | system, since the x86 architecture supports a maximum of 4 breakpoints. |
1180 | Users are cautioned to set probes judiciously. |
1181 | ||
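A minimal sketch of a write watchpoint on a global kernel symbol follows. It uses the pid_max symbol and assumes the standard tapset functions execname() and pid() are available; the message text is an arbitrary choice.

```systemtap
# Report which task writes to the kernel variable pid_max.
# This consumes one of the processor's debug registers.
probe kernel.data("pid_max").write {
  printf("pid_max written by %s[%d]\n", execname(), pid())
}
```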
9becfcef MW |
1182 | .SS PERF |
1183 | ||
1184 | This | |
1185 | .IR prototype | |
1186 | family of probe points interfaces to the kernel "perf event" | |
cb7d3cd8 | 1187 | infrastructure for controlling hardware performance counters. |
9becfcef MW |
1188 | The events being attached to are described by the "type" and |
1189 | "config" fields of the |
1190 | .IR perf_event_attr | |
1191 | structure, and are sampled at an interval governed by the | |
1192 | "sample_period" field. | |
1193 | ||
1194 | These fields are made available to systemtap scripts using | |
1195 | the following syntax: | |
1196 | .SAMPLE | |
1197 | probe perf.type(NN).config(MM).sample(XX) | |
1198 | probe perf.type(NN).config(MM) | |
dbdab5c8 SC |
1199 | probe perf.type(NN).config(MM).process("PROC") |
1200 | probe perf.type(NN).config(MM).counter("COUNTER") | |
1201 | probe perf.type(NN).config(MM).process("PROC").counter("COUNTER") | |
9becfcef MW |
1202 | .ESAMPLE |
1203 | The systemtap probe handler is called once per XX increments | |
1204 | of the underlying performance counter. The default sampling | |
1205 | count is 1000000. | |
1206 | The range of valid type/config is described by the | |
1207 | .IR perf_event_open (2) | |
1208 | system call, and/or the | |
1209 | .IR linux/perf_event.h | |
1210 | file. Invalid combinations or exhausted hardware counter resources | |
1211 | result in errors during systemtap script startup. Systemtap does | |
1212 | not sanity-check the values: it merely passes them through to | |
6a8fe809 SC |
1213 | the kernel for error- and safety-checking. By default the perf event |
1214 | probe is systemwide unless .process is specified, which will bind the | |
fce2c5df | 1215 | probe to a specific task. If the name is omitted then it |
e996e76a | 1216 | is inferred from the stap \-c argument. A perf event can be read on |
75cd04ca SC |
1217 | demand using .counter. The body of the perf probe handler will not be |
1218 | invoked for a .counter probe; instead, the counter is read in a user | |
1219 | space probe via: | |
dbdab5c8 SC |
1220 | .TP |
1221 | process("PROCESS").statement("func@file") {stat <<< @perf("NAME")} | |
1222 | ||
fce2c5df | 1223 | |
ba4a90fd FCE |
1224 | .SH EXAMPLES |
1225 | .PP | |
1226 | Here are some example probe points, defining the associated events. | |
1227 | .TP | |
1228 | begin, end, end | |
1229 | refers to the startup and normal shutdown of the session. In this | |
1230 | case, the handler would run once during startup and twice during | |
1231 | shutdown. | |
1232 | .TP | |
1233 | timer.jiffies(1000).randomize(200) | |
13d2ecdb | 1234 | refers to a periodic interrupt, every 1000 +/\- 200 jiffies. |
ba4a90fd FCE |
1235 | .TP |
1236 | kernel.function("*init*"), kernel.function("*exit*") | |
1237 | refers to all kernel functions with "init" or "exit" in the name. | |
1238 | .TP | |
199d126d MW |
1239 | kernel.function("*@kernel/time.c:240") |
1240 | refers to any functions within the "kernel/time.c" file that span | |
6ff00e1d FCE |
1241 | line 240. |
1242 | .BR | |
1243 | Note | |
1244 | that this is | |
1245 | .BR not | |
1246 | a probe at the statement at that line number. Use the | |
1247 | .IR | |
1248 | kernel.statement | |
1249 | probe instead. | |
ba4a90fd | 1250 | .TP |
6f05b6ab FCE |
1251 | kernel.mark("getuid") |
1252 | refers to an STAP_MARK(getuid, ...) macro call in the kernel. | |
1253 | .TP | |
ba4a90fd FCE |
1254 | module("usb*").function("*sync*").return |
1255 | refers to the moment of return from all functions with "sync" in the | |
1256 | name in any of the USB drivers. | |
1257 | .TP | |
1258 | kernel.statement(0xc0044852) | |
1259 | refers to the first byte of the statement whose compiled instructions | |
1260 | include the given address in the kernel. | |
b4ceace2 | 1261 | .TP |
199d126d MW |
1262 | kernel.statement("*@kernel/time.c:296") |
1263 | refers to the statement of line 296 within "kernel/time.c". | |
1bd128a3 SC |
1264 | .TP |
1265 | kernel.statement("bio_init@fs/bio.c+3") | |
1266 | refers to the statement at line bio_init+3 within "fs/bio.c". | |
a5ae3f3d | 1267 | .TP |
dd225250 | 1268 | kernel.data("pid_max").write |
cb7d3cd8 | 1269 | refers to a hardware breakpoint of type "write" set on pid_max. |
dd225250 | 1270 | .TP |
729286d8 | 1271 | syscall.*.return |
b4ceace2 | 1272 | refers to the group of probe aliases with any name in the second position. |
ba4a90fd FCE |
1273 | |
1274 | .SH SEE ALSO | |
78db65bd | 1275 | .IR stap (1), |
89965a32 FCE |
1276 | .IR probe::* (3stap), |
1277 | .IR tapset::* (3stap) | |
1c0b8e23 FCE |
1278 | |
1279 | .\" Local Variables: | |
1280 | .\" mode: nroff | |
1281 | .\" End: |