.\" t
.TH STAPPROBES 3stap
.SH NAME
stapprobes \- systemtap probe points

.\" macros
.de SAMPLE
.br
.RS
.nf
.nh
..
.de ESAMPLE
.hy
.fi
.RE
..

.SH DESCRIPTION
The following sections enumerate the variety of probe points supported
by the systemtap translator, and some of the additional aliases defined by
standard tapset scripts. Many are individually documented in the
.IR 3stap
manual section, with the
.IR probe::
prefix.

.SH SYNTAX

.PP
.SAMPLE
.BR probe " PROBEPOINT [" , " PROBEPOINT] " { " [STMT ...] " }
.ESAMPLE
.PP
A probe declaration may list multiple comma-separated probe points in
order to attach a handler to all of the named events. Normally, the
handler statements are run whenever any of the events occur.
.PP
The syntax of a single probe point is a general dotted-symbol
sequence. This allows a breakdown of the event namespace into parts,
somewhat like the Domain Name System does on the Internet. Each
component identifier may be parametrized by a string or number
literal, with a syntax like a function call. A component may include
a "*" character, to expand to a set of matching probe points. It may
also include "**" to match multiple sequential components at once.
Probe aliases likewise expand to other probe points.
.PP
Probe aliases can be given on their own, or with a suffix. The suffix
attaches to the underlying probe point that the alias is expanded
to. For example,
.SAMPLE
syscall.read.return.maxactive(10)
.ESAMPLE
expands to
.SAMPLE
kernel.function("sys_read").return.maxactive(10)
.ESAMPLE
with the component
.IR maxactive(10)
being recognized as a suffix.
.PP
Normally, each and every probe point resulting from wildcard- and
alias-expansion must be resolved to some low-level system
instrumentation facility (e.g., a kprobe address, marker, or a timer
configuration); otherwise the elaboration phase will fail.
.PP
However, a probe point may be followed by a "?" character, to indicate
that it is optional, and that no error should result if it fails to
resolve. Optionalness passes down through all levels of
alias/wildcard expansion. Alternately, a probe point may be followed
by a "!" character, to indicate that it is both optional and
sufficient. (Think vaguely of the Prolog cut operator.) If it does
resolve, then no further probe points in the same comma-separated list
will be resolved. Therefore, the "!" sufficiency mark only makes
sense in a list of probe point alternatives.
.PP
Additionally, a probe point may be followed by an "if (expr)" statement, in
order to enable/disable the probe point on-the-fly. With the "if" statement,
if the "expr" is false when the probe point is hit, the whole probe body,
including any alias body, is skipped. The condition is stacked up through
all levels of alias/wildcard expansion, so the final condition becomes
the logical-and of the conditions of all expanded aliases/wildcards. The
expressions are necessarily restricted to global variables.
.PP
These are all
.B syntactically
valid probe points. (They are generally
.B semantically
invalid, depending on the contents of the tapsets, and the versions of
kernel/user software installed.)

.SAMPLE
kernel.function("foo").return
process("/bin/vi").statement(0x2222)
end
syscall.*
syscall.*.return.maxactive(10)
sys**open
kernel.function("no_such_function") ?
module("awol").function("no_such_function") !
signal.*? if (switch)
kprobe.function("foo")
.ESAMPLE

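.PP
As a minimal sketch of the "!" and "?" marks described above, a
declaration might list two alternatives for the same event. The function
names here are placeholders chosen for illustration and may not exist on
a given kernel.
.SAMPLE
# Prefer vfs_read; sys_read is tried only if vfs_read does not resolve.
probe kernel.function("vfs_read") !, kernel.function("sys_read") ?
{
  println(pp())
}
.ESAMPLE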
Probes may be broadly classified into "synchronous" and
"asynchronous". A "synchronous" event is deemed to occur when any
processor executes an instruction matched by the specification. This
gives these probes a reference point (instruction address) from which
more contextual data may be available. Other families of probe points
refer to "asynchronous" events such as timers/counters rolling over,
where there is no fixed reference point that is related. Each probe
point specification may match multiple locations (for example, using
wildcards or aliases), and all of them are then probed. A probe
declaration may also contain several comma-separated specifications,
all of which are probed.

.SH DWARF DEBUGINFO

Resolving some probe points requires DWARF debuginfo or "debug
symbols" for the specific part being instrumented. For some others,
DWARF is automatically synthesized on the fly from source code header
files. For others, it is not needed at all. Since a systemtap script
may use any mixture of probe points together, the union of their DWARF
requirements has to be met on the computer where script compilation
occurs. (See the \fI\-\-use\-server\fR option and the \fBstap-server\
(8)\fR man page for information about the remote compilation facility,
which allows these requirements to be met on a different machine.)
.PP
The following table lists many of the available probe point families,
to classify them with respect to their need for DWARF debuginfo.

.TS
l l l.
\fBDWARF	NON-DWARF\fP

kernel.function, .statement	kernel.mark
module.function, .statement	process.mark, process.plt
process.function, .statement	begin, end, error, never
process.mark \fI(backup)\fP	timer
	perf
	procfs
\fBAUTO-DWARF\fP	kernel.statement.absolute
	kernel.data
kernel.trace	kprobe.function
	process.statement.absolute
	process.begin, .end, .error
.TE

.SH PROBE POINT FAMILIES

.SS BEGIN/END/ERROR

The probe points
.IR begin " and " end
are defined by the translator to refer to the time of session startup
and shutdown. All "begin" probe handlers are run, in some sequence,
during the startup of the session. All global variables will have
been initialized prior to this point. All "end" probes are run, in
some sequence, during the
.I normal
shutdown of a session, such as in the aftermath of an
.I exit ()
function call, or an interruption from the user. In the case of an
error-triggered shutdown, "end" probes are not run. There are no
target variables available in either context.
.PP
If the order of execution among "begin" or "end" probes is significant,
then an optional sequence number may be provided:

.SAMPLE
begin(N)
end(N)
.ESAMPLE

The number N may be positive or negative. The probe handlers are run in
increasing order, and the order between handlers with the same sequence
number is unspecified. When "begin" or "end" are given without a
sequence, they are effectively sequence zero.

The
.IR error
probe point is similar to the
.IR end
probe, except that each such probe handler is run when the session ends
after errors have occurred. In such cases, "end" probes are skipped,
but each "error" probe is still attempted. This kind of probe can be
used to clean up or emit a "final gasp". It may also be numerically
parametrized to set a sequence.

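.PP
The ordering rules above can be sketched with a small script. The
messages are illustrative only; the call to exit() is there so that the
session shuts down normally and the "end" probe runs.
.SAMPLE
probe begin(\-1) { println("first: lowest sequence number") }
probe begin     { println("second: implicit sequence zero"); exit() }
probe end       { println("normal shutdown") }
probe error     { println("shutdown after an error") }
.ESAMPLE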
.SS NEVER
The probe point
.IR never
is specially defined by the translator to mean "never". Its probe
handler is never run, though its statements are analyzed for symbol /
type correctness as usual. This probe point may be useful in
conjunction with optional probes.

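.PP
One plausible use, sketched here under the assumption that the goal is
to keep a declaration well-formed even when every optional probe point
fails to resolve, is to list "never" as a final alternative. The
function name is a placeholder.
.SAMPLE
probe kernel.function("maybe_missing_function") ?, never
{
  println(pp())
}
.ESAMPLE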
.SS SYSCALL

The
.IR syscall.*
aliases define several hundred probes, too many to
detail here. They are of the general form:

.SAMPLE
syscall.NAME
.br
syscall.NAME.return
.ESAMPLE

Generally, two probes are defined for each normal system call as listed in the
.IR syscalls(2)
manual page, one for entry and one for return. Those system calls that never
return do not have a corresponding
.IR .return
probe.
.PP
Each probe alias provides a variety of variables. Looking at the tapset source
code is the most reliable way to find out which ones are available.
Generally, each variable listed in the standard
manual page is made available as a script-level variable, so
.IR syscall.open
exposes
.IR filename ", " flags ", and " mode .
In addition, a standard suite of variables is available at most aliases:
.TP
.IR argstr
A pretty-printed form of the entire argument list, without parentheses.
.TP
.IR name
The name of the system call.
.TP
.IR retstr
For return probes, a pretty-printed form of the system-call result.
.PP
As usual for probe aliases, these variables are all simply initialized
once from the underlying $context variables, so that later changes to
$context variables are not automatically reflected. Not all probe
aliases obey all of these general guidelines. Please report any
bothersome ones you encounter as a bug.


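.PP
A short sketch of the standard variables described above, assuming the
.IR syscall.open
alias resolves on the target kernel (on some systems only a related
alias such as openat may be available):
.SAMPLE
probe syscall.open
{
  println(name . "(" . argstr . ")")
}
probe syscall.open.return
{
  println(name . " = " . retstr)
}
.ESAMPLE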
.SS TIMERS

Intervals defined by the standard kernel "jiffies" timer may be used
to trigger probe handlers asynchronously. Two probe point variants
are supported by the translator:

.SAMPLE
timer.jiffies(N)
timer.jiffies(N).randomize(M)
.ESAMPLE

The probe handler is run every N jiffies (a kernel-defined unit of
time, typically between 1 and 60 ms). If the "randomize" component is
given, a uniformly distributed random value in the range [\-M..+M] is
added to N every time the handler is run. N is restricted to a
reasonable range (1 to around a million), and M is restricted to be
smaller than N. There are no target variables provided in either
context. It is possible for such probes to be run concurrently on
a multi-processor computer.
.PP
Alternatively, intervals may be specified in units of time.
There are two probe point variants similar to the jiffies timer:

.SAMPLE
timer.ms(N)
timer.ms(N).randomize(M)
.ESAMPLE

Here, N and M are specified in milliseconds, but the full options for units
are seconds (s/sec), milliseconds (ms/msec), microseconds (us/usec),
nanoseconds (ns/nsec), and hertz (hz). Randomization is not supported for
hertz timers.

The actual resolution of the timers depends on the target kernel. For
kernels prior to 2.6.17, timers are limited to jiffies resolution, so
intervals are rounded up to the nearest jiffies interval. After 2.6.17,
the implementation uses hrtimers for tighter precision, though the actual
resolution will be arch-dependent. In either case, if the "randomize"
component is given, then the random value will be added to the interval
before any rounding occurs.
.PP
Profiling timers are also available to provide probes that execute on
all CPUs at the rate of the system tick (CONFIG_HZ). This probe takes
no parameters. On some kernels, this is a one-concurrent-user-only or
disabled facility, resulting in error \-16 (EBUSY) during probe
registration.

.SAMPLE
timer.profile
.ESAMPLE

Full context information of the interrupted process is available, making
this probe suitable for a time-based sampling profiler.

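.PP
As an illustrative sketch of the time-unit timers, the following script
counts roughly ten ticks per second with some jitter and reports the
total after five seconds. The variable name and the intervals are
arbitrary choices.
.SAMPLE
global hits
# fires every 100 ms, plus or minus up to 20 ms
probe timer.ms(100).randomize(20) { hits++ }
# report and stop after five seconds
probe timer.s(5) { println(hits); exit() }
.ESAMPLE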
.SS DWARF

This family of probe points uses symbolic debugging information for
the target kernel/module/program, as may be found in unstripped
executables, or the separate
.I debuginfo
packages. They allow placement of probes logically into the execution
path of the target program, by specifying a set of points in the
source or object code. When a matching statement executes on any
processor, the probe handler is run in that context.
.PP
Points in a kernel are identified by
module, source file, line number, function name, or some
combination of these.
.PP
Here is a list of probe point families currently supported. The
.B .function
variant places a probe near the beginning of the named function, so that
parameters are available as context variables. The
.B .return
variant places a probe at the moment
.B after
the return from the named function, so the return value is available
as the "$return" context variable. The
.B .inline
modifier for
.B .function
filters the results to include only instances of inlined functions.
The
.B .call
modifier selects the opposite subset. The
.B .exported
modifier
filters the results to include only exported functions. Inline
functions do not have an identifiable return point, so
.B .return
is not supported on
.B .inline
probes. The
.B .statement
variant places a probe at the exact spot, exposing those local variables
that are visible there.

.SAMPLE
kernel.function(PATTERN)
.br
kernel.function(PATTERN).call
.br
kernel.function(PATTERN).return
.br
kernel.function(PATTERN).inline
.br
kernel.function(PATTERN).label(LPATTERN)
.br
module(MPATTERN).function(PATTERN)
.br
module(MPATTERN).function(PATTERN).call
.br
module(MPATTERN).function(PATTERN).return
.br
module(MPATTERN).function(PATTERN).inline
.br
module(MPATTERN).function(PATTERN).label(LPATTERN)
.br
.br
kernel.statement(PATTERN)
.br
kernel.statement(ADDRESS).absolute
.br
module(MPATTERN).statement(PATTERN)
.br
process("PATH").function("NAME")
.br
process("PATH").statement("*@FILE.c:123")
.br
process("PATH").library("PATH").function("NAME")
.br
process("PATH").library("PATH").statement("*@FILE.c:123")
.br
process("PATH").function("*").return
.br
process("PATH").function("myfun").label("foo")
.br
process(PID).statement(ADDRESS).absolute
.ESAMPLE

(See the USER-SPACE section below for more information on the process
probes.)

In the above list, MPATTERN stands for a string literal that aims to
identify the loaded kernel module of interest and LPATTERN stands for
a source program label. Both MPATTERN and LPATTERN may include the "*",
"[]", and "?" wildcards.
PATTERN stands for a string literal that
aims to identify a point in the program. It is made up of three
parts:
.IP \(bu 4
The first part is the name of a function, as would appear in the
.I nm
program's output. This part may use the "*" and "?" wildcarding
operators to match multiple names.
.IP \(bu 4
The second part is optional and begins with the "@" character.
It is followed by the path to the source file containing the function,
which may include a wildcard pattern, such as mm/slab*.
If it does not match as is, an implicit "*/" is optionally added
.I before
the pattern, so that a script need only name the last few components
of a possibly long source directory path.
.IP \(bu 4
Finally, the third part is optional if the file name part was given,
and identifies the line number in the source file preceded by a ":"
or a "+". The line number is assumed to be an
absolute line number if preceded by a ":", or relative to the entry of
the function if preceded by a "+".
All the lines in the function can be matched with ":*".
A range of lines x through y can be matched with ":x\-y".
.PP
As an alternative, PATTERN may be a numeric constant, indicating an
address. Such an address may be found from symbol tables of the
appropriate kernel / module object file. It is verified against
known statement code boundaries, and will be relocated for use at
run time.
.PP
In guru mode only, absolute kernel-space addresses may be specified with
the ".absolute" suffix. Such an address is considered already relocated,
as if it came from
.BR /proc/kallsyms ,
so it cannot be checked against statement/instruction boundaries.

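.PP
The three parts of PATTERN can be sketched as follows. The function,
file, and module names are assumptions picked for illustration; they
depend on the kernel version and on the debuginfo installed.
.SAMPLE
# function part with a wildcard, plus a source file part
probe kernel.function("vfs_*@fs/read_write.c") { println(pp()) }
# line part relative to the function entry ("+")
probe kernel.statement("vfs_read@fs/read_write.c+2") { println(pp()) }
# every line of one function in a module (":*")
probe module("ext4").statement("ext4_sync_file@fs/ext4/fsync.c:*")
{
  println(pp())
}
.ESAMPLE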
.SS CONTEXT VARIABLES

.PP
Many of the source-level context variables, such as function parameters,
locals, and globals visible in the compilation unit, may be visible to
probe handlers. Handlers may refer to these variables by prefixing their
name with "$" within the scripts. In addition, a special syntax
allows limited traversal of structures, pointers, and arrays. More
syntax allows pretty-printing of individual variables or their groups.
See also
.BR @cast .

.TP
$var
refers to an in-scope variable "var". If it's an integer-like type,
it will be cast to a 64-bit int for systemtap script use. String-like
pointers (char *) may be copied to systemtap string values using the
.IR kernel_string " or " user_string
functions.
.TP
@var("varname")
an alternative syntax for
.IR $varname .
.TP
@var("varname@src/file.c")
refers to the global (either file local or external) variable
.IR varname
defined when the file
.IR src/file.c
was compiled. The CU in which the variable is resolved is the first CU
in the module of the probe point which matches the given file name at
the end and has the shortest file name path (e.g. given
.IR @var("foo@bar/baz.c")
and CUs with file name paths
.IR src/sub/module/bar/baz.c
and
.IR src/bar/baz.c
the second CU will be chosen to resolve the (file) global variable
.IR foo ).
.TP
$var\->field
traversal via a structure's or a pointer's field. This
generalized indirection operator may be repeated to follow more
levels. Note that the
.IR .
operator is not used for plain structure
members, only
.IR \->
for both purposes. (This is because "." is reserved for string
concatenation.)
.TP
$return
is available in return probes only for functions that are declared
with a return value.
.TP
$var[N]
indexes into an array. The index may be given as a literal number or even
an arbitrary numeric expression.
.PP
A number of operators exist for such basic context variable expressions:
.TP
$$vars
expands to a character string that is equivalent to
.SAMPLE
sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x",
        parm1, ..., parmN, var1, ..., varN)
.ESAMPLE
for each variable in scope at the probe point. Some values may be
printed as
.IR =?
if their run-time location cannot be found.
.TP
$$locals
expands to a subset of $$vars for only local variables.
.TP
$$parms
expands to a subset of $$vars for only function parameters.
.TP
$$return
is available in return probes only. It expands to a string that
is equivalent to sprintf("return=%x", $return)
if the probed function has a return value, or else an empty string.
.TP
& $EXPR
expands to the address of the given context variable expression, if it
is addressable.
.TP
@defined($EXPR)
expands to 1 if the given context variable expression is resolvable,
or else to 0, for use in conditionals such as
.SAMPLE
@defined($foo\->bar) ? $foo\->bar : 0
.ESAMPLE
.TP
$EXPR$
expands to a string with all of $EXPR's members, equivalent to
.SAMPLE
sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}",
         $EXPR\->a, $EXPR\->b)
.ESAMPLE
.TP
$EXPR$$
expands to a string with all of $EXPR's members and submembers, equivalent to
.SAMPLE
sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}",
        $EXPR\->a, $EXPR\->b, $EXPR\->c\->x, $EXPR\->c\->y, $EXPR\->d[0])
.ESAMPLE

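.PP
A brief sketch of these operators in a handler follows. It assumes a
kernel function named vfs_read with a parameter named count; both names
are illustrative and may differ on a given kernel.
.SAMPLE
probe kernel.function("vfs_read")
{
  # all parameters, pretty-printed as one string
  println($$parms)
  # guard against the parameter not being resolvable
  println(@defined($count) ? $count : 0)
}
.ESAMPLE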
.SS MORE ON RETURN PROBES

.PP
For the kernel ".return" probes, only a certain fixed number of
returns may be outstanding. The default is a relatively small number,
on the order of a few times the number of physical CPUs. If many
different threads concurrently call the same blocking function, such
as futex(2) or read(2), this limit could be exceeded, and skipped
"kretprobes" would be reported by "stap \-t". To work around this,
specify a
.SAMPLE
probe FOO.return.maxactive(NNN)
.ESAMPLE
suffix, with a large enough NNN to cover all expected concurrently blocked
threads. Alternately, use the
.SAMPLE
stap \-DKRETACTIVE=NNNN
.ESAMPLE
stap command line macro setting to override the default for all
".return" probes.

.PP
For ".return" probes, context variables other than "$return" may
be accessible, as a convenience for a script programmer wishing to
access function parameters. These values are \fBsnapshots\fP
taken at the time of function entry. Local variables within the
function are \fBnot\fP generally accessible, since those variables did
not exist in allocated/initialized form at the snapshot moment.
.PP
In addition, arbitrary entry-time expressions can also be saved for
".return" probes using the
.IR @entry(expr)
operator. For example, one can compute the elapsed time of a function:
.SAMPLE
probe kernel.function("do_filp_open").return {
    println( get_timeofday_us() \- @entry(get_timeofday_us()) )
}
.ESAMPLE

.PP
The following table summarizes how values related to a function
parameter context variable, a pointer named \fBaddr\fP, may be
accessed from a
.IR .return
probe.
.\" summarized from http://sourceware.org/ml/systemtap/2012-q1/msg00025.html
.TS
l l l.
\fBat-entry value	past-exit value\fP

$addr	\fInot available\fP
$addr->x->y	@cast(@entry($addr),"struct zz")->x->y
$addr[0]	{kernel,user}_{char,int,...}(& $addr[0])
.TE


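.PP
As a hedged sketch combining the points above, a return probe may raise
its maxactive limit and read an entry-time parameter snapshot alongside
the return value. The function name, the parameter name count, and the
limit of 100 are assumptions for illustration.
.SAMPLE
probe kernel.function("vfs_read").return.maxactive(100)
{
  # $count is the snapshot taken at function entry
  println(sprintf("requested=%d returned=%d", $count, $return))
}
.ESAMPLE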
.SS DWARFLESS
In the absence of debugging information, entry and exit points of kernel
and module functions can be probed using the "kprobe" family of probes.
However, these do not permit looking up the arguments / local variables
of the function.
The following constructs are supported:
.SAMPLE
kprobe.function(FUNCTION)
kprobe.function(FUNCTION).return
kprobe.module(NAME).function(FUNCTION)
kprobe.module(NAME).function(FUNCTION).return
kprobe.statement(ADDRESS).absolute
.ESAMPLE
.PP
Probes of type
.B function
are recommended for kernel functions, whereas probes of type
.B module
are recommended for probing functions of the specified module.
In case the absolute address of a kernel or module function is known,
.B statement
probes can be utilized.
.PP
Note that
.I FUNCTION
and
.I MODULE
names
.B must not
contain wildcards, or the probe will not be registered.
Also, statement probes may be used only in guru mode.


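.PP
A minimal sketch of the kprobe family, assuming a kernel function named
do_sys_open exists on the target system. No context variables are used,
since they are not available without debuginfo.
.SAMPLE
probe kprobe.function("do_sys_open")
{
  println("enter " . pp())
}
probe kprobe.function("do_sys_open").return
{
  println("leave " . pp())
}
.ESAMPLE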
1ada6f08 | 624 | .SS USER-SPACE |
38e96af8 FCE |
625 | Support for user-space probing is available for kernels that are |
626 | configured with the utrace extensions, or have the uprobes facility in | |
627 | linux 3.5. (Various kernel build configuration options need to be | |
628 | enabled; systemtap will advise if these are missing.) | |
629 | ||
0a1c696d FCE |
630 | .PP |
631 | There are several forms. First, a non-symbolic probe point: | |
1ada6f08 FCE |
632 | .SAMPLE |
633 | process(PID).statement(ADDRESS).absolute | |
634 | .ESAMPLE | |
635 | is analogous to | |
636 | .IR | |
637 | kernel.statement(ADDRESS).absolute | |
638 | in that both use raw (unverified) virtual addresses and provide | |
639 | no $variables. The target PID parameter must identify a running | |
640 | process, and ADDRESS should identify a valid instruction address. | |
641 | All threads of that process will be probed. | |
29cb9b42 | 642 | .PP |
0a1c696d FCE |
643 | Second, non-symbolic user-kernel interface events handled by |
644 | utrace may be probed: | |
29cb9b42 | 645 | .SAMPLE |
dd078c96 | 646 | process(PID).begin |
82f0e81b | 647 | process("FULLPATH").begin |
986e98de | 648 | process.begin |
dd078c96 | 649 | process(PID).thread.begin |
82f0e81b | 650 | process("FULLPATH").thread.begin |
986e98de | 651 | process.thread.begin |
dd078c96 | 652 | process(PID).end |
82f0e81b | 653 | process("FULLPATH").end |
986e98de | 654 | process.end |
dd078c96 | 655 | process(PID).thread.end |
82f0e81b | 656 | process("FULLPATH").thread.end |
986e98de | 657 | process.thread.end |
29cb9b42 | 658 | process(PID).syscall |
82f0e81b | 659 | process("FULLPATH").syscall |
986e98de | 660 | process.syscall |
29cb9b42 | 661 | process(PID).syscall.return |
82f0e81b | 662 | process("FULLPATH").syscall.return |
986e98de | 663 | process.syscall.return |
0afb7073 | 664 | process(PID).insn |
82f0e81b | 665 | process("FULLPATH").insn |
0afb7073 | 666 | process(PID).insn.block |
82f0e81b | 667 | process("FULLPATH").insn.block |
29cb9b42 DS |
668 | .ESAMPLE |
669 | .PP | |
670 | A | |
dd078c96 | 671 | .B .begin |
82f0e81b | 672 | probe gets called when new process described by PID or FULLPATH gets created. |
29cb9b42 | 673 | A |
dd078c96 | 674 | .B .thread.begin |
82f0e81b | 675 | probe gets called when a new thread described by PID or FULLPATH gets created. |
159cb109 | 676 | A |
dd078c96 | 677 | .B .end |
82f0e81b | 678 | probe gets called when process described by PID or FULLPATH dies. |
dd078c96 DS |
679 | A |
680 | .B .thread.end | |
82f0e81b | 681 | probe gets called when a thread described by PID or FULLPATH dies. |
29cb9b42 DS |
682 | A |
683 | .B .syscall | |
82f0e81b | 684 | probe gets called when a thread described by PID or FULLPATH makes a |
6270adc1 MH |
685 | system call. The system call number is available in the |
686 | .BR $syscall | |
687 | context variable, and the first 6 arguments of the system call | |
688 | are available in the | |
689 | .BR $argN | |
690 | (ex. $arg1, $arg2, ...) context variable. | |
29cb9b42 DS |
691 | A |
692 | .B .syscall.return | |
82f0e81b | 693 | probe gets called when a thread described by PID or FULLPATH returns from a |
5d67b47c MH |
694 | system call. The system call number is available in the |
695 | .BR $syscall | |
696 | context variable, and the return value of the system call is available | |
697 | in the | |
698 | .BR $return | |
29cb9b42 | 699 | context variable. |
a96d1db0 | 700 | A |
0afb7073 | 701 | .B .insn |
82f0e81b | 702 | probe gets called for every single-stepped instruction of the process described by PID or FULLPATH. |
0afb7073 FCE |
703 | A |
704 | .B .insn.block | |
82f0e81b FCE |
705 | probe gets called for every block-stepped instruction of the process described by PID or FULLPATH. |
706 | .PP | |
707 | If a process probe is specified without a PID or FULLPATH, all user | |
708 | threads will be probed. However, if systemtap was invoked with the | |
f7470174 | 709 | .IR \-c " or " \-x |
82f0e81b | 710 | options, then process probes are restricted to the process |
6d5d594e LB |
711 | hierarchy associated with the target process. If a process probe is |
712 | specified without a PID or FULLPATH, but with the | |
713 | .IR \-c | |
714 | option, the PATH of the | |
715 | .IR \-c | |
716 | command will be heuristically filled in as the process PATH. | |
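.PP
As a minimal sketch of the syscall probes described above, the
following script prints each system call made by the target process
hierarchy, along with its first argument and return value. It assumes
the script is run with
.IR \-c " or " \-x
so that the unqualified process probes are restricted to the target;
the output format is only illustrative.
.SAMPLE
# Log every system call entry in the target process hierarchy.
probe process.syscall {
  printf("%d: syscall %d arg1=0x%x\\n", tid(), $syscall, $arg1)
}
# Log the return value of each system call.
probe process.syscall.return {
  printf("%d: syscall %d returned %d\\n", tid(), $syscall, $return)
}
.ESAMPLE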
0a1c696d FCE |
717 | |
718 | .PP | |
719 | Third, symbolic static instrumentation compiled into programs and | |
720 | shared libraries may be | |
721 | probed: | |
722 | .SAMPLE | |
723 | process("PATH").mark("LABEL") | |
a794dbeb | 724 | process("PATH").provider("PROVIDER").mark("LABEL") |
0a1c696d FCE |
725 | .ESAMPLE |
726 | .PP | |
f28a8c28 SC |
727 | A |
728 | .B .mark | |
729 | probe gets called via a static probe which is defined in the | |
38e96af8 FCE |
730 | application by STAP_PROBE1(PROVIDER,LABEL,arg1), one of the macros defined in |
731 | .BR sys/sdt.h . | |
732 | The PROVIDER is an arbitrary application identifier, LABEL is the | |
733 | marker site identifier, and arg1 is the integer-typed argument. | |
734 | STAP_PROBE1 is used for probes with 1 argument, STAP_PROBE2 is used | |
735 | for probes with 2 arguments, and so on. The arguments of the probe | |
736 | are available in the context variables $arg1, $arg2, ... An | |
737 | alternative to using the STAP_PROBE macros is to use the dtrace script | |
738 | to create custom macros. Additionally, the variables $$name and | |
739 | $$provider are available as parts of the probe point name. The | |
740 | .B sys/sdt.h | |
741 | macro names DTRACE_PROBE* are available as aliases for STAP_PROBE*. | |
0a1c696d | 742 | |
29cb9b42 | 743 | .PP |
38e96af8 FCE |
744 | Finally, full symbolic source-level probes in user-space programs and |
745 | shared libraries are supported. These are exactly analogous to the | |
746 | symbolic DWARF-based kernel/module probes described above. They | |
747 | expose the same sorts of context $variables for function parameters, | |
748 | local variables, and so on. | |
0a1c696d FCE |
749 | .SAMPLE |
750 | process("PATH").function("NAME") | |
751 | process("PATH").statement("*@FILE.c:123") | |
4d0fcb93 SC |
752 | process("PATH").plt("NAME") |
753 | process("PATH").library("PATH").plt("NAME") | |
b73a1293 SC |
754 | process("PATH").library("PATH").function("NAME") |
755 | process("PATH").library("PATH").statement("*@FILE.c:123") | |
0a1c696d FCE |
756 | process("PATH").function("*").return |
757 | process("PATH").function("myfun").label("foo") | |
758 | .ESAMPLE | |
759 | ||
760 | .PP | |
761 | Note that for all process probes, | |
29cb9b42 | 762 | .I PATH |
ea384b8c FCE |
763 | names refer to executables that are searched for the same way shells do: relative |
764 | to the working directory if they contain a "/" character, otherwise in | |
765 | .BR $PATH . | |
d1bcbe71 RH |
766 | If PATH names refer to scripts, the actual interpreters (specified in the |
767 | script in the first line after the #! characters) are probed. | |
b73a1293 SC |
768 | If PATH is a process component parameter referring to shared libraries |
769 | then all processes that map it at runtime would be selected for | |
770 | probing. If PATH is a library component parameter referring to shared | |
771 | libraries then the process specified by the process component would be | |
79dc1dee FCE |
772 | selected. |
773 | ||
774 | .PP | |
775 | A .plt probe will probe functions in the program linkage table | |
4d0fcb93 | 776 | corresponding to the rest of the probe point. .plt can be specified |
79dc1dee FCE |
777 | as a shorthand for .plt("*"). The symbol name is available as a |
778 | $$name context variable; function arguments are not available, since | |
779 | PLTs are processed without debuginfo. | |
780 | ||
781 | .PP | |
82f0e81b FCE |
782 | If the PATH string contains wildcards as in the MPATTERN case, then |
783 | standard globbing is performed to find all matching paths. In this | |
784 | case, the | |
785 | .BR $PATH | |
786 | environment variable is not used. | |
787 | ||
788 | .PP | |
153e7a22 FCE |
789 | If systemtap was invoked with the |
790 | .IR \-c " or " \-x | |
760695db FCE |
791 | options, then process probes are restricted to the process |
792 | hierarchy associated with the target process. | |
1ada6f08 | 793 | |
9cb48751 DS |
794 | .SS PROCFS |
795 | ||
796 | These probe points allow procfs "files" in | |
c243f608 LB |
797 | /proc/systemtap/MODNAME to be created, read and written using a |
798 | permission that may be modified using the proper umask value. Default permissions are 0400 for read | |
799 | probes, and 0200 for write probes. If both a read and write probe are being | |
800 | used on the same file, a default permission of 0600 will be used. | |
801 | Using procfs.umask(0040).read would | |
802 | result in a 0404 permission set for the file. | |
9cb48751 DS |
803 | .RI ( MODNAME |
804 | is the name of the systemtap module). The | |
805 | .I proc | |
806 | filesystem is a pseudo-filesystem which is used as an interface to | |
c243f608 | 807 | kernel data structures. There are several probe point variants supported |
9cb48751 | 808 | by the translator: |
ca88561f | 809 | |
9cb48751 DS |
810 | .SAMPLE |
811 | procfs("PATH").read | |
c243f608 | 812 | procfs("PATH").umask(UMASK).read |
38975255 | 813 | procfs("PATH").read.maxsize(MAXSIZE) |
c243f608 | 814 | procfs("PATH").umask(UMASK).read.maxsize(MAXSIZE) |
9cb48751 | 815 | procfs("PATH").write |
c243f608 | 816 | procfs("PATH").umask(UMASK).write |
9cb48751 | 817 | procfs.read |
c243f608 | 818 | procfs.umask(UMASK).read |
38975255 | 819 | procfs.read.maxsize(MAXSIZE) |
c243f608 | 820 | procfs.umask(UMASK).read.maxsize(MAXSIZE) |
9cb48751 | 821 | procfs.write |
c243f608 | 822 | procfs.umask(UMASK).write |
9cb48751 | 823 | .ESAMPLE |
ca88561f | 824 | |
9cb48751 DS |
825 | .I PATH |
826 | is the file name (relative to /proc/systemtap/MODNAME) to be created. | |
827 | If no | |
828 | .I PATH | |
829 | is specified (as in the last six variants above), | |
830 | .I PATH | |
831 | defaults to "command". | |
832 | .PP | |
833 | When a user reads /proc/systemtap/MODNAME/PATH, the corresponding | |
834 | procfs | |
835 | .I read | |
836 | probe is triggered. The string data to be read should be assigned to | |
837 | a variable named | |
838 | .IR $value , | |
839 | like this: | |
ca88561f | 840 | |
9cb48751 DS |
841 | .SAMPLE |
842 | procfs("PATH").read { $value = "100\\n" } | |
843 | .ESAMPLE | |
844 | .PP | |
845 | When a user writes into /proc/systemtap/MODNAME/PATH, the | |
846 | corresponding procfs | |
847 | .I write | |
848 | probe is triggered. The data the user wrote is available in the | |
849 | string variable named | |
850 | .IR $value , | |
851 | like this: | |
ca88561f | 852 | |
9cb48751 DS |
853 | .SAMPLE |
854 | procfs("PATH").write { printf("user wrote: %s", $value) } | |
855 | .ESAMPLE | |
38975255 DS |
856 | .PP |
857 | .I MAXSIZE | |
858 | is the size of the procfs read buffer. Specifying | |
859 | .I MAXSIZE | |
860 | allows larger procfs output. If no | |
861 | .I MAXSIZE | |
862 | is specified, the procfs read buffer defaults to | |
863 | .I STP_PROCFS_BUFSIZE | |
864 | (which defaults to | |
865 | .IR MAXSTRINGLEN , | |
866 | the maximum length of a string). | |
867 | If setting the procfs read buffers for more than one file is needed, | |
868 | it may be easiest to override the | |
869 | .I STP_PROCFS_BUFSIZE | |
870 | definition. | |
871 | Here's an example of using | |
872 | .IR MAXSIZE : | |
873 | ||
874 | .SAMPLE | |
875 | procfs.read.maxsize(1024) { | |
876 | $value = "long string..." | |
877 | $value .= "another long string..." | |
878 | $value .= "another long string..." | |
879 | $value .= "another long string..." | |
880 | } | |
881 | .ESAMPLE | |
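.PP
Read and write probes can be combined on one file. The sketch below
(the file name "ticks" and the global variable name are arbitrary
choices made for illustration) exposes a counter that can be read from,
and reset by writing to, /proc/systemtap/MODNAME/ticks:
.SAMPLE
global ticks
# Increment the counter once per second.
probe timer.s(1) { ticks++ }
# Reading the file returns the current count.
probe procfs("ticks").read { $value = sprintf("%d\\n", ticks) }
# Writing a decimal number to the file replaces the count.
probe procfs("ticks").write { ticks = strtol($value, 10) }
.ESAMPLE
Because both a read and a write probe are attached to the same file,
it gets the default 0600 permission described above.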
9cb48751 | 882 | |
da00b50e SM |
883 | .SS NETFILTER HOOKS |
884 | ||
885 | These probe points allow observation of network packets using the | |
886 | netfilter mechanism. A netfilter probe in systemtap corresponds to a | |
887 | netfilter hook function in the original netfilter probes API. It is | |
888 | probably more convenient to use | |
889 | .IR tapset::netfilter (3stap), | |
890 | which wraps the primitive netfilter hooks and does the work of | |
891 | extracting useful information from the context variables. | |
892 | ||
893 | .PP | |
894 | There are several probe point variants supported by the translator: | |
895 | ||
896 | .SAMPLE | |
897 | netfilter.hook("HOOKNAME").pf("PROTOCOL_F") | |
898 | netfilter.pf("PROTOCOL_F").hook("HOOKNAME") | |
899 | netfilter.hook("HOOKNAME").pf("PROTOCOL_F").priority("PRIORITY") | |
900 | netfilter.pf("PROTOCOL_F").hook("HOOKNAME").priority("PRIORITY") | |
901 | .ESAMPLE | |
902 | ||
903 | .PP | |
904 | .I PROTOCOL_F | |
905 | is the protocol family to listen for, currently one of | |
906 | .I NFPROTO_IPV4, | |
907 | .I NFPROTO_IPV6, | |
908 | .I NFPROTO_ARP, | |
909 | or | |
910 | .I NFPROTO_BRIDGE. | |
911 | ||
912 | .PP | |
913 | .I HOOKNAME | |
914 | is the point, or 'hook', in the protocol stack at which to intercept | |
915 | the packet. The available hook names for each protocol family are | |
916 | taken from the kernel header files <linux/netfilter_ipv4.h>, | |
917 | <linux/netfilter_ipv6.h>, <linux/netfilter_arp.h> and | |
918 | <linux/netfilter_bridge.h>. For instance, allowable hook names for | |
919 | .I NFPROTO_IPV4 | |
920 | are | |
921 | .I NF_INET_PRE_ROUTING, | |
922 | .I NF_INET_LOCAL_IN, | |
923 | .I NF_INET_FORWARD, | |
924 | .I NF_INET_LOCAL_OUT, | |
925 | and | |
926 | .I NF_INET_POST_ROUTING. | |
927 | ||
928 | .PP | |
929 | .I PRIORITY | |
930 | is an integer priority giving the order in which the probe point | |
931 | should be triggered relative to any other netfilter hook functions | |
932 | which trigger on the same packet. Hook functions execute on each | |
933 | packet in order from smallest priority number to largest priority number. If no | |
934 | .I PRIORITY | |
935 | is specified (as in the first two probe point variants above), | |
936 | .I PRIORITY | |
937 | defaults to "0". | |
938 | ||
939 | There are a number of predefined priority names of the form | |
940 | .I NF_IP_PRI_* | |
941 | and | |
942 | .I NF_IP6_PRI_* | |
943 | which are defined in the kernel header files <linux/netfilter_ipv4.h> and <linux/netfilter_ipv6.h> respectively. The script is permitted to use these | |
944 | instead of specifying an integer priority. (The probe points for | |
945 | .I NFPROTO_ARP | |
946 | and | |
947 | .I NFPROTO_BRIDGE | |
948 | currently do not expose any named hook priorities to the script writer.) | |
949 | Thus, allowable ways to specify the priority include: | |
950 | ||
951 | .SAMPLE | |
952 | priority("255") | |
953 | priority("NF_IP_PRI_SELINUX_LAST") | |
954 | .ESAMPLE | |
955 | ||
956 | A script using guru mode is permitted to specify any identifier or | |
957 | number as the parameter for hook, pf, and priority. This feature | |
958 | should be used with caution, as the parameter is inserted verbatim into | |
959 | the C code generated by systemtap. | |
960 | ||
961 | The netfilter probe points define the following context variables: | |
962 | .TP | |
963 | .IR $skb | |
964 | The address of the sk_buff struct representing the packet. See | |
965 | <linux/skbuff.h> for details on how to use this struct, or | |
966 | alternatively use the tapset | |
967 | .IR tapset::netfilter (3stap) | |
968 | for easy access to key information. | |
969 | ||
970 | .TP | |
971 | .IR $in | |
972 | The address of the net_device struct representing the network device | |
973 | on which the packet was received (if any). May be 0 if the device is | |
974 | unknown or undefined at that stage in the protocol stack. | |
975 | ||
976 | .TP | |
977 | .IR $out | |
978 | The address of the net_device struct representing the network device | |
979 | on which the packet will be sent (if any). May be 0 if the device is | |
980 | unknown or undefined at that stage in the protocol stack. | |
981 | ||
982 | .TP | |
983 | .IR $verdict | |
984 | (Guru mode only.) Assigning one of the verdict values defined in | |
985 | <linux/netfilter.h> to this variable alters the further progress of | |
986 | the packet through the protocol stack. For instance, the following | |
987 | guru mode script forces all ipv6 network packets to be dropped: | |
988 | ||
989 | .SAMPLE | |
990 | probe netfilter.pf("NFPROTO_IPV6").hook("NF_IP6_PRE_ROUTING") { | |
c49ffe6c | 991 | $verdict = 0 /* nf_drop */ |
da00b50e SM |
992 | } |
993 | .ESAMPLE | |
994 | ||
c49ffe6c SM |
995 | For convenience, unlike the primitive probe points discussed here, the |
996 | probes defined in | |
997 | .IR tapset::netfilter (3stap) | |
998 | export the lowercase names of the verdict constants (e.g. NF_DROP | |
999 | becomes nf_drop) as local variables. | |
1000 | ||
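.PP
As a sketch of a netfilter probe that does not need guru mode, the
following script counts IPv4 packets seen at the pre-routing hook and
reports the total at shutdown. The global variable name is an
arbitrary choice.
.SAMPLE
global packets
# Count each packet; the verdict is left untouched.
probe netfilter.pf("NFPROTO_IPV4").hook("NF_INET_PRE_ROUTING") {
  packets++
}
probe end { printf("%d IPv4 packets seen\\n", packets) }
.ESAMPLE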
6f05b6ab FCE |
1001 | .SS MARKERS |
1002 | ||
1003 | This family of probe points hooks up to static probing markers | |
1004 | inserted into the kernel or modules. These markers are special macro | |
1005 | calls inserted by kernel developers to make probing faster and more | |
1006 | reliable than with DWARF-based probes. Further, DWARF debugging | |
1007 | information is | |
1008 | .I not | |
1009 | required to probe markers. | |
1010 | ||
1011 | Marker probe points begin with | |
f781f849 DS |
1012 | .BR kernel . |
1013 | The next part names the marker itself: | |
6f05b6ab FCE |
1014 | .BR mark("name") . |
1015 | The marker name string, which may contain the usual wildcard characters, | |
1016 | is matched against the names given to the marker macros when the kernel | |
eb973c2a DS |
1017 | and/or module was compiled. Optionally, you can specify |
1018 | .BR format("format") . | |
37f6433e | 1019 | Specifying the marker format string allows differentiation between two |
eb973c2a | 1020 | markers with the same name but different marker format strings. |
6f05b6ab FCE |
1021 | |
1022 | The handler associated with a marker-based probe may read the | |
1023 | optional parameters specified at the macro call site. These are | |
1024 | named | |
1025 | .BR $arg1 " through " $argNN , | |
1026 | where NN is the number of parameters supplied by the macro. Number | |
1027 | and string parameters are passed in a type-safe manner. | |
1028 | ||
eb973c2a DS |
1029 | The marker format string associated with a marker is available in |
1030 | .BR $format . | |
37f6433e | 1031 | The marker name string is also available in |
bc54e71c | 1032 | .BR $name . |
eb973c2a | 1033 | |
bc724b8b JS |
1034 | .SS TRACEPOINTS |
1035 | ||
1036 | This family of probe points hooks up to static probing tracepoints | |
1037 | inserted into the kernel or modules. As with markers, these | |
1038 | tracepoints are special macro calls inserted by kernel developers to | |
1039 | make probing faster and more reliable than with DWARF-based probes, | |
1040 | and DWARF debugging information is not required to probe tracepoints. | |
1041 | Tracepoints have an extra advantage of more strongly-typed parameters | |
1042 | than markers. | |
1043 | ||
1044 | Tracepoint probes begin with | |
1045 | .BR kernel . | |
1046 | The next part names the tracepoint itself: | |
1047 | .BR trace("name") . | |
1048 | The tracepoint name string, which may contain the usual wildcard | |
1049 | characters, is matched against the names defined by the kernel | |
1050 | developers in the tracepoint header files. | |
1051 | ||
1052 | The handler associated with a tracepoint-based probe may read the | |
1053 | optional parameters specified at the macro call site. These are | |
1054 | named according to the declaration by the tracepoint author. For | |
1055 | example, the tracepoint probe | |
1056 | .BR kernel.trace("sched_switch") | |
1057 | provides the parameters | |
1058 | .BR $rq ", " $prev ", and " $next . | |
1059 | If the parameter is a complex type, as in a struct pointer, then a | |
1060 | script can access fields with the same syntax as DWARF $target | |
1061 | variables. Also, tracepoint parameters cannot be modified, but in | |
1062 | guru-mode a script may modify fields of parameters. | |
1063 | ||
1064 | The name of the tracepoint is available in | |
1065 | .BR $$name , | |
1066 | and a string of name=value pairs for all parameters of the tracepoint | |
1067 | is available in | |
046e7190 | 1068 | .BR $$vars " or " $$parms . |
bc724b8b | 1069 | |
dd225250 PS |
1070 | .SS HARDWARE BREAKPOINTS |
1071 | This family of probes is used to set hardware watchpoints for a given | |
1072 | (global) kernel symbol. The probes take three components as inputs: | |
1073 | ||
1074 | 1. The | |
1075 | .B virtual address / name | |
1076 | of the kernel symbol to be traced is supplied as argument to this class | |
1077 | of probes. (Only data segment variables can be probed; local | |
1078 | variables of a function cannot.) | |
1079 | ||
1080 | 2. Nature of access to be probed: | |
1081 | a. | |
1082 | .I .write | |
1083 | probe gets triggered when a write happens at the specified address/symbol | |
1084 | name. | |
1085 | b. | |
1086 | .I .rw | |
1087 | probe is triggered when either a read or write happens. | |
1088 | ||
1089 | 3. | |
1090 | .BR .length | |
1091 | (optional) | |
1092 | Users have the option of specifying the address interval to be probed | |
1093 | using "length" constructs. The user-specified length gets approximated | |
1094 | to the closest possible address length that the architecture can | |
1095 | support. If the specified length exceeds the limits imposed by | |
1096 | architecture, an error message is flagged and probe registration fails. | |
1097 | Wherever 'length' is not specified, the translator requests a hardware | |
1098 | breakpoint probe of length 1. It should be noted that the "length" | |
1099 | construct is not valid with symbol names. | |
1100 | ||
1101 | The following constructs are supported: | |
1102 | .SAMPLE | |
1103 | probe kernel.data(ADDRESS).write | |
1104 | probe kernel.data(ADDRESS).rw | |
1105 | probe kernel.data(ADDRESS).length(LEN).write | |
1106 | probe kernel.data(ADDRESS).length(LEN).rw | |
1107 | probe kernel.data("SYMBOL_NAME").write | |
1108 | probe kernel.data("SYMBOL_NAME").rw | |
1109 | .ESAMPLE | |
1110 | ||
1111 | This set of probes makes use of the debug registers of the processor, | |
1112 | which are a scarce resource (4 on x86, 1 on powerpc). The script | |
1113 | translator issues a warning if a user requests more hardware breakpoint probes | |
1114 | than the architecture supports. For example, a pass-2 warning is issued | |
1115 | when an input script requests 5 hardware breakpoint probes on an x86 | |
1116 | system, since the x86 architecture supports a maximum of 4 breakpoints. | |
1117 | Users are cautioned to set probes judiciously. | |
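.PP
For instance, a minimal sketch that reports each write to the kernel
variable pid_max, printing the name and PID of the writing task (the
message format is only illustrative):
.SAMPLE
# Fires whenever the kernel variable pid_max is written.
probe kernel.data("pid_max").write {
  printf("pid_max written by %s (pid %d)\\n", execname(), pid())
}
.ESAMPLE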
1118 | ||
9becfcef MW |
1119 | .SS PERF |
1120 | ||
1121 | This | |
1122 | .IR prototype | |
1123 | family of probe points interfaces to the kernel "perf event" | |
cb7d3cd8 | 1124 | infrastructure for controlling hardware performance counters. |
9becfcef MW |
1125 | The events being attached to are described by the "type" and |
1126 | "config" fields of the | |
1127 | .IR perf_event_attr | |
1128 | structure, and are sampled at an interval governed by the | |
1129 | "sample_period" field. | |
1130 | ||
1131 | These fields are made available to systemtap scripts using | |
1132 | the following syntax: | |
1133 | .SAMPLE | |
1134 | probe perf.type(NN).config(MM).sample(XX) | |
1135 | probe perf.type(NN).config(MM) | |
dbdab5c8 SC |
1136 | probe perf.type(NN).config(MM).process("PROC") |
1137 | probe perf.type(NN).config(MM).counter("COUNTER") | |
1138 | probe perf.type(NN).config(MM).process("PROC").counter("COUNTER") | |
9becfcef MW |
1139 | .ESAMPLE |
1140 | The systemtap probe handler is called once per XX increments | |
1141 | of the underlying performance counter. The default sampling | |
1142 | count is 1000000. | |
1143 | The range of valid type/config is described by the | |
1144 | .IR perf_event_open (2) | |
1145 | system call, and/or the | |
1146 | .IR linux/perf_event.h | |
1147 | file. Invalid combinations or exhausted hardware counter resources | |
1148 | result in errors during systemtap script startup. Systemtap does | |
1149 | not sanity-check the values: it merely passes them through to | |
6a8fe809 SC |
1150 | the kernel for error- and safety-checking. By default the perf event |
1151 | probe is systemwide unless .process is specified, which will bind the | |
fce2c5df | 1152 | probe to a specific task. If the process name is omitted, it |
dbdab5c8 | 1153 | is inferred from the stap \-c argument. A perf event can be read on |
75cd04ca SC |
1154 | demand using .counter. The body of the perf probe handler will not be |
1155 | invoked for a .counter probe; instead, the counter is read in a user | |
1156 | space probe via: | |
dbdab5c8 SC |
1157 | .TP |
1158 | process("PROCESS").statement("func@file") {stat <<< @perf("NAME")} | |
1159 | ||
fce2c5df | 1160 | |
ba4a90fd FCE |
1161 | .SH EXAMPLES |
1162 | .PP | |
1163 | Here are some example probe points, defining the associated events. | |
1164 | .TP | |
1165 | begin, end, end | |
1166 | refers to the startup and normal shutdown of the session. In this | |
1167 | case, the handler would run once during startup and twice during | |
1168 | shutdown. | |
1169 | .TP | |
1170 | timer.jiffies(1000).randomize(200) | |
13d2ecdb | 1171 | refers to a periodic interrupt, every 1000 +/\- 200 jiffies. |
ba4a90fd FCE |
1172 | .TP |
1173 | kernel.function("*init*"), kernel.function("*exit*") | |
1174 | refers to all kernel functions with "init" or "exit" in the name. | |
1175 | .TP | |
199d126d MW |
1176 | kernel.function("*@kernel/time.c:240") |
1177 | refers to any functions within the "kernel/time.c" file that span | |
6ff00e1d FCE |
1178 | line 240. |
1179 | .BR | |
1180 | Note | |
1181 | that this is | |
1182 | .BR not | |
1183 | a probe at the statement at that line number. Use the | |
1184 | .IR | |
1185 | kernel.statement | |
1186 | probe instead. | |
ba4a90fd | 1187 | .TP |
6f05b6ab FCE |
1188 | kernel.mark("getuid") |
1189 | refers to an STAP_MARK(getuid, ...) macro call in the kernel. | |
1190 | .TP | |
ba4a90fd FCE |
1191 | module("usb*").function("*sync*").return |
1192 | refers to the moment of return from all functions with "sync" in the | |
1193 | name in any of the USB drivers. | |
1194 | .TP | |
1195 | kernel.statement(0xc0044852) | |
1196 | refers to the first byte of the statement whose compiled instructions | |
1197 | include the given address in the kernel. | |
b4ceace2 | 1198 | .TP |
199d126d MW |
1199 | kernel.statement("*@kernel/time.c:296") |
1200 | refers to the statement of line 296 within "kernel/time.c". | |
1bd128a3 SC |
1201 | .TP |
1202 | kernel.statement("bio_init@fs/bio.c+3") | |
1203 | refers to the statement at line bio_init+3 within "fs/bio.c". | |
a5ae3f3d | 1204 | .TP |
dd225250 | 1205 | kernel.data("pid_max").write |
cb7d3cd8 | 1206 | refers to a hardware breakpoint of type "write" set on pid_max. |
dd225250 | 1207 | .TP |
729286d8 | 1208 | syscall.*.return |
b4ceace2 | 1209 | refers to the group of probe aliases with any name in the second position. |
ba4a90fd FCE |
1210 | |
1211 | .SH SEE ALSO | |
78db65bd | 1212 | .IR stap (1), |
89965a32 FCE |
1213 | .IR probe::* (3stap), |
1214 | .IR tapset::* (3stap) | |
1c0b8e23 FCE |
1215 | |
1216 | .\" Local Variables: | |
1217 | .\" mode: nroff | |
1218 | .\" End: |