Commit | Line | Data |
---|---|---|
5f92f126 | 1 | .\" t |
ec1a2239 | 2 | .TH STAPPROBES 3stap |
ba4a90fd FCE |
3 | .SH NAME |
4 | stapprobes \- systemtap probe points | |
5 | ||
6 | .\" macros | |
7 | .de SAMPLE | |
fc6851a6 JL |
8 | |
9 | .nr oldin \\n(.i | |
ba4a90fd FCE |
10 | .br |
11 | .RS | |
12 | .nf | |
13 | .nh | |
14 | .. | |
15 | .de ESAMPLE | |
16 | .hy | |
17 | .fi | |
18 | .RE | |
fc6851a6 JL |
19 | .in \\n[oldin]u |
20 | ||
ba4a90fd FCE |
21 | .. |
22 | ||
23 | .SH DESCRIPTION | |
24 | The following sections enumerate the variety of probe points supported | |
89965a32 FCE |
25 | by the systemtap translator, and some of the additional aliases defined by |
26 | standard tapset scripts. Many are individually documented in the | |
27 | .IR 3stap | |
28 | manual section, with the | |
29 | .IR probe:: | |
30 | prefix. | |
67d1ed18 FCE |
31 | |
32 | .SH SYNTAX | |
33 | ||
34 | .PP | |
35 | .SAMPLE | |
36 | .BR probe " PROBEPOINT [" , " PROBEPOINT] " { " [STMT ...] " } | |
37 | .ESAMPLE | |
38 | .PP | |
39 | A probe declaration may list multiple comma-separated probe points in | |
40 | order to attach a handler to all of the named events. Normally, the | |
41 | handler statements are run whenever any of the events occurs. | |
ba4a90fd | 42 | .PP |
67d1ed18 FCE |
43 | The syntax of a single probe point is a general dotted-symbol |
44 | sequence. This allows a breakdown of the event namespace into parts, | |
45 | somewhat like the Domain Name System does on the Internet. Each | |
46 | component identifier may be parametrized by a string or number | |
47 | literal, with a syntax like a function call. A component may include | |
48 | a "*" character, to expand to a set of matching probe points. It may | |
49 | also include "**" to match multiple sequential components at once. | |
50 | Probe aliases likewise expand to other probe points. | |
2f5bbffa | 51 | .PP |
67d1ed18 FCE |
52 | Probe aliases can be given on their own, or with a suffix. The suffix |
53 | attaches to the underlying probe point that the alias is expanded | |
54 | to. For example, | |
2f5bbffa SM |
55 | .SAMPLE |
56 | syscall.read.return.maxactive(10) | |
57 | .ESAMPLE | |
58 | expands to | |
59 | .SAMPLE | |
60 | kernel.function("sys_read").return.maxactive(10) | |
61 | .ESAMPLE | |
62 | with the component | |
63 | .IR maxactive(10) | |
64 | being recognized as a suffix. | |
65 | .PP | |
67d1ed18 FCE |
66 | Normally, each and every probe point resulting from wildcard- and |
67 | alias-expansion must be resolved to some low-level system | |
68 | instrumentation facility (e.g., a kprobe address, marker, or a timer | |
69 | configuration), otherwise the elaboration phase will fail. | |
d898100a FCE |
70 | .PP |
71 | However, a probe point may be followed by a "?" character, to indicate | |
72 | that it is optional, and that no error should result if it fails to | |
73 | resolve. Optionalness passes down through all levels of | |
74 | alias/wildcard expansion. Alternately, a probe point may be followed | |
75 | by a "!" character, to indicate that it is both optional and | |
37f6433e | 76 | sufficient. (Think vaguely of the Prolog cut operator.) If it does |
d898100a FCE |
77 | resolve, then no further probe points in the same comma-separated list |
78 | will be resolved. Therefore, the "!" sufficiency mark only makes | |
79 | sense in a list of probe point alternatives. | |
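.PP
For illustration only (the function names below are hypothetical), a
list of alternatives might be written as follows. If the first probe
point resolves, the second is not resolved at all; otherwise the second
must resolve, since it carries no "?" or "!" mark:
.SAMPLE
probe kernel.function("foo") !, kernel.function("bar") {
    println(pp())
}
.ESAMPLE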
dfd11cc3 MH |
80 | .PP |
81 | Additionally, a probe point may be followed by an "if (expr)" statement, in | |
82 | order to enable/disable the probe point on-the-fly. With the "if" statement, | |
83 | if the "expr" is false when the probe point is hit, the whole probe body, | |
84 | including any alias's body, is skipped. The condition is stacked up through | |
85 | all levels of alias/wildcard expansion. So the final condition becomes | |
67d1ed18 FCE |
86 | the logical-and of the conditions of all expanded aliases and wildcards. The expressions |
87 | are necessarily restricted to global variables. | |
88 | .PP | |
e904ad95 FCE |
89 | These are all |
90 | .B syntactically | |
91 | valid probe points. (They may nonetheless be | |
92 | .B semantically | |
93 | invalid, depending on the contents of the tapsets, and the versions of | |
94 | kernel/user software installed.) | |
ca88561f | 95 | |
ba4a90fd FCE |
96 | .SAMPLE |
97 | kernel.function("foo").return | |
e904ad95 | 98 | process("/bin/vi").statement(0x2222) |
ba4a90fd | 99 | end |
729286d8 | 100 | syscall.* |
2f5bbffa | 101 | syscall.*.return.maxactive(10) |
649260f3 | 102 | sys**open |
6e3347a9 | 103 | kernel.function("no_such_function") ? |
d898100a | 104 | module("awol").function("no_such_function") ! |
dfd11cc3 | 105 | signal.*? if (switch) |
94c3c803 | 106 | kprobe.function("foo") |
ba4a90fd FCE |
107 | .ESAMPLE |
108 | ||
6f05b6ab FCE |
109 | Probes may be broadly classified into "synchronous" and |
110 | "asynchronous". A "synchronous" event is deemed to occur when any | |
111 | processor executes an instruction matched by the specification. This | |
112 | gives these probes a reference point (instruction address) from which | |
113 | more contextual data may be available. Other families of probe points | |
114 | refer to "asynchronous" events such as timers/counters rolling over, | |
115 | where there is no fixed reference point that is related. Each probe | |
116 | point specification may match multiple locations (for example, using | |
117 | wildcards or aliases), and all of them are then probed. A probe | |
118 | declaration may also contain several comma-separated specifications, | |
119 | all of which are probed. | |
120 | ||
5f92f126 FCE |
121 | .SH DWARF DEBUGINFO |
122 | ||
123 | Resolving some probe points requires DWARF debuginfo or "debug | |
124 | symbols" for the specific part being instrumented. For some others, | |
125 | DWARF is automatically synthesized on the fly from source code header | |
126 | files. For others, it is not needed at all. Since a systemtap script | |
127 | may use any mixture of probe points together, the union of their DWARF | |
128 | requirements has to be met on the computer where script compilation | |
129 | occurs. (See the \fI\-\-use\-server\fR option and the \fBstap-server\ | |
130 | (8)\fR man page for information about the remote compilation facility, | |
131 | which allows these requirements to be met on a different machine.) | |
132 | .PP | |
133 | The following table lists many of the available probe point families, | |
134 | to classify them with respect to their need for DWARF debuginfo. | |
135 | ||
136 | .TS | |
137 | l l l. | |
7bfd1083 | 138 | \fBDWARF NON-DWARF\fP |
5f92f126 | 139 | |
7bfd1083 | 140 | kernel.function, .statement kernel.mark |
79dc1dee | 141 | module.function, .statement process.mark, process.plt |
7bfd1083 TJL |
142 | process.function, .statement begin, end, error, never |
143 | process.mark \fI(backup)\fP timer | |
144 | perf | |
145 | procfs | |
146 | \fBAUTO-DWARF\fP kernel.statement.absolute | |
147 | kernel.data | |
148 | kernel.trace kprobe.function | |
149 | process.statement.absolute | |
150 | process.begin, .end, .error | |
5f92f126 FCE |
151 | .TE |
152 | ||
153 | .SH PROBE POINT FAMILIES | |
154 | ||
65aeaea0 | 155 | .SS BEGIN/END/ERROR |
ba4a90fd FCE |
156 | |
157 | The probe points | |
158 | .IR begin " and " end | |
159 | are defined by the translator to refer to the time of session startup | |
160 | and shutdown. All "begin" probe handlers are run, in some sequence, | |
161 | during the startup of the session. All global variables will have | |
162 | been initialized prior to this point. All "end" probes are run, in | |
163 | some sequence, during the | |
164 | .I normal | |
165 | shutdown of a session, such as in the aftermath of an | |
166 | .I exit () | |
167 | function call, or an interruption from the user. In the case of an | |
168 | error-triggered shutdown, "end" probes are not run. There are no | |
169 | target variables available in either context. | |
6a256b03 JS |
170 | .PP |
171 | If the order of execution among "begin" or "end" probes is significant, | |
172 | then an optional sequence number may be provided: | |
ca88561f | 173 | |
6a256b03 JS |
174 | .SAMPLE |
175 | begin(N) | |
176 | end(N) | |
177 | .ESAMPLE | |
ca88561f | 178 | |
6a256b03 JS |
179 | The number N may be positive or negative. The probe handlers are run in |
180 | increasing order, and the order between handlers with the same sequence | |
181 | number is unspecified. When "begin" or "end" are given without a | |
182 | sequence, they are effectively sequence zero. | |
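.PP
As a small sketch of sequencing (the messages are purely illustrative),
the following script would be expected to print "first" before "second"
at startup, and "last" when the session shuts down normally:
.SAMPLE
probe begin(\-1) { println("first") }
probe begin(1)  { println("second") }
probe end       { println("last") }
.ESAMPLE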
ba4a90fd | 183 | |
65aeaea0 FCE |
184 | The |
185 | .IR error | |
186 | probe point is similar to the | |
187 | .IR end | |
d898100a FCE |
188 | probe, except that each such probe handler is run when the session ends |
189 | after errors have occurred. In such cases, "end" probes are skipped, | |
37f6433e | 190 | but each "error" probe is still attempted. This kind of probe can be |
d898100a FCE |
191 | used to clean up or emit a "final gasp". It may also be numerically |
192 | parametrized to set a sequence. | |
65aeaea0 | 193 | |
6e3347a9 FCE |
194 | .SS NEVER |
195 | The probe point | |
196 | .IR never | |
197 | is specially defined by the translator to mean "never". Its probe | |
198 | handler is never run, though its statements are analyzed for symbol / | |
199 | type correctness as usual. This probe point may be useful in | |
200 | conjunction with optional probes. | |
201 | ||
bafd76f1 | 202 | .SS SYSCALL and ND_SYSCALL |
1027502b FCE |
203 | |
204 | The | |
bafd76f1 | 205 | .IR syscall.* " and " nd_syscall.* |
1027502b | 206 | aliases define several hundred probes, too many to |
56bd0316 | 207 | detail here. They are of the general form: |
1027502b FCE |
208 | |
209 | .SAMPLE | |
210 | syscall.NAME | |
211 | .br | |
bafd76f1 FCE |
212 | nd_syscall.NAME |
213 | .br | |
1027502b | 214 | syscall.NAME.return |
bafd76f1 FCE |
215 | .br |
216 | nd_syscall.NAME.return | |
1027502b FCE |
217 | .ESAMPLE |
218 | ||
bafd76f1 | 219 | Generally, a pair of probes is defined for each normal system call as listed in the |
1027502b FCE |
220 | .IR syscalls(2) |
221 | manual page, one for entry and one for return. Those system calls that never | |
222 | return do not have a corresponding | |
223 | .IR .return | |
bafd76f1 FCE |
224 | probe. The nd_* family of probes is about the same, except that it uses |
225 | .B non-DWARF | |
226 | based searching mechanisms, which may result in a lower quality of symbolic | |
227 | context data (parameters), and may miss some system calls. You may want to | |
228 | try them first, in case kernel debugging information is not immediately available. | |
1027502b | 229 | .PP |
df7f3a01 | 230 | Each probe alias provides a variety of variables. Looking at the tapset source |
1027502b FCE |
231 | code is the most reliable way to discover them. Generally, each variable listed in the standard |
232 | manual page is made available as a script-level variable, so | |
233 | .IR syscall.open | |
234 | exposes | |
235 | .IR filename ", " flags ", and " mode . | |
236 | In addition, a standard suite of variables is available at most aliases: | |
237 | .TP | |
238 | .IR argstr | |
239 | A pretty-printed form of the entire argument list, without parentheses. | |
240 | .TP | |
241 | .IR name | |
242 | The name of the system call. | |
243 | .TP | |
244 | .IR retstr | |
245 | For return probes, a pretty-printed form of the system-call result. | |
246 | .PP | |
df7f3a01 FCE |
247 | As usual for probe aliases, these variables are all simply initialized |
248 | once from the underlying $context variables, so that later changes to | |
249 | $context variables are not automatically reflected. Not all probe | |
250 | aliases obey all of these general guidelines. Please report any | |
251 | bothersome ones you encounter as a bug. | |
c34eceea FCE |
252 | .PP |
253 | If debuginfo availability is a problem, you may try using the | |
254 | non-DWARF syscall probe aliases instead. Use the | |
255 | .IR nd_syscall. | |
256 | prefix instead of | |
257 | .IR syscall. | |
258 | The same context variables are available, as far as possible. | |
1027502b | 259 | |
ba4a90fd FCE |
260 | .SS TIMERS |
261 | ||
262 | Intervals defined by the standard kernel "jiffies" timer may be used | |
263 | to trigger probe handlers asynchronously. Two probe point variants | |
264 | are supported by the translator: | |
ca88561f | 265 | |
ba4a90fd FCE |
266 | .SAMPLE |
267 | timer.jiffies(N) | |
268 | timer.jiffies(N).randomize(M) | |
269 | .ESAMPLE | |
ca88561f | 270 | |
ba4a90fd FCE |
271 | The probe handler is run every N jiffies (a kernel-defined unit of |
272 | time, typically between 1 and 60 ms). If the "randomize" component is | |
13d2ecdb | 273 | given, a uniformly distributed random value in the range [\-M..+M] is |
ba4a90fd FCE |
274 | added to N every time the handler is run. N is restricted to a |
275 | reasonable range (1 to around a million), and M is restricted to be | |
276 | smaller than N. There are no target variables provided in either | |
277 | context. It is possible for such probes to be run concurrently on | |
278 | a multi-processor computer. | |
422d1ceb | 279 | .PP |
197a4d62 | 280 | Alternatively, intervals may be specified in units of time. |
422d1ceb | 281 | There are two probe point variants similar to the jiffies timer: |
ca88561f | 282 | |
422d1ceb FCE |
283 | .SAMPLE |
284 | timer.ms(N) | |
285 | timer.ms(N).randomize(M) | |
286 | .ESAMPLE | |
ca88561f | 287 | |
197a4d62 JS |
288 | Here, N and M are specified in milliseconds, but the full options for units |
289 | are seconds (s/sec), milliseconds (ms/msec), microseconds (us/usec), | |
290 | nanoseconds (ns/nsec), and hertz (hz). Randomization is not supported for | |
291 | hertz timers. | |
292 | ||
293 | The actual resolution of the timers depends on the target kernel. For | |
294 | kernels prior to 2.6.17, timers are limited to jiffies resolution, so | |
295 | intervals are rounded up to the nearest jiffies interval. From 2.6.17 onward, | |
296 | the implementation uses hrtimers for tighter precision, though the actual | |
297 | resolution will be arch-dependent. In either case, if the "randomize" | |
298 | component is given, then the random value will be added to the interval | |
299 | before any rounding occurs. | |
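.PP
For instance, a minimal sketch that combines a randomized millisecond
timer with a seconds timer might look like this. The intervals and the
global variable name are arbitrary choices made for illustration:
.SAMPLE
global ticks
probe timer.ms(500).randomize(100) { ticks++ }
probe timer.s(5) {
    println(ticks)
    exit()
}
.ESAMPLE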
39e57ce0 | 300 | .PP |
ab8b5560 FCE |
301 | Profiling timers are also available to provide probes that execute on |
302 | all CPUs at the rate of the system tick (CONFIG_HZ). This probe takes | |
303 | no parameters. On some kernels, this is a one-concurrent-user-only or | |
e996e76a | 304 | disabled facility, resulting in error \-16 (EBUSY) during probe |
ab8b5560 | 305 | registration. |
ca88561f | 306 | |
39e57ce0 | 307 | .SAMPLE |
acf7bde9 | 308 | timer.profile.tick |
39e57ce0 | 309 | .ESAMPLE |
ca88561f | 310 | |
39e57ce0 FCE |
311 | Full context information of the interrupted process is available, making |
312 | this probe suitable for a time-based sampling profiler. | |
acf7bde9 SM |
313 | .PP |
314 | It is recommended to use the tapset probe | |
315 | .IR timer.profile | |
316 | rather than timer.profile.tick. This probe point behaves identically | |
317 | to timer.profile.tick when the underlying functionality is available, | |
318 | and falls back to using perf.sw.cpu_clock on some recent kernels which | |
319 | lack the corresponding profile timer facility. | |
ba4a90fd FCE |
320 | |
321 | .SS DWARF | |
322 | ||
323 | This family of probe points uses symbolic debugging information for | |
324 | the target kernel/module/program, as may be found in unstripped | |
325 | executables, or the separate | |
326 | .I debuginfo | |
327 | packages. They allow placement of probes logically into the execution | |
328 | path of the target program, by specifying a set of points in the | |
329 | source or object code. When a matching statement executes on any | |
330 | processor, the probe handler is run in that context. | |
331 | .PP | |
7c86df9f JL |
332 | Probe points in the DWARF family can be identified by the target kernel |
333 | module (or user process), source file, line number, function name, or | |
334 | some combination of these. | |
335 | .PP | |
336 | Here is a list of DWARF probe points currently supported: | |
ba4a90fd FCE |
337 | .SAMPLE |
338 | kernel.function(PATTERN) | |
b8da0ad1 | 339 | kernel.function(PATTERN).call |
7c86df9f JL |
340 | kernel.function(PATTERN).callee(PATTERN) |
341 | kernel.function(PATTERN).callees(DEPTH) | |
ba4a90fd | 342 | kernel.function(PATTERN).return |
b8da0ad1 | 343 | kernel.function(PATTERN).inline |
592470cd | 344 | kernel.function(PATTERN).label(LPATTERN) |
ba4a90fd | 345 | module(MPATTERN).function(PATTERN) |
b8da0ad1 | 346 | module(MPATTERN).function(PATTERN).call |
7c86df9f JL |
347 | module(MPATTERN).function(PATTERN).callee(PATTERN) |
348 | module(MPATTERN).function(PATTERN).callees(DEPTH) | |
ba4a90fd | 349 | module(MPATTERN).function(PATTERN).return |
b8da0ad1 | 350 | module(MPATTERN).function(PATTERN).inline |
2cab6244 | 351 | module(MPATTERN).function(PATTERN).label(LPATTERN) |
ba4a90fd | 352 | kernel.statement(PATTERN) |
37ebca01 | 353 | kernel.statement(ADDRESS).absolute |
ba4a90fd | 354 | module(MPATTERN).statement(PATTERN) |
6f017dee | 355 | process("PATH").function("NAME") |
6f017dee | 356 | process("PATH").statement("*@FILE.c:123") |
b73a1293 | 357 | process("PATH").library("PATH").function("NAME") |
b73a1293 | 358 | process("PATH").library("PATH").statement("*@FILE.c:123") |
6f017dee | 359 | process("PATH").function("*").return |
6f017dee | 360 | process("PATH").function("myfun").label("foo") |
7c86df9f JL |
361 | process("PATH").function("foo").callee("bar") |
362 | process("PATH").function("foo").callees(DEPTH) | |
5fa99496 | 363 | process(PID).statement(ADDRESS).absolute |
ba4a90fd | 364 | .ESAMPLE |
6f017dee FCE |
365 | (See the USER-SPACE section below for more information on the process |
366 | probes.) | |
7c86df9f JL |
367 | .PP |
368 | The list above includes multiple variants and modifiers which provide | |
369 | additional functionality or filters. They are: | |
370 | .RS | |
371 | .TP | |
372 | \fB.function\fR | |
373 | Places a probe near the beginning of the named function, so that | |
374 | parameters are available as context variables. | |
375 | .TP | |
376 | \fB.return\fR | |
377 | Places a probe at the moment \fBafter\fR the return from the named | |
378 | function, so the return value is available as the "$return" context | |
379 | variable. | |
380 | .TP | |
381 | \fB.inline\fR | |
382 | Filters the results to include only instances of inlined functions. Note | |
7f357865 | 383 | that inlined functions do not have an identifiable return point, so |
7c86df9f JL |
384 | \fB.return\fR is not supported on \fB.inline\fR probes. |
385 | .TP | |
386 | \fB.call\fR | |
387 | Filters the results to include only non-inlined functions (the opposite | |
388 | set of \fB.inline\fR). | |
389 | .TP | |
390 | \fB.exported\fR | |
391 | Filters the results to include only exported functions. | |
392 | .TP | |
7c86df9f | 393 | \fB.statement\fR |
7f357865 | 394 | Places a probe at the exact spot, exposing those local variables that |
7c86df9f JL |
395 | are visible there. |
396 | .TP | |
397 | \fB.callee\fR | |
398 | Places a probe on the callee function given in the \fB.callee\fR | |
399 | modifier, where the callee must be a function called by the target | |
400 | function given in \fB.function\fR. The advantage of doing this over | |
401 | directly probing the callee function is that this probe point is run | |
402 | only when the callee is called from the target function (add the | |
403 | -DSTAP_CALLEE_MATCHALL directive to override this when calling | |
404 | \fBstap\fR(1)). | |
405 | ||
406 | Note that only callees that can be statically determined are available. | |
407 | For example, calls through function pointers are not available. | |
408 | Additionally, calls to functions located in other objects (e.g. | |
074c54b6 JL |
409 | libraries) are not available (instead use another probe point). This |
410 | feature will only work for code compiled with GCC 4.7+. | |
7c86df9f JL |
411 | .TP |
412 | \fB.callees\fR | |
413 | Shortcut for \fB.callee("*")\fR, which places a probe on all callees of | |
414 | the function. | |
415 | .TP | |
416 | \fB.callees\fR(DEPTH) | |
417 | Recursively places probes on callees. For example, \fB.callees(2)\fR | |
418 | will probe both callees of the target function, as well as callees of | |
419 | those callees; \fB.callees(3)\fR goes one level deeper, and so on. | |
420 | A callee probe at depth N is only triggered when the N callers in the | |
421 | callstack match those that were statically determined during analysis | |
422 | (this also may be overridden using -DSTAP_CALLEE_MATCHALL). | |
423 | .RE | |
424 | .PP | |
425 | In the above list of probe points, MPATTERN stands for a string literal | |
426 | that aims to identify the loaded kernel module of interest and LPATTERN | |
427 | stands for a source program label. Both MPATTERN and LPATTERN may | |
428 | include the "*", "[]", and "?" wildcards. | |
592470cd | 429 | PATTERN stands for a string literal that |
6f05b6ab | 430 | aims to identify a point in the program. It is made up of three |
ca88561f MM |
431 | parts: |
432 | .IP \(bu 4 | |
433 | The first part is the name of a function, as would appear in the | |
ba4a90fd FCE |
434 | .I nm |
435 | program's output. This part may use the "*" and "?" wildcarding | |
ca88561f MM |
436 | operators to match multiple names. |
437 | .IP \(bu 4 | |
438 | The second part is optional and begins with the "@" character. | |
439 | It is followed by the path to the source file containing the function, | |
440 | which may include a wildcard pattern, such as mm/slab*. | |
79640c29 | 441 | If it does not match as is, an implicit "*/" is optionally added |
ea384b8c | 442 | .I before |
79640c29 FCE |
443 | the pattern, so that a script need only name the last few components |
444 | of a possibly long source directory path. | |
ca88561f | 445 | .IP \(bu 4 |
ba4a90fd | 446 | Finally, the third part is optional if the file name part was given, |
1bd128a3 SC |
447 | and identifies the line number in the source file preceded by a ":" |
448 | or a "+". The line number is assumed to be an | |
449 | absolute line number if preceded by a ":", or relative to the entry of | |
99a5f9cf SC |
450 | the function if preceded by a "+". |
451 | All the lines in the function can be matched with ":*". | |
f7470174 | 452 | A range of lines x through y can be matched with ":x\-y". |
ca88561f | 453 | .PP |
ba4a90fd | 454 | As an alternative, PATTERN may be a numeric constant, indicating an |
ea384b8c FCE |
455 | address. Such an address may be found from symbol tables of the |
456 | appropriate kernel / module object file. It is verified against | |
457 | known statement code boundaries, and will be relocated for use at | |
458 | run time. | |
459 | .PP | |
460 | In guru mode only, absolute kernel-space addresses may be specified with | |
461 | the ".absolute" suffix. Such an address is considered already relocated, | |
462 | as if it came from | |
463 | .BR /proc/kallsyms , | |
464 | so it cannot be checked against statement/instruction boundaries. | |
6f017dee FCE |
465 | .SS CONTEXT VARIABLES |
466 | ||
ba4a90fd | 467 | .PP |
6f017dee | 468 | Many of the source-level context variables, such as function parameters, |
ba4a90fd FCE |
469 | locals, and globals visible in the compilation unit, may be visible to |
470 | probe handlers. They may refer to these variables by prefixing their | |
471 | name with "$" within the scripts. In addition, a special syntax | |
6f017dee FCE |
472 | allows limited traversal of structures, pointers, and arrays. More |
473 | syntax allows pretty-printing of individual variables or their groups. | |
474 | See also | |
475 | .BR @cast . | |
f8b9be91 FCE |
476 | Note that variables may be inaccessible due to them being paged out, |
477 | or for a few other reasons. See also man | |
478 | .IR error::fault (7stap). | |
6f017dee | 479 | |
ba4a90fd FCE |
480 | .TP |
481 | $var | |
482 | refers to an in-scope variable "var". If it's an integer-like type, | |
7b9361d5 FCE |
483 | it will be cast to a 64-bit int for systemtap script use. String-like |
484 | pointers (char *) may be copied to systemtap string values using the | |
485 | .IR kernel_string " or " user_string | |
486 | functions. | |
ba4a90fd | 487 | .TP |
179a00c3 MW |
488 | @var("varname") |
489 | an alternative syntax for | |
490 | .IR $varname . | |
491 | . | |
492 | .TP | |
493 | @var("varname@src/file.c") | |
494 | refers to the global (either file local or external) variable | |
495 | .IR varname | |
496 | defined when the file | |
497 | .IR src/file.c | |
498 | was compiled. The CU in which the variable is resolved is the first CU | |
499 | in the module of the probe point which matches the given file name at | |
500 | the end and has the shortest file name path (e.g. given | |
501 | .IR @var("foo@bar/baz.c") | |
502 | and CUs with file name paths | |
503 | .IR src/sub/module/bar/baz.c | |
504 | and | |
505 | .IR src/bar/baz.c | |
506 | the second CU will be chosen to resolve the (file) global variable | |
507 | .IR foo ). | |
508 | . | |
509 | .TP | |
ab5e90c2 FCE |
510 | $var\->field traversal via a structure's or a pointer's field. This |
511 | generalized indirection operator may be repeated to follow more | |
512 | levels. Note that the | |
513 | .IR . | |
514 | operator is not used for plain structure | |
515 | members, only | |
516 | .IR \-> | |
517 | for both purposes. (This is because "." is reserved for string | |
518 | concatenation.) | |
ba4a90fd | 519 | .TP |
a43ba433 FCE |
520 | $return |
521 | is available in return probes only for functions that are declared | |
462a0d51 | 522 | with a return value, which can be determined using @defined($return). |
a43ba433 | 523 | .TP |
ba4a90fd | 524 | $var[N] |
33b081c5 JS |
525 | indexes into an array. The index may be given as a literal number or even |
526 | an arbitrary numeric expression. | |
6f017dee FCE |
527 | .PP |
528 | A number of operators exist for such basic context variable expressions: | |
34af38db | 529 | .TP |
2cb3fe26 SC |
530 | $$vars |
531 | expands to a character string that is equivalent to | |
6f017dee FCE |
532 | .SAMPLE |
533 | sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x", | |
534 | parm1, ..., parmN, var1, ..., varN) | |
535 | .ESAMPLE | |
536 | for each variable in scope at the probe point. Some values may be | |
537 | printed as | |
538 | .IR =? | |
539 | if their run-time location cannot be found. | |
2cb3fe26 SC |
540 | .TP |
541 | $$locals | |
a43ba433 | 542 | expands to a subset of $$vars for only local variables. |
2cb3fe26 SC |
543 | .TP |
544 | $$parms | |
a43ba433 FCE |
545 | expands to a subset of $$vars for only function parameters. |
546 | .TP | |
547 | $$return | |
548 | is available in return probes only. It expands to a string that | |
fd574705 | 549 | is equivalent to sprintf("return=%x", $return) |
a43ba433 | 550 | if the probed function has a return value, or else an empty string. |
6f017dee FCE |
551 | .TP |
552 | & $EXPR | |
553 | expands to the address of the given context variable expression, if it | |
554 | is addressable. | |
555 | .TP | |
556 | @defined($EXPR) | |
557 | expands to 1 if the given context variable expression is resolvable, or to 0 otherwise, | |
558 | for use in conditionals such as | |
559 | .SAMPLE | |
f7470174 | 560 | @defined($foo\->bar) ? $foo\->bar : 0 |
6f017dee FCE |
561 | .ESAMPLE |
562 | .TP | |
563 | $EXPR$ | |
564 | expands to a string with all of $EXPR's members, equivalent to | |
565 | .SAMPLE | |
566 | sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}", | |
567 | $EXPR\->a, $EXPR\->b) | |
568 | .ESAMPLE | |
569 | .TP | |
570 | $EXPR$$ | |
571 | expands to a string with all of $EXPR's members and submembers, equivalent to | |
572 | .SAMPLE | |
573 | sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}", | |
574 | $EXPR\->a, $EXPR\->b, $EXPR\->c\->x, $EXPR\->c\->y, $EXPR\->d[0]) | |
575 | .ESAMPLE | |
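.PP
As a hedged sketch of how these forms combine, the following probe
assumes that debuginfo is available and that the target kernel has a
function "vfs_read" with a parameter named "count". Both are assumptions
about the target kernel, which is why the access is guarded with
@defined:
.SAMPLE
probe kernel.function("vfs_read") {
    println($$parms)
    if (@defined($count))
        println($count)
}
.ESAMPLE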
576 | ||
3f5a5bb1 FCE |
577 | .SS MORE ON RETURN PROBES |
578 | ||
579 | .PP | |
580 | For the kernel ".return" probes, only a certain fixed number of | |
581 | returns may be outstanding. The default is a relatively small number, | |
582 | on the order of a few times the number of physical CPUs. If many | |
583 | different threads concurrently call the same blocking function, such | |
584 | as futex(2) or read(2), this limit could be exceeded, and skipped | |
e996e76a | 585 | "kretprobes" would be reported by "stap \-t". To work around this, |
3f5a5bb1 FCE |
586 | specify a |
587 | .SAMPLE | |
588 | probe FOO.return.maxactive(NNN) | |
589 | .ESAMPLE | |
590 | suffix, with a large enough NNN to cover all expected concurrently blocked | |
591 | threads. Alternately, use the | |
592 | .SAMPLE | |
e996e76a | 593 | stap \-DKRETACTIVE=NNNN |
3f5a5bb1 FCE |
594 | .ESAMPLE |
595 | stap command line macro setting to override the default for all | |
596 | ".return" probes. | |
1c0b8e23 | 597 | |
39e3139a | 598 | .PP |
1c0b8e23 FCE |
599 | For ".return" probes, context variables other than the "$return" may |
600 | be accessible, as a convenience for a script programmer wishing to | |
601 | access function parameters. These values are \fBsnapshots\fP | |
602 | taken at the time of function entry. Local variables within the | |
603 | function are \fBnot\fP generally accessible, since those variables did | |
604 | not exist in allocated/initialized form at the snapshot moment. | |
8cc799a5 | 605 | .PP |
1c0b8e23 FCE |
606 | In addition, arbitrary entry-time expressions can also be saved for |
607 | ".return" probes using the | |
8cc799a5 JS |
608 | .IR @entry(expr) |
609 | operator. For example, one can compute the elapsed time of a function: | |
610 | .SAMPLE | |
611 | probe kernel.function("do_filp_open").return { | |
612 | println( get_timeofday_us() \- @entry(get_timeofday_us()) ) | |
613 | } | |
614 | .ESAMPLE | |
39e3139a | 615 | |
1c0b8e23 FCE |
616 | .PP |
617 | The following table summarizes how values related to a function | |
618 | parameter context variable, a pointer named \fBaddr\fP, may be | |
619 | accessed from a | |
620 | .IR .return | |
621 | probe. | |
622 | .\" summarized from http://sourceware.org/ml/systemtap/2012-q1/msg00025.html | |
623 | .TS | |
624 | l l l. | |
625 | \fBat-entry value past-exit value\fP | |
626 | ||
627 | $addr \fInot available\fP | |
628 | $addr->x->y @cast(@entry($addr),"struct zz")->x->y | |
629 | $addr[0] {kernel,user}_{char,int,...}(& $addr[0]) | |
630 | .TE | |
631 | ||
ba4a90fd | 632 | |
94c3c803 AM |
633 | .SS DWARFLESS |
634 | In the absence of debugging information, entry and exit points of kernel and module | |
635 | functions can be probed using the "kprobe" family of probes. | |
636 | However, these do not permit looking up the arguments / local variables | |
637 | of the function. | |
638 | The following constructs are supported: | |
639 | .SAMPLE | |
640 | kprobe.function(FUNCTION) | |
3c57fe1f | 641 | kprobe.function(FUNCTION).call |
94c3c803 AM |
642 | kprobe.function(FUNCTION).return |
643 | kprobe.module(NAME).function(FUNCTION) | |
3c57fe1f | 644 | kprobe.module(NAME).function(FUNCTION).call |
94c3c803 AM |
645 | kprobe.module(NAME).function(FUNCTION).return |
646 | kprobe.statement(ADDRESS).absolute | |
647 | .ESAMPLE | |
648 | .PP | |
649 | Probes of type | |
650 | .B function | |
651 | are recommended for kernel functions, whereas probes of type | |
652 | .B module | |
653 | are recommended for probing functions of the specified module. | |
654 | In case the absolute address of a kernel or module function is known, | |
655 | .B statement | |
656 | probes can be utilized. | |
657 | .PP | |
658 | Note that | |
659 | .I FUNCTION | |
660 | and | |
661 | .I MODULE | |
662 | names | |
663 | .B must not | |
664 | contain wildcards, or the probe will not be registered. | |
665 | Also, statement probes are available only in guru mode. | |
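.PP
As a minimal sketch of the call/return pair, the following script
reports entry to and exit from one kernel function without needing
debuginfo. The function name "vfs_read" is only an illustrative
assumption; substitute any function present in the running kernel.
.SAMPLE
# "vfs_read" is an assumed example target, not mandated by systemtap
probe kprobe.function("vfs_read").call {
  printf("%s(%d) enters vfs_read\\n", execname(), pid())
}
probe kprobe.function("vfs_read").return {
  printf("%s(%d) leaves vfs_read\\n", execname(), pid())
}
.ESAMPLE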
666 | ||
667 | ||
1ada6f08 | 668 | .SS USER-SPACE |
38e96af8 FCE |
669 | Support for user-space probing is available for kernels that are |
670 | configured with the utrace extensions, or have the uprobes facility in | |
671 | Linux 3.5. (Various kernel build configuration options need to be | |
672 | enabled; systemtap will advise if these are missing.) | |
673 | ||
0a1c696d FCE |
674 | .PP |
675 | There are several forms. First, a non-symbolic probe point: | |
1ada6f08 FCE |
676 | .SAMPLE |
677 | process(PID).statement(ADDRESS).absolute | |
678 | .ESAMPLE | |
679 | is analogous to | |
680 | .IR | |
681 | kernel.statement(ADDRESS).absolute | |
682 | in that both use raw (unverified) virtual addresses and provide | |
683 | no $variables. The target PID parameter must identify a running | |
684 | process, and ADDRESS should identify a valid instruction address. | |
685 | All threads of that process will be probed. | |
29cb9b42 | 686 | .PP |
0a1c696d FCE |
687 | Second, non-symbolic user-kernel interface events handled by |
688 | utrace may be probed: | |
29cb9b42 | 689 | .SAMPLE |
dd078c96 | 690 | process(PID).begin |
82f0e81b | 691 | process("FULLPATH").begin |
986e98de | 692 | process.begin |
dd078c96 | 693 | process(PID).thread.begin |
82f0e81b | 694 | process("FULLPATH").thread.begin |
986e98de | 695 | process.thread.begin |
dd078c96 | 696 | process(PID).end |
82f0e81b | 697 | process("FULLPATH").end |
986e98de | 698 | process.end |
dd078c96 | 699 | process(PID).thread.end |
82f0e81b | 700 | process("FULLPATH").thread.end |
986e98de | 701 | process.thread.end |
29cb9b42 | 702 | process(PID).syscall |
82f0e81b | 703 | process("FULLPATH").syscall |
986e98de | 704 | process.syscall |
29cb9b42 | 705 | process(PID).syscall.return |
82f0e81b | 706 | process("FULLPATH").syscall.return |
986e98de | 707 | process.syscall.return |
0afb7073 | 708 | process(PID).insn |
82f0e81b | 709 | process("FULLPATH").insn |
0afb7073 | 710 | process(PID).insn.block |
82f0e81b | 711 | process("FULLPATH").insn.block |
29cb9b42 DS |
712 | .ESAMPLE |
713 | .PP | |
714 | A | |
dd078c96 | 715 | .B .begin |
82f0e81b | 716 | probe gets called when a new process described by PID or FULLPATH gets created.
29cb9b42 | 717 | A |
dd078c96 | 718 | .B .thread.begin |
82f0e81b | 719 | probe gets called when a new thread described by PID or FULLPATH gets created. |
159cb109 | 720 | A |
dd078c96 | 721 | .B .end |
82f0e81b | 722 | probe gets called when a process described by PID or FULLPATH dies.
dd078c96 DS |
723 | A |
724 | .B .thread.end | |
82f0e81b | 725 | probe gets called when a thread described by PID or FULLPATH dies. |
29cb9b42 DS |
726 | A |
727 | .B .syscall | |
82f0e81b | 728 | probe gets called when a thread described by PID or FULLPATH makes a |
6270adc1 MH |
729 | system call. The system call number is available in the |
730 | .BR $syscall | |
731 | context variable, and the first 6 arguments of the system call | |
732 | are available in the | |
733 | .BR $argN | |
734 | (e.g. $arg1, $arg2, ...) context variables. | |
29cb9b42 DS |
735 | A |
736 | .B .syscall.return | |
82f0e81b | 737 | probe gets called when a thread described by PID or FULLPATH returns from a |
5d67b47c MH |
738 | system call. The system call number is available in the |
739 | .BR $syscall | |
740 | context variable, and the return value of the system call is available | |
741 | in the | |
742 | .BR $return | |
29cb9b42 | 743 | context variable. |
a96d1db0 | 744 | A |
0afb7073 | 745 | .B .insn |
82f0e81b | 746 | probe gets called for every single-stepped instruction of the process described by PID or FULLPATH. |
0afb7073 FCE |
747 | A |
748 | .B .insn.block | |
82f0e81b FCE |
749 | probe gets called for every block-stepped instruction of the process described by PID or FULLPATH. |
750 | .PP | |
751 | If a process probe is specified without a PID or FULLPATH, all user | |
752 | threads will be probed. However, if systemtap was invoked with the | |
f7470174 | 753 | .IR \-c " or " \-x |
82f0e81b | 754 | options, then process probes are restricted to the process |
6d5d594e | 755 | hierarchy associated with the target process. If a process probe is |
fc18e6c4 | 756 | unspecified (i.e. without a PID or FULLPATH), but with the |
6d5d594e LB |
757 | .IR \-c " |
758 | option, the PATH of the | |
759 | .IR \-c " | |
fc18e6c4 JL |
760 | cmd will be heuristically filled into the process PATH. In that case, |
761 | only command parameters are allowed in the \fI-c\fR command (i.e. no | |
762 | command substitution allowed and no occurrences of any of these | |
763 | characters: '|&;<>(){}'). | |
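.PP
For illustration, a script along the following lines could trace the
system calls made by a target command. It is a sketch that assumes
stap is invoked with
.IR \-c ,
so that the unqualified process probes are limited to that command.
.SAMPLE
# assumes an invocation such as: stap script.stp \-c CMD
probe process.syscall {
  printf("%s[%d] syscall %d arg1=0x%x\\n",
         execname(), tid(), $syscall, $arg1)
}
probe process.syscall.return {
  printf("%s[%d] syscall %d returned %d\\n",
         execname(), tid(), $syscall, $return)
}
.ESAMPLE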
0a1c696d FCE |
764 | |
765 | .PP | |
766 | Third, symbolic static instrumentation compiled into programs and | |
767 | shared libraries may be | |
768 | probed: | |
769 | .SAMPLE | |
770 | process("PATH").mark("LABEL") | |
a794dbeb | 771 | process("PATH").provider("PROVIDER").mark("LABEL") |
0a1c696d FCE |
772 | .ESAMPLE |
773 | .PP | |
f28a8c28 SC |
774 | A |
775 | .B .mark | |
776 | probe gets called via a static probe which is defined in the | |
38e96af8 FCE |
777 | application by STAP_PROBE1(PROVIDER,LABEL,arg1), one of the macros defined in |
778 | .BR sys/sdt.h . | |
779 | The PROVIDER is an arbitrary application identifier, LABEL is the | |
780 | marker site identifier, and arg1 is the integer-typed argument. | |
781 | STAP_PROBE1 is used for probes with 1 argument, STAP_PROBE2 is used | |
782 | for probes with 2 arguments, and so on. The arguments of the probe | |
783 | are available in the context variables $arg1, $arg2, ... An | |
784 | alternative to using the STAP_PROBE macros is to use the dtrace script | |
785 | to create custom macros. Additionally, the variables $$name and | |
786 | $$provider are available as parts of the probe point name. The | |
787 | .B sys/sdt.h | |
788 | macro names DTRACE_PROBE* are available as aliases for STAP_PROBE*. | |
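.PP
As a hedged sketch, a handler for such a marker might look as follows.
The executable path, the provider name "myapp" and the marker name
"request_start" are hypothetical; the program is assumed to contain
STAP_PROBE1(myapp, request_start, id).
.SAMPLE
# hypothetical path, provider and marker names
probe process("/usr/bin/myapp").provider("myapp").mark("request_start") {
  printf("%s:%s arg1=%d\\n", $$provider, $$name, $arg1)
}
.ESAMPLE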
0a1c696d | 789 | |
29cb9b42 | 790 | .PP |
38e96af8 FCE |
791 | Finally, full symbolic source-level probes in user-space programs and |
792 | shared libraries are supported. These are exactly analogous to the | |
793 | symbolic DWARF-based kernel/module probes described above. They | |
794 | expose the same sorts of context $variables for function parameters, | |
795 | local variables, and so on. | |
0a1c696d FCE |
796 | .SAMPLE |
797 | process("PATH").function("NAME") | |
798 | process("PATH").statement("*@FILE.c:123") | |
4d0fcb93 SC |
799 | process("PATH").plt("NAME") |
800 | process("PATH").library("PATH").plt("NAME") | |
b73a1293 SC |
801 | process("PATH").library("PATH").function("NAME") |
802 | process("PATH").library("PATH").statement("*@FILE.c:123") | |
0a1c696d FCE |
803 | process("PATH").function("*").return |
804 | process("PATH").function("myfun").label("foo") | |
7c86df9f | 805 | process("PATH").function("foo").callee("bar") |
0a1c696d FCE |
806 | .ESAMPLE |
807 | ||
808 | .PP | |
809 | Note that for all process probes, | |
29cb9b42 | 810 | .I PATH |
ea384b8c FCE |
811 | names refer to executables that are searched for the same way shells do: relative |
812 | to the working directory if they contain a "/" character, otherwise in | |
813 | .BR $PATH . | |
d1bcbe71 RH |
814 | If PATH names refer to scripts, the actual interpreters (specified in the |
815 | script in the first line after the #! characters) are probed. | |
78683caf JL |
816 | |
817 | .PP | |
b73a1293 | 818 | If PATH is a process component parameter referring to shared libraries |
78683caf JL |
819 | then all processes that map it at runtime would be selected for probing. |
820 | If PATH is a library component parameter referring to shared libraries | |
821 | then the process specified by the process component would be selected. | |
822 | Note that the PATH pattern in a library component will always apply to | |
823 | libraries statically determined to be in use by the process. However, | |
824 | you may also specify the full path to any library file even if not | |
825 | statically needed by the process. | |
79dc1dee FCE |
826 | |
827 | .PP | |
828 | A .plt probe will probe functions in the program linkage table | |
4d0fcb93 | 829 | corresponding to the rest of the probe point. .plt can be specified |
79dc1dee FCE |
830 | as a shorthand for .plt("*"). The symbol name is available as a |
831 | $$name context variable; function arguments are not available, since | |
832 | PLTs are processed without debuginfo. | |
833 | ||
834 | .PP | |
82f0e81b FCE |
835 | If the PATH string contains wildcards as in the MPATTERN case, then |
836 | standard globbing is performed to find all matching paths. In this | |
837 | case, the | |
838 | .BR $PATH | |
839 | environment variable is not used. | |
840 | ||
841 | .PP | |
153e7a22 FCE |
842 | If systemtap was invoked with the |
843 | .IR \-c " or " \-x | |
760695db FCE |
844 | options, then process probes are restricted to the process |
845 | hierarchy associated with the target process. | |
1ada6f08 | 846 | |
982026f1 SM |
847 | .SS JAVA |
848 | Support for probing Java methods is available using Byteman as a | |
849 | backend. Byteman is an instrumentation tool from the JBoss project | |
850 | which systemtap can use to monitor invocations for a specific method | |
851 | or line in a Java program. | |
852 | .PP | |
853 | Systemtap does so by generating a Byteman script listing the probes to | |
854 | instrument and then invoking the Byteman | |
855 | .IR bminstall | |
d885563b | 856 | utility. |
982026f1 | 857 | .PP |
768754f8 | 858 | This Java instrumentation support is currently a prototype feature |
d885563b FCE |
859 | with major limitations. Moreover, Java probing currently does not |
860 | work across users; the stap script must run (with appropriate | |
861 | permissions) under the same user as the Java process being | |
862 | probed. (Thus a stap script under root currently cannot probe Java | |
863 | methods in a non-root-user Java process.) | |
982026f1 SM |
864 | |
865 | .PP | |
866 | The first probe type refers to Java processes by the name of the Java process: | |
867 | .SAMPLE | |
868 | java("PNAME").class("CLASSNAME").method("PATTERN") | |
869 | java("PNAME").class("CLASSNAME").method("PATTERN").return | |
870 | .ESAMPLE | |
269cd0ae LB |
871 | The PNAME argument must name an already-running JVM process that is |
872 | identifiable via a jps listing. | |
873 | .PP | |
982026f1 SM |
874 | The PATTERN parameter specifies the signature of the Java method to |
875 | probe. The signature must consist of the exact name of the method, | |
876 | followed by a bracketed list of the types of the arguments, for | |
877 | instance "myMethod(int,double,Foo)". Wildcards are not supported. | |
878 | .PP | |
879 | The probe can be set to trigger at a specific line within the method | |
880 | by appending a line number with colon, just as in other types of | |
881 | probes: "myMethod(int,double,Foo):245". | |
882 | .PP | |
883 | The CLASSNAME parameter identifies the Java class the method belongs | |
884 | to, either with or without the package qualification. By default, the | |
885 | probe only triggers on descendants of the class that do not override | |
886 | the method definition of the original class. However, CLASSNAME can | |
887 | take an optional caret prefix, as in | |
888 | .IR ^org.my.MyClass, | |
889 | which specifies that the probe should also trigger on all descendants | |
890 | of MyClass that override the original method. For instance, every method | |
891 | with signature foo(int) in program org.my.MyApp can be probed at once using | |
892 | .SAMPLE | |
893 | java("org.my.MyApp").class("^java.lang.Object").method("foo(int)") | |
894 | .ESAMPLE | |
895 | .PP | |
896 | The second probe type works analogously, but refers to Java processes by PID: | |
897 | .SAMPLE | |
898 | java(PID).class("CLASSNAME").method("PATTERN") | |
899 | java(PID).class("CLASSNAME").method("PATTERN").return | |
900 | .ESAMPLE | |
901 | (PIDs for an already running process can be obtained using the | |
902 | .IR jps (1) | |
903 | utility.) | |
a26d56a4 SM |
904 | .PP |
905 | Context variables defined within java probes include | |
a26d56a4 SM |
906 | .IR $arg1 |
907 | through | |
908 | .IR $arg10 | |
d885563b | 909 | (for up to the first 10 arguments of a method), represented as integers or strings. |
982026f1 | 910 | |
9cb48751 DS |
911 | .SS PROCFS |
912 | ||
913 | These probe points allow procfs "files" in | |
c243f608 LB |
914 | /proc/systemtap/MODNAME to be created, read and written using a |
915 | permission that may be modified using the proper umask value. Default permissions are 0400 for read | |
916 | probes, and 0200 for write probes. If both a read and write probe are being | |
917 | used on the same file, a default permission of 0600 will be used. | |
918 | Using procfs.umask(0040).read would | |
919 | result in a 0404 permission set for the file. | |
9cb48751 DS |
920 | .RI ( MODNAME |
921 | is the name of the systemtap module). The | |
922 | .I proc | |
923 | filesystem is a pseudo-filesystem which is used as an interface to | |
c243f608 | 924 | kernel data structures. There are several probe point variants supported |
9cb48751 | 925 | by the translator: |
ca88561f | 926 | |
9cb48751 DS |
927 | .SAMPLE |
928 | procfs("PATH").read | |
c243f608 | 929 | procfs("PATH").umask(UMASK).read |
38975255 | 930 | procfs("PATH").read.maxsize(MAXSIZE) |
c243f608 | 931 | procfs("PATH").umask(UMASK).read.maxsize(MAXSIZE)
9cb48751 | 932 | procfs("PATH").write |
c243f608 | 933 | procfs("PATH").umask(UMASK).write |
9cb48751 | 934 | procfs.read |
c243f608 | 935 | procfs.umask(UMASK).read |
38975255 | 936 | procfs.read.maxsize(MAXSIZE) |
c243f608 | 937 | procfs.umask(UMASK).read.maxsize(MAXSIZE) |
9cb48751 | 938 | procfs.write |
c243f608 | 939 | procfs.umask(UMASK).write |
9cb48751 | 940 | .ESAMPLE |
ca88561f | 941 | |
9cb48751 DS |
942 | .I PATH |
943 | is the file name (relative to /proc/systemtap/MODNAME) to be created. | |
944 | If no | |
945 | .I PATH | |
946 | is specified (as in the last six variants above), | |
947 | .I PATH | |
948 | defaults to "command". | |
949 | .PP | |
950 | When a user reads /proc/systemtap/MODNAME/PATH, the corresponding | |
951 | procfs | |
952 | .I read | |
953 | probe is triggered. The string data to be read should be assigned to | |
954 | a variable named | |
955 | .IR $value , | |
956 | like this: | |
ca88561f | 957 | |
9cb48751 DS |
958 | .SAMPLE |
959 | procfs("PATH").read { $value = "100\\n" } | |
960 | .ESAMPLE | |
961 | .PP | |
962 | When a user writes into /proc/systemtap/MODNAME/PATH, the | |
963 | corresponding procfs | |
964 | .I write | |
965 | probe is triggered. The data the user wrote is available in the | |
966 | string variable named | |
967 | .IR $value , | |
968 | like this: | |
ca88561f | 969 | |
9cb48751 DS |
970 | .SAMPLE |
971 | procfs("PATH").write { printf("user wrote: %s", $value) } | |
972 | .ESAMPLE | |
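.PP
The two probe types can be combined to expose a tunable value. The
following is a sketch only; the file name "limit" and the global
variable are hypothetical.
.SAMPLE
global limit = 100   # hypothetical tunable

# reading /proc/systemtap/MODNAME/limit reports the current value
probe procfs("limit").read { $value = sprintf("%d\\n", limit) }

# writing a decimal number to the file updates it
probe procfs("limit").write { limit = strtol($value, 10) }
.ESAMPLE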
38975255 DS |
973 | .PP |
974 | .I MAXSIZE | |
975 | is the size of the procfs read buffer. Specifying | |
976 | .I MAXSIZE | |
977 | allows larger procfs output. If no | |
978 | .I MAXSIZE | |
979 | is specified, the procfs read buffer defaults to | |
980 | .I STP_PROCFS_BUFSIZE | |
981 | (which defaults to | |
982 | .IR MAXSTRINGLEN , | |
983 | the maximum length of a string). | |
984 | If setting the procfs read buffers for more than one file is needed, | |
985 | it may be easiest to override the | |
986 | .I STP_PROCFS_BUFSIZE | |
987 | definition. | |
988 | Here's an example of using | |
989 | .IR MAXSIZE : | |
990 | ||
991 | .SAMPLE | |
992 | procfs.read.maxsize(1024) { | |
993 | $value = "long string..." | |
994 | $value .= "another long string..." | |
995 | $value .= "another long string..." | |
996 | $value .= "another long string..." | |
997 | } | |
998 | .ESAMPLE | |
9cb48751 | 999 | |
da00b50e SM |
1000 | .SS NETFILTER HOOKS |
1001 | ||
1002 | These probe points allow observation of network packets using the | |
1003 | netfilter mechanism. A netfilter probe in systemtap corresponds to a | |
1004 | netfilter hook function in the original netfilter probes API. It is | |
1005 | probably more convenient to use | |
1006 | .IR tapset::netfilter (3stap), | |
1007 | which wraps the primitive netfilter hooks and does the work of | |
1008 | extracting useful information from the context variables. | |
1009 | ||
1010 | .PP | |
1011 | There are several probe point variants supported by the translator: | |
1012 | ||
1013 | .SAMPLE | |
1014 | netfilter.hook("HOOKNAME").pf("PROTOCOL_F") | |
1015 | netfilter.pf("PROTOCOL_F").hook("HOOKNAME") | |
1016 | netfilter.hook("HOOKNAME").pf("PROTOCOL_F").priority("PRIORITY") | |
1017 | netfilter.pf("PROTOCOL_F").hook("HOOKNAME").priority("PRIORITY") | |
1018 | .ESAMPLE | |
1019 | ||
1020 | .PP | |
1021 | .I PROTOCOL_F | |
1022 | is the protocol family to listen for, currently one of | |
1023 | .I NFPROTO_IPV4, | |
1024 | .I NFPROTO_IPV6, | |
1025 | .I NFPROTO_ARP, | |
1026 | or | |
1027 | .I NFPROTO_BRIDGE. | |
1028 | ||
1029 | .PP | |
1030 | .I HOOKNAME | |
1031 | is the point, or 'hook', in the protocol stack at which to intercept | |
1032 | the packet. The available hook names for each protocol family are | |
1033 | taken from the kernel header files <linux/netfilter_ipv4.h>, | |
1034 | <linux/netfilter_ipv6.h>, <linux/netfilter_arp.h> and | |
1035 | <linux/netfilter_bridge.h>. For instance, allowable hook names for | |
1036 | .I NFPROTO_IPV4 | |
1037 | are | |
1038 | .I NF_INET_PRE_ROUTING, | |
1039 | .I NF_INET_LOCAL_IN, | |
1040 | .I NF_INET_FORWARD, | |
1041 | .I NF_INET_LOCAL_OUT, | |
1042 | and | |
1043 | .I NF_INET_POST_ROUTING. | |
1044 | ||
1045 | .PP | |
1046 | .I PRIORITY | |
1047 | is an integer priority giving the order in which the probe point | |
1048 | should be triggered relative to any other netfilter hook functions | |
1049 | which trigger on the same packet. Hook functions execute on each | |
1050 | packet in order from smallest priority number to largest priority number. If no | |
1051 | .I PRIORITY | |
1052 | is specified (as in the first two probe point variants above), | |
1053 | .I PRIORITY | |
1054 | defaults to "0". | |
1055 | ||
1056 | There are a number of predefined priority names of the form | |
1057 | .I NF_IP_PRI_* | |
1058 | and | |
1059 | .I NF_IP6_PRI_* | |
1060 | which are defined in the kernel header files <linux/netfilter_ipv4.h> and <linux/netfilter_ipv6.h> respectively. The script is permitted to use these | |
1061 | instead of specifying an integer priority. (The probe points for | |
1062 | .I NFPROTO_ARP | |
1063 | and | |
1064 | .I NFPROTO_BRIDGE | |
1065 | currently do not expose any named hook priorities to the script writer.) | |
1066 | Thus, allowable ways to specify the priority include: | |
1067 | ||
1068 | .SAMPLE | |
1069 | priority("255") | |
1070 | priority("NF_IP_PRI_SELINUX_LAST") | |
1071 | .ESAMPLE | |
1072 | ||
1073 | A script using guru mode is permitted to specify any identifier or | |
1074 | number as the parameter for hook, pf, and priority. This feature | |
1075 | should be used with caution, as the parameter is inserted verbatim into | |
1076 | the C code generated by systemtap. | |
1077 | ||
1078 | The netfilter probe points define the following context variables: | |
1079 | .TP | |
4d914c37 FCE |
1080 | .IR $hooknum |
1081 | The hook number. | |
1082 | .TP | |
da00b50e SM |
1083 | .IR $skb |
1084 | The address of the sk_buff struct representing the packet. See | |
1085 | <linux/skbuff.h> for details on how to use this struct, or | |
1086 | alternatively use the tapset | |
1087 | .IR tapset::netfilter (3stap) | |
1088 | for easy access to key information. | |
1089 | ||
1090 | .TP | |
1091 | .IR $in | |
1092 | The address of the net_device struct representing the network device | |
1093 | on which the packet was received (if any). May be 0 if the device is | |
1094 | unknown or undefined at that stage in the protocol stack. | |
1095 | ||
1096 | .TP | |
1097 | .IR $out | |
1098 | The address of the net_device struct representing the network device | |
1099 | on which the packet will be sent (if any). May be 0 if the device is | |
1100 | unknown or undefined at that stage in the protocol stack. | |
1101 | ||
1102 | .TP | |
1103 | .IR $verdict | |
1104 | (Guru mode only.) Assigning one of the verdict values defined in | |
1105 | <linux/netfilter.h> to this variable alters the further progress of | |
1106 | the packet through the protocol stack. For instance, the following | |
1107 | guru mode script forces all IPv6 network packets to be dropped: | |
1108 | ||
1109 | .SAMPLE | |
1110 | probe netfilter.pf("NFPROTO_IPV6").hook("NF_IP6_PRE_ROUTING") { | |
c49ffe6c | 1111 | $verdict = 0 /* nf_drop */ |
da00b50e SM |
1112 | } |
1113 | .ESAMPLE | |
1114 | ||
c49ffe6c SM |
1115 | For convenience, unlike the primitive probe points discussed here, the |
1116 | probes defined in | |
1117 | .IR tapset::netfilter (3stap) | |
1118 | export the lowercase names of the verdict constants (e.g. NF_DROP | |
1119 | becomes nf_drop) as local variables. | |
1120 | ||
6032e2ce | 1121 | .SS KERNEL TRACEPOINTS |
bc724b8b JS |
1122 | |
1123 | This family of probe points hooks up to static probing tracepoints | |
1124 | inserted into the kernel or modules. As with markers, these | |
1125 | tracepoints are special macro calls inserted by kernel developers to | |
1126 | make probing faster and more reliable than with DWARF-based probes, | |
1127 | and DWARF debugging information is not required to probe tracepoints. | |
1128 | Tracepoints have an extra advantage of more strongly-typed parameters | |
1129 | than markers. | |
1130 | ||
6032e2ce FCE |
1131 | Tracepoint probes look like: |
1132 | .BR kernel.trace("name") . | |
bc724b8b JS |
1133 | The tracepoint name string, which may contain the usual wildcard |
1134 | characters, is matched against the names defined by the kernel | |
1135 | developers in the tracepoint header files. | |
1136 | ||
1137 | The handler associated with a tracepoint-based probe may read the | |
1138 | optional parameters specified at the macro call site. These are | |
1139 | named according to the declaration by the tracepoint author. For | |
1140 | example, the tracepoint probe | |
1141 | .BR kernel.trace("sched_switch") | |
1142 | provides the parameters | |
1143 | .BR $rq ", " $prev ", and " $next . | |
1144 | If the parameter is a complex type, as in a struct pointer, then a | |
1145 | script can access fields with the same syntax as DWARF $target | |
1146 | variables. Also, tracepoint parameters cannot be modified, but in | |
1147 | guru-mode a script may modify fields of parameters. | |
1148 | ||
1149 | The name of the tracepoint is available in | |
1150 | .BR $$name , | |
1151 | and a string of name=value pairs for all parameters of the tracepoint | |
1152 | is available in | |
046e7190 | 1153 | .BR $$vars " or " $$parms . |
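.PP
For instance, a sketch like the following prints every scheduler
tracepoint hit along with its parameters, relying only on the
.BR $$name " and " $$parms
variables described above. The wildcard is an illustrative choice and
may produce a large volume of output.
.SAMPLE
# matches all tracepoints whose names start with "sched_"
probe kernel.trace("sched_*") {
  printf("%s: %s\\n", $$name, $$parms)
}
.ESAMPLE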
bc724b8b | 1154 | |
6032e2ce FCE |
1155 | .SS KERNEL MARKERS (OBSOLETE) |
1156 | ||
1157 | This family of probe points hooks up to an older style of static | |
1158 | probing markers inserted into older kernels or modules. These markers | |
1159 | are special STAP_MARK macro calls inserted by kernel developers to | |
1160 | make probing faster and more reliable than with DWARF-based probes. | |
1161 | Further, DWARF debugging information is | |
1162 | .I not | |
1163 | required to probe markers. | |
1164 | ||
1165 | Marker probe points begin with | |
1166 | .BR kernel . | |
1167 | The next part names the marker itself: | |
1168 | .BR mark("name") . | |
1169 | The marker name string, which may contain the usual wildcard characters, | |
1170 | is matched against the names given to the marker macros when the kernel | |
1171 | and/or module was compiled. Optionally, you can specify | |
1172 | .BR format("format") . | |
1173 | Specifying the marker format string allows differentiation between two | |
1174 | markers with the same name but different marker format strings. | |
1175 | ||
1176 | The handler associated with a marker-based probe may read the | |
1177 | optional parameters specified at the macro call site. These are | |
1178 | named | |
1179 | .BR $arg1 " through " $argNN , | |
1180 | where NN is the number of parameters supplied by the macro. Number | |
1181 | and string parameters are passed in a type-safe manner. | |
1182 | ||
1183 | The marker format string associated with a marker is available in | |
1184 | .BR $format . | |
1185 | The marker name string is also available in | |
1186 | .BR $name . | |
1187 | ||
dd225250 PS |
1188 | .SS HARDWARE BREAKPOINTS |
1189 | This family of probes is used to set hardware watchpoints for a given | |
1190 | (global) kernel symbol. The probes take three components as inputs: | |
1191 | ||
1192 | 1. The | |
1193 | .BR virtual address / name | |
1194 | of the kernel symbol to be traced is supplied as argument to this class | |
1195 | of probes. (Only data segment variables can be probed; local variables | |
1196 | of a function cannot.) | |
1197 | ||
1198 | 2. Nature of access to be probed: | |
1199 | a. | |
1200 | .I .write | |
1201 | probe gets triggered when a write happens at the specified address/symbol | |
1202 | name. | |
1203 | b. | |
1204 | .I .rw | |
1205 | probe is triggered when either a read or write happens. | |
1206 | ||
1207 | 3. | |
1208 | .BR .length | |
1209 | (optional) | |
1210 | Users have the option of specifying the address interval to be probed | |
1211 | using "length" constructs. The user-specified length gets approximated | |
1212 | to the closest possible address length that the architecture can | |
1213 | support. If the specified length exceeds the limits imposed by the | |
1214 | architecture, an error is reported and probe registration fails. | |
1215 | If 'length' is not specified, the translator requests a hardware | |
1216 | breakpoint probe of length 1. It should be noted that the "length" | |
1217 | construct is not valid with symbol names. | |
1218 | ||
1219 | The following constructs are supported: | |
1220 | .SAMPLE | |
1221 | probe kernel.data(ADDRESS).write | |
1222 | probe kernel.data(ADDRESS).rw | |
1223 | probe kernel.data(ADDRESS).length(LEN).write | |
1224 | probe kernel.data(ADDRESS).length(LEN).rw | |
1225 | probe kernel.data("SYMBOL_NAME").write | |
1226 | probe kernel.data("SYMBOL_NAME").rw | |
1227 | .ESAMPLE | |
1228 | ||
1229 | This set of probes makes use of the debug registers of the processor, | |
1230 | which are a scarce resource (4 on x86, 1 on powerpc). The script | |
1231 | translator emits a warning if a user requests more hardware breakpoint probes | |
1232 | than the architecture supports. For example, a pass-2 warning is emitted | |
1233 | when an input script requests 5 hardware breakpoint probes on an x86 | |
1234 | system, since the x86 architecture supports a maximum of 4 breakpoints. | |
1235 | Users are cautioned to set probes judiciously. | |
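.PP
A minimal sketch of a write watchpoint on the kernel variable pid_max
(chosen only as an example symbol) could read:
.SAMPLE
# reports which task writes to pid_max; uses one debug register
probe kernel.data("pid_max").write {
  printf("pid_max written by %s(%d)\\n", execname(), pid())
}
.ESAMPLE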
1236 | ||
9becfcef MW |
1237 | .SS PERF |
1238 | ||
f8b9be91 | 1239 | This family of probe points interfaces to the kernel "perf event" |
cb7d3cd8 | 1240 | infrastructure for controlling hardware performance counters. |
9becfcef MW |
1241 | The events being attached to are described by the "type" and | |
1242 | "config" fields of the | |
1243 | .IR perf_event_attr | |
1244 | structure, and are sampled at an interval governed by the | |
1245 | "sample_period" field. | |
1246 | ||
1247 | These fields are made available to systemtap scripts using | |
1248 | the following syntax: | |
1249 | .SAMPLE | |
1250 | probe perf.type(NN).config(MM).sample(XX) | |
1251 | probe perf.type(NN).config(MM) | |
dbdab5c8 SC |
1252 | probe perf.type(NN).config(MM).process("PROC") |
1253 | probe perf.type(NN).config(MM).counter("COUNTER") | |
1254 | probe perf.type(NN).config(MM).process("PROC").counter("COUNTER") | |
9becfcef MW |
1255 | .ESAMPLE |
1256 | The systemtap probe handler is called once per XX increments | |
1257 | of the underlying performance counter. The default sampling | |
1258 | count is 1000000. | |
1259 | The range of valid type/config is described by the | |
1260 | .IR perf_event_open (2) | |
1261 | system call, and/or the | |
1262 | .IR linux/perf_event.h | |
1263 | file. Invalid combinations or exhausted hardware counter resources | |
1264 | result in errors during systemtap script startup. Systemtap does | |
1265 | not sanity-check the values: it merely passes them through to | |
6a8fe809 SC |
1266 | the kernel for error- and safety-checking. By default the perf event |
1267 | probe is systemwide unless .process is specified, which will bind the | |
fce2c5df | 1268 | probe to a specific task. If the name is omitted then it |
e996e76a | 1269 | is inferred from the stap \-c argument. A perf event can be read on |
75cd04ca SC |
1270 | demand using .counter. The body of the perf probe handler will not be |
1271 | invoked for a .counter probe; instead, the counter is read in a user | |
1272 | space probe via: | |
dbdab5c8 SC |
1273 | .TP |
1274 | process("PROCESS").statement("func@file") {stat <<< @perf("NAME")} | |
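.PP
As an illustrative sketch, the following samples the hardware cycle
counter system-wide and counts samples per process name. It assumes
the values type 0 (PERF_TYPE_HARDWARE) and config 0
(PERF_COUNT_HW_CPU_CYCLES) from linux/perf_event.h; the sampling
period is arbitrary.
.SAMPLE
global samples

# type 0 / config 0: assumed to select hardware CPU cycles
probe perf.type(0).config(0).sample(1000000) {
  samples[execname()] <<< 1
}
probe end {
  foreach (name in samples)
    printf("%s %d\\n", name, @count(samples[name]))
}
.ESAMPLE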
1275 | ||
fce2c5df | 1276 | |
ba4a90fd FCE |
1277 | .SH EXAMPLES |
1278 | .PP | |
1279 | Here are some example probe points, defining the associated events. | |
1280 | .TP | |
1281 | begin, end, end | |
1282 | refers to the startup and normal shutdown of the session. In this | |
1283 | case, the handler would run once during startup and twice during | |
1284 | shutdown. | |
1285 | .TP | |
1286 | timer.jiffies(1000).randomize(200) | |
13d2ecdb | 1287 | refers to a periodic interrupt, every 1000 +/\- 200 jiffies. |
ba4a90fd FCE |
1288 | .TP |
1289 | kernel.function("*init*"), kernel.function("*exit*") | |
1290 | refers to all kernel functions with "init" or "exit" in the name. | |
1291 | .TP | |
199d126d MW |
1292 | kernel.function("*@kernel/time.c:240") |
1293 | refers to any functions within the "kernel/time.c" file that span | |
6ff00e1d FCE |
1294 | line 240. |
1295 | .BR | |
1296 | Note | |
1297 | that this is | |
1298 | .BR not | |
1299 | a probe at the statement at that line number. Use the | |
1300 | .IR | |
1301 | kernel.statement | |
1302 | probe instead. | |
ba4a90fd | 1303 | .TP |
6032e2ce FCE |
1304 | kernel.trace("sched_*") |
1305 | refers to all scheduler-related (really, prefixed) tracepoints in | |
1306 | the kernel. | |
1307 | .TP | |
6f05b6ab | 1308 | kernel.mark("getuid") |
6032e2ce | 1309 | refers to an obsolete STAP_MARK(getuid, ...) macro call in the kernel. |
6f05b6ab | 1310 | .TP |
ba4a90fd FCE |
1311 | module("usb*").function("*sync*").return |
1312 | refers to the moment of return from all functions with "sync" in the | |
1313 | name in any of the USB drivers. | |
1314 | .TP | |
1315 | kernel.statement(0xc0044852) | |
1316 | refers to the first byte of the statement whose compiled instructions | |
1317 | include the given address in the kernel. | |
b4ceace2 | 1318 | .TP |
199d126d MW |
1319 | kernel.statement("*@kernel/time.c:296") |
1320 | refers to the statement of line 296 within "kernel/time.c". | |
1bd128a3 SC |
1321 | .TP |
1322 | kernel.statement("bio_init@fs/bio.c+3") | |
1323 | refers to the statement at line bio_init+3 within "fs/bio.c". | |
a5ae3f3d | 1324 | .TP |
dd225250 | 1325 | kernel.data("pid_max").write |
cb7d3cd8 | 1326 | refers to a hardware breakpoint of type "write" set on pid_max.
dd225250 | 1327 | .TP |
729286d8 | 1328 | syscall.*.return |
b4ceace2 | 1329 | refers to the group of probe aliases with any name in the second position.
ba4a90fd FCE |
1330 | |
1331 | .SH SEE ALSO | |
5dfce2b6 FCE |
1332 | .nh |
1333 | .nf | |
78db65bd | 1334 | .IR stap (1), |
89965a32 FCE |
1335 | .IR probe::* (3stap), |
1336 | .IR tapset::* (3stap) | |
1c0b8e23 FCE |
1337 | |
1338 | .\" Local Variables: | |
1339 | .\" mode: nroff | |
1340 | .\" End: |