Commit | Line | Data |
---|---|---|
5f92f126 | 1 | .\" t |
ec1a2239 | 2 | .TH STAPPROBES 3stap |
ba4a90fd FCE |
3 | .SH NAME |
4 | stapprobes \- systemtap probe points | |
5 | ||
6 | .\" macros | |
7 | .de SAMPLE | |
8 | .br | |
9 | .RS | |
10 | .nf | |
11 | .nh | |
12 | .. | |
13 | .de ESAMPLE | |
14 | .hy | |
15 | .fi | |
16 | .RE | |
17 | .. | |
18 | ||
19 | .SH DESCRIPTION | |
20 | The following sections enumerate the variety of probe points supported | |
89965a32 FCE |
21 | by the systemtap translator, and some of the additional aliases defined by |
22 | standard tapset scripts. Many are individually documented in the | |
23 | .IR 3stap | |
24 | manual section, with the | |
25 | .IR probe:: | |
26 | prefix. | |
ba4a90fd | 27 | .PP |
7abecb38 | 28 | The general probe point syntax is a dotted-symbol sequence. This |
ba4a90fd FCE |
29 | allows a breakdown of the event namespace into parts, somewhat like |
30 | the Domain Name System does on the Internet. Each component | |
7abecb38 | 31 | identifier may be parametrized by a string or number literal, with a |
d898100a | 32 | syntax like a function call. A component may include a "*" character, |
649260f3 JS |
33 | to expand to a set of matching probe points. It may also include "**" |
34 | to match multiple sequential components at once. Probe aliases likewise | |
d898100a FCE |
35 | expand to other probe points. Each and every resulting probe point is |
36 | normally resolved to some low-level system instrumentation facility | |
37 | (e.g., a kprobe address, marker, or a timer configuration), otherwise | |
38 | the elaboration phase will fail. | |
39 | .PP | |
40 | However, a probe point may be followed by a "?" character, to indicate | |
41 | that it is optional, and that no error should result if it fails to | |
42 | resolve. Optionalness passes down through all levels of | |
43 | alias/wildcard expansion. Alternately, a probe point may be followed | |
44 | by a "!" character, to indicate that it is both optional and | |
37f6433e | 45 | sufficient. (Think vaguely of the Prolog cut operator.) If it does |
d898100a FCE |
46 | resolve, then no further probe points in the same comma-separated list |
47 | will be resolved. Therefore, the "!" sufficiency mark only makes | |
48 | sense in a list of probe point alternatives. | |
dfd11cc3 MH |
49 | .PP |
50 | Additionally, a probe point may be followed by an "if (expr)" statement, in | |
51 | order to enable/disable the probe point on-the-fly. With the "if" statement, | |
52 | if the "expr" is false when the probe point is hit, the whole probe body, | |
53 | including any alias's body, is skipped. The condition is stacked up through | |
54 | all levels of alias/wildcard expansion. So the final condition becomes | |
55 | the logical-and of the conditions of all expanded aliases and wildcards. | |
6e3347a9 | 56 | |
e904ad95 FCE |
57 | These are all |
58 | .B syntactically | |
59 | valid probe points. (They are generally | |
60 | .B semantically | |
61 | invalid, depending on the contents of the tapsets, and the versions of | |
62 | kernel/user software installed.) | |
ca88561f | 63 | |
ba4a90fd FCE |
64 | .SAMPLE |
65 | kernel.function("foo").return | |
e904ad95 | 66 | process("/bin/vi").statement(0x2222) |
ba4a90fd | 67 | end |
729286d8 | 68 | syscall.* |
649260f3 | 69 | sys**open |
6e3347a9 | 70 | kernel.function("no_such_function") ? |
d898100a | 71 | module("awol").function("no_such_function") ! |
dfd11cc3 | 72 | signal.*? if (switch) |
94c3c803 | 73 | kprobe.function("foo") |
ba4a90fd FCE |
74 | .ESAMPLE |
75 | ||
6f05b6ab FCE |
76 | Probes may be broadly classified into "synchronous" and |
77 | "asynchronous". A "synchronous" event is deemed to occur when any | |
78 | processor executes an instruction matched by the specification. This | |
79 | gives these probes a reference point (instruction address) from which | |
80 | more contextual data may be available. Other families of probe points | |
81 | refer to "asynchronous" events such as timers/counters rolling over, | |
82 | where there is no fixed reference point that is related. Each probe | |
83 | point specification may match multiple locations (for example, using | |
84 | wildcards or aliases), and all of them are then probed. A probe | |
85 | declaration may also contain several comma-separated specifications, | |
86 | all of which are probed. | |
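.PP
As an illustrative sketch of the "?" and "!" suffixes in a comma-separated
list (the function names are placeholders and may not exist on a given
kernel): if the first alternative resolves, the "!" mark prevents the second
from being resolved, and neither alternative is an error by itself if it
fails to resolve.
.SAMPLE
probe kernel.function("vfs_read") !,
      kernel.function("no_such_function") ?
{
  println(pp())
}
.ESAMPLE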
87 | ||
5f92f126 FCE |
88 | .SH DWARF DEBUGINFO |
89 | ||
90 | Resolving some probe points requires DWARF debuginfo or "debug | |
91 | symbols" for the specific part being instrumented. For some others, | |
92 | DWARF is automatically synthesized on the fly from source code header | |
93 | files. For others, it is not needed at all. Since a systemtap script | |
94 | may use any mixture of probe points together, the union of their DWARF | |
95 | requirements has to be met on the computer where script compilation | |
96 | occurs. (See the \fI\-\-use\-server\fR option and the \fBstap-server\ | |
97 | (8)\fR man page for information about the remote compilation facility, | |
98 | which allows these requirements to be met on a different machine.) | |
99 | .PP | |
100 | The following table lists many of the available probe point families, | |
101 | to classify them with respect to their need for DWARF debuginfo. | |
102 | ||
103 | .TS | |
104 | l l l. | |
7bfd1083 | 105 | \fBDWARF	NON-DWARF\fP |
5f92f126 | 106 | |
7bfd1083 TJL |
107 | kernel.function, .statement	kernel.mark | |
108 | module.function, .statement	process.mark | |
109 | process.function, .statement	begin, end, error, never | |
110 | process.mark \fI(backup)\fP	timer | |
111 | 	perf | |
112 | 	procfs | |
113 | \fBAUTO-DWARF\fP	kernel.statement.absolute | |
114 | 	kernel.data | |
115 | kernel.trace	kprobe.function | |
116 | 	process.statement.absolute | |
117 | 	process.begin, .end, .error | |
5f92f126 FCE |
118 | .TE |
119 | ||
120 | .SH PROBE POINT FAMILIES | |
121 | ||
65aeaea0 | 122 | .SS BEGIN/END/ERROR |
ba4a90fd FCE |
123 | |
124 | The probe points | |
125 | .IR begin " and " end | |
126 | are defined by the translator to refer to the time of session startup | |
127 | and shutdown. All "begin" probe handlers are run, in some sequence, | |
128 | during the startup of the session. All global variables will have | |
129 | been initialized prior to this point. All "end" probes are run, in | |
130 | some sequence, during the | |
131 | .I normal | |
132 | shutdown of a session, such as in the aftermath of an | |
133 | .I exit () | |
134 | function call, or an interruption from the user. In the case of an | |
135 | error-triggered shutdown, "end" probes are not run. There are no | |
136 | target variables available in either context. | |
6a256b03 JS |
137 | .PP |
138 | If the order of execution among "begin" or "end" probes is significant, | |
139 | then an optional sequence number may be provided: | |
ca88561f | 140 | |
6a256b03 JS |
141 | .SAMPLE |
142 | begin(N) | |
143 | end(N) | |
144 | .ESAMPLE | |
ca88561f | 145 | |
6a256b03 JS |
146 | The number N may be positive or negative. The probe handlers are run in |
147 | increasing order, and the order between handlers with the same sequence | |
148 | number is unspecified. When "begin" or "end" are given without a | |
149 | sequence, they are effectively sequence zero. | |
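.PP
For instance, a minimal sketch of sequenced handlers (the messages are
arbitrary). The handlers run in increasing sequence order, with the
unnumbered "begin" probe treated as sequence zero:
.SAMPLE
probe begin(\-1) { println("runs before the unsequenced begin probe") }
probe begin { println("sequence zero") }
probe begin(1) { println("runs last"); exit() }
probe end { println("normal shutdown") }
.ESAMPLE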
ba4a90fd | 150 | |
65aeaea0 FCE |
151 | The |
152 | .IR error | |
153 | probe point is similar to the | |
154 | .IR end | |
d898100a FCE |
155 | probe, except that each such probe handler is run when the session ends |
156 | after errors have occurred. In such cases, "end" probes are skipped, | |
37f6433e | 157 | but each "error" probe is still attempted. This kind of probe can be |
d898100a FCE |
158 | used to clean up or emit a "final gasp". It may also be numerically |
159 | parametrized to set a sequence. | |
65aeaea0 | 160 | |
6e3347a9 FCE |
161 | .SS NEVER |
162 | The probe point | |
163 | .IR never | |
164 | is specially defined by the translator to mean "never". Its probe | |
165 | handler is never run, though its statements are analyzed for symbol / | |
166 | type correctness as usual. This probe point may be useful in | |
167 | conjunction with optional probes. | |
168 | ||
1027502b FCE |
169 | .SS SYSCALL |
170 | ||
171 | The | |
172 | .IR syscall.* | |
173 | aliases define several hundred probes, too many to | |
174 | summarize here. They are: | |
175 | ||
176 | .SAMPLE | |
177 | syscall.NAME | |
178 | .br | |
179 | syscall.NAME.return | |
180 | .ESAMPLE | |
181 | ||
182 | Generally, two probes are defined for each normal system call as listed in the | |
183 | .IR syscalls(2) | |
184 | manual page, one for entry and one for return. Those system calls that never | |
185 | return do not have a corresponding | |
186 | .IR .return | |
187 | probe. | |
188 | .PP | |
df7f3a01 | 189 | Each probe alias provides a variety of variables. Looking at the tapset source |
1027502b FCE |
190 | code is the most reliable way to find them. Generally, each variable listed in the standard | |
191 | manual page is made available as a script-level variable, so | |
192 | .IR syscall.open | |
193 | exposes | |
194 | .IR filename ", " flags ", and " mode . | |
195 | In addition, a standard suite of variables is available at most aliases: | |
196 | .TP | |
197 | .IR argstr | |
198 | A pretty-printed form of the entire argument list, without parentheses. | |
199 | .TP | |
200 | .IR name | |
201 | The name of the system call. | |
202 | .TP | |
203 | .IR retstr | |
204 | For return probes, a pretty-printed form of the system-call result. | |
205 | .PP | |
df7f3a01 FCE |
206 | As usual for probe aliases, these variables are all simply initialized |
207 | once from the underlying $context variables, so that later changes to | |
208 | $context variables are not automatically reflected. Not all probe | |
209 | aliases obey all of these general guidelines. Please report any | |
210 | bothersome ones you encounter as a bug. | |
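.PP
A short sketch using the standard variables listed above. It assumes that the
.IR syscall.open
alias is present in the installed tapset, which may not hold for every
kernel:
.SAMPLE
probe syscall.open {
  println(name, "(", argstr, ")")
}
probe syscall.open.return {
  println(name, " = ", retstr)
}
.ESAMPLE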
1027502b FCE |
211 | |
212 | ||
ba4a90fd FCE |
213 | .SS TIMERS |
214 | ||
215 | Intervals defined by the standard kernel "jiffies" timer may be used | |
216 | to trigger probe handlers asynchronously. Two probe point variants | |
217 | are supported by the translator: | |
ca88561f | 218 | |
ba4a90fd FCE |
219 | .SAMPLE |
220 | timer.jiffies(N) | |
221 | timer.jiffies(N).randomize(M) | |
222 | .ESAMPLE | |
ca88561f | 223 | |
ba4a90fd FCE |
224 | The probe handler is run every N jiffies (a kernel-defined unit of |
225 | time, typically between 1 and 60 ms). If the "randomize" component is | |
13d2ecdb | 226 | given, a uniformly distributed random value in the range [\-M..+M] is |
ba4a90fd FCE |
227 | added to N every time the handler is run. N is restricted to a |
228 | reasonable range (1 to around a million), and M is restricted to be | |
229 | smaller than N. There are no target variables provided in either | |
230 | context. It is possible for such probes to be run concurrently on | |
231 | a multi-processor computer. | |
422d1ceb | 232 | .PP |
197a4d62 | 233 | Alternatively, intervals may be specified in units of time. |
422d1ceb | 234 | There are two probe point variants similar to the jiffies timer: |
ca88561f | 235 | |
422d1ceb FCE |
236 | .SAMPLE |
237 | timer.ms(N) | |
238 | timer.ms(N).randomize(M) | |
239 | .ESAMPLE | |
ca88561f | 240 | |
197a4d62 JS |
241 | Here, N and M are specified in milliseconds, but the full options for units |
242 | are seconds (s/sec), milliseconds (ms/msec), microseconds (us/usec), | |
243 | nanoseconds (ns/nsec), and hertz (hz). Randomization is not supported for | |
244 | hertz timers. | |
245 | ||
246 | The actual resolution of the timers depends on the target kernel. For | |
247 | kernels prior to 2.6.17, timers are limited to jiffies resolution, so | |
248 | intervals are rounded up to the nearest jiffies interval. After 2.6.17, | |
249 | the implementation uses hrtimers for tighter precision, though the actual | |
250 | resolution will be arch-dependent. In either case, if the "randomize" | |
251 | component is given, then the random value will be added to the interval | |
252 | before any rounding occurs. | |
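.PP
As a sketch, the following script counts firings of a randomized
millisecond timer and stops after five seconds. The interval values and the
variable name are arbitrary choices:
.SAMPLE
global ticks
probe timer.ms(100).randomize(20) { ticks++ }
probe timer.s(5) { println("ticks so far: ", ticks); exit() }
.ESAMPLE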
39e57ce0 FCE |
253 | .PP |
254 | Profiling timers are also available to provide probes that execute on all | |
3ca1f652 FCE |
255 | CPUs at the rate of the system tick (CONFIG_HZ). |
256 | This probe takes no parameters. | |
ca88561f | 257 | |
39e57ce0 FCE |
258 | .SAMPLE |
259 | timer.profile | |
260 | .ESAMPLE | |
ca88561f | 261 | |
39e57ce0 FCE |
262 | Full context information of the interrupted process is available, making |
263 | this probe suitable for a time-based sampling profiler. | |
ba4a90fd FCE |
264 | |
265 | .SS DWARF | |
266 | ||
267 | This family of probe points uses symbolic debugging information for | |
268 | the target kernel/module/program, as may be found in unstripped | |
269 | executables, or the separate | |
270 | .I debuginfo | |
271 | packages. They allow placement of probes logically into the execution | |
272 | path of the target program, by specifying a set of points in the | |
273 | source or object code. When a matching statement executes on any | |
274 | processor, the probe handler is run in that context. | |
275 | .PP | |
276 | Points in a kernel are identified by | |
ca88561f | 277 | module, source file, line number, function name, or some |
6f05b6ab | 278 | combination of these. |
ba4a90fd FCE |
279 | .PP |
280 | Here is a list of probe point families currently supported. The | |
281 | .B .function | |
282 | variant places a probe near the beginning of the named function, so that | |
283 | parameters are available as context variables. The | |
284 | .B .return | |
39e3139a FCE |
285 | variant places a probe at the moment |
286 | .B after | |
287 | the return from the named function, so the return value is available | |
288 | as the "$return" context variable. The | |
54efe513 | 289 | .B .inline |
b8da0ad1 | 290 | modifier for |
54efe513 | 291 | .B .function |
b8da0ad1 FCE |
292 | filters the results to include only instances of inlined functions. |
293 | The | |
294 | .B .call | |
4bda987e SC |
295 | modifier selects the opposite subset. The \fB.exported\fP modifier |
296 | filters the results to include only exported functions. Inline | |
297 | functions do not have an identifiable return point, so | |
54efe513 GH |
298 | .B .return |
299 | is not supported on | |
300 | .B .inline | |
301 | probes. The | |
ba4a90fd FCE |
302 | .B .statement |
303 | variant places a probe at the exact spot, exposing those local variables | |
304 | that are visible there. | |
ca88561f | 305 | |
ba4a90fd FCE |
306 | .SAMPLE |
307 | kernel.function(PATTERN) | |
308 | .br | |
b8da0ad1 FCE |
309 | kernel.function(PATTERN).call |
310 | .br | |
ba4a90fd FCE |
311 | kernel.function(PATTERN).return |
312 | .br | |
b8da0ad1 | 313 | kernel.function(PATTERN).inline |
54efe513 | 314 | .br |
592470cd SC |
315 | kernel.function(PATTERN).label(LPATTERN) |
316 | .br | |
ba4a90fd FCE |
317 | module(MPATTERN).function(PATTERN) |
318 | .br | |
b8da0ad1 FCE |
319 | module(MPATTERN).function(PATTERN).call |
320 | .br | |
ba4a90fd FCE |
321 | module(MPATTERN).function(PATTERN).return |
322 | .br | |
b8da0ad1 FCE |
323 | module(MPATTERN).function(PATTERN).inline |
324 | .br | |
2cab6244 JS |
325 | module(MPATTERN).function(PATTERN).label(LPATTERN) |
326 | .br | |
54efe513 | 327 | .br |
ba4a90fd FCE |
328 | kernel.statement(PATTERN) |
329 | .br | |
37ebca01 FCE |
330 | kernel.statement(ADDRESS).absolute |
331 | .br | |
ba4a90fd | 332 | module(MPATTERN).statement(PATTERN) |
6f017dee FCE |
333 | .br |
334 | process("PATH").function("NAME") | |
335 | .br | |
336 | process("PATH").statement("*@FILE.c:123") | |
337 | .br | |
b73a1293 SC |
338 | process("PATH").library("PATH").function("NAME") |
339 | .br | |
340 | process("PATH").library("PATH").statement("*@FILE.c:123") | |
341 | .br | |
6f017dee FCE |
342 | process("PATH").function("*").return |
343 | .br | |
344 | process("PATH").function("myfun").label("foo") | |
5fa99496 FCE |
345 | .br |
346 | process(PID).statement(ADDRESS).absolute | |
ba4a90fd | 347 | .ESAMPLE |
ca88561f | 348 | |
6f017dee FCE |
349 | (See the USER-SPACE section below for more information on the process |
350 | probes.) | |
351 | ||
ba4a90fd | 352 | In the above list, MPATTERN stands for a string literal that aims to |
592470cd SC |
353 | identify the loaded kernel module of interest and LPATTERN stands for |
354 | a source program label. Both MPATTERN and LPATTERN may include the "*", | |
355 | "[]", and "?" wildcards. | |
356 | PATTERN stands for a string literal that | |
6f05b6ab | 357 | aims to identify a point in the program. It is made up of three |
ca88561f MM |
358 | parts: |
359 | .IP \(bu 4 | |
360 | The first part is the name of a function, as would appear in the | |
ba4a90fd FCE |
361 | .I nm |
362 | program's output. This part may use the "*" and "?" wildcarding | |
ca88561f MM |
363 | operators to match multiple names. |
364 | .IP \(bu 4 | |
365 | The second part is optional and begins with the "@" character. | |
366 | It is followed by the path to the source file containing the function, | |
367 | which may include a wildcard pattern, such as mm/slab*. | |
79640c29 | 368 | If it does not match as is, an implicit "*/" is optionally added |
ea384b8c | 369 | .I before |
79640c29 FCE |
370 | the pattern, so that a script need only name the last few components |
371 | of a possibly long source directory path. | |
ca88561f | 372 | .IP \(bu 4 |
ba4a90fd | 373 | Finally, the third part is optional if the file name part was given, |
1bd128a3 SC |
374 | and identifies the line number in the source file preceded by a ":" |
375 | or a "+". The line number is assumed to be an | |
376 | absolute line number if preceded by a ":", or relative to the entry of | |
99a5f9cf SC |
377 | the function if preceded by a "+". |
378 | All the lines in the function can be matched with ":*". | |
f7470174 | 379 | A range of lines x through y can be matched with ":x\-y". |
ca88561f | 380 | .PP |
ba4a90fd | 381 | As an alternative, PATTERN may be a numeric constant, indicating an |
ea384b8c FCE |
382 | address. Such an address may be found from symbol tables of the |
383 | appropriate kernel / module object file. It is verified against | |
384 | known statement code boundaries, and will be relocated for use at | |
385 | run time. | |
386 | .PP | |
387 | In guru mode only, absolute kernel-space addresses may be specified with | |
388 | the ".absolute" suffix. Such an address is considered already relocated, | |
389 | as if it came from | |
390 | .BR /proc/kallsyms , | |
391 | so it cannot be checked against statement/instruction boundaries. | |
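.PP
A sketch of the PATTERN and MPATTERN forms described above. The module and
function names are assumptions and may not match anything on a given system,
and such wildcards can expand to many probes:
.SAMPLE
probe kernel.function("*@mm/slab*").call { println(pp()) }
probe module("ext3").function("ext3_*").return { println(pp()) }
.ESAMPLE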
6f017dee FCE |
392 | |
393 | .SS CONTEXT VARIABLES | |
394 | ||
ba4a90fd | 395 | .PP |
6f017dee | 396 | Many of the source-level context variables, such as function parameters, |
ba4a90fd FCE |
397 | locals, globals visible in the compilation unit, may be visible to |
398 | probe handlers. They may refer to these variables by prefixing their | |
399 | name with "$" within the scripts. In addition, a special syntax | |
6f017dee FCE |
400 | allows limited traversal of structures, pointers, and arrays. More |
401 | syntax allows pretty-printing of individual variables or their groups. | |
402 | See also | |
403 | .BR @cast . | |
404 | ||
ba4a90fd FCE |
405 | .TP |
406 | $var | |
407 | refers to an in-scope variable "var". If it's an integer-like type, | |
7b9361d5 FCE |
408 | it will be cast to a 64-bit int for systemtap script use. String-like |
409 | pointers (char *) may be copied to systemtap string values using the | |
410 | .IR kernel_string " or " user_string | |
411 | functions. | |
ba4a90fd | 412 | .TP |
ab5e90c2 FCE |
413 | $var\->field traversal via a structure's or a pointer's field. This |
414 | generalized indirection operator may be repeated to follow more | |
415 | levels. Note that the | |
416 | .IR . | |
417 | operator is not used for plain structure | |
418 | members, only | |
419 | .IR \-> | |
420 | for both purposes. (This is because "." is reserved for string | |
421 | concatenation.) | |
ba4a90fd | 422 | .TP |
a43ba433 FCE |
423 | $return |
424 | is available in return probes only for functions that are declared | |
425 | with a return value. | |
426 | .TP | |
ba4a90fd | 427 | $var[N] |
33b081c5 JS |
428 | indexes into an array. The index may be given as a literal number or even |
429 | an arbitrary numeric expression. | |
6f017dee FCE |
430 | .PP |
431 | A number of operators exist for such basic context variable expressions: | |
34af38db | 432 | .TP |
2cb3fe26 SC |
433 | $$vars |
434 | expands to a character string that is equivalent to | |
6f017dee FCE |
435 | .SAMPLE |
436 | sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x", | |
437 | parm1, ..., parmN, var1, ..., varN) | |
438 | .ESAMPLE | |
439 | for each variable in scope at the probe point. Some values may be | |
440 | printed as | |
441 | .IR =? | |
442 | if their run-time location cannot be found. | |
2cb3fe26 SC |
443 | .TP |
444 | $$locals | |
a43ba433 | 445 | expands to a subset of $$vars for only local variables. |
2cb3fe26 SC |
446 | .TP |
447 | $$parms | |
a43ba433 FCE |
448 | expands to a subset of $$vars for only function parameters. |
449 | .TP | |
450 | $$return | |
451 | is available in return probes only. It expands to a string that | |
fd574705 | 452 | is equivalent to sprintf("return=%x", $return) |
a43ba433 | 453 | if the probed function has a return value, or else an empty string. |
6f017dee FCE |
454 | .TP |
455 | & $EXPR | |
456 | expands to the address of the given context variable expression, if it | |
457 | is addressable. | |
458 | .TP | |
459 | @defined($EXPR) | |
460 | expands to 1 if the given context variable expression is resolvable, or to 0 otherwise, | |
461 | for use in conditionals such as | |
462 | .SAMPLE | |
f7470174 | 463 | @defined($foo\->bar) ? $foo\->bar : 0 |
6f017dee FCE |
464 | .ESAMPLE |
465 | .TP | |
466 | $EXPR$ | |
467 | expands to a string with all of $EXPR's members, equivalent to | |
468 | .SAMPLE | |
469 | sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}", | |
470 | $EXPR\->a, $EXPR\->b) | |
471 | .ESAMPLE | |
472 | .TP | |
473 | $EXPR$$ | |
474 | expands to a string with all of $EXPR's members and submembers, equivalent to | |
475 | .SAMPLE | |
476 | sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}", | |
477 | $EXPR\->a, $EXPR\->b, $EXPR\->c\->x, $EXPR\->c\->y, $EXPR\->d[0]) | |
478 | .ESAMPLE | |
479 | ||
39e3139a FCE |
480 | .PP |
481 | For ".return" probes, context variables other than the "$return" | |
482 | value itself are only available for the function call parameters. | |
483 | The expressions evaluate to the | |
484 | .IR entry-time | |
485 | values of those variables, since that is when a snapshot is taken. | |
486 | Other local variables are not generally accessible, since by the time | |
487 | a ".return" probe hits, the probed function will have already returned. | |
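.PP
As a sketch of the pretty-printing operators, assuming debuginfo is
available and that the kernel has a function named "vfs_read" (the name is
only an example):
.SAMPLE
probe kernel.function("vfs_read") {
  println($$parms)
}
probe kernel.function("vfs_read").return {
  println($$return)
}
.ESAMPLE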
8cc799a5 JS |
488 | .PP |
489 | Arbitrary entry-time expressions can also be saved for ".return" | |
490 | probes using the | |
491 | .IR @entry(expr) | |
492 | operator. For example, one can compute the elapsed time of a function: | |
493 | .SAMPLE | |
494 | probe kernel.function("do_filp_open").return { | |
495 | println( get_timeofday_us() \- @entry(get_timeofday_us()) ) | |
496 | } | |
497 | .ESAMPLE | |
39e3139a | 498 | |
ba4a90fd | 499 | |
94c3c803 AM |
500 | .SS DWARFLESS |
501 | In the absence of debugging information, entry & exit points of kernel & module | |
502 | functions can be probed using the "kprobe" family of probes. | |
503 | However, these do not permit looking up the arguments / local variables | |
504 | of the function. | |
505 | The following constructs are supported: | |
506 | .SAMPLE | |
507 | kprobe.function(FUNCTION) | |
508 | kprobe.function(FUNCTION).return | |
509 | kprobe.module(NAME).function(FUNCTION) | |
510 | kprobe.module(NAME).function(FUNCTION).return | |
511 | kprobe.statement(ADDRESS).absolute | |
512 | .ESAMPLE | |
513 | .PP | |
514 | Probes of type | |
515 | .B function | |
516 | are recommended for kernel functions, whereas probes of type | |
517 | .B module | |
518 | are recommended for probing functions of the specified module. | |
519 | In case the absolute address of a kernel or module function is known, | |
520 | .B statement | |
521 | probes can be utilized. | |
522 | .PP | |
523 | Note that | |
524 | .I FUNCTION | |
525 | and | |
526 | .I MODULE | |
527 | names | |
528 | .B must not | |
529 | contain wildcards, or the probe will not be registered. | |
530 | Also, statement probes may be used only in guru mode. | |
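.PP
A minimal sketch that counts calls without any debuginfo. The function name
is an assumption, and no context variables are used because none are
available here:
.SAMPLE
global calls
probe kprobe.function("vfs_read") { calls++ }
probe end { println("calls: ", calls) }
.ESAMPLE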
531 | ||
532 | ||
1ada6f08 | 533 | .SS USER-SPACE |
0a1c696d FCE |
534 | Support for user-space probing is available for kernels |
535 | that are configured with the utrace extensions. See | |
536 | .SAMPLE | |
537 | http://people.redhat.com/roland/utrace/ | |
538 | .ESAMPLE | |
539 | .PP | |
540 | There are several forms. First, a non-symbolic probe point: | |
1ada6f08 FCE |
541 | .SAMPLE |
542 | process(PID).statement(ADDRESS).absolute | |
543 | .ESAMPLE | |
544 | is analogous to | |
545 | .IR | |
546 | kernel.statement(ADDRESS).absolute | |
547 | in that both use raw (unverified) virtual addresses and provide | |
548 | no $variables. The target PID parameter must identify a running | |
549 | process, and ADDRESS should identify a valid instruction address. | |
550 | All threads of that process will be probed. | |
29cb9b42 | 551 | .PP |
0a1c696d FCE |
552 | Second, non-symbolic user-kernel interface events handled by |
553 | utrace may be probed: | |
29cb9b42 | 554 | .SAMPLE |
dd078c96 | 555 | process(PID).begin |
82f0e81b | 556 | process("FULLPATH").begin |
986e98de | 557 | process.begin |
dd078c96 | 558 | process(PID).thread.begin |
82f0e81b | 559 | process("FULLPATH").thread.begin |
986e98de | 560 | process.thread.begin |
dd078c96 | 561 | process(PID).end |
82f0e81b | 562 | process("FULLPATH").end |
986e98de | 563 | process.end |
dd078c96 | 564 | process(PID).thread.end |
82f0e81b | 565 | process("FULLPATH").thread.end |
986e98de | 566 | process.thread.end |
29cb9b42 | 567 | process(PID).syscall |
82f0e81b | 568 | process("FULLPATH").syscall |
986e98de | 569 | process.syscall |
29cb9b42 | 570 | process(PID).syscall.return |
82f0e81b | 571 | process("FULLPATH").syscall.return |
986e98de | 572 | process.syscall.return |
0afb7073 | 573 | process(PID).insn |
82f0e81b | 574 | process("FULLPATH").insn |
0afb7073 | 575 | process(PID).insn.block |
82f0e81b | 576 | process("FULLPATH").insn.block |
29cb9b42 DS |
577 | .ESAMPLE |
578 | .PP | |
579 | A | |
dd078c96 | 580 | .B .begin |
82f0e81b | 581 | probe gets called when a new process described by PID or FULLPATH gets created. |
29cb9b42 | 582 | A |
dd078c96 | 583 | .B .thread.begin |
82f0e81b | 584 | probe gets called when a new thread described by PID or FULLPATH gets created. |
159cb109 | 585 | A |
dd078c96 | 586 | .B .end |
82f0e81b | 587 | probe gets called when a process described by PID or FULLPATH dies. |
dd078c96 DS |
588 | A |
589 | .B .thread.end | |
82f0e81b | 590 | probe gets called when a thread described by PID or FULLPATH dies. |
29cb9b42 DS |
591 | A |
592 | .B .syscall | |
82f0e81b | 593 | probe gets called when a thread described by PID or FULLPATH makes a |
6270adc1 MH |
594 | system call. The system call number is available in the |
595 | .BR $syscall | |
596 | context variable, and the first 6 arguments of the system call | |
597 | are available in the | |
598 | .BR $argN | |
599 | (e.g. $arg1, $arg2, ...) context variables. | |
29cb9b42 DS |
600 | A |
601 | .B .syscall.return | |
82f0e81b | 602 | probe gets called when a thread described by PID or FULLPATH returns from a |
5d67b47c MH |
603 | system call. The system call number is available in the |
604 | .BR $syscall | |
605 | context variable, and the return value of the system call is available | |
606 | in the | |
607 | .BR $return | |
29cb9b42 | 608 | context variable. |
a96d1db0 | 609 | A |
0afb7073 | 610 | .B .insn |
82f0e81b | 611 | probe gets called for every single-stepped instruction of the process described by PID or FULLPATH. |
0afb7073 FCE |
612 | A |
613 | .B .insn.block | |
82f0e81b FCE |
614 | probe gets called for every block-stepped instruction of the process described by PID or FULLPATH. |
615 | .PP | |
616 | If a process probe is specified without a PID or FULLPATH, all user | |
617 | threads will be probed. However, if systemtap was invoked with the | |
f7470174 | 618 | .IR \-c " or " \-x |
82f0e81b | 619 | options, then process probes are restricted to the process |
6d5d594e LB |
620 | hierarchy associated with the target process. If a process probe is |
621 | specified without a PID or FULLPATH, but with the | |
622 | .IR \-c | |
623 | option, the PATH of the | |
624 | .IR \-c | |
625 | cmd will be heuristically filled into the process PATH. | |
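.PP
A sketch of the system call probes described above. The path "/bin/ls" is
only an example target:
.SAMPLE
probe process("/bin/ls").syscall {
  println("syscall ", $syscall, " arg1=", $arg1)
}
probe process("/bin/ls").syscall.return {
  println("syscall ", $syscall, " returned ", $return)
}
.ESAMPLE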
0a1c696d FCE |
626 | |
627 | .PP | |
628 | Third, symbolic static instrumentation compiled into programs and | |
629 | shared libraries may be | |
630 | probed: | |
631 | .SAMPLE | |
632 | process("PATH").mark("LABEL") | |
a794dbeb | 633 | process("PATH").provider("PROVIDER").mark("LABEL") |
0a1c696d FCE |
634 | .ESAMPLE |
635 | .PP | |
f28a8c28 SC |
636 | A |
637 | .B .mark | |
638 | probe gets called via a static probe which is defined in the | |
a794dbeb FCE |
639 | application by STAP_PROBE1(PROVIDER,LABEL,arg1), which is defined in |
640 | sdt.h. PROVIDER is an application handle, LABEL corresponds to | |
641 | the .mark argument, and arg1 is the argument. STAP_PROBE1 is used for | |
642 | probes with 1 argument, STAP_PROBE2 is used for probes with 2 | |
643 | arguments, and so on. The arguments of the probe are available in the | |
644 | context variables $arg1, $arg2, ... An alternative to using the | |
645 | STAP_PROBE macros is to use the dtrace script to create custom macros. | |
646 | Additionally, the variables $$name and $$provider are available as | |
647 | parts of the probe point name. | |
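.PP
On the script side, a sketch of a handler for such a marker. The program
path and the marker name "request_start" are hypothetical:
.SAMPLE
probe process("/usr/bin/myapp").mark("request_start") {
  println($$provider, ":", $$name, " arg1=", $arg1)
}
.ESAMPLE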
0a1c696d | 648 | |
29cb9b42 | 649 | .PP |
0a1c696d FCE |
650 | Finally, full symbolic source-level probes in user-space programs |
651 | and shared libraries are supported. These are exactly analogous | |
652 | to the symbolic DWARF-based kernel/module probes described above, | |
f7470174 | 653 | and expose similar contextual $variables. |
0a1c696d FCE |
654 | .SAMPLE |
655 | process("PATH").function("NAME") | |
656 | process("PATH").statement("*@FILE.c:123") | |
4d0fcb93 SC |
657 | process("PATH").plt("NAME") |
658 | process("PATH").library("PATH").plt("NAME") | |
b73a1293 SC |
659 | process("PATH").library("PATH").function("NAME") |
660 | process("PATH").library("PATH").statement("*@FILE.c:123") | |
0a1c696d FCE |
661 | process("PATH").function("*").return |
662 | process("PATH").function("myfun").label("foo") | |
663 | .ESAMPLE | |
664 | ||
665 | .PP | |
666 | Note that for all process probes, | |
29cb9b42 | 667 | .I PATH |
ea384b8c FCE |
668 | names refer to executables that are searched the same way shells do: relative |
669 | to the working directory if they contain a "/" character, otherwise in | |
670 | .BR $PATH . | |
d1bcbe71 RH |
671 | If PATH names refer to scripts, the actual interpreters (specified in the |
672 | script in the first line after the #! characters) are probed. | |
b73a1293 SC |
673 | If PATH is a process component parameter referring to shared libraries |
674 | then all processes that map it at runtime would be selected for | |
675 | probing. If PATH is a library component parameter referring to shared | |
676 | libraries then the process specified by the process component would be | |
4d0fcb93 SC |
677 | selected. A .plt probe will probe functions in the program linkage table |
678 | corresponding to the rest of the probe point. .plt can be specified | |
679 | as a shorthand for .plt("*"). | |
82f0e81b FCE |
680 | If the PATH string contains wildcards as in the MPATTERN case, then |
681 | standard globbing is performed to find all matching paths. In this | |
682 | case, the | |
683 | .BR $PATH | |
684 | environment variable is not used. | |
685 | ||
686 | .PP | |
153e7a22 FCE |
687 | If systemtap was invoked with the |
688 | .IR \-c " or " \-x | |
760695db FCE |
689 | options, then process probes are restricted to the process |
690 | hierarchy associated with the target process. | |
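 | .PP |
 | As a minimal sketch of the forms above, the following handlers trace |
 | entry to main and every function return in one executable. The path |
 | /bin/ls is only an illustration, and the sketch assumes that debugging |
 | information for that binary is available: |
 | .SAMPLE |
 | # /bin/ls is a placeholder; substitute the executable of interest |
 | probe process("/bin/ls").function("main") { |
 |   printf("%s (pid %d) entered main\\n", execname(), pid()) |
 | } |
 | probe process("/bin/ls").function("*").return { |
 |   printf("return: %s\\n", pp()) |
 | } |
 | .ESAMPLE |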
1ada6f08 | 691 | |
9cb48751 DS |
692 | .SS PROCFS |
693 | ||
694 | These probe points allow procfs "files" in | |
c243f608 LB |
695 | /proc/systemtap/MODNAME to be created, read and written using a |
696 | permission that may be modified using the proper umask value. Default permissions are 0400 for read | |
697 | probes, and 0200 for write probes. If both a read and write probe are being | |
698 | used on the same file, a default permission of 0600 will be used. | |
699 | Using procfs.umask(0040).read would | |
700 | result in a 0404 permission set for the file. | |
9cb48751 DS |
701 | .RI ( MODNAME |
702 | is the name of the systemtap module). The | |
703 | .I proc | |
704 | filesystem is a pseudo-filesystem which is used as an interface to |
c243f608 | 705 | kernel data structures. There are several probe point variants supported |
9cb48751 | 706 | by the translator: |
ca88561f | 707 | |
9cb48751 DS |
708 | .SAMPLE |
709 | procfs("PATH").read | |
c243f608 | 710 | procfs("PATH").umask(UMASK).read |
38975255 | 711 | procfs("PATH").read.maxsize(MAXSIZE) |
c243f608 | 712 | procfs("PATH").umask(UMASK).read.maxsize(MAXSIZE) |
9cb48751 | 713 | procfs("PATH").write |
c243f608 | 714 | procfs("PATH").umask(UMASK).write |
9cb48751 | 715 | procfs.read |
c243f608 | 716 | procfs.umask(UMASK).read |
38975255 | 717 | procfs.read.maxsize(MAXSIZE) |
c243f608 | 718 | procfs.umask(UMASK).read.maxsize(MAXSIZE) |
9cb48751 | 719 | procfs.write |
c243f608 | 720 | procfs.umask(UMASK).write |
9cb48751 | 721 | .ESAMPLE |
ca88561f | 722 | |
9cb48751 DS |
723 | .I PATH |
724 | is the file name (relative to /proc/systemtap/MODNAME) to be created. | |
725 | If no | |
726 | .I PATH | |
727 | is specified (as in the last six variants above), |
728 | .I PATH | |
729 | defaults to "command". | |
730 | .PP | |
731 | When a user reads /proc/systemtap/MODNAME/PATH, the corresponding | |
732 | procfs | |
733 | .I read | |
734 | probe is triggered. The string data to be read should be assigned to | |
735 | a variable named | |
736 | .IR $value , | |
737 | like this: | |
ca88561f | 738 | |
9cb48751 DS |
739 | .SAMPLE |
740 | probe procfs("PATH").read { $value = "100\\n" } |
741 | .ESAMPLE | |
742 | .PP | |
743 | When a user writes into /proc/systemtap/MODNAME/PATH, the | |
744 | corresponding procfs | |
745 | .I write | |
746 | probe is triggered. The data the user wrote is available in the | |
747 | string variable named | |
748 | .IR $value , | |
749 | like this: | |
ca88561f | 750 | |
9cb48751 DS |
751 | .SAMPLE |
752 | probe procfs("PATH").write { printf("user wrote: %s", $value) } |
753 | .ESAMPLE | |
38975255 DS |
754 | .PP |
755 | .I MAXSIZE | |
756 | is the size of the procfs read buffer. Specifying | |
757 | .I MAXSIZE | |
758 | allows larger procfs output. If no | |
759 | .I MAXSIZE | |
760 | is specified, the procfs read buffer defaults to | |
761 | .I STP_PROCFS_BUFSIZE | |
762 | (which defaults to | |
763 | .IR MAXSTRINGLEN , | |
764 | the maximum length of a string). | |
765 | If setting the procfs read buffers for more than one file is needed, | |
766 | it may be easiest to override the | |
767 | .I STP_PROCFS_BUFSIZE | |
768 | definition. | |
769 | Here's an example of using | |
770 | .IR MAXSIZE : | |
771 | ||
772 | .SAMPLE | |
773 | probe procfs.read.maxsize(1024) { |
774 | $value = "long string..." | |
775 | $value .= "another long string..." | |
776 | $value .= "another long string..." | |
777 | $value .= "another long string..." | |
778 | } | |
779 | .ESAMPLE | |
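 | .PP |
 | A read probe and a write probe may share one file, which then receives |
 | the default 0600 permission described above. The following is a sketch |
 | only; the file name "threshold" and the global variable are hypothetical: |
 | .SAMPLE |
 | global threshold = 100 |
 | # reading the file reports the current value |
 | probe procfs("threshold").read { $value = sprintf("%d\\n", threshold) } |
 | # writing a decimal number to the file updates the value |
 | probe procfs("threshold").write { threshold = strtol($value, 10) } |
 | .ESAMPLE |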
9cb48751 | 780 | |
6f05b6ab FCE |
781 | .SS MARKERS |
782 | ||
783 | This family of probe points hooks up to static probing markers | |
784 | inserted into the kernel or modules. These markers are special macro | |
785 | calls inserted by kernel developers to make probing faster and more | |
786 | reliable than with DWARF-based probes. Further, DWARF debugging | |
787 | information is | |
788 | .I not | |
789 | required to probe markers. | |
790 | ||
791 | Marker probe points begin with | |
f781f849 DS |
792 | .BR kernel . |
793 | The next part names the marker itself: | |
6f05b6ab FCE |
794 | .BR mark("name") . |
795 | The marker name string, which may contain the usual wildcard characters, | |
796 | is matched against the names given to the marker macros when the kernel | |
eb973c2a DS |
797 | and/or module was compiled. Optionally, you can specify |
798 | .BR format("format") . | |
37f6433e | 799 | Specifying the marker format string allows differentiation between two |
eb973c2a | 800 | markers with the same name but different marker format strings. |
6f05b6ab FCE |
801 | |
802 | The handler associated with a marker-based probe may read the | |
803 | optional parameters specified at the macro call site. These are | |
804 | named | |
805 | .BR $arg1 " through " $argNN , | |
806 | where NN is the number of parameters supplied by the macro. Number | |
807 | and string parameters are passed in a type-safe manner. | |
808 | ||
eb973c2a DS |
809 | The marker format string associated with a marker is available in |
810 | .BR $format . | |
37f6433e | 811 | The marker name string is likewise available in |
bc54e71c | 812 | .BR $name . |
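 | .PP |
 | For instance, assuming a kernel that was built with markers, a handler |
 | along these lines would report each marker that fires together with |
 | its format string (a sketch, not a tested recipe): |
 | .SAMPLE |
 | probe kernel.mark("*") { |
 |   printf("%s: %s\\n", $name, $format) |
 | } |
 | .ESAMPLE |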
eb973c2a | 813 | |
bc724b8b JS |
814 | .SS TRACEPOINTS |
815 | ||
816 | This family of probe points hooks up to static probing tracepoints | |
817 | inserted into the kernel or modules. As with markers, these | |
818 | tracepoints are special macro calls inserted by kernel developers to | |
819 | make probing faster and more reliable than with DWARF-based probes, | |
820 | and DWARF debugging information is not required to probe tracepoints. | |
821 | Tracepoints have an extra advantage of more strongly-typed parameters | |
822 | than markers. | |
823 | ||
824 | Tracepoint probes begin with | |
825 | .BR kernel . | |
826 | The next part names the tracepoint itself: | |
827 | .BR trace("name") . | |
828 | The tracepoint name string, which may contain the usual wildcard | |
829 | characters, is matched against the names defined by the kernel | |
830 | developers in the tracepoint header files. | |
831 | ||
832 | The handler associated with a tracepoint-based probe may read the | |
833 | optional parameters specified at the macro call site. These are | |
834 | named according to the declaration by the tracepoint author. For | |
835 | example, the tracepoint probe | |
836 | .BR kernel.trace("sched_switch") | |
837 | provides the parameters | |
838 | .BR $rq ", " $prev ", and " $next . | |
839 | If the parameter is a complex type, as in a struct pointer, then a | |
840 | script can access fields with the same syntax as DWARF $target | |
841 | variables. Also, tracepoint parameters cannot be modified, but in | |
842 | guru-mode a script may modify fields of parameters. | |
843 | ||
844 | The name of the tracepoint is available in | |
845 | .BR $$name , | |
846 | and a string of name=value pairs for all parameters of the tracepoint | |
847 | is available in | |
046e7190 | 848 | .BR $$vars " or " $$parms . |
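 | .PP |
 | As a sketch of the parameter access described above, the following |
 | handler prints the tracepoint name and one field from each of two |
 | parameters. It assumes that $prev and $next point to task structures |
 | with a "pid" field, which depends on the kernel version: |
 | .SAMPLE |
 | probe kernel.trace("sched_switch") { |
 |   printf("%s: pid %d -> pid %d\\n", $$name, $prev->pid, $next->pid) |
 | } |
 | .ESAMPLE |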
bc724b8b | 849 | |
dd225250 PS |
850 | .SS HARDWARE BREAKPOINTS |
851 | This family of probes is used to set hardware watchpoints for a given | |
852 | (global) kernel symbol. The probes take three components as inputs: |
853 | ||
854 | 1. The | |
855 | .BR virtual address / name | |
856 | of the kernel symbol to be traced is supplied as argument to this class | |
857 | of probes. (Only data segment variables are supported; local |
858 | variables of a function cannot be probed.) |
859 | ||
860 | 2. Nature of access to be probed: |
861 | a. | |
862 | .I .write | |
863 | probe gets triggered when a write happens at the specified address/symbol | |
864 | name. | |
865 | b. | |
866 | .I .rw |
867 | probe is triggered when either a read or write happens. | |
868 | ||
869 | 3. | |
870 | .BR .length | |
871 | (optional) | |
872 | Users have the option of specifying the address interval to be probed | |
873 | using "length" constructs. The user-specified length gets approximated | |
874 | to the closest possible address length that the architecture can | |
875 | support. If the specified length exceeds the limits imposed by | |
876 | architecture, an error message is flagged and probe registration fails. | |
877 | Wherever 'length' is not specified, the translator requests a hardware | |
878 | breakpoint probe of length 1. It should be noted that the "length" | |
879 | construct is not valid with symbol names. | |
880 | ||
881 | The following constructs are supported: |
882 | .SAMPLE | |
883 | probe kernel.data(ADDRESS).write | |
884 | probe kernel.data(ADDRESS).rw | |
885 | probe kernel.data(ADDRESS).length(LEN).write | |
886 | probe kernel.data(ADDRESS).length(LEN).rw | |
887 | probe kernel.data("SYMBOL_NAME").write | |
888 | probe kernel.data("SYMBOL_NAME").rw | |
889 | .ESAMPLE | |
890 | ||
891 | This set of probes makes use of the debug registers of the processor, |
892 | which are a scarce resource (4 on x86, 1 on powerpc). The script |
893 | translation flags a warning if a user requests more hardware breakpoint probes |
894 | than the limits set by architecture. For example, a pass-2 warning is issued |
895 | when an input script requests 5 hardware breakpoint probes on an x86 | |
896 | system while x86 architecture supports a maximum of 4 breakpoints. | |
897 | Users are cautioned to set probes judiciously. | |
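 | .PP |
 | A minimal sketch of a complete watchpoint probe follows. It reuses the |
 | pid_max symbol that appears in the EXAMPLES section; whether that |
 | symbol can be resolved depends on the running kernel: |
 | .SAMPLE |
 | probe kernel.data("pid_max").write { |
 |   printf("pid_max written by %s (pid %d)\\n", execname(), pid()) |
 | } |
 | .ESAMPLE |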
898 | ||
ba4a90fd FCE |
899 | .SH EXAMPLES |
900 | .PP | |
901 | Here are some example probe points, defining the associated events. | |
902 | .TP | |
903 | begin, end, end | |
904 | refers to the startup and normal shutdown of the session. In this | |
905 | case, the handler would run once during startup and twice during | |
906 | shutdown. | |
907 | .TP | |
908 | timer.jiffies(1000).randomize(200) | |
13d2ecdb | 909 | refers to a periodic interrupt, every 1000 +/\- 200 jiffies. |
ba4a90fd FCE |
910 | .TP |
911 | kernel.function("*init*"), kernel.function("*exit*") | |
912 | refers to all kernel functions with "init" or "exit" in the name. | |
913 | .TP | |
914 | kernel.function("*@kernel/sched.c:240") | |
915 | refers to any functions within the "kernel/sched.c" file that span | |
6ff00e1d FCE |
916 | line 240. |
917 | .BR | |
918 | Note | |
919 | that this is | |
920 | .BR not | |
921 | a probe at the statement at that line number. Use the | |
922 | .I |
923 | kernel.statement | |
924 | probe instead. | |
ba4a90fd | 925 | .TP |
6f05b6ab FCE |
926 | kernel.mark("getuid") |
927 | refers to an STAP_MARK(getuid, ...) macro call in the kernel. | |
928 | .TP | |
ba4a90fd FCE |
929 | module("usb*").function("*sync*").return |
930 | refers to the moment of return from all functions with "sync" in the | |
931 | name in any of the USB drivers. | |
932 | .TP | |
933 | kernel.statement(0xc0044852) | |
934 | refers to the first byte of the statement whose compiled instructions | |
935 | include the given address in the kernel. | |
b4ceace2 | 936 | .TP |
a5ae3f3d | 937 | kernel.statement("*@kernel/sched.c:2917") |
1bd128a3 SC |
938 | refers to the statement of line 2917 within "kernel/sched.c". |
939 | .TP | |
940 | kernel.statement("bio_init@fs/bio.c+3") | |
941 | refers to the statement at line bio_init+3 within "fs/bio.c". | |
a5ae3f3d | 942 | .TP |
dd225250 PS |
943 | kernel.data("pid_max").write |
944 | refers to a hardware breakpoint of type "write" set on pid_max. |
945 | .TP | |
729286d8 | 946 | syscall.*.return |
b4ceace2 | 947 | refers to the group of probe aliases with any name in the second position. |
ba4a90fd | 948 | |
f33e9151 FCE |
949 | .SS PERF |
950 | ||
951 | This | |
952 | .IR prototype | |
953 | family of probe points interfaces to the kernel "perf event" | |
954 | infrastructure for controlling hardware performance counters. |
955 | The events being attached to are described by the "type" and |
956 | "config" fields of the | |
957 | .IR perf_event_attr | |
958 | structure, and are sampled at an interval governed by the | |
959 | "sample_period" field. | |
960 | ||
961 | These fields are made available to systemtap scripts using | |
962 | the following syntax: | |
963 | .SAMPLE | |
bb9fd173 | 964 | probe perf.type(NN).config(MM).sample(XX) |
f33e9151 FCE |
965 | probe perf.type(NN).config(MM) |
966 | .ESAMPLE | |
19f00bd9 FCE |
967 | The systemtap probe handler is called once per XX increments |
968 | of the underlying performance counter. The default sampling | |
969 | count is 1000000. | |
f33e9151 FCE |
970 | The range of valid type/config is described by the |
971 | .IR perf_event_open (2) | |
972 | system call, and/or the | |
973 | .IR linux/perf_event.h | |
8fb91f5f FCE |
974 | file. Invalid combinations or exhausted hardware counter resources |
975 | result in errors during systemtap script startup. Systemtap does | |
f33e9151 FCE |
976 | not sanity-check the values: it merely passes them through to |
977 | the kernel for error- and safety-checking. | |
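 | .PP |
 | As a sketch under stated assumptions, the following script counts |
 | samples per process name. The values type 0 and config 0 are taken to |
 | mean PERF_TYPE_HARDWARE and PERF_COUNT_HW_CPU_CYCLES as defined in |
 | linux/perf_event.h; the global array name is hypothetical: |
 | .SAMPLE |
 | global hits |
 | # type 0 / config 0: hardware CPU cycle counter (assumed available) |
 | probe perf.type(0).config(0).sample(1000000) { |
 |   hits[execname()]++ |
 | } |
 | # on shutdown, list the ten most frequently sampled process names |
 | probe end { |
 |   foreach (name in hits- limit 10) |
 |     printf("%-16s %d\\n", name, hits[name]) |
 | } |
 | .ESAMPLE |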
978 | ||
ba4a90fd | 979 | .SH SEE ALSO |
78db65bd | 980 | .IR stap (1), |
89965a32 FCE |
981 | .IR probe::* (3stap), |
982 | .IR tapset::* (3stap) |