Commit | Line | Data |
---|---|---|
ba4a90fd | 1 | .\" -*- nroff -*- |
ec1a2239 | 2 | .TH STAPPROBES 3stap |
ba4a90fd FCE |
3 | .SH NAME |
4 | stapprobes \- systemtap probe points | |
5 | ||
6 | .\" macros | |
7 | .de SAMPLE | |
8 | .br | |
9 | .RS | |
10 | .nf | |
11 | .nh | |
12 | .. | |
13 | .de ESAMPLE | |
14 | .hy | |
15 | .fi | |
16 | .RE | |
17 | .. | |
18 | ||
19 | .SH DESCRIPTION | |
20 | The following sections enumerate the variety of probe points supported | |
89965a32 FCE |
21 | by the systemtap translator, and some of the additional aliases defined by |
22 | standard tapset scripts. Many are individually documented in the | |
23 | .IR 3stap | |
24 | manual section, with the | |
25 | .IR probe:: | |
26 | prefix. | |
ba4a90fd | 27 | .PP |
7abecb38 | 28 | The general probe point syntax is a dotted-symbol sequence. This |
ba4a90fd FCE |
29 | allows a breakdown of the event namespace into parts, somewhat like |
30 | the Domain Name System does on the Internet. Each component | |
7abecb38 | 31 | identifier may be parametrized by a string or number literal, with a |
d898100a | 32 | syntax like a function call. A component may include a "*" character, |
649260f3 JS |
33 | to expand to a set of matching probe points. It may also include "**" |
34 | to match multiple sequential components at once. Probe aliases likewise | |
d898100a FCE |
35 | expand to other probe points. Each and every resulting probe point is |
36 | normally resolved to some low-level system instrumentation facility | |
37 | (e.g., a kprobe address, marker, or a timer configuration), otherwise | |
38 | the elaboration phase will fail. | |
39 | .PP | |
40 | However, a probe point may be followed by a "?" character, to indicate | |
41 | that it is optional, and that no error should result if it fails to | |
42 | resolve. Optionalness passes down through all levels of | |
43 | alias/wildcard expansion. Alternately, a probe point may be followed | |
44 | by a "!" character, to indicate that it is both optional and | |
37f6433e | 45 | sufficient. (Think vaguely of the Prolog cut operator.) If it does |
d898100a FCE |
46 | resolve, then no further probe points in the same comma-separated list |
47 | will be resolved. Therefore, the "!" sufficiency mark only makes | |
48 | sense in a list of probe point alternatives. | |
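.PP
As a hedged sketch of such a list of alternatives (the function names
below are assumptions chosen for illustration and may not exist in a
given kernel):
.SAMPLE
# "vfs_read" and "sys_read" are assumed names, not guaranteed to resolve.
probe kernel.function("vfs_read") !,
      kernel.function("sys_read") ?
{
  printf("hit %s\\n", pp())
}
.ESAMPLE
If the first alternative resolves, the second is not resolved at all.
Otherwise the second is tried, and because it is optional, its failure
to resolve is not by itself an error.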
dfd11cc3 MH |
49 | .PP |
50 | Additionally, a probe point may be followed by an "if (expr)" statement, in | |
51 | order to enable/disable the probe point on-the-fly. With the "if" statement, | |
52 | if "expr" is false when the probe point is hit, the whole probe body, | |
53 | including any alias's body, is skipped. The condition is stacked up through | |
54 | all levels of alias/wildcard expansion, so the final condition becomes | |
55 | the logical-and of the conditions of all expanded aliases/wildcards. | |
6e3347a9 | 56 | |
e904ad95 FCE |
57 | These are all |
58 | .B syntactically | |
59 | valid probe points. (They are generally | |
60 | .B semantically | |
61 | invalid, depending on the contents of the tapsets, and the versions of | |
62 | kernel/user software installed.) | |
ca88561f | 63 | |
ba4a90fd FCE |
64 | .SAMPLE |
65 | kernel.function("foo").return | |
e904ad95 | 66 | process("/bin/vi").statement(0x2222) |
ba4a90fd | 67 | end |
729286d8 | 68 | syscall.* |
649260f3 | 69 | sys**open |
6e3347a9 | 70 | kernel.function("no_such_function") ? |
d898100a | 71 | module("awol").function("no_such_function") ! |
dfd11cc3 | 72 | signal.*? if (switch) |
94c3c803 | 73 | kprobe.function("foo") |
ba4a90fd FCE |
74 | .ESAMPLE |
75 | ||
e904ad95 | 76 | |
6f05b6ab FCE |
77 | Probes may be broadly classified into "synchronous" and |
78 | "asynchronous". A "synchronous" event is deemed to occur when any | |
79 | processor executes an instruction matched by the specification. This | |
80 | gives these probes a reference point (instruction address) from which | |
81 | more contextual data may be available. Other families of probe points | |
82 | refer to "asynchronous" events such as timers/counters rolling over, | |
83 | where there is no fixed reference point that is related. Each probe | |
84 | point specification may match multiple locations (for example, using | |
85 | wildcards or aliases), and all of them are then probed. A probe | |
86 | declaration may also contain several comma-separated specifications, | |
87 | all of which are probed. | |
88 | ||
65aeaea0 | 89 | .SS BEGIN/END/ERROR |
ba4a90fd FCE |
90 | |
91 | The probe points | |
92 | .IR begin " and " end | |
93 | are defined by the translator to refer to the time of session startup | |
94 | and shutdown. All "begin" probe handlers are run, in some sequence, | |
95 | during the startup of the session. All global variables will have | |
96 | been initialized prior to this point. All "end" probes are run, in | |
97 | some sequence, during the | |
98 | .I normal | |
99 | shutdown of a session, such as in the aftermath of an | |
100 | .I exit () | |
101 | function call, or an interruption from the user. In the case of an | |
102 | error-triggered shutdown, "end" probes are not run. There are no | |
103 | target variables available in either context. | |
6a256b03 JS |
104 | .PP |
105 | If the order of execution among "begin" or "end" probes is significant, | |
106 | then an optional sequence number may be provided: | |
ca88561f | 107 | |
6a256b03 JS |
108 | .SAMPLE |
109 | begin(N) | |
110 | end(N) | |
111 | .ESAMPLE | |
ca88561f | 112 | |
6a256b03 JS |
113 | The number N may be positive or negative. The probe handlers are run in |
114 | increasing order, and the order between handlers with the same sequence | |
115 | number is unspecified. When "begin" or "end" are given without a | |
116 | sequence, they are effectively sequence zero. | |
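.PP
A minimal sketch of sequenced startup handlers (the messages are
arbitrary):
.SAMPLE
probe begin(\-1) { println("runs before the unnumbered handler") }
probe begin      { println("sequence zero") }
probe begin(1)   { println("runs after the unnumbered handler"); exit() }
.ESAMPLE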
ba4a90fd | 117 | |
65aeaea0 FCE |
118 | The |
119 | .IR error | |
120 | probe point is similar to the | |
121 | .IR end | |
d898100a FCE |
122 | probe, except that each such probe handler is run when the session ends |
123 | after errors have occurred. In such cases, "end" probes are skipped, | |
37f6433e | 124 | but each "error" probe is still attempted. This kind of probe can be |
d898100a FCE |
125 | used to clean up or emit a "final gasp". It may also be numerically |
126 | parametrized to set a sequence. | |
65aeaea0 | 127 | |
6e3347a9 FCE |
128 | .SS NEVER |
129 | The probe point | |
130 | .IR never | |
131 | is specially defined by the translator to mean "never". Its probe | |
132 | handler is never run, though its statements are analyzed for symbol / | |
133 | type correctness as usual. This probe point may be useful in | |
134 | conjunction with optional probes. | |
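.PP
One possible use, sketched under the assumption that the optional
probe point may fail to resolve: listing
.IR never
as a further alternative keeps the probe declaration from being empty,
while the handler is still checked.
.SAMPLE
# "no_such_function" is a placeholder that is expected not to resolve.
probe kernel.function("no_such_function") ?, never
{
  println("not reached through the never probe point")
}
.ESAMPLE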
135 | ||
1027502b FCE |
136 | .SS SYSCALL |
137 | ||
138 | The | |
139 | .IR syscall.* | |
140 | aliases define several hundred probes, too many to | |
141 | summarize here. They take the following forms: | |
142 | ||
143 | .SAMPLE | |
144 | syscall.NAME | |
145 | .br | |
146 | syscall.NAME.return | |
147 | .ESAMPLE | |
148 | ||
149 | Generally, two probes are defined for each normal system call as listed in the | |
150 | .IR syscalls(2) | |
151 | manual page, one for entry and one for return. Those system calls that never | |
152 | return do not have a corresponding | |
153 | .IR .return | |
154 | probe. | |
155 | .PP | |
156 | Each probe alias defines a variety of variables. Looking at the tapset source | |
157 | code is the most reliable way to learn them. Generally, each variable listed in the standard | |
158 | manual page is made available as a script-level variable, so | |
159 | .IR syscall.open | |
160 | exposes | |
161 | .IR filename ", " flags ", and " mode . | |
162 | In addition, a standard suite of variables is available at most aliases: | |
163 | .TP | |
164 | .IR argstr | |
165 | A pretty-printed form of the entire argument list, without parentheses. | |
166 | .TP | |
167 | .IR name | |
168 | The name of the system call. | |
169 | .TP | |
170 | .IR retstr | |
171 | For return probes, a pretty-printed form of the system-call result. | |
172 | .PP | |
173 | Not all probe aliases obey all of these general guidelines. Please report | |
174 | any bothersome ones you encounter as a bug. | |
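.PP
A short sketch using the standard variables (assuming the
.IR syscall.open
alias is available for the running kernel):
.SAMPLE
probe syscall.open {
  printf("%s(%s)\\n", name, argstr)
}
probe syscall.open.return {
  printf("%s = %s\\n", name, retstr)
}
.ESAMPLE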
175 | ||
176 | ||
ba4a90fd FCE |
177 | .SS TIMERS |
178 | ||
179 | Intervals defined by the standard kernel "jiffies" timer may be used | |
180 | to trigger probe handlers asynchronously. Two probe point variants | |
181 | are supported by the translator: | |
ca88561f | 182 | |
ba4a90fd FCE |
183 | .SAMPLE |
184 | timer.jiffies(N) | |
185 | timer.jiffies(N).randomize(M) | |
186 | .ESAMPLE | |
ca88561f | 187 | |
ba4a90fd FCE |
188 | The probe handler is run every N jiffies (a kernel-defined unit of |
189 | time, typically between 1 and 60 ms). If the "randomize" component is | |
13d2ecdb | 190 | given, a uniformly distributed random value in the range [\-M..+M] is |
ba4a90fd FCE |
191 | added to N every time the handler is run. N is restricted to a |
192 | reasonable range (1 to around a million), and M is restricted to be | |
193 | smaller than N. There are no target variables provided in either | |
194 | context. It is possible for such probes to be run concurrently on | |
195 | a multi-processor computer. | |
422d1ceb | 196 | .PP |
197a4d62 | 197 | Alternatively, intervals may be specified in units of time. |
422d1ceb | 198 | There are two probe point variants similar to the jiffies timer: |
ca88561f | 199 | |
422d1ceb FCE |
200 | .SAMPLE |
201 | timer.ms(N) | |
202 | timer.ms(N).randomize(M) | |
203 | .ESAMPLE | |
ca88561f | 204 | |
197a4d62 JS |
205 | Here, N and M are specified in milliseconds, but the full options for units |
206 | are seconds (s/sec), milliseconds (ms/msec), microseconds (us/usec), | |
207 | nanoseconds (ns/nsec), and hertz (hz). Randomization is not supported for | |
208 | hertz timers. | |
209 | ||
210 | The actual resolution of the timers depends on the target kernel. For | |
211 | kernels prior to 2.6.17, timers are limited to jiffies resolution, so | |
212 | intervals are rounded up to the nearest jiffies interval. After 2.6.17, | |
213 | the implementation uses hrtimers for tighter precision, though the actual | |
214 | resolution will be arch-dependent. In either case, if the "randomize" | |
215 | component is given, then the random value will be added to the interval | |
216 | before any rounding occurs. | |
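.PP
For illustration, a sketch combining a randomized millisecond timer with
a fixed-interval one (the interval values are arbitrary):
.SAMPLE
global ticks
# Fires roughly every 100 ms, varied by up to 20 ms either way.
probe timer.ms(100).randomize(20) { ticks++ }
# Reports and ends the session after 5 seconds.
probe timer.s(5) {
  printf("%d ticks\\n", ticks)
  exit()
}
.ESAMPLE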
39e57ce0 FCE |
217 | .PP |
218 | Profiling timers are also available to provide probes that execute on all | |
3ca1f652 FCE |
219 | CPUs at the rate of the system tick (CONFIG_HZ). |
220 | This probe takes no parameters. | |
ca88561f | 221 | |
39e57ce0 FCE |
222 | .SAMPLE |
223 | timer.profile | |
224 | .ESAMPLE | |
ca88561f | 225 | |
39e57ce0 FCE |
226 | Full context information of the interrupted process is available, making |
227 | this probe suitable for a time-based sampling profiler. | |
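.PP
A sketch of a simple sampling profiler built on this probe (the
10-second run time and the top-10 limit are arbitrary choices):
.SAMPLE
global samples
# Count one sample per profiling tick, keyed by process name.
probe timer.profile { samples[execname()]++ }
probe timer.s(10) { exit() }
probe end {
  # Print the most frequently sampled names first.
  foreach (name in samples\- limit 10)
    printf("%6d %s\\n", samples[name], name)
}
.ESAMPLE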
ba4a90fd FCE |
228 | |
229 | .SS DWARF | |
230 | ||
231 | This family of probe points uses symbolic debugging information for | |
232 | the target kernel/module/program, as may be found in unstripped | |
233 | executables, or the separate | |
234 | .I debuginfo | |
235 | packages. They allow placement of probes logically into the execution | |
236 | path of the target program, by specifying a set of points in the | |
237 | source or object code. When a matching statement executes on any | |
238 | processor, the probe handler is run in that context. | |
239 | .PP | |
240 | Points in a kernel are identified by | |
ca88561f | 241 | module, source file, line number, function name, or some |
6f05b6ab | 242 | combination of these. |
ba4a90fd FCE |
243 | .PP |
244 | Here is a list of probe point families currently supported. The | |
245 | .B .function | |
246 | variant places a probe near the beginning of the named function, so that | |
247 | parameters are available as context variables. The | |
248 | .B .return | |
39e3139a FCE |
249 | variant places a probe at the moment |
250 | .B after | |
251 | the return from the named function, so the return value is available | |
252 | as the "$return" context variable. The | |
54efe513 | 253 | .B .inline |
b8da0ad1 | 254 | modifier for |
54efe513 | 255 | .B .function |
b8da0ad1 FCE |
256 | filters the results to include only instances of inlined functions. |
257 | The | |
258 | .B .call | |
259 | modifier selects the opposite subset. Inline functions do not have an | |
260 | identifiable return point, so | |
54efe513 GH |
261 | .B .return |
262 | is not supported on | |
263 | .B .inline | |
264 | probes. The | |
ba4a90fd FCE |
265 | .B .statement |
266 | variant places a probe at the exact spot, exposing those local variables | |
267 | that are visible there. | |
ca88561f | 268 | |
ba4a90fd FCE |
269 | .SAMPLE |
270 | kernel.function(PATTERN) | |
271 | .br | |
b8da0ad1 FCE |
272 | kernel.function(PATTERN).call |
273 | .br | |
ba4a90fd FCE |
274 | kernel.function(PATTERN).return |
275 | .br | |
b8da0ad1 | 276 | kernel.function(PATTERN).inline |
54efe513 | 277 | .br |
592470cd SC |
278 | kernel.function(PATTERN).label(LPATTERN) |
279 | .br | |
ba4a90fd FCE |
280 | module(MPATTERN).function(PATTERN) |
281 | .br | |
b8da0ad1 FCE |
282 | module(MPATTERN).function(PATTERN).call |
283 | .br | |
ba4a90fd FCE |
284 | module(MPATTERN).function(PATTERN).return |
285 | .br | |
b8da0ad1 FCE |
286 | module(MPATTERN).function(PATTERN).inline |
287 | .br | |
54efe513 | 288 | .br |
ba4a90fd FCE |
289 | kernel.statement(PATTERN) |
290 | .br | |
37ebca01 FCE |
291 | kernel.statement(ADDRESS).absolute |
292 | .br | |
ba4a90fd | 293 | module(MPATTERN).statement(PATTERN) |
6f017dee FCE |
294 | .br |
295 | process("PATH").function("NAME") | |
296 | .br | |
297 | process("PATH").statement("*@FILE.c:123") | |
298 | .br | |
299 | process("PATH").function("*").return | |
300 | .br | |
301 | process("PATH").function("myfun").label("foo") | |
ba4a90fd | 302 | .ESAMPLE |
ca88561f | 303 | |
6f017dee FCE |
304 | (See the USER-SPACE section below for more information on the process |
305 | probes.) | |
306 | ||
ba4a90fd | 307 | In the above list, MPATTERN stands for a string literal that aims to |
592470cd SC |
308 | identify the loaded kernel module of interest and LPATTERN stands for |
309 | a source program label. Both MPATTERN and LPATTERN may include the "*", | |
310 | "[]", and "?" wildcards. | |
311 | PATTERN stands for a string literal that | |
6f05b6ab | 312 | aims to identify a point in the program. It is made up of three |
ca88561f MM |
313 | parts: |
314 | .IP \(bu 4 | |
315 | The first part is the name of a function, as would appear in the | |
ba4a90fd FCE |
316 | .I nm |
317 | program's output. This part may use the "*" and "?" wildcarding | |
ca88561f MM |
318 | operators to match multiple names. |
319 | .IP \(bu 4 | |
320 | The second part is optional and begins with the "@" character. | |
321 | It is followed by the path to the source file containing the function, | |
322 | which may include a wildcard pattern, such as mm/slab*. | |
79640c29 | 323 | If it does not match as is, an implicit "*/" is optionally added |
ea384b8c | 324 | .I before |
79640c29 FCE |
325 | the pattern, so that a script need only name the last few components |
326 | of a possibly long source directory path. | |
ca88561f | 327 | .IP \(bu 4 |
ba4a90fd | 328 | Finally, the third part is optional if the file name part was given, |
1bd128a3 SC |
329 | and identifies the line number in the source file preceded by a ":" |
330 | or a "+". The line number is assumed to be an | |
331 | absolute line number if preceded by a ":", or relative to the entry of | |
99a5f9cf SC |
332 | the function if preceded by a "+". |
333 | All the lines in the function can be matched with ":*". | |
334 | A range of lines x through y can be matched with ":x-y". | |
ca88561f | 335 | .PP |
ba4a90fd | 336 | As an alternative, PATTERN may be a numeric constant, indicating an |
ea384b8c FCE |
337 | address. Such an address may be found from symbol tables of the |
338 | appropriate kernel / module object file. It is verified against | |
339 | known statement code boundaries, and will be relocated for use at | |
340 | run time. | |
341 | .PP | |
342 | In guru mode only, absolute kernel-space addresses may be specified with | |
343 | the ".absolute" suffix. Such an address is considered already relocated, | |
344 | as if it came from | |
345 | .BR /proc/kallsyms , | |
346 | so it cannot be checked against statement/instruction boundaries. | |
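.PP
Some sketches of PATTERN forms (the function, file, and line below are
assumptions for illustration; they may not match a given kernel source
tree):
.SAMPLE
# All non-inlined functions whose names begin with "vfs_".
probe kernel.function("vfs_*").call { println(probefunc()) }
# Any non-inlined function defined in net/socket.c.
probe kernel.function("*@net/socket.c").call { println(probefunc()) }
# Any function in kernel/time.c that spans line 240.
probe kernel.function("*@kernel/time.c:240") { println(pp()) }
.ESAMPLE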
6f017dee FCE |
347 | |
348 | .SS CONTEXT VARIABLES | |
349 | ||
ba4a90fd | 350 | .PP |
6f017dee | 351 | Many of the source-level context variables, such as function parameters, |
ba4a90fd FCE |
352 | locals, and globals visible in the compilation unit, may be visible to |
353 | probe handlers. They may refer to these variables by prefixing their | |
354 | name with "$" within the scripts. In addition, a special syntax | |
6f017dee FCE |
355 | allows limited traversal of structures, pointers, and arrays. More |
356 | syntax allows pretty-printing of individual variables or their groups. | |
357 | See also | |
358 | .BR @cast . | |
359 | ||
ba4a90fd FCE |
360 | .TP |
361 | $var | |
362 | refers to an in-scope variable "var". If it's an integer-like type, | |
7b9361d5 FCE |
363 | it will be cast to a 64-bit int for systemtap script use. String-like |
364 | pointers (char *) may be copied to systemtap string values using the | |
365 | .IR kernel_string " or " user_string | |
366 | functions. | |
ba4a90fd | 367 | .TP |
ab5e90c2 FCE |
368 | $var\->field traversal via a structure's or a pointer's field. This |
369 | generalized indirection operator may be repeated to follow more | |
370 | levels. Note that the | |
371 | .IR . | |
372 | operator is not used for plain structure | |
373 | members, only | |
374 | .IR \-> | |
375 | for both purposes. (This is because "." is reserved for string | |
376 | concatenation.) | |
ba4a90fd | 377 | .TP |
a43ba433 FCE |
378 | $return |
379 | is available in return probes only for functions that are declared | |
380 | with a return value. | |
381 | .TP | |
ba4a90fd | 382 | $var[N] |
33b081c5 JS |
383 | indexes into an array. The index may be given as a literal number or even |
384 | an arbitrary numeric expression. | |
6f017dee FCE |
385 | .PP |
386 | A number of operators exist for such basic context variable expressions: | |
34af38db | 387 | .TP |
2cb3fe26 SC |
388 | $$vars |
389 | expands to a character string that is equivalent to | |
6f017dee FCE |
390 | .SAMPLE |
391 | sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x", | |
392 | parm1, ..., parmN, var1, ..., varN) | |
393 | .ESAMPLE | |
394 | for each variable in scope at the probe point. Some values may be | |
395 | printed as | |
396 | .IR =? | |
397 | if their run-time location cannot be found. | |
2cb3fe26 SC |
398 | .TP |
399 | $$locals | |
a43ba433 | 400 | expands to a subset of $$vars for only local variables. |
2cb3fe26 SC |
401 | .TP |
402 | $$parms | |
a43ba433 FCE |
403 | expands to a subset of $$vars for only function parameters. |
404 | .TP | |
405 | $$return | |
406 | is available in return probes only. It expands to a string that | |
fd574705 | 407 | is equivalent to sprintf("return=%x", $return) |
a43ba433 | 408 | if the probed function has a return value, or else an empty string. |
6f017dee FCE |
409 | .TP |
410 | & $EXPR | |
411 | expands to the address of the given context variable expression, if it | |
412 | is addressable. | |
413 | .TP | |
414 | @defined($EXPR) | |
415 | expands to 1 if the given context variable expression is resolvable, or 0 otherwise, | |
416 | for use in conditionals such as | |
417 | .SAMPLE | |
418 | @defined($foo->bar) ? $foo->bar : 0 | |
419 | .ESAMPLE | |
420 | .TP | |
421 | $EXPR$ | |
422 | expands to a string with all of $EXPR's members, equivalent to | |
423 | .SAMPLE | |
424 | sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}", | |
425 | $EXPR\->a, $EXPR\->b) | |
426 | .ESAMPLE | |
427 | .TP | |
428 | $EXPR$$ | |
429 | expands to a string with all of $EXPR's members and submembers, equivalent to | |
430 | .SAMPLE | |
431 | sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}", | |
432 | $EXPR\->a, $EXPR\->b, $EXPR\->c\->x, $EXPR\->c\->y, $EXPR\->d[0]) | |
433 | .ESAMPLE | |
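.PP
As a sketch of these operators in use (assuming the kernel function
"vfs_read" and its "count" parameter exist in the target kernel):
.SAMPLE
probe kernel.function("vfs_read") {
  # $$parms pretty-prints all parameters; $count is read directly.
  printf("%s count=%d\\n", $$parms, $count)
}
probe kernel.function("vfs_read").return {
  printf("%s\\n", $$return)
}
.ESAMPLE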
434 | ||
39e3139a FCE |
435 | .PP |
436 | For ".return" probes, context variables other than the "$return" | |
437 | value itself are only available for the function call parameters. | |
438 | The expressions evaluate to the | |
439 | .IR entry-time | |
440 | values of those variables, since that is when a snapshot is taken. | |
441 | Other local variables are not generally accessible, since by the time | |
442 | a ".return" probe hits, the probed function will have already returned. | |
8cc799a5 JS |
443 | .PP |
444 | Arbitrary entry-time expressions can also be saved for ".return" | |
445 | probes using the | |
446 | .IR @entry(expr) | |
447 | operator. For example, one can compute the elapsed time of a function: | |
448 | .SAMPLE | |
449 | probe kernel.function("do_filp_open").return { | |
450 | println( get_timeofday_us() \- @entry(get_timeofday_us()) ) | |
451 | } | |
452 | .ESAMPLE | |
39e3139a | 453 | |
ba4a90fd | 454 | |
94c3c803 AM |
455 | .SS DWARFLESS |
456 | In the absence of debugging information, the entry and exit points of kernel and module | |
457 | functions can be probed using the "kprobe" family of probes. | |
458 | However, these do not permit looking up the arguments / local variables | |
459 | of the function. | |
460 | The following constructs are supported: | |
461 | .SAMPLE | |
462 | kprobe.function(FUNCTION) | |
463 | kprobe.function(FUNCTION).return | |
464 | kprobe.module(NAME).function(FUNCTION) | |
465 | kprobe.module(NAME).function(FUNCTION).return | |
466 | kprobe.statement(ADDRESS).absolute | |
467 | .ESAMPLE | |
468 | .PP | |
469 | Probes of type | |
470 | .B function | |
471 | are recommended for kernel functions, whereas probes of type | |
472 | .B module | |
473 | are recommended for probing functions of the specified module. | |
474 | In case the absolute address of a kernel or module function is known, | |
475 | .B statement | |
476 | probes can be utilized. | |
477 | .PP | |
478 | Note that | |
479 | .I FUNCTION | |
480 | and | |
481 | .I MODULE | |
482 | names | |
483 | .B must not | |
484 | contain wildcards, or the probe will not be registered. | |
485 | Also, statement probes may be used only in guru mode. | |
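.PP
A minimal sketch (the function name "vfs_read" is an assumption and must
name a real kernel function, without wildcards):
.SAMPLE
probe kprobe.function("vfs_read") {
  printf("entry in %s\\n", execname())
}
probe kprobe.function("vfs_read").return {
  printf("exit in %s\\n", execname())
}
.ESAMPLE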
486 | ||
487 | ||
1ada6f08 | 488 | .SS USER-SPACE |
0a1c696d FCE |
489 | Support for user-space probing is available for kernels |
490 | that are configured with the utrace extensions. See | |
491 | .SAMPLE | |
492 | http://people.redhat.com/roland/utrace/ | |
493 | .ESAMPLE | |
494 | .PP | |
495 | There are several forms. First, a non-symbolic probe point: | |
1ada6f08 FCE |
496 | .SAMPLE |
497 | process(PID).statement(ADDRESS).absolute | |
498 | .ESAMPLE | |
499 | is analogous to | |
500 | .I | |
501 | kernel.statement(ADDRESS).absolute | |
502 | in that both use raw (unverified) virtual addresses and provide | |
503 | no $variables. The target PID parameter must identify a running | |
504 | process, and ADDRESS should identify a valid instruction address. | |
505 | All threads of that process will be probed. | |
29cb9b42 | 506 | .PP |
0a1c696d FCE |
507 | Second, non-symbolic user-kernel interface events handled by |
508 | utrace may be probed: | |
29cb9b42 | 509 | .SAMPLE |
dd078c96 | 510 | process(PID).begin |
82f0e81b | 511 | process("FULLPATH").begin |
986e98de | 512 | process.begin |
dd078c96 | 513 | process(PID).thread.begin |
82f0e81b | 514 | process("FULLPATH").thread.begin |
986e98de | 515 | process.thread.begin |
dd078c96 | 516 | process(PID).end |
82f0e81b | 517 | process("FULLPATH").end |
986e98de | 518 | process.end |
dd078c96 | 519 | process(PID).thread.end |
82f0e81b | 520 | process("FULLPATH").thread.end |
986e98de | 521 | process.thread.end |
29cb9b42 | 522 | process(PID).syscall |
82f0e81b | 523 | process("FULLPATH").syscall |
986e98de | 524 | process.syscall |
29cb9b42 | 525 | process(PID).syscall.return |
82f0e81b | 526 | process("FULLPATH").syscall.return |
986e98de | 527 | process.syscall.return |
0afb7073 | 528 | process(PID).insn |
82f0e81b | 529 | process("FULLPATH").insn |
0afb7073 | 530 | process(PID).insn.block |
82f0e81b | 531 | process("FULLPATH").insn.block |
29cb9b42 DS |
532 | .ESAMPLE |
533 | .PP | |
534 | A | |
dd078c96 | 535 | .B .begin |
82f0e81b | 536 | probe gets called when a new process described by PID or FULLPATH gets created. |
29cb9b42 | 537 | A |
dd078c96 | 538 | .B .thread.begin |
82f0e81b | 539 | probe gets called when a new thread described by PID or FULLPATH gets created. |
159cb109 | 540 | A |
dd078c96 | 541 | .B .end |
82f0e81b | 542 | probe gets called when a process described by PID or FULLPATH dies. |
dd078c96 DS |
543 | A |
544 | .B .thread.end | |
82f0e81b | 545 | probe gets called when a thread described by PID or FULLPATH dies. |
29cb9b42 DS |
546 | A |
547 | .B .syscall | |
82f0e81b | 548 | probe gets called when a thread described by PID or FULLPATH makes a |
6270adc1 MH |
549 | system call. The system call number is available in the |
550 | .BR $syscall | |
551 | context variable, and the first 6 arguments of the system call | |
552 | are available in the | |
553 | .BR $argN | |
554 | (e.g. $arg1, $arg2, ...) context variables. | |
29cb9b42 DS |
555 | A |
556 | .B .syscall.return | |
82f0e81b | 557 | probe gets called when a thread described by PID or FULLPATH returns from a |
5d67b47c MH |
558 | system call. The system call number is available in the |
559 | .BR $syscall | |
560 | context variable, and the return value of the system call is available | |
561 | in the | |
562 | .BR $return | |
29cb9b42 | 563 | context variable. |
a96d1db0 | 564 | A |
0afb7073 | 565 | .B .insn |
82f0e81b | 566 | probe gets called for every single-stepped instruction of the process described by PID or FULLPATH. |
0afb7073 FCE |
567 | A |
568 | .B .insn.block | |
82f0e81b FCE |
569 | probe gets called for every block-stepped instruction of the process described by PID or FULLPATH. |
570 | .PP | |
571 | If a process probe is specified without a PID or FULLPATH, all user | |
572 | threads will be probed. However, if systemtap was invoked with the | |
573 | .IR \-c " or " \-x | |
574 | options, then process probes are restricted to the process | |
575 | hierarchy associated with the target process. | |
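.PP
For example, a sketch that traces one program's system calls (the path
"/bin/ls" is an assumption; any executable path may be substituted):
.SAMPLE
probe process("/bin/ls").begin {
  printf("pid %d started\\n", pid())
}
probe process("/bin/ls").syscall {
  printf("syscall %d, first argument 0x%x\\n", $syscall, $arg1)
}
probe process("/bin/ls").syscall.return {
  printf("syscall %d returned %d\\n", $syscall, $return)
}
.ESAMPLE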
0a1c696d FCE |
576 | |
577 | .PP | |
578 | Third, symbolic static instrumentation compiled into programs and | |
579 | shared libraries may be | |
580 | probed: | |
581 | .SAMPLE | |
582 | process("PATH").mark("LABEL") | |
a794dbeb | 583 | process("PATH").provider("PROVIDER").mark("LABEL") |
0a1c696d FCE |
584 | .ESAMPLE |
585 | .PP | |
f28a8c28 SC |
586 | A |
587 | .B .mark | |
588 | probe gets called via a static probe which is defined in the | |
a794dbeb FCE |
589 | application by STAP_PROBE1(PROVIDER,LABEL,arg1), which is defined in |
590 | sdt.h. PROVIDER is an application handle, LABEL corresponds to | |
591 | the .mark argument, and arg1 is the argument. STAP_PROBE1 is used for | |
592 | probes with 1 argument, STAP_PROBE2 is used for probes with 2 | |
593 | arguments, and so on. The arguments of the probe are available in the | |
594 | context variables $arg1, $arg2, ... An alternative to using the | |
595 | STAP_PROBE macros is to use the dtrace script to create custom macros. | |
596 | Additionally, the variables $$name and $$provider are available as | |
597 | parts of the probe point name. | |
0a1c696d | 598 | |
29cb9b42 | 599 | .PP |
0a1c696d FCE |
600 | Finally, full symbolic source-level probes in user-space programs |
601 | and shared libraries are supported. These are exactly analogous | |
602 | to the symbolic DWARF-based kernel/module probes described above, | |
603 | and expose similar contextual $-variables. | |
604 | .SAMPLE | |
605 | process("PATH").function("NAME") | |
606 | process("PATH").statement("*@FILE.c:123") | |
607 | process("PATH").function("*").return | |
608 | process("PATH").function("myfun").label("foo") | |
609 | .ESAMPLE | |
610 | ||
611 | .PP | |
612 | Note that for all process probes, | |
29cb9b42 | 613 | .I PATH |
ea384b8c FCE |
614 | names refer to executables that are searched the same way shells do: relative |
615 | to the working directory if they contain a "/" character, otherwise in | |
616 | .BR $PATH . | |
153e7a22 FCE |
617 | PATH may also refer to shared libraries, in which case all processes that |
618 | map it at runtime would be selected for probing. | |
82f0e81b FCE |
619 | If the PATH string contains wildcards as in the MPATTERN case, then |
620 | standard globbing is performed to find all matching paths. In this | |
621 | case, the | |
622 | .BR $PATH | |
623 | environment variable is not used. | |
624 | ||
625 | .PP | |
153e7a22 FCE |
626 | If systemtap was invoked with the |
627 | .IR \-c " or " \-x | |
760695db FCE |
628 | options, then process probes are restricted to the process |
629 | hierarchy associated with the target process. | |
1ada6f08 | 630 | |
9cb48751 DS |
631 | .SS PROCFS |
632 | ||
633 | These probe points allow procfs "files" in | |
c243f608 LB |
634 | /proc/systemtap/MODNAME to be created, read and written using a |
635 | permission that may be modified using the proper umask value. Default permissions are 0400 for read | |
636 | probes, and 0200 for write probes. If both a read and write probe are being | |
637 | used on the same file, a default permission of 0600 will be used. | |
638 | Using procfs.umask(0040).read would | |
639 | result in a 0404 permission set for the file. | |
9cb48751 DS |
640 | .RI ( MODNAME |
641 | is the name of the systemtap module). The | |
642 | .I proc | |
643 | filesystem is a pseudo-filesystem which is used as an interface to | |
c243f608 | 644 | kernel data structures. There are several probe point variants supported |
9cb48751 | 645 | by the translator: |
ca88561f | 646 | |
9cb48751 DS |
647 | .SAMPLE |
648 | procfs("PATH").read | |
c243f608 | 649 | procfs("PATH").umask(UMASK).read |
38975255 | 650 | procfs("PATH").read.maxsize(MAXSIZE) |
c243f608 | 651 | procfs("PATH").umask(UMASK).read.maxsize(MAXSIZE) |
9cb48751 | 652 | procfs("PATH").write |
c243f608 | 653 | procfs("PATH").umask(UMASK).write |
9cb48751 | 654 | procfs.read |
c243f608 | 655 | procfs.umask(UMASK).read |
38975255 | 656 | procfs.read.maxsize(MAXSIZE) |
c243f608 | 657 | procfs.umask(UMASK).read.maxsize(MAXSIZE) |
9cb48751 | 658 | procfs.write |
c243f608 | 659 | procfs.umask(UMASK).write |
9cb48751 | 660 | .ESAMPLE |
ca88561f | 661 | |
9cb48751 DS |
662 | .I PATH |
663 | is the file name (relative to /proc/systemtap/MODNAME) to be created. | |
664 | If no | |
665 | .I PATH | |
666 | is specified (as in the last six variants above), | |
667 | .I PATH | |
668 | defaults to "command". | |
669 | .PP | |
670 | When a user reads /proc/systemtap/MODNAME/PATH, the corresponding | |
671 | procfs | |
672 | .I read | |
673 | probe is triggered. The string data to be read should be assigned to | |
674 | a variable named | |
675 | .IR $value , | |
676 | like this: | |
ca88561f | 677 | |
9cb48751 DS |
678 | .SAMPLE |
679 | procfs("PATH").read { $value = "100\\n" } | |
680 | .ESAMPLE | |
681 | .PP | |
682 | When a user writes into /proc/systemtap/MODNAME/PATH, the | |
683 | corresponding procfs | |
684 | .I write | |
685 | probe is triggered. The data the user wrote is available in the | |
686 | string variable named | |
687 | .IR $value , | |
688 | like this: | |
ca88561f | 689 | |
9cb48751 DS |
690 | .SAMPLE |
691 | procfs("PATH").write { printf("user wrote: %s", $value) } | |
692 | .ESAMPLE | |
38975255 DS |
693 | .PP |
694 | .I MAXSIZE | |
695 | is the size of the procfs read buffer. Specifying | |
696 | .I MAXSIZE | |
697 | allows larger procfs output. If no | |
698 | .I MAXSIZE | |
699 | is specified, the procfs read buffer defaults to | |
700 | .I STP_PROCFS_BUFSIZE | |
701 | (which defaults to | |
702 | .IR MAXSTRINGLEN , | |
703 | the maximum length of a string). | |
704 | If setting the procfs read buffers for more than one file is needed, | |
705 | it may be easiest to override the | |
706 | .I STP_PROCFS_BUFSIZE | |
707 | definition. | |
708 | Here's an example of using | |
709 | .IR MAXSIZE : | |
710 | ||
711 | .SAMPLE | |
712 | procfs.read.maxsize(1024) { | |
713 | $value = "long string..." | |
714 | $value .= "another long string..." | |
715 | $value .= "another long string..." | |
716 | $value .= "another long string..." | |
717 | } | |
718 | .ESAMPLE | |
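.PP
As a minimal sketch of how the read and write probes can work together,
the following script exposes a tunable through a procfs file. The file
name "threshold" and the global variable are hypothetical names chosen
for illustration, and the script assumes the tapset function strtol()
is available:
.SAMPLE
# value exposed through /proc/systemtap/MODNAME/threshold
global threshold = 100

# reading the file reports the current value
probe procfs("threshold").read { $value = sprintf("%d\\n", threshold) }

# writing the file parses the user's text as a decimal number
probe procfs("threshold").write { threshold = strtol($value, 10) }
.ESAMPLE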
9cb48751 | 719 | |
6f05b6ab FCE |
720 | .SS MARKERS |
721 | ||
722 | This family of probe points hooks up to static probing markers | |
723 | inserted into the kernel or modules. These markers are special macro | |
724 | calls inserted by kernel developers to make probing faster and more | |
725 | reliable than with DWARF-based probes. Further, DWARF debugging | |
726 | information is | |
727 | .I not | |
728 | required to probe markers. | |
729 | ||
730 | Marker probe points begin with | |
f781f849 DS |
731 | .BR kernel . |
732 | The next part names the marker itself: | |
6f05b6ab FCE |
733 | .BR mark("name") . |
734 | The marker name string, which may contain the usual wildcard characters, | |
735 | is matched against the names given to the marker macros when the kernel | |
eb973c2a DS |
736 | and/or module was compiled. Optionally, you can specify |
737 | .BR format("format") . | |
37f6433e | 738 | Specifying the marker format string allows differentiation between two |
eb973c2a | 739 | markers with the same name but different marker format strings. |
6f05b6ab FCE |
740 | |
741 | The handler associated with a marker-based probe may read the | |
742 | optional parameters specified at the macro call site. These are | |
743 | named | |
744 | .BR $arg1 " through " $argNN , | |
745 | where NN is the number of parameters supplied by the macro. Number | |
746 | and string parameters are passed in a type-safe manner. | |
747 | ||
eb973c2a DS |
748 | The marker format string associated with a marker is available in |
749 | .BR $format . | |
37f6433e | 750 | The marker name string is likewise available in |
bc54e71c | 751 | .BR $name . |
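.PP
For illustration, a handler along the following lines would report
every marker that fires, together with its format string. This is
only a sketch: it assumes the running kernel was built with markers,
and the wildcard may match a large number of them:
.SAMPLE
probe kernel.mark("*") {
  # $name and $format are provided by the marker probe context
  printf("%s (%s)\\n", $name, $format)
}
.ESAMPLE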
eb973c2a | 752 | |
bc724b8b JS |
753 | .SS TRACEPOINTS |
754 | ||
755 | This family of probe points hooks up to static probing tracepoints | |
756 | inserted into the kernel or modules. As with markers, these | |
757 | tracepoints are special macro calls inserted by kernel developers to | |
758 | make probing faster and more reliable than with DWARF-based probes, | |
759 | and DWARF debugging information is not required to probe tracepoints. | |
760 | Tracepoints have an extra advantage of more strongly-typed parameters | |
761 | than markers. | |
762 | ||
763 | Tracepoint probes begin with | |
764 | .BR kernel . | |
765 | The next part names the tracepoint itself: | |
766 | .BR trace("name") . | |
767 | The tracepoint name string, which may contain the usual wildcard | |
768 | characters, is matched against the names defined by the kernel | |
769 | developers in the tracepoint header files. | |
770 | ||
771 | The handler associated with a tracepoint-based probe may read the | |
772 | optional parameters specified at the macro call site. These are | |
773 | named according to the declaration by the tracepoint author. For | |
774 | example, the tracepoint probe | |
775 | .BR kernel.trace("sched_switch") | |
776 | provides the parameters | |
777 | .BR $rq ", " $prev ", and " $next . | |
778 | If the parameter is a complex type, as in a struct pointer, then a | |
779 | script can access fields with the same syntax as DWARF $target | |
780 | variables. Tracepoint parameters themselves cannot be modified, but in | |
781 | guru mode a script may modify fields of those parameters. | |
782 | ||
783 | The name of the tracepoint is available in | |
784 | .BR $$name , | |
785 | and a string of name=value pairs for all parameters of the tracepoint | |
786 | is available in | |
046e7190 | 787 | .BR $$vars " or " $$parms . |
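.PP
A hedged example of reading tracepoint parameters follows. It assumes
that the running kernel's sched_switch tracepoint declares the
.IR $prev " and " $next
parameters mentioned above as task_struct pointers with a pid field;
the declaration may differ between kernel versions:
.SAMPLE
probe kernel.trace("sched_switch") {
  # field access uses the same syntax as DWARF $target variables
  printf("%s: %d -> %d\\n", $$name, $prev->pid, $next->pid)
  # $$parms expands to name=value pairs for all parameters
  printf("  %s\\n", $$parms)
}
.ESAMPLE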
bc724b8b | 788 | |
dd225250 PS |
789 | .SS HARDWARE BREAKPOINTS |
790 | This family of probes is used to set hardware watchpoints for a given | |
791 | (global) kernel symbol. The probes take three components as inputs: | |
792 | ||
793 | 1. The | |
794 | .BR virtual address / name | |
795 | of the kernel symbol to be traced is supplied as argument to this class | |
796 | of probes. (Only data segment variables can be probed; local | |
797 | variables of a function cannot be probed.) | |
798 | ||
799 | 2. Nature of access to be probed: | |
800 | a. | |
801 | .I .write | |
802 | probe gets triggered when a write happens at the specified address/symbol | |
803 | name. | |
804 | b. | |
805 | .I .rw | |
806 | probe is triggered when either a read or write happens. | |
807 | ||
808 | 3. | |
809 | .BR .length | |
810 | (optional) | |
811 | Users have the option of specifying the address interval to be probed | |
812 | using "length" constructs. The user-specified length gets approximated | |
813 | to the closest possible address length that the architecture can | |
814 | support. If the specified length exceeds the limits imposed by | |
815 | architecture, an error message is flagged and probe registration fails. | |
816 | Wherever 'length' is not specified, the translator requests a hardware | |
817 | breakpoint probe of length 1. It should be noted that the "length" | |
818 | construct is not valid with symbol names. | |
819 | ||
820 | The following constructs are supported: | |
821 | .SAMPLE | |
822 | probe kernel.data(ADDRESS).write | |
823 | probe kernel.data(ADDRESS).rw | |
824 | probe kernel.data(ADDRESS).length(LEN).write | |
825 | probe kernel.data(ADDRESS).length(LEN).rw | |
826 | probe kernel.data("SYMBOL_NAME").write | |
827 | probe kernel.data("SYMBOL_NAME").rw | |
828 | .ESAMPLE | |
829 | ||
830 | This set of probes makes use of the debug registers of the processor, | |
831 | which are a scarce resource (4 on x86, 1 on powerpc). The script | |
832 | translator issues a warning if a user requests more hardware breakpoint probes | |
833 | than the architecture supports. For example, a pass-2 warning is issued | |
834 | when an input script requests 5 hardware breakpoint probes on an x86 | |
835 | system, since the x86 architecture supports a maximum of 4 breakpoints. | |
836 | Users are cautioned to set probes judiciously. | |
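.PP
As a sketch, a write watchpoint on a global kernel variable might look
like the following. The symbol pid_max is only an example, and
execname() and pid() are standard tapset functions used here to
identify the writer:
.SAMPLE
probe kernel.data("pid_max").write {
  # fires when the watched variable is written
  printf("pid_max written by %s (pid %d)\\n", execname(), pid())
}
.ESAMPLE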
837 | ||
ba4a90fd FCE |
838 | .SH EXAMPLES |
839 | .PP | |
840 | Here are some example probe points, defining the associated events. | |
841 | .TP | |
842 | begin, end, end | |
843 | refers to the startup and normal shutdown of the session. In this | |
844 | case, the handler would run once during startup and twice during | |
845 | shutdown. | |
846 | .TP | |
847 | timer.jiffies(1000).randomize(200) | |
13d2ecdb | 848 | refers to a periodic interrupt, every 1000 +/\- 200 jiffies. |
ba4a90fd FCE |
849 | .TP |
850 | kernel.function("*init*"), kernel.function("*exit*") | |
851 | refers to all kernel functions with "init" or "exit" in the name. | |
852 | .TP | |
853 | kernel.function("*@kernel/sched.c:240") | |
854 | refers to any functions within the "kernel/sched.c" file that span | |
855 | line 240. | |
856 | .TP | |
6f05b6ab FCE |
857 | kernel.mark("getuid") |
858 | refers to an STAP_MARK(getuid, ...) macro call in the kernel. | |
859 | .TP | |
ba4a90fd FCE |
860 | module("usb*").function("*sync*").return |
861 | refers to the moment of return from all functions with "sync" in the | |
862 | name in any of the USB drivers. | |
863 | .TP | |
864 | kernel.statement(0xc0044852) | |
865 | refers to the first byte of the statement whose compiled instructions | |
866 | include the given address in the kernel. | |
b4ceace2 | 867 | .TP |
a5ae3f3d | 868 | kernel.statement("*@kernel/sched.c:2917") |
1bd128a3 SC |
869 | refers to the statement of line 2917 within "kernel/sched.c". |
870 | .TP | |
871 | kernel.statement("bio_init@fs/bio.c+3") | |
872 | refers to the statement at line bio_init+3 within "fs/bio.c". | |
a5ae3f3d | 873 | .TP |
dd225250 PS |
874 | kernel.data("pid_max").write |
875 | refers to a hardware breakpoint of type "write" set on pid_max. | |
876 | .TP | |
729286d8 | 877 | syscall.*.return |
b4ceace2 | 878 | refers to the group of probe aliases with any name in the second position. |
ba4a90fd | 879 | |
f33e9151 FCE |
880 | .SS PERF |
881 | ||
882 | This | |
883 | .IR prototype | |
884 | family of probe points interfaces to the kernel "perf event" | |
885 | infrastructure for controlling hardware performance counters. | |
886 | The events being attached to are described by the "type" and | |
887 | "config" fields of the | |
888 | .IR perf_event_attr | |
889 | structure, and are sampled at an interval governed by the | |
890 | "sample_period" field. | |
891 | ||
892 | These fields are made available to systemtap scripts using | |
893 | the following syntax: | |
894 | .SAMPLE | |
bb9fd173 | 895 | probe perf.type(NN).config(MM).sample(XX) |
f33e9151 FCE |
896 | probe perf.type(NN).config(MM) |
897 | .ESAMPLE | |
898 | The range of valid type/config is described by the | |
899 | .IR perf_event_open (2) | |
900 | system call, and/or the | |
901 | .IR linux/perf_event.h | |
8fb91f5f FCE |
902 | file. Invalid combinations or exhausted hardware counter resources |
903 | result in errors during systemtap script startup. Systemtap does | |
f33e9151 FCE |
904 | not sanity-check the values: it merely passes them through to |
905 | the kernel for error- and safety-checking. | |
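.PP
The following sketch samples CPU cycles and counts samples per process
name. It assumes that type 0 and config 0 correspond to
PERF_TYPE_HARDWARE and PERF_COUNT_HW_CPU_CYCLES in
.IR linux/perf_event.h ,
and the sample period of 1000000 is an arbitrary choice:
.SAMPLE
global samples

probe perf.type(0).config(0).sample(1000000) {
  # attribute each sample to the currently running process name
  samples[execname()]++
}

probe end {
  # print process names ordered by descending sample count
  foreach (name in samples-)
    printf("%s %d\\n", name, samples[name])
}
.ESAMPLE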
906 | ||
ba4a90fd | 907 | .SH SEE ALSO |
78db65bd | 908 | .IR stap (1), |
89965a32 FCE |
909 | .IR probe::* (3stap), |
910 | .IR tapset::* (3stap) |