1 .\" -*- nroff -*-
2 .TH STAP 1 @DATE@ "Red Hat"
3 .SH NAME
4 stap \- systemtap script translator/driver
5
6 .\" macros
7 .de SAMPLE
8 .br
9 .RS
10 .nf
11 .nh
12 ..
13 .de ESAMPLE
14 .hy
15 .fi
16 .RE
17 ..
18
19 .SH SYNOPSIS
20
21 .br
22 .B stap
23 [
24 .I OPTIONS
25 ]
26 .I FILENAME
27 [
28 .I ARGUMENTS
29 ]
30 .br
31 .B stap
32 [
33 .I OPTIONS
34 ]
35 .B \-
36 [
37 .I ARGUMENTS
38 ]
39 .br
40 .B stap
41 [
42 .I OPTIONS
43 ]
44 .BI \-e " SCRIPT"
45 [
46 .I ARGUMENTS
47 ]
48 .br
49 .B stap
50 [
51 .I OPTIONS
52 ]
53 .BI \-l " PROBE"
54 [
55 .I ARGUMENTS
56 ]
57 .br
58 .B stap
59 [
60 .I OPTIONS
61 ]
62 .BI \-L " PROBE"
63 [
64 .I ARGUMENTS
65 ]
66
67 .SH DESCRIPTION
68
69 The
70 .IR stap
71 program is the front-end to the Systemtap tool. It accepts probing
72 instructions (written in a simple scripting language), translates
73 those instructions into C code, compiles this C code, and loads the
74 resulting kernel module into a running Linux kernel to perform the
75 requested system trace/probe functions. You can supply the script in
76 a named file, from standard input, or from the command line. The
77 program runs until it is interrupted by the user, until the script
78 voluntarily invokes the
79 .I exit()
80 function, or until a sufficient number of soft errors has occurred.
81 .PP
82 The language, which is described in a later section, is strictly typed,
83 declaration free, procedural, and inspired by
84 .IR awk .
85 It allows source code points or events in the kernel to be associated
86 with handlers, which are subroutines that are executed synchronously. It is
87 somewhat similar conceptually to "breakpoint command lists" in the
88 .IR gdb
89 debugger.
90 .PP
91 This manual corresponds to version @VERSION@.
92
93 .SH OPTIONS
94 The systemtap translator supports the following options. Any other option
95 prints a list of supported options.
96 .TP
97 .B \-h
98 Show help message.
99 .TP
100 .B \-V
101 Show version message.
102 .TP
103 .BI \-p " NUM"
104 Stop after pass NUM. The passes are numbered 1-5: parse, elaborate,
105 translate, compile, run. See the
106 .B PROCESSING
107 section for details.
108 .TP
109 .B \-v
110 Increase verbosity for all passes. Produce a larger volume of
111 informative output each time the option is repeated.
112 .TP
113 .B \-\-vp ABCDE
114 Increase verbosity on a per-pass basis. For example, "\-\-vp\ 002"
115 adds 2 units of verbosity to pass 3 only. The combination "\-v\ \-\-vp\ 00004"
116 adds 1 unit of verbosity for all passes, and 4 more for pass 5.
117 .TP
118 .B \-k
119 Keep the temporary directory after all processing. This may be useful
120 in order to examine the generated C code, or to reuse the compiled
121 kernel object.
122 .TP
123 .B \-g
124 Guru mode. Enable parsing of unsafe expert-level constructs like
125 embedded C.
126 .TP
127 .B \-P
128 Prologue-searching mode. Activate heuristics to work around incorrect
129 debugging information for $target variables.
130 .TP
131 .B \-u
132 Unoptimized mode. Disable unused code elision during elaboration.
133 .TP
134 .B \-w
135 Suppressed warnings mode. Disables all warning messages.
136 .TP
137 .BI \-b
138 Use bulk mode (percpu files) for kernel-to-user data transfer.
139 .TP
140 .B \-t
141 Collect timing information on the number of times each probe executes
142 and the average amount of time spent in each probe.
143 .TP
144 .BI \-s " NUM"
145 Use NUM megabyte buffers for kernel-to-user data transfer. On a
146 multiprocessor in bulk mode, this is a per-processor amount.
147 .TP
148 .BI \-I " DIR"
149 Add the given directory to the tapset search directory. See the
150 description of pass 2 for details.
151 .TP
152 .BI \-D " NAME=VALUE"
153 Add the given C preprocessor directive to the module Makefile. These can
154 be used to override limit parameters described below.
155 .TP
156 .BI \-B " NAME=VALUE"
157 Add the given make directive to the kernel module build's make invocation.
158 These can be used to add or override kconfig options.
159 .TP
160 .BI \-R " DIR"
161 Look for the systemtap runtime sources in the given directory.
162 .TP
163 .BI \-r " /DIR"
164 Build for kernel in given build tree. Can also be set with the
165 .I SYSTEMTAP_RELEASE
166 environment variable.
167 .TP
168 .BI \-r " RELEASE"
169 Build for kernel in build tree
170 .BR /lib/modules/RELEASE/build .
171 Can also be set with the
172 .I SYSTEMTAP_RELEASE
173 environment variable.
174 .TP
175 .BI \-m " MODULE"
176 Use the given name for the generated kernel object module, instead
177 of a unique randomized name. The generated kernel object module is
178 copied to the current directory.
179 .TP
180 .BI \-d " MODULE"
181 Add symbol/unwind information for the given module into the kernel object
182 module. This may enable symbolic tracebacks from those modules/programs,
183 even if they do not have an explicit probe placed into them.
184 .TP
185 .BI \-o " FILE"
186 Send standard output to named file. In bulk mode, percpu files will
187 start with FILE_ (FILE_cpu with -F) followed by the cpu number.
188 This supports strftime(3) formats for FILE.
189 .TP
190 .BI \-c " CMD"
191 Start the probes, run CMD, and exit when CMD finishes.
192 .TP
193 .BI \-x " PID"
194 Sets target() to PID. This allows scripts to be written that filter on
195 a specific process.
196 .TP
197 .BI \-l " PROBE"
198 Instead of running a probe script, just list all available probe
199 points matching the given pattern. The pattern may include wildcards
200 and aliases.
201 .TP
202 .BI \-L " PROBE"
203 Similar to "-l", but list probe points and script-level local variables.
204 .TP
205 .BI \-F
206 Without -o option, load module and start probes, then detach from the module
207 leaving the probes running.
208 With -o option, run staprun in background as a daemon and show its pid.
209 .TP
210 .BI \-S " size[,N]"
211 Sets the maximum size of each output file and the maximum number of output files.
212 When an output file would exceed
213 .BR size ,
214 systemtap switches to the next output file. When the number of
215 output files would exceed
216 .BR N ,
217 systemtap removes the oldest output file. The second argument may be omitted.
218 .TP
219 .B \-\-kelf
220 For names and addresses of functions to probe,
221 consult the symbol tables in the kernel and modules.
222 This can be useful if your kernel and/or modules were compiled
223 without debugging information, or the function you want to probe
224 is in an assembly-language file built without debugging information.
225 See the
226 .B "MAKING DO WITH SYMBOL TABLES"
227 section for more information.
228 .TP
229 .BI \-\-kmap [=FILE]
230 For names and addresses of kernel functions to probe,
231 consult the symbol table in the indicated text file.
232 The default is /boot/System.map-VERSION.
233 The contents of this file should be in the form of the default output from
234 .IR nm (1).
235 Only symbols of type T or t are used.
236 If you specify /proc/kallsyms or some other file in that format,
237 where lines for module symbols contain a fourth column,
238 reading of the symbol table stops with the first module symbol
239 (which should be right after the last kernel symbol).
240 As with
241 .BR \-\-kelf ,
242 the symbol table in each module's .ko file will also be consulted.
243 See the
244 .B "MAKING DO WITH SYMBOL TABLES"
245 section for more information.
246 .TP
247 .B \-\-ignore\-vmlinux
248 For testing, act as though neither the uncompressed kernel (vmlinux)
249 nor the kernel debugging information can be found.
250 .TP
251 .B \-\-ignore\-dwarf
252 For testing, act as though vmlinux and modules lack debugging information.
253 .TP
254 .B \-\-skip\-badvars
255 Ignore out of context variables and substitute with literal 0.
256
257 .SH ARGUMENTS
258
259 Any additional arguments on the command line are passed to the script
260 parser for substitution. See below.
261
262 .SH SCRIPT LANGUAGE
263
264 The systemtap script language resembles
265 .IR awk .
266 There are two main outermost constructs: probes and functions. Within
267 these, statements and expressions use C-like operator syntax and
268 precedence.
269
270 .SS GENERAL SYNTAX
271 Whitespace is ignored. Three forms of comments are supported:
272 .RS
273 .br
274 .BR # " ... shell style, to the end of line, except for $# and @#"
275 .br
276 .BR // " ... C++ style, to the end of line"
277 .br
278 .BR /* " ... C style ... " */
279 .RE
280 Literals are either strings enclosed in double-quotes (passing through
281 the usual C escape codes with backslashes), or integers (in decimal,
282 hexadecimal, or octal, using the same notation as in C). All strings
283 are limited in length to some reasonable value (a few hundred bytes).
284 Integers are 64-bit signed quantities, although the parser also accepts
285 (and wraps around) values above positive 2**63.
286 .PP
287 In addition, script arguments given at the end of the command line may
288 be inserted. Use
289 .B $1 ... $<NN>
290 for insertion unquoted,
291 .B @1 ... @<NN>
292 for insertion as a string literal. The number of arguments may be accessed
293 through
294 .B $#
295 (as an unquoted number) or through
296 .B @#
297 (as a quoted number). These may be used at any place a token may begin,
298 including within the preprocessing stage. Reference to an argument
299 number beyond what was actually given is an error.
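.PP
As a minimal sketch of the substitution forms above (the file name
.I args.stp
and the sample arguments are hypothetical), the following script reports
its own arguments:
.SAMPLE
// run as: stap args.stp hello 42
probe begin {
    printf("%d arguments given\\n", $#)        // two arguments here
    printf("first: %s\\n", @1)                 // inserted as the string "hello"
    printf("second plus one: %d\\n", $2 + 1)   // inserted unquoted, as 42
    exit()
}
.ESAMPLE
Referring to a third argument in this script would be rejected, since only
two are supplied.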
300
301 .SS PREPROCESSING
302 A simple conditional preprocessing stage is run as a part of parsing.
303 The general form is similar to the
304 .RB cond " ? " exp1 " : " exp2
305 ternary operator:
306 .SAMPLE
307 .BR %( " CONDITION " %? " TRUE-TOKENS " %)
308 .BR %( " CONDITION " %? " TRUE-TOKENS " %: " FALSE-TOKENS " %)
309 .ESAMPLE
310 The CONDITION is either an expression whose format is determined by its
311 first keyword, or a comparison between two string literals or two numeric
312 literals. Several such CONDITIONs may also be combined using || (alternative)
313 and && (conjunction).
314 Parentheses are not supported yet, so keep in mind that && binds more
315 tightly than ||.
316 .PP
317 If the first part is the identifier
318 .BR kernel_vr " or " kernel_v
319 to refer to the kernel version number, with ("2.6.13\-1.322FC3smp") or
320 without ("2.6.13") the release code suffix, then
321 the second part is one of the six standard numeric comparison operators
322 .BR < ", " <= ", " == ", " != ", " > ", and " >= ,
323 and the third part is a string literal that contains an RPM-style
324 version-release value. The condition is deemed satisfied if the
325 version of the target kernel (as optionally overridden by the
326 .BR \-r
327 option) compares to the given version string. The comparison is
328 performed by the glibc function
329 .BR strverscmp .
330 As a special case, if the operator is for simple equality
331 .RB ( == ),
332 or inequality
333 .RB ( != ),
334 and the third part contains any wildcard characters
335 .RB ( * " or " ? " or " [ "),"
336 then the expression is treated as a wildcard (mis)match as evaluated
337 by
338 .BR fnmatch .
339 .PP
340 If, on the other hand, the first part is the identifier
341 .BR arch
342 to refer to the processor architecture (as named by the kernel
343 build system ARCH/SUBARCH), then the second
344 part is one of the two string comparison operators
345 .BR == " or " != ,
346 and the third part is a string literal for matching it. This
347 comparison is a wildcard (mis)match.
348 .PP
349 Similarly, if the first part is an identifier like
350 .BR CONFIG_something
351 to refer to a kernel configuration option, then the second part is
352 .BR == " or " != ,
353 and the third part is a string literal for matching the value
354 (commonly "y" or "m"). Nonexistent or unset kernel configuration
355 options are represented by the empty string. This comparison is also
356 a wildcard (mis)match.
357 .PP
358 Otherwise, the CONDITION is expected to be a comparison between two string
359 literals or two numeric literals. In this case, the arguments are the only
360 variables usable.
361 .PP
362 The TRUE-TOKENS and FALSE-TOKENS are zero or more general parser
363 tokens (possibly including nested preprocessor conditionals), and are
364 pasted into the input stream if the condition is true or false. For
365 example, the following code induces a parse error unless the target
366 kernel version is newer than 2.6.5:
367 .SAMPLE
368 %( kernel_v <= "2.6.5" %? **ERROR** %) # invalid token sequence
369 .ESAMPLE
370 The following code might adapt to hypothetical kernel version drift:
371 .SAMPLE
372 probe kernel.function (
373 %( kernel_v <= "2.6.12" %? "__mm_do_fault" %:
374 %( kernel_vr == "2.6.13*smp" %? "do_page_fault" %:
375 UNSUPPORTED %) %)
376 ) { /* ... */ }
377
378 %( arch == "ia64" %?
379 probe syscall.vliw = kernel.function("vliw_widget") {}
380 %)
381 .ESAMPLE
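.PP
As a further sketch, conditions may test kernel configuration options and may
be combined with && and ||. The option CONFIG_SMP and the architecture
names below are only illustrative assumptions:
.SAMPLE
%( CONFIG_SMP == "y" %?
    probe begin { println("SMP kernel") }
%:
    probe begin { println("uniprocessor kernel") }
%)

// Without parentheses, the condition below reads as
//   (arch == "x86_64" && kernel_v >= "2.6.18") || arch == "ia64"
%( arch == "x86_64" && kernel_v >= "2.6.18" || arch == "ia64" %?
    probe begin { println("supported combination") }
%)
.ESAMPLE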
382
383 .SS VARIABLES
384 Identifiers for variables and functions are an alphanumeric sequence,
385 and may include "_" and "$" characters. They may not start with a
386 plain digit, as in C. Each variable is by default local to the probe
387 or function statement block within which it is mentioned, and therefore
388 its scope and lifetime are limited to a particular probe or function
389 invocation.
390 .\" XXX add statistics type here once it's supported
391 .PP
392 Scalar variables are implicitly typed as either string or integer.
393 Associative arrays also have a string or integer value, and
394 a tuple of strings and/or integers serving as a key. Here are a
395 few basic expressions.
396 .SAMPLE
397 var1 = 5
398 var2 = "bar"
399 array1 [pid()] = "name" # single numeric key
400 array2 ["foo",4,i++] += 5 # vector of string/num/num keys
401 if (["hello",5,4] in array2) println ("yes") # membership test
402 .ESAMPLE
403 .PP
404 The translator performs
405 .I type inference
406 on all identifiers, including array indexes and function parameters.
407 Inconsistent type-related use of identifiers signals an error.
408 .PP
409 Variables may be declared global, so that they are shared amongst all
410 probes and live as long as the entire systemtap session. There is one
411 namespace for all global variables, regardless of which script file
412 they are found within. A global declaration may be written at the
413 outermost level anywhere, not within a block of code. Global
414 variables which are written but never read will be displayed
415 automatically at session shutdown. The following
416 declaration marks a few variables as global. The translator will
417 infer for each its value type, and if it is used as an array, its key
418 types. Optionally, scalar globals may be initialized with a string
419 or number literal.
420 .RS
421 .BR global " var1" , " var2" , " var3=4"
422 .RE
423 .PP
424 Arrays are limited in size by the MAXMAPENTRIES variable -- see the
425 .B SAFETY AND SECURITY
426 section for details. Optionally, global arrays may be declared with a
427 maximum size in brackets, overriding MAXMAPENTRIES for that array only.
428 Note that this doesn't indicate the type of keys for the array, just the
429 size.
430 .RS
431 .BR global " tiny_array[10]" , " normal_array" , " big_array[50000]"
432 .RE
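.PP
A minimal sketch that combines these declarations follows. The probe points
.IR syscall.open " and " timer.s
are assumed to be available as described in
.IR stapprobes (3stap);
the variable names are hypothetical.
.SAMPLE
global total = 0        // scalar global with a numeric initializer
global opens[200]       // array limited to 200 entries

probe syscall.open {
    total++
    opens[execname()]++ // string key and integer value are inferred
}
probe timer.s(5) {
    exit()
}
probe end {
    printf("%d opens\\n", total)
    foreach (name in opens)
        printf("%s: %d\\n", name, opens[name])
}
.ESAMPLE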
433 .\" XXX add statistics type here once it's supported
434
435 .SS STATEMENTS
436 Statements enable procedural control flow. They may occur within
437 functions and probe handlers. The total number of statements executed
438 in response to any single probe event is limited to some number
439 defined by a macro in the translated C code, and is in the
440 neighbourhood of 1000.
441 .TP
442 EXP
443 Execute the string- or integer-valued expression and throw away
444 the value.
445 .TP
446 .BR { " STMT1 STMT2 ... " }
447 Execute each statement in sequence in this block. Note that
448 separators or terminators are generally not necessary between statements.
449 .TP
450 .BR ;
451 Null statement, do nothing. It is useful as an optional separator between
452 statements to improve syntax-error detection and to handle certain
453 grammar ambiguities.
454 .TP
455 .BR if " (EXP) STMT1 [ " else " STMT2 ]"
456 Compare integer-valued EXP to zero. Execute the first (non-zero)
457 or second STMT (zero).
458 .TP
459 .BR while " (EXP) STMT"
460 While integer-valued EXP evaluates to non-zero, execute STMT.
461 .TP
462 .BR for " (EXP1; EXP2; EXP3) STMT"
463 Execute EXP1 as initialization. While EXP2 is non-zero, execute
464 STMT, then the iteration expression EXP3.
465 .TP
466 .BR foreach " (VAR " in " ARRAY [ "limit " EXP ]) STMT"
467 Loop over each element of the named global array, assigning the current
468 key to VAR. The array may not be modified within the statement.
469 By adding a single
470 .BR + " or " \-
471 operator after the VAR or the ARRAY identifier, the iteration will
472 proceed in a sorted order, by ascending or descending index or value.
473 Using the optional
474 .BR limit
475 keyword limits the number of loop iterations to EXP times. EXP is
476 evaluated once at the beginning of the loop.
477 .TP
478 .BR foreach " ([VAR1, VAR2, ...] " in " ARRAY [ "limit " EXP ]) STMT"
479 Same as above, used when the array is indexed with a tuple of keys.
480 A sorting suffix may be used on at most one VAR or ARRAY identifier.
481 .TP
482 .BR break ", " continue
483 Exit or iterate the innermost nesting loop
484 .RB ( while " or " for " or " foreach )
485 statement.
486 .TP
487 .BR return " EXP"
488 Return EXP value from enclosing function. If the function's value is
489 not taken anywhere, then a return statement is not needed, and the
490 function will have a special "unknown" type with no return value.
491 .TP
492 .BR next
493 Return now from enclosing probe handler.
494 .TP
495 .BR delete " ARRAY[INDEX1, INDEX2, ...]"
496 Remove from ARRAY the element specified by the index tuple. The value will no
497 longer be available, and subsequent iterations will not report the element.
498 It is not an error to delete an element that does not exist.
499 .TP
500 .BR delete " ARRAY"
501 Remove all elements from ARRAY.
502 .TP
503 .BR delete " SCALAR"
504 Removes the value of SCALAR. Integers and strings are cleared to 0 and ""
505 respectively, while statistics are reset to the initial empty state.
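.PP
The iteration and deletion statements can be combined as in the following
sketch, which reports the five most frequent callers once per second.
The probe points
.IR syscall.read " and " timer.s
are assumed to come from the tapsets described in
.IR stapprobes (3stap).
.SAMPLE
global reads

probe syscall.read {
    reads[execname()]++
}
probe timer.s(1) {
    // "reads\-" sorts by value, descending; "limit 5" stops after five keys
    foreach (name in reads\- limit 5)
        printf("%s: %d\\n", name, reads[name])
    delete reads   // clear all elements for the next interval
}
.ESAMPLE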
506
507 .SS EXPRESSIONS
508 Systemtap supports a number of operators that have the same general syntax,
509 semantics, and precedence as in C and awk. Arithmetic is performed as per
510 typical C rules for signed integers. Division by zero or overflow is
511 detected and results in an error.
512 .TP
513 binary numeric operators
514 .B * / % + \- >> << & ^ | && ||
515 .TP
516 binary string operators
517 .B .
518 (string concatenation)
519 .TP
520 numeric assignment operators
521 .B = *= /= %= += \-= >>= <<= &= ^= |=
522 .TP
523 string assignment operators
524 .B = .=
525 .TP
526 unary numeric operators
527 .B + \- ! ~ ++ \-\-
528 .TP
529 binary numeric or string comparison operators
530 .B < > <= >= == !=
531 .TP
532 ternary operator
533 .RB cond " ? " exp1 " : " exp2
534 .TP
535 grouping operator
536 .BR ( " exp " )
537 .TP
538 function call
539 .RB "fn " ( "[ arg1, arg2, ... ]" )
540 .TP
541 array membership check
542 .RB exp " in " array
543 .br
544 .BR "[" exp1 ", " exp2 ", " ... "] in " array
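.PP
The following sketch exercises a few of these operators together; all
variable names are hypothetical.
.SAMPLE
global seen

probe begin {
    seen["a", 1] = 10
    n = seen["a", 1] % 4                      // 10 % 4 is 2
    msg = "remainder" . "=" . sprint(n)       // string concatenation
    println(msg)                              // prints: remainder=2
    println((n > 1) ? "many" : "few")         // prints: many
    if (["a", 1] in seen) println("present")  // prints: present
    exit()
}
.ESAMPLE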
545
546 .SS PROBES
547 The main construct in the scripting language identifies probes.
548 Probes associate abstract events with a statement block ("probe
549 handler") that is to be executed when any of those events occur. The
550 general syntax is as follows:
551 .SAMPLE
552 .BR probe " PROBEPOINT [" , " PROBEPOINT] " { " [STMT ...] " }
553 .ESAMPLE
554 .PP
555 Events are specified in a special syntax called "probe points". There
556 are several varieties of probe points defined by the translator, and
557 tapset scripts may define further ones using aliases. These are
558 listed in the
559 .IR stapprobes (3stap)
560 manual pages.
561 .PP
562 The probe handler is interpreted relative to the context of each
563 event. For events associated with kernel code, this context may
564 include
565 .I variables
566 defined in the
567 .I source code
568 at that spot. These "target variables" are presented to the script as
569 variables whose names are prefixed with "$". They may be accessed
570 only if the kernel's compiler preserved them despite optimization.
571 This is the same constraint that a debugger user faces when working
572 with optimized code. Some other events have very little context.
573 .PP
574 New probe points may be defined using "aliases". Probe point aliases
575 look similar to probe definitions, but instead of activating a probe
576 at the given point, it just defines a new probe point name as an alias
577 to an existing one. There are two types of alias: the prologue
578 style and the epilogue style, which are identified by "=" and "+="
579 respectively.
580 .PP
581 For a prologue-style alias, the statement block that follows the alias
582 definition is implicitly added as a prologue to any probe that refers
583 to the alias. For an epilogue-style alias, the statement block
584 that follows the alias definition is implicitly added as an epilogue to
585 any probe that refers to the alias. For example:
586
587 .SAMPLE
588 probe syscall.read = kernel.function("sys_read") {
589 fildes = $fd
590 if (execname() == "init") next # skip rest of probe
591 }
592 .ESAMPLE
593 defines a new probe point
594 .nh
595 .IR syscall.read ,
596 .hy
597 which expands to
598 .nh
599 .IR kernel.function("sys_read") ,
600 .hy
601 with the given statement as a prologue, which is useful to predefine
602 some variables for the alias user and/or to skip probe processing
603 entirely based on some conditions. And
604 .SAMPLE
605 probe syscall.read += kernel.function("sys_read") {
606 if (tracethis) println ($fd)
607 }
608 .ESAMPLE
609 defines a new probe point with the given statement as an epilogue, which
610 is useful to take actions based upon variables set or left over by the
611 alias user.
612
613 An alias is used just like a built-in probe type.
614 .SAMPLE
615 probe syscall.read {
616 printf("reading fd=%d\n", fildes)
617 if (fildes > 10) tracethis = 1
618 }
619 .ESAMPLE
620
621 .SS FUNCTIONS
622 Systemtap scripts may define subroutines to factor out common work.
623 Functions take any number of scalar (integer or string) arguments, and
624 must return a single scalar (integer or string). An example function
625 declaration looks like this:
626 .SAMPLE
627 function thisfn (arg1, arg2) {
628 return arg1 + arg2
629 }
630 .ESAMPLE
631 Note the general absence of type declarations, which are instead
632 inferred by the translator. However, if desired, a function
633 definition may include explicit type declarations for its return value
634 and/or its arguments. This is especially helpful for embedded-C
635 functions. In the following example, the type inference engine need
636 only infer the type of arg2 (a string).
637 .SAMPLE
638 function thatfn:string (arg1:long, arg2) {
639 return sprint(arg1) . arg2
640 }
641 .ESAMPLE
642 Functions may call others or themselves
643 recursively, up to a fixed nesting limit. This limit is defined by
644 a macro in the translated C code and is in the neighbourhood of 10.
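.PP
For instance, a recursive function stays within that limit as long as
its nesting depth remains small. The following sketch (the function name
.I fact
is hypothetical) nests five levels deep:
.SAMPLE
function fact (n) {
    if (n <= 1) return 1
    return n * fact(n \- 1)
}
probe begin {
    println(fact(5))    // prints: 120
    exit()
}
.ESAMPLE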
645
646 .SS PRINTING
647 There is a set of function names that are specially treated by the
648 translator. They format values for printing to the standard systemtap
649 output stream in a more convenient way. The
650 .IR sprint*
651 variants return the formatted string instead of printing it.
652 .TP
653 .BR print ", " sprint
654 Print one or more values of any type, concatenated directly together.
655 .TP
656 .BR println ", " sprintln
657 Print values like
658 .IR print " and " sprint ,
659 but also append a newline.
660 .TP
661 .BR printd ", " sprintd
662 Take a string delimiter and two or more values of any type, and print the
663 values with the delimiter interposed. The delimiter must be a literal
664 string constant.
665 .TP
666 .BR printdln ", " sprintdln
667 Print values with a delimiter like
668 .IR printd " and " sprintd ,
669 but also append a newline.
670 .TP
671 .BR printf ", " sprintf
672 Take a formatting string and a number of values of corresponding types,
673 and print them all. The format must be a literal string constant.
674 .PP
675 The
676 .IR printf
677 formatting directives are similar to those of C, except that they are
678 fully type-checked by the translator:
679 .RS
680 .TP
681 %b
682 Writes a binary blob of the value given, instead of ASCII text. The width specifier determines the number of bytes to write; valid specifiers are %b %1b %2b %4b %8b. Default (%b) is 8 bytes.
683 .TP
684 %c
685 Character.
686 .TP
687 %d,%i
688 Signed decimal.
689 .TP
690 %m
691 Safely reads kernel memory at the given address, outputs its content. The precision specifier determines the number of bytes to read. Default is 1 byte.
692 .TP
693 %M
694 Same as %m, but outputs in hexadecimal. The minimal size of output is double the precision specifier.
695 .TP
696 %o
697 Unsigned octal.
698 .TP
699 %p
700 Unsigned pointer address.
701 .TP
702 %s
703 String.
704 .TP
705 %u
706 Unsigned decimal.
707 .TP
708 %x
709 Unsigned hex value, in all lower-case.
710 .TP
711 %X
712 Unsigned hex value, in all upper-case.
713 .TP
714 %%
715 Writes a %.
716 .RE
717 .PP
718 Examples:
719 .SAMPLE
720 a = "alice", b = "bob", p = 0x1234abcd, i = 123, j = -1, id[a] = 1234, id[b] = 4567
721 print("hello")
722 Prints: hello
723 println(b)
724 Prints: bob\\n
725 println(a . " is " . sprint(16))
726 Prints: alice is 16\\n
727 foreach (name in id) printdln("|", strlen(name), name, id[name])
728 Prints: 5|alice|1234\\n3|bob|4567\\n
729 printf("%c is %s; %x or %X or %p; %d or %u\\n",97,a,p,p,p,j,j)
730 Prints: a is alice; 1234abcd or 1234ABCD or 0x1234abcd; -1 or 18446744073709551615\\n
731 printf("2 bytes of kernel buffer at address %p: %2m", p, p)
732 Prints: 2 bytes of kernel buffer at address 0x1234abcd: <binary data>
733 printf("%4b", p)
734 Prints (these values as binary data): 0x1234abcd
735 .ESAMPLE
736
737 .SS STATISTICS
738 It is often desirable to collect statistics in a way that avoids the
739 penalties of repeatedly taking exclusive locks on the global variables those
740 numbers are being put into. Systemtap provides a solution using a
741 special operator to accumulate values, and several pseudo-functions to
742 extract the statistical aggregates.
743 .PP
744 The aggregation operator is
745 .IR <<< ,
746 and resembles an assignment, or a C++ output-streaming operation.
747 The left operand specifies a scalar or array-index lvalue, which must
748 be declared global. The right operand is a numeric expression. The
749 meaning is intuitive: add the given number to the pile of numbers to
750 compute statistics of. (The specific list of statistics to gather
751 is given separately, by the extraction functions.)
752 .SAMPLE
753 foo <<< 1
754 stats[pid()] <<< memsize
755 .ESAMPLE
756 .PP
757 The extraction functions are also special. For each appearance of a
758 distinct extraction function operating on a given identifier, the
759 translator arranges to compute a set of statistics that satisfy it.
760 The statistics system is thereby "on-demand". Each execution of
761 an extraction function causes the aggregation to be computed for
762 that moment across all processors.
763 .PP
764 Here is the set of extractor functions. The first argument of each is
765 the same style of lvalue used on the left hand side of the accumulate
766 operation. The
767 .IR @count(v) ", " @sum(v) ", " @min(v) ", " @max(v) ", " @avg(v)
768 extractor functions compute the number/total/minimum/maximum/average
769 of all accumulated values. The resulting values are all simple
770 integers.
771 .PP
772 Histograms are also available, but are more complicated because they
773 have a vector rather than scalar value.
774 .I @hist_linear(v,start,stop,interval)
775 represents a linear histogram from "start" to "stop" by increments
776 of "interval". The interval must be positive. Similarly,
777 .I @hist_log(v)
778 represents a base-2 logarithmic histogram. Printing a histogram
779 with the
780 .I print
781 family of functions renders a histogram object as a tabular
782 "ASCII art" bar chart.
783 .SAMPLE
784 probe foo {
785 x <<< $value
786 }
787 probe end {
788 printf ("avg %d = sum %d / count %d\\n",
789 @avg(x), @sum(x), @count(x))
790 print (@hist_log(x))
791 }
792 .ESAMPLE
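.PP
A further sketch applies the same operators to an array of statistics.
It assumes that the
.I syscall.read.return
probe point and its
.I $return
target variable are available on the target kernel; the variable name
.I sizes
is hypothetical.
.SAMPLE
global sizes

probe syscall.read.return {
    if ($return > 0) sizes[execname()] <<< $return
}
probe end {
    foreach (name in sizes) {
        printf("%s: count %d, avg %d, max %d\\n", name,
               @count(sizes[name]), @avg(sizes[name]), @max(sizes[name]))
        print(@hist_linear(sizes[name], 0, 4096, 512))
    }
}
.ESAMPLE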
793
794 .SS TYPECASTING
795 Once a pointer has been saved into a script integer variable, the
796 translator loses the type information necessary to access members from
797 that pointer. Using the
798 .I @cast()
799 operator tells the translator how to read a pointer.
800 .SAMPLE
801 @cast(p, "type_name"[, "module"])->member
802 .ESAMPLE
803 .PP
804 This will interpret
805 .I p
806 as a pointer to a struct/union named
807 .I type_name
808 and dereference the
809 .I member
810 value. The optional
811 .I module
812 tells the translator where to look for information about that type.
813 Multiple modules may be specified as a list with
814 .IR :
815 separators. If the module is not specified, it will default either to
816 the probe module for dwarf probes, or to "kernel" for functions and all
817 other probe types.
818 .PP
819 The translator can create its own module with type information from a header
820 surrounded by angle brackets, in case normal debuginfo is not available. For
821 kernel headers, prefix it with "kernel" to use the appropriate build system.
822 All other headers are built with default GCC parameters into a user module.
823 .SAMPLE
824 @cast(tv, "timeval", "<sys/time.h>")->tv_sec
825 @cast(task, "task_struct", "kernel<linux/sched.h>")->tgid
826 .ESAMPLE
827 .PP
828 When in guru mode, the translator will also allow scripts to assign new
829 values to members of typecasted pointers.
830 .PP
831 Typecasting is also useful in the case of
832 .I void*
833 members whose type may be determinable at runtime.
834 .SAMPLE
835 probe foo {
836 if ($var->type == 1) {
837 value = @cast($var->data, "type1")->bar
838 } else {
839 value = @cast($var->data, "type2")->baz
840 }
841 print(value)
842 }
843 .ESAMPLE
844
845 .SS EMBEDDED C
846 When in guru mode, the translator accepts embedded code in the
847 script. Such code is enclosed between
848 .IR %{
849 and
850 .IR %}
851 markers, and is transcribed verbatim, without analysis, in some
852 sequence, into the generated C code. At the outermost level, this may
853 be useful to add
854 .IR #include
855 instructions, and any auxiliary definitions for use by other embedded
856 code.
857 .PP
858 The other place where embedded code is permitted is as a function body.
859 In this case, the script language body is replaced entirely by a piece
860 of C code enclosed again between
861 .IR %{ " and " %}
862 markers.
863 This C code may do anything reasonable and safe. There are a number
864 of undocumented but complex safety constraints on atomicity,
865 concurrency, resource consumption, and run time limits, so this
866 is an advanced technique.
867 .PP
868 The memory locations set aside for input and output values
869 are made available to it using a macro
870 .IR THIS .
871 Here are some examples:
872 .SAMPLE
873 function add_one (val) %{
874 THIS\->__retvalue = THIS\->val + 1;
875 %}
876 function add_one_str (val) %{
877 strlcpy (THIS\->__retvalue, THIS\->val, MAXSTRINGLEN);
878 strlcat (THIS\->__retvalue, "one", MAXSTRINGLEN);
879 %}
880 .ESAMPLE
881 The function argument and return value types have to be inferred by
882 the translator from the call sites in order for this to work. The
883 user should examine C code generated for ordinary script-language
884 functions in order to write compatible embedded-C ones.
885
886 .SS BUILT-INS
887 A set of builtin functions and probe point aliases is provided
888 by the scripts installed under the
889 .nh
890 .IR @prefix@/share/systemtap/tapset
891 .hy
892 directory. These are described in the
893 .IR stapfuncs "(3stap) and " stapprobes (3stap)
894 manual pages.
895
896 .SH PROCESSING
897 The translator begins pass 1 by parsing the given input script,
898 and all scripts (files named
899 .IR *.stp )
900 found in a tapset directory. The directories listed
901 with
902 .BR \-I
903 are processed in sequence, each processed in "guru mode". For each
904 directory, a number of subdirectories are also searched. These
905 subdirectories are derived from the selected kernel version (the
906 .BR \-r
907 option),
908 in order to allow more kernel-version-specific scripts to override less
909 specific ones. For example, for a kernel version
910 .IR 2.6.12\-23.FC3
911 the following patterns would be searched, in sequence:
912 .IR 2.6.12\-23.FC3/*.stp ,
913 .IR 2.6.12/*.stp ,
914 .IR 2.6/*.stp ,
915 and finally
916 .IR *.stp .
917 Stopping the translator after pass 1 causes it to print the parse trees.
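.PP
The search order described above can be sketched as follows. This is only
an illustrative model, not the translator's actual code: the function name
tapset_patterns is hypothetical, and the rule used here (drop the release
suffix after the first dash, then drop trailing dotted components down to
two) is an assumption chosen to reproduce the example sequence.

```python
def tapset_patterns(kernel_version):
    """Return glob patterns from most to least kernel-version-specific.

    Assumed rule (hypothetical): full version first, then the part
    before the first dash, then shorter dotted prefixes down to two
    components, and finally the tapset directory itself.
    """
    prefixes = [kernel_version]

    # Drop the release suffix, e.g. "2.6.12-23.FC3" becomes "2.6.12".
    base = kernel_version.split("-", 1)[0]
    if base != kernel_version:
        prefixes.append(base)

    # Drop trailing dotted components, e.g. "2.6.12" becomes "2.6".
    parts = base.split(".")
    while len(parts) > 2:
        parts.pop()
        prefixes.append(".".join(parts))

    # The unversioned directory is searched last.
    return [p + "/*.stp" for p in prefixes] + ["*.stp"]


if __name__ == "__main__":
    for pattern in tapset_patterns("2.6.12-23.FC3"):
        print(pattern)
```

Listing the most specific pattern first mirrors the stated intent that
more kernel-version-specific scripts override less specific ones.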
918
919 .PP
920 In pass 2, the translator analyzes the input script to resolve symbols
921 and types. References to variables, functions, and probe aliases that
922 are unresolved internally are satisfied by searching through the
923 parsed tapset scripts. If any tapset script is selected because it
924 defines an unresolved symbol, then the entirety of that script is
925 added to the translator's resolution queue. This process iterates
926 until all symbols are resolved and a subset of tapset scripts is
927 selected.
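.PP
The iterative selection of tapset scripts can be pictured as a worklist
loop. The sketch below is a simplified model under stated assumptions: the
function select_tapsets and the dictionary layout (script name mapped to
the symbols it defines and the symbols it references) are hypothetical and
do not reflect the translator's internal data structures.

```python
def select_tapsets(unresolved, tapsets):
    """Pick the tapset scripts needed to resolve a set of symbols.

    tapsets maps a script name to a pair (defines, references), both
    sets of symbol names. Selecting a script pulls in the whole script,
    so everything it references must be resolved as well.
    """
    selected = []
    resolved = set()
    queue = sorted(unresolved)

    while queue:
        symbol = queue.pop(0)
        if symbol in resolved:
            continue
        for name in sorted(tapsets):
            defines, references = tapsets[name]
            if symbol in defines:
                selected.append(name)
                # The entire script is added, so all of its definitions
                # become available and all of its references are queued.
                resolved |= defines
                queue.extend(sorted(references - resolved))
                break
        else:
            raise LookupError("unresolved symbol: " + symbol)

    return selected


if __name__ == "__main__":
    tapsets = {
        "a.stp": ({"foo"}, {"bar"}),
        "b.stp": ({"bar", "baz"}, set()),
        "c.stp": ({"unused"}, set()),
    }
    print(select_tapsets({"foo"}, tapsets))
```

In this model, c.stp is never selected because nothing refers to a symbol
it defines, which corresponds to only a subset of tapset scripts being
chosen.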
928 .PP
929 Next, all probe point descriptions are validated
930 against the wide variety supported by the translator. Probe points that
931 refer to code locations ("synchronous probe points") require the
932 appropriate kernel debugging information to be installed. In the
933 associated probe handlers, target-side variables (whose names begin
934 with "$") are found and have their run-time locations decoded.
935 .PP
936 Next, all probes and functions are analyzed for optimization
937 opportunities, in order to remove variables, expressions, and
938 functions that have no useful value and no side-effect. Embedded-C
939 functions are assumed to have side-effects unless they include the
940 magic string
941 .BR /*\ pure\ */ .
942 Since this optimization can hide latent code errors such as type
943 mismatches or invalid $target variables, it sometimes may be useful
944 to disable the optimizations with the
945 .BR \-u
946 option.
947 .PP
948 Finally, all variable, function, parameter, array, and index types are
949 inferred from context (literals and operators). Stopping the
950 translator after pass 2 causes it to list all the probes, functions,
951 and variables, along with all inferred types. Any inconsistent or
952 unresolved types cause an error.
953
954 .PP
955 In pass 3, the translator writes C code that represents the actions
956 of all selected script files, and creates a
957 .IR Makefile
958 to build that into a kernel object. These files are placed into a
959 temporary directory. Stopping the translator at this point causes
960 it to print the contents of the C file.
961
962 .PP
963 In pass 4, the translator invokes the Linux kernel build system to
964 create the actual kernel object file. This involves running
965 .IR make
966 in the temporary directory, and requires a kernel module build
967 system (headers, config and Makefiles) to be installed in the usual
968 spot
969 .IR /lib/modules/VERSION/build .
970 Stopping the translator after pass 4 is the last chance before
971 running the kernel object. This may be useful if you want to
972 archive the file.
973
974 .PP
975 In pass 5, the translator invokes the systemtap auxiliary program
976 .I staprun
977 for the given kernel object. This program arranges to load
978 the module, then communicates with it, copying trace data from the
979 kernel into temporary files, until the user sends an interrupt signal.
980 Any run-time error encountered by the probe handlers, such as running
981 out of memory, division by zero, or exceeding nesting or runtime limits,
982 results in a soft error indication. Soft errors in excess of
983 MAXERRORS block all subsequent probes (except error-handling
984 probes), and terminate the session. Finally,
985 .I staprun
986 unloads the module, and cleans up.
987
988 .SS ABNORMAL TERMINATION
989
990 One should avoid killing the stap process forcibly, for example with
991 SIGKILL, because the stapio process (a child process of the stap
992 process) and the loaded module may be left running on the system. If
993 this happens, send SIGTERM or SIGINT to any remaining stapio
994 processes, then use rmmod to unload the systemtap module.
995
996
997 .SH EXAMPLES
998 See the
999 .IR stapex (3stap)
1000 manual page for a collection of samples.
1001
1002 .SH CACHING
1003 The systemtap translator caches the pass 3 output (the generated C
1004 code) and the pass 4 output (the compiled kernel module) if pass 4
1005 completes successfully. This cached output is reused if the same
1006 script is translated again assuming the same conditions exist (same kernel
1007 version, same systemtap version, etc.). Cached files are stored in
1008 the
1009 .I $SYSTEMTAP_DIR/cache
1010 directory. The cache can be limited by having the file
1011 .I cache_mb_limit
1012 placed in the cache directory (shown above) containing only an ASCII
1013 integer representing how many MiB the cache should not exceed. Note that
1014 this is a 'soft' limit in that the cache will be cleaned after a new entry
1015 is added, so the total cache size may temporarily exceed this limit. In the
1016 absence of this file, a default will be created with the limit set to 64MiB.
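.PP
The soft limit can be illustrated with a small model of the cleanup step
that runs after a new entry is added. This is a sketch only: the function
clean_cache, the tuple layout, and the policy of evicting the least
recently used entries first are assumptions, since this page does not
state which entries are removed.

```python
def clean_cache(entries, limit_mb):
    """Trim a cache to at most limit_mb after a new entry was added.

    entries is a list of (name, size_mb, last_used) tuples. Eviction
    order is assumed (hypothetically) to be least recently used first.
    Returns the surviving entries, most recently used first.
    """
    kept = sorted(entries, key=lambda entry: entry[2], reverse=True)
    total = sum(size for _, size, _ in kept)

    # The cache may exceed the limit until this loop has run, which is
    # why the limit is described as soft.
    while kept and total > limit_mb:
        _, size, _ = kept.pop()  # drop the least recently used entry
        total -= size

    return kept


if __name__ == "__main__":
    cache = [("old", 30, 1), ("mid", 30, 2), ("new", 30, 3)]
    print([name for name, _, _ in clean_cache(cache, 64)])
```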
1017
1018 .SH SAFETY AND SECURITY
1019 Systemtap is an administrative tool. It exposes kernel internal data
1020 structures and potentially private user information.
1021 Running it requires
1022 elevated privileges, as described below.
1023
1024 To actually run the kernel objects it builds, a user must be one of
1025 the following:
1026 .IP \(bu 4
1027 the root user;
1028 .IP \(bu 4
1029 a member of the
1030 .I stapdev
1031 group; or
1032 .IP \(bu 4
1033 a member of the
1034 .I stapusr
1035 group. Members of the
1036 .I stapusr
1037 group can only use modules located in
1038 the /lib/modules/VERSION/systemtap directory. This directory
1039 must be owned by root and not be world writable.
1040 .PP
1041 The kernel modules generated by the
1042 .I stap
1043 program are run by the
1044 .IR staprun
1045 program. The latter is a part of the Systemtap package, dedicated to
1046 module loading and unloading (but only in the white zone), and
1047 kernel-to-user data transfer. Since
1048 .IR staprun
1049 does not perform any additional security checks on the kernel objects
1050 it is given, it would be unwise for a system administrator to add
1051 untrusted users to the
1052 .I stapdev
1053 or
1054 .I stapusr
1055 groups.
1056 .PP
1057 The translator asserts certain safety constraints. It aims to ensure
1058 that no handler routine can run for very long, allocate memory,
1059 perform unsafe operations, or unintentionally interfere with the
1060 kernel. Use of script global variables is suitably locked to protect
1061 against manipulation by concurrent probe handlers. Use of guru mode
1062 constructs such as embedded C can violate these constraints, leading
1063 to a kernel crash or data corruption.
1064 .PP
1065 The resource use limits are set by macros in the generated C code.
1066 These may be overridden with the
1067 .BR \-D
1068 flag. A selection of these is as follows:
1069 .TP
1070 MAXNESTING
1071 Maximum number of nested function calls. Default determined by
1072 script analysis, with a bonus 10 slots added for recursive
1073 scripts.
1074 .TP
1075 MAXSTRINGLEN
1076 Maximum length of strings, default 128.
1077 .TP
1078 MAXTRYLOCK
1079 Maximum number of iterations to wait for locks on global variables
1080 before declaring possible deadlock and skipping the probe, default 1000.
1081 .TP
1082 MAXACTION
1083 Maximum number of statements to execute during any single probe hit
1084 (with interrupts disabled),
1085 default 1000.
1086 .TP
1087 MAXACTION_INTERRUPTIBLE
1088 Maximum number of statements to execute during any single probe hit
1089 which is executed with interrupts enabled (such as begin/end probes),
1090 default (MAXACTION * 10).
1091 .TP
1092 MAXMAPENTRIES
1093 Maximum number of rows in any single global array, default 2048.
1094 .TP
1095 MAXERRORS
1096 Maximum number of soft errors before an exit is triggered, default 0, which
1097 means that the first error will exit the script.
1098 .TP
1099 MAXSKIPPED
1100 Maximum number of skipped probes before an exit is triggered, default 100.
1101 Running systemtap with \-t (timing) mode gives more details about skipped
1102 probes. With the default \-DINTERRUPTIBLE=1 setting, probes skipped due to
1103 reentrancy are not accumulated against this limit.
1104 .TP
1105 MINSTACKSPACE
1106 Minimum number of free kernel stack bytes required in order to
1107 run a probe handler, default 1024. This number should be large enough
1108 for the probe handler's own needs, plus a safety margin.
1109 .TP
1110 MAXUPROBES
1111 Maximum number of concurrently armed user-space probes (uprobes), default
1112 somewhat larger than the number of user-space probe points named in the script.
1113 This pool needs to be potentially large because individual uprobe objects (about
1114 64 bytes each) are allocated for each process for each matching script-level probe.
1115
1116 .PP
1117 With scripts that contain probes on any interrupt path, it is possible that
1118 those interrupts may occur in the middle of another probe handler. The probe
1119 in the interrupt handler would be skipped in this case to avoid reentrance.
1120 To work around this issue, execute stap with the option
1121 .BR \-DINTERRUPTIBLE=0
1122 to mask interrupts throughout the probe handler. This does add some extra
1123 overhead to the probes, but it may prevent reentrance for common problem
1124 cases. However, probes in NMI handlers and in the callpath of the stap
1125 runtime may still be skipped due to reentrance.
1126
1127 .PP
1128 Multiple scripts can write data into a relay buffer concurrently. A host
1129 script provides an interface for accessing its relay buffer to guest scripts.
1130 Then, the output of the guests is merged into the output of the host.
1131 To run a script as a host, execute stap with the
1132 .BR \-DRELAYHOST[=name]
1133 option. The
1134 .BR name
1135 identifies your host script among several hosts.
1136 While running the host, execute stap with
1137 .BR \-DRELAYGUEST[=name]
1138 to add a guest script to the host.
1139 Note that you must unload guests before unloading a host. If there are some
1140 guests connected to the host, unloading the host will fail.
1141
1142 .PP
1143 In case something goes wrong with
1144 .IR stap " or " staprun
1145 after a probe has already started running, one may safely kill both
1146 user processes, and remove the active probe kernel module with
1147 .IR rmmod .
1148 Any pending trace messages may be lost.
1149
1150 .PP
1151 In addition to the methods outlined above, the generated kernel module
1152 also uses overload processing to make sure that probes can't run for
1153 too long. If more than STP_OVERLOAD_THRESHOLD cycles (default
1154 500000000) have been spent in all the probes on a single cpu during
1155 the last STP_OVERLOAD_INTERVAL cycles (default 1000000000), the probes
1156 have overloaded the system and an exit is triggered.
1157 .PP
1158 By default, overload processing is turned on for all modules. If you
1159 would like to disable overload processing, define STP_NO_OVERLOAD.
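.PP
The overload check described above can be modelled per CPU as an
accumulator that is compared against the threshold once per interval.
This is a simplified sketch, not the runtime's implementation: the class
OverloadMonitor and its method names are hypothetical, and the exact
bookkeeping (when the interval starts and when counters reset) is an
assumption.

```python
class OverloadMonitor:
    """Per-CPU model of overload processing (illustrative only)."""

    def __init__(self, threshold=500000000, interval=1000000000):
        self.threshold = threshold  # models STP_OVERLOAD_THRESHOLD
        self.interval = interval    # models STP_OVERLOAD_INTERVAL
        self.interval_start = 0     # cycle count when the interval began
        self.cycles_in_probes = 0   # cycles spent in probes this interval

    def probe_finished(self, start_cycle, end_cycle):
        """Record one probe hit; return True if an exit should trigger."""
        self.cycles_in_probes += end_cycle - start_cycle

        if end_cycle - self.interval_start > self.interval:
            overloaded = self.cycles_in_probes > self.threshold
            # Begin a new measurement interval.
            self.interval_start = end_cycle
            self.cycles_in_probes = 0
            return overloaded
        return False


if __name__ == "__main__":
    monitor = OverloadMonitor(threshold=50, interval=100)
    print(monitor.probe_finished(0, 40))
    print(monitor.probe_finished(50, 110))
```

With the defaults, the threshold is half of the interval, so in this model
an exit is triggered when probes consume more than about half of the
cycles on one CPU over an interval.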
1160
1161 .SH MAKING DO WITH SYMBOL TABLES
1162 Systemtap performs best when it has access to the debugging information
1163 associated with your kernel and modules.
1164 However, if this information is not available,
1165 systemtap can still support probing of function entries and returns
1166 using symbols read from vmlinux and/or the modules in /lib/modules.
1167 Systemtap can also read the kernel symbol table from a text file
1168 such as /boot/System.map or /proc/kallsyms.
1169 See the
1170 .B \-\-kelf
1171 and
1172 .B \-\-kmap
1173 options.
1174 .PP
1175 If systemtap finds relevant debugging information,
1176 it will use it even if you specify
1177 .B \-\-kelf
1178 or
1179 .BR \-\-kmap .
1180 .PP
1181 Without debugging information, systemtap cannot support the
1182 following types of language constructs:
1183 .IP \(bu 4
1184 probe specifications that refer to source files or line numbers
1185 .IP \(bu 4
1186 probe specifications that refer to inline functions
1187 .IP \(bu 4
1188 statements that refer to $target variables
1189 .IP \(bu 4
1190 statements that refer to @cast() variables
1191 .IP \(bu 4
1192 tapset-defined variables defined using any of the above constructs.
1193 In particular, at this writing,
1194 the prologue blocks for certain aliases in the syscall tapset
1195 (e.g., syscall.open) contain "if" statements that refer to $target variables.
1196 If your script refers to any such aliases,
1197 systemtap must have access to the kernel's debugging information.
1198 .PP
1199 Most T and t symbols correspond to function entry points, but some do not.
1200 Based only on the symbol table, systemtap cannot tell the difference.
1201 Placing return probes on symbols that aren't entry points
1202 will most likely lead to kernel stack corruption.
1203
1204 .SH FILES
1205 .\" consider autoconf-substituting these directories
1206 .TP
1207 ~/.systemtap
1208 Systemtap data directory for cached systemtap files, unless overridden
1209 by the
1210 .I SYSTEMTAP_DIR
1211 environment variable.
1212 .TP
1213 /tmp/stapXXXXXX
1214 Temporary directory for systemtap files, including translated C code
1215 and kernel object.
1216 .TP
1217 @prefix@/share/systemtap/tapset
1218 The automatic tapset search directory, unless overridden by
1219 the
1220 .I SYSTEMTAP_TAPSET
1221 environment variable.
1222 .TP
1223 @prefix@/share/systemtap/runtime
1224 The runtime sources, unless overridden by the
1225 .I SYSTEMTAP_RUNTIME
1226 environment variable.
1227 .TP
1228 /lib/modules/VERSION/build
1229 The location of kernel module building infrastructure.
1230 .TP
1231 @prefix@/lib/debug/lib/modules/VERSION
1232 The location of kernel debugging information when packaged into the
1233 .IR kernel\-debuginfo
1234 RPM, unless overridden by the
1235 .I SYSTEMTAP_DEBUGINFO_PATH
1236 environment variable. The default value for this variable is
1237 .IR \+:.debug:/usr/lib/debug:build .
1238 Elfutils searches for vmlinux in this path, and interprets the path as a base
1239 directory whose various subdirectories are searched for modules.
1240 .TP
1241 @prefix@/bin/staprun
1242 The auxiliary program supervising module loading, interaction, and
1243 unloading.
1244
1245 .SH SEE ALSO
1246 .IR stapprobes (3stap),
1247 .IR stapfuncs (3stap),
1248 .IR stapvars (3stap),
1249 .IR stapex (3stap),
1250 .IR awk (1),
1251 .IR gdb (1)
1252
1253 .SH BUGS
1254 Use the Bugzilla link on the project web page, or our mailing list.
1255 .nh
1256 .BR http://sources.redhat.com/systemtap/ , <systemtap@sources.redhat.com> .
1257 .hy