sourceware.org Git - systemtap.git/blob - stap.1.in
Version bumps for 0.9.6 release
1 .\" -*- nroff -*-
2 .TH STAP 1 @DATE@ "Red Hat"
3 .SH NAME
4 stap \- systemtap script translator/driver
5
6 .\" macros
7 .de SAMPLE
8 .br
9 .RS
10 .nf
11 .nh
12 ..
13 .de ESAMPLE
14 .hy
15 .fi
16 .RE
17 ..
18
19 .SH SYNOPSIS
20
21 .br
22 .B stap
23 [
24 .I OPTIONS
25 ]
26 .I FILENAME
27 [
28 .I ARGUMENTS
29 ]
30 .br
31 .B stap
32 [
33 .I OPTIONS
34 ]
35 .B \-
36 [
37 .I ARGUMENTS
38 ]
39 .br
40 .B stap
41 [
42 .I OPTIONS
43 ]
44 .BI \-e " SCRIPT"
45 [
46 .I ARGUMENTS
47 ]
48 .br
49 .B stap
50 [
51 .I OPTIONS
52 ]
53 .BI \-l " PROBE"
54 [
55 .I ARGUMENTS
56 ]
57 .br
58 .B stap
59 [
60 .I OPTIONS
61 ]
62 .BI \-L " PROBE"
63 [
64 .I ARGUMENTS
65 ]
66
67 .SH DESCRIPTION
68
69 The
70 .IR stap
71 program is the front-end to the Systemtap tool. It accepts probing
72 instructions (written in a simple scripting language), translates
73 those instructions into C code, compiles this C code, and loads the
74 resulting kernel module into a running Linux kernel to perform the
75 requested system trace/probe functions. You can supply the script in
76 a named file, from standard input, or from the command line. The
77 program runs until it is interrupted by the user, until the script
78 voluntarily invokes the
79 .I exit()
80 function, or until a sufficient number of soft errors occur.
81 .PP
82 The language, which is described in a later section, is strictly typed,
83 declaration free, procedural, and inspired by
84 .IR awk .
85 It allows source code points or events in the kernel to be associated
86 with handlers, which are subroutines that are executed synchronously. It is
87 somewhat similar conceptually to "breakpoint command lists" in the
88 .IR gdb
89 debugger.
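.PP
As a minimal sketch of what such a script looks like (the file name
hello.stp is only an illustration, and the begin probe point is assumed
to be available as listed in the stapprobes manual page), the following
handler runs once at session startup, prints a line, and ends the session:
.SAMPLE
# hello.stp: run with "stap hello.stp"
probe begin {
  println ("hello world")
  exit ()
}
.ESAMPLE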
90 .PP
91 This manual corresponds to version @VERSION@.
92
93 .SH OPTIONS
94 The systemtap translator supports the following options. Any other option
95 prints a list of supported options.
96 .TP
97 .B \-h
98 Show help message.
99 .TP
100 .B \-V
101 Show version message.
102 .TP
103 .BI \-p " NUM"
104 Stop after pass NUM. The passes are numbered 1-5: parse, elaborate,
105 translate, compile, run. See the
106 .B PROCESSING
107 section for details.
108 .TP
109 .B \-v
110 Increase verbosity for all passes. Produce a larger volume of
111 informative output each time the option is repeated.
112 .TP
113 .B \-\-vp ABCDE
114 Increase verbosity on a per-pass basis. For example, "\-\-vp\ 002"
115 adds 2 units of verbosity to pass 3 only. The combination "\-v\ \-\-vp\ 00004"
116 adds 1 unit of verbosity for all passes, and 4 more for pass 5.
117 .TP
118 .B \-k
119 Keep the temporary directory after all processing. This may be useful
120 in order to examine the generated C code, or to reuse the compiled
121 kernel object.
122 .TP
123 .B \-g
124 Guru mode. Enable parsing of unsafe expert-level constructs like
125 embedded C.
126 .TP
127 .B \-P
128 Prologue-searching mode. Activate heuristics to work around incorrect
129 debugging information for $target variables.
130 .TP
131 .B \-u
132 Unoptimized mode. Disable unused code elision during elaboration.
133 .TP
134 .B \-w
135 Suppressed warnings mode. Disable warning messages for elided code in user script.
136 .TP
137 .BI \-b
138 Use bulk mode (percpu files) for kernel-to-user data transfer.
139 .TP
140 .B \-t
141 Collect timing information on the number of times each probe executes
142 and the average amount of time spent in each probe.
143 .TP
144 .BI \-s " NUM"
145 Use NUM megabyte buffers for kernel-to-user data transfer. On a
146 multiprocessor in bulk mode, this is a per-processor amount.
147 .TP
148 .BI \-I " DIR"
149 Add the given directory to the tapset search path. See the
150 description of pass 1 in the PROCESSING section for details.
151 .TP
152 .BI \-D " NAME=VALUE"
153 Add the given C preprocessor directive to the module Makefile. These can
154 be used to override limit parameters described below.
155 .TP
156 .BI \-R " DIR"
157 Look for the systemtap runtime sources in the given directory.
158 .TP
159 .BI \-r " /DIR"
160 Build for kernel in given build tree. Can also be set with the
161 .I SYSTEMTAP_RELEASE
162 environment variable.
163 .TP
164 .BI \-r " RELEASE"
165 Build for kernel in build tree
166 .BR /lib/modules/RELEASE/build .
167 Can also be set with the
168 .I SYSTEMTAP_RELEASE
169 environment variable.
170 .TP
171 .BI \-m " MODULE"
172 Use the given name for the generated kernel object module, instead
173 of a unique randomized name. The generated kernel object module is
174 copied to the current directory.
175 .TP
176 .BI \-d " MODULE"
177 Add symbol/unwind information for the given module into the kernel object
178 module. This may enable symbolic tracebacks from those modules/programs,
179 even if they do not have an explicit probe placed into them.
180 .TP
181 .BI \-o " FILE"
182 Send standard output to named file. In bulk mode, percpu files will
183 start with FILE_ (FILE_cpu with -F) followed by the cpu number.
184 This supports strftime(3) formats for FILE.
185 .TP
186 .BI \-c " CMD"
187 Start the probes, run CMD, and exit when CMD finishes.
188 .TP
189 .BI \-x " PID"
190 Sets target() to PID. This allows scripts to be written that filter on
191 a specific process.
192 .TP
193 .BI \-l " PROBE"
194 Instead of running a probe script, just list all available probe
195 points matching the given pattern. The pattern may include wildcards
196 and aliases.
197 .TP
198 .BI \-L " PROBE"
199 Similar to "-l", but list probe points and script-level local variables.
200 .TP
201 .BI \-F
202 Without -o option, load module and start probes, then detach from the module
203 leaving the probes running.
204 With -o option, run staprun in background as a daemon and show its pid.
205 .TP
206 .BI \-S " size[,N]"
207 Sets the maximum size of each output file and the maximum number of output files.
208 If the size of an output file would exceed
209 .BR size ,
210 systemtap switches output to the next file. If the number of
211 output files exceeds
212 .BR N ,
213 systemtap removes the oldest output file. You can omit the second argument.
214 .TP
215 .B \-\-kelf
216 For names and addresses of functions to probe,
217 consult the symbol tables in the kernel and modules.
218 This can be useful if your kernel and/or modules were compiled
219 without debugging information, or the function you want to probe
220 is in an assembly-language file built without debugging information.
221 See the
222 .B "MAKING DO WITH SYMBOL TABLES"
223 section for more information.
224 .TP
225 .BI \-\-kmap [=FILE]
226 For names and addresses of kernel functions to probe,
227 consult the symbol table in the indicated text file.
228 The default is /boot/System.map-VERSION.
229 The contents of this file should be in the form of the default output from
230 .IR nm (1).
231 Only symbols of type T or t are used.
232 If you specify /proc/kallsyms or some other file in that format,
233 where lines for module symbols contain a fourth column,
234 reading of the symbol table stops with the first module symbol
235 (which should be right after the last kernel symbol).
236 As with
237 .BR \-\-kelf ,
238 the symbol table in each module's .ko file will also be consulted.
239 See the
240 .B "MAKING DO WITH SYMBOL TABLES"
241 section for more information.
242 .TP
243 .B \-\-ignore\-vmlinux
244 For testing, act as though neither the uncompressed kernel (vmlinux)
245 nor the kernel debugging information can be found.
246 .TP
247 .B \-\-ignore\-dwarf
248 For testing, act as though vmlinux and modules lack debugging information.
249 .TP
250 .B \-\-skip\-badvars
251 Ignore out of context variables and substitute with literal 0.
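.PP
As an illustration of the
.B \-x
option, the following sketch reports file opens made by one process only.
The script name, the use of the syscall.open tapset alias, and its
filename variable are assumptions about the installed tapset, not part
of the option itself:
.SAMPLE
# opens.stp: run with "stap -x PID opens.stp"
probe syscall.open {
  if (pid() != target()) next   # ignore every other process
  printf ("%s opened %s\\n", execname(), filename)
}
.ESAMPLE
The script keeps running until it is interrupted by the user.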
252
253 .SH ARGUMENTS
254
255 Any additional arguments on the command line are passed to the script
256 parser for substitution. See below.
257
258 .SH SCRIPT LANGUAGE
259
260 The systemtap script language resembles
261 .IR awk .
262 There are two main outermost constructs: probes and functions. Within
263 these, statements and expressions use C-like operator syntax and
264 precedence.
265
266 .SS GENERAL SYNTAX
267 Whitespace is ignored. Three forms of comments are supported:
268 .RS
269 .br
270 .BR # " ... shell style, to the end of line, except for $# and @#"
271 .br
272 .BR // " ... C++ style, to the end of line"
273 .br
274 .BR /* " ... C style ... " */
275 .RE
276 Literals are either strings enclosed in double-quotes (passing through
277 the usual C escape codes with backslashes), or integers (in decimal,
278 hexadecimal, or octal, using the same notation as in C). All strings
279 are limited in length to some reasonable value (a few hundred bytes).
280 Integers are 64-bit signed quantities, although the parser also accepts
281 (and wraps around) values above positive 2**63.
282 .PP
283 In addition, script arguments given at the end of the command line may
284 be inserted. Use
285 .B $1 ... $<NN>
286 for insertion unquoted,
287 .B @1 ... @<NN>
288 for insertion as a string literal. The number of arguments may be accessed
289 through
290 .B $#
291 (as an unquoted number) or through
292 .B @#
293 (as a quoted number). These may be used at any place a token may begin,
294 including within the preprocessing stage. Reference to an argument
295 number beyond what was actually given is an error.
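.PP
A small sketch of argument substitution follows. The script name and the
argument values are hypothetical; it assumes that the first argument is a
number and that the second is arbitrary text:
.SAMPLE
# args.stp: run with "stap args.stp 42 hello"
probe begin {
  printf ("%d arguments\\n", $#)
  printf ("first, pasted unquoted: %d\\n", $1 + 1)
  printf ("second, pasted as a string: %s\\n", @2)
  exit ()
}
.ESAMPLE
Running it with fewer than two arguments should be rejected, since the
script refers to the second argument.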
296
297 .SS PREPROCESSING
298 A simple conditional preprocessing stage is run as a part of parsing.
299 The general form is similar to the
300 .RB cond " ? " exp1 " : " exp2
301 ternary operator:
302 .SAMPLE
303 .BR %( " CONDITION " %? " TRUE-TOKENS " %)
304 .BR %( " CONDITION " %? " TRUE-TOKENS " %: " FALSE-TOKENS " %)
305 .ESAMPLE
306 The CONDITION is either an expression whose format is determined by its
307 first keyword, or a comparison between string literals, or a comparison
308 between numeric literals.
309 .PP
310 If the first part is the identifier
311 .BR kernel_vr " or " kernel_v
312 to refer to the kernel version number, with ("2.6.13\-1.322FC3smp") or
313 without ("2.6.13") the release code suffix, then
314 the second part is one of the six standard numeric comparison operators
315 .BR < ", " <= ", " == ", " != ", " > ", and " >= ,
316 and the third part is a string literal that contains an RPM-style
317 version-release value. The condition is deemed satisfied if the
318 version of the target kernel (as optionally overridden by the
319 .BR \-r
320 option) compares as specified with the given version string. The comparison is
321 performed by the glibc function
322 .BR strverscmp .
323 As a special case, if the operator is for simple equality
324 .RB ( == ),
325 or inequality
326 .RB ( != ),
327 and the third part contains any wildcard characters
328 .RB ( * " or " ? " or " [ "),"
329 then the expression is treated as a wildcard (mis)match as evaluated
330 by
331 .BR fnmatch .
332 .PP
333 If, on the other hand, the first part is the identifier
334 .BR arch
335 to refer to the processor architecture,
336 then the second part is one of the two string comparison operators
337 .BR == " or " != ,
338 and the third part is a string literal for matching it. This
339 comparison is a wildcard (mis)match.
340 .PP
341 Otherwise, the CONDITION is expected to be a comparison between two string
342 literals or two numeric literals. In this case, the script arguments are the only
343 variables usable.
344 .PP
345 The TRUE-TOKENS and FALSE-TOKENS are zero or more general parser
346 tokens (possibly including nested preprocessor conditionals), and are
347 pasted into the input stream if the condition is true or false. For
348 example, the following code induces a parse error unless the target
349 kernel version is newer than 2.6.5:
350 .SAMPLE
351 %( kernel_v <= "2.6.5" %? **ERROR** %) # invalid token sequence
352 .ESAMPLE
353 The following code might adapt to hypothetical kernel version drift:
354 .SAMPLE
355 probe kernel.function (
356 %( kernel_v <= "2.6.12" %? "__mm_do_fault" %:
357 %( kernel_vr == "2.6.13*smp" %? "do_page_fault" %:
358 UNSUPPORTED %) %)
359 ) { /* ... */ }
360
361 %( arch == "ia64" %?
362 probe syscall.vliw = kernel.function("vliw_widget") {}
363 %)
364 .ESAMPLE
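.PP
The literal-comparison form can be sketched as follows, assuming only
the
.B $#
argument count described earlier. The messages are illustrative:
.SAMPLE
probe begin {
  %( $# == 0 %? println ("no script arguments were given")
             %: printf ("%d script argument(s) given\\n", $#) %)
  exit ()
}
.ESAMPLE
Only one of the two statements reaches the parser, so the choice is made
at translation time rather than at run time.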
365
366 .SS VARIABLES
367 Identifiers for variables and functions are an alphanumeric sequence,
368 and may include "_" and "$" characters. They may not start with a
369 plain digit, as in C. Each variable is by default local to the probe
370 or function statement block within which it is mentioned, and therefore
371 its scope and lifetime are limited to a particular probe or function
372 invocation.
373 .\" XXX add statistics type here once it's supported
374 .PP
375 Scalar variables are implicitly typed as either string or integer.
376 Associative arrays also have a string or integer value, and
377 a tuple of strings and/or integers serving as a key. Here are a
378 few basic expressions.
379 .SAMPLE
380 var1 = 5
381 var2 = "bar"
382 array1 [pid()] = "name" # single numeric key
383 array2 ["foo",4,i++] += 5 # vector of string/num/num keys
384 if (["hello",5,4] in array2) println ("yes") # membership test
385 .ESAMPLE
386 .PP
387 The translator performs
388 .I type inference
389 on all identifiers, including array indexes and function parameters.
390 Inconsistent type-related use of identifiers signals an error.
391 .PP
392 Variables may be declared global, so that they are shared amongst all
393 probes and live as long as the entire systemtap session. There is one
394 namespace for all global variables, regardless of which script file
395 they are found within. A global declaration may be written at the
396 outermost level anywhere, not within a block of code. Global
397 variables which are written but never read will be displayed
398 automatically at session shutdown. The following
399 declaration marks a few variables as global. The translator will
400 infer for each its value type, and if it is used as an array, its key
401 types. Optionally, scalar globals may be initialized with a string
402 or number literal.
403 .RS
404 .BR global " var1" , " var2" , " var3=4"
405 .RE
406 .PP
407 Arrays are limited in size by the MAXMAPENTRIES variable -- see the
408 .B SAFETY AND SECURITY
409 section for details. Optionally, global arrays may be declared with a
410 maximum size in brackets, overriding MAXMAPENTRIES for that array only.
411 Note that this doesn't indicate the type of keys for the array, just the
412 size.
413 .RS
414 .BR global " tiny_array[10]" , " normal_array" , " big_array[50000]"
415 .RE
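.PP
The following sketch combines these declarations. It assumes the
syscall.read alias and the timer.s probe point listed in the stapprobes
manual page; the variable names are arbitrary:
.SAMPLE
global reads, by_name[100]    # by_name is capped at 100 entries

probe syscall.read {
  reads++                     # inferred as an integer scalar
  by_name[execname()]++       # inferred as string key, integer value
}
probe timer.s(5) {
  printf ("%d reads in total\\n", reads)
  foreach (name in by_name- limit 5)   # five largest counts
    printf ("%s: %d\\n", name, by_name[name])
  exit ()
}
.ESAMPLE
If more than 100 distinct program names were seen, the per-array limit
would be exceeded, so a real script would choose the size with that in mind.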
416 .\" XXX add statistics type here once it's supported
417
418 .SS STATEMENTS
419 Statements enable procedural control flow. They may occur within
420 functions and probe handlers. The total number of statements executed
421 in response to any single probe event is limited to some number
422 defined by a macro in the translated C code, and is in the
423 neighbourhood of 1000.
424 .TP
425 EXP
426 Execute the string- or integer-valued expression and throw away
427 the value.
428 .TP
429 .BR { " STMT1 STMT2 ... " }
430 Execute each statement in sequence in this block. Note that
431 separators or terminators are generally not necessary between statements.
432 .TP
433 .BR ;
434 Null statement, do nothing. It is useful as an optional separator between
435 statements to improve syntax-error detection and to handle certain
436 grammar ambiguities.
437 .TP
438 .BR if " (EXP) STMT1 [ " else " STMT2 ]"
439 Compare integer-valued EXP to zero. Execute STMT1 if it is non-zero;
440 otherwise execute STMT2, if present.
441 .TP
442 .BR while " (EXP) STMT"
443 While integer-valued EXP evaluates to non-zero, execute STMT.
444 .TP
445 .BR for " (EXP1; EXP2; EXP3) STMT"
446 Execute EXP1 as initialization. While EXP2 is non-zero, execute
447 STMT, then the iteration expression EXP3.
448 .TP
449 .BR foreach " (VAR " in " ARRAY [ "limit " EXP ]) STMT"
450 Loop over each element of the named global array, assigning current
451 key to VAR. The array may not be modified within the statement.
452 By adding a single
453 .BR + " or " \-
454 operator after the VAR or the ARRAY identifier, the iteration will
455 proceed in a sorted order, by ascending or descending index or value.
456 Using the optional
457 .BR limit
458 keyword limits the number of loop iterations to EXP times. EXP is
459 evaluated once at the beginning of the loop.
460 .TP
461 .BR foreach " ([VAR1, VAR2, ...] " in " ARRAY [ "limit " EXP ]) STMT"
462 Same as above, used when the array is indexed with a tuple of keys.
463 A sorting suffix may be used on at most one VAR or ARRAY identifier.
464 .TP
465 .BR break ", " continue
466 Exit or iterate the innermost nesting loop
467 .RB ( while " or " for " or " foreach )
468 statement.
469 .TP
470 .BR return " EXP"
471 Return EXP value from enclosing function. If the function's value is
472 not taken anywhere, then a return statement is not needed, and the
473 function will have a special "unknown" type with no return value.
474 .TP
475 .BR next
476 Return now from enclosing probe handler.
477 .TP
478 .BR delete " ARRAY[INDEX1, INDEX2, ...]"
479 Remove from ARRAY the element specified by the index tuple. The value will no
480 longer be available, and subsequent iterations will not report the element.
481 It is not an error to delete an element that does not exist.
482 .TP
483 .BR delete " ARRAY"
484 Remove all elements from ARRAY.
485 .TP
486 .BR delete " SCALAR"
487 Removes the value of SCALAR. Integers and strings are cleared to 0 and ""
488 respectively, while statistics are reset to the initial empty state.
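.PP
As a sketch of how several of these statements fit together (the array
name is arbitrary, and the begin probe point is assumed from the
stapprobes manual page):
.SAMPLE
global seen
probe begin {
  for (i = 0; i < 10; i++) {
    if (i % 2) continue          # skip odd numbers
    seen[i] = i * i
  }
  delete seen[0]                 # remove one element
  foreach (k+ in seen limit 3)   # ascending by key, at most 3 iterations
    printf ("%d -> %d\\n", k, seen[k])
  delete seen                    # remove everything
  exit ()
}
.ESAMPLE
The loop bounds are kept small so that the handler stays well under the
per-probe statement limit mentioned above.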
489
490 .SS EXPRESSIONS
491 Systemtap supports a number of operators that have the same general syntax,
492 semantics, and precedence as in C and awk. Arithmetic is performed as per
493 typical C rules for signed integers. Division by zero or overflow is
494 detected and results in an error.
495 .TP
496 binary numeric operators
497 .B * / % + \- >> << & ^ | && ||
498 .TP
499 binary string operators
500 .B .
501 (string concatenation)
502 .TP
503 numeric assignment operators
504 .B = *= /= %= += \-= >>= <<= &= ^= |=
505 .TP
506 string assignment operators
507 .B = .=
508 .TP
509 unary numeric operators
510 .B + \- ! ~ ++ \-\-
511 .TP
512 binary numeric or string comparison operators
513 .B < > <= >= == !=
514 .TP
515 ternary operator
516 .RB cond " ? " exp1 " : " exp2
517 .TP
518 grouping operator
519 .BR ( " exp " )
520 .TP
521 function call
522 .RB "fn " ( "[ arg1, arg2, ... ]" )
523 .TP
524 array membership check
525 .RB exp " in " array
526 .br
527 .BR "[" exp1 ", " exp2 ", " ... "] in " array
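.PP
A brief sketch using a few of these operators follows; the variable
names are arbitrary:
.SAMPLE
probe begin {
  name = "stap" . "-" . "demo"           # string concatenation
  n = (7 << 2) | 1                       # integer shift and bitwise or
  kind = (n % 2 == 0) ? "even" : "odd"   # ternary operator
  printf ("%s: %d is %s\\n", name, n, kind)
  exit ()
}
.ESAMPLE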
528
529 .SS PROBES
530 The main construct in the scripting language identifies probes.
531 Probes associate abstract events with a statement block ("probe
532 handler") that is to be executed when any of those events occur. The
533 general syntax is as follows:
534 .SAMPLE
535 .BR probe " PROBEPOINT [" , " PROBEPOINT] " { " [STMT ...] " }
536 .ESAMPLE
537 .PP
538 Events are specified in a special syntax called "probe points". There
539 are several varieties of probe points defined by the translator, and
540 tapset scripts may define further ones using aliases. These are
541 listed in the
542 .IR stapprobes (3stap)
543 manual pages.
544 .PP
545 The probe handler is interpreted relative to the context of each
546 event. For events associated with kernel code, this context may
547 include
548 .I variables
549 defined in the
550 .I source code
551 at that spot. These "target variables" are presented to the script as
552 variables whose names are prefixed with "$". They may be accessed
553 only if the kernel's compiler preserved them despite optimization.
554 This is the same constraint that a debugger user faces when working
555 with optimized code. Some other events have very little context.
556 .PP
557 New probe points may be defined using "aliases". A probe point alias
558 looks similar to a probe definition, but instead of activating a probe
559 at the given point, it just defines a new probe point name as an alias
560 to an existing one. There are two types of alias: the prologue
561 style and the epilogue style, which are identified by "=" and "+="
562 respectively.
563 .PP
564 For a prologue-style alias, the statement block that follows the alias
565 definition is implicitly added as a prologue to any probe that refers
566 to the alias. For an epilogue-style alias, the statement block
567 that follows the alias definition is implicitly added as an epilogue to
568 any probe that refers to the alias. For example:
569
570 .SAMPLE
571 probe syscall.read = kernel.function("sys_read") {
572 fildes = $fd
573 if (execname() == "init") next # skip rest of probe
574 }
575 .ESAMPLE
576 defines a new probe point
577 .nh
578 .IR syscall.read ,
579 .hy
580 which expands to
581 .nh
582 .IR kernel.function("sys_read") ,
583 .hy
584 with the given statement as a prologue, which is useful to predefine
585 some variables for the alias user and/or to skip probe processing
586 entirely based on some conditions. And
587 .SAMPLE
588 probe syscall.read += kernel.function("sys_read") {
589 if (tracethis) println ($fd)
590 }
591 .ESAMPLE
592 defines a new probe point with the given statement as an epilogue, which
593 is useful to take actions based upon variables set or left over by
594 the alias user.
595
596 An alias is used just like a built-in probe type.
597 .SAMPLE
598 probe syscall.read {
599 printf("reading fd=%d\n", fildes)
600 if (fildes > 10) tracethis = 1
601 }
602 .ESAMPLE
603
604 .SS FUNCTIONS
605 Systemtap scripts may define subroutines to factor out common work.
606 Functions take any number of scalar (integer or string) arguments, and
607 must return a single scalar (integer or string). An example function
608 declaration looks like this:
609 .SAMPLE
610 function thisfn (arg1, arg2) {
611 return arg1 + arg2
612 }
613 .ESAMPLE
614 Note the general absence of type declarations, which are instead
615 inferred by the translator. However, if desired, a function
616 definition may include explicit type declarations for its return value
617 and/or its arguments. This is especially helpful for embedded-C
618 functions. In the following example, the type inference engine need
619 only infer the type of arg2 (a string).
620 .SAMPLE
621 function thatfn:string (arg1:long, arg2) {
622 return sprint(arg1) . arg2
623 }
624 .ESAMPLE
625 Functions may call others or themselves
626 recursively, up to a fixed nesting limit. This limit is defined by
627 a macro in the translated C code and is in the neighbourhood of 10.
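.PP
Recursion can be sketched as follows. The function name is hypothetical,
and the argument is kept small so that the call depth stays below the
nesting limit:
.SAMPLE
function factorial:long (n:long) {
  if (n <= 1) return 1
  return n * factorial (n - 1)
}
probe begin {
  printf ("5! = %d\\n", factorial (5))
  exit ()
}
.ESAMPLE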
628
629 .SS PRINTING
630 There is a set of function names that are specially treated by the
631 translator. They format values for printing to the standard systemtap
632 output stream in a more convenient way. The
633 .IR sprint*
634 variants return the formatted string instead of printing it.
635 .TP
636 .BR print ", " sprint
637 Print one or more values of any type, concatenated directly together.
638 .TP
639 .BR println ", " sprintln
640 Print values like
641 .IR print " and " sprint ,
642 but also append a newline.
643 .TP
644 .BR printd ", " sprintd
645 Take a string delimiter and two or more values of any type, and print the
646 values with the delimiter interposed. The delimiter must be a literal
647 string constant.
648 .TP
649 .BR printdln ", " sprintdln
650 Print values with a delimiter like
651 .IR printd " and " sprintd ,
652 but also append a newline.
653 .TP
654 .BR printf ", " sprintf
655 Take a formatting string and a number of values of corresponding types,
656 and print them all. The format must be a literal string constant.
657 .PP
658 The
659 .IR printf
660 formatting directives are similar to those of C, except that they are
661 fully type-checked by the translator:
662 .RS
663 .TP
664 %b
665 Writes a binary blob of the value given, instead of ASCII text. The width specifier determines the number of bytes to write; valid specifiers are %b %1b %2b %4b %8b. Default (%b) is 8 bytes.
666 .TP
667 %c
668 Character.
669 .TP
670 %d,%i
671 Signed decimal.
672 .TP
673 %m
674 Safely reads kernel memory at the given address, outputs its content. The precision specifier determines the number of bytes to read. Default is 1 byte.
675 .TP
676 %M
677 Same as %m, but outputs in hexadecimal. The precision specifier determines the number of hexadecimal digits to output. Default is 1 digit.
678 .TP
679 %o
680 Unsigned octal.
681 .TP
682 %p
683 Unsigned pointer address.
684 .TP
685 %s
686 String.
687 .TP
688 %u
689 Unsigned decimal.
690 .TP
691 %x
692 Unsigned hex value, in all lower-case.
693 .TP
694 %X
695 Unsigned hex value, in all upper-case.
696 .TP
697 %%
698 Writes a %.
699 .RE
700 .PP
701 Examples:
702 .SAMPLE
703 a = "alice", b = "bob", p = 0x1234abcd, i = 123, j = -1, id[a] = 1234, id[b] = 4567
704 print("hello")
705 Prints: hello
706 println(b)
707 Prints: bob\\n
708 println(a . " is " . sprint(16))
709 Prints: alice is 16\\n
710 foreach (name in id) printdln("|", strlen(name), name, id[name])
711 Prints: 5|alice|1234\\n3|bob|4567\\n
712 printf("%c is %s; %x or %X or %p; %d or %u\\n",97,a,p,p,p,j,j)
713 Prints: a is alice; 1234abcd or 1234ABCD or 0x1234abcd; -1 or 18446744073709551615\\n
714 printf("2 bytes of kernel buffer at address %p: %2m", p, p)
715 Prints: 2 bytes of kernel buffer at address 0x1234abcd: <binary data>
716 printf("%4b", p)
717 Prints (these values as binary data): 0x1234abcd
718 .ESAMPLE
719
720 .SS STATISTICS
721 It is often desirable to collect statistics in a way that avoids the
722 penalties of repeatedly taking exclusive locks on the global variables those
723 numbers are being put into. Systemtap provides a solution using a
724 special operator to accumulate values, and several pseudo-functions to
725 extract the statistical aggregates.
726 .PP
727 The aggregation operator is
728 .IR <<< ,
729 and resembles an assignment, or a C++ output-streaming operation.
730 The left operand specifies a scalar or array-index lvalue, which must
731 be declared global. The right operand is a numeric expression. The
732 meaning is intuitive: add the given number to the pile of numbers to
733 compute statistics of. (The specific list of statistics to gather
734 is given separately, by the extraction functions.)
735 .SAMPLE
736 foo <<< 1
737 stats[pid()] <<< memsize
738 .ESAMPLE
739 .PP
740 The extraction functions are also special. For each appearance of a
741 distinct extraction function operating on a given identifier, the
742 translator arranges to compute a set of statistics that satisfy it.
743 The statistics system is thereby "on-demand". Each execution of
744 an extraction function causes the aggregation to be computed for
745 that moment across all processors.
746 .PP
747 Here is the set of extractor functions. The first argument of each is
748 the same style of lvalue used on the left hand side of the accumulate
749 operation. The
750 .IR @count(v) ", " @sum(v) ", " @min(v) ", " @max(v) ", " @avg(v)
751 extractor functions compute the number/total/minimum/maximum/average
752 of all accumulated values. The resulting values are all simple
753 integers.
754 .PP
755 Histograms are also available, but are more complicated because they
756 have a vector rather than scalar value.
757 .I @hist_linear(v,start,stop,interval)
758 represents a linear histogram from "start" to "stop" by increments
759 of "interval". The interval must be positive. Similarly,
760 .I @hist_log(v)
761 represents a base-2 logarithmic histogram. Printing a histogram
762 with the
763 .I print
764 family of functions renders a histogram object as a tabular
765 "ASCII art" bar chart.
766 .SAMPLE
767 probe foo {
768 x <<< $value
769 }
770 probe end {
771 printf ("avg %d = sum %d / count %d\\n",
772 @avg(x), @sum(x), @count(x))
773 print (@hist_log(x))
774 }
775 .ESAMPLE
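.PP
A linear histogram can be sketched in the same way. This example assumes
the timer.ms and timer.s probe points and the gettimeofday_us() tapset
function, which are documented in the stapprobes and stapfuncs manual
pages; the sampled values are synthetic and only serve to fill the buckets:
.SAMPLE
global samples
probe timer.ms(10) {
  samples <<< gettimeofday_us() % 100   # a value between 0 and 99
}
probe timer.s(2) {
  printf ("count %d, min %d, max %d, avg %d\\n",
          @count(samples), @min(samples), @max(samples), @avg(samples))
  print (@hist_linear(samples, 0, 100, 10))
  exit ()
}
.ESAMPLE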
776
777 .SS TYPECASTING
778 Once a pointer has been saved into a script integer variable, the
779 translator loses the type information necessary to access members from
780 that pointer. Using the
781 .I @cast()
782 operator tells the translator how to read a pointer.
783 .SAMPLE
784 @cast(p, "type_name"[, "module"])->member
785 .ESAMPLE
786 .PP
787 This will interpret
788 .I p
789 as a pointer to a struct/union named
790 .I type_name
791 and dereference the
792 .I member
793 value. The optional
794 .I module
795 tells the translator where to look for information about that type.
796 Multiple modules may be specified as a list with
797 .IR :
798 separators. If the module is not specified, it will default either to
799 the probe module for dwarf probes, or to "kernel" for functions and all
800 other probe types.
801 .PP
802 The translator can create its own module with type information from a header
803 surrounded by angle brackets, in case normal debuginfo is not available. For
804 kernel headers, prefix it with "kernel" to use the appropriate build system.
805 All other headers are built with default GCC parameters into a user module.
806 .SAMPLE
807 @cast(tv, "timeval", "<sys/time.h>")->tv_sec
808 @cast(task, "task_struct", "kernel<linux/sched.h>")->tgid
809 .ESAMPLE
810 .PP
811 When in guru mode, the translator will also allow scripts to assign new
812 values to members of typecasted pointers.
813 .PP
814 Typecasting is also useful in the case of
815 .I void*
816 members whose type may be determinable at runtime.
817 .SAMPLE
818 probe foo {
819 if ($var->type == 1) {
820 value = @cast($var->data, "type1")->bar
821 } else {
822 value = @cast($var->data, "type2")->baz
823 }
824 print(value)
825 }
826 .ESAMPLE
827
828 .SS EMBEDDED C
829 When in guru mode, the translator accepts embedded code in the
830 script. Such code is enclosed between
831 .IR %{
832 and
833 .IR %}
834 markers, and is transcribed verbatim, without analysis, in some
835 sequence, into the generated C code. At the outermost level, this may
836 be useful to add
837 .IR #include
838 instructions, and any auxiliary definitions for use by other embedded
839 code.
840 .PP
841 The other place where embedded code is permitted is as a function body.
842 In this case, the script language body is replaced entirely by a piece
843 of C code enclosed again between
844 .IR %{ " and " %}
845 markers.
846 This C code may do anything reasonable and safe. There are a number
847 of undocumented but complex safety constraints on atomicity,
848 concurrency, resource consumption, and run time limits, so this
849 is an advanced technique.
850 .PP
851 The memory locations set aside for input and output values
852 are made available to it using a macro
853 .IR THIS .
854 Here are some examples:
855 .SAMPLE
856 function add_one (val) %{
857 THIS\->__retvalue = THIS\->val + 1;
858 %}
859 function add_one_str (val) %{
860 strlcpy (THIS\->__retvalue, THIS\->val, MAXSTRINGLEN);
861 strlcat (THIS\->__retvalue, "one", MAXSTRINGLEN);
862 %}
863 .ESAMPLE
864 The function argument and return value types have to be inferred by
865 the translator from the call sites in order for this to work. The
866 user should examine C code generated for ordinary script-language
867 functions in order to write compatible embedded-C ones.
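.PP
The two uses of embedded code can be combined, as in the following
sketch. It must be run in guru mode. The function name is hypothetical,
and the choice of header and of the kernel's jiffies counter is only an
illustration of reading a value without side effects:
.SAMPLE
%{
#include <linux/jiffies.h>
%}

function get_jiffies:long () %{ /* pure */
  THIS\->__retvalue = jiffies;
%}

probe begin {
  printf ("jiffies = %d\\n", get_jiffies ())
  exit ()
}
.ESAMPLE
Declaring the return type explicitly means the translator does not have
to infer it from the call site.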
868
869 .SS BUILT-INS
870 A set of builtin functions and probe point aliases are provided
871 by the scripts installed under the
872 .nh
873 .IR @prefix@/share/systemtap/tapset
874 .hy
875 directory. These are described in the
876 .IR stapfuncs "(3stap) and " stapprobes (3stap)
877 manual pages.
878
879 .SH PROCESSING
880 The translator begins pass 1 by parsing the given input script,
881 and all scripts (files named
882 .IR *.stp )
883 found in a tapset directory. The directories listed
884 with
885 .BR \-I
886 are processed in sequence, each processed in "guru mode". For each
887 directory, a number of subdirectories are also searched. These
888 subdirectories are derived from the selected kernel version (the
889 .BR \-r
890 option),
891 in order to allow more kernel-version-specific scripts to override less
892 specific ones. For example, for a kernel version
893 .IR 2.6.12\-23.FC3
894 the following patterns would be searched, in sequence:
895 .IR 2.6.12\-23.FC3/*.stp ,
896 .IR 2.6.12/*.stp ,
897 .IR 2.6/*.stp ,
898 and finally
899 .IR *.stp .
900 Stopping the translator after pass 1 causes it to print the parse trees.
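.PP
A minimal sketch of inspecting pass 1, assuming a hypothetical tapset
directory named
.I mytapsets
under the current directory, with version-specific subdirectories laid
out as described above:
.SAMPLE
# stop after parsing; scripts under ./mytapsets are parsed too
stap \-p1 \-I ./mytapsets \-e 'probe begin { exit () }'
.ESAMPLE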
901
902 .PP
903 In pass 2, the translator analyzes the input script to resolve symbols
904 and types. References to variables, functions, and probe aliases that
905 are unresolved internally are satisfied by searching through the
906 parsed tapset scripts. If any tapset script is selected because it
907 defines an unresolved symbol, then the entirety of that script is
908 added to the translator's resolution queue. This process iterates
909 until all symbols are resolved and a subset of tapset scripts is
910 selected.
911 .PP
912 Next, all probe point descriptions are validated
913 against the wide variety supported by the translator. Probe points that
914 refer to code locations ("synchronous probe points") require the
915 appropriate kernel debugging information to be installed. In the
916 associated probe handlers, target-side variables (whose names begin
917 with "$") are found and have their run-time locations decoded.
918 .PP
919 Next, all probes and functions are analyzed for optimization
920 opportunities, in order to remove variables, expressions, and
921 functions that have no useful value and no side-effect. Embedded-C
922 functions are assumed to have side-effects unless they include the
923 magic string
924 .BR /*\ pure\ */ .
925 Since this optimization can hide latent code errors such as type
926 mismatches or invalid $target variables, it may sometimes be useful
927 to disable the optimizations with the
928 .BR \-u
929 option.
930 .PP
931 Finally, all variable, function, parameter, array, and index types are
932 inferred from context (literals and operators). Stopping the
933 translator after pass 2 causes it to list all the probes, functions,
934 and variables, along with all inferred types. Any inconsistent or
935 unresolved types cause an error.
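.PP
The effect of the optimizations can be examined by stopping after
pass 2 with and without
.BR \-u .
The script below is only an illustrative assumption; its local
variable is never read, so it is a candidate for removal.
.SAMPLE
# optimized: the unused assignment may be elided
stap \-p2 \-e 'probe begin { unused = 1; exit () }'
# unoptimized: the variable should remain in the listing
stap \-u \-p2 \-e 'probe begin { unused = 1; exit () }'
.ESAMPLE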
936
937 .PP
938 In pass 3, the translator writes C code that represents the actions
939 of all selected script files, and creates a
940 .IR Makefile
941 to build that into a kernel object. These files are placed into a
942 temporary directory. Stopping the translator at this point causes
943 it to print the contents of the C file.
944
945 .PP
946 In pass 4, the translator invokes the Linux kernel build system to
947 create the actual kernel object file. This involves running
948 .IR make
949 in the temporary directory, and requires a kernel module build
950 system (headers, config and Makefiles) to be installed in the usual
951 spot
952 .IR /lib/modules/VERSION/build .
953 Stopping the translator after pass 4 is the last chance before
954 running the kernel object. This may be useful if you want to
955 archive the file.
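.PP
For example, a module might be built once and run later. This is a
sketch only: the module name
.I mymod
and the file
.I script.stp
are hypothetical, and it assumes that
.BR \-m
leaves the named module in the current directory.
.SAMPLE
# build only (passes 1 through 4), keeping mymod.ko
stap \-p4 \-m mymod script.stp
# later, load and run the archived module (pass 5 by hand)
staprun mymod.ko
.ESAMPLE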
956
957 .PP
958 In pass 5, the translator invokes the systemtap auxiliary program
959 .I staprun
960 for the given kernel object. This program arranges to load
961 the module, then communicates with it, copying trace data from the
962 kernel into temporary files, until the user sends an interrupt signal.
963 Any run-time error encountered by the probe handlers, such as running
964 out of memory, division by zero, or exceeding nesting or runtime limits,
965 results in a soft error indication. Soft errors in excess of
966 MAXERRORS block all subsequent probes and terminate the session.
967 Finally,
968 .I staprun
969 unloads the module, and cleans up.
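.PP
A sketch of the soft error limit in action, assuming the
.I error()
tapset function described in
.IR stapfuncs (3stap)
is available. Each timer tick raises one soft error; the session
should end once the count exceeds the raised limit.
.SAMPLE
stap \-DMAXERRORS=3 \-e 'probe timer.ms(500) { error ("sample error") }'
.ESAMPLE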
970
971 .SH EXAMPLES
972 See the
973 .IR stapex (3stap)
974 manual page for a collection of samples.
975
976 .SH CACHING
977 The systemtap translator caches the pass 3 output (the generated C
978 code) and the pass 4 output (the compiled kernel module) if pass 4
979 completes successfully. This cached output is reused if the same
980 script is translated again assuming the same conditions exist (same kernel
981 version, same systemtap version, etc.). Cached files are stored in
982 the
983 .I $SYSTEMTAP_DIR/cache
984 directory. The cache size can be limited by placing a file named
985 .I cache_mb_limit
986 in the cache directory (shown above), containing only an ASCII
987 integer giving the number of MiB the cache should not exceed. Note that
988 this is a 'soft' limit in that the cache will be cleaned after a new entry
989 is added, so the total cache size may temporarily exceed this limit. In the
990 absence of this file, a default will be created with the limit set to 64MiB.
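.PP
For instance, assuming the default data directory
.I ~/.systemtap
is in use and the cache directory already exists, the limit could be
raised to 128 MiB (an arbitrary value chosen for illustration):
.SAMPLE
echo 128 > ~/.systemtap/cache/cache_mb_limit
.ESAMPLE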
991
992 .SH SAFETY AND SECURITY
993 Systemtap is an administrative tool. It exposes kernel internal data
994 structures and potentially private user information.
995 Loading a module into the kernel
996 requires elevated privileges.
997
998 To actually run the kernel objects it builds, a user must be one of
999 the following:
1000 .IP \(bu 4
1001 the root user;
1002 .IP \(bu 4
1003 a member of the
1004 .I stapdev
1005 group; or
1006 .IP \(bu 4
1007 a member of the
1008 .I stapusr
1009 group. Members of the
1010 .I stapusr
1011 group can only use modules located in
1012 the /lib/modules/VERSION/systemtap directory. This directory
1013 must be owned by root and not be world writable.
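.PP
As an illustration only, an administrator might grant a trusted user
access as sketched below. The account name
.I alice
and the module
.I mymod.ko
are hypothetical;
.IR usermod (8)
is the usual Linux tool for group changes, not a part of systemtap.
.SAMPLE
# let alice run pre-built modules only
usermod \-a \-G stapusr alice
# install a vetted module where stapusr members may load it
mkdir \-p /lib/modules/$(uname \-r)/systemtap
cp mymod.ko /lib/modules/$(uname \-r)/systemtap/
# alice can then run it by name
staprun mymod
.ESAMPLE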
1014 .PP
1015 The kernel modules generated by the
1016 .I stap
1017 program are run by the
1018 .IR staprun
1019 program. The latter is a part of the Systemtap package, dedicated to
1020 module loading and unloading (but only in the white zone), and
1021 kernel-to-user data transfer. Since
1022 .IR staprun
1023 does not perform any additional security checks on the kernel objects
1024 it is given, it would be unwise for a system administrator to add
1025 untrusted users to the
1026 .I stapdev
1027 or
1028 .I stapusr
1029 groups.
1030 .PP
1031 The translator asserts certain safety constraints. It aims to ensure
1032 that no handler routine can run for very long, allocate memory,
1033 perform unsafe operations, or unintentionally interfere with the
1034 kernel. Use of script global variables is suitably locked to protect
1035 against manipulation by concurrent probe handlers. Use of guru mode
1036 constructs such as embedded C can violate these constraints, leading
1037 to a kernel crash or data corruption.
1038 .PP
1039 The resource use limits are set by macros in the generated C code.
1040 These may be overridden with the
1041 .BR \-D
1042 flag. A selection of these is as follows:
1043 .TP
1044 MAXNESTING
1045 Maximum number of recursive function call levels, default 10.
1046 .TP
1047 MAXSTRINGLEN
1048 Maximum length of strings, default 128.
1049 .TP
1050 MAXTRYLOCK
1051 Maximum number of iterations to wait for locks on global variables
1052 before declaring possible deadlock and skipping the probe, default 1000.
1053 .TP
1054 MAXACTION
1055 Maximum number of statements to execute during any single probe hit
1056 (with interrupts disabled),
1057 default 1000.
1058 .TP
1059 MAXACTION_INTERRUPTIBLE
1060 Maximum number of statements to execute during any single probe hit
1061 which is executed with interrupts enabled (such as begin/end probes),
1062 default (MAXACTION * 10).
1063 .TP
1064 MAXMAPENTRIES
1065 Maximum number of rows in any single global array, default 2048.
1066 .TP
1067 MAXERRORS
1068 Maximum number of soft errors before an exit is triggered, default 0, which
1069 means that the first error will exit the script.
1070 .TP
1071 MAXSKIPPED
1072 Maximum number of skipped probes before an exit is triggered, default 100.
1073 Running systemtap with \-t (timing) mode gives more details about skipped
1074 probes. With the default \-DINTERRUPTIBLE=1 setting, probes skipped due to
1075 reentrancy are not accumulated against this limit.
1076 .TP
1077 MINSTACKSPACE
1078 Minimum number of free kernel stack bytes required in order to
1079 run a probe handler, default 1024. This number should be large enough
1080 for the probe handler's own needs, plus a safety margin.
1081 .TP
1082 MAXUPROBES
1083 Maximum number of concurrently armed user-space probes (uprobes), default
1084 100 times the number of user-space probe points named in the script. This
1085 pool is large because individual uprobe objects are allocated for each
1086 process for each script-level probe.
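.PP
A hedged example of overriding several limits at once; the values and
the file name
.I script.stp
are arbitrary placeholders:
.SAMPLE
stap \-DMAXACTION=10000 \-DMAXMAPENTRIES=10000 \-DMAXSTRINGLEN=256 script.stp
.ESAMPLE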
1087
1088 .PP
1089 With scripts that contain probes on any interrupt path, it is possible that
1090 those interrupts may occur in the middle of another probe handler. The probe
1091 in the interrupt handler would be skipped in this case to avoid reentrance.
1092 To work around this issue, execute stap with the option
1093 .BR \-DINTERRUPTIBLE=0
1094 to mask interrupts throughout the probe handler. This does add some extra
1095 overhead to the probes, but it may prevent reentrance for common problem
1096 cases. However, probes in NMI handlers and in the callpath of the stap
1097 runtime may still be skipped due to reentrance.
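.PP
For example (the script name is a placeholder), timing mode can be
combined with this setting to see whether probes are still skipped:
.SAMPLE
stap \-t \-DINTERRUPTIBLE=0 script.stp
.ESAMPLE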
1098
1099 .PP
1100 Multiple scripts can write data into a relay buffer concurrently. A host
1101 script provides an interface for accessing its relay buffer to guest scripts.
1102 The output of the guests is then merged into the output of the host.
1103 To run a script as a host, execute stap with the
1104 .BR \-DRELAYHOST[=name]
1105 option. The
1106 .BR name
1107 identifies your host script among several hosts.
1108 While running the host, execute stap with
1109 .BR \-DRELAYGUEST[=name]
1110 to add a guest script to the host.
1111 Note that you must unload guests before unloading a host. If any
1112 guests are still connected to the host, unloading the host will fail.
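.PP
A sketch of one host with one guest, using the hypothetical name
.I trace1
and placeholder script files, each command run in its own terminal:
.SAMPLE
# terminal 1: start the host first
stap \-DRELAYHOST=trace1 host.stp
# terminal 2: attach a guest; its output is merged into the host's
stap \-DRELAYGUEST=trace1 guest.stp
# stop the guest before stopping the host
.ESAMPLE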
1113
1114 .PP
1115 In case something goes wrong with
1116 .IR stap " or " staprun
1117 after a probe has already started running, one may safely kill both
1118 user processes, and remove the active probe kernel module with
1119 .IR rmmod .
1120 Any pending trace messages may be lost.
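.PP
One possible recovery sequence is sketched below. It assumes the
module name begins with
.IR stap_ ,
which may not hold if the module was named with
.BR \-m ;
.I MODULENAME
stands for whatever
.IR lsmod (8)
reports.
.SAMPLE
killall stap staprun
lsmod | grep stap_
rmmod MODULENAME
.ESAMPLE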
1121
1122 .PP
1123 In addition to the methods outlined above, the generated kernel module
1124 also uses overload processing to make sure that probes can't run for
1125 too long. If more than STP_OVERLOAD_THRESHOLD cycles (default
1126 500000000) have been spent in all the probes on a single cpu during
1127 the last STP_OVERLOAD_INTERVAL cycles (default 1000000000), the probes
1128 have overloaded the system and an exit is triggered.
1129 .PP
1130 By default, overload processing is turned on for all modules. If you
1131 would like to disable overload processing, define STP_NO_OVERLOAD.
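.PP
With the defaults, the threshold amounts to probes consuming more
than 500000000 / 1000000000 = 50% of the cycles in an interval on one
cpu. The commands below are a sketch with a placeholder script; the
second assumes that the threshold macro can be overridden with
.BR \-D
like the limits above.
.SAMPLE
# disable overload processing entirely
stap \-DSTP_NO_OVERLOAD script.stp
# or tolerate up to 80% of the default interval
stap \-DSTP_OVERLOAD_THRESHOLD=800000000 script.stp
.ESAMPLE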
1132
1133 .SH MAKING DO WITH SYMBOL TABLES
1134 Systemtap performs best when it has access to the debugging information
1135 associated with your kernel and modules.
1136 However, if this information is not available,
1137 systemtap can still support probing of function entries and returns
1138 using symbols read from vmlinux and/or the modules in /lib/modules.
1139 Systemtap can also read the kernel symbol table from a text file
1140 such as /boot/System.map or /proc/kallsyms.
1141 See the
1142 .B \-\-kelf
1143 and
1144 .B \-\-kmap
1145 options.
1146 .PP
1147 If systemtap finds relevant debugging information,
1148 it will use it even if you specify
1149 .B \-\-kelf
1150 or
1151 .BR \-\-kmap .
1152 .PP
1153 Without debugging information, systemtap cannot support the
1154 following types of language constructs:
1155 .IP \(bu 4
1156 probe specifications that refer to source files or line numbers
1157 .IP \(bu 4
1158 probe specifications that refer to inline functions
1159 .IP \(bu 4
1160 statements that refer to $target variables
1161 .IP \(bu 4
1162 statements that refer to @cast() variables
1163 .IP \(bu 4
1164 tapset-defined variables defined using any of the above constructs.
1165 In particular, at this writing,
1166 the prologue blocks for certain aliases in the syscall tapset
1167 (e.g., syscall.open) contain "if" statements that refer to $target variables.
1168 If your script refers to any such aliases,
1169 systemtap must have access to the kernel's debugging information.
1170 .PP
1171 Most T and t symbols correspond to function entry points, but some do not.
1172 Based only on the symbol table, systemtap cannot tell the difference.
1173 Placing return probes on symbols that aren't entry points
1174 will most likely lead to kernel stack corruption.
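.PP
As a sketch, a function-entry probe that avoids $target variables
could be run from symbol data alone. The probed function
.I sys_open
is an assumption about the running kernel.
.SAMPLE
# symbols from the vmlinux and module ELF files
stap \-\-kelf \-e 'probe kernel.function("sys_open") { log ("open") }'
# symbols from a text file
stap \-\-kmap=/proc/kallsyms \-e 'probe kernel.function("sys_open") { log ("open") }'
.ESAMPLE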
1175
1176 .SH FILES
1177 .\" consider autoconf-substituting these directories
1178 .TP
1179 ~/.systemtap
1180 Systemtap data directory for cached systemtap files, unless overridden
1181 by the
1182 .I SYSTEMTAP_DIR
1183 environment variable.
1184 .TP
1185 /tmp/stapXXXXXX
1186 Temporary directory for systemtap files, including translated C code
1187 and kernel object.
1188 .TP
1189 @prefix@/share/systemtap/tapset
1190 The automatic tapset search directory, unless overridden by
1191 the
1192 .I SYSTEMTAP_TAPSET
1193 environment variable.
1194 .TP
1195 @prefix@/share/systemtap/runtime
1196 The runtime sources, unless overridden by the
1197 .I SYSTEMTAP_RUNTIME
1198 environment variable.
1199 .TP
1200 /lib/modules/VERSION/build
1201 The location of kernel module building infrastructure.
1202 .TP
1203 @prefix@/lib/debug/lib/modules/VERSION
1204 The location of kernel debugging information when packaged into the
1205 .IR kernel\-debuginfo
1206 RPM, unless overridden by the
1207 .I SYSTEMTAP_DEBUGINFO_PATH
1208 environment variable. The default value for this variable is
1209 .IR \+:.debug:/usr/lib/debug:build .
1210 Elfutils searches for vmlinux in this path, and interprets the path as a base
1211 directory whose various subdirectories are searched for modules.
1212 .TP
1213 @prefix@/bin/staprun
1214 The auxiliary program supervising module loading, interaction, and
1215 unloading.
1216
1217 .SH SEE ALSO
1218 .IR stapprobes (3stap),
1219 .IR stapfuncs (3stap),
1220 .IR stapvars (3stap),
1221 .IR stapex (3stap),
1222 .IR awk (1),
1223 .IR gdb (1)
1224
1225 .SH BUGS
1226 Use the Bugzilla link on the project web page, or our mailing list.
1227 .nh
1228 .BR http://sources.redhat.com/systemtap/ , <systemtap@sources.redhat.com> .
1229 .hy