1 .\" -*- nroff -*-
2 .TH STAP 1
3 .SH NAME
4 stap \- systemtap script translator/driver
5
6 .\" macros
7 .de SAMPLE
8 .br
9 .RS
10 .nf
11 .nh
12 ..
13 .de ESAMPLE
14 .hy
15 .fi
16 .RE
17 ..
18
19 .SH SYNOPSIS
20
21 .br
22 .B stap
23 [
24 .I OPTIONS
25 ]
26 .I FILENAME
27 [
28 .I ARGUMENTS
29 ]
30 .br
31 .B stap
32 [
33 .I OPTIONS
34 ]
35 .B \-
36 [
37 .I ARGUMENTS
38 ]
39 .br
40 .B stap
41 [
42 .I OPTIONS
43 ]
44 .BI \-e " SCRIPT"
45 [
46 .I ARGUMENTS
47 ]
48 .br
49 .B stap
50 [
51 .I OPTIONS
52 ]
53 .BI \-l " PROBE"
54 [
55 .I ARGUMENTS
56 ]
57 .br
58 .B stap
59 [
60 .I OPTIONS
61 ]
62 .BI \-L " PROBE"
63 [
64 .I ARGUMENTS
65 ]
66
67 .SH DESCRIPTION
68
69 The
70 .IR stap
71 program is the front-end to the Systemtap tool. It accepts probing
72 instructions (written in a simple scripting language), translates
73 those instructions into C code, compiles this C code, and loads the
74 resulting kernel module into a running Linux kernel to perform the
75 requested system trace/probe functions. You can supply the script in
76 a named file, from standard input, or from the command line. The
77 program runs until it is interrupted by the user, until the script
78 voluntarily invokes the
79 .I exit()
80 function, or until a sufficient number of soft errors have occurred.
81 .PP
82 The language, which is described in a later section, is strictly typed,
83 declaration free, procedural, and inspired by
84 .IR awk .
85 It allows source code points or events in the kernel to be associated
86 with handlers, which are subroutines that are executed synchronously. It is
87 somewhat similar conceptually to "breakpoint command lists" in the
88 .IR gdb
89 debugger.
90
91 .SH OPTIONS
92 The systemtap translator supports the following options. Any other option
93 prints a list of supported options.
94 .TP
95 .B \-h \-\-help
96 Show help message.
97 .TP
98 .B \-V \-\-version
99 Show version message.
100 .TP
101 .BI \-p " NUM"
102 Stop after pass NUM. The passes are numbered 1-5: parse, elaborate,
103 translate, compile, run. See the
104 .B PROCESSING
105 section for details.
106 .TP
107 .B \-v
108 Increase verbosity for all passes. Produce a larger volume of
109 informative (?) output each time the option is repeated.
110 .TP
111 .B \-\-vp ABCDE
112 Increase verbosity on a per-pass basis. For example, "\-\-vp\ 002"
113 adds 2 units of verbosity to pass 3 only. The combination "\-v\ \-\-vp\ 00004"
114 adds 1 unit of verbosity for all passes, and 4 more for pass 5.
115 .TP
116 .B \-k
117 Keep the temporary directory after all processing. This may be useful
118 in order to examine the generated C code, or to reuse the compiled
119 kernel object.
120 .TP
121 .B \-g
122 Guru mode. Enable parsing of unsafe expert-level constructs like
123 embedded C.
124 .TP
125 .B \-P
126 Prologue-searching mode. Activate heuristics to work around incorrect
127 debugging information for $target variables.
128 .TP
129 .B \-u
130 Unoptimized mode. Disable unused code elision during elaboration.
131 .TP
132 .B \-w
133 Suppressed warnings mode. Disables all warning messages.
134 .TP
135 .BI \-b
136 Use bulk mode (percpu files) for kernel-to-user data transfer.
137 .TP
138 .B \-t
139 Collect timing information on the number of times each probe executes
140 and the average amount of time spent in each probe-point. Also shows
141 the derivation for each probe-point.
142 .TP
143 .BI \-s " NUM"
144 Use NUM megabyte buffers for kernel-to-user data transfer. On a
145 multiprocessor in bulk mode, this is a per-processor amount.
146 .TP
147 .BI \-I " DIR"
148 Add the given directory to the tapset search directory. See the
149 description of pass 2 for details.
150 .TP
151 .BI \-D " NAME=VALUE"
152 Add the given C preprocessor directive to the module Makefile. These can
153 be used to override limit parameters described below.
154 .TP
155 .BI \-B " NAME=VALUE"
156 Add the given make directive to the kernel module build's make invocation.
157 These can be used to add or override kconfig options.
158 .TP
159 .BI \-G " NAME=VALUE"
160 Sets the value of global variable NAME to VALUE when staprun is invoked.
161 This applies to scalar variables declared global in the script/tapset.
162 .TP
163 .BI \-R " DIR"
164 Look for the systemtap runtime sources in the given directory.
165 .TP
166 .BI \-r " /DIR"
167 Build for kernel in given build tree. Can also be set with the
168 .I SYSTEMTAP_RELEASE
169 environment variable.
170 .TP
171 .BI \-r " RELEASE"
172 Build for kernel in build tree
173 .BR /lib/modules/RELEASE/build .
174 Can also be set with the
175 .I SYSTEMTAP_RELEASE
176 environment variable.
177 .TP
178 .BI \-m " MODULE"
179 Use the given name for the generated kernel object module, instead
180 of a unique randomized name. The generated kernel object module is
181 copied to the current directory.
182 .TP
183 .BI \-d " MODULE"
184 Add symbol/unwind information for the given module into the kernel object
185 module. This may enable symbolic tracebacks from those modules/programs,
186 even if they do not have an explicit probe placed into them.
187 .TP
188 .BI \-\-ldd
189 Add symbol/unwind information for all shared libraries suspected by
190 ldd to be necessary for user-space binaries being probed or listed with
191 the \-d option. Caution: this can make the probe modules considerably
192 larger.
193 .TP
194 .BI \-\-all\-modules
195 Equivalent to specifying "\-dkernel" and a "\-d" for each kernel module that is
196 currently loaded. Caution: this can make the probe modules considerably
197 larger.
198 .TP
199 .BI \-o " FILE"
200 Send standard output to named file. In bulk mode, percpu files will
201 start with FILE_ (FILE_cpu with \-F) followed by the cpu number.
202 This supports strftime(3) formats for FILE.
203 .TP
204 .BI \-c " CMD"
205 Start the probes, run CMD, and exit when CMD finishes. This also has the
206 effect of setting target() to the pid of the command run.
207 .TP
208 .BI \-x " PID"
209 Sets target() to PID. This allows scripts to be written that filter on
210 a specific process.
211 .TP
212 .BI \-l " PROBE"
213 Instead of running a probe script, just list all available probe
214 points matching the given single probe point. The pattern may include
215 wildcards and aliases, but not comma-separated multiple probe points.
216 The process result code will indicate failure if there are no matches.
217 .TP
218 .BI \-L " PROBE"
219 Similar to "\-l", but list probe points and script-level local variables.
220 .TP
221 .BI \-F
222 Without \-o option, load module and start probes, then detach from the module
223 leaving the probes running.
224 With \-o option, run staprun in background as a daemon and show its pid.
225 .TP
226 .BI \-S " size[,N]"
227 Sets the maximum size of output file and the maximum number of output files.
228 If the size of the output file would exceed
229 .BR size ,
230 systemtap switches output to the next file. If the number of
231 output files exceeds
232 .BR N ,
233 systemtap removes the oldest output file. You can omit the second argument.
234 .\" PR6864: disable temporarily
235 .\".TP
236 .\".B \-\-kelf
237 .\"For names and addresses of functions to probe,
238 .\"consult the symbol tables in the kernel and modules.
239 .\"This can be useful if your kernel and/or modules were compiled
240 .\"without debugging information, or the function you want to probe
241 .\"is in an assembly-language file built without debugging information.
242 .\"See the
243 .\".B "MAKING DO WITH SYMBOL TABLES"
244 .\"section for more information.
245 .\".TP
246 .\".BI \-\-kmap [=FILE]
247 .\"For names and addresses of kernel functions to probe,
248 .\"consult the symbol table in the indicated text file.
249 .\"The default is /boot/System.map\-VERSION.
250 .\"The contents of this file should be in the form of the default output from
251 .\".IR nm (1).
252 .\"Only symbols of type T or t are used.
253 .\"If you specify /proc/kallsyms or some other file in that format,
254 .\"where lines for module symbols contain a fourth column,
255 .\"reading of the symbol table stops with the first module symbol
256 .\"(which should be right after the last kernel symbol).
257 .\"As with
258 .\".BR \-\-kelf ,
259 .\"the symbol table in each module's .ko file will also be consulted.
260 .\"See the
261 .\".B "MAKING DO WITH SYMBOL TABLES"
262 .\"section for more information.
263 .\" \-\-ignore\-{vmlinux,dwarf} shouldn't be visible
264 .TP
265 .B \-\-skip\-badvars
266 Ignore out of context variables and substitute with literal 0.
267
268 .TP
269 .BI \-\-compatible " VERSION"
270 Suppress recent script language or tapset changes which are incompatible
271 with given older version of systemtap. This may be useful if a much older
272 systemtap script fails to run. See the DEPRECATION section for more
273 details.
274
275 .TP
276 .BI \-\-check\-version
277 This option is used to check if the active script has any constructs
278 that may be systemtap version specific. See the DEPRECATION section
279 for more details.
280
281 .TP
282 .BI \-\-clean\-cache
283 This option prunes stale entries from the cache directory. This is normally
284 done automatically after successful runs, but this option will trigger the
285 cleanup manually and then exit. See the CACHING section for more details about
286 cache limits.
287
288 .TP
289 .BI \-\-disable\-cache
290 This option disables all use of the cache directory. No files will be either
291 read from or written to the cache.
292
293 .TP
294 .BI \-\-poison\-cache
295 This option treats files in the cache directory as invalid. No files will be
296 read from the cache, but resulting files from this run will still be written to
297 the cache. This is meant as a troubleshooting aid when stap's caching
298 seems to be misbehaving.
299
300 .TP
301 .BI \-\-unprivileged
302 This option instructs \fIstap\fR to examine the script looking for constructs
303 which are not allowed for unprivileged users (see \fIUNPRIVILEGED USERS\fR).
304 Compilation fails if any such
305 constructs are used.
306 If this option is specified when using a compile server
307 (see \fI\-\-use\-server\fR),
308 the server will examine the script and, if compilation succeeds, the
309 server will cryptographically sign the resulting kernel module, certifying
310 that it is safe for use by unprivileged users.
311
312 If \fI\-\-unprivileged\fR has not been specified,
313 \fI\-pN\fR has not been specified with N < 5,
314 and the invoking user is not
315 \fIroot\fR, is not a member of the group \fIstapdev\fR, but is a member of the
316 group \fIstapusr\fR, then \fIstap\fR will automatically
317 add \fI\-\-unprivileged\fR to the options already specified.
318
319 .TP
320 \fB\-\-use\-server \fR[\fIHOSTNAME\fR[\fI:PORT\fR] | \fIIP_ADDRESS\fR[\fI:PORT\fR] | \fICERT_SERIAL\fR]
321 Specify compile\-server(s) to be used for compilation and/or in conjunction
322 with
323 .I \-\-list\-servers
324 and
325 .I \-\-trust\-servers
326 (see below). If no argument is
327 supplied, then the default in unprivileged mode (see
328 .IR \-\-unprivileged )
329 is to select compatible servers which are trusted as SSL peers and as
330 module signers and currently online. Otherwise the default is to select
331 compatible servers which are trusted as SSL peers
332 and currently online.
333 .I \-\-use\-server
334 may be
335 specified more than once, in which case a list of servers is accumulated
336 in the order specified. Servers may be specified by host name, IP address, or
337 by certificate serial number (obtained using
338 .IR \-\-list\-servers ).
339 The serial number form is most commonly used when revoking
340 trust in a server (see
341 .I \-\-trust\-servers
342 below). If a server is specified by host name or IP address, then an optional
343 port number may be specified. This is useful for accessing servers which are
344 not on the local network or to specify a particular server.
345
346 If \fI\-\-use\-server\fR has not been specified,
347 \fI\-pN\fR has not been specified with N < 5,
348 and the invoking user is not \fIroot\fR,
349 is not a member of the group \fIstapdev\fR, but is a member of the group
350 \fIstapusr\fR, then \fIstap\fR will automatically
351 add \fI\-\-use\-server\fR to the options already specified.
352
353 .TP
354 \fB\-\-use\-server\-on\-error \fR[\fByes\fR|\fBno\fR]
355 Instructs stap to retry compilation of a script using a compile server if
356 compilation on the local host fails in a manner which suggests that it might
357 succeed using a server.
358 If this option is not specified, the default is \fIno\fR.
359 If no argument is provided, then the default
360 is \fIyes\fR. Compilation will be retried for certain types of errors
361 (e.g. insufficient data or resources) which may not occur during
362 re\-compilation by a compile
363 server. Compile servers will be selected automatically for the
364 re\-compilation attempt as if \fI\-\-use\-server\fR was specified with no
365 arguments.
366
367 .TP
368 .BI \-\-list\-servers " [SERVERS]"
369 Display the status of the requested
370 .IR SERVERS ,
371 where
372 .I SERVERS
373 is a comma\-separated
374 list of server attributes. The list of attributes is combined to filter the
375 list of servers displayed. Supported attributes are:
376 .RS
377 .TP
378 .BI all
379 specifies all known servers (trusted SSL peers, trusted module signers, online
380 servers).
381 .TP
382 .BI specified
383 specifies servers specified using
384 .IR \-\-use\-server .
385 .TP
386 .BI online
387 filters the output by retaining information about servers which are currently
388 online.
389 .TP
390 .BI trusted
391 filters the output by retaining information about servers which are trusted as
392 SSL peers.
393 .TP
394 .BI signer
395 filters the output by retaining information about servers which are trusted as
396 module signers (see
397 .IR \-\-unprivileged ).
398 .TP
399 .BI compatible
400 filters the output by retaining information about servers which are compatible
401 with the current kernel release and architecture.
402 .RE
403 .IP
404 If no argument is provided, then the default is
405 .BR specified .
406 If no servers were specified using
407 .IR \-\-use\-server ,
408 then the default servers for
409 .IR \-\-use\-server
410 are listed.
411
412 .TP
413 .BI \-\-trust\-servers " [TRUST_SPEC]"
414 Grant or revoke trust in compile\-servers, specified using
415 .IR \-\-use\-server
416 as specified by TRUST_SPEC,
417 where TRUST_SPEC is a comma\-separated list specifying the trust which is to
418 be granted or revoked. Supported elements are:
419 .RS
420 .TP
421 .BI ssl
422 trust the specified servers as SSL peers.
423 .TP
424 .BI signer
425 trust the specified servers as module signers (see
426 .IR \-\-unprivileged ).
427 Only root can specify
428 .BR signer .
429 .TP
430 .BI all\-users
431 grant trust as an SSL peer for all users on the local host. The default is
432 to grant trust as an SSL peer for the current user only. Trust as a module
433 signer is always granted for all users. Only root can specify
434 .BR all\-users .
435 .TP
436 .BI revoke
437 revoke the specified trust. The default is to grant it.
438 .TP
439 .BI no\-prompt
440 do not prompt the user for confirmation before carrying out the requested
441 action. The default is to prompt the user for confirmation.
442 .RE
443 .IP
444 If no argument is provided, then the default is
445 .BR ssl .
446 If no servers were specified using
447 .IR \-\-use\-server ,
448 then no trust will be granted or revoked.
449 .IP
450 Unless \fBno\-prompt\fR has been specified,
451 the user will be prompted to confirm the trust to be granted or revoked before
452 the operation is performed.
453
454 .TP
455 .BI \-\-dump\-probe\-types
456 Dumps a list of supported probe types. If
457 .IR \-\-unprivileged
458 is also specified, the list will be limited to probe types available to unprivileged users.
459
460 .TP
461 .BI \-\-remote " [USER@]HOSTNAME"
462 Set the execution target to the specified ssh host, optionally using a username
463 not matching your own. This option may be repeated to target multiple
464 execution targets. Passes 1-4 are completed locally as normal to build the
465 script, and then pass 5 will copy the module to the target and run it.
466 If a custom ssh_config file is in use, add
467 .B SendEnv LANG
468 to retain internationalization functionality.
469
470 .TP
471 .BI \-\-download\-debuginfo " [OPTION]"
472 Enable, disable or set a timeout for the automatic debuginfo downloading feature
473 offered by abrt as specified by OPTION, where OPTION is one of the following:
474 .RS
475 .TP
476 .BI yes
477 enable automatic downloading of debuginfo with no timeout. This is the same
478 as not providing an OPTION value to
479 .IR \-\-download\-debuginfo .
480 .TP
481 .BI no
482 explicitly disable automatic downloading of debuginfo. This is the same as
483 not using the option at all.
484 .TP
485 .BI ask
486 show abrt output, and ask before continuing download. No timeout will be set.
487 .TP
488 .BI <timeout>
489 specify a timeout as a positive number to stop the download if it is taking
490 too long.
491 .RE
492
493 .SH ARGUMENTS
494
495 Any additional arguments on the command line are passed to the script
496 parser for substitution. See below.
497
498 .SH SCRIPT LANGUAGE
499
500 The systemtap script language resembles
501 .IR awk .
502 There are two main outermost constructs: probes and functions. Within
503 these, statements and expressions use C-like operator syntax and
504 precedence.
505
506 .SS GENERAL SYNTAX
507 Whitespace is ignored. Three forms of comments are supported:
508 .RS
509 .br
510 .BR # " ... shell style, to the end of line, except for $# and @#"
511 .br
512 .BR // " ... C++ style, to the end of line"
513 .br
514 .BR /* " ... C style ... " */
515 .RE
516 Literals are either strings enclosed in double-quotes (passing through
517 the usual C escape codes with backslashes), or integers (in decimal,
518 hexadecimal, or octal, using the same notation as in C). All strings
519 are limited in length to some reasonable value (a few hundred bytes).
520 Integers are 64-bit signed quantities, although the parser also accepts
521 (and wraps around) values above positive 2**63.
522 .PP
523 In addition, script arguments given at the end of the command line may
524 be inserted. Use
525 .B $1 ... $<NN>
526 for insertion unquoted,
527 .B @1 ... @<NN>
528 for insertion as a string literal. The number of arguments may be accessed
529 through
530 .B $#
531 (as an unquoted number) or through
532 .B @#
533 (as a quoted number). These may be used at any place a token may begin,
534 including within the preprocessing stage. Reference to an argument
535 number beyond what was actually given is an error.
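.PP
As a minimal sketch of argument substitution (the script name
.I args.stp
and the argument "hello" are hypothetical), a script invoked as
"stap args.stp hello" might contain:
.SAMPLE
probe begin {
  println($#)   # number of arguments, inserted as an integer literal
  println(@1)   # first argument, inserted as a string literal
  exit()
}
.ESAMPLE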
536
537 .SS PREPROCESSING
538 A simple conditional preprocessing stage is run as a part of parsing.
539 The general form is similar to the
540 .RB cond " ? " exp1 " : " exp2
541 ternary operator:
542 .SAMPLE
543 .BR %( " CONDITION " %? " TRUE-TOKENS " %)
544 .BR %( " CONDITION " %? " TRUE-TOKENS " %: " FALSE-TOKENS " %)
545 .ESAMPLE
546 The CONDITION is either an expression whose format is determined by its
547 first keyword, or a comparison between two string literals or two numeric
548 literals. CONDITIONs of these kinds may also be combined into alternatives
549 and conjunctions using || and && respectively.
550 Parentheses are not supported yet, so keep in mind that conjunction
551 takes precedence over alternative.
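.PP
As a sketch of how this precedence plays out (the version number and
the messages are arbitrary), the condition below is expected to be read as
arch == "ia64" || (arch == "x86_64" && kernel_v >= "2.6.18"):
.SAMPLE
%( arch == "ia64" || arch == "x86_64" && kernel_v >= "2.6.18" %?
probe begin {
  println("condition satisfied")
  exit()
}
%:
probe begin {
  println("condition not satisfied")
  exit()
}
%)
.ESAMPLE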
552 .PP
553 If the first part is the identifier
554 .BR kernel_vr " or " kernel_v
555 to refer to the kernel version number, with ("2.6.13\-1.322FC3smp") or
556 without ("2.6.13") the release code suffix, then
557 the second part is one of the six standard numeric comparison operators
558 .BR < ", " <= ", " == ", " != ", " > ", and " >= ,
559 and the third part is a string literal that contains an RPM-style
560 version-release value. The condition is deemed satisfied if the
561 version of the target kernel (as optionally overridden by the
562 .BR \-r
563 option) compares as requested to the given version string. The comparison is
564 performed by the glibc function
565 .BR strverscmp .
566 As a special case, if the operator is for simple equality
567 .RB ( == ),
568 or inequality
569 .RB ( != ),
570 and the third part contains any wildcard characters
571 .RB ( * " or " ? " or " [ "),"
572 then the expression is treated as a wildcard (mis)match as evaluated
573 by
574 .BR fnmatch .
575 .PP
576 If, on the other hand, the first part is the identifier
577 .BR arch
578 to refer to the processor architecture (as named by the kernel
579 build system ARCH/SUBARCH), then the second
580 part is one of the two string comparison operators
581 .BR == " or " != ,
582 and the third part is a string literal for matching it. This
583 comparison is a wildcard (mis)match.
584 .PP
585 Similarly, if the first part is an identifier like
586 .BR CONFIG_something
587 to refer to a kernel configuration option, then the second part is
588 .BR == " or " != ,
589 and the third part is a string literal for matching the value
590 (commonly "y" or "m"). Nonexistent or unset kernel configuration
591 options are represented by the empty string. This comparison is also
592 a wildcard (mis)match.
593 .PP
594 If the first part is the identifier
595 .BR systemtap_v ,
596 the test refers to the systemtap compatibility version, which may be
597 overridden for old scripts with the
598 .BI \-\-compatible
599 flag. The comparison operators are the same as for
600 .BR kernel_v ,
601 and the right operand is a version string. See also the DEPRECATION
602 section below.
603 .PP
604 Otherwise, the CONDITION is expected to be a comparison between two string
605 literals or two numeric literals. In this case, the arguments are the only
606 variables usable.
607 .PP
608 The TRUE-TOKENS and FALSE-TOKENS are zero or more general parser
609 tokens (possibly including nested preprocessor conditionals), and are
610 passed into the input stream if the condition is true or false, respectively. For
611 example, the following code induces a parse error unless the target
612 kernel version is newer than 2.6.5:
613 .SAMPLE
614 %( kernel_v <= "2.6.5" %? **ERROR** %) # invalid token sequence
615 .ESAMPLE
616 The following code might adapt to hypothetical kernel version drift:
617 .SAMPLE
618 probe kernel.function (
619 %( kernel_v <= "2.6.12" %? "__mm_do_fault" %:
620 %( kernel_vr == "2.6.13*smp" %? "do_page_fault" %:
621 UNSUPPORTED %) %)
622 ) { /* ... */ }
623
624 %( arch == "ia64" %?
625 probe syscall.vliw = kernel.function("vliw_widget") {}
626 %)
627 .ESAMPLE
628
629 .SS VARIABLES
630 Identifiers for variables and functions are an alphanumeric sequence,
631 and may include "_" and "$" characters. They may not start with a
632 plain digit, as in C. Each variable is by default local to the probe
633 or function statement block within which it is mentioned, and therefore
634 its scope and lifetime are limited to a particular probe or function
635 invocation.
636 .\" XXX add statistics type here once it's supported
637 .PP
638 Scalar variables are implicitly typed as either string or integer.
639 Associative arrays also have a string or integer value, and a
640 tuple of strings and/or integers serving as a key. Here are a
641 few basic expressions.
642 .SAMPLE
643 var1 = 5
644 var2 = "bar"
645 array1 [pid()] = "name" # single numeric key
646 array2 ["foo",4,i++] += 5 # vector of string/num/num keys
647 if (["hello",5,4] in array2) println ("yes") # membership test
648 .ESAMPLE
649 .PP
650 The translator performs
651 .I type inference
652 on all identifiers, including array indexes and function parameters.
653 Inconsistent type-related use of identifiers signals an error.
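.PP
For instance, the following sketch (the variable name is illustrative
only) should be rejected by the translator, because
.I count
is first assigned an integer and then used as an operand of string
concatenation:
.SAMPLE
probe begin {
  count = 5
  println(count . " items")   # "." needs strings, but count is an integer
  exit()
}
.ESAMPLE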
654 .PP
655 Variables may be declared global, so that they are shared amongst all
656 probes and live as long as the entire systemtap session. There is one
657 namespace for all global variables, regardless of which script file
658 they are found within. Concurrent access to global variables is
659 automatically protected with locks; see the
660 .B SAFETY AND SECURITY
661 section for more details. A global declaration may be written at the
662 outermost level anywhere, not within a block of code. Global
663 variables which are written but never read will be displayed
664 automatically at session shutdown. The translator will
665 infer for each its value type, and if it is used as an array, its key
666 types. Optionally, scalar globals may be initialized with a string
667 or number literal. The following declaration marks variables as global.
668 .RS
669 .BR global " var1" , " var2" , " var3=4"
670 .RE
671 .PP
672 Global variables can also be set as module options. One can do this either by
673 using the \-G option, or by first compiling the module using stap \-p4 and
674 then setting the global variables on the command line when calling staprun on
675 the module generated by stap \-p4. See
676 .IR staprun (8)
677 for more information.
678 .RS
679 .RE
680 .PP
681 Arrays are limited in size by the MAXMAPENTRIES variable -- see the
682 .B SAFETY AND SECURITY
683 section for details. Optionally, global arrays may be declared with a
684 maximum size in brackets, overriding MAXMAPENTRIES for that array only.
685 Note that this doesn't indicate the type of keys for the array, just the
686 size.
687 .RS
688 .BR global " tiny_array[10]" , " normal_array" , " big_array[50000]"
689 .RE
690 .\" XXX add statistics type here once it's supported
691
692 .SS STATEMENTS
693 Statements enable procedural control flow. They may occur within
694 functions and probe handlers. The total number of statements executed
695 in response to any single probe event is limited to some number
696 defined by a macro in the translated C code, and is in the
697 neighbourhood of 1000.
698 .TP
699 EXP
700 Execute the string- or integer-valued expression and throw away
701 the value.
702 .TP
703 .BR { " STMT1 STMT2 ... " }
704 Execute each statement in sequence in this block. Note that
705 separators or terminators are generally not necessary between statements.
706 .TP
707 .BR ;
708 Null statement, do nothing. It is useful as an optional separator between
709 statements to improve syntax-error detection and to handle certain
710 grammar ambiguities.
711 .TP
712 .BR if " (EXP) STMT1 [ " else " STMT2 ]"
713 Compare integer-valued EXP to zero. Execute STMT1 if it is non-zero;
714 otherwise execute STMT2, if present.
715 .TP
716 .BR while " (EXP) STMT"
717 While integer-valued EXP evaluates to non-zero, execute STMT.
718 .TP
719 .BR for " (EXP1; EXP2; EXP3) STMT"
720 Execute EXP1 as initialization. While EXP2 is non-zero, execute
721 STMT, then the iteration expression EXP3.
722 .TP
723 .BR foreach " (VAR " in " ARRAY [ "limit " EXP ]) STMT"
724 Loop over each element of the named global array, assigning the current
725 key to VAR. The array may not be modified within the statement.
726 By adding a single
727 .BR + " or " \-
728 operator after the VAR or the ARRAY identifier, the iteration will
729 proceed in a sorted order, by ascending or descending index or value.
730 Using the optional
731 .BR limit
732 keyword limits the number of loop iterations to EXP times. EXP is
733 evaluated once at the beginning of the loop.
734 .TP
735 .BR foreach " ([VAR1, VAR2, ...] " in " ARRAY [ "limit " EXP ]) STMT"
736 Same as above, used when the array is indexed with a tuple of keys.
737 A sorting suffix may be used on at most one VAR or ARRAY identifier.
738 .TP
739 .BR foreach " (VALUE = VAR " in " ARRAY [ "limit " EXP ]) STMT"
740 This variant of foreach saves the current value into VALUE on each
741 iteration, so it is the same as ARRAY[VAR]. This also works with a
742 tuple of keys. Sorting suffixes on VALUE have the same effect as on ARRAY.
743 .TP
744 .BR break ", " continue
745 Exit or continue the innermost enclosing loop
746 .RB ( while " or " for " or " foreach )
747 statement.
748 .TP
749 .BR return " EXP"
750 Return EXP value from enclosing function. If the function's value is
751 not taken anywhere, then a return statement is not needed, and the
752 function will have a special "unknown" type with no return value.
753 .TP
754 .BR next
755 Return now from enclosing probe handler. This is especially useful in
756 probe aliases that apply event filtering predicates.
757 .TP
758 .BR try " { STMT1 } " catch " { STMT2 }"
759 Run the statements in the first block. Upon any run-time errors, abort
760 STMT1 and start executing STMT2. Any errors in STMT2 will propagate to
761 outer try/catch blocks, if any.
762 .TP
763 .BR try " { STMT1 } " catch "(VAR) { STMT2 }"
764 Same as above, plus assign the error message to the string scalar variable VAR.
765 .TP
766 .BR delete " ARRAY[INDEX1, INDEX2, ...]"
767 Remove from ARRAY the element specified by the index tuple. The value will no
768 longer be available, and subsequent iterations will not report the element.
769 It is not an error to delete an element that does not exist.
770 .TP
771 .BR delete " ARRAY"
772 Remove all elements from ARRAY.
773 .TP
774 .BR delete " SCALAR"
775 Removes the value of SCALAR. Integers and strings are cleared to 0 and ""
776 respectively, while statistics are reset to the initial empty state.
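.PP
As a minimal sketch combining several of these statements (it assumes
that the
.I syscall.read
and
.I timer.s
probe points and the
.I execname()
and
.I sprint()
tapset functions are available; the array name is arbitrary), the
following script counts reads per process name for five seconds and
then prints the ten busiest names:
.SAMPLE
global reads

probe syscall.read {
  reads[execname()] += 1
}

probe timer.s(5) {
  # iterate by descending value, stopping after 10 entries
  foreach (name in reads\- limit 10)
    println(name . ": " . sprint(reads[name]))
  delete reads   # remove all elements
  exit()
}
.ESAMPLE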
777
778 .SS EXPRESSIONS
779 Systemtap supports a number of operators that have the same general syntax,
780 semantics, and precedence as in C and awk. Arithmetic is performed as per
781 typical C rules for signed integers. Division by zero or overflow is
782 detected and results in an error.
783 .TP
784 binary numeric operators
785 .B * / % + \- >> << & ^ | && ||
786 .TP
787 binary string operators
788 .B .
789 (string concatenation)
790 .TP
791 numeric assignment operators
792 .B = *= /= %= += \-= >>= <<= &= ^= |=
793 .TP
794 string assignment operators
795 .B = .=
796 .TP
797 unary numeric operators
798 .B + \- ! ~ ++ \-\-
799 .TP
800 binary numeric or string comparison operators
801 .B < > <= >= == !=
802 .TP
803 ternary operator
804 .RB cond " ? " exp1 " : " exp2
805 .TP
806 grouping operator
807 .BR ( " exp " )
808 .TP
809 function call
810 .RB "fn " ( "[ arg1, arg2, ... ]" )
811 .TP
812 array membership check
813 .RB exp " in " array
814 .br
815 .BR "[" exp1 ", " exp2 ", " ... "] in " array
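.PP
As a sketch of how an arithmetic error surfaces at run time (the
function name
.I ratio
is hypothetical), a division by zero inside a
.B try
block is expected to transfer control to the
.B catch
block rather than end the session:
.SAMPLE
function ratio (a, b) {
  return a / b
}

probe begin {
  try {
    println(ratio(10, 0))      # division by zero is detected here
  } catch (msg) {
    println("caught: " . msg)  # msg holds the error message
  }
  exit()
}
.ESAMPLE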
816
817 .SS PROBES
818 The main construct in the scripting language identifies probes.
819 Probes associate abstract events with a statement block ("probe
820 handler") that is to be executed when any of those events occur. The
821 general syntax is as follows:
822 .SAMPLE
823 .BR probe " PROBEPOINT [" , " PROBEPOINT] " { " [STMT ...] " }
824 .ESAMPLE
825 .PP
826 Events are specified in a special syntax called "probe points". There
827 are several varieties of probe points defined by the translator, and
828 tapset scripts may define further ones using aliases. These are
829 listed in the
830 .IR stapprobes (3stap)
831 manual pages.
832 .PP
833 The probe handler is interpreted relative to the context of each
834 event. For events associated with kernel code, this context may
835 include
836 .I variables
837 defined in the
838 .I source code
839 at that spot. These "target variables" are presented to the script as
840 variables whose names are prefixed with "$". They may be accessed
841 only if the kernel's compiler preserved them despite optimization.
842 This is the same constraint that a debugger user faces when working
843 with optimized code. Some other events have very little context.
844 See the
845 .IR stapprobes (3stap)
846 man pages to see the kinds of context variables available at each kind
847 of probe point.
848 .PP
849 New probe points may be defined using "aliases". A probe point alias
850 looks similar to a probe definition, but instead of activating a probe
851 at the given point, it just defines a new probe point name as an alias
852 to an existing one. There are two types of alias, the prologue
853 style and the epilogue style, which are identified by "=" and "+=",
854 respectively.
855 .PP
856 For prologue style alias, the statement block that follows an alias
857 definition is implicitly added as a prologue to any probe that refers
858 to the alias. While for the epilogue style alias, the statement block
859 that follows an alias definition is implicitly added as an epilogue to
860 any probe that refers to the alias. For example:
861
862 .SAMPLE
863 probe syscall.read = kernel.function("sys_read") {
864 fildes = $fd
865 if (execname() == "init") next # skip rest of probe
866 }
867 .ESAMPLE
868 defines a new probe point
869 .nh
870 .IR syscall.read ,
871 .hy
872 which expands to
873 .nh
874 .IR kernel.function("sys_read") ,
875 .hy
876 with the given statement as a prologue, which is useful to predefine
877 some variables for the alias user and/or to skip probe processing
878 entirely based on some conditions. And
879 .SAMPLE
880 probe syscall.read += kernel.function("sys_read") {
881 if (tracethis) println ($fd)
882 }
883 .ESAMPLE
884 defines a new probe point with the given statement as an epilogue, which
885 is useful to take actions based upon variables set or left over by the
886 alias user. Please note that in each case, the statements in the
887 alias handler block are treated ordinarily, so that variables assigned
888 there constitute mere initialization, not a macro substitution.
889
890 An alias is used just like a built-in probe type.
891 .SAMPLE
892 probe syscall.read {
893 printf("reading fd=%d\n", fildes)
894 if (fildes > 10) tracethis = 1
895 }
896 .ESAMPLE
897
898 .SS FUNCTIONS
899 Systemtap scripts may define subroutines to factor out common work.
900 Functions take any number of scalar (integer or string) arguments, and
901 must return a single scalar (integer or string). An example function
902 declaration looks like this:
903 .SAMPLE
904 function thisfn (arg1, arg2) {
905 return arg1 + arg2
906 }
907 .ESAMPLE
908 Note the general absence of type declarations, which are instead
909 inferred by the translator. However, if desired, a function
910 definition may include explicit type declarations for its return value
911 and/or its arguments. This is especially helpful for embedded-C
912 functions. In the following example, the type inference engine need
913 only infer the type of arg2 (a string).
914 .SAMPLE
915 function thatfn:string (arg1:long, arg2) {
916 return sprint(arg1) . arg2
917 }
918 .ESAMPLE
919 Functions may call others or themselves
920 recursively, up to a fixed nesting limit. This limit is defined by
921 a macro in the translated C code and is in the neighbourhood of 10.
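.PP
The following sketch illustrates a recursive function. The function name "fact" is a hypothetical helper chosen for illustration; its argument and return types are left to the translator's type inference.

```systemtap
# Hypothetical helper: computes n! recursively.
function fact (n) {
  if (n <= 1) {
    return 1
  } else {
    return n * fact(n - 1)
  }
}

probe begin {
  # A recursion depth of 5 stays under the nesting limit described above.
  println("5! = ", fact(5))
  exit()
}
```

A much deeper recursion would be expected to hit the nesting limit and raise a run-time error rather than consume unbounded kernel stack.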
922
923 .SS PRINTING
924 A set of function names is specially treated by the
925 translator. They format values for printing to the standard systemtap
926 output stream in a more convenient way. The
927 .IR sprint*
928 variants return the formatted string instead of printing it.
929 .TP
930 .BR print ", " sprint
931 Print one or more values of any type, concatenated directly together.
932 .TP
933 .BR println ", " sprintln
934 Print values like
935 .IR print " and " sprint ,
936 but also append a newline.
937 .TP
938 .BR printd ", " sprintd
939 Take a string delimiter and two or more values of any type, and print the
940 values with the delimiter interposed. The delimiter must be a literal
941 string constant.
942 .TP
943 .BR printdln ", " sprintdln
944 Print values with a delimiter like
945 .IR printd " and " sprintd ,
946 but also append a newline.
947 .TP
948 .BR printf ", " sprintf
949 Take a formatting string and a number of values of corresponding types,
950 and print them all. The format must be a literal string constant.
951 .PP
952 The
953 .IR printf
954 formatting directives are similar to those of C, except that they are
955 fully type-checked by the translator:
956 .RS
957 .TP
958 %b
959 Writes a binary blob of the value given, instead of ASCII text. The width specifier determines the number of bytes to write; valid specifiers are %b %1b %2b %4b %8b. Default (%b) is 8 bytes.
960 .TP
961 %c
962 Character.
963 .TP
964 %d,%i
965 Signed decimal.
966 .TP
967 %m
968 Safely reads kernel memory at the given address, outputs its content. The precision specifier determines the number of bytes to read. Default is 1 byte.
969 .TP
970 %M
971 Same as %m, but outputs in hexadecimal. The minimal size of output is double the precision specifier.
972 .TP
973 %o
974 Unsigned octal.
975 .TP
976 %p
977 Unsigned pointer address.
978 .TP
979 %s
980 String.
981 .TP
982 %u
983 Unsigned decimal.
984 .TP
985 %x
986 Unsigned hex value, in all lower-case.
987 .TP
988 %X
989 Unsigned hex value, in all upper-case.
990 .TP
991 %%
992 Writes a %.
993 .RE
994 .PP
995 Examples:
996 .SAMPLE
997 a = "alice", b = "bob", p = 0x1234abcd, i = 123, j = \-1, id[a] = 1234, id[b] = 4567
998 print("hello")
999 Prints: hello
1000 println(b)
1001 Prints: bob\\n
1002 println(a . " is " . sprint(16))
1003 Prints: alice is 16\\n
1004 foreach (name in id) printdln("|", strlen(name), name, id[name])
1005 Prints: 5|alice|1234\\n3|bob|4567\\n
1006 printf("%c is %s; %x or %X or %p; %d or %u\\n",97,a,p,p,p,j,j)
1007 Prints: a is alice; 1234abcd or 1234ABCD or 0x1234abcd; \-1 or 18446744073709551615\\n
1008 printf("2 bytes of kernel buffer at address %p: %2m", p, p)
1009 Prints: 2 bytes of kernel buffer at address 0x1234abcd: <binary data>
1010 printf("%4b", p)
1011 Prints (these values as binary data): 0x1234abcd
1012 .ESAMPLE
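.PP
As a small complete script, the sketch below contrasts a string-returning variant with a printing variant. The variable names and values are illustrative assumptions.

```systemtap
probe begin {
  name = "alice"
  n = 42
  # sprintf returns the formatted string instead of printing it.
  line = sprintf("%s has %d items (0x%x)", name, n, n)
  println(line)
  # printdln joins its values with the literal delimiter and adds a newline.
  printdln(",", name, n, strlen(name))
  exit()
}
```

With these values the first line of output should read "alice has 42 items (0x2a)" and the second "alice,42,5".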
1013
1014 .SS STATISTICS
1015 It is often desirable to collect statistics in a way that avoids the
1016 penalties of repeatedly taking exclusive locks on the global variables those
1017 numbers are being put into. Systemtap provides a solution using a
1018 special operator to accumulate values, and several pseudo-functions to
1019 extract the statistical aggregates.
1020 .PP
1021 The aggregation operator is
1022 .IR <<< ,
1023 and resembles an assignment, or a C++ output-streaming operation.
1024 The left operand specifies a scalar or array-index lvalue, which must
1025 be declared global. The right operand is a numeric expression. The
1026 meaning is intuitive: add the given number to the pile of numbers to
1027 compute statistics of. (The specific list of statistics to gather
1028 is given separately, by the extraction functions.)
1029 .SAMPLE
1030 foo <<< 1
1031 stats[pid()] <<< memsize
1032 .ESAMPLE
1033 .PP
1034 The extraction functions are also special. For each appearance of a
1035 distinct extraction function operating on a given identifier, the
1036 translator arranges to compute a set of statistics that satisfy it.
1037 The statistics system is thereby "on-demand". Each execution of
1038 an extraction function causes the aggregation to be computed for
1039 that moment across all processors.
1040 .PP
1041 Here is the set of extractor functions. The first argument of each is
1042 the same style of lvalue used on the left hand side of the accumulate
1043 operation. The
1044 .IR @count(v) ", " @sum(v) ", " @min(v) ", " @max(v) ", " @avg(v)
1045 extractor functions compute the number/total/minimum/maximum/average
1046 of all accumulated values. The resulting values are all simple
1047 integers.
1048 .PP
1049 Histograms are also available, but are more complicated because they
1050 have a vector rather than scalar value.
1051 .I @hist_linear(v,start,stop,interval)
1052 represents a linear histogram from "start" to "stop" by increments
1053 of "interval". The interval must be positive. Similarly,
1054 .I @hist_log(v)
1055 represents a base-2 logarithmic histogram. Printing a histogram
1056 with the
1057 .I print
1058 family of functions renders a histogram object as a tabular
1059 "ASCII art" bar chart.
1060 .SAMPLE
1061 probe foo {
1062 x <<< $value
1063 }
1064 probe end {
1065 printf ("avg %d = sum %d / count %d\\n",
1066 @avg(x), @sum(x), @count(x))
1067 print (@hist_log(x))
1068 }
1069 .ESAMPLE
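.PP
The sketch below is a self-contained variant that needs no kernel debugging information. It accumulates random values in place of a real measurement, then extracts several aggregates and a linear histogram. The global name "samples" and the timer intervals are assumptions chosen for illustration.

```systemtap
# Hypothetical sample data: random values stand in for a real measurement.
global samples

probe timer.ms(10) {
  samples <<< randint(100)     # accumulate a value in the range 0..99
}

probe timer.s(2) {
  exit()
}

probe end {
  # Extract min/max/avg only if at least one value was accumulated.
  if (@count(samples) > 0) {
    printdln(" ", "count", @count(samples), "min", @min(samples),
             "max", @max(samples), "avg", @avg(samples))
    print(@hist_linear(samples, 0, 100, 10))
  }
}
```

The @count guard is a precaution: extracting a minimum, maximum or average from an aggregate that never received a value is expected to raise a run-time error.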
1070
1071 .SS TYPECASTING
1072 Once a pointer has been saved into a script integer variable, the
1073 translator loses the type information necessary to access members from
1074 that pointer. Using the
1075 .I @cast()
1076 operator tells the translator how to read a pointer.
1077 .SAMPLE
1078 @cast(p, "type_name"[, "module"])\->member
1079 .ESAMPLE
1080 .PP
1081 This will interpret
1082 .I p
1083 as a pointer to a struct/union named
1084 .I type_name
1085 and dereference the
1086 .I member
1087 value. Further
1088 .IR \->subfield
1089 expressions may be appended to dereference more levels.
1090 .br
1091 NOTE:
1092 the same dereferencing operator
1093 .IR \->
1094 is used to refer to both direct containment and pointer indirection.
1095 Systemtap automatically determines which. The optional
1096 .I module
1097 tells the translator where to look for information about that type.
1098 Multiple modules may be specified as a list with
1099 .IR :
1100 separators. If the module is not specified, it will default either to
1101 the probe module for dwarf probes, or to "kernel" for functions and all
1102 other probe types.
1103 .PP
1104 The translator can create its own module with type information from a header
1105 surrounded by angle brackets, in case normal debuginfo is not available. For
1106 kernel headers, prefix it with "kernel" to use the appropriate build system.
1107 All other headers are built with default GCC parameters into a user module.
1108 Multiple headers may be specified in sequence to resolve a codependency.
1109 .SAMPLE
1110 @cast(tv, "timeval", "<sys/time.h>")\->tv_sec
1111 @cast(task, "task_struct", "kernel<linux/sched.h>")\->tgid
1112 @cast(task, "task_struct",
1113 "kernel<linux/sched.h><linux/fs_struct.h>")\->fs\->umask
1114 .ESAMPLE
1115 Values acquired by
1116 .BR @cast
1117 may be pretty-printed by the
1118 .BR $ " and " $$
1120 suffix operators, the same way as described in the CONTEXT VARIABLES
1121 section of the
1122 .IR stapprobes (3stap)
1123 manual page.
1124
1125 .PP
1126 When in guru mode, the translator will also allow scripts to assign new
1127 values to members of typecasted pointers.
1128 .PP
1129 Typecasting is also useful in the case of
1130 .I void*
1131 members whose type may be determinable at runtime.
1132 .SAMPLE
1133 probe foo {
1134 if ($var\->type == 1) {
1135 value = @cast($var\->data, "type1")\->bar
1136 } else {
1137 value = @cast($var\->data, "type2")\->baz
1138 }
1139 print(value)
1140 }
1141 .ESAMPLE
1142
1143 .SS EMBEDDED C
1144 When in guru mode, the translator accepts embedded code in the
1145 script. Such code is enclosed between
1146 .IR %{
1147 and
1148 .IR %}
1149 markers, and is transcribed verbatim, without analysis, in some
1150 sequence, into the generated C code. At the outermost level, this may
1151 be useful to add
1152 .IR #include
1153 instructions, and any auxiliary definitions for use by other embedded
1154 code.
1155 .PP
1156 Another place where embedded code is permitted is as a function body.
1157 In this case, the script language body is replaced entirely by a piece
1158 of C code enclosed again between
1159 .IR %{ " and " %}
1160 markers.
1161 This C code may do anything reasonable and safe. There are a number
1162 of undocumented but complex safety constraints on atomicity,
1163 concurrency, resource consumption, and run time limits, so this
1164 is an advanced technique.
1165 .PP
1166 The memory locations set aside for input and output values
1167 are made available to it using a macro
1168 .IR THIS .
1169 Here are some examples:
1170 .SAMPLE
1171 function add_one (val) %{
1172 THIS\->__retvalue = THIS\->val + 1;
1173 %}
1174 function add_one_str (val) %{
1175 strlcpy (THIS\->__retvalue, THIS\->val, MAXSTRINGLEN);
1176 strlcat (THIS\->__retvalue, "one", MAXSTRINGLEN);
1177 %}
1178 .ESAMPLE
1179 The function argument and return value types have to be inferred by
1180 the translator from the call sites in order for this to work. The
1181 user should examine C code generated for ordinary script-language
1182 functions in order to write compatible embedded-C ones.
1183 .PP
1184 The last place where embedded code is permitted is as an expression rvalue.
1185 In this case, the C code enclosed between
1186 .IR %{ " and " %}
1187 markers is interpreted as an ordinary expression value. It is assumed
1188 to be a normal 64-bit signed number, unless the marker
1189 .I /* string */
1190 is included, in which case it's treated as a string.
1191 .SAMPLE
1192 function add_one (val) {
1193 return val + %{ 1 %}
1194 }
1195 function add_string_two (val) {
1196 return val . %{ /* string */ "two" %}
1197 }
1198 .ESAMPLE
1199 .PP
1200 The embedded-C code may contain markers to assert optimization
1201 and safety properties.
1202 .TP
1203 .I /* pure */
1204 means that the C code has no side effects and may be elided entirely if its
1205 value is not used by script code.
1206 .TP
1207 .I /* unprivileged */
1208 means that the C code is so safe that even unprivileged users are permitted
1209 to use it.
1210 .TP
1211 .I /* myproc\-unprivileged */
1212 means that the C code is so safe that even unprivileged users are permitted
1213 to use it, provided that the target of the current probe is within the user's
1214 own process.
1215 .TP
1216 .I /* guru */
1217 means that the C code is so unsafe that a systemtap user must specify
1218 .IR \-g
1219 (guru mode) to use this.
1220 .TP
1221 .I /* string */
1222 in embedded-C expressions only, means that the expression has
1223 .I const char *
1224 type and should be treated as a string value, instead of
1225 the default long numeric.
1226
1227 .SS BUILT-INS
1228 A set of builtin functions and probe point aliases are provided
1229 by the scripts installed in the directory specified in the stappaths (7)
1230 manual page. The functions are described in the
1231 .IR stapfuncs "(3stap) and " stapprobes (3stap)
1232 manual pages.
1233
1234 .SH PROCESSING
1235 The translator begins pass 1 by parsing the given input script,
1236 and all scripts (files named
1237 .IR *.stp )
1238 found in a tapset directory. The directories listed
1239 with
1240 .BR \-I
1241 are processed in sequence, each processed in "guru mode". For each
1242 directory, a number of subdirectories are also searched. These
1243 subdirectories are derived from the selected kernel version (the
1244 .BR \-R
1245 option),
1246 in order to allow more kernel-version-specific scripts to override less
1247 specific ones. For example, for a kernel version
1248 .IR 2.6.12\-23.FC3
1249 the following patterns would be searched, in sequence:
1250 .IR 2.6.12\-23.FC3/*.stp ,
1251 .IR 2.6.12/*.stp ,
1252 .IR 2.6/*.stp ,
1253 and finally
1254 .IR *.stp .
1255 Stopping the translator after pass 1 causes it to print the parse trees.
1256
1257 .PP
1258 In pass 2, the translator analyzes the input script to resolve symbols
1259 and types. References to variables, functions, and probe aliases that
1260 are unresolved internally are satisfied by searching through the
1261 parsed tapset scripts. If any tapset script is selected because it
1262 defines an unresolved symbol, then the entirety of that script is
1263 added to the translator's resolution queue. This process iterates
1264 until all symbols are resolved and a subset of tapset scripts is
1265 selected.
1266 .PP
1267 Next, all probe point descriptions are validated
1268 against the wide variety supported by the translator. Probe points that
1269 refer to code locations ("synchronous probe points") require the
1270 appropriate kernel debugging information to be installed. In the
1271 associated probe handlers, target-side variables (whose names begin
1272 with "$") are found and have their run-time locations decoded.
1273 .PP
1274 Next, all probes and functions are analyzed for optimization
1275 opportunities, in order to remove variables, expressions, and
1276 functions that have no useful value and no side-effect. Embedded-C
1277 functions are assumed to have side-effects unless they include the
1278 magic string
1279 .BR /*\ pure\ */ .
1280 Since this optimization can hide latent code errors such as type
1281 mismatches or invalid $target variables, it sometimes may be useful
1282 to disable the optimizations with the
1283 .BR \-u
1284 option.
1285 .PP
1286 Finally, all variable, function, parameter, array, and index types are
1287 inferred from context (literals and operators). Stopping the
1288 translator after pass 2 causes it to list all the probes, functions,
1289 and variables, along with all inferred types. Any inconsistent or
1290 unresolved types cause an error.
1291
1292 .PP
1293 In pass 3, the translator writes C code that represents the actions
1294 of all selected script files, and creates a
1295 .IR Makefile
1296 to build that into a kernel object. These files are placed into a
1297 temporary directory. Stopping the translator at this point causes
1298 it to print the contents of the C file.
1299
1300 .PP
1301 In pass 4, the translator invokes the Linux kernel build system to
1302 create the actual kernel object file. This involves running
1303 .IR make
1304 in the temporary directory, and requires a kernel module build
1305 system (headers, config and Makefiles) to be installed in the usual
1306 spot
1307 .IR /lib/modules/VERSION/build .
1308 Stopping the translator after pass 4 is the last chance before
1309 running the kernel object. This may be useful if you want to
1310 archive the file.
1311
1312 .PP
1313 In pass 5, the translator invokes the systemtap auxiliary program
1314 .I staprun
1315 for the given kernel object. This program arranges to load
1316 the module then communicates with it, copying trace data from the
1317 kernel into temporary files, until the user sends an interrupt signal.
1318 Any run-time error encountered by the probe handlers, such as running
1319 out of memory, division by zero, exceeding nesting or runtime limits,
1320 results in a soft error indication. Soft errors in excess of
1321 MAXERRORS block all subsequent probes (except error-handling
1322 probes), and terminate the session. Finally,
1323 .I staprun
1324 unloads the module, and cleans up.
1325
1326 .SS ABNORMAL TERMINATION
1327
1328 One should avoid killing the stap process forcibly, for example with
1329 SIGKILL, because the stapio process (a child process of the stap
1330 process) and the loaded module may be left running on the system. If
1331 this happens, send SIGTERM or SIGINT to any remaining stapio
1332 processes, then use rmmod to unload the systemtap module.
1333
1334
1335 .SH EXAMPLES
1336 See the
1337 .IR stapex (3stap)
1338 manual page for a collection of samples.
1339
1340 .SH CACHING
1341 The systemtap translator caches the pass 3 output (the generated C
1342 code) and the pass 4 output (the compiled kernel module) if pass 4
1343 completes successfully. This cached output is reused if the same
1344 script is translated again assuming the same conditions exist (same kernel
1345 version, same systemtap version, etc.). Cached files are stored in
1346 the
1347 .I $SYSTEMTAP_DIR/cache
1348 directory. The cache can be limited by having the file
1349 .I cache_mb_limit
1350 placed in the cache directory (shown above) containing only an ASCII
1351 integer representing how many MiB the cache should not exceed. Note that
1352 this is a 'soft' limit in that the cache will be cleaned after a new entry
1353 is added, so the total cache size may temporarily exceed this limit. In the
1354 absence of this file, a default will be created with the limit set to 64MiB.
1355
1356 .SH SAFETY AND SECURITY
1357 Systemtap is an administrative tool. It exposes kernel internal data
1358 structures and potentially private user information.
1359
1360 To actually run the kernel objects it builds, a user must be one of
1361 the following:
1362 .IP \(bu 4
1363 the root user;
1364 .IP \(bu 4
1365 a member of the
1366 .I stapdev
1367 and
1368 .I stapusr
1369 groups; or
1370 .IP \(bu 4
1371 a member of the
1372 .I stapusr
1373 group.
1374 .PP
1375 The root user or a user who is a member of both the
1376 .I stapdev
1377 and
1378 .I stapusr
1379 groups can build and run any systemtap script.
1380 Members of the
1381 .I stapusr
1382 group can only use pre\-built modules under the following conditions:
1383 .IP \(bu 4
1384 The module is located in
1385 the /lib/modules/VERSION/systemtap directory. This directory
1386 must be owned by root and not be world writable.
1387 .IP \(bu 4
1388 The module has been signed by a trusted signer. Trusted signers are normally
1389 systemtap compile\-servers which sign modules when the \-\-unprivileged option is
1390 specified by the client. See the
1391 .IR stap\-server (8)
1392 manual page for more information.
1393 .PP
1394 The kernel modules generated by
1395 .I stap
1396 program are run by the
1397 .IR staprun
1398 program. The latter is a part of the Systemtap package, dedicated to
1399 module loading and unloading (but only in the white zone), and
1400 kernel-to-user data transfer. Since
1401 .IR staprun
1402 does not perform any additional security checks on the kernel objects
1403 it is given, it would be unwise for a system administrator to add
1404 untrusted users to the
1405 .I stapdev
1406 or
1407 .I stapusr
1408 groups.
1409 .PP
1410 The translator asserts certain safety constraints. It aims to ensure
1411 that no handler routine can run for very long, allocate memory,
1412 perform unsafe operations, or unintentionally interfere with the
1413 kernel. Uses of script global variables are automatically read/write
1414 locked as appropriate, to protect against manipulation by concurrent probe
1415 handlers. (Deadlocks are detected with timeouts. Use the
1416 .BR \-t
1417 flag to receive reports of excessive lock contention.) Use of guru mode
1418 constructs such as embedded C can violate these constraints, leading
1419 to kernel crash or data corruption.
1420 .PP
1421 The resource use limits are set by macros in the generated C code.
1422 These may be overridden with the
1423 .BR \-D
1424 flag. A selection of these is as follows:
1425 .TP
1426 MAXNESTING
1427 Maximum number of nested function calls. Default determined by
1428 script analysis, with a bonus 10 slots added for recursive
1429 scripts.
1430 .TP
1431 MAXSTRINGLEN
1432 Maximum length of strings, default 128.
1433 .TP
1434 MAXTRYLOCK
1435 Maximum number of iterations to wait for locks on global variables
1436 before declaring possible deadlock and skipping the probe, default 1000.
1437 .TP
1438 MAXACTION
1439 Maximum number of statements to execute during any single probe hit
1440 (with interrupts disabled),
1441 default 1000.
1442 .TP
1443 MAXACTION_INTERRUPTIBLE
1444 Maximum number of statements to execute during any single probe hit
1445 which is executed with interrupts enabled (such as begin/end probes),
1446 default (MAXACTION * 10).
1447 .TP
1448 MAXBACKTRACE
1449 Maximum number of stack frames that will be processed by the stap
1450 runtime unwinder as produced by the backtrace functions in the
1451 [u]context-unwind.stp tapsets, default 20.
1452 .TP
1453 MAXMAPENTRIES
1454 Default maximum number of rows in any single global array, default 2048.
1455 Individual arrays may be declared with a larger or smaller limit instead:
1456 .SAMPLE
1457 global big[10000],little[5]
1458 .ESAMPLE
1459 .TP
1460 MAXERRORS
1461 Maximum number of soft errors before an exit is triggered, default 0, which
1462 means that the first error will exit the script.
1463 .TP
1464 MAXSKIPPED
1465 Maximum number of skipped probes before an exit is triggered, default 100.
1466 Running systemtap with \-t (timing) mode gives more details about skipped
1467 probes. With the default \-DINTERRUPTIBLE=1 setting, probes skipped due to
1468 reentrancy are not accumulated against this limit.
1469 .TP
1470 MINSTACKSPACE
1471 Minimum number of free kernel stack bytes required in order to
1472 run a probe handler, default 1024. This number should be large enough
1473 for the probe handler's own needs, plus a safety margin.
1474 .TP
1475 MAXUPROBES
1476 Maximum number of concurrently armed user-space probes (uprobes), default
1477 somewhat larger than the number of user-space probe points named in the script.
1478 This pool needs to be potentially large because individual uprobe objects (about
1479 64 bytes each) are allocated for each process for each matching script-level probe.
1480 .TP
1481 STP_MAXMEMORY
1482 Maximum amount of memory (in kilobytes) that the systemtap module
1483 should use, default unlimited. The memory size includes the size of
1484 the module itself, plus any additional allocations. This only tracks
1485 direct allocations by the systemtap runtime. This does not track
1486 indirect allocations (as done by kprobes/uprobes/etc. internals).
1487 .TP
1488 TASK_FINDER_VMA_ENTRY_ITEMS
1489 Maximum number of VMA pages that will be tracked at runtime. This might
1490 get exhausted for system wide probes inspecting shared library variables
1491 and/or user backtraces. Defaults to 1536.
1492 .TP
1493 STP_PROCFS_BUFSIZE
1494 Size of procfs probe read buffers (in bytes). Defaults to
1495 .IR MAXSTRINGLEN .
1496 This value can be overridden on a per-procfs file basis using the
1497 procfs read probe
1498 .I .maxsize(MAXSIZE)
1499 parameter.
1500 .PP
1501 With scripts that contain probes on any interrupt path, it is possible that
1502 those interrupts may occur in the middle of another probe handler. The probe
1503 in the interrupt handler would be skipped in this case to avoid reentrance.
1504 To work around this issue, execute stap with the option
1505 .BR \-DINTERRUPTIBLE=0
1506 to mask interrupts throughout the probe handler. This does add some extra
1507 overhead to the probes, but it may prevent reentrance for common problem
1508 cases. However, probes in NMI handlers and in the callpath of the stap
1509 runtime may still be skipped due to reentrance.
1510
1511 .PP
1512 Multiple scripts can write data into a relay buffer concurrently. A host
1513 script provides an interface for accessing its relay buffer to guest scripts.
1514 Then, the output of the guests is merged into the output of the host.
1515 To run a script as a host, execute stap with
1516 .BR \-DRELAYHOST[=name]
1517 option. The
1518 .BR name
1519 identifies your host script among several hosts.
1520 While running the host, execute stap with
1521 .BR \-DRELAYGUEST[=name]
1522 to add a guest script to the host.
1523 Note that you must unload guests before unloading a host. If any
1524 guests are still connected to the host, unloading the host will fail.
1525
1526 .PP
1527 In case something goes wrong with
1528 .IR stap " or " staprun
1529 after a probe has already started running, one may safely kill both
1530 user processes, and remove the active probe kernel module with
1531 .IR rmmod .
1532 Any pending trace messages may be lost.
1533
1534 .PP
1535 In addition to the methods outlined above, the generated kernel module
1536 also uses overload processing to make sure that probes can't run for
1537 too long. If more than STP_OVERLOAD_THRESHOLD cycles (default
1538 500000000) have been spent in all the probes on a single cpu during
1539 the last STP_OVERLOAD_INTERVAL cycles (default 1000000000), the probes
1540 have overloaded the system and an exit is triggered.
1541 .PP
1542 By default, overload processing is turned on for all modules. If you
1543 would like to disable overload processing, define STP_NO_OVERLOAD (or
1544 its alias STAP_NO_OVERLOAD).
1545
1546 .SH UNPRIVILEGED USERS
1547
1548 Systemtap exposes kernel internal data
1549 structures and potentially private user information. Because of this, use of
1550 systemtap's full capabilities is restricted to root and to users who are
1551 members of the groups stapdev and stapusr.
1552
1553 However, a restricted set of systemtap's features can be made available to
1554 trusted, unprivileged users. These users are members of the group stapusr
1555 only. These users can load systemtap modules which have been compiled and
1556 certified by a trusted systemtap compile\-server. See the descriptions of the
1557 options \fI\-\-unprivileged\fR and \fI\-\-use\-server\fR. See
1558 \fIREADME.unprivileged\fR in the systemtap source code for information about
1559 setting up a trusted compile server.
1560
1561 The restrictions enforced when \fI\-\-unprivileged\fR is specified are designed
1562 to prevent unprivileged users from:
1563 .RS
1564 .IP \(bu 4
1565 harming the system maliciously.
1566 .IP \(bu 4
1567 gaining access to information which would not normally be available to an
1568 unprivileged user.
1569 .IP \(bu 4
1570 disrupting the performance of processes owned by other users of the system.
1571 Some overhead to the system in general is unavoidable since the
1572 unprivileged user's probes
1573 will be triggered at the appropriate times. What we would like to avoid is
1574 targeted interruption of another user's processes which would not normally be
1575 possible by an unprivileged user.
1576 .RE
1577
1578 .SS PROBE RESTRICTIONS
1579 An unprivileged user may only use the following probes:
1580
1581 .RS
1582 .IP \(bu 4
1583 begin, begin(n)
1584 .IP \(bu 4
1585 end, end(n)
1586 .IP \(bu 4
1587 error(n)
1588 .IP \(bu 4
1589 never
1590 .IP \(bu 4
1591 process.*, where the target process is owned by the user.
1592 .IP \(bu 4
1593 timer.{jiffies,s,sec,ms,msec,us,usec,ns,nsec}(n)*
1594 .IP \(bu 4
1595 timer.hz(n)
1596 .RE
1597
1598 .SS SCRIPTING LANGUAGE RESTRICTIONS
1599 The following scripting language features are unavailable to unprivileged users:
1600
1601 .RS
1602 .IP \(bu 4
1603 any feature enabled by the Guru Mode (-g) option.
1604 .IP \(bu 4
1605 embedded C code.
1606 .RE
1607
1608 .SS RUNTIME RESTRICTIONS
1609 The following runtime restrictions are placed upon unprivileged users:
1610
1611 .RS
1612 .IP \(bu 4
1613 Only the default runtime code (see \fI-R\fR) may be used.
1614 .IP \(bu 4
1615 Probing of processes owned by other users is not permitted.
1616 .IP \(bu 4
1617 Access of kernel memory (read and write) is not permitted.
1618 .RE
1619
1620 .SS COMMAND LINE OPTION RESTRICTIONS
1621 Some command line options provide access to features which must not be available
1622 to unprivileged users:
1623
1624 .RS
1625 .IP \(bu 4
1626 -g may not be specified.
1627 .IP \(bu 4
1628 The following options may not be used by the compile-server client:
1629 .SAMPLE
1630 -a, -B, -D, -I, -r, -R
1631 .ESAMPLE
1632 .RE
1633
1634 .SS ENVIRONMENT RESTRICTIONS
1635 The following environment variables must not be set:
1636 .SAMPLE
1637
1638 SYSTEMTAP_RUNTIME
1639 SYSTEMTAP_TAPSET
1640 SYSTEMTAP_DEBUGINFO_PATH
1641 .ESAMPLE
1642
1643 .SS TAPSET RESTRICTIONS
1644 The following built-in tapset functions are unconditionally available to unprivileged
1645 users:
1646 .SAMPLE
1647
1648 _ehostunreach:long ()
1649 _enetunreach:long ()
1650 _icmp_dest_unreach:long ()
1651 _icmp_exc_fragtime:long ()
1652 _icmp_prot_unreach:long ()
1653 _icmp_time_exceeded:long ()
1654 _MM_ANONPAGES:long()
1655 _MM_FILEPAGES:long()
1656 _net_rx_drop:long ()
1657 _rtn_broadcast:long ()
1658 _rtn_multicast:long ()
1659 _rtn_unspec:long ()
1660 _sys_pipe2_flag_str:string (f:long)
1661 AF_INET:long()
1662 cpu:long ()
1663 cputime_to_msecs:long (cputime:long)
1664 egid:long ()
1665 error (msg:string)
1666 euid:long ()
1667 execname:string ()
1668 exit ()
1669 get_cycles:long ()
1670 gettimeofday_ns:long ()
1671 GFP_KERNEL:long()
1672 gid:long ()
1673 HZ:long ()
1674 is_myproc:long ()
1675 isdigit:long(str:string)
1676 isinstr:long(s1:string,s2:string)
1677 jiffies:long ()
1678 log (msg:string)
1679 mem_page_size:long ()
1680 module_name:string ()
1681 pexecname:string ()
1682 pgrp:long ()
1683 pid:long ()
1684 pn:string ()
1685 pp:string ()
1686 ppid:long ()
1687 randint:long(n:long)
1688 registers_valid:long ()
1689 sid:long ()
1690 str_replace:string (prnt_str:string, srch_str:string, rplc_str:string)
1691 stringat:long(str:string, pos:long)
1692 strlen:long(s:string)
1693 strtol:long(str:string, base:long)
1694 substr:string(str:string,start:long, length:long)
1695 target:long ()
1696 task_utime:long ()
1697 task_stime:long ()
1698 text_str:string(input:string)
1699 text_strn:string(input:string, len:long, quoted:long)
1700 tid:long ()
1701 tokenize:string(input:string, delim:string)
1702 tz_gmtoff:long ()
1703 tz_name:string ()
1704 uid:long ()
1705 user_mode:long ()
1706 warn (msg:string)
1707 .ESAMPLE
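.PP
As a minimal sketch (the message format is only an illustration), a
script that restricts itself to the functions listed above could look
like this:
.SAMPLE
probe begin {
  # Build a message from unconditionally available functions only.
  log (sprintf ("%s: pid=%d uid=%d euid=%d",
                execname (), pid (), uid (), euid ()))
  exit ()
}
.ESAMPLE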
1708
1709 The following built-in tapset functions are available to unprivileged users
1710 only within their own processes. Scripts written by unprivileged users must test the
1711 result of the tapset function \fIis_myproc\fR and call these functions only if
1712 the result is 1. The script exits immediately if an unprivileged user calls any
1713 of these functions from a probe that fires in a process not
1714 owned by that user.
1715 .SAMPLE
1716
1717 _utrace_syscall_nr:long ()
1718 _utrace_syscall_arg:long (n:long)
1719 _utrace_syscall_return:long ()
1720 print_ubacktrace ()
1721 print_ubacktrace_brief ()
1722 print_ustack(stk:string)
1723 sprint_ubacktrace:string ()
1724 uaddr:long ()
1725 ubacktrace:string ()
1726 umodname:string (addr:long)
1727 user_char:long (addr:long)
1728 user_char_warn:long (addr:long)
1729 user_int:long (addr:long)
1730 user_int_warn:long (addr:long)
1731 user_int16:long (addr:long)
1732 user_int32:long (addr:long)
1733 user_int64:long (addr:long)
1734 user_int8:long (addr:long)
1735 user_long:long (addr:long)
1736 user_long_warn:long (addr:long)
1737 user_short:long (addr:long)
1738 user_short_warn:long (addr:long)
1739 user_string_quoted:string (addr:long)
1740 user_string_n_quoted:string (addr:long, n:long)
1741 user_string_n_warn:string (addr:long, n:long)
1742 user_string_n2:string (addr:long, n:long, err_msg:string)
1743 user_string_warn:string (addr:long)
1744 user_string2:string (addr:long, err_msg:string)
1745 user_uint16:long (addr:long)
1746 user_uint32:long (addr:long)
1747 user_uint8:long (addr:long)
1748 user_ushort:long (addr:long)
1749 user_ushort_warn:long (addr:long)
1750 usymdata:string (addr:long)
1751 usymname:string (addr:long)
1752 .ESAMPLE
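.PP
The required guard can be sketched as follows, assuming a
.I process.begin
probe (chosen only for illustration) and the
.I uaddr
function from the list above:
.SAMPLE
probe process.begin {
  # Only call the restricted functions in processes owned by the
  # invoking user; otherwise the script would exit immediately.
  if (is_myproc ()) {
    log (sprintf ("%s[%d] uaddr=%p", execname (), pid (), uaddr ()))
  }
}
.ESAMPLE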
1753
1754 No other built-in tapset functions may be used by unprivileged users.
1755
1756 .\" PR6864: disable temporarily
1757 .\".SH MAKING DO WITH SYMBOL TABLES
1758 .\"Systemtap performs best when it has access to the debugging information
1759 .\"associated with your kernel and modules.
1760 .\"However, if this information is not available,
1761 .\"systemtap can still support probing of function entries and returns
1762 .\"using symbols read from vmlinux and/or the modules in /lib/modules.
1763 .\"Systemtap can also read the kernel symbol table from a text file
1764 .\"such as /boot/System.map or /proc/kallsyms.
1765 .\"See the
1766 .\".B \-\-kelf
1767 .\"and
1768 .\".B \-\-kmap
1769 .\"options.
1770 .\".PP
1771 .\"If systemtap finds relevant debugging information,
1772 .\"it will use it even if you specify
1773 .\".B \-\-kelf
1774 .\"or
1775 .\".BR \-\-kmap .
1776 .\".PP
1777 .\"Without debugging information, systemtap cannot support the
1778 .\"following types of language constructs:
1779 .\".IP \(bu 4
1780 .\"probe specifications that refer to source files or line numbers
1781 .\".IP \(bu 4
1782 .\"probe specifications that refer to inline functions
1783 .\".IP \(bu 4
1784 .\"statements that refer to $target variables
1785 .\".IP \(bu 4
1786 .\"statements that refer to @cast() variables
1787 .\".IP \(bu 4
1788 .\"tapset-defined variables defined using any of the above constructs.
1789 .\"In particular, at this writing,
1790 .\"the prologue blocks for certain aliases in the syscall tapset
1791 .\"(e.g., syscall.open) contain "if" statements that refer to $target variables.
1792 .\"If your script refers to any such aliases,
1793 .\"systemtap must have access to the kernel's debugging information.
1794 .\".PP
1795 .\"Most T and t symbols correspond to function entry points, but some do not.
1796 .\"Based only on the symbol table, systemtap cannot tell the difference.
1797 .\"Placing return probes on symbols that aren't entry points
1798 .\"will most likely lead to kernel stack corruption.
1799
1800 .SH EXIT STATUS
1801
1802 The systemtap translator generally returns with a success code of 0 if
1803 the requested script was processed and executed successfully through
1804 the requested pass. Otherwise, errors may be printed to stderr and
1805 a failure code is returned. Use
1806 .I \-v
1807 or
1808 .I \-vp N
1809 to increase (global or per-pass) verbosity to identify the source of the
1810 trouble.
1811
1812 In listings mode
1813 .RI ( \-l " and " \-L ),
1814 error messages are normally suppressed. A success code of 0 is returned
1815 if at least one matching probe was found.
1816
1817 A script executing in pass 5 that is interrupted with ^C / SIGINT is
1818 considered to be successful.
1819
1820 .SH DEPRECATION
1821
1822 Over time, some features of the script language and the tapset library
1823 may undergo incompatible changes, so that a script written against
1824 an old version of systemtap may no longer run. In these cases, it may
1825 help to run systemtap with the
1826 .I \-\-compatible VERSION
1827 flag, specifying the last known working version of systemtap. Running
1828 systemtap with the
1829 .I \-\-check\-version
1830 flag will output a warning if any possibly incompatible elements have
1831 been parsed. Below is a table of recently deprecated tapset functions
1832 and syntax elements that require the given \-\-compatible flag to use:
1833 .PP
1834 .TP
1835 \-\-compatible=1.2
1836 (none yet)
1837 .TP
1838 \-\-compatible=1.3
1839 The tapset alias 'syscall.compat_pselect7a' was misnamed. It should
1840 have been 'syscall.compat_pselect7' (without the trailing 'a').
1841 Starting in release 1.4, the old name will be deprecated.
1842 .TP
1843 \-\-compatible=1.4
1844 In the 'syscall.add_key' probe, the 'description_auddr' variable
1845 has been deprecated in favor of the new 'description_uaddr'
1846 variable.
1847 .IP
1848 In the 'syscall.fgetxattr', 'syscall.fsetxattr', 'syscall.getxattr',
1849 \'syscall.lgetxattr', 'syscall.lremovexattr', 'nd_syscall.fgetxattr',
1850 \'nd_syscall.fremovexattr', 'nd_syscall.fsetxattr', 'nd_syscall.getxattr',
1851 and 'nd_syscall.lremovexattr' probes, the 'name2' variable has been
1852 deprecated in favor of the new 'name_str' variable.
1853 .IP
1854 In the 'nd_syscall.accept' probe, the 'flag_str' variable
1855 has been deprecated in favor of the new 'flags_str' variable.
1856 .IP
1857 In the 'nd_syscall.dup' probe, the 'old_fd' variable has been
1858 deprecated in favor of the new 'oldfd' variable.
1859 .IP
1860 The tapset alias 'nd_syscall.compat_pselect7a' was misnamed. It should
1861 have been 'nd_syscall.compat_pselect7' (without the trailing 'a').
1862 .IP
1863 The tapset function 'cpuid' is deprecated in favor of the better-known 'cpu'.
1864 .IP
1865 In the i386 'syscall.sigaltstack' probe, the 'ussp' variable has
1866 been deprecated in favor of the new 'uss_uaddr' variable.
1867 .IP
1868 In the ia64 'syscall.sigaltstack' probe, the 'ss_uaddr' and
1869 \'oss_uaddr' variables have been deprecated in favor of the new
1870 \'uss_uaddr' and 'uoss_uaddr' variables.
1871 .IP
1872 The powerpc tapset alias 'syscall.compat_sysctl' was deprecated
1873 and renamed 'syscall.sysctl32'.
1874 .IP
1875 In the x86_64 'syscall.sigaltstack' probe, the 'regs_uaddr'
1876 variable has been deprecated in favor of the new 'regs' variable.
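.PP
For instance, a script that still calls the deprecated
.I cpuid
function can be ported by switching to
.IR cpu ,
as in this minimal sketch (the surrounding probe is only an illustration):
.SAMPLE
probe begin {
  # cpu () replaces the deprecated cpuid ()
  log (sprintf ("running on cpu %d", cpu ()))
  exit ()
}
.ESAMPLE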
1877 .\" e.g. tapset_function()
1878 .\" e.g. post-incrementing a frobozz in a while loop
1879
1880 .\" .... or for really deprecated stuff:
1881 .\" .TP
1882 .\" support removed in version X.Y
1883 .\" really_old_tapset_function()
1884
1885 .SH FILES
1886 .\" consider autoconf-substituting these directories
1887 .PP
1888 Important files and their corresponding paths are listed in the
1889 stappaths (7) manual page.
1890
1891 .SH SEE ALSO
1892 .IR stapprobes (3stap),
1893 .IR stapfuncs (3stap),
1894 .IR stappaths (7),
1895 .IR staprun (8),
1896 .IR stapvars (3stap),
1897 .IR stapex (3stap),
1898 .IR stap\-server (8),
1899 .IR awk (1),
1900 .IR gdb (1)
1901
1902 .SH BUGS
1903 Report bugs through the Bugzilla link on the project web page or on our mailing list.
1904 .nh
1905 .BR http://sourceware.org/systemtap/ , <systemtap@sourceware.org> .
1906 .hy