.\" t
.TH STAPPROBES 3stap
.SH NAME
stapprobes \- systemtap probe points

.\" macros
.de SAMPLE
.br
.RS
.nf
.nh
..
.de ESAMPLE
.hy
.fi
.RE
..

.SH DESCRIPTION
The following sections enumerate the variety of probe points supported
by the systemtap translator, and some of the additional aliases defined by
standard tapset scripts.  Many are individually documented in the
.IR 3stap
manual section, with the
.IR probe::
prefix.
.PP
The general probe point syntax is a dotted-symbol sequence.  This
allows a breakdown of the event namespace into parts, somewhat like
the Domain Name System does on the Internet.  Each component
identifier may be parametrized by a string or number literal, with a
syntax like a function call.  A component may include a "*" character,
to expand to a set of matching probe points.  It may also include "**"
to match multiple sequential components at once.  Probe aliases likewise
expand to other probe points.  Each and every resulting probe point is
normally resolved to some low-level system instrumentation facility
(e.g., a kprobe address, marker, or a timer configuration); otherwise,
the elaboration phase will fail.
.PP
However, a probe point may be followed by a "?" character, to indicate
that it is optional, and that no error should result if it fails to
resolve.  Optionalness passes down through all levels of
alias/wildcard expansion.  Alternately, a probe point may be followed
by a "!" character, to indicate that it is both optional and
sufficient.  (Think vaguely of the Prolog cut operator.)  If it does
resolve, then no further probe points in the same comma-separated list
will be resolved.  Therefore, the "!" sufficiency mark only makes
sense in a list of probe point alternatives.
.PP
Additionally, a probe point may be followed by an "if (expr)" clause, in
order to enable or disable the probe point on the fly.  With the "if"
clause, if "expr" is false when the probe point is hit, the whole probe
body, including the bodies of any aliases, is skipped.  The condition is
stacked up through all levels of alias/wildcard expansion, so the final
condition becomes the logical AND of the conditions of all expanded
aliases and wildcards.

These are all
.B syntactically
valid probe points.  (They are generally
.B semantically
invalid, depending on the contents of the tapsets, and the versions of
kernel/user software installed.)

.SAMPLE
kernel.function("foo").return
process("/bin/vi").statement(0x2222)
end
syscall.*
sys**open
kernel.function("no_such_function") ?
module("awol").function("no_such_function") !
signal.*? if (switch)
kprobe.function("foo")
.ESAMPLE
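.PP
As a sketch of how these markers combine, the script below puts one
optional and one sufficient probe point in a single declaration, and adds
a conditional probe.  The function names and the global variable
.I enabled
are illustrative assumptions, not part of any tapset.

.SAMPLE
global enabled = 1

# "?": no error if no_such_function is missing.
# "!": if vfs_read resolves, sys_read is not even tried.
probe kernel.function("no_such_function") ?,
      kernel.function("vfs_read") !,
      kernel.function("sys_read")
{
    printf("%s: %s\\n", execname(), pp())
}

# Runs only while the global "enabled" is nonzero when the probe is hit.
probe syscall.close if (enabled) {
    printf("%s closed fd %d\\n", execname(), fd)
}

# Toggle the condition every second; stop after ten seconds.
probe timer.s(1) { enabled = !enabled }
probe timer.s(10) { exit() }
.ESAMPLE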

Probes may be broadly classified into "synchronous" and
"asynchronous".  A "synchronous" event is deemed to occur when any
processor executes an instruction matched by the specification.  This
gives these probes a reference point (instruction address) from which
more contextual data may be available.  Other families of probe points
refer to "asynchronous" events such as timers/counters rolling over,
where there is no related fixed reference point.  Each probe
point specification may match multiple locations (for example, using
wildcards or aliases), and all of them are then probed.  A probe
declaration may also contain several comma-separated specifications,
all of which are probed.

.SH DWARF DEBUGINFO

Resolving some probe points requires DWARF debuginfo or "debug
symbols" for the specific part being instrumented.  For some others,
DWARF is automatically synthesized on the fly from source code header
files.  For others, it is not needed at all.  Since a systemtap script
may use any mixture of probe points together, the union of their DWARF
requirements has to be met on the computer where script compilation
occurs.  (See the \fI\-\-use\-server\fR option and the \fBstap\-server\fR(8)
man page for information about the remote compilation facility,
which allows these requirements to be met on a different machine.)
.PP
The following table lists many of the available probe point families,
classified with respect to their need for DWARF debuginfo.

.TS
tab(|);
l l.
\fBDWARF\fP|\fBNON-DWARF\fP

kernel.function, .statement|kernel.mark
module.function, .statement|process.mark
process.function, .statement|begin, end, error, never
process.mark \fI(backup)\fP|timer
|perf
|procfs
\fBAUTO-DWARF\fP|kernel.statement.absolute
|kernel.data
kernel.trace|kprobe.function
|process.statement.absolute
|process.begin, .end, .error
.TE

.SH PROBE POINT FAMILIES

.SS BEGIN/END/ERROR

The probe points
.IR begin " and " end
are defined by the translator to refer to the time of session startup
and shutdown.  All "begin" probe handlers are run, in some sequence,
during the startup of the session.  All global variables will have
been initialized prior to this point.  All "end" probes are run, in
some sequence, during the
.I normal
shutdown of a session, such as in the aftermath of an
.IR exit ()
function call, or an interruption from the user.  In the case of an
error-triggered shutdown, "end" probes are not run.  There are no
target variables available in either context.
.PP
If the order of execution among "begin" or "end" probes is significant,
then an optional sequence number may be provided:

.SAMPLE
begin(N)
end(N)
.ESAMPLE

The number N may be positive or negative.  The probe handlers are run in
increasing order, and the order between handlers with the same sequence
number is unspecified.  When "begin" or "end" are given without a
sequence, they are effectively sequence zero.
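.PP
For example, the following sketch should print its first three lines in
the order shown, regardless of the order in which the probes appear in
the file, and then print the last line when the session shuts down
normally.  The handler bodies are illustrative.

.SAMPLE
# Sequence numbers, not source order, decide which begin probe runs first.
probe begin(10) { println("third: sequence 10"); exit() }
probe begin     { println("second: sequence 0, the default") }
probe begin(\-5) { println("first: sequence \-5") }
probe end       { println("last: normal shutdown after exit()") }
.ESAMPLE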

The
.IR error
probe point is similar to the
.IR end
probe, except that each such probe handler is run when the session ends
after errors have occurred.  In such cases, "end" probes are skipped,
but each "error" probe is still attempted.  This kind of probe can be
used to clean up or emit a "final gasp".  It may also be numerically
parametrized to set a sequence.
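.PP
Here is a minimal sketch of an error probe.  It assumes that the runtime
error raised by the tapset function
.IR error ()
ends the session.  The counter name
.I ticks
is illustrative.

.SAMPLE
global ticks

probe timer.ms(100) {
    ticks++
    if (ticks >= 3)
        error("giving up after three ticks")   # assumed to end the session
}

# Skipped: the session ends because of the error above.
probe end   { println("not printed") }

# Still attempted after the error.
probe error { printf("final gasp: %d ticks counted\\n", ticks) }
.ESAMPLE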

.SS NEVER
The probe point
.IR never
is specially defined by the translator to mean "never".  Its probe
handler is never run, though its statements are analyzed for symbol /
type correctness as usual.  This probe point may be useful in
conjunction with optional probes.
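.PP
The sketch below shows one such use.  Listing
.I never
after an optional probe point means the declaration always resolves to
at least one probe point.  As a result, the handler is kept and analyzed
even on a kernel where the function
.I vfs_fsync
(an illustrative choice) cannot be found.  In that case, the handler
simply never runs.

.SAMPLE
global syncs

# If vfs_fsync (illustrative) cannot be resolved, only "never" remains:
# the handler is still checked, but never run.
probe kernel.function("vfs_fsync") ?, never {
    syncs[execname()]++
}

probe timer.s(5) {
    foreach (name in syncs)
        printf("%s: %d fsync calls\\n", name, syncs[name])
    exit()
}
.ESAMPLE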

.SS SYSCALL

The
.IR syscall.*
aliases define several hundred probes, too many to
summarize here.  They are:

.SAMPLE
syscall.NAME
.br
syscall.NAME.return
.ESAMPLE

Generally, two probes are defined for each normal system call as listed in the
.BR syscalls (2)
manual page, one for entry and one for return.  Those system calls that never
return do not have a corresponding
.IR .return
probe.
.PP
Each probe alias provides a variety of variables.  Looking at the tapset source
code is the most reliable way to find out which ones.  Generally, each variable
listed in the standard manual page is made available as a script-level variable, so
.IR syscall.open
exposes
.IR filename ", " flags ", and " mode .
In addition, a standard suite of variables is available at most aliases:
.TP
.IR argstr
A pretty-printed form of the entire argument list, without parentheses.
.TP
.IR name
The name of the system call.
.TP
.IR retstr
For return probes, a pretty-printed form of the system-call result.
.PP
As usual for probe aliases, these variables are all simply initialized
once from the underlying $context variables, so later changes to
$context variables are not automatically reflected.  Not all probe
aliases obey all of these general guidelines.  Please report any
bothersome ones you encounter as a bug.
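.PP
For example, the following sketch traces
.BR open (2)
calls using the alias variables described above.  On systems where
.B open
is implemented through
.BR openat (2),
.I syscall.open
may not resolve.  In that case,
.I syscall.openat
is the probe point to try instead.

.SAMPLE
probe syscall.open {
    # filename and argstr are initialized by the syscall.open alias
    printf("%s(%d) opens %s [%s]\\n", execname(), pid(), filename, argstr)
}

probe syscall.open.return {
    # retstr is the pretty-printed result: a descriptor or an error
    printf("%s(%d) %s = %s\\n", execname(), pid(), name, retstr)
}
.ESAMPLE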


.SS TIMERS

Intervals defined by the standard kernel "jiffies" timer may be used
to trigger probe handlers asynchronously.  Two probe point variants
are supported by the translator:

.SAMPLE
timer.jiffies(N)
timer.jiffies(N).randomize(M)
.ESAMPLE

The probe handler is run every N jiffies (a kernel-defined unit of
time, typically between 1 and 60 ms).  If the "randomize" component is
given, a linearly distributed random value in the range [\-M..+M] is
added to N every time the handler is run.  N is restricted to a
reasonable range (1 to around a million), and M is restricted to be
smaller than N.  There are no target variables provided in either
context.  It is possible for such probes to be run concurrently on
a multi-processor computer.
.PP
Alternatively, intervals may be specified in units of time.
There are two probe point variants similar to the jiffies timer:

.SAMPLE
timer.ms(N)
timer.ms(N).randomize(M)
.ESAMPLE

Here, N and M are specified in milliseconds, but the full options for units
are seconds (s/sec), milliseconds (ms/msec), microseconds (us/usec),
nanoseconds (ns/nsec), and hertz (hz).  Randomization is not supported for
hertz timers.

The actual resolution of the timers depends on the target kernel.  For
kernels prior to 2.6.17, timers are limited to jiffies resolution, so
intervals are rounded up to the nearest jiffies interval.  Starting with
2.6.17, the implementation uses hrtimers for tighter precision, though the
actual resolution will be arch-dependent.  In either case, if the "randomize"
component is given, then the random value will be added to the interval
before any rounding occurs.
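.PP
The sketch below mixes several of these units.  The global
.I ticks
is illustrative.  With
.IR randomize (100),
each interval of the 500 ms timer falls somewhere between 400 and 600 ms.

.SAMPLE
global ticks

# About 100 times per second (hertz timers cannot be randomized).
probe timer.hz(100) { ticks++ }

# Every 500 ms, plus or minus up to 100 ms.
probe timer.ms(500).randomize(100) {
    printf("ticks so far: %d\\n", ticks)
}

# Stop after three seconds; timer.sec(3) or timer.ms(3000) would also work.
probe timer.s(3) { exit() }
.ESAMPLE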
.PP
Profiling timers are also available to provide probes that execute on all
CPUs at the rate of the system tick (CONFIG_HZ).
This probe takes no parameters.

.SAMPLE
timer.profile
.ESAMPLE

Full context information of the interrupted process is available, making
this probe suitable for a time-based sampling profiler.
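.PP
As an illustration, here is a minimal sampling profiler built on
.IR timer.profile .
It is a sketch rather than a complete tool.  It attributes kernel-mode
samples to symbols with the
.IR user_mode (),
.IR addr ()
and
.IR symname ()
tapset functions, and lumps all user-mode samples together.

.SAMPLE
global samples

probe timer.profile {
    if (user_mode())
        samples["<user space>"]++
    else
        samples[symname(addr())]++   # kernel symbol at the interrupted address
}

probe timer.s(10) {
    # Ten most frequently sampled entries, highest count first
    foreach (fn in samples\- limit 10)
        printf("%8d %s\\n", samples[fn], fn)
    exit()
}
.ESAMPLE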

.SS DWARF

This family of probe points uses symbolic debugging information for
the target kernel/module/program, as may be found in unstripped
executables, or the separate
.I debuginfo
packages.  They allow placement of probes logically into the execution
path of the target program, by specifying a set of points in the
source or object code.  When a matching statement executes on any
processor, the probe handler is run in that context.
.PP
Points in a kernel are identified by
module, source file, line number, function name, or some
combination of these.
.PP
Here is a list of probe point families currently supported.  The
.B .function
variant places a probe near the beginning of the named function, so that
parameters are available as context variables.  The
.B .return
variant places a probe at the moment
.B after
the return from the named function, so the return value is available
as the "$return" context variable.  The
.B .inline
modifier for
.B .function
filters the results to include only instances of inlined functions.
The
.B .call
modifier selects the opposite subset.  The
.B .exported
modifier filters the results to include only exported functions.  Inline
functions do not have an identifiable return point, so
.B .return
is not supported on
.B .inline
probes.  The
.B .statement
variant places a probe at the exact spot, exposing those local variables
that are visible there.
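.PP
As a sketch of the
.B .call
and
.B .return
variants, the script below reports how many bytes callers of the kernel
function
.I vfs_read
ask for and receive.  It assumes that kernel debuginfo is installed and that
.I vfs_read
has a parameter named
.IR count ,
as in current Linux sources.

.SAMPLE
# Assumes vfs_read(file, buf, count, pos), as in current Linux sources.
# Entry of vfs_read when called as a real (non-inlined) function.
probe kernel.function("vfs_read").call {
    printf("%s: asks for %d bytes\\n", execname(), $count)
}

# After vfs_read returns: $return is its result, and $count is the
# entry-time value of the parameter.
probe kernel.function("vfs_read").return {
    printf("%s: got %d of %d bytes\\n", execname(), $return, $count)
}

probe timer.s(1) { exit() }
.ESAMPLE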

.SAMPLE
kernel.function(PATTERN)
.br
kernel.function(PATTERN).call
.br
kernel.function(PATTERN).return
.br
kernel.function(PATTERN).inline
.br
kernel.function(PATTERN).label(LPATTERN)
.br
module(MPATTERN).function(PATTERN)
.br
module(MPATTERN).function(PATTERN).call
.br
module(MPATTERN).function(PATTERN).return
.br
module(MPATTERN).function(PATTERN).inline
.br
module(MPATTERN).function(PATTERN).label(LPATTERN)
.br
.br
kernel.statement(PATTERN)
.br
kernel.statement(ADDRESS).absolute
.br
module(MPATTERN).statement(PATTERN)
.br
process("PATH").function("NAME")
.br
process("PATH").statement("*@FILE.c:123")
.br
process("PATH").library("PATH").function("NAME")
.br
process("PATH").library("PATH").statement("*@FILE.c:123")
.br
process("PATH").function("*").return
.br
process("PATH").function("myfun").label("foo")
.br
process(PID).statement(ADDRESS).absolute
.ESAMPLE

(See the USER-SPACE section below for more information on the process
probes.)

In the above list, MPATTERN stands for a string literal that aims to
identify the loaded kernel module of interest and LPATTERN stands for
a source program label.  Both MPATTERN and LPATTERN may include the "*",
"[]", and "?" wildcards.
PATTERN stands for a string literal that
aims to identify a point in the program.  It is made up of three
parts:
.IP \(bu 4
The first part is the name of a function, as would appear in the
.I nm
program's output.  This part may use the "*" and "?" wildcarding
operators to match multiple names.
.IP \(bu 4
The second part is optional and begins with the "@" character.
It is followed by the path to the source file containing the function,
which may include a wildcard pattern, such as mm/slab*.
If it does not match as is, an implicit "*/" is optionally added
.I before
the pattern, so that a script need only name the last few components
of a possibly long source directory path.
.IP \(bu 4
Finally, the third part is optional if the file name part was given,
and identifies the line number in the source file preceded by a ":"
or a "+".  The line number is assumed to be an
absolute line number if preceded by a ":", or relative to the entry of
the function if preceded by a "+".
All the lines in the function can be matched with ":*".
A range of lines x through y can be matched with ":x\-y".
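.PP
The probe points below illustrate each part of PATTERN.  They are only a
sketch.  The function names, file names and line numbers are assumptions
about one kernel source tree and will differ between versions, which is
why the line-based points are marked optional.

.SAMPLE
# Function-name wildcard restricted to one source file.
probe kernel.function("vfs_*@fs/read_write.c") {
    printf("%s\\n", probefunc())
}

# Absolute line 100 of fs/open.c (hypothetical line number).
probe kernel.statement("*@fs/open.c:100") ? {
    printf("%s\\n", pp())
}

# Second line after the entry of do_sys_open (hypothetical offset).
probe kernel.statement("do_sys_open@fs/open.c+2") ? {
    printf("%s\\n", pp())
}

# Any loaded module whose name starts with "ext".
probe module("ext*").function("*_file_open") ? {
    printf("%s\\n", probefunc())
}

probe timer.s(5) { exit() }
.ESAMPLE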
.PP
As an alternative, PATTERN may be a numeric constant, indicating an
address.  Such an address may be found from symbol tables of the
appropriate kernel / module object file.  It is verified against
known statement code boundaries, and will be relocated for use at
run time.
.PP
In guru mode only, absolute kernel-space addresses may be specified with
the ".absolute" suffix.  Such an address is considered already relocated,
as if it came from
.BR /proc/kallsyms ,
so it cannot be checked against statement/instruction boundaries.

.SS CONTEXT VARIABLES

.PP
Many of the source-level context variables, such as function parameters,
locals, and globals visible in the compilation unit, may be visible to
probe handlers.  Handlers may refer to these variables by prefixing their
name with "$" within the scripts.  In addition, a special syntax
allows limited traversal of structures, pointers, and arrays.  More
syntax allows pretty-printing of individual variables or their groups.
See also
.BR @cast .

.TP
$var
refers to an in-scope variable "var".  If it's an integer-like type,
it will be cast to a 64-bit int for systemtap script use.  String-like
pointers (char *) may be copied to systemtap string values using the
.IR kernel_string " or " user_string
functions.
.TP
@var("varname")
an alternative syntax for
.IR $varname .
.TP
@var("varname@src/file.c")
refers to the global (either file local or external) variable
.IR varname
defined when the file
.IR src/file.c
was compiled.  The compilation unit (CU) in which the variable is resolved
is the first CU in the module of the probe point which matches the given
file name at the end and has the shortest file name path (e.g., given
.IR @var("foo@bar/baz.c")
and CUs with file name paths
.IR src/sub/module/bar/baz.c
and
.IR src/bar/baz.c ,
the second CU will be chosen to resolve the (file) global variable
.IR foo ).
.TP
$var\->field
traversal via a structure's or a pointer's field.  This
generalized indirection operator may be repeated to follow more
levels.  Note that the
.IR .
operator is not used for plain structure
members, only
.IR \->
for both purposes.  (This is because "." is reserved for string
concatenation.)
.TP
$return
is available in return probes only for functions that are declared
with a return value.
.TP
$var[N]
indexes into an array.  The index may be given as a literal number or even
an arbitrary numeric expression.
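.PP
The following sketch uses these forms together.  It assumes kernel
debuginfo, the
.I vfs_read
parameters, and the
.I struct file
layout of recent Linux kernels.  The member names may need adjusting
for other kernels.

.SAMPLE
# Assumes vfs_read(struct file *file, char *buf, size_t count, loff_t *pos).
probe kernel.function("vfs_read") {
    # "\->" follows pointers (file, dentry) and embedded structures
    # (f_path, d_name) alike; kernel_string() copies the final char *.
    name = kernel_string($file\->f_path\->dentry\->d_name\->name)
    printf("%s: %d bytes from %s at offset %d\\n",
           execname(), $count, name, $file\->f_pos)
}

probe timer.s(1) { exit() }
.ESAMPLE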
.PP
A number of operators exist for such basic context variable expressions:
.TP
$$vars
expands to a character string that is equivalent to
.SAMPLE
sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x",
        parm1, ..., parmN, var1, ..., varN)
.ESAMPLE
for each variable in scope at the probe point.  Some values may be
printed as
.IR =?
if their run-time location cannot be found.
.TP
$$locals
expands to a subset of $$vars for only local variables.
.TP
$$parms
expands to a subset of $$vars for only function parameters.
.TP
$$return
is available in return probes only.  It expands to a string that
is equivalent to sprintf("return=%x", $return)
if the probed function has a return value, or else an empty string.
.TP
& $EXPR
expands to the address of the given context variable expression, if it
is addressable.
.TP
@defined($EXPR)
expands to 1 if the given context variable expression is resolvable,
or to 0 otherwise, for use in conditionals such as
.SAMPLE
@defined($foo\->bar) ? $foo\->bar : 0
.ESAMPLE
.TP
$EXPR$
expands to a string with all of $EXPR's members, equivalent to
.SAMPLE
sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}",
         $EXPR\->a, $EXPR\->b)
.ESAMPLE
.TP
$EXPR$$
expands to a string with all of $EXPR's members and submembers, equivalent to
.SAMPLE
sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}",
        $EXPR\->a, $EXPR\->b, $EXPR\->c\->x, $EXPR\->c\->y, $EXPR\->d[0])
.ESAMPLE
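.PP
Here is a short sketch of these operators.  It again assumes kernel
debuginfo and the
.I vfs_read
parameters of recent Linux kernels.  Structure dumps can be long, so only
the first call and the first return are reported.

.SAMPLE
global calls, returns

probe kernel.function("vfs_read") {
    if (calls++) next                 # report the first call only
    println($$parms)                  # name=value pairs of all parameters
    println($file$)                   # top-level members of *file
    printf("address of f_pos: %p\\n", & $file\->f_pos)
    if (@defined($file\->f_path\->dentry))
        println("struct file has f_path.dentry here")
}

probe kernel.function("vfs_read").return {
    if (returns++) next               # report the first return only
    println($$return)                 # the "return=..." string
    exit()
}
.ESAMPLE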

.PP
For ".return" probes, context variables other than the "$return"
value itself are only available for the function call parameters.
The expressions evaluate to the
.IR entry-time
values of those variables, since that is when a snapshot is taken.
Other local variables are not generally accessible, since by the time
a ".return" probe hits, the probed function will have already returned.
.PP
Arbitrary entry-time expressions can also be saved for ".return"
probes using the
.IR @entry(expr)
operator.  For example, one can compute the elapsed time of a function:
.SAMPLE
probe kernel.function("do_filp_open").return {
    println( get_timeofday_us() \- @entry(get_timeofday_us()) )
}
.ESAMPLE


.SS DWARFLESS
In the absence of debugging information, entry and exit points of kernel
and module functions can be probed using the "kprobe" family of probes.
However, these do not permit looking up the arguments or local variables
of the function.
The following constructs are supported:
.SAMPLE
kprobe.function(FUNCTION)
kprobe.function(FUNCTION).return
kprobe.module(NAME).function(FUNCTION)
kprobe.module(NAME).function(FUNCTION).return
kprobe.statement(ADDRESS).absolute
.ESAMPLE
.PP
Probes of type
.B function
are recommended for kernel functions, whereas probes of type
.B module
are recommended for probing functions of the specified module.
If the absolute address of a kernel or module function is known,
.B statement
probes can be used.
.PP
Note that the
.I FUNCTION
and module
.I NAME
parameters
.B must not
contain wildcards, or the probe will not be registered.
Also, statement probes may be used in guru mode only.
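.PP
For instance, the following sketch counts calls into and returns from
.I vfs_read
without any debuginfo.  The function name is an assumption about the
running kernel.  The handlers only use information that needs no DWARF,
such as
.IR execname ().

.SAMPLE
global calls, returns

# No $parameters are available here, only context-free tapset functions.
probe kprobe.function("vfs_read") {
    calls[execname()]++
}

probe kprobe.function("vfs_read").return {
    returns[execname()]++
}

probe timer.s(5) {
    foreach (name in calls)
        printf("%\-16s %d calls, %d returns\\n", name, calls[name], returns[name])
    exit()
}
.ESAMPLE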


.SS USER-SPACE
Support for user-space probing is available for kernels
that are configured with the utrace extensions.  See
.SAMPLE
http://people.redhat.com/roland/utrace/
.ESAMPLE
.PP
There are several forms.  First, a non-symbolic probe point:
.SAMPLE
process(PID).statement(ADDRESS).absolute
.ESAMPLE
is analogous to
.I kernel.statement(ADDRESS).absolute
in that both use raw (unverified) virtual addresses and provide
no $variables.  The target PID parameter must identify a running
process, and ADDRESS should identify a valid instruction address.
All threads of that process will be probed.
.PP
Second, non-symbolic user-kernel interface events handled by
utrace may be probed:
.SAMPLE
process(PID).begin
process("FULLPATH").begin
process.begin
process(PID).thread.begin
process("FULLPATH").thread.begin
process.thread.begin
process(PID).end
process("FULLPATH").end
process.end
process(PID).thread.end
process("FULLPATH").thread.end
process.thread.end
process(PID).syscall
process("FULLPATH").syscall
process.syscall
process(PID).syscall.return
process("FULLPATH").syscall.return
process.syscall.return
process(PID).insn
process("FULLPATH").insn
process(PID).insn.block
process("FULLPATH").insn.block
.ESAMPLE
.PP
A
.B .begin
probe gets called when a new process described by PID or FULLPATH gets created.
A
.B .thread.begin
probe gets called when a new thread described by PID or FULLPATH gets created.
A
.B .end
probe gets called when the process described by PID or FULLPATH dies.
A
.B .thread.end
probe gets called when a thread described by PID or FULLPATH dies.
A
.B .syscall
probe gets called when a thread described by PID or FULLPATH makes a
system call.  The system call number is available in the
.BR $syscall
context variable, and the first 6 arguments of the system call
are available in the
.BR $argN
(e.g., $arg1, $arg2, ...) context variables.
A
.B .syscall.return
probe gets called when a thread described by PID or FULLPATH returns from a
system call.  The system call number is available in the
.BR $syscall
context variable, and the return value of the system call is available
in the
.BR $return
context variable.
A
.B .insn
probe gets called for every single-stepped instruction of the process described by PID or FULLPATH.
A
.B .insn.block
probe gets called for every block-stepped instruction of the process described by PID or FULLPATH.
.PP
If a process probe is specified without a PID or FULLPATH, all user
threads will be probed.  However, if systemtap was invoked with the
.IR \-c " or " \-x
options, then process probes are restricted to the process
hierarchy associated with the target process.  If a process probe is
specified without a PID or FULLPATH, but with the
.I \-c
option, the path of the
.I \-c
command will be heuristically filled in as the process PATH.
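.PP
The following sketch watches one program through these events.  It
assumes that
.I /bin/ls
exists and is run while the script is active, for example via
.IR "stap \-c ls" .
Only the
.I $syscall
and
.I $return
context variables described above are used.

.SAMPLE
global calls

# /bin/ls is an example path; any executable could be used.
probe process("/bin/ls").begin {
    printf("ls started, pid %d\\n", pid())
}

probe process("/bin/ls").syscall {
    calls[$syscall]++              # count system calls by number
}

probe process("/bin/ls").syscall.return {
    if ($return < 0)
        printf("syscall %d failed with %d\\n", $syscall, $return)
}

probe process("/bin/ls").end {
    foreach (nr in calls\- limit 5)
        printf("syscall %d: %d calls\\n", nr, calls[nr])
}
.ESAMPLE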

.PP
Third, symbolic static instrumentation compiled into programs and
shared libraries may be
probed:
.SAMPLE
process("PATH").mark("LABEL")
process("PATH").provider("PROVIDER").mark("LABEL")
.ESAMPLE
.PP
A
.B .mark
probe gets called via a static probe which is defined in the
application by STAP_PROBE1(PROVIDER,LABEL,arg1), which is defined in
sys/sdt.h.  PROVIDER is an application identifier, LABEL corresponds to
the .mark argument, and arg1 is the argument.  STAP_PROBE1 is used for
probes with 1 argument, STAP_PROBE2 is used for probes with 2
arguments, and so on.  The arguments of the probe are available in the
context variables $arg1, $arg2, ...  An alternative to using the
STAP_PROBE macros is to use the dtrace script to create custom macros.
Additionally, the variables $$name and $$provider contain the
corresponding parts of the probe point name.
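.PP
Here is a sketch of both sides of a static probe.  The application path
.IR /usr/local/bin/myapp ,
the provider
.IR myapp ,
the label
.I request_done
and its two arguments are all hypothetical.  The C call site is shown
only as a comment.

.SAMPLE
# Hypothetical call site compiled into /usr/local/bin/myapp:
#   #include <sys/sdt.h>
#   STAP_PROBE2(myapp, request_done, request_id, elapsed_us);
probe process("/usr/local/bin/myapp").provider("myapp").mark("request_done") {
    printf("%s:%s id=%d elapsed=%d us\\n", $$provider, $$name, $arg1, $arg2)
}
.ESAMPLE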

.PP
Finally, full symbolic source-level probes in user-space programs
and shared libraries are supported.  These are exactly analogous
to the symbolic DWARF-based kernel/module probes described above,
and expose similar contextual $variables.
.SAMPLE
process("PATH").function("NAME")
process("PATH").statement("*@FILE.c:123")
process("PATH").plt("NAME")
process("PATH").library("PATH").plt("NAME")
process("PATH").library("PATH").function("NAME")
process("PATH").library("PATH").statement("*@FILE.c:123")
process("PATH").function("*").return
process("PATH").function("myfun").label("foo")
.ESAMPLE
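.PP
The following is a sketch under stated assumptions.  Debuginfo for
.I bash
is assumed to be installed, since it is needed for
.IR $argc .
The script also assumes that
.I /bin/ls
calls functions whose names start with
.I open
through its program linkage table.  Both paths are examples and may
differ between distributions.

.SAMPLE
# Source-level probe on the entry of bash's main(); $argc is its parameter
# (requires bash debuginfo).
probe process("/bin/bash").function("main") {
    printf("bash pid %d started with argc=%d\\n", pid(), $argc)
}

# PLT entries in /bin/ls for functions matching "open*".
probe process("/bin/ls").plt("open*") {
    printf("ls called %s\\n", pp())
}
.ESAMPLE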

.PP
Note that for all process probes,
.I PATH
names refer to executables that are searched the same way shells do: relative
to the working directory if they contain a "/" character, otherwise in
.BR $PATH .
If PATH names refer to scripts, the actual interpreters (specified in the
script in the first line after the #! characters) are probed.
If PATH is a process component parameter referring to shared libraries,
then all processes that map it at runtime will be selected for
probing.  If PATH is a library component parameter referring to shared
libraries, then the process specified by the process component will be
selected.  A .plt probe will probe functions in the program linkage table
corresponding to the rest of the probe point.  .plt can be specified
as a shorthand for .plt("*").
If the PATH string contains wildcards as in the MPATTERN case, then
standard globbing is performed to find all matching paths.  In this
case, the
.BR $PATH
environment variable is not used.

.PP
As noted above, if systemtap was invoked with the
.IR \-c " or " \-x
options, then process probes of all these forms are restricted to the
process hierarchy associated with the target process.

.SS PROCFS

These probe points allow procfs "files" in
/proc/systemtap/MODNAME to be created, read and written using a
permission that may be modified using the proper umask value.
Default permissions are 0400 for read probes, and 0200 for write probes.
If both a read and write probe are being used on the same file, a
default permission of 0600 will be used.
Using procfs.umask(0040).read would
result in a 0404 permission set for the file.
.RI ( MODNAME
is the name of the systemtap module.)  The
.I proc
filesystem is a pseudo-filesystem which is used as an interface to
kernel data structures.  There are several probe point variants supported
by the translator:

.SAMPLE
procfs("PATH").read
procfs("PATH").umask(UMASK).read
procfs("PATH").read.maxsize(MAXSIZE)
procfs("PATH").umask(UMASK).read.maxsize(MAXSIZE)
procfs("PATH").write
procfs("PATH").umask(UMASK).write
procfs.read
procfs.umask(UMASK).read
procfs.read.maxsize(MAXSIZE)
procfs.umask(UMASK).read.maxsize(MAXSIZE)
procfs.write
procfs.umask(UMASK).write
.ESAMPLE

.I PATH
is the file name (relative to /proc/systemtap/MODNAME) to be created.
If no
.I PATH
is specified (as in the last six variants above),
.I PATH
defaults to "command".
.PP
When a user reads /proc/systemtap/MODNAME/PATH, the corresponding
procfs
.I read
probe is triggered.  The string data to be read should be assigned to
a variable named
.IR $value ,
like this:

.SAMPLE
procfs("PATH").read { $value = "100\\n" }
.ESAMPLE
.PP
When a user writes into /proc/systemtap/MODNAME/PATH, the
corresponding procfs
.I write
probe is triggered.  The data the user wrote is available in the
string variable named
.IR $value ,
like this:

.SAMPLE
procfs("PATH").write { printf("user wrote: %s", $value) }
.ESAMPLE
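.PP
Putting a read and a write probe on the same file gives a simple control
knob.  In this sketch, the file name
.I threshold
and the global of the same name are illustrative, and
.IR strtol ()
is the tapset string-to-number conversion function.

.SAMPLE
global threshold = 100

# cat /proc/systemtap/MODNAME/threshold
probe procfs("threshold").read {
    $value = sprintf("%d\\n", threshold)
}

# echo 250 > /proc/systemtap/MODNAME/threshold
probe procfs("threshold").write {
    threshold = strtol($value, 10)
}
.ESAMPLE

Because both a read and a write probe use the file, it is created with the
default permission 0600.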
.PP
.I MAXSIZE
is the size of the procfs read buffer.  Specifying
.I MAXSIZE
allows larger procfs output.  If no
.I MAXSIZE
is specified, the procfs read buffer defaults to
.I STP_PROCFS_BUFSIZE
(which defaults to
.IR MAXSTRINGLEN ,
the maximum length of a string).
If setting the procfs read buffers for more than one file is needed,
it may be easiest to override the
.I STP_PROCFS_BUFSIZE
definition.
Here's an example of using
.IR MAXSIZE :

.SAMPLE
procfs.read.maxsize(1024) {
    $value = "long string..."
    $value .= "another long string..."
    $value .= "another long string..."
    $value .= "another long string..."
}
.ESAMPLE

.SS MARKERS

This family of probe points hooks up to static probing markers
inserted into the kernel or modules.  These markers are special macro
calls inserted by kernel developers to make probing faster and more
reliable than with DWARF-based probes.  Further, DWARF debugging
information is
.I not
required to probe markers.

Marker probe points begin with
.BR kernel .
The next part names the marker itself:
.BR mark("name") .
The marker name string, which may contain the usual wildcard characters,
is matched against the names given to the marker macros when the kernel
and/or module was compiled.  Optionally, you can specify
.BR format("format") .
Specifying the marker format string allows differentiation between two
markers with the same name but different marker format strings.

The handler associated with a marker-based probe may read the
optional parameters specified at the macro call site.  These are
named
.BR $arg1 " through " $argNN ,
where NN is the number of parameters supplied by the macro.  Number
and string parameters are passed in a type-safe manner.

The marker format string associated with a marker is available in
.BR $format ,
and the marker name string is available in
.BR $name .
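.PP
Here is a minimal sketch that reports every marker that fires, together
with its name and format string.  It assumes a kernel that still provides
markers.  Since
.I kernel.mark
probe points may not resolve on other kernels, the probe is marked
optional with "?".

.SAMPLE
global fired

# Optional: resolves only on kernels that still provide markers.
probe kernel.mark("*") ? {
    fired[$name, $format]++
}

probe timer.s(5) {
    foreach ([name, fmt] in fired)
        printf("%s (%s): %d\\n", name, fmt, fired[name, fmt])
    exit()
}
.ESAMPLE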
 835
 836 .SS TRACEPOINTS
 837
 838 This family of probe points hooks up to static probing tracepoints
 839 inserted into the kernel or modules.  As with markers, these
 840 tracepoints are special macro calls inserted by kernel developers to
 841 make probing faster and more reliable than with DWARF-based probes,
 842 and DWARF debugging information is not required to probe tracepoints.
Tracepoints have the extra advantage of more strongly typed parameters
than markers.
 845
 846 Tracepoint probes begin with
 847 .BR kernel .
 848 The next part names the tracepoint itself:
 849 .BR trace("name") .
 850 The tracepoint name string, which may contain the usual wildcard
 851 characters, is matched against the names defined by the kernel
 852 developers in the tracepoint header files.
 853
 854 The handler associated with a tracepoint-based probe may read the
 855 optional parameters specified at the macro call site.  These are
 856 named according to the declaration by the tracepoint author.  For
 857 example, the tracepoint probe
 858 .BR kernel.trace("sched_switch")
 859 provides the parameters
 860 .BR $rq ", " $prev ", and " $next .
If a parameter is a complex type, such as a struct pointer, a script can
access its fields with the same syntax as DWARF $target variables.
Tracepoint parameters themselves cannot be modified, but in guru mode a
script may modify fields of parameters.
 865
 866 The name of the tracepoint is available in
 867 .BR $$name ,
 868 and a string of name=value pairs for all parameters of the tracepoint
 869 is available in
 870 .BR $$vars " or " $$parms .
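
The following minimal sketch prints the tracepoint name and the pids
involved in each context switch.  It assumes a kernel whose sched_switch
tracepoint declares $prev and $next as task_struct pointers, as described
above.  Replacing the println argument with $$vars would instead print
every parameter as name=value pairs.
.SAMPLE
probe kernel.trace("sched_switch") {
    # $prev and $next are assumed to be struct task_struct pointers
    println(sprintf("%s: pid %d to pid %d", $$name, $prev\->pid, $next\->pid))
}
.ESAMPLE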
 871
 872 .SS HARDWARE BREAKPOINTS
This family of probes is used to set hardware watchpoints for a given
(global) kernel symbol.  The probes take three components as inputs:

1. The
.B virtual address
or
.B name
of the kernel symbol to be traced is supplied as the argument to this class
of probes.  (Only data segment variables can be probed; local variables of
a function cannot.)
 881
2. The nature of the access to be probed:
.br
a.
.I .write
probes are triggered when a write happens at the specified address or
symbol name.
.br
b.
.I .rw
probes are triggered when either a read or a write happens.
 890
 891 3.
 892 .BR .length
 893 (optional)
 894 Users have the option of specifying the address interval to be probed
 895 using "length" constructs. The user-specified length gets approximated
 896 to the closest possible address length that the architecture can
 897 support. If the specified length exceeds the limits imposed by
 898 architecture, an error message is flagged and probe registration fails.
 899 Wherever 'length' is not specified, the translator requests a hardware
 900 breakpoint probe of length 1. It should be noted that the "length"
 901 construct is not valid with symbol names.
 902
 903 Following constructs are supported :
 904 .SAMPLE
 905 probe kernel.data(ADDRESS).write
 906 probe kernel.data(ADDRESS).rw
 907 probe kernel.data(ADDRESS).length(LEN).write
 908 probe kernel.data(ADDRESS).length(LEN).rw
 909 probe kernel.data("SYMBOL_NAME").write
 910 probe kernel.data("SYMBOL_NAME").rw
 911 .ESAMPLE
 912
 913 This set of probes make use of the debug registers of the processor,
 914 which is a scarce resource. (4 on x86 , 1 on powerpc ) The script
 915 translation flags a warning if a user requests more hardware breakpoint probes
 916 than the limits set by architecture. For example,a pass-2 warning is flashed
 917 when an input script requests 5 hardware breakpoint probes on an x86
 918 system while x86 architecture supports a maximum of 4 breakpoints.
 919 Users are cautioned to set probes judiciously.
 920
 921 .SH EXAMPLES
 922 .PP
 923 Here are some example probe points, defining the associated events.
 924 .TP
 925 begin, end, end
 926 refers to the startup and normal shutdown of the session.  In this
 927 case, the handler would run once during startup and twice during
 928 shutdown.
 929 .TP
 930 timer.jiffies(1000).randomize(200)
 931 refers to a periodic interrupt, every 1000 +/\- 200 jiffies.
 932 .TP
 933 kernel.function("*init*"), kernel.function("*exit*")
 934 refers to all kernel functions with "init" or "exit" in the name.
 935 .TP
 936 kernel.function("*@kernel/time.c:240")
 937 refers to any functions within the "kernel/time.c" file that span
 938 line 240.
 939 .BR
 940 Note
 941 that this is
 942 .BR not
 943 a probe at the statement at that line number.  Use the
 944 .IR
 945 kernel.statement
 946 probe instead.
 947 .TP
 948 kernel.mark("getuid")
 949 refers to an STAP_MARK(getuid, ...) macro call in the kernel.
 950 .TP
 951 module("usb*").function("*sync*").return
 952 refers to the moment of return from all functions with "sync" in the
 953 name in any of the USB drivers.
 954 .TP
 955 kernel.statement(0xc0044852)
 956 refers to the first byte of the statement whose compiled instructions
 957 include the given address in the kernel.
 958 .TP
 959 kernel.statement("*@kernel/time.c:296")
 960 refers to the statement of line 296 within "kernel/time.c".
 961 .TP
 962 kernel.statement("bio_init@fs/bio.c+3")
 963 refers to the statement at line bio_init+3 within "fs/bio.c".
 964 .TP
 965 kernel.data("pid_max").write
 966 refers to a hardware preakpoint of type "write" set on pid_max
 967 .TP
 968 syscall.*.return
 969 refers to the group of probe aliases with any name in the third position
 970
 971 .SS PERF
 972
 973 This
 974 .IR prototype
 975 family of probe points interfaces to the kernel "perf event"
 976 infrasture for controlling hardware performance counters.
 977 The events being attached to are described by the "type",
 978 "config" fields of the
 979 .IR perf_event_attr
 980 structure, and are sampled at an interval governed by the
 981 "sample_period" field.
 982
 983 These fields are made available to systemtap scripts using
 984 the following syntax:
 985 .SAMPLE
 986 probe perf.type(NN).config(MM).sample(XX)
 987 probe perf.type(NN).config(MM)
 988 .ESAMPLE
 989 The systemtap probe handler is called once per XX increments
 990 of the underlying performance counter.  The default sampling
 991 count is 1000000.
 992 The range of valid type/config is described by the
 993 .IR perf_event_open (2)
 994 system call, and/or the
 995 .IR linux/perf_event.h
 996 file.  Invalid combinations or exhausted hardware counter resources
 997 result in errors during systemtap script startup.  Systemtap does
 998 not sanity-check the values: it merely passes them through to
 999 the kernel for error- and safety-checking.
1000
1001 .SH SEE ALSO
1002 .IR stap (1),
1003 .IR probe::* (3stap),
1004 .IR tapset::* (3stap)