man/stapprobes.3stap

   1 .\" t
   2 .TH STAPPROBES 3stap
   3 .SH NAME
   4 stapprobes \- systemtap probe points
   5
   6 .\" macros
   7 .de SAMPLE
   8
   9 .nr oldin \\n(.i
  10 .br
  11 .RS
  12 .nf
  13 .nh
  14 ..
  15 .de ESAMPLE
  16 .hy
  17 .fi
  18 .RE
  19 .in \\n[oldin]u
  20
  21 ..
  22
  23 .SH DESCRIPTION
  24 The following sections enumerate the variety of probe points supported
  25 by the systemtap translator, and some of the additional aliases defined by
  26 standard tapset scripts.  Many are individually documented in the
  27 .IR 3stap
  28 manual section, with the
  29 .IR probe::
  30 prefix.
  31
  32 .SH SYNTAX
  33
  34 .PP
  35 .SAMPLE
  36 .BR probe " PROBEPOINT [" , " PROBEPOINT] " { " [STMT ...] " }
  37 .ESAMPLE
  38 .PP
  39 A probe declaration may list multiple comma-separated probe points in
  40 order to attach a handler to all of the named events.  Normally, the
handler statements are run whenever any of these events occurs.  Depending on
  42 the type of probe point, the handler statements may refer to context
  43 variables (denoted with a dollar-sign prefix like $foo) to read or
  44 write state.  This may include function parameters for function
  45 probes, or local variables for statement probes.
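.PP
As a minimal sketch, the following declaration attaches one handler to two
probe points.  The kernel function names are assumptions that depend on the
running kernel; pp() is the tapset function that returns the probe point
being handled.
.SAMPLE
# One handler, two comma-separated probe points.
probe kernel.function("vfs_read"), kernel.function("vfs_write") {
    println(pp())
    exit()
}
.ESAMPLE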
  46 .PP
  47 The syntax of a single probe point is a general dotted-symbol
  48 sequence.  This allows a breakdown of the event namespace into parts,
  49 somewhat like the Domain Name System does on the Internet.  Each
  50 component identifier may be parametrized by a string or number
  51 literal, with a syntax like a function call.  A component may include
  52 a "*" character, to expand to a set of matching probe points.  It may
  53 also include "**" to match multiple sequential components at once.
  54 Probe aliases likewise expand to other probe points.
  55 .PP
  56 Probe aliases can be given on their own, or with a suffix. The suffix
  57 attaches to the underlying probe point that the alias is expanded
  58 to. For example,
  59 .SAMPLE
  60 syscall.read.return.maxactive(10)
  61 .ESAMPLE
  62 expands to
  63 .SAMPLE
  64 kernel.function("sys_read").return.maxactive(10)
  65 .ESAMPLE
  66 with the component
  67 .IR maxactive(10)
  68 being recognized as a suffix.
  69 .PP
  70 Normally, each and every probe point resulting from wildcard- and
  71 alias-expansion must be resolved to some low-level system
  72 instrumentation facility (e.g., a kprobe address, marker, or a timer
  73 configuration), otherwise the elaboration phase will fail.
  74 .PP
  75 However, a probe point may be followed by a "?" character, to indicate
  76 that it is optional, and that no error should result if it fails to
  77 resolve.  Optionalness passes down through all levels of
  78 alias/wildcard expansion.  Alternately, a probe point may be followed
  79 by a "!" character, to indicate that it is both optional and
  80 sufficient.  (Think vaguely of the Prolog cut operator.) If it does
  81 resolve, then no further probe points in the same comma-separated list
  82 will be resolved.  Therefore, the "!"  sufficiency mark only makes
  83 sense in a list of probe point alternatives.
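.PP
A sketch of a list of alternatives follows.  The function names are only
illustrative and may not exist on a given kernel.  If the first probe point
resolves, the second is not resolved at all; if it does not, the second is
tried, and is itself optional.
.SAMPLE
probe kernel.function("vfs_read") !, kernel.function("sys_read") ? {
    println(pp())
    exit()
}
.ESAMPLE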
  84 .PP
Additionally, a probe point may be followed by an "if (expr)" condition, in
order to enable/disable the probe point on-the-fly. If "expr" is false
when the probe point is hit, the whole probe body, including the bodies of
any aliases, is skipped. Conditions are stacked up through
all levels of alias/wildcard expansion, so the final condition is
the logical-and of the conditions of all expanded aliases and wildcards.  The
expressions are necessarily restricted to global variables.
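.PP
The following sketch shows a probe point condition driven by a global
variable.  The variable name "tracing" and the five-second interval are
arbitrary choices for illustration.
.SAMPLE
global tracing = 0

# Flip the condition every five seconds.
probe timer.s(5) { tracing = !tracing }

# The handler body is skipped while tracing is 0.
probe syscall.read if (tracing) {
    println(execname() . ": " . argstr)
}
.ESAMPLE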
  92 .PP
  93 These are all
  94 .B syntactically
valid probe points.  (They may still be
  96 .B semantically
  97 invalid, depending on the contents of the tapsets, and the versions of
  98 kernel/user software installed.)
  99
 100 .SAMPLE
 101 kernel.function("foo").return
 102 process("/bin/vi").statement(0x2222)
 103 end
 104 syscall.*
 105 syscall.*.return.maxactive(10)
 106 syscall.{open,close}
 107 sys**open
 108 kernel.function("no_such_function") ?
 109 module("awol").function("no_such_function") !
 110 signal.*? if (switch)
 111 kprobe.function("foo")
 112 .ESAMPLE
 113
 114 Probes may be broadly classified into "synchronous" and
 115 "asynchronous".  A "synchronous" event is deemed to occur when any
 116 processor executes an instruction matched by the specification.  This
 117 gives these probes a reference point (instruction address) from which
 118 more contextual data may be available.  Other families of probe points
 119 refer to "asynchronous" events such as timers/counters rolling over,
 120 where there is no fixed reference point that is related.  Each probe
 121 point specification may match multiple locations (for example, using
wildcards or aliases), and all of them are then probed.  A probe
 123 declaration may also contain several comma-separated specifications,
 124 all of which are probed.
 125
 126 Brace expansion is a mechanism which allows a list of probe points to be
 127 generated. It is very similar to shell expansion. A component may be surrounded
 128 by a pair of curly braces to indicate that the comma-separated sequence of
 129 one or more subcomponents will each constitute a new probe point. The braces
 130 may be arbitrarily nested. The ordering of expanded results is based on
 131 product order.
 132
 133 The question mark (?), exclamation mark (!) indicators and probe point conditions
 134 may not be placed in any expansions that are before the last component.
 135
 136 The following is an example of brace expansion.
 137
 138 .SAMPLE
 139 syscall.{write,read}
 140 # Expands to
 141 syscall.write, syscall.read
 142
 143 {kernel,module("nfs")}.function("nfs*")!
 144 # Expands to
 145 kernel.function("nfs*")!, module("nfs").function("nfs*")!
 146 .ESAMPLE
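.PP
Used in a complete probe declaration, brace expansion might look like the
following sketch, which relies on the "name" and "argstr" variables that the
syscall aliases normally provide.
.SAMPLE
# Equivalent to: probe syscall.read, syscall.write { ... }
probe syscall.{read,write} {
    println(name . "(" . argstr . ")")
}
.ESAMPLE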
 147
 148 .SH DWARF DEBUGINFO
 149
 150 Resolving some probe points requires DWARF debuginfo or "debug
 151 symbols" for the \fIspecific program\fR being instrumented.  For some others,
 152 DWARF is automatically synthesized on the fly from source code header
 153 files.  For others, it is not needed at all.  Since a systemtap script
 154 may use any mixture of probe points together, the union of their DWARF
 155 requirements has to be met on the computer where script compilation
 156 occurs.  (See the \fI\-\-use\-server\fR option and the \fBstap-server\
 157 (8)\fR man page for information about the remote compilation facility,
 158 which allows these requirements to be met on a different machine.)
 159 .PP
The following table lists many of the available probe point families,
 161 to classify them with respect to their need for DWARF debuginfo for
 162 the specific program for that probe point.
 163
.TS
tab(|);
l l l.
\fBDWARF\fP|\fBNON-DWARF\fP|\fBSYMBOL-TABLE\fP

kernel.function, .statement|kernel.mark|kernel.function\fI*\fP
module.function, .statement|process.mark, process.plt|module.function\fI*\fP
process.function, .statement|begin, end, error, never|process.function\fI*\fP
process.mark\fI*\fP|timer
\.function.callee|perf
|procfs
\fBAUTO-GENERATED-DWARF\fP|kernel.statement.absolute
|kernel.data
kernel.trace|kprobe.function
|process.statement.absolute
|process.begin, .end
|netfilter
|java
.TE
 182
 183 .PP
Probe types marked with an asterisk (\fI*\fP) are fallbacks, where
systemtap can sometimes infer subset or substitute information.  In
general, the more symbolic / debugging information is available, the
higher the quality of probing will be.
 188
 189
 190 .SH ON-THE-FLY ARMING
 191
 192 The following types of probe points may be armed/disarmed on-the-fly
 193 to save overheads during uninteresting times.  Arming conditions may
 194 also be added to other types of probes, but will be treated as a
 195 wrapping conditional and won't benefit from overhead savings.
 196
.TS
tab(|);
l l.
\fBDISARMABLE\fP|\fBexceptions\fP
kernel.function, kernel.statement
module.function, module.statement
process.*.function, process.*.statement
process.*.plt, process.*.mark
timer.|timer.profile
java
.TE
 207
 208 .SH PROBE POINT FAMILIES
 209
 210 .SS BEGIN/END/ERROR
 211
 212 The probe points
 213 .IR begin " and " end
 214 are defined by the translator to refer to the time of session startup
 215 and shutdown.  All "begin" probe handlers are run, in some sequence,
 216 during the startup of the session.  All global variables will have
 217 been initialized prior to this point.  All "end" probes are run, in
 218 some sequence, during the
 219 .I normal
 220 shutdown of a session, such as in the aftermath of an
 221 .I exit ()
 222 function call, or an interruption from the user.  In the case of an
 223 error-triggered shutdown, "end" probes are not run.  There are no
 224 target variables available in either context.
 225 .PP
 226 If the order of execution among "begin" or "end" probes is significant,
 227 then an optional sequence number may be provided:
 228
 229 .SAMPLE
 230 begin(N)
 231 end(N)
 232 .ESAMPLE
 233
 234 The number N may be positive or negative.  The probe handlers are run in
 235 increasing order, and the order between handlers with the same sequence
 236 number is unspecified.  When "begin" or "end" are given without a
 237 sequence, they are effectively sequence zero.
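.PP
A small sketch of sequenced "begin" probes follows.  The messages are
placeholders; under the ordering rule above, the handlers run in increasing
sequence order.
.SAMPLE
probe begin    { println("no sequence given, effectively 0") }
probe begin(1) { println("runs after sequence 0") }
probe begin(2) { println("runs after sequence 1"); exit() }
probe end      { println("normal shutdown") }
.ESAMPLE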
 238
 239 The
 240 .IR error
 241 probe point is similar to the
 242 .IR end
probe, except that each such probe handler is run when the session ends
 244 after errors have occurred.  In such cases, "end" probes are skipped,
 245 but each "error" probe is still attempted.  This kind of probe can be
 246 used to clean up or emit a "final gasp".  It may also be numerically
 247 parametrized to set a sequence.
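.PP
The difference between "end" and "error" can be sketched by forcing a
failure with the tapset function error().  The messages are placeholders.
.SAMPLE
# Force an error-triggered shutdown.
probe begin { error("simulated failure") }

# Skipped, because the session ends after an error.
probe end   { println("normal shutdown") }

# Still attempted.
probe error { println("final gasp: cleaning up") }
.ESAMPLE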
 248
 249 .SS NEVER
 250 The probe point
 251 .IR never
 252 is specially defined by the translator to mean "never".  Its probe
 253 handler is never run, though its statements are analyzed for symbol /
 254 type correctness as usual.  This probe point may be useful in
 255 conjunction with optional probes.
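.PP
One plausible use, shown here as a sketch, is to pair "never" with an
optional probe point so that the declaration still names a resolvable probe
point when the optional one is missing.  The function name is deliberately
fictitious.
.SAMPLE
probe kernel.function("no_such_function") ?, never {
    # Runs only if the optional probe point resolved and was hit.
    println(pp())
}
.ESAMPLE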
 256
 257 .SS SYSCALL and ND_SYSCALL
 258
 259 The
 260 .IR syscall.* " and " nd_syscall.*
 261 aliases define several hundred probes, too many to
 262 detail here.  They are of the general form:
 263
 264 .SAMPLE
 265 syscall.NAME
 266 .br
 267 nd_syscall.NAME
 268 .br
 269 syscall.NAME.return
 270 .br
 271 nd_syscall.NAME.return
 272 .ESAMPLE
 273
Generally, a pair of probes is defined for each normal system call as listed in the
 275 .IR syscalls(2)
 276 manual page, one for entry and one for return.  Those system calls that never
 277 return do not have a corresponding
 278 .IR .return
probe.  The nd_* family of probes is about the same, except that it uses
 280 .B non-DWARF
 281 based searching mechanisms, which may result in a lower quality of symbolic
 282 context data (parameters), and may miss some system calls.  You may want to
 283 try them first, in case kernel debugging information is not immediately available.
 284 .PP
Each probe alias provides a variety of variables. Looking at the tapset source
code is the most reliable way to find them.  Generally, each variable listed in the standard
 287 manual page is made available as a script-level variable, so
 288 .IR syscall.open
 289 exposes
 290 .IR filename ", " flags ", and " mode .
 291 In addition, a standard suite of variables is available at most aliases:
 292 .TP
 293 .IR argstr
 294 A pretty-printed form of the entire argument list, without parentheses.
 295 .TP
 296 .IR name
 297 The name of the system call.
 298 .TP
 299 .IR retstr
 300 For return probes, a pretty-printed form of the system-call result.
 301 .PP
 302 As usual for probe aliases, these variables are all initialized once
 303 from the underlying $context variables, so that later changes to
 304 $context variables are not automatically reflected.  Not all probe
 305 aliases obey all of these general guidelines.  Please report any
 306 bothersome ones you encounter as a bug.  Note that on some
 307 kernel/userspace architecture combinations (e.g., 32-bit userspace on
 308 64-bit kernel), the underlying $context variables may need explicit
 309 sign extension / masking.  When this is an issue, consider using the
 310 tapset-provided variables instead of raw $context variables.
 311 .PP
 312 If debuginfo availability is a problem, you may try using the
 313 non-DWARF syscall probe aliases instead.  Use the
 314 .IR nd_syscall.
 315 prefix instead of
 316 .IR syscall.
 317 The same context variables are available, as far as possible.
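.PP
As a sketch of the standard variables, the following script prints the
result of each read system call made by the target process.  It assumes the
script is started with a target, for example via the \-c or \-x options of
.BR stap (1),
so that target() returns a meaningful process ID.
.SAMPLE
probe syscall.read.return {
    if (pid() == target())
        println(name . " returned " . retstr)
}
.ESAMPLE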
 318
 319 .SS TIMERS
 320
 321 There are two main types of timer probes: "jiffies" timer probes and
 322 time interval timer probes.
 323
 324 Intervals defined by the standard kernel "jiffies" timer may be used
 325 to trigger probe handlers asynchronously.  Two probe point variants
 326 are supported by the translator:
 327
 328 .SAMPLE
 329 timer.jiffies(N)
 330 timer.jiffies(N).randomize(M)
 331 .ESAMPLE
 332
 333 The probe handler is run every N jiffies (a kernel-defined unit of
 334 time, typically between 1 and 60 ms).  If the "randomize" component is
 335 given, a linearly distributed random value in the range [\-M..+M] is
 336 added to N every time the handler is run.  N is restricted to a
 337 reasonable range (1 to around a million), and M is restricted to be
 338 smaller than N.  There are no target variables provided in either
 339 context.  It is possible for such probes to be run concurrently on
 340 a multi-processor computer.
 341 .PP
 342 Alternatively, intervals may be specified in units of time.
 343 There are two probe point variants similar to the jiffies timer:
 344
 345 .SAMPLE
 346 timer.ms(N)
 347 timer.ms(N).randomize(M)
 348 .ESAMPLE
 349
 350 Here, N and M are specified in milliseconds, but the full options for units
 351 are seconds (s/sec), milliseconds (ms/msec), microseconds (us/usec),
 352 nanoseconds (ns/nsec), and hertz (hz).  Randomization is not supported for
 353 hertz timers.
 354
 355 The actual resolution of the timers depends on the target kernel.  For
 356 kernels prior to 2.6.17, timers are limited to jiffies resolution, so
 357 intervals are rounded up to the nearest jiffies interval.  After 2.6.17,
 358 the implementation uses hrtimers for tighter precision, though the actual
 359 resolution will be arch-dependent.  In either case, if the "randomize"
 360 component is given, then the random value will be added to the interval
 361 before any rounding occurs.
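.PP
The following sketch combines a randomized millisecond timer with a
seconds timer that ends the session.  The intervals and the variable name
"ticks" are arbitrary.
.SAMPLE
global ticks

# Fires roughly every 100 ms, varied by up to 10 ms either way.
probe timer.ms(100).randomize(10) { ticks++ }

# Report and stop after five seconds.
probe timer.s(5) {
    println("ticks: " . sprint(ticks))
    exit()
}
.ESAMPLE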
 362 .PP
 363 Profiling timers are also available to provide probes that execute on
 364 all CPUs at the rate of the system tick (CONFIG_HZ) or at a given
 365 frequency (hz). On some kernels, this is a one-concurrent-user-only or
 366 disabled facility, resulting in error \-16 (EBUSY) during probe
 367 registration.
 368
 369 .SAMPLE
 370 timer.profile.tick
 371 timer.profile.freq.hz(N)
 372 .ESAMPLE
 373
 374 Full context information of the interrupted process is available, making
 375 this probe suitable for a time-based sampling profiler.
 376 .PP
 377 It is recommended to use the tapset probe
 378 .IR timer.profile
 379 rather than timer.profile.tick. This probe point behaves identically
 380 to timer.profile.tick when the underlying functionality is available,
 381 and falls back to using perf.sw.cpu_clock on some recent kernels which
 382 lack the corresponding profile timer facility.
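.PP
A very small sampling profiler along these lines might look like the
following sketch.  The ten-second duration and the limit of ten output rows
are arbitrary choices.
.SAMPLE
global samples

# Count one sample for whichever program was interrupted.
probe timer.profile { samples[execname()]++ }

probe timer.s(10) { exit() }

# Print the most frequently sampled programs first.
probe end {
    foreach (name in samples\- limit 10)
        println(name . " " . sprint(samples[name]))
}
.ESAMPLE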
 383 .PP
 384 Profiling timers with specified frequencies are only accurate up to around
 385 100 hz. You may need to provide a larger value to achieve the desired
 386 rate.
 387 .PP
 388 Note that if a timer probe is set to fire at a very high rate
 389 and if the probe body is complex, succeeding timer probes can get
 390 skipped, since the time for them to run has already passed. Normally
 391 systemtap reports missed probes, but it will not report these skipped
 392 probes.
 393
 394 .SS DWARF
 395
 396 This family of probe points uses symbolic debugging information for
 397 the target kernel/module/program, as may be found in unstripped
 398 executables, or the separate
 399 .I debuginfo
 400 packages.  They allow placement of probes logically into the execution
 401 path of the target program, by specifying a set of points in the
 402 source or object code.  When a matching statement executes on any
 403 processor, the probe handler is run in that context.
 404 .PP
 405 Probe points in the DWARF family can be identified by the target kernel
 406 module (or user process), source file, line number, function name, or
 407 some combination of these.
 408 .PP
 409 Here is a list of DWARF probe points currently supported:
 410 .SAMPLE
 411 kernel.function(PATTERN)
 412 kernel.function(PATTERN).call
 413 kernel.function(PATTERN).callee(PATTERN)
 414 kernel.function(PATTERN).callee(PATTERN).return
 415 kernel.function(PATTERN).callee(PATTERN).call
 416 kernel.function(PATTERN).callees(DEPTH)
 417 kernel.function(PATTERN).return
 418 kernel.function(PATTERN).inline
 419 kernel.function(PATTERN).label(LPATTERN)
 420 module(MPATTERN).function(PATTERN)
 421 module(MPATTERN).function(PATTERN).call
 422 module(MPATTERN).function(PATTERN).callee(PATTERN)
 423 module(MPATTERN).function(PATTERN).callee(PATTERN).return
 424 module(MPATTERN).function(PATTERN).callee(PATTERN).call
 425 module(MPATTERN).function(PATTERN).callees(DEPTH)
 426 module(MPATTERN).function(PATTERN).return
 427 module(MPATTERN).function(PATTERN).inline
 428 module(MPATTERN).function(PATTERN).label(LPATTERN)
 429 kernel.statement(PATTERN)
 430 kernel.statement(PATTERN).nearest
 431 kernel.statement(ADDRESS).absolute
 432 module(MPATTERN).statement(PATTERN)
 433 process("PATH").function("NAME")
 434 process("PATH").statement("*@FILE.c:123")
 435 process("PATH").library("PATH").function("NAME")
 436 process("PATH").library("PATH").statement("*@FILE.c:123")
 437 process("PATH").library("PATH").statement("*@FILE.c:123").nearest
 438 process("PATH").function("*").return
 439 process("PATH").function("myfun").label("foo")
 440 process("PATH").function("foo").callee("bar")
 441 process("PATH").function("foo").callee("bar").return
 442 process("PATH").function("foo").callee("bar").call
 443 process("PATH").function("foo").callees(DEPTH)
 444 process(PID).function("NAME")
 445 process(PID).function("myfun").label("foo")
 446 process(PID).plt("NAME")
 447 process(PID).plt("NAME").return
 448 process(PID).statement("*@FILE.c:123")
 449 process(PID).statement("*@FILE.c:123").nearest
 450 process(PID).statement(ADDRESS).absolute
 451 .ESAMPLE
 452 (See the USER-SPACE section below for more information on the process
 453 probes.)
 454 .PP
 455 The list above includes multiple variants and modifiers which provide
 456 additional functionality or filters. They are:
 457 .RS
 458 .TP
 459 \fB.function\fR
 460 Places a probe near the beginning of the named function, so that
 461 parameters are available as context variables.
 462 .TP
 463 \fB.return\fR
 464 Places a probe at the moment \fBafter\fR the return from the named
 465 function, so the return value is available as the "$return" context
 466 variable.
 467 .TP
 468 \fB.inline\fR
 469 Filters the results to include only instances of inlined functions. Note
 470 that inlined functions do not have an identifiable return point, so
 471 \fB.return\fR is not supported on \fB.inline\fR probes.
 472 .TP
 473 \fB.call\fR
 474 Filters the results to include only non-inlined functions (the opposite
set of \fB.inline\fR).
 476 .TP
 477 \fB.exported\fR
 478 Filters the results to include only exported functions.
 479 .TP
 480 \fB.statement\fR
 481 Places a probe at the exact spot, exposing those local variables that
 482 are visible there.
 483 .TP
 484 \fB.statement.nearest\fR
 485 Places a probe at the nearest available line number for each line number
 486 given in the statement.
 487 .TP
 488 \fB.callee\fR
 489 Places a probe on the callee function given in the \fB.callee\fR
 490 modifier, where the callee must be a function called by the target
 491 function given in \fB.function\fR. The advantage of doing this over
 492 directly probing the callee function is that this probe point is run
 493 only when the callee is called from the target function (add the
 494 -DSTAP_CALLEE_MATCHALL directive to override this when calling
 495 \fBstap\fR(1)).
 496
 497 Note that only callees that can be statically determined are available.
 498 For example, calls through function pointers are not available.
 499 Additionally, calls to functions located in other objects (e.g.
 500 libraries) are not available (instead use another probe point). This
 501 feature will only work for code compiled with GCC 4.7+.
 502 .TP
 503 \fB.callees\fR
 504 Shortcut for \fB.callee("*")\fR, which places a probe on all callees of
 505 the function.
 506 .TP
 507 \fB.callees\fR(DEPTH)
 508 Recursively places probes on callees. For example, \fB.callees(2)\fR
 509 will probe both callees of the target function, as well as callees of
 510 those callees. And \fB.callees(3)\fR goes one level deeper, etc...
 511 A callee probe at depth N is only triggered when the N callers in the
 512 callstack match those that were statically determined during analysis
 513 (this also may be overridden using -DSTAP_CALLEE_MATCHALL).
 514 .RE
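.PP
As a sketch of the \fB.callee\fR modifier, the following probe fires only
when the function "helper" is called from "main".  The program path and both
function names are hypothetical, and the program is assumed to have been
built with suitable debuginfo.
.SAMPLE
probe process("./a.out").function("main").callee("helper") {
    println("helper called from main")
}
.ESAMPLE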
 515 .PP
 516 In the above list of probe points, MPATTERN stands for a string literal
 517 that aims to identify the loaded kernel module of interest. For in-tree
 518 kernel modules, the name suffices (e.g. "btrfs"). The name may also
 519 include the "*", "[]", and "?" wildcards to match multiple in-tree
 520 modules. Out-of-tree modules are also supported by specifying the full
path to the ko file. Wildcards are not supported in this form. The file must follow
 522 the convention of being named <module_name>.ko (characters ',' and '-'
 523 are replaced by '_').
 524 .PP
 525 LPATTERN stands for a source program label. It may also contain "*",
 526 "[]", and "?" wildcards. PATTERN stands for a string literal that aims
 527 to identify a point in the program.  It is made up of three parts:
 528 .IP \(bu 4
 529 The first part is the name of a function, as would appear in the
 530 .I nm
 531 program's output.  This part may use the "*" and "?" wildcarding
 532 operators to match multiple names.
 533 .IP \(bu 4
 534 The second part is optional and begins with the "@" character.
 535 It is followed by the path to the source file containing the function,
 536 which may include a wildcard pattern, such as mm/slab*.
 537 If it does not match as is, an implicit "*/" is optionally added
 538 .I before
 539 the pattern, so that a script need only name the last few components
 540 of a possibly long source directory path.
 541 .IP \(bu 4
 542 Finally, the third part is optional if the file name part was given,
 543 and identifies the line number in the source file preceded by a ":"
 544 or a "+".  The line number is assumed to be an
 545 absolute line number if preceded by a ":", or relative to the
 546 declaration line of the function if preceded by a "+".
 547 All the lines in the function can be matched with ":*".
 548 A range of lines x through y can be matched with ":x\-y". Ranges and
 549 specific lines can be mixed using commas, e.g. ":x,y\-z".
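.PP
The three parts of PATTERN can be sketched as follows.  The function and
file names are assumptions about the kernel source tree and may not match a
given kernel version.
.SAMPLE
# Function name only, with a wildcard.
probe kernel.function("vfs_*").call { println(pp()) }

# Any function defined in a given source file.
probe kernel.function("*@fs/open.c").call { println(pp()) }

# Function, source file, and every line in the function.
probe kernel.statement("do_sys_open@fs/open.c:*") { println(pp()) }
.ESAMPLE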
 550 .PP
 551 As an alternative, PATTERN may be a numeric constant, indicating an
 552 address.  Such an address may be found from symbol tables of the
 553 appropriate kernel / module object file.  It is verified against
 554 known statement code boundaries, and will be relocated for use at
 555 run time.
 556 .PP
 557 In guru mode only, absolute kernel-space addresses may be specified with
 558 the ".absolute" suffix.  Such an address is considered already relocated,
 559 as if it came from
 560 .BR /proc/kallsyms ,
 561 so it cannot be checked against statement/instruction boundaries.
 562 .SS CONTEXT VARIABLES
 563
 564 .PP
 565 Many of the source-level context variables, such as function parameters,
 566 locals, globals visible in the compilation unit, may be visible to
 567 probe handlers.  They may refer to these variables by prefixing their
 568 name with "$" within the scripts.  In addition, a special syntax
 569 allows limited traversal of structures, pointers, and arrays.  More
 570 syntax allows pretty-printing of individual variables or their groups.
 571 See also
 572 .BR @cast .
 573 Note that variables may be inaccessible due to them being paged out,
 574 or for a few other reasons.  See also man
 575 .IR error::fault (7stap).
 576
 577 .TP
 578 $var
 579 refers to an in-scope variable "var".  If it's an integer-like type,
 580 it will be cast to a 64-bit int for systemtap script use.  String-like
 581 pointers (char *) may be copied to systemtap string values using the
 582 .IR kernel_string " or " user_string
 583 functions.
 584 .TP
 585 @var("varname")
an alternative syntax for
.IR $varname .
 589 .TP
 590 @var("varname@src/file.c")
 591 refers to the global (either file local or external) variable
 592 .IR varname
 593 defined when the file
 594 .IR src/file.c
 595 was compiled. The CU in which the variable is resolved is the first CU
 596 in the module of the probe point which matches the given file name at
 597 the end and has the shortest file name path (e.g. given
 598 .IR @var("foo@bar/baz.c")
 599 and CUs with file name paths
 600 .IR src/sub/module/bar/baz.c
 601 and
 602 .IR src/bar/baz.c
 603 the second CU will be chosen to resolve the (file) global variable
.IR foo ).
 606 .TP
$var\->field
traversal via a structure's or a pointer's field.  This
generalized indirection operator may be repeated to follow more
levels.  Note that the
.IR .
operator is not used for plain structure members;
.IR \->
is used both for direct members and for dereferencing pointers.
(This is because "." is reserved for string concatenation.)
 616 .TP
 617 $return
 618 is available in return probes only for functions that are declared
 619 with a return value, which can be determined using @defined($return).
 620 .TP
 621 $var[N]
indexes into an array.  The index may be given as a literal number or
an arbitrary numeric expression.
 624 .PP
 625 A number of operators exist for such basic context variable expressions:
 626 .TP
 627 $$vars
 628 expands to a character string that is equivalent to
 629 .SAMPLE
 630 sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x",
 631         parm1, ..., parmN, var1, ..., varN)
 632 .ESAMPLE
 633 for each variable in scope at the probe point.  Some values may be
 634 printed as
 635 .IR =?
 636 if their run-time location cannot be found.
 637 .TP
 638 $$locals
 639 expands to a subset of $$vars for only local variables.
 640 .TP
 641 $$parms
 642 expands to a subset of $$vars for only function parameters.
 643 .TP
 644 $$return
 645 is available in return probes only.  It expands to a string that
 646 is equivalent to sprintf("return=%x", $return)
 647 if the probed function has a return value, or else an empty string.
 648 .TP
 649 & $EXPR
 650 expands to the address of the given context variable expression, if it
 651 is addressable.
 652 .TP
 653 @defined($EXPR)
expands to 1 if the given context variable expression is resolvable, or to 0 otherwise,
 655 for use in conditionals such as
 656 .SAMPLE
 657 @defined($foo\->bar) ? $foo\->bar : 0
 658 .ESAMPLE
 659 .TP
 660 $EXPR$
 661 expands to a string with all of $EXPR's members, equivalent to
 662 .SAMPLE
 663 sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}",
 664          $EXPR\->a, $EXPR\->b)
 665 .ESAMPLE
 666 .TP
 667 $EXPR$$
expands to a string with all of $EXPR's members and submembers, equivalent to
 669 .SAMPLE
 670 sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}",
 671         $EXPR\->a, $EXPR\->b, $EXPR\->c\->x, $EXPR\->c\->y, $EXPR\->d[0])
 672 .ESAMPLE
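.PP
Several of these operators can be combined, as in the following sketch.  It
assumes a kernel whose vfs_read function has parameters named "file" and
"count"; other kernel versions may differ, which is what the @defined test
guards against.
.SAMPLE
probe kernel.function("vfs_read") {
    # All function parameters in one string.
    println($$parms)
    # The members of the structure that "file" points to.
    println($file$)
    # Fall back to 0 if "count" cannot be resolved here.
    n = @defined($count) ? $count : 0
    println("count=" . sprint(n))
    exit()
}
.ESAMPLE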
 673
 674 .SS MORE ON RETURN PROBES
 675
 676 .PP
 677 For the kernel ".return" probes, only a certain fixed number of
 678 returns may be outstanding.  The default is a relatively small number,
 679 on the order of a few times the number of physical CPUs.  If many
 680 different threads concurrently call the same blocking function, such
 681 as futex(2) or read(2), this limit could be exceeded, and skipped
 682 "kretprobes" would be reported by "stap \-t".  To work around this,
 683 specify a
 684 .SAMPLE
 685 probe FOO.return.maxactive(NNN)
 686 .ESAMPLE
 687 suffix, with a large enough NNN to cover all expected concurrently blocked
 688 threads.  Alternately, use the
 689 .SAMPLE
 690 stap \-DKRETACTIVE=NNNN
 691 .ESAMPLE
 692 stap command line macro setting to override the default for all
 693 ".return" probes.
 694
 695 .PP
 696 For ".return" probes, context variables other than the "$return" may
 697 be accessible, as a convenience for a script programmer wishing to
 698 access function parameters.  These values are \fBsnapshots\fP
 699 taken at the time of function entry.  (Local variables within the
 700 function are \fBnot\fP generally accessible, since those variables did
 701 not exist in allocated/initialized form at the snapshot moment.)
 702 These entry-snapshot variables should be accessed via
 703 .IR @entry($var) .
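.PP
For instance, the following sketch compares a parameter captured at entry
with the return value.  It assumes that the probed kernel's vfs_read has a
parameter named "count".
.SAMPLE
probe kernel.function("vfs_read").return {
    println("requested " . sprint(@entry($count)) .
            ", returned " . sprint($return))
}
.ESAMPLE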
 704 .PP
 705 In addition, arbitrary entry-time expressions can also be saved for
 706 ".return" probes using the
 707 .IR @entry(expr)
 708 operator.  For example, one can compute the elapsed time of a function:
 709 .SAMPLE
 710 probe kernel.function("do_filp_open").return {
 711     println( get_timeofday_us() \- @entry(get_timeofday_us()) )
 712 }
 713 .ESAMPLE
 714
 715 .PP
 716 The following table summarizes how values related to a function
 717 parameter context variable, a pointer named \fBaddr\fP, may be
 718 accessed from a
 719 .IR .return
 720 probe.
 721 .\" summarized from http://sourceware.org/ml/systemtap/2012-q1/msg00025.html
.TS
tab(|);
l l.
\fBat-entry value\fP|\fBpast-exit value\fP

$addr|\fInot available\fP
$addr->x->y|@cast(@entry($addr),"struct zz")->x->y
$addr[0]|{kernel,user}_{char,int,...}(& $addr[0])
.TE
 730
 731
 732 .SS DWARFLESS
In the absence of debugging information, entry & exit points of kernel & module
 734 functions can be probed using the "kprobe" family of probes.
 735 However, these do not permit looking up the arguments / local variables
 736 of the function.
The following constructs are supported:
 738 .SAMPLE
 739 kprobe.function(FUNCTION)
 740 kprobe.function(FUNCTION).call
 741 kprobe.function(FUNCTION).return
 742 kprobe.module(NAME).function(FUNCTION)
 743 kprobe.module(NAME).function(FUNCTION).call
 744 kprobe.module(NAME).function(FUNCTION).return
 745 kprobe.statement(ADDRESS).absolute
 746 .ESAMPLE
 747 .PP
 748 Probes of type
 749 .B function
 750 are recommended for kernel functions, whereas probes of type
 751 .B module
 752 are recommended for probing functions of the specified module.
 753 In case the absolute address of a kernel or module function is known,
 754 .B statement
 755 probes can be utilized.
 756 .PP
 757 Note that
 758 .I FUNCTION
 759 and
 760 .I MODULE
 761 names
 762 .B must not
 763 contain wildcards, or the probe will not be registered.
Also, statement probes may be used only in guru mode.
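.PP
As a sketch, the following script counts entries into one kernel function
without needing debuginfo.  The function name is an assumption about the
running kernel, and no arguments are read, in keeping with the limitation
noted above.
.SAMPLE
global hits

probe kprobe.function("vfs_read") { hits++ }

probe timer.s(5) {
    println("vfs_read entered " . sprint(hits) . " times")
    exit()
}
.ESAMPLE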
 765
 766
 767 .SS USER-SPACE
 768 Support for user-space probing is available for kernels that are
configured with the utrace extensions, or that have the uprobes facility
(Linux 3.5 and later).  (Various kernel build configuration options need to be
 771 enabled; systemtap will advise if these are missing.)
 772
 773 .PP
 774 There are several forms.  First, a non-symbolic probe point:
 775 .SAMPLE
 776 process(PID).statement(ADDRESS).absolute
 777 .ESAMPLE
 778 is analogous to
.IR kernel.statement(ADDRESS).absolute
 781 in that both use raw (unverified) virtual addresses and provide
 782 no $variables.  The target PID parameter must identify a running
 783 process, and ADDRESS should identify a valid instruction address.
 784 All threads of that process will be probed.
 785 .PP
 786 Second, non-symbolic user-kernel interface events handled by
 787 utrace may be probed:
 788 .SAMPLE
 789 process(PID).begin
 790 process("FULLPATH").begin
 791 process.begin
 792 process(PID).thread.begin
 793 process("FULLPATH").thread.begin
 794 process.thread.begin
 795 process(PID).end
 796 process("FULLPATH").end
 797 process.end
 798 process(PID).thread.end
 799 process("FULLPATH").thread.end
 800 process.thread.end
 801 process(PID).syscall
 802 process("FULLPATH").syscall
 803 process.syscall
 804 process(PID).syscall.return
 805 process("FULLPATH").syscall.return
 806 process.syscall.return
 807 process(PID).insn
 808 process("FULLPATH").insn
 809 process(PID).insn.block
 810 process("FULLPATH").insn.block
 811 .ESAMPLE
 812 .PP
 813 A
 814 .B .begin
probe gets called when a new process described by PID or FULLPATH gets created.
 816 A
 817 .B .thread.begin
 818 probe gets called when a new thread described by PID or FULLPATH gets created.
 819 A
 820 .B .end
probe gets called when a process described by PID or FULLPATH dies.
 822 A
 823 .B .thread.end
 824 probe gets called when a thread described by PID or FULLPATH dies.
 825 A
 826 .B .syscall
 827 probe gets called when a thread described by PID or FULLPATH makes a
 828 system call.  The system call number is available in the
 829 .BR $syscall
context variable, and the first six arguments of the system call
are available in the
.BR $argN
(e.g. $arg1, $arg2, ...) context variables.
 834 A
 835 .B .syscall.return
 836 probe gets called when a thread described by PID or FULLPATH returns from a
 837 system call.  The system call number is available in the
 838 .BR $syscall
 839 context variable, and the return value of the system call is available
 840 in the
 841 .BR $return
 842 context variable.
 843 A
 844 .B .insn
 845 probe gets called for every single-stepped instruction of the process described by PID or FULLPATH.
 846 A
 847 .B .insn.block
 848 probe gets called for every block-stepped instruction of the process described by PID or FULLPATH.
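.PP
The following sketch combines several of these probe points.  It
assumes
.I /bin/ls
as the target executable purely for illustration.
.SAMPLE
# /bin/ls is a placeholder target
probe process("/bin/ls").begin {
    printf("process %d started\\n", pid())
}
probe process("/bin/ls").syscall {
    printf("syscall %d, arg1=0x%x\\n", $syscall, $arg1)
}
probe process("/bin/ls").syscall.return {
    printf("syscall %d returned %d\\n", $syscall, $return)
}
probe process("/bin/ls").end {
    printf("process %d exited\\n", pid())
}
.ESAMPLE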
 849 .PP
 850 If a process probe is specified without a PID or FULLPATH, all user
 851 threads will be probed.  However, if systemtap was invoked with the
 852 .IR \-c " or " \-x
 853 options, then process probes are restricted to the process
hierarchy associated with the target process.  If a process probe is
given without a PID or FULLPATH but systemtap was invoked with the
.IR \-c
option, the path of the
.IR \-c
command is heuristically filled in as the process PATH.  In that case,
only command parameters are allowed in the \fI\-c\fR command (i.e. no
command substitution and no occurrences of any of these
characters: '|&;<>(){}').
 863
 864 .PP
 865 Third, symbolic static instrumentation compiled into programs and
 866 shared libraries may be
 867 probed:
 868 .SAMPLE
 869 process("PATH").mark("LABEL")
 870 process("PATH").provider("PROVIDER").mark("LABEL")
 871 process(PID).mark("LABEL")
 872 process(PID).provider("PROVIDER").mark("LABEL")
 873 .ESAMPLE
 874 .PP
 875 A
 876 .B .mark
probe gets called via a static probe which is defined in the
application by STAP_PROBE1(PROVIDER,LABEL,arg1), one of a family of
macros defined in
.BR sys/sdt.h .
 880 The PROVIDER is an arbitrary application identifier, LABEL is the
 881 marker site identifier, and arg1 is the integer-typed argument.
 882 STAP_PROBE1 is used for probes with 1 argument, STAP_PROBE2 is used
 883 for probes with 2 arguments, and so on.  The arguments of the probe
 884 are available in the context variables $arg1, $arg2, ...  An
 885 alternative to using the STAP_PROBE macros is to use the dtrace script
 886 to create custom macros.  Additionally, the variables $$name and
 887 $$provider are available as parts of the probe point name.  The
 888 .B sys/sdt.h
 889 macro names DTRACE_PROBE* are available as aliases for STAP_PROBE*.
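.PP
For illustration, suppose a hypothetical application
.I /usr/bin/myapp
contains the marker STAP_PROBE1(myapp, request_start, id).  Neither the
path nor the provider and label names come from a real program.  A
script could then observe the marker roughly as follows:
.SAMPLE
# myapp and request_start are hypothetical names
probe process("/usr/bin/myapp").provider("myapp").mark("request_start") {
    printf("%s:%s id=%d\\n", $$provider, $$name, $arg1)
}
.ESAMPLE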
 890
 891 .PP
 892 Finally, full symbolic source-level probes in user-space programs and
 893 shared libraries are supported.  These are exactly analogous to the
 894 symbolic DWARF-based kernel/module probes described above.  They
 895 expose the same sorts of context $variables for function parameters,
 896 local variables, and so on.
 897 .SAMPLE
 898 process("PATH").function("NAME")
 899 process("PATH").statement("*@FILE.c:123")
 900 process("PATH").plt("NAME")
 901 process("PATH").library("PATH").plt("NAME")
 902 process("PATH").library("PATH").function("NAME")
 903 process("PATH").library("PATH").statement("*@FILE.c:123")
 904 process("PATH").function("*").return
 905 process("PATH").function("myfun").label("foo")
 906 process("PATH").function("foo").callee("bar")
 907 process("PATH").plt("NAME").return
 908 process(PID).function("NAME")
 909 process(PID).statement("*@FILE.c:123")
 910 process(PID).plt("NAME")
 911 .ESAMPLE
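.PP
A minimal sketch of such probes, assuming that
.I /bin/ls
has debugging information available to systemtap and contains a
function named main with a parameter named argc:
.SAMPLE
# requires debuginfo for the placeholder target /bin/ls
probe process("/bin/ls").function("main") {
    printf("main entered, argc=%d\\n", $argc)
}
probe process("/bin/ls").function("main").return {
    printf("main returned %d\\n", $return)
}
.ESAMPLE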
 912
 913 .PP
 914 Note that for all process probes,
 915 .I PATH
names refer to executables that are searched for the same way shells do: relative
 917 to the working directory if they contain a "/" character, otherwise in
 918 .BR $PATH .
 919 If PATH names refer to scripts, the actual interpreters (specified in the
 920 script in the first line after the #! characters) are probed.
 921
 922 .PP
 923 Tapset process probes placed in the special directory
 924 $prefix/share/systemtap/tapset/PATH/ with relative paths will have their
 925 process parameter prefixed with the location of the tapset. For example,
 926
 927 .SAMPLE
 928 process("foo").function("NAME")
 929 .ESAMPLE
 930 .PP
 931 expands to
 932 .SAMPLE
 933 process("/usr/bin/foo").function("NAME")
 934 .ESAMPLE
 935
 936 .PP
when placed in $prefix/share/systemtap/tapset/PATH/usr/bin/.
 938
 939 .PP
 940 If PATH is a process component parameter referring to shared libraries
 941 then all processes that map it at runtime would be selected for probing.
 942 If PATH is a library component parameter referring to shared libraries
 943 then the process specified by the process component would be selected.
 944 Note that the PATH pattern in a library component will always apply to
 945 libraries statically determined to be in use by the process. However,
 946 you may also specify the full path to any library file even if not
 947 statically needed by the process.
 948
 949 .PP
 950 A .plt probe will probe functions in the program linkage table
 951 corresponding to the rest of the probe point.  .plt can be specified
 952 as a shorthand for .plt("*").  The symbol name is available as a
 953 $$name context variable; function arguments are not available, since
 954 PLTs are processed without debuginfo.  A .plt.return probe places a
 955 probe at the moment \fBafter\fR the return from the named
 956 function.
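.PP
As a sketch, the following traces every call that a placeholder target,
.IR /bin/ls ,
makes through its program linkage table:
.SAMPLE
# .plt is shorthand for .plt("*"); /bin/ls is a placeholder target
probe process("/bin/ls").plt {
    printf("%s called %s via PLT\\n", execname(), $$name)
}
.ESAMPLE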
 957
 958 .PP
 959 If the PATH string contains wildcards as in the MPATTERN case, then
 960 standard globbing is performed to find all matching paths.  In this
 961 case, the
 962 .BR $PATH
 963 environment variable is not used.
 964
 965 .PP
 966 If systemtap was invoked with the
 967 .IR \-c " or " \-x
 968 options, then process probes are restricted to the process
 969 hierarchy associated with the target process.
 970
 971 .SS JAVA
 972 Support for probing Java methods is available using Byteman as a
 973 backend. Byteman is an instrumentation tool from the JBoss project
 974 which systemtap can use to monitor invocations for a specific method
 975 or line in a Java program.
 976 .PP
 977 Systemtap does so by generating a Byteman script listing the probes to
 978 instrument and then invoking the Byteman
 979 .IR bminstall
 980 utility.
 981 .PP
 982 This Java instrumentation support is currently a prototype feature
 983 with major limitations.  Moreover, Java probing currently does not
 984 work across users; the stap script must run (with appropriate
permissions) under the same user as the Java process being
 986 probed. (Thus a stap script under root currently cannot probe Java
 987 methods in a non-root-user Java process.)
 988
 989 .PP
 990 The first probe type refers to Java processes by the name of the Java process:
 991 .SAMPLE
 992 java("PNAME").class("CLASSNAME").method("PATTERN")
 993 java("PNAME").class("CLASSNAME").method("PATTERN").return
 994 .ESAMPLE
The PNAME argument must name a pre-existing JVM process that is
identifiable via a jps listing.
 997 .PP
 998 The PATTERN parameter specifies the signature of the Java method to
 999 probe. The signature must consist of the exact name of the method,
1000 followed by a bracketed list of the types of the arguments, for
1001 instance "myMethod(int,double,Foo)". Wildcards are not supported.
1002 .PP
1003 The probe can be set to trigger at a specific line within the method
1004 by appending a line number with colon, just as in other types of
1005 probes: "myMethod(int,double,Foo):245".
1006 .PP
1007 The CLASSNAME parameter identifies the Java class the method belongs
1008 to, either with or without the package qualification. By default, the
1009 probe only triggers on descendants of the class that do not override
1010 the method definition of the original class. However, CLASSNAME can
1011 take an optional caret prefix, as in
1012 .IR ^org.my.MyClass,
1013 which specifies that the probe should also trigger on all descendants
1014 of MyClass that override the original method. For instance, every method
1015 with signature foo(int) in program org.my.MyApp can be probed at once using
1016 .SAMPLE
1017 java("org.my.MyApp").class("^java.lang.Object").method("foo(int)")
1018 .ESAMPLE
1019 .PP
1020 The second probe type works analogously, but refers to Java processes by PID:
1021 .SAMPLE
1022 java(PID).class("CLASSNAME").method("PATTERN")
1023 java(PID).class("CLASSNAME").method("PATTERN").return
1024 .ESAMPLE
1025 (PIDs for an already running process can be obtained using the
1026 .IR jps (1)
1027 utility.)
1028 .PP
1029 Context variables defined within java probes include
1030 .IR $arg1
1031 through
1032 .IR $arg10
1033 (for up to the first 10 arguments of a method), represented as integers or strings.
1034
1035 .SS PROCFS
1036
These probe points allow procfs "files" in
/proc/systemtap/MODNAME to be created, read and written
.RI ( MODNAME
is the name of the systemtap module).  The
.I proc
filesystem is a pseudo-filesystem which is used as an interface to
kernel data structures.
.PP
File permissions may be modified using a umask value.  Default
permissions are 0400 for read probes, and 0200 for write probes.  If
both a read and a write probe are used on the same file, a default
permission of 0600 will be used.  Using procfs.umask(0040).read would
result in a 0404 permission set for the file.
.PP
There are several probe point variants supported by the translator:
1050
1051 .SAMPLE
1052 procfs("PATH").read
1053 procfs("PATH").umask(UMASK).read
1054 procfs("PATH").read.maxsize(MAXSIZE)
procfs("PATH").umask(UMASK).read.maxsize(MAXSIZE)
1056 procfs("PATH").write
1057 procfs("PATH").umask(UMASK).write
1058 procfs.read
1059 procfs.umask(UMASK).read
1060 procfs.read.maxsize(MAXSIZE)
1061 procfs.umask(UMASK).read.maxsize(MAXSIZE)
1062 procfs.write
1063 procfs.umask(UMASK).write
1064 .ESAMPLE
1065
1066 .I PATH
1067 is the file name (relative to /proc/systemtap/MODNAME) to be created.
1068 If no
1069 .I PATH
is specified (as in the last six variants above),
1071 .I PATH
1072 defaults to "command".
1073 .PP
1074 When a user reads /proc/systemtap/MODNAME/PATH, the corresponding
1075 procfs
1076 .I read
1077 probe is triggered.  The string data to be read should be assigned to
1078 a variable named
1079 .IR $value ,
1080 like this:
1081
1082 .SAMPLE
1083 procfs("PATH").read { $value = "100\\n" }
1084 .ESAMPLE
1085 .PP
1086 When a user writes into /proc/systemtap/MODNAME/PATH, the
1087 corresponding procfs
1088 .I write
1089 probe is triggered.  The data the user wrote is available in the
1090 string variable named
1091 .IR $value ,
1092 like this:
1093
1094 .SAMPLE
1095 procfs("PATH").write { printf("user wrote: %s", $value) }
1096 .ESAMPLE
1097 .PP
1098 .I MAXSIZE
1099 is the size of the procfs read buffer.  Specifying
1100 .I MAXSIZE
1101 allows larger procfs output.  If no
1102 .I MAXSIZE
1103 is specified, the procfs read buffer defaults to
1104 .I STP_PROCFS_BUFSIZE
1105 (which defaults to
1106 .IR MAXSTRINGLEN ,
1107 the maximum length of a string).
1108 If setting the procfs read buffers for more than one file is needed,
1109 it may be easiest to override the
1110 .I STP_PROCFS_BUFSIZE
1111 definition.
1112 Here's an example of using
1113 .IR MAXSIZE :
1114
1115 .SAMPLE
1116 procfs.read.maxsize(1024) {
1117     $value = "long string..."
1118     $value .= "another long string..."
1119     $value .= "another long string..."
1120     $value .= "another long string..."
1121 }
1122 .ESAMPLE
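.PP
Read and write probes on the same file can be combined to expose a
tunable value.  In this sketch the file name "threshold" and the global
variable of the same name are illustrative assumptions:
.SAMPLE
global threshold = 10

# reading /proc/systemtap/MODNAME/threshold reports the current value
probe procfs("threshold").read {
    $value = sprintf("%d\\n", threshold)
}
# writing a decimal number to the file updates the value
probe procfs("threshold").write {
    threshold = strtol($value, 10)
}
.ESAMPLE
.PP
With both probes present, the file would get the default permission of
0600 described above.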
1123
1124 .SS NETFILTER HOOKS
1125
1126 These probe points allow observation of network packets using the
1127 netfilter mechanism. A netfilter probe in systemtap corresponds to a
1128 netfilter hook function in the original netfilter probes API. It is
1129 probably more convenient to use
1130 .IR tapset::netfilter (3stap),
1131 which wraps the primitive netfilter hooks and does the work of
1132 extracting useful information from the context variables.
1133
1134 .PP
1135 There are several probe point variants supported by the translator:
1136
1137 .SAMPLE
1138 netfilter.hook("HOOKNAME").pf("PROTOCOL_F")
1139 netfilter.pf("PROTOCOL_F").hook("HOOKNAME")
1140 netfilter.hook("HOOKNAME").pf("PROTOCOL_F").priority("PRIORITY")
1141 netfilter.pf("PROTOCOL_F").hook("HOOKNAME").priority("PRIORITY")
1142 .ESAMPLE
1143
1144 .PP
1145 .I PROTOCOL_F
1146 is the protocol family to listen for, currently one of
1147 .I NFPROTO_IPV4,
1148 .I NFPROTO_IPV6,
1149 .I NFPROTO_ARP,
1150 or
1151 .I NFPROTO_BRIDGE.
1152
1153 .PP
1154 .I HOOKNAME
1155 is the point, or 'hook', in the protocol stack at which to intercept
1156 the packet. The available hook names for each protocol family are
1157 taken from the kernel header files <linux/netfilter_ipv4.h>,
1158 <linux/netfilter_ipv6.h>, <linux/netfilter_arp.h> and
1159 <linux/netfilter_bridge.h>. For instance, allowable hook names for
1160 .I NFPROTO_IPV4
1161 are
1162 .I NF_INET_PRE_ROUTING,
1163 .I NF_INET_LOCAL_IN,
1164 .I NF_INET_FORWARD,
1165 .I NF_INET_LOCAL_OUT,
1166 and
1167 .I NF_INET_POST_ROUTING.
1168
1169 .PP
1170 .I PRIORITY
1171 is an integer priority giving the order in which the probe point
1172 should be triggered relative to any other netfilter hook functions
1173 which trigger on the same packet. Hook functions execute on each
1174 packet in order from smallest priority number to largest priority number. If no
1175 .I PRIORITY
1176 is specified (as in the first two probe point variants above),
1177 .I PRIORITY
1178 defaults to "0".
1179
1180 There are a number of predefined priority names of the form
1181 .I NF_IP_PRI_*
1182 and
1183 .I NF_IP6_PRI_*
1184 which are defined in the kernel header files <linux/netfilter_ipv4.h> and <linux/netfilter_ipv6.h> respectively. The script is permitted to use these
1185 instead of specifying an integer priority. (The probe points for
1186 .I NFPROTO_ARP
1187 and
1188 .I NFPROTO_BRIDGE
1189 currently do not expose any named hook priorities to the script writer.)
1190 Thus, allowable ways to specify the priority include:
1191
1192 .SAMPLE
1193 priority("255")
1194 priority("NF_IP_PRI_SELINUX_LAST")
1195 .ESAMPLE
1196
1197 A script using guru mode is permitted to specify any identifier or
1198 number as the parameter for hook, pf, and priority. This feature
1199 should be used with caution, as the parameter is inserted verbatim into
1200 the C code generated by systemtap.
1201
1202 The netfilter probe points define the following context variables:
1203 .TP
1204 .IR $hooknum
1205 The hook number.
1206 .TP
1207 .IR $skb
1208 The address of the sk_buff struct representing the packet. See
1209 <linux/skbuff.h> for details on how to use this struct, or
1210 alternatively use the tapset
1211 .IR tapset::netfilter (3stap)
1212 for easy access to key information.
1213
1214 .TP
1215 .IR $in
1216 The address of the net_device struct representing the network device
1217 on which the packet was received (if any). May be 0 if the device is
1218 unknown or undefined at that stage in the protocol stack.
1219
1220 .TP
1221 .IR $out
1222 The address of the net_device struct representing the network device
1223 on which the packet will be sent (if any). May be 0 if the device is
1224 unknown or undefined at that stage in the protocol stack.
1225
1226 .TP
1227 .IR $verdict
1228 (Guru mode only.) Assigning one of the verdict values defined in
1229 <linux/netfilter.h> to this variable alters the further progress of
1230 the packet through the protocol stack. For instance, the following
guru mode script forces all IPv6 network packets to be dropped:
1232
1233 .SAMPLE
1234 probe netfilter.pf("NFPROTO_IPV6").hook("NF_IP6_PRE_ROUTING") {
1235   $verdict = 0 /* nf_drop */
1236 }
1237 .ESAMPLE
1238
1239 For convenience, unlike the primitive probe points discussed here, the
1240 probes defined in
1241 .IR tapset::netfilter (3stap)
1242 export the lowercase names of the verdict constants (e.g. NF_DROP
1243 becomes nf_drop) as local variables.
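.PP
A probe that only reads the context variables might look like the
following sketch, which prints them for incoming IPv4 packets.  The
choice of protocol family and hook is illustrative:
.SAMPLE
probe netfilter.pf("NFPROTO_IPV4").hook("NF_INET_PRE_ROUTING") {
    printf("hook=%d skb=%p in=%p out=%p\\n", $hooknum, $skb, $in, $out)
}
.ESAMPLE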
1244
1245 .SS KERNEL TRACEPOINTS
1246
1247 This family of probe points hooks up to static probing tracepoints
1248 inserted into the kernel or modules.  As with markers, these
1249 tracepoints are special macro calls inserted by kernel developers to
1250 make probing faster and more reliable than with DWARF-based probes,
1251 and DWARF debugging information is not required to probe tracepoints.
1252 Tracepoints have an extra advantage of more strongly-typed parameters
1253 than markers.
1254
1255 Tracepoint probes look like:
1256 .BR kernel.trace("name") .
1257 The tracepoint name string, which may contain the usual wildcard
1258 characters, is matched against the names defined by the kernel
1259 developers in the tracepoint header files. To restrict the search to
specific subsystems (e.g. sched, ext3), the following syntax
1261 can be used:
1262 .BR kernel.trace("system:name") .
1263 The tracepoint system string may also contain the usual wildcard
1264 characters.
1265
1266 The handler associated with a tracepoint-based probe may read the
1267 optional parameters specified at the macro call site.  These are
1268 named according to the declaration by the tracepoint author.  For
1269 example, the tracepoint probe
1270 .BR kernel.trace("sched:sched_switch")
1271 provides the parameters
1272 .BR $prev " and " $next .
1273 If the parameter is a complex type, as in a struct pointer, then a
1274 script can access fields with the same syntax as DWARF $target
1275 variables.  Also, tracepoint parameters cannot be modified, but in
1276 guru-mode a script may modify fields of parameters.
1277
1278 The subsystem and name of the tracepoint are available in
.BR $$system " and " $$name ,
1280 and a string of name=value pairs for all parameters of the tracepoint
1281 is available in
1282 .BR $$vars " or " $$parms .
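.PP
For instance, the following sketch reports context switches using the
sched_switch parameters mentioned above.  It assumes that both
parameters point to a struct task_struct with a comm field, as in
mainline kernels:
.SAMPLE
probe kernel.trace("sched:sched_switch") {
    printf("%s: %s -> %s\\n", $$name,
           kernel_string($prev->comm), kernel_string($next->comm))
}
.ESAMPLE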
1283
1284 .SS KERNEL MARKERS (OBSOLETE)
1285
1286 This family of probe points hooks up to an older style of static
1287 probing markers inserted into older kernels or modules.  These markers
1288 are special STAP_MARK macro calls inserted by kernel developers to
1289 make probing faster and more reliable than with DWARF-based probes.
1290 Further, DWARF debugging information is
1291 .I not
1292 required to probe markers.
1293
1294 Marker probe points begin with
1295 .BR kernel .
1296 The next part names the marker itself:
1297 .BR mark("name") .
1298 The marker name string, which may contain the usual wildcard characters,
1299 is matched against the names given to the marker macros when the kernel
1300 and/or module was compiled.    Optionally, you can specify
1301 .BR format("format") .
1302 Specifying the marker format string allows differentiation between two
1303 markers with the same name but different marker format strings.
1304
1305 The handler associated with a marker-based probe may read the
1306 optional parameters specified at the macro call site.  These are
1307 named
1308 .BR $arg1 " through " $argNN ,
1309 where NN is the number of parameters supplied by the macro.  Number
1310 and string parameters are passed in a type-safe manner.
1311
1312 The marker format string associated with a marker is available in
1313 .BR $format .
The marker name string is also available in
1315 .BR $name .
1316
1317 .SS HARDWARE BREAKPOINTS
This family of probes is used to set hardware watchpoints for a given
(global) kernel symbol.  The probes take three components as inputs:

1. The
.B virtual address / name
of the kernel symbol to be traced is supplied as the argument to this
class of probes.  (Only data segment variables can be probed; local
variables of a function cannot.)

2. The nature of the access to be probed:
a.
A
.I .write
probe is triggered when a write happens at the specified address/symbol
name.
b.
A
.I .rw
probe is triggered when either a read or write happens.

3.
.BR .length
(optional)
Users have the option of specifying the address interval to be probed
using "length" constructs.  The user-specified length gets approximated
to the closest possible address length that the architecture can
support.  If the specified length exceeds the limits imposed by the
architecture, an error message is flagged and probe registration fails.
Wherever 'length' is not specified, the translator requests a hardware
breakpoint probe of length 1.  It should be noted that the "length"
construct is not valid with symbol names.

The following constructs are supported:
1349 .SAMPLE
1350 probe kernel.data(ADDRESS).write
1351 probe kernel.data(ADDRESS).rw
1352 probe kernel.data(ADDRESS).length(LEN).write
1353 probe kernel.data(ADDRESS).length(LEN).rw
1354 probe kernel.data("SYMBOL_NAME").write
1355 probe kernel.data("SYMBOL_NAME").rw
1356 .ESAMPLE
1357
This set of probes makes use of the debug registers of the processor,
which are a scarce resource (4 on x86, 1 on powerpc).  The script
translator flags a warning if a user requests more hardware breakpoint
probes than the limit set by the architecture.  For example, a pass-2
warning is issued when an input script requests 5 hardware breakpoint
probes on an x86 system, since the x86 architecture supports a maximum
of 4 breakpoints.
Users are cautioned to set probes judiciously.
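.PP
As a sketch, the following watches writes to the kernel variable
pid_max and reports which process performed them:
.SAMPLE
probe kernel.data("pid_max").write {
    printf("pid_max written by %s (pid %d)\\n", execname(), pid())
}
.ESAMPLE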
1365
1366 .SS PERF
1367
1368 This family of probe points interfaces to the kernel "perf event"
1369 infrastructure for controlling hardware performance counters.
The events being attached to are described by the "type" and
"config" fields of the
1372 .IR perf_event_attr
1373 structure, and are sampled at an interval governed by the
1374 "sample_period" and "sample_freq" fields.
1375
1376 These fields are made available to systemtap scripts using
1377 the following syntax:
1378 .SAMPLE
1379 probe perf.type(NN).config(MM).sample(XX)
1380 probe perf.type(NN).config(MM).hz(XX)
1381 probe perf.type(NN).config(MM)
1382 probe perf.type(NN).config(MM).process("PROC")
1383 probe perf.type(NN).config(MM).counter("COUNTER")
1384 probe perf.type(NN).config(MM).process("PROC").counter("COUNTER")
1385 .ESAMPLE
1386 The systemtap probe handler is called once per XX increments
1387 of the underlying performance counter when using the .sample field
1388 or at a frequency in hertz when using the .hz field. When not specified,
1389 the default behavior is to sample at a count of 1000000.
1390 The range of valid type/config is described by the
1391 .IR perf_event_open (2)
1392 system call, and/or the
1393 .IR linux/perf_event.h
1394 file.  Invalid combinations or exhausted hardware counter resources
1395 result in errors during systemtap script startup.  Systemtap does
1396 not sanity-check the values: it merely passes them through to
1397 the kernel for error- and safety-checking.  By default the perf event
1398 probe is systemwide unless .process is specified, which will bind the
1399 probe to a specific task.  If the name is omitted then it
1400 is inferred from the stap \-c argument.   A perf event can be read on
1401 demand using .counter.  The body of the perf probe handler will not be
1402 invoked for a .counter probe; instead, the counter is read in a user
1403 space probe via:
.SAMPLE
process("PROCESS").statement("func@file") {stat <<< @perf("NAME")}
.ESAMPLE
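.PP
As a sketch of a sampling probe, the following counts CPU cycle samples
per process name, system-wide.  It assumes the values type 0
(PERF_TYPE_HARDWARE) and config 0 (PERF_COUNT_HW_CPU_CYCLES) from
linux/perf_event.h; the array name is arbitrary:
.SAMPLE
global samples

# one handler call per 1000000 cycles, attributed to the running process
probe perf.type(0).config(0).sample(1000000) {
    samples[execname()]++
}
probe end {
    foreach (name in samples)
        printf("%s: %d samples\\n", name, samples[name])
}
.ESAMPLE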
1406
1407
1408 .SH EXAMPLES
1409 .PP
1410 Here are some example probe points, defining the associated events.
1411 .TP
1412 begin, end, end
1413 refers to the startup and normal shutdown of the session.  In this
1414 case, the handler would run once during startup and twice during
1415 shutdown.
1416 .TP
1417 timer.jiffies(1000).randomize(200)
1418 refers to a periodic interrupt, every 1000 +/\- 200 jiffies.
1419 .TP
1420 kernel.function("*init*"), kernel.function("*exit*")
1421 refers to all kernel functions with "init" or "exit" in the name.
1422 .TP
1423 kernel.function("*@kernel/time.c:240")
1424 refers to any functions within the "kernel/time.c" file that span
1425 line 240.
.B Note
that this is
.B not
a probe at the statement at that line number.  Use the
.I kernel.statement
probe instead.
1434 .TP
1435 kernel.trace("sched_*")
1436 refers to all scheduler-related (really, prefixed) tracepoints in
1437 the kernel.
1438 .TP
1439 kernel.mark("getuid")
1440 refers to an obsolete STAP_MARK(getuid, ...) macro call in the kernel.
1441 .TP
1442 module("usb*").function("*sync*").return
1443 refers to the moment of return from all functions with "sync" in the
1444 name in any of the USB drivers.
1445 .TP
1446 kernel.statement(0xc0044852)
1447 refers to the first byte of the statement whose compiled instructions
1448 include the given address in the kernel.
1449 .TP
1450 kernel.statement("*@kernel/time.c:296")
1451 refers to the statement of line 296 within "kernel/time.c".
1452 .TP
1453 kernel.statement("bio_init@fs/bio.c+3")
1454 refers to the statement at line bio_init+3 within "fs/bio.c".
1455 .TP
1456 kernel.data("pid_max").write
refers to a hardware breakpoint of type "write" set on pid_max.
1458 .TP
1459 syscall.*.return
refers to the group of probe aliases with any name in the second position.
1461
1462 .SH SEE ALSO
1463 .nh
1464 .nf
1465 .IR stap (1),
1466 .IR probe::* (3stap),
1467 .IR tapset::* (3stap)
1468
1469 .\" Local Variables:
1470 .\" mode: nroff
1471 .\" End: