1 SystemTap overview

1.1 About this guide

This guide is a comprehensive reference to SystemTap’s language constructs and syntax. The contents borrow heavily from existing SystemTap documentation found in the manual pages and the tutorial. The presentation here gives the reader a single place to find language syntax and recommended usage. To use this guide successfully, you should be familiar with the general theory and operation of SystemTap. If you are new to SystemTap, the tutorial is an excellent place to start learning. For detailed information about tapsets, see the manual pages provided with the distribution. For information about the entire collection of SystemTap reference material, see Section 11.

1.2 Reasons to use SystemTap

SystemTap provides infrastructure to simplify the gathering of information about a running Linux kernel so that it may be further analyzed. This analysis assists in identifying the underlying cause of a performance or functional problem. SystemTap was designed to eliminate the need for a developer to go through the tedious instrument, recompile, install, and reboot sequence normally required to collect this kind of data. To do this, it provides a simple command-line interface and scripting language for writing instrumentation for both kernel and user space. With SystemTap, developers, system administrators, and users can easily write scripts that gather and manipulate system data that is otherwise unavailable from standard Linux tools. Users of SystemTap will find it to be a significant improvement over older methods.

1.3 Event-action language

SystemTap’s language is strictly typed, declaration free, procedural, and inspired by dtrace and awk. Source code points or events in the kernel are associated with handlers, which are subroutines that are executed synchronously. Each such pairing of an event with its handler is a probe. Probes are conceptually similar to “breakpoint command lists” in the GDB debugger.

There are two main outermost constructs: probes and functions. Within these, statements and expressions use C-like operator syntax and precedence.
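
As a minimal sketch of the two outermost constructs, the script below defines one function and one probe. The function name greeting and its message are made up for illustration; the begin probe point, println, and exit are standard.

     function greeting () {
         # The return type (string) is inferred; nothing is declared.
         return "hello from a probe handler"
     }
     
     probe begin {
         # This handler runs once, when the session starts.
         println (greeting ())
         exit ()
     }

The begin event fires as soon as the session starts, so the handler prints one line and then ends the session by calling exit().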

1.4 Sample SystemTap scripts

Following are some example scripts that illustrate the basic operation of SystemTap. For more examples, see the examples/small_demos/ directory in the source directory, the SystemTap wiki at http://sourceware.org/systemtap/wiki/HomePage, or the SystemTap War Stories page at http://sourceware.org/systemtap/wiki/WarStories.

1.4.1 Basic SystemTap syntax and control structures

The following code examples demonstrate SystemTap syntax and control structures.

     global odds, evens
     
     probe begin {
         # "no" and "ne" are local integers
         for (i = 0; i < 10; i++) {
             if (i % 2) odds [no++] = i
                 else evens [ne++] = i
         }
     
         delete odds[2]
         delete evens[3]
         exit()
     }
     
     probe end {
         # Ascending by index
         foreach (x+ in odds)
             printf ("odds[%d] = %d\n", x, odds[x])
     
         # Descending by value
         foreach (x in evens-)
             printf ("evens[%d] = %d\n", x, evens[x])
     }

This prints:

     odds[0] = 1
     odds[1] = 3
     odds[3] = 7
     odds[4] = 9
     evens[4] = 8
     evens[2] = 4
     evens[1] = 2
     evens[0] = 0

Note that all variable types are inferred, and that all locals and globals are initialized. Integers are set to 0 and strings are set to the empty string.
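
The implicit initialization can be seen in a short sketch such as the following. The variable names hits, label, and total are hypothetical; none of them is declared with a type or given a starting value.

     global hits, label
     
     probe begin {
         hits++                  # global integer: starts at 0, becomes 1
         label = label . "tap"   # global string: starts empty, becomes "tap"
         total += 5              # local integer: starts at 0, becomes 5
         printf ("%d %s %d\n", hits, label, total)
         exit ()
     }

This prints:

     1 tap 5

The types are inferred from use: the ++ and += operators imply integers, and the . concatenation operator implies a string.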

1.4.2 Primes between 0 and 49

     function isprime (x) {
         if (x < 2) return 0
         for (i = 2; i < x; i++) {
             if (x % i == 0) return 0
             if (i * i > x) break
         }
         return 1
     }
     
     probe begin {
         for (i = 0; i < 50; i++)
             if (isprime (i)) printf("%d\n", i)
         exit()
     }

This prints:

     2
     3
     5
     7
     11
     13
     17
     19
     23
     29
     31
     37
     41
     43
     47

1.4.3 Recursive functions

     function fibonacci(i) {
         if (i < 1) error ("bad number")
         if (i == 1) return 1
         if (i == 2) return 2
         return fibonacci (i-1) + fibonacci (i-2)
     }
     
     probe begin {
         printf ("11th fibonacci number: %d", fibonacci (11))
         exit ()
     }

This prints:

     11th fibonacci number: 144

Any larger number input to the function may exceed the MAXACTION or MAXNESTING limits, which will be caught at run time and result in an error. For more about limits see Section 1.6.

1.5 The stap command

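The stap program is the front end to the SystemTap tool. It accepts probing instructions written in its scripting language, translates those instructions into C code, compiles this C code, and loads the resulting kernel module into a running Linux kernel to perform the requested system trace or probe functions. You can supply the script in a named file, from standard input, or from the command line. The SystemTap script runs until one of the following conditions occurs:

The user interrupts the script with CTRL-C.

The script executes the exit() function.

The script encounters a sufficient number of soft errors.

The monitored command started with the stap program’s -c option exits.

The stap command does the following:

Translates the script.

Generates and compiles a kernel module.

Inserts the module and sends its output to the standard output of stap.

Unloads the module and terminates stap when the user presses CTRL-C.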

For a full list of options to the stap command, see the stap(1) manual page.
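
The three ways of supplying a script can be sketched with one small example. The file name hello.stp is hypothetical, and the invocations appear as comments so that the block remains a valid script; the - argument and the -e option are described in stap(1).

     # hello.stp (hypothetical file name)
     #
     # Named file:      stap hello.stp
     # Standard input:  stap - < hello.stp
     # Command line:    stap -e 'probe begin { println ("hello") exit () }'
     
     probe begin {
         println ("hello")
         exit ()
     }

In each case stap translates and compiles the script, loads the resulting module, and prints the handler output. The session ends here because the handler calls exit().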

1.6 Safety and security

SystemTap is an administrative tool. It exposes kernel internal data structures and potentially private user information. Running the kernel objects that it builds requires root privileges, which are obtained by applying the sudo command to the staprun program.

staprun is a part of the SystemTap package, dedicated to module loading and unloading and kernel-to-user data transfer. Since staprun does not perform any additional security checks on the kernel objects it is given, do not give elevated privileges via sudo to untrusted users.

The translator asserts certain safety constraints. It ensures that no handler routine can run for too long, allocate memory, perform unsafe operations, or unintentionally interfere with the kernel. Use of script global variables is locked to protect against manipulation by concurrent probe handlers. Use of guru mode constructs such as embedded C (see Section 3.5) can violate these constraints, leading to a kernel crash or data corruption.
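
As an illustrative sketch of the locking described above, the script below lets two independent handlers touch the same global. The variable name ticks and the chosen intervals are arbitrary; timer.ms and timer.s are standard probe points.

     global ticks
     
     # Writer: fires roughly every 100 milliseconds.
     probe timer.ms(100) {
         ticks++
     }
     
     # Reader: fires roughly once per second.
     probe timer.s(1) {
         printf ("ticks so far: %d\n", ticks)
         if (ticks >= 30) exit ()
     }

The script contains no explicit synchronization. The translator is expected to emit the locking around each access to ticks, so the two handlers do not corrupt the counter even if they run at the same time on different processors.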

The resource use limits are set by macros in the generated C code. These may be overridden with the -D flag. The following list describes a selection of these macros:

MAXNESTING – The maximum number of recursive function call levels. The default is 10.

MAXSTRINGLEN – The maximum length of strings. The default is 256 bytes for 32-bit machines and 512 bytes for all other machines.

MAXTRYLOCK – The maximum number of iterations to wait for locks on global variables before declaring possible deadlock and skipping the probe. The default is 1000.

MAXACTION – The maximum number of statements to execute during any single probe hit. The default is 1000.

MAXMAPENTRIES – The maximum number of rows in an array if the array size is not specified explicitly when declared. The default is 2048.

MAXERRORS – The maximum number of soft errors before an exit is triggered. The default is 0.

MAXSKIPPED – The maximum number of skipped reentrant probes before an exit is triggered. The default is 100.

MINSTACKSPACE – The minimum number of free kernel stack bytes required in order to run a probe handler. This number should be large enough for the probe handler’s own needs, plus a safety margin. The default is 1024.
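
To illustrate overriding one of these macros with the -D flag, consider the following sketch. The file name busy.stp, the loop bound, and the replacement value 100000 are arbitrary choices for illustration. A timer probe is used on the assumption that, as described in stap(1), begin and end probes are governed by a separate and larger limit.

     # busy.stp (hypothetical file name)
     #
     # Default limits:  stap busy.stp
     # Raised limit:    stap -D MAXACTION=100000 busy.stp
     
     probe timer.s(1) {
         # 5000 iterations execute well over 1000 statements in one probe hit.
         for (i = 0; i < 5000; i++)
             sum += i
         printf ("sum = %d\n", sum)
         exit ()
     }

With the default MAXACTION of 1000, the handler should be stopped with a run-time error before it reaches the printf. With the limit raised, the loop should run to completion and the script should print the sum. The exact number of statements charged per iteration is an implementation detail, so the replacement value is deliberately generous.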

If something goes wrong with stap or staprun after a probe has started running, you may safely kill both user processes, and remove the active probe kernel module with the rmmod command. Any pending trace messages may be lost.