This is the mail archive of the mailing list for the systemtap project.

stap/stapbpf Comparison

SystemTap 3.2 includes an early prototype of SystemTap's new BPF backend (stapbpf).
It represents a first step towards leveraging powerful new tracing and performance
analysis capabilities recently added to the Linux kernel. In this post I will
compare the translation process of stapbpf with that of the default backend (stap)
and highlight some differences in functionality between the two backends.

Stap and stapbpf share common parsing and semantic analysis stages. As input for
translation, they both receive data structures representing a parse tree of the
script, complete with variable types and references to the definitions of all
variables and functions. To see a summary of this information, the '-p2' option
can be used with the stap command.

   $ cat sample.stp
   probe kernel.function("sys_read") { printf("hi from sys_read!\n"); exit() }

   $ stap -p2 sample.stp
   # functions
   exit:unknown ()
   # probes
   kernel.function("SyS_read@fs/read_write.c:542") \
     /* pc=_stext+0x273da0 */ /* <- kernel.function("SyS_read@fs/read_write.c:542") */

   $ stap -p2 --runtime=bpf sample.stp
   # functions
   _set_exit_status:long ()
   exit:unknown ()
   # probes
   kernel.function("SyS_read@fs/read_write.c:542") \
     /* pc=_stext+0x273da0 */ /* <- kernel.function("SyS_read@fs/read_write.c:542") */

You can see that stapbpf's exit function involves an additional call to
_set_exit_status but otherwise the two backends are probing the exact same location.
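
The difference between the two function lists can also be found mechanically.
The following is only a small sketch in Python, not part of SystemTap: it
takes the "# functions" sections copied from the two listings above and
reports the entries that appear only for the BPF runtime. The names STAP_P2,
BPF_P2 and functions() are made up for this illustration.

```python
# "# functions" sections copied from the two '-p2' listings above.
STAP_P2 = """# functions
exit:unknown ()
"""

BPF_P2 = """# functions
_set_exit_status:long ()
exit:unknown ()
"""


def functions(listing):
    """Return the set of function signatures, skipping '#' header lines."""
    return {
        line.strip()
        for line in listing.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }


# Entries present only when translating for the BPF runtime.
only_in_bpf = functions(BPF_P2) - functions(STAP_P2)
print(sorted(only_in_bpf))
```

For these two excerpts the only extra entry is _set_exit_status.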

From this point, the translation processes diverge. Stap's goal is to convert the
script into a kernel module. To accomplish this, stap translates the parse tree
into the C source code of the desired kernel module. GCC is then used to
compile this source code into the actual kernel module. The '-p4' option can be
used with the stap command to produce the kernel object file.

   # stap -p4 sample.stp
   # staprun [...]_1316.ko
   hi from sys_read!

Instead of C, stapbpf translates the script directly into BPF bytecode to be
executed by an in-kernel virtual machine. The bytecode is then stored in a BPF-ELF
file intended for use by the stapbpf runtime.

  # stap -p4 --runtime=bpf sample.stp
  # stapbpf
  hi from sys_read!

Unlike stap's kernel modules, producing the BPF bytecode requires no external
compiler. This helps keep stapbpf's compile times and installation footprint low.
With the '-v' option we can see the duration of each stage of translation.

  # stap -v -p4 sample.stp
  Pass 3: translated to C [...] in 0usr/0sys/4real ms.
  Pass 4: compiled C [...] in 1330usr/310sys/1559real ms.

  # stap -v -p4 --runtime=bpf sample.stp
  Pass 4: compiled BPF into "" in 0usr/0sys/0real ms.

Notice that passes 3 and 4 together take 1563ms for stap but <1ms for stapbpf
(which combines passes 3 and 4 into a single pass).
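
The totals quoted here are simply the sums of the "real" (wall-clock) fields
of the pass lines. As a quick check, here is a minimal Python sketch that
extracts those fields from the lines shown above and adds them up. The helper
name total_real_ms and the regular expression are assumptions made for this
example; they are not part of stap.

```python
import re

# Pass lines copied verbatim from the '-v' output above.
STAP_LINES = [
    "Pass 3: translated to C [...] in 0usr/0sys/4real ms.",
    "Pass 4: compiled C [...] in 1330usr/310sys/1559real ms.",
]
STAPBPF_LINES = [
    'Pass 4: compiled BPF into "" in 0usr/0sys/0real ms.',
]

# Matches the "<usr>usr/<sys>sys/<real>real ms" part of a pass line.
TIMING = re.compile(r"(\d+)usr/(\d+)sys/(\d+)real ms")


def total_real_ms(lines):
    """Sum the wall-clock ('real') milliseconds over all pass lines."""
    total = 0
    for line in lines:
        match = TIMING.search(line)
        if match:
            total += int(match.group(3))
    return total


print(total_real_ms(STAP_LINES))     # 4 + 1559 = 1563
print(total_real_ms(STAPBPF_LINES))  # 0
```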

Before a BPF bytecode program is loaded into the kernel, it is first checked for
safety by a verifier inside the kernel. The verifier checks for undesirable
behaviors such as out-of-bounds jumps, out-of-bounds stack loads/stores, and reads
from uninitialized addresses. It also checks for the presence of unreachable
instructions and infinite loops. Any BPF program which does not pass the
verification will not be loaded into the BPF virtual machine. Although the
default stap is held to similar standards and is known to be very safe to use,
stapbpf has the advantage of inheriting BPF's simpler security model.
However, this advantage does come with some trade-offs. For example, BPF does not
support writing to kernel memory. Although stap disables this capability by
default, it does provide a "guru mode" that acts as an escape hatch for the user
who wishes to have this level of control over their operating system. This means
that stapbpf does not share stap's ability to, for example, administer security
band-aids to a live system. Even more restrictive is that the verifier rejects
any program with loops. While it would be possible for stapbpf to unroll loops,
BPF also imposes a limit of 4096 instructions per program, which an unrolled
loop could quickly exceed.

  # stap --runtime=bpf contains_loops.stp
  Error loading /tmp/stapxSM7Kg/ bpf program load failed: Invalid argument
  Pass 5: run failed.

  # stap --runtime=bpf too_many_insns.stp
  Error loading /tmp/stapqxRXi4/ bpf program load failed: Argument list too long
  Pass 5: run failed.
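
To make these two failure modes concrete, here is a toy model in Python. It is
only a sketch and not the kernel's verifier, which performs many more checks.
The (opcode, target) instruction encoding and the names toy_verify and unroll
are invented for this illustration. Only the 4096-instruction limit and the
two error codes (EINVAL, "Invalid argument", and E2BIG, "Argument list too
long") come from the text and output above.

```python
import errno

BPF_MAXINSNS = 4096  # per-program instruction limit cited above


def toy_verify(program):
    """Return 0 if the toy checks pass, otherwise an errno code.

    `program` is a list of (opcode, target) tuples. `target` is an absolute
    instruction index for "jmp" and None for everything else. This encoding
    is hypothetical and far simpler than real BPF bytecode.
    """
    if len(program) > BPF_MAXINSNS:
        return errno.E2BIG              # too many instructions
    for index, (opcode, target) in enumerate(program):
        if opcode == "jmp":
            if not 0 <= target < len(program):
                return errno.EINVAL     # out-of-bounds jump
            if target <= index:
                return errno.EINVAL     # backward jump, i.e. a loop
    return 0


def unroll(body, iterations):
    """Repeat a loop body `iterations` times instead of jumping backwards."""
    return body * iterations + [("exit", None)]


# A program that jumps back to its first instruction forms a loop.
loop = [("alu", None), ("jmp", 0), ("exit", None)]

# Unrolling a 10-instruction body: 100 iterations fit, 500 do not.
small = unroll([("alu", None)] * 10, 100)   # 1001 instructions
large = unroll([("alu", None)] * 10, 500)   # 5001 instructions

print(toy_verify(loop) == errno.EINVAL)   # True
print(toy_verify(small) == 0)             # True
print(toy_verify(large) == errno.E2BIG)   # True
```

In this model, unrolling trades the "Invalid argument" failure for the
"Argument list too long" failure once the iteration count grows large enough.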

The following table is a summary comparing stap and stapbpf. Features which BPF
permits but are not yet implemented in stapbpf are indicated with 'possible'.

                                    stap              stapbpf

probe handlers                      yes                yes

protected probe
execution environment               yes                yes

lock-protected global            per probe         per operation
variables                         locking            locking

kprobes (DWARF)                     yes                yes

kprobes (DWARF-less)                yes              possible

uprobes                             yes              possible

tracepoints                         yes              possible

timer-based probing                 yes              possible

probe dynamically loaded
kernel objects                      yes              possible

able to change state in                              possible
probed program                      yes          (userspace only)

means available to bypass
protection for advanced users       yes                no

loop support
(for, while, foreach)               yes                no

string support                             
(variables, literals)               yes              limited*

probe handler
length limit                  1000 statements    4096 instructions 

means available to
increase handler                    yes                no
length limit

kernel verifies safety
of program                          no                 yes

* There is support for printf's format string literal.

It can be seen that stapbpf is able to provide only a subset of stap's
functionality. However, for systems whose security policies either preclude
the full kernel module backend or require software with a security model
simpler than stap's, stapbpf aims to provide a convenient way to utilize
this subset.
