This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
stap/stapbpf Comparison
- From: Aaron Merey <amerey at redhat dot com>
- To: systemtap at sourceware dot org
- Date: Mon, 30 Oct 2017 16:58:53 -0400 (EDT)
- Subject: stap/stapbpf Comparison
SystemTap 3.2 includes an early prototype of SystemTap's new BPF backend (stapbpf).
It represents a first step towards leveraging powerful new tracing and performance
analysis capabilities recently added to the Linux kernel. In this post I will
compare the translation process of stapbpf with that of the default backend (stap)
and point out some differences in functionality between the two backends.
Stap and stapbpf share common parsing and semantic analysis stages. As input for
translation, they both receive data structures representing a parse tree of the
script, complete with variable types and references to the definitions of all
variables and functions. To see a summary of this information, the '-p2' option
can be used with the stap command.
$ cat sample.stp
probe kernel.function("sys_read") { printf("hi from sys_read!\n"); exit() }
$ stap -p2 sample.stp
# functions
exit:unknown ()
kernel.function("SyS_read@fs/read_write.c:542") \
/* pc=_stext+0x273da0 */ /* <- kernel.function("SyS_read@fs/read_write.c:542") */
$ stap -p2 --runtime=bpf sample.stp
# functions
_set_exit_status:long ()
exit:unknown ()
# probes
kernel.function("SyS_read@fs/read_write.c:542") \
/* pc=_stext+0x273da0 */ /* <- kernel.function("SyS_read@fs/read_write.c:542") */
You can see that stapbpf's exit function involves an additional call to
_set_exit_status but otherwise the two backends are probing the exact same location.
From this point, the translation processes diverge. Stap's goal is to convert the
script into a kernel module. To accomplish this, stap translates the parse tree
into the C source code of the desired kernel module. In pass 4, GCC is used to
compile this source code into the actual kernel module. The '-p4' option can be
used with the stap command to produce the kernel object file.
# stap -p4 sample.stp
[...]_1316.ko
# staprun [...]_1316.ko
hi from sys_read!
Instead of C, stapbpf translates the script directly into BPF bytecode to be
executed by an in-kernel virtual machine. The bytecode is then stored in a BPF-ELF
file intended for use by the stapbpf runtime.
# stap -p4 --runtime=bpf sample.stp
stap_1348.bo
# stapbpf stap_1348.bo
hi from sys_read!
Unlike stap's kernel modules, producing the BPF bytecode requires no external
compiler. This helps keep stapbpf's compile times and installation footprint low.
With the '-v' option we can see the duration of each stage of translation.
# stap -v -p4 sample.stp
[...]
Pass 3: translated to C [...] in 0usr/0sys/4real ms.
Pass 4: compiled C [...] in 1330usr/310sys/1559real ms.
# stap -v -p4 --runtime=bpf sample.stp
[...]
Pass 4: compiled BPF into "stap_3792.bo" in 0usr/0sys/0real ms.
Notice that passes 3 and 4 take 1563 ms in total for stap but less than 1 ms for
stapbpf (which combines passes 3 and 4 into a single pass).
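The totals above can be reproduced from the '-v' output with a few lines of
Python. This is only a sketch: the helper name real_ms_by_pass and the regular
expression are my own, and they assume the "Nusr/Nsys/Nreal ms" format shown in
the transcript above.

```python
import re

# Matches lines such as:
#   Pass 4: compiled C [...] in 1330usr/310sys/1559real ms.
PASS_LINE = re.compile(r"^Pass (\d+): .* in (\d+)usr/(\d+)sys/(\d+)real ms\.$")


def real_ms_by_pass(output):
    """Map pass number -> wall-clock ('real') milliseconds (hypothetical helper)."""
    times = {}
    for line in output.splitlines():
        m = PASS_LINE.match(line.strip())
        if m:
            times[int(m.group(1))] = int(m.group(4))
    return times


# Lines copied from the transcript above.
STAP_OUTPUT = """\
Pass 3: translated to C [...] in 0usr/0sys/4real ms.
Pass 4: compiled C [...] in 1330usr/310sys/1559real ms.
"""
STAPBPF_OUTPUT = """\
Pass 4: compiled BPF into "stap_3792.bo" in 0usr/0sys/0real ms.
"""

stap_total = sum(real_ms_by_pass(STAP_OUTPUT).values())
bpf_total = sum(real_ms_by_pass(STAPBPF_OUTPUT).values())
print(stap_total, bpf_total)  # 1563 0
```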
Before BPF bytecode programs are loaded into the kernel, they are checked for
safety by a verifier inside the kernel. It checks for undesirable behaviors
such as out-of-bounds jumps, out-of-bounds stack loads/stores and reads from
uninitialized addresses. It also checks for the presence of unreachable
instructions and infinite loops. Any BPF program which does not pass the
verification will not be loaded into the BPF virtual machine. Although the
default stap is held to similar standards and is known to be very safe to use,
stapbpf has the advantage of inheriting BPF's simpler security model.
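To make the control-flow checks more concrete, here is a toy model in Python.
It is not the kernel's verifier: the instruction set ("op", "jmp", "jcond",
"exit") and the function verify are invented for illustration, and stack
bounds and uninitialized reads are not modelled at all. It only shows how
out-of-bounds jumps, loops and unreachable instructions can be detected with a
depth-first walk of the program.

```python
def verify(program):
    """Return None if the toy program passes, else a string naming the problem.

    Each instruction is a tuple (all hypothetical, for illustration only):
      ("op",)          ordinary instruction, falls through to the next one
      ("jmp", off)     unconditional jump to index + 1 + off
      ("jcond", off)   conditional jump: falls through or jumps
      ("exit",)        ends the program
    """
    n = len(program)
    if n == 0:
        return "empty program"

    def successors(i):
        kind = program[i][0]
        if kind == "exit":
            return []
        if kind == "jmp":
            return [i + 1 + program[i][1]]
        if kind == "jcond":
            return [i + 1, i + 1 + program[i][1]]
        return [i + 1]

    # Check 1: every jump or fall-through must land inside the program.
    for i in range(n):
        for target in successors(i):
            if not 0 <= target < n:
                return f"insn {i}: jump or fall-through out of bounds"

    # Check 2: depth-first search from insn 0. An edge to an instruction
    # that is still on the search stack is a back-edge, i.e. a loop.
    UNVISITED, ON_STACK, DONE = 0, 1, 2
    state = [UNVISITED] * n
    state[0] = ON_STACK
    stack = [(0, iter(successors(0)))]
    while stack:
        i, pending = stack[-1]
        target = next(pending, None)
        if target is None:
            state[i] = DONE
            stack.pop()
        elif state[target] == ON_STACK:
            return f"insn {i}: back-edge to insn {target} (loop)"
        elif state[target] == UNVISITED:
            state[target] = ON_STACK
            stack.append((target, iter(successors(target))))

    # Check 3: anything the search never reached is dead code.
    for i in range(n):
        if state[i] == UNVISITED:
            return f"insn {i}: unreachable"
    return None


ok = [("op",), ("jcond", 1), ("op",), ("exit",)]
loop = [("op",), ("jcond", -2), ("exit",)]
oob = [("jmp", 5), ("exit",)]
dead = [("jmp", 1), ("op",), ("exit",)]

for name, prog in [("ok", ok), ("loop", loop), ("oob", oob), ("dead", dead)]:
    print(name, "->", verify(prog) or "accepted")
```

Only the first program is accepted. The other three are rejected for a
back-edge, an out-of-bounds jump and an unreachable instruction respectively.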
However, this advantage does come with some trade-offs. For example, BPF does not
support writing to kernel memory. Although stap disables this capability by
default, it does provide a "guru mode" that acts as an escape hatch for the user
who wishes to have this level of control over their operating system. This means
that stapbpf does not share stap's ability to, for example, administer security
band-aids to a live system. Even more restrictive is that the verifier rejects
any program with loops. While it would be possible for stapbpf to unroll loops,
BPF also imposes a limit of 4096 instructions per program, which bounds how far
any loop could be unrolled.
# stap --runtime=bpf contains_loops.stp
Error loading /tmp/stapxSM7Kg/stap_8316.bo: bpf program load failed: Invalid argument
[...]
Pass 5: run failed.
# stap --runtime=bpf too_many_insns.stp
Error loading /tmp/stapqxRXi4/stap_8432.bo: bpf program load failed: Argument list too long
[...]
Pass 5: run failed.
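As a rough illustration of how the 4096-instruction limit interacts with loop
unrolling, the following Python sketch estimates the size of an unrolled loop.
The function names and the per-iteration instruction counts are made up for
the example. They are not measurements of what stapbpf generates.

```python
BPF_MAX_INSNS = 4096  # per-program limit quoted above


def unrolled_size(body_insns, iterations, fixed_insns=0):
    """Instruction count if a loop body is simply repeated `iterations` times.

    `fixed_insns` stands for the rest of the probe handler (hypothetical).
    """
    return fixed_insns + body_insns * iterations


def max_unrolled_iterations(body_insns, fixed_insns=0, limit=BPF_MAX_INSNS):
    """Largest iteration count whose unrolled form still fits under `limit`."""
    if body_insns <= 0:
        raise ValueError("body_insns must be positive")
    return max(0, (limit - fixed_insns) // body_insns)


# Assumed example: a loop body that compiles to 20 BPF instructions.
print(unrolled_size(20, 100))                        # 2000, fits
print(unrolled_size(20, 250))                        # 5000, over the limit
print(max_unrolled_iterations(20))                   # 204
print(max_unrolled_iterations(20, fixed_insns=96))   # 200
```

Even a modest loop body therefore leaves room for only a few hundred
iterations, and a loop whose trip count is not known at translation time could
not be unrolled this way at all.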
The following table is a summary comparing stap and stapbpf. Features which BPF
permits but are not yet implemented in stapbpf are indicated with 'possible'.
                            stap              stapbpf

non-blocking
probe handlers              yes               yes

protected probe
execution environment       yes               yes

lock-protected global       per probe         per operation
variables                   locking           locking

kprobes (DWARF)             yes               yes

kprobes (DWARF-less)        yes               possible

uprobes                     yes               possible

tracepoints                 yes               possible

timer-based probing         yes               possible

probe dynamically loaded
kernel objects              yes               possible

able to change state in                       possible
probed program              yes               (userspace only)

means available to bypass
protection for advanced     yes               no
users

loop support
(for, while, foreach)       yes               no

string support
(variables, literals)       yes               limited*

probe handler
length limit                1000 statements   4096 instructions

means available to
increase handler            yes               no
length limit

kernel verifies safety
of program                  no                yes

* There is support for printf's format string literal.
It can be seen that stapbpf is able to provide only a subset of stap's
functionality. However, for systems whose security policies either preclude
the full kernel module backend or require software with a security model
simpler than stap's, stapbpf aims to provide a convenient way to utilize
this subset.