Martin Cermak [Tue, 12 Feb 2019 15:34:14 +0000 (16:34 +0100)]
Conditionally define __NR_bdflush in systemtap runtime.
The kernel-headers-4.17.0-0.rc6.1.el8+7.aarch64.rpm dropped
__NR_bdflush define, so let's define it conditionally in
compat_unistd.h so that stap scripts can -p4 without
complaints on those kernels.
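A minimal sketch of the conditional-define pattern described above. The sentinel DEMO_NR_UNAVAILABLE and the helper demo_is_bdflush() are hypothetical names for illustration; the value the runtime actually assigns is not shown in this log.

```cpp
#include <cassert>

// Hypothetical sentinel meaning "these kernel headers have no such syscall".
#define DEMO_NR_UNAVAILABLE (-1)

// Supply a definition only when the kernel headers did not provide one.
#ifndef __NR_bdflush
#define __NR_bdflush DEMO_NR_UNAVAILABLE
#endif

// Code that refers to __NR_bdflush now compiles with either set of headers.
inline bool demo_is_bdflush(long syscall_nr) {
  return __NR_bdflush != DEMO_NR_UNAVAILABLE && syscall_nr == __NR_bdflush;
}
```

With the fallback in place, a reference to __NR_bdflush never fails to compile; it simply never matches on kernels that dropped the define.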
It turns out that scripts like "--example errsnoop.stp" involve tens
of millions (!) of substring searches, as combinatorial & nesting
factors cause many embedded-c function/statement bodies to be searched
for "/* pragma */" type tags.
We probably do this too much - see the number (166417+70699) of times
varuse_collecting_visitor::visit_embedded* end up being called in
optimization/relaxation loops. But even without delving into that, we
can improve the constant factor: the actual string searching speed.
This patch adds a little memoization widget to interned_string
substring searching. It takes pass-2 runtime from 71s to 22s on my
workstation and costs practically no extra memory. (Storing copies of
interned_strings is cheap.)
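The memoization idea can be sketched roughly as follows. This is not the interned_string code itself: demo_find_memo is a hypothetical class, std::string stands in for interned_string, and the cache layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache of substring-search results keyed on (haystack, needle).
class demo_find_memo {
  std::map<std::pair<std::string, std::string>, std::size_t> cache_;
  std::size_t misses_ = 0;

public:
  // Return the position of needle in hay, searching only on the first request.
  std::size_t find(const std::string& hay, const std::string& needle) {
    const auto key = std::make_pair(hay, needle);
    const auto it = cache_.find(key);
    if (it != cache_.end())
      return it->second;               // repeated query: no string search
    ++misses_;
    const std::size_t pos = hay.find(needle);
    cache_.emplace(key, pos);
    return pos;
  }

  // Number of searches that actually had to scan a string.
  std::size_t misses() const { return misses_; }
};
```

Repeated searches for the same tag in the same embedded-C body then cost a map lookup instead of a scan, which is where the constant-factor gain would come from.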
David Ward [Wed, 30 Jan 2019 05:37:56 +0000 (00:37 -0500)]
Handle installation without stapusr group
Do not cause "make install" to return an error if the stapusr group
cannot be found or created (even as root); continue without setting
the ownership or mode of the installed executables. This may happen
when building distribution packages using fakeroot (it was observed
on Arch Linux). This step is often performed directly in the build
files of the distribution package instead (such as systemtap.spec).
David Ward [Wed, 30 Jan 2019 05:37:55 +0000 (00:37 -0500)]
Simplify creation of groups during installation
Use the "-f" option of "groupadd", rather than calling it a second
time if the desired GID is already in use.
Do not call "getent" twice. We know that a group exists if the first
call to "getent" returned successfully, or otherwise if "groupadd"
returned successfully.
This removes the 'write' and 'force' from get_user_pages() and replaces
them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers
as use of this flag can result in surprising behaviour (and hence bugs)
within the mm subsystem.
And it changes the function signature of get_user_pages(), so introduce
an extra flag STAPCONF_GET_USER_PAGES_FLAGS and the corresponding test program
to fix it.
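As a rough illustration of the signature change, the old (write, force) argument pair maps onto a single flags word. The DEMO_FOLL_* constants and demo_gup_flags() are hypothetical userspace stand-ins, and the bit values are assumptions rather than values taken from this commit.

```cpp
#include <cassert>

// Illustrative stand-ins for the kernel's FOLL_* bits (values are assumptions).
constexpr unsigned int DEMO_FOLL_WRITE = 0x01;
constexpr unsigned int DEMO_FOLL_FORCE = 0x10;

// Translate the old (write, force) arguments into one gup_flags-style word,
// so that any use of the force bit is explicit at the call site.
inline unsigned int demo_gup_flags(bool write, bool force) {
  unsigned int flags = 0;
  if (write)
    flags |= DEMO_FOLL_WRITE;
  if (force)
    flags |= DEMO_FOLL_FORCE;
  return flags;
}
```

A caller guarded by STAPCONF_GET_USER_PAGES_FLAGS would pass the combined word, while the fallback branch keeps passing the two separate arguments.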
Adapt to access_ok() kapi change in commit 96d4f267e40.
Adapt to changes in kbuild dependency automation.
Adapt to a gcc9 false-positive warning about empty array iteration.
Serhei Makarov [Wed, 16 Jan 2019 18:56:23 +0000 (13:56 -0500)]
PR10280 initial fix: force vermagic for guru-mode scripts
Current version checking based on kernel ABI and build-ids is not
strict enough to prevent launching a stap module on a kernel version
it wasn't compiled for. This has the potential to crash a running
kernel, since ABI compatibility may not give a sufficient guarantee of
real compatibility.
Initial fix, should investigate what other scenarios should have
tighter checking.
Martin Cermak [Thu, 6 Dec 2018 12:52:20 +0000 (13:52 +0100)]
Make sysc_bdflush.stp compatible with 4.17+ kernels.
The bdflush syscall itself appears to be obsolete since 2.6,
but this way we at least won't end up with pass 1 "resolution
failed in alias expansion builder" when randomly probing for it.
There is an old customer rh bz 544960 related to bdflush.
Frank Ch. Eigler [Mon, 26 Nov 2018 15:36:28 +0000 (10:36 -0500)]
PR23866 part: expose raw syscall tracepoint to bpf
Support TRACE_EVENT_FN() type tracepoints in bpf target, since those
too define trace-event structures that bpf callbacks have access to.
(We do not yet decode nor expose synthetic cooked per-syscall
trace-event structs. Those aren't statically extractable from headers,
as the kernel builds them at boot time.)
PR23891: Make sure stap and staprun respond to SIGTERM when stderr/stdout are blocked
When stderr/stdout are blocked (the write buffers are full), write()
syscalls in stap's signal handler and staprun's stp_main_loop() might
prevent these processes from responding to signals like SIGTERM.
Also make staprun respond to SIGPIPE just like SIGTERM.
We introduce the kill_relayfs() function to kill the reader thread in
staprun without waiting for the readers (which might already be blocked
on writing to stdout).
Our local stress tests have confirmed that this patch indeed fixes the
hanging issues in stap and staprun.
Make operator @var() no longer assume @entry() in return probes.
The old behavior would yield stale values when the function being probed
changes the global variables being read via @var() in the return probe
handler.
Added tests to cover this fix (and the old behavior for compatibility).
William Cohen [Mon, 19 Nov 2018 20:17:10 +0000 (15:17 -0500)]
Adjust tcp_trace.stp example to work with newer Linux kernel's timers
The newer Linux kernels removed the data field from struct timer_list
and now derive that equivalent information using the container_of
macro to find the data structure the struct timer_list is embedded in.
This patch makes tcp_trace.stp flexible and allows it to get the
needed information when the struct timer_list data field and the
associated $data target variable for timer functions are not
available.
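The container_of technique mentioned above can be sketched in userspace like this. The struct names and owner_from_timer() are hypothetical; the kernel's real struct timer_list and the structures used by tcp_trace.stp differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a timer object that no longer carries a data field.
struct demo_timer_list {
  unsigned long expires;
};

// Hypothetical structure that embeds the timer.
struct demo_owner {
  int id;
  demo_timer_list timer;
};

// Recover the enclosing structure from a pointer to its embedded member,
// which is what the kernel's container_of macro does.
inline demo_owner* owner_from_timer(demo_timer_list* t) {
  assert(t != nullptr);
  return reinterpret_cast<demo_owner*>(
      reinterpret_cast<char*>(t) - offsetof(demo_owner, timer));
}
```

Given only the timer pointer that a timer callback receives, the enclosing object and its fields are reachable again without a separate data field.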
William Cohen [Mon, 19 Nov 2018 19:23:15 +0000 (14:23 -0500)]
Adjust the vfs_open to provide cred variable with 4.18 kernels
The kernel's git commit ae2bb293a3e8adbc54d08cede5afc22929030c03
removed the cred argument from the vfs_open. Thus, there is no $cred
target variable available. This missing target variable lives on as a
field in the $file target variable. The patch makes the tapset use
that field if the $cred target variable is not available. Fixing this
allows the slowvfs.stp example to work with newer linux 4.18 kernels.
Frank Ch. Eigler [Fri, 16 Nov 2018 01:22:34 +0000 (20:22 -0500)]
PR23890 bonus: show nicer messages upon a buildid mismatch
Instead of producing only a one-byte error, we now compute the entire
build-ids into hex text strings, and report the whole shebang on an
error. (Also, ditch some 2.6.27 kernel-bug compatibility fossil
in the area.)
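Rendering a whole build-id as hex text, as described above, might look like the sketch below. demo_buildid_hex() and demo_mismatch_message() are hypothetical helpers, and the message wording is an assumption rather than the runtime's actual output.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Convert raw build-id bytes into a lowercase hex string.
std::string demo_buildid_hex(const unsigned char* bytes, std::size_t len) {
  static const char digits[] = "0123456789abcdef";
  std::string out;
  out.reserve(len * 2);
  for (std::size_t i = 0; i < len; ++i) {
    out.push_back(digits[bytes[i] >> 4]);    // high nibble
    out.push_back(digits[bytes[i] & 0x0f]);  // low nibble
  }
  return out;
}

// Report both complete build-ids instead of a single differing byte.
std::string demo_mismatch_message(const std::string& expected_hex,
                                  const std::string& actual_hex) {
  return "build-id mismatch: expected " + expected_hex + ", actual " + actual_hex;
}
```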
Frank Ch. Eigler [Thu, 15 Nov 2018 21:27:58 +0000 (16:27 -0500)]
PR23890: tolerate f29+ style ELF files
Reported by kenj@pcp, with mjw et al.'s help, we found out why
systemtap on fedora 29+ routinely fails to verify build-ids for
userspace programs. F29 adds a separate loadable segment with the
relevant .note's, before the main text segment. The runtime code
that listens to mmaps-in-progress now accepts this configuration.
As long as the .note section is loaded (time-wise and space-wise)
before the .text one(s), we're good.
Mark Wielaard [Wed, 14 Nov 2018 18:28:04 +0000 (13:28 -0500)]
PR23747: tolerate symbols with odd section#s
In f29+ kernels, some note-related symbols are not found in our
usual section# search, when collecting unwind/symbol data.
These symbols are now ignored instead of causing an error.
William Cohen [Mon, 12 Nov 2018 22:24:44 +0000 (17:24 -0500)]
Adjust the periodic.stp example to work with newer Linux kernels
The data field in the timer_list struct was removed in newer kernels.
The various functions executed when a timer expires now use
container_of macros to find the struct that the timer_list was
embedded in. The periodic.stp script has been modified to use
container_of when the data field is not available.
Serhei Makarov [Fri, 9 Nov 2018 21:24:09 +0000 (16:24 -0500)]
PR23860: reduce stack pressure from format strings
Reduce stack pressure created by the earlier commits by allocating
format strings in a predictable location in the top half of the stack
[-BPF_MAXSTRINGLEN*2..0) as long as they fit in there. This works
since only one format string is active at a time and no ordinary
strings are being allocated in that region of the stack now.
* bpf-opt.cxx (alloc_literal_str): Store format_str in top half.
Serhei Makarov [Fri, 9 Nov 2018 19:36:19 +0000 (14:36 -0500)]
PR23860: additional ugly stack/clobber protection for strings
In addition to prior commit, emit_string_copy() does not work for
overlapping source/destination. So, make sure strings are not
allocated in a way which overlaps map key/value arguments.
This increases space pressure, inducing a couple of bpf-asm.exp
testcase failures.
* bpf-internal.h (value::format_str): New flag.
(value::value): Take format_str flag.
(value::mk_str): Take format_str flag.
(program::format_map): New field, caches format_str separately.
(program::new_str): Take format_str flag.
* bpf-base.cxx (program::new_str): Cache format_str separately.
* bpf-opt.cxx (alloc_literal_str): Store non-format str in lower half.
* bpf-translate.cxx (emit_string_copy): Comment -- doesn't support overlap.
(emit_string_copy): DEBUG_CODEGEN -- identify if zero-padding was done.
(emit_print_format): Set format_str flag.
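The constraint in this commit (a string copy must not have overlapping source and destination) comes down to a range-overlap check on stack offsets. A minimal sketch, assuming half-open byte ranges; demo_ranges_overlap() is a hypothetical helper, not a function from bpf-opt.cxx.

```cpp
#include <cassert>

// True when [a_off, a_off + a_len) and [b_off, b_off + b_len) share any byte.
// Offsets are stack offsets and therefore usually negative.
inline bool demo_ranges_overlap(int a_off, int a_len, int b_off, int b_len) {
  assert(a_len >= 0 && b_len >= 0);
  return a_off < b_off + b_len && b_off < a_off + a_len;
}
```

An allocator could run such a check before placing a string next to map key/value arguments and pick a different slot when it returns true.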
Serhei Makarov [Thu, 8 Nov 2018 21:40:40 +0000 (16:40 -0500)]
PR23860: additional stack protection for strings
Fixes for verifier rejection of some cases requiring string copy,
since the verifier would reject string copy code extending beyond the
end of the string even if it was not reachable.
* bpf-opt.cxx (alloc_literal_str): make sure the offset for a short
string is at least BPF_MAXSTRINGLEN.
(zero_stack): New function.
(program::generate): Use zero_stack() to zero temporary area and
prevent verifier complaints.
* testsuite/systemtap.bpf/asm_tests/pr23860.stp: New testcase.
Serhei Makarov [Wed, 7 Nov 2018 18:07:51 +0000 (13:07 -0500)]
pr23860 verifier workaround :: be sure to delete all mov rN,rN
An apparent bug in the eBPF verifier fails to preserve register state
when MOVing a register to itself, marking rN as 'unknown scalar'.
Previously bpf-opt.cxx failed to remove spurious MOVs if they were the
final instruction in a basic block. This would fail verification if
the register holds a pointer.
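The cleanup described above can be sketched as a pass that drops every self-move in a block, including one in the final position. demo_insn is a hypothetical, much simplified instruction record; the real bpf-opt.cxx data structures differ.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction record.
struct demo_insn {
  enum kind_t { MOV, ADD, EXIT } kind;
  int dst;
  int src;
};

// Remove every "mov rN,rN" from a basic block, wherever it appears.
void demo_drop_self_moves(std::vector<demo_insn>& block) {
  block.erase(std::remove_if(block.begin(), block.end(),
                             [](const demo_insn& i) {
                               return i.kind == demo_insn::MOV && i.dst == i.src;
                             }),
              block.end());
}
```

Because the erase covers the whole vector, a self-move that happens to be the last instruction of the block is removed like any other.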
Jafeer Uddin [Wed, 7 Nov 2018 14:41:45 +0000 (09:41 -0500)]
PR23761: generalized @entry
With the changes to the syscall tapset for kernel 4.17+, it is now
possible for non-[uk]retprobes to trap return events. This means that
the @entry mechanism to access entry probe target variables in return
probes is not guaranteed to work. To get around this issue, a collection
of 8 global variables have been added to the tapset which can be used
to save variables in the entry probe and retrieve them later in return
probes. The global variables can be accessed using the @this1, .., @this8
macros.
standardize ktime_get_ns() across lkm, bpf runtimes
Make sure ktime_get_ns() is available across runtimes. In the case of
bpf, add a userspace helper to implement the function. Add test case.
Add a systemtap.bpf/nobpf.exp test driver, which runs all the
bpf_tests but specifically without "--bpf", in the hope that all those
scripts should run on the normal backend too. PR23866 blocks some of
that at the moment.
Stan Cox [Tue, 6 Nov 2018 17:09:58 +0000 (12:09 -0500)]
Always use nssInit for http and nss server.
* nsscommon.h (db_init_types): Add db_init_types
* nsscommon.cxx (add_client_cert): Use it to differentiate type of
db init. Change all callers.
* client-http.cxx (fill_in_server_info): Use http server default
port if none specified.
Frank Ch. Eigler [Fri, 26 Oct 2018 15:24:05 +0000 (11:24 -0400)]
prometheus-exporter samples: change reported metric name
When prometheus scrapes metrics, by default it'll simply preserve
the incoming names. This doesn't work well when many different
stap scripts use the same metric name ("count"), or if the name
happens to be a reserved keyword in the promql language ("count").
Victor Kamensky [Wed, 31 Oct 2018 06:15:15 +0000 (23:15 -0700)]
aarch64: add missing system call defines
A set of system call defines such as __NR_alarm, __NR_ioperm,
__NR_modify_ldt, __NR_time and __NR_utime are missing from the
aarch64 kernel as of version 4.18.
Add corresponding definitions so system call related probes
would compile. Tested with nd_syscalls-all-probes.stp.
Signed-off-by: Victor Kamensky <kamensky@cisco.com>
Jafeer Uddin [Tue, 30 Oct 2018 20:07:05 +0000 (16:07 -0400)]
Fix miscellaneous errors/typos in syscall tapset
* tapset/linux/sysc_pkey_*.stp: properly cast target variables in dw_syscall probe
* runtime/linux/compat_unistd.h: fix typos and errors in __NR_* definitions
* testsuite/systemtap.syscall/pkey.c: remove empty first line in file
Victor Kamensky [Tue, 30 Oct 2018 19:58:24 +0000 (15:58 -0400)]
On aarch64 Linux, compilation of system-call-related SystemTap scripts
fails with "__NR_compat_[exit|read|write] redefined" errors after the
following two commits:
7abf0aee9 PR23160,PR14690: remove references to ia32 and x86 to make sysc_* files as arch-independent as possible
cd84aedca PR23160,PR14690: adapt 13 more syscalls for 4.17 __ARCH_sys_FOO and sys_enter/exit
The aarch64 kernel defines __NR_compat_[exit|read|write] after a1ae65b21941 arm64: add seccomp support
The aarch64 kernel defines __NR_compat_restart_syscall after f3e5c847ec3d arm64: Add __NR_* definitions for compat syscalls
Fix by adding proper conditional compilation based on current
architecture and kernel version.
William Cohen [Tue, 30 Oct 2018 18:20:46 +0000 (14:20 -0400)]
Adjust the BPF translate error report formatting to work on 32-bit architectures
The 32-bit architectures such as arm and i686 had arguments in the
error reporting that did not match up with the %lu or %ld formatting.
Used type casting and %llu and %lld to avoid variation between 32-bit
and 64-bit architectures.
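The portable pattern is to cast to a fixed-rank type that matches the conversion specifier. A minimal sketch, with demo_format_counts() and its message text as hypothetical examples rather than the actual translator diagnostics:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format integer values identically on 32-bit and 64-bit targets by casting
// to (unsigned) long long and using %llu / %lld.
std::string demo_format_counts(unsigned long n_insns, long delta) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "%llu insns, delta %lld",
                static_cast<unsigned long long>(n_insns),
                static_cast<long long>(delta));
  return std::string(buf);
}
```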
Serhei Makarov [Wed, 24 Oct 2018 20:04:30 +0000 (16:04 -0400)]
Merge branch 'serhei/bpf_asm' -- kernel_string() tapset and experimental bpf assembler
Note the big comment in bpf-translate.cxx explaining the new assembler.
Major changes:
- Embedded-code assembler
- TODO Embedded-code tapset function call support is incomplete, only enabled for exit()
- TODO Token adjustment for assembler diagnostics needs work.
- Refactor bpf_unparser
- Improved metadata about helpers
- String handling changes
- Misc cleanup/notes
Tapset for kernel_string:
* tapset/bpf/conversions.stp: New file.
(kernel_string): New function.
(kernel_string): New function (err_msg version), in assembly.
(kernel_string_n): New function, in assembly.
* testsuite/systemtap.bpf/bpf_tests/context_vars3.stp: New testcase.
Embedded-code assembler:
* bpf-translate.cxx (bpf_unparser::parse_imm): New function.
(bpf_unparser::parse_asm_stmt): New function.
(bpf_unparser::emit_asm_arg): New function.
(bpf_unparser::parse_reg): Removed.
(bpf_unparser::emit_asm_reg): New function.
(bpf_unparser::get_asm_reg): New function.
(bpf_unparser::emit_asm_opcode): New function.
(bpf_unparser::visit_embeddedcode): Process new assembly format.
(BPF_ASM_DEBUG): New (disabled) macro for diagnostics.
(struct asm_stmt): New structure.
(operator <<): New function -- print logic for asm_stmt.
(is_numeric): New function.
* testsuite/systemtap.bpf/asm_tests/*: New testcases for embedded-code assembler.
* testsuite/systemtap.bpf/bpf-asm.exp: TODO Initial test driver for embedded-code assembler.
TODO Embedded-code tapset function call support is incomplete, only enabled for exit():
* bpf-translate.cxx (translate_bpf_pass): Pass systemtap_session to assembler globals.
* bpf-internal.h (globals::session): New field to pass systemtap_session to assembler.
TODO Token adjustment for assembler diagnostics needs work:
* parse.h (token::adjust_location): New function.
Refactor bpf_unparser:
* bpf-translate.cxx (bpf_unparser::emit_functioncall): New function.
(bpf_unparser::visit_functioncall): Use emit_functioncall.
(print_format_add_tag): New function on std::string.
(bpf_unparser::emit_print_format): New function.
(bpf_unparser::visit_print_format): Use print_format_add_tag, emit_print_format.
Improved metadata about helpers:
* bpf-base.cxx (bpf_func_name_map): New structure -- id->name map.
(bpf_func_id_map): New structure -- name->id map.
(init_bpf_helper_tables): New function -- populate name->id and id->name map.
(bpf_function_name): Change to use the maps.
(bpf_function_id): New function -- map from name to helper id.
(bpf_function_nargs): TODO Still need to expand the list of helpers.
* bpf-translate.cxx (translate_bpf_pass): Call init_bpf_helper_tables to populate info.
* bpf-internal.h (init_bpf_helper_table): New function.
(bpf_function_id): New function.
(__STAPBPF_FUNC_MAPPER): New macro -- like __BPF_FUNC_MAPPER for userspace-only helpers.
String handling changes:
* bpf-translate.cxx (bpf_unparser::emit_literal_str): New function.
(bpf_unparser::visit_literal_str): Use emit_literal_str.
(emit_simple_literal_str): Renamed from emit_literal_str.
(bpf_unparser::emit_string_copy): Renamed from emit_copied_str;
rename emit_literal_str to emit_simple_literal_str.
(bpf_unparser::emit_str_arg): Rename emit_copied_str to emit_string_copy.
(translate_escapes): Takes a const string now.
* bpf-opt.cxx (alloc_literal_str): Rename emit_literal_str to emit_simple_literal_str.
* bpf-internal.h (emit_simple_literal_str): Renamed from emit_literal_str.
Misc cleanup/notes:
* bpf-internal.h (BPF_MAXSTRINGLEN): TODO Someday this will be increased.
(program::use_tmp_space): Assert to catch miscalculations.
* tapset/logging.stp (abort): TODO Could abort immediately with assembly in future.
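The id->name and name->id helper tables listed in this merge could be populated from a single mapper macro, roughly as sketched here. All DEMO_/demo_ names and the three-entry helper list are hypothetical; the real tables are built from __BPF_FUNC_MAPPER and __STAPBPF_FUNC_MAPPER.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper list in the style of a FUNC_MAPPER macro.
#define DEMO_FUNC_MAPPER(FN) \
  FN(map_lookup_elem)        \
  FN(map_update_elem)        \
  FN(probe_read)

// One enumerator per helper, generated from the list.
enum demo_func_id {
#define DEMO_ENUM(name) DEMO_FUNC_##name,
  DEMO_FUNC_MAPPER(DEMO_ENUM)
#undef DEMO_ENUM
  DEMO_FUNC_MAX
};

static std::map<int, std::string> demo_id_to_name;
static std::map<std::string, int> demo_name_to_id;

// Populate both lookup directions from the same list.
void demo_init_helper_tables() {
#define DEMO_FILL(name)                      \
  demo_id_to_name[DEMO_FUNC_##name] = #name; \
  demo_name_to_id[#name] = DEMO_FUNC_##name;
  DEMO_FUNC_MAPPER(DEMO_FILL)
#undef DEMO_FILL
}

// Map a helper name to its id, or -1 when the name is unknown.
int demo_function_id(const std::string& name) {
  const auto it = demo_name_to_id.find(name);
  return it == demo_name_to_id.end() ? -1 : it->second;
}
```

Keeping one list as the source of truth means the enum and both maps cannot drift apart when a helper is added.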
Serhei Makarov [Tue, 23 Oct 2018 17:35:08 +0000 (13:35 -0400)]
stapbpf assembler WIP #6 :: other call functions ({s}printf and tapset)
Only very limited support for tapset functions (restricted to exit()
for now) due to the difficulty of resolving symbols after the semantic
pass is already completed. Could address this in the future.
* bpf-internal.h (program::use_tmp_space): check for overflow.
(globals::session): new field for systemtap_session (used by function lookup).
* bpf-translate.cxx (asm_stmt::has_jmp_target): new field.
(operator <<): printing rules for alloc, call.
(bpf_unparser::parse_asm_stmt): remove printf/error, BUGFIX alloc, call, string literal.
Also calculate has_jmp_target in the resulting stmt.
(bpf_unparser::visit_embeddedcode): handle printf, sprintf and exit().
Also fix the way fallthrough fields are populated to avoid spurious extra jump.
(bpf_unparser::emit_functioncall): new function. Factors out non-staptree code.
(bpf_unparser::visit_functioncall): use new emit_functioncall().
(print_format_add_tag): new function on std::string. Factors out string operations.
(bpf_unparser::emit_print_format): new function. Factors out non-staptree code.
(bpf_unparser::visit_print_format): use new emit_print_format().
(translate_bpf_pass): store session in globals.
Jafeer Uddin [Tue, 23 Oct 2018 19:19:29 +0000 (15:19 -0400)]
PR21080: support added for new pkey_* syscalls
* sysc_pkey_*.stp: new syscall probes
* aux_syscalls.stp: add new function to convert init_val to PKEY_DISABLE_[ACCESS|WRITE]
* compat_unistd.h: add new syscall numbers
* pkey.c: tests for new syscall
Stan Cox [Tue, 23 Oct 2018 02:29:32 +0000 (22:29 -0400)]
Use NSS_InitContext instead of NSS_Init.
* nsscommon.cxx (nssInitContext): New function which allows
multiple nss invocations. Change all callers except where
write access is required.
(nssCleanup): Add context parameter. Change all callers.
* client-http.cxx (download): Add cleanup parameter to choose
curl_easy_cleanup. Change all callers.
(download_pem_cert): If CURLINFO_CERTINFO fails then retrieve cert from server.
(find_and_connect_to_server): If the database cert fails then try again
with retrieved server cert.
(fill_in_server_info): Likewise.
* testsuite/systemtap.http_server/server_trust.exp: New test.
For demos like also_ran.stp, the incoming strings are usually already
quoted. Re-quoting them for prometheus labeling is counterproductive,
so we now offer an option to bypass that string_quoted() wrapping.
also_ran.stp updated.
William Cohen [Fri, 19 Oct 2018 18:59:27 +0000 (14:59 -0400)]
Use cast to make c->cycles_sum always match the %lld format.
On aarch64 and ppc64le cycles_t is a slightly different type than on
x86_64 and does not match up with the %lld format. Cast c->cycles_sum
to always be (long long) to avoid the compile failing on aarch64 and
ppc64le with the following message:
**** failed systemtap kernel-devel smoke test:
/tmp/stapzubPmR/stap_e418199b88a6f8adf13a14e064ae79da_1403_src.c: In function '_stp_hrtimer_notify_function':
/tmp/stapzubPmR/stap_e418199b88a6f8adf13a14e064ae79da_1403_src.c:477:45: error: format '%lld' expects argument of type 'long long int', but argument 2 has type 'cycles_t' {aka 'long unsigned int'} [-Werror=format=]
_stp_error ("probe overhead (%lld cycles) exceeded threshold (%lld cycles) in last %lld cycles", c->cycles_sum, STP_OVERLOAD_THRESHOLD, STP_OVERLOAD_INTERVAL);
~~~^ ~~~~~~~~~~~~~
%ld
cc1: all warnings being treated as errors