Frank Ch. Eigler [Sat, 12 Aug 2023 18:28:44 +0000 (14:28 -0400)]
PR30749: correct stap --sign-module timing
Previous code signed the temp directory copy, after it had already
been copied into the cache -- so the signature never made it to a
permanent artifact.
If the module was being fetched from the cache from a previous build
run, a sign (re)attempt will still be done. This may not be
necessary, but shouldn't be harmful.
Logic in commit cd48874296e00 (2021, PR28449) fixed broken cross-cpu
message ordering that followed previous transport concurrency fixes,
but imposed a lot of userspace synchronization delays upon the threads
that were supposed to drain messages from the kernel relayfs streams as
fast as possible. This has led to unnecessarily lossy output overall.
New code uses a new many-writers single-reader data structure, a mutex
protected heap. All the per-cpu readers copy & pump messages into
that heap as rapidly as possible, sorted by the generally monotonic
sequence number. The reader is woken via a condition variable or a
timeout to print & release messages in sequence number order. It also
handles lost messages (jumps in the sequence numbers) by waiting a while
to let the stragglers come in.
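A minimal sketch of the mutex-protected heap idea described above, not the
actual staprun code. All names (msg_heap, heap_push, heap_pop_wait,
HEAP_CAP) are hypothetical, and the straggler-wait logic for sequence
number gaps is left out for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define HEAP_CAP 1024            /* hypothetical fixed capacity */

struct msg { uint64_t seq; const char *data; };

struct msg_heap {
    pthread_mutex_t lock;        /* protects slot[] and n */
    pthread_cond_t nonempty;     /* signalled by writers after a push */
    struct msg slot[HEAP_CAP];   /* binary min-heap ordered by seq */
    size_t n;
};

static void heap_init(struct msg_heap *h)
{
    pthread_mutex_init(&h->lock, NULL);
    pthread_cond_init(&h->nonempty, NULL);
    h->n = 0;
}

/* Called by any per-cpu writer.  Returns 0, or -1 if the heap is full. */
static int heap_push(struct msg_heap *h, struct msg m)
{
    int rc = -1;
    pthread_mutex_lock(&h->lock);
    if (h->n < HEAP_CAP) {
        size_t i = h->n++;
        /* sift up: move larger parents down until m fits */
        while (i > 0 && h->slot[(i - 1) / 2].seq > m.seq) {
            h->slot[i] = h->slot[(i - 1) / 2];
            i = (i - 1) / 2;
        }
        h->slot[i] = m;
        rc = 0;
        pthread_cond_signal(&h->nonempty);
    }
    pthread_mutex_unlock(&h->lock);
    return rc;
}

/* Called by the single reader.  Waits up to timeout_ms for a message,
   then removes the lowest sequence number.  Returns 0, or -1 on timeout. */
static int heap_pop_wait(struct msg_heap *h, struct msg *out, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&h->lock);
    while (h->n == 0 && rc == 0)
        rc = pthread_cond_timedwait(&h->nonempty, &h->lock, &deadline);
    if (h->n == 0) {
        pthread_mutex_unlock(&h->lock);
        return -1;
    }
    *out = h->slot[0];
    struct msg last = h->slot[--h->n];
    size_t i = 0;
    for (;;) {                   /* sift down: pull smaller children up */
        size_t c = 2 * i + 1;
        if (c >= h->n)
            break;
        if (c + 1 < h->n && h->slot[c + 1].seq < h->slot[c].seq)
            c++;
        if (last.seq <= h->slot[c].seq)
            break;
        h->slot[i] = h->slot[c];
        i = c;
    }
    h->slot[i] = last;
    pthread_mutex_unlock(&h->lock);
    return 0;
}
```

Writers only hold the lock for the O(log n) insertion, so they can return
to draining their relayfs stream quickly; the ordering work is paid for by
the single reader.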
The kernel-user messages now also include a framing sequence to allow
the per-cpu readers to resynchronize to the message boundaries, in
case some sort of buffer overflow or something else occurs. It
reports how many bytes and/or messages were skipped in order to
resynchronize. It does so in a much less lossy way than previous code,
which just tried to flush everything then-currently available, hoping
that it'd match message boundaries.
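As an illustration of resynchronizing on a framing sequence, here is a
small sketch. The magic byte values and the function name resync_skip are
assumptions made up for this example; the real framing bytes and reader
code may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical framing sequence, chosen arbitrarily for this sketch. */
static const unsigned char FRAME_MAGIC[4] = { 0xDE, 0xAD, 0xBE, 0xEF };

/* Scan buf for the next framing sequence.
   Returns the number of leading bytes that can be discarded.
   *found is set to 1 if the sequence starts right after the skipped
   bytes, or 0 if none was found; in that case the unskipped tail is
   shorter than the framing sequence and may be a partial one, so the
   caller should keep it and retry once more data has arrived. */
static size_t resync_skip(const unsigned char *buf, size_t len, int *found)
{
    size_t i;

    assert(buf != NULL && found != NULL);
    for (i = 0; i + sizeof FRAME_MAGIC <= len; i++) {
        if (memcmp(buf + i, FRAME_MAGIC, sizeof FRAME_MAGIC) == 0) {
            *found = 1;
            return i;
        }
    }
    *found = 0;
    return i;
}
```

A reader would add the returned count to its skipped-bytes tally and
report the total once it is back on a message boundary.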
Unfortunately, this means that the user-kernel message ABI has
changed! Previous-version staprun instances won't work with the new
modules, nor will current-version staprun with old modules. This flag
day is enforced by changing the values of the various ctl message
numbers, so old/new kernel/user combinations will generate errors
rather than quasi-successful staprun startup.
New code also dramatically simplifies the use of signals in staprun
(or rather stapio). Gone are the signal thread and a lot of the
masking/blocking/waiting. Instead a single basic signal handler just
increments globals when signals of various kinds arrive, and all the
per-cpu etc. threads poll those globals periodically. This includes
logic needed for -S (output file rotation on SIGUSR2) as well as
flight recorder (-L / -A) modes.
The reader_timeout_ms value (-T) is now used in both bulk and serialized
modes for all ppoll timeouts, to prevent those threads from sleeping
indefinitely, now that they won't be bothered by signals.
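The counter-and-poll pattern described above can be sketched roughly as
follows. This is an illustration only: the names (pending_usr2,
worker_step, struct worker) are hypothetical, and plain poll() stands in
for the ppoll() call mentioned in the commit to keep the sketch portable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <string.h>

/* Globals bumped by the handler; threads only ever read them. */
static volatile sig_atomic_t pending_usr2;   /* e.g. rotate output file */
static volatile sig_atomic_t pending_term;   /* e.g. shut down */

static void on_signal(int signo)
{
    if (signo == SIGUSR2)
        pending_usr2++;
    else
        pending_term++;
}

static int install_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) < 0)
        return -1;
    if (sigaction(SIGTERM, &sa, NULL) < 0)
        return -1;
    return 0;
}

struct worker {
    int fd;                   /* descriptor to drain; negative is ignored */
    sig_atomic_t seen_usr2;   /* last pending_usr2 value this thread saw */
};

/* One loop iteration: sleep at most timeout_ms waiting for data, then
   check the global.  Returns 1 if a SIGUSR2 arrived since the last call. */
static int worker_step(struct worker *w, int timeout_ms)
{
    struct pollfd p = { .fd = w->fd, .events = POLLIN };
    sig_atomic_t now;

    (void) poll(&p, 1, timeout_ms);   /* data, timeout, or EINTR */
    now = pending_usr2;
    if (now != w->seen_usr2) {
        w->seen_usr2 = now;
        return 1;
    }
    return 0;
}
```

The finite timeout matters because a signal may be delivered to a
different thread; without it a worker blocked in poll could miss the
changed global indefinitely.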
William Cohen [Fri, 28 Jul 2023 17:26:10 +0000 (13:26 -0400)]
Simplify init_backlog function to avoid coverity BAD_SHIFT errors
The init_backlog function determines the power of two sized memory
allocation that would have at least fnum_max elements. Reworked the
code to make it clearer to the coverity analyzer what it is doing.
Rather than overshooting the desired order value and then adjusting it
down by one, the while loop has been revised to exit when the order is
the correct value.
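A sketch of that loop shape, under the assumption that the goal is the
smallest order with (1 << order) >= fnum_max. The function name order_for
is hypothetical and this is not the actual init_backlog code.

```c
#include <assert.h>
#include <limits.h>

/* Smallest order such that (1UL << order) >= fnum_max.
   The loop stops exactly at the right value, so no "overshoot then
   subtract one" step is needed, and the explicit bound keeps the shift
   count below the width of unsigned long. */
static unsigned order_for(unsigned long fnum_max)
{
    const unsigned max_order = sizeof(unsigned long) * CHAR_BIT - 1;
    unsigned order = 0;

    while (order < max_order && (1UL << order) < fnum_max)
        order++;
    return order;
}
```

With the bound visible in the loop condition, a static analyzer can see
that the shift amount never reaches the type width.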
William Cohen [Sun, 9 Jul 2023 20:46:20 +0000 (16:46 -0400)]
Adjust runtime _access_process_vm_ to work with linux 6.5
Linux kernel commit ca5e863233e8f6acd1792fd85d6bc2729a1b2c10
eliminated the vma argument of get_user_pages_remote. For Linux 6.5
kernels, use the get_user_page_vma_remote function in its place, like the
__access_remote_vm function in mm/memory.c of the kernel.
William Cohen [Thu, 29 Jun 2023 17:17:38 +0000 (13:17 -0400)]
Fedora rawhide kernels are now flagging use of zero length arrays
The kernel has switched from using zero length arrays to flexible
arrays. Kernel compiles have gotten pickier and now flag accesses
beyond the end of arrays when possible. Running the testsuite on
Fedora rawhide produced the following error due to a zero
length array:
In file included from /tmp/stapaBPtwB/stap_6b7e9ee7df4a3f6e4cfbffb7f92d8405_1736_src.c:543:
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/stp_tracepoint.c: In function 'add_tracepoint':
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/stp_tracepoint.c:148:22: error: 'strcmp' reading 1 or more bytes from a region of size 0 [-Werror=stringop-overread]
148 | if (!strcmp(name, e->name)) {
| ^~~~~~~~~~~~~~~~~~~~~
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/stp_tracepoint.c:61:14: note: source object 'name' of size 0
61 | char name[0];
| ^~~~
Switched the zero length array in the struct to a flexible array to
eliminate the issue.
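For illustration, the zero-length to flexible array change looks roughly
like this in a userspace sketch. The struct and function names are
hypothetical and only mirror the shape of the struct in the error above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct entry {
    struct entry *next;
    char name[];              /* flexible array member; was: char name[0] */
};

/* Allocate an entry with room for a copy of name after the fixed part. */
static struct entry *entry_new(const char *name)
{
    size_t len;
    struct entry *e;

    assert(name != NULL);
    len = strlen(name) + 1;
    e = malloc(sizeof *e + len);
    if (e != NULL) {
        e->next = NULL;
        memcpy(e->name, name, len);
    }
    return e;
}
```

With name[] the compiler treats the member as having unknown size rather
than size 0, so a strcmp on e->name is no longer flagged as reading past
the end of the object.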
Frank Ch. Eigler [Tue, 20 Jun 2023 18:04:48 +0000 (14:04 -0400)]
systemtap.spec: SPDX review cleanup
SPDX codes for the testsuite, -client (tapset+docs!) and -devel
(tapset!) updated.
Also corrected/filled in licenses for stap-prep and
interactive-notebook/codemirror/package.json. Many other files remain
without a formal license header. These all default to GPL-2.0-or-later.
William Cohen [Thu, 8 Jun 2023 17:10:00 +0000 (13:10 -0400)]
Make runtime/transport/alloc.c compatible with newer struct module_memory
The upstream kernel commit ac3b43283923440900b4f36ca5f9f0b1ca43b70e
changed the structures for modules. The runtime/transport/alloc.c
made an access to the struct module_memory when -DSTP_MAXMEMORY is
used on the command line and needed the appropriate field name for the
newer kernels. This change allows stap script builds using
-DSTP_MAXMEMORY to work on Linux 6.4 kernels.
William Cohen [Thu, 8 Jun 2023 01:50:34 +0000 (21:50 -0400)]
Make runtime/transport/symbols.c compatible with newer struct module_memory
The upstream kernel commit ac3b43283923440900b4f36ca5f9f0b1ca43b70e
changed the structures for modules. The runtime/transport/symbols.c
made an access to the struct module_memory and needed the appropriate
field name for the newer kernels. This change allows another dozen of
the systemtap examples to pass on Linux 6.4 kernels.
William Cohen [Wed, 7 Jun 2023 17:18:01 +0000 (13:18 -0400)]
Adjust runtime module_kallsyms_on_each_symbol to work with Linux 6.3 kernels
The recent fix for PR30415 worked for new Linux 6.4 kernels and
pre-6.3 kernels, but did not work for Linux 6.3 kernels. The Linux
6.3 kernel module_kallsyms_on_each_symbol function has both the
modname argument of the 6.4 kernels and the function passed in has the
earlier kernel's struct module pointer argument. The runtime/sym.c has
been adjusted to work with the Linux 6.3 kernels.
William Cohen [Wed, 17 May 2023 14:38:31 +0000 (10:38 -0400)]
Support newer kernels with struct module_memory
The upstream kernel commit ac3b43283923440900b4f36ca5f9f0b1ca43b70e
changed the structures for modules. The runtime printing of kernel
information accessed information about modules and the fields in
module structure. A test has been added to the autoconf list to
determine the appropriate fields to get information about the
module.
Bug: our autoconf mechanism might find unexported symbols in kernel headers not meant for kernel modules
The current BUILD_CHECK thing does not pass -DMODULE option as the real
kernel build system does and thus may expose unexported symbols like
nmi_uaccess_okay() to our autoconf test programs.
PR30408: fixed excessive read faults when reading userland memory from within perf event/kprobes handlers
The user_addr_max() macro is gone since kernel 5.18, which broke stap's
userland reading routines.
Also, since kernel 5.18, access_ok() does address range checks on
all architectures, so we don't bother checking it ourselves for newer
kernels.
Frank Ch. Eigler [Fri, 12 May 2023 16:43:55 +0000 (12:43 -0400)]
stap-server logic: drop scraped NSS error table
This used to be needed in the ancient days, when the NSS-related
shared libraries did not reliably decode error codes into usable
messages. This stuff works now, so we don't have to carry this
hand-scraped table around any more.
Frank Ch. Eigler [Fri, 12 May 2023 15:13:45 +0000 (11:13 -0400)]
PR30442: failing optional statement probes should not trigger pass2 exceptions
In tapsets.cxx, query_cu() and query_module() aggressively caught &
sess.print_error'd semantic_errors from subsidiary call sites. They
are unaware of whether the probe in question is being resolved within
an optional (? or !) context. Instead of this, they now simply let
the exceptions propagate out to derive_probes() or similar, which does
know whether exceptions are errors in that context. That means
exceptions can propagate through elfutils iteration machinery too,
perhaps risking C level memory leaks, but so be it.
This fix goes well beyond statement probes per se, but hand-testing
and the testsuite appear not to show regressions related to this.
Serhei Makarov [Mon, 8 May 2023 12:12:59 +0000 (08:12 -0400)]
fix PR30395: Regex code has invalid memory reads caught by KASAN
The TNFA tag cleanup on a '\0' byte would incorrectly read beyond the
end of the string. Keeping YYCURSOR on the nul byte fixes this.
Will harden the fix a little (adding a separate increment-only cursor
for safety) before I close the bug, but this change is already
sufficient if the DFA was generated correctly.
William Cohen [Tue, 25 Apr 2023 14:56:47 +0000 (10:56 -0400)]
Test for kernels that backported removal of <linux/genhd.h> include
Some kernels (RHEL9) backported patches that removed the
<linux/genhd.h> include. Thus, the ioblock.stp tapset cannot simply
check the kernel version to determine whether the include file is
available. The added autoconf test will determine whether the include
is available.
William Cohen [Tue, 25 Apr 2023 13:44:51 +0000 (09:44 -0400)]
Allow nfsd.stp tapset to work on kernels with CONFIG_NFSD_V2 unset
Some of the newer Fedora kernels have CONFIG_NFSD_V2 unset (*). The
nfsd.stp tapset was requiring various NFSD V2 probe points to exist.
These required probes caused build failures in examples like nfsd-trace
and nfsdtop. Making the NFSD V2 probes optional allows the
nfsd.stp tapset to work on these kernels.
BZ2180328: disable pass-2 dyninst liveness analysis on CONFIG_RETPOLINE kernels
As a stopgap measure, ameliorate the dramatic dyninst analysis time
required to liveness-check $var assignments in kernels compiled with
retpolines. Just skip the effort (with a warning).
See also: https://github.com/dyninst/dyninst/issues/1305 .
PR30123: rework dwarf4/5 DW_AT_data_bit_offset support
$subject DWARF attribute is another way of designating the relative
position of a member field of a struct within it, generally a
bitfield. It's an absolute bit offset relative to the beginning of
the containing object, rather than the immediately containing word, so
the bit offset numbers can become huge.
New code treats these more correctly, by intercepting them in
dwflpp::translate_final_fetch_or_store to offset the final load/store
address, and relativizing the bit offsets.
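One way to picture the relativizing step is the arithmetic below. This is
a sketch under the assumption that the field is accessed through a storage
unit of word_bytes bytes; the names are hypothetical and the logic in
dwflpp::translate_final_fetch_or_store may differ in detail.

```c
#include <assert.h>

struct bitloc {
    unsigned long byte_off;   /* added to the final load/store address */
    unsigned bit_off;         /* bit offset within the accessed word */
};

/* Split an absolute DW_AT_data_bit_offset into a whole-word byte
   adjustment plus a small bit offset inside that word. */
static struct bitloc relativize(unsigned long data_bit_offset,
                                unsigned word_bytes)
{
    struct bitloc loc;
    unsigned word_bits;

    assert(word_bytes > 0);
    word_bits = word_bytes * 8;
    loc.byte_off = (data_bit_offset / word_bits) * word_bytes;
    loc.bit_off = (unsigned) (data_bit_offset % word_bits);
    return loc;
}
```

For example, an absolute bit offset of 70 with a 4-byte storage unit
becomes a byte offset of 8 and a bit offset of 6, so the remaining bit
offset always stays below the word width.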
New test case covers a variety of -gdwarf* levels with a userspace
target program.
Gioele Barabucci [Mon, 27 Feb 2023 11:56:52 +0000 (12:56 +0100)]
dtrace: Use deterministic temp file creation for all temp files
`dtrace -G -C` creates temporary files with random filenames. The name
of these temporary files gets embedded in the ELF `.symtab` of the final
object files, making them always slightly different.
This behavior makes all packages that use `dtrace`-produced object files
inherently non-reproducible.
To fix this issue all temporary files are now created using
the same deterministic procedure currently used only for the
temporary "c." files.
Martin Cermak [Fri, 10 Feb 2023 13:08:22 +0000 (14:08 +0100)]
interactive.cxx: use temporary file with .stp suffix
In systemtap interactive mode (stap -i), editors like vim can
benefit from this change by automatically turning on the stap
syntax highlighting and completion. For this to work, the
EDITOR env var needs to point to the editor of choice.
Ryan Goldberg [Wed, 18 Jan 2023 21:40:35 +0000 (16:40 -0500)]
Lang-server: optimized local definition parsing
In order to speed up full-syncs (e.g. jupyter-lsp),
compute the diff between the old source and the new text.
This allows for much faster updating of local definitions
and thus faster completion (without a multi-second delay).
Ryan Goldberg [Mon, 19 Dec 2022 22:29:48 +0000 (17:29 -0500)]
Added a new mode: language server
This mode will turn the stap process into a
language server, which will use the official
language-server-protocol. It can be started
with the new --language-server flag.
Aaron Merey [Fri, 27 Jan 2023 16:16:43 +0000 (11:16 -0500)]
client-http.cxx: Fix build error rpmFreeCrypto not declared
rpm-4.18.0 moved the declaration of rpmFreeCrypto into rpm/rpmcrypto.h.
Include this header in client-http.cxx when required in order to avoid
the following error:
CXX stap_gen_cert-util.o
../systemtap/client-http.cxx: In member function ‘std::string http_client::get_rpmname(std::string&)’:
../systemtap/client-http.cxx:482:5: error: ‘rpmFreeCrypto’ was not declared in this scope
482 | rpmFreeCrypto ();
| ^~~~~~~~~~~~~
See https://sourceware.org/bugzilla/show_bug.cgi?id=29094
See the very last line of the above trace, which is a duplicate. This problem
was detected by the backtrace.exp testcase. This update prevents calling the
fallback _stp_stack_print_fallback() in case _stp_print_addr() was already able
to successfully provide some output based on dwarf unwinding.