William Cohen [Wed, 17 May 2023 14:38:31 +0000 (10:38 -0400)]
Support newer kernels with struct module_memory
The upstream kernel commit ac3b43283923440900b4f36ca5f9f0b1ca43b70e
changed the structures for modules. The runtime printing of kernel
information accessed information about modules and the fields in
module structure. A test has been added to the autoconf list to
determine the appropriate fields to get information about the
module.
Bug: our autoconf mechanism might find unexported symbols in kernel headers not meant for kernel modules
The current BUILD_CHECK mechanism does not pass the -DMODULE option as
the real kernel build system does, and thus may expose unexported
symbols like nmi_uaccess_okay() to our autoconf test programs.
PR30408: fixed excessive read faults when reading userland memory from within perf event/kprobes handlers
The user_addr_max() macro is gone since kernel 5.18, which broke stap's
userland reading routines.
Also, since kernel 5.18, access_ok() does address range checks on
all architectures, so we don't bother checking it ourselves for newer
kernels.
Frank Ch. Eigler [Fri, 12 May 2023 16:43:55 +0000 (12:43 -0400)]
stap-server logic: drop scraped NSS error table
This used to be needed in the ancient days, when the NSS-related
shared libraries did not reliably decode error codes into usable
messages. This stuff works now, so we don't have to carry this
hand-scraped table around any more.
Frank Ch. Eigler [Fri, 12 May 2023 15:13:45 +0000 (11:13 -0400)]
PR30442: failing optional statement probes should not trigger pass2 exceptions
In tapsets.cxx, query_cu() and query_module() aggressively caught &
sess.print_error'd semantic_errors from subsidiary call sites. They
are unaware of whether the probe in question is being resolved within
an optional (? or !) context. Instead of this, they now simply let
the exceptions propagate out to derive_probes() or similar, which does
know whether exceptions are errors in that context. That means
exceptions can propagate through elfutils iteration machinery too,
perhaps risking C level memory leaks, but so be it.
This fix goes well beyond statement probes per se, but hand-testing
and the testsuite appear not to show regressions related to this.
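
The control flow described above can be sketched in a few lines of
Python. This is only an illustration of the idea, not the tapsets.cxx
code: SemanticError, query_module() and derive_probes() below are
hypothetical stand-ins that merely borrow names from this message.

```python
class SemanticError(Exception):
    """Hypothetical analogue of stap's semantic_error."""


def query_module(probe_point):
    # After the change: no local try/except, the failure just propagates.
    raise SemanticError(f"no match for {probe_point}")


def derive_probes(probe_point, optional, errors):
    """Only this outer level knows whether a failure is acceptable."""
    try:
        query_module(probe_point)
        return True
    except SemanticError as exc:
        if not optional:
            # Mandatory probe: record a real error.
            errors.append(str(exc))
        # Optional (? or !) probe: skip quietly.
        return False
```

The point of the sketch is that the inner function no longer decides
how to report the failure; the caller that knows about the optional
context does.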
Serhei Makarov [Mon, 8 May 2023 12:12:59 +0000 (08:12 -0400)]
fix PR30395: Regex code has invalid memory reads caught by KASAN
The TNFA tag cleanup on a '\0' byte would incorrectly read beyond the
end of the string. Keeping YYCURSOR on the nul byte fixes this.
Will harden the fix a little (adding a separate increment-only cursor
for safety) before I close the bug, but this change is already
sufficient if the DFA was generated correctly.
William Cohen [Tue, 25 Apr 2023 14:56:47 +0000 (10:56 -0400)]
Test for kernels that backported removal of <linux/genhd.h> include
Some kernels (RHEL9) backported patches that removed the
<linux/genhd.h> include. Thus, the ioblock.stp tapset cannot simply
check the kernel version to determine whether the include file is
available. The added autoconf test will determine whether the include
is available.
William Cohen [Tue, 25 Apr 2023 13:44:51 +0000 (09:44 -0400)]
Allow nfsd.stp tapset to work on kernels with CONFIG_NFSD_V2 unset
Some of the newer Fedora kernels have CONFIG_NFSD_V2 unset (*). The
nfsd.stp tapset was requiring various NFSD V2 probe points to exist.
These required probes caused build failures in examples like
nfsd-trace and nfsdtop. Making the NFSD V2 probes optional allows the
nfsd.stp tapset to work on these kernels.
BZ2180328: disable pass-2 dyninst liveness analysis on CONFIG_RETPOLINE kernels
As a stopgap measure, ameliorate the dramatic dyninst analysis time
required to liveness-check $var assignments in kernels compiled with
retpolines. Just skip the effort (with a warning).
See also: https://github.com/dyninst/dyninst/issues/1305 .
PR30123: rework dwarf4/5 DW_AT_data_bit_offset support
$subject DWARF attribute is another way of designating the relative
position of a member field of a struct within it, generally a
bitfield. It's an absolute bit offset relative to the beginning of
the containing object, rather than the immediately containing word, so
the bit offset numbers can become huge.
New code treats these more correctly, by intercepting them in
dwflpp::translate_final_fetch_or_store to offset the final load/store
address, and relativizing the bit offsets.
New test case covers a variety of -gdwarf* levels with a userspace
target program.
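
As a rough illustration of what "relativizing" means here, the
following Python sketch splits an absolute bit offset into a byte
adjustment for the load/store address plus a small bit offset within
the storage unit. The function name and the unit_bytes parameter are
assumptions made for this example; endianness and the actual logic in
dwflpp::translate_final_fetch_or_store are not modeled.

```python
def relativize_bit_offset(data_bit_offset, unit_bytes):
    """Split an absolute bit offset into (byte_adjust, bit_in_unit).

    data_bit_offset: bits from the start of the containing object.
    unit_bytes: size of the storage unit being loaded (assumed known).
    """
    unit_bits = unit_bytes * 8
    whole_units, bit_in_unit = divmod(data_bit_offset, unit_bits)
    return whole_units * unit_bytes, bit_in_unit


# A bitfield 1027 bits into a struct, accessed via 4-byte words:
byte_adjust, bit_in_unit = relativize_bit_offset(1027, 4)
```

Here byte_adjust is 128 and bit_in_unit is 3, so the huge absolute
offset becomes a modest address adjustment plus a bit position that
fits inside one word.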
Gioele Barabucci [Mon, 27 Feb 2023 11:56:52 +0000 (12:56 +0100)]
dtrace: Use deterministic temp file creation for all temp files
`dtrace -G -C` creates temporary files with random filenames. The name
of these temporary files gets embedded in the ELF `.symtab` of the final
object files, making them always slightly different.
This behavior makes all packages that use `dtrace`-produced object files
inherently non-reproducible.
To fix this issue, all temporary files are now created using
the same deterministic procedure currently used only for the
temporary "c." files.
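
A minimal sketch of the general idea, assuming the temporary name is
derived from the input file name rather than from a random source.
The helper name, the "dtrace-temp-" prefix and the use of SHA-256 are
all hypothetical; the actual procedure in the dtrace script may
differ.

```python
import hashlib
import os


def deterministic_temp_path(source_path, suffix, directory="."):
    """Return the same temp path every time for the same input name."""
    base = os.path.basename(source_path).encode("utf-8")
    digest = hashlib.sha256(base).hexdigest()[:12]
    return os.path.join(directory, f"dtrace-temp-{digest}{suffix}")
```

Because the name no longer varies between runs, any string derived
from it that ends up in `.symtab` stays identical across rebuilds.
Predictable names are best kept inside the build directory rather
than a shared location such as /tmp.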
Martin Cermak [Fri, 10 Feb 2023 13:08:22 +0000 (14:08 +0100)]
interactive.cxx: use temporary file with .stp suffix
In systemtap interactive mode (stap -i), editors like vim can
benefit from this change by automatically turning on the stap
syntax highlighting and completion. For this to work, the
EDITOR env var needs to point to the editor of choice.
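
For illustration only, the same idea in Python: create the temporary
file with a .stp suffix so that editors can pick the file type from
the name. interactive.cxx is C++ and does this differently; the
helper name and the "stap-" prefix here are hypothetical.

```python
import os
import tempfile


def make_script_tempfile():
    """Create an empty temp file whose name ends in .stp."""
    fd, path = tempfile.mkstemp(suffix=".stp", prefix="stap-")
    os.close(fd)  # the editor will reopen the file by name
    return path
```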
Ryan Goldberg [Wed, 18 Jan 2023 21:40:35 +0000 (16:40 -0500)]
Lang-server: optimized local definition parsing
To speed up full syncs (e.g. with jupyter-lsp), compute the diff
between the old source and the new text. This allows much faster
updating of local definitions and thus faster completion (without
a multi-second delay).
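
The diff-based approach can be sketched with Python's difflib. This
is an assumption-laden illustration, not the language server's code:
the function name is hypothetical and the real implementation may
diff at a different granularity.

```python
import difflib


def changed_line_ranges(old_text, new_text):
    """Return (start, end) line ranges in new_text that differ."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(
        a=old_lines, b=new_lines, autojunk=False
    )
    return [
        (j1, j2)
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Only the returned ranges would need their local definitions
re-parsed; unchanged regions can keep their cached results.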
Ryan Goldberg [Mon, 19 Dec 2022 22:29:48 +0000 (17:29 -0500)]
Added a new mode: language server
This mode will turn the stap process into a
language server, which will use the official
language-server-protocol. It can be started
with the new --language-server flag
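
For readers unfamiliar with the protocol, the sketch below shows how
a language-server-protocol message is framed: a Content-Length header,
a blank line, then a JSON-RPC body. It only builds the bytes; how a
client connects to stap (assumed here to be over the process's
standard streams) is not shown, and the helper name is hypothetical.

```python
import json


def frame_lsp_message(payload):
    """Wrap a JSON-RPC payload in LSP base-protocol framing."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


request = frame_lsp_message(
    {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}}
)
```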
Aaron Merey [Fri, 27 Jan 2023 16:16:43 +0000 (11:16 -0500)]
client-http.cxx: Fix build error rpmFreeCrypto not declared
rpm-4.18.0 moved the declaration of rpmFreeCrypto into rpm/rpmcrypto.h.
Include this header in client-http.cxx when required in order to avoid
the following error:
CXX stap_gen_cert-util.o
../systemtap/client-http.cxx: In member function ‘std::string http_client::get_rpmname(std::string&)’:
../systemtap/client-http.cxx:482:5: error: ‘rpmFreeCrypto’ was not declared in this scope
482 | rpmFreeCrypto ();
| ^~~~~~~~~~~~~
See https://sourceware.org/bugzilla/show_bug.cgi?id=29094
See the very last line of the above trace, which is a duplicate. This problem
was detected by the backtrace.exp testcase. This update prevents calling the
fallback _stp_stack_print_fallback() in case _stp_print_addr() was already able
to successfully provide some output based on dwarf unwinding.
Martin Cermak [Wed, 18 Jan 2023 13:24:12 +0000 (14:24 +0100)]
dw_entry_value.exp: fix the testcase
After fixing 05eb6742c1 (Handle DWARF5 DW_OP_implicit_pointer and
DW_OP_entry_value), dw_entry_value.exp no longer ends up untested,
but instead often fails in Pass 5.
The problem was that stap_run() sends kill -INT to stap right after
the load generation function (no_load() in this case) is executed,
causing a Pass 5 failure and hence a testcase failure (unexpected output).
A workaround would be to sleep a second before the signal is sent so
that stap can cleanly finish, and the signal can't be delivered, making
the testcase green. But that'd be just a workaround.
This update relies on stap_run2() instead of stap_run(), simplifying
the testcase and making it stable.
William Cohen [Mon, 16 Jan 2023 01:29:51 +0000 (20:29 -0500)]
Generate event syscall name<->number mappings for 32-bit RISCV
There can be a lot of compiler complaints on 64-bit RISCV when
compiling systemtap scripts using syscall_any tapsets about the
missing 32-bit syscall name<->number mappings. The strace code does
not have special tables for 32-bit RISCV. However, the numbers look
to be virtually the same for both 64-bit and 32-bit RISCV. For the
time being, just generate a 32-bit version of the table from the
64-bit strace tables.
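
Conceptually, the generation step amounts to reusing the 64-bit
mapping for the 32-bit case and building both lookup directions. The
Python sketch below is illustrative only: the function name is
hypothetical, the sample numbers are placeholders, and the real
tables are generated from the strace sources into tapset arrays.

```python
def derive_32bit_tables(name2num_64):
    """Copy the 64-bit name->number map and build its inverse."""
    name2num_32 = dict(name2num_64)
    num2name_32 = {num: name for name, num in name2num_32.items()}
    return name2num_32, num2name_32


# Illustrative sample entries, not the full syscall table.
sample_64 = {"futex_waitv": 449, "set_mempolicy_home_node": 450}
name2num_32, num2name_32 = derive_32bit_tables(sample_64)
```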
William Cohen [Mon, 16 Jan 2023 01:28:24 +0000 (20:28 -0500)]
Update syscall mapping information for syscall_any tapset
There are a couple of new syscalls available, futex_waitv and
set_mempolicy_home_node. Regenerated the num2name and name2num
associative arrays to include those new syscalls for syscall_any
tapset.
Correct a misuse of dejagnu pass/fail descriptive text. Pass/fail
status is sufficiently communicated by the proc, and should not be
repeated in the text parameter.
Martin Cermak [Fri, 16 Dec 2022 21:08:20 +0000 (16:08 -0500)]
tapset: nfs.proc.commit_done compilation on some kernels
Correct:
9.0 Server x86_64 # stap -vp3 nfs.proc.commit_done.stp
Pass 1: parsed user script and 482 library scripts using 108088virt/88468res/12460shr/75476data kb, in 190usr/60sys/501real ms.
semantic error: invalid access '->task' vs 'void*': operator '->' at /usr/share/systemtap/tapset/linux/nfs_proc.stpm:16:21
source: ( get_ip(&@nfs_data->task) )
^
in expansion of macro: operator '@_nfs_data_server_ip' at /usr/share/systemtap/tapset/linux/nfs_proc.stp:1421:15
source: server_ip = @_nfs_data_server_ip($task->tk_calldata)
^
Ryan Goldberg [Thu, 1 Dec 2022 21:15:44 +0000 (16:15 -0500)]
PR29676: Wildcard expansion fix for labels
PR29676 introduced a bug where function symbols from the symbol
table were expanded in the function component, resulting in wildcards
not being expanded in labels. This fix removes the issue by restricting
the symbol table query to probes which don't need further debuginfo to
expand.
William Cohen [Fri, 4 Nov 2022 15:12:05 +0000 (11:12 -0400)]
Ensure that SystemTap runtime uses smp_processor_id() in proper context
There were cases on Fedora 36 and Rawhide running kernels with
CONFIG_DEBUG_PREEMPT=y where systemtap scripts would trigger kernel
log messages like the following:
This issue was introduced by git commit 1641b6e7ea which added a fast
path check that used smp_processor_id() without first having a
preempt_disable(). The code now ensures that preemption is disabled
before using the smp_processor_id().
Serhei Makarov [Thu, 3 Nov 2022 16:56:11 +0000 (12:56 -0400)]
Revert "runtime: stat: avoid allocating stat_data memory on offline CPUs"
This reverts commit ba42203ae957bb62805e18eac30459eb74cde3d2.
There are indications that on some non-x86 platforms (ppc64le)
this patch may be causing problems i.e.
'sleeping function called in invalid context' warnings.
Reverting for the release, may return this patch if I get a
clearer idea of the cause of the problem.
Serhei Makarov [Thu, 3 Nov 2022 16:54:02 +0000 (12:54 -0400)]
Revert "Revert "Bug: runtime: we might not sync the tracepoint's SRCU state after unregistering the tracepoints""
This reverts commit fae609baeb93c4f41983adcb451378f049b4cdc9.
(There was a miscommunication about which commit was causing the
problem on ppc64le. Just re-confirmed the tracepoint-SRCU commit
was not the culprit.)