William Cohen [Fri, 4 Nov 2022 15:12:05 +0000 (11:12 -0400)]
Ensure that SystemTap runtime uses smp_processor_id() in proper context
There were cases on Fedora 36 and Rawhide running kernels with
CONFIG_DEBUG_PREEMPT=y where systemtap scripts would trigger kernel
log messages like the following:
This issue was introduced by git commit 1641b6e7ea which added a fast
path check that used smp_processor_id() without first having a
preempt_disable(). The code now ensures that preemption is disabled
before using the smp_processor_id().
Serhei Makarov [Thu, 3 Nov 2022 16:56:11 +0000 (12:56 -0400)]
Revert "runtime: stat: avoid allocating stat_data memory on offline CPUs"
This reverts commit ba42203ae957bb62805e18eac30459eb74cde3d2.
There are indications that on some non-x86 platforms (ppc64le)
this patch may be causing problems, i.e.
'sleeping function called in invalid context' warnings.
Reverting for the release; may return this patch if I get a
clearer idea of the cause of the problem.
Serhei Makarov [Thu, 3 Nov 2022 16:54:02 +0000 (12:54 -0400)]
Revert "Revert "Bug: runtime: we might not sync the tracepoint's SRCU state after unregistering the tracepoints""
This reverts commit fae609baeb93c4f41983adcb451378f049b4cdc9.
(There was a miscommunication about which commit was causing the
problem on ppc64le. Just re-confirmed the tracepoint-SRCU commit
was not the culprit.)
William Cohen [Tue, 1 Nov 2022 02:48:29 +0000 (22:48 -0400)]
Adjust runtime/linux/task_finder2.c to work with Linux 6.1 kernels
Unlike earlier kernels, the mm_struct in the Linux 6.1 kernel does not
have an mmap field. This required some adjustments to the
__stp_call_mmap_callbacks_for_task function to work with the newer
kernel. Newer kernels use the VMA_ITERATOR and for_each_vma macros to
access the equivalent of the vma information previously reached through
the mmap field. Reviewing the commit history of the Linux kernel turned
up git commit 3a4f7ef4be by Liam Howlett, which provides backward-compatible
VMA_ITERATOR and for_each_vma macros for cases where those macros are
not available. This approach was adapted for task_finder2.c and
allows the code to compile on both old and new kernels.
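The fallback-macro approach can be sketched in userspace as follows.
This is an illustration only: the two structs are simplified mock-ups of
the pre-6.1 layout, count_vmas() is a hypothetical helper, and the macro
bodies are an assumption about what such a compatibility shim looks
like, not a copy of the code in task_finder2.c.

```c
#include <assert.h>
#include <stddef.h>

/* Mock-ups of the pre-6.1 layout: VMAs chained through mm->mmap and
 * vm_next. These are userspace stand-ins, not the kernel definitions. */
struct vm_area_struct {
    unsigned long vm_start;
    unsigned long vm_end;
    struct vm_area_struct *vm_next;
};

struct mm_struct {
    struct vm_area_struct *mmap;
};

/* Fallback definitions, used only when the kernel headers do not
 * already provide the iterator API (assumed shape of the shim). */
#ifndef VMA_ITERATOR
#define VMA_ITERATOR(name, mm, addr) struct mm_struct *name = (mm)
#define for_each_vma(vmi, vma) \
    for ((vma) = (vmi)->mmap; (vma); (vma) = (vma)->vm_next)
#endif

/* Hypothetical helper: walk an address space using only the iterator
 * API, so the same source works with either definition of the macros. */
static int count_vmas(struct mm_struct *mm)
{
    struct vm_area_struct *vma;
    int n = 0;
    VMA_ITERATOR(vmi, mm, 0);

    assert(mm != NULL);
    for_each_vma(vmi, vma)
        n++;
    return n;
}
```

Code written against VMA_ITERATOR and for_each_vma never touches the
mmap field directly, which is what lets one source file build against
both kernel generations.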
This assembler takes named opcodes following the syntax
from https://github.com/iovisor/bpf-docs/blob/master/eBPF.md
and supports a more natural syntax (fewer commas and semicolons
needed).
Opcode table to be filled-in by further commits.
Proof-of-concept usage in tapsets for testing ditto.
The advantage of implementing the assembler this way is that the prior
syntax is still supported, hence we do not *have* to change tapset
code until we want to.
Martin Cermak [Mon, 24 Oct 2022 17:41:09 +0000 (19:41 +0200)]
testsuite: make bad-code.exp compatible with modern kernels
This update makes the bad-code.exp testcase compatible with modern
rhel-9 kernels that use force_sig_info_to_task() instead of
force_sig_info(). This is upstream kernel commit 5ad18b2e60b.
Also mark this testcase KFAIL on s390x.
Martin Cermak [Fri, 21 Oct 2022 15:20:43 +0000 (17:20 +0200)]
testsuite: make at_var_mark.exp LTO agnostic.
Sometimes LTO is used for compilation (default for building
RHEL packages). With LTO, the morehelp variable would get
optimized out, and e.g. at_var_mark.exp would start failing.
Make the generated version string run in "git worktree" siblings,
produce the GIT_PRETTY_REV message, and work even if the last commit
(like this one) happens to be signed.
Bug: runtime: taskfinder2: stap_stop_task_finder() might busy-wait a spinlock forever
The __stp_inuse_count counter might get out of sync when kernel memory
allocations fail. This leads to stap_stop_task_finder() waiting for the
counter forever.
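A minimal userspace model of this class of bug, assuming the counter is
incremented before an allocation whose failure path must undo the
increment. All names here (inuse_count, start_callback, finish_callback,
failing_alloc) are hypothetical, and the actual fix in the runtime may
be shaped differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for __stp_inuse_count (an atomic counter in the kernel). */
static int inuse_count;

typedef void *(*alloc_fn)(size_t);

/* Allocator that always fails, to exercise the error path. */
static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}

/* Announce in-flight work, then allocate its state.
 * Returns 0 on success, -1 on allocation failure. */
static int start_callback(alloc_fn alloc, void **out)
{
    inuse_count++;
    *out = alloc(64);
    if (*out == NULL) {
        /* Without this decrement the counter never returns to zero
         * and a waiter polling it would spin forever. */
        inuse_count--;
        return -1;
    }
    return 0;
}

/* Release the state and retire the in-flight work. */
static void finish_callback(void *p)
{
    assert(inuse_count > 0);
    free(p);
    inuse_count--;
}
```

The point of the sketch is only that every increment needs a matching
decrement on every exit path, including the allocation-failure one.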
This might happen on systems short of available memory. The stapio
process might get stuck at almost 100% CPU usage (though it is not a CPU
soft-lockup, due to the schedule() function calls inside the lock
waiting loop). The hottest kernel backtrace for such stapio processes
looks like this:
Bug: procfs: NULL ptr deref might happen in relay_file_open()
inode->i_private might occasionally be NULL in relay_file_open()
(which is triggered by stapio's openat() syscall) due to a race
condition in our __stp_procfs_relay_create_buf_file_callback()
function.
Add a wrapper around kernel's relay_file_open() for our procfs's
open operation so that we always check if inode->i_private is NULL.
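The wrapper idea can be sketched in userspace like this. The structs
are one-field mock-ups, mock_relay_file_open() stands in for the
kernel's relay_file_open(), and both the wrapper name and the -EINVAL
return value are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* One-field mock-ups of the kernel structures involved. */
struct inode { void *i_private; };
struct file  { void *private_data; };

/* Stand-in for the kernel's relay_file_open(), which uses
 * inode->i_private without checking it first. */
static int mock_relay_file_open(struct inode *inode, struct file *filp)
{
    filp->private_data = inode->i_private;
    return 0;
}

/* Hypothetical wrapper installed as the open operation: reject the
 * open while i_private is still NULL instead of passing it on. */
static int checked_relay_file_open(struct inode *inode, struct file *filp)
{
    assert(inode != NULL && filp != NULL);
    if (inode->i_private == NULL)
        return -EINVAL;   /* assumed error code */
    return mock_relay_file_open(inode, filp);
}
```

With the check in the wrapper, an open that loses the race fails with
an error in userspace rather than dereferencing a NULL pointer in the
kernel.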
Stan Cox [Fri, 9 Sep 2022 20:09:10 +0000 (16:09 -0400)]
Initial python 3.11 backtrace support
Use @defined to handle PyFrameObject members moved to PyInterpreterFrame by
python 3.11, such as f_code, f_back, f_globals, f_localsplus, f_lasti. Support
3.11 dictionaries: me_value, me_key, dk_size, dk_entries, dk_kind. Support
3.11 local variable accessing: co_varnames.
Optimize: runtime: context: avoid allocating context structs for offline CPUs
We used to allocate context structs for all the "possible CPUs", which
is quite wasteful.
Some VM hypervisors like VMware assign a large number of "possible CPUs"
to their guests by default, which might lead to a huge amount of memory
being allocated in the stap ko module.
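The saving can be illustrated with a small userspace model in which the
online CPUs are a bitmask and only those slots get a context. MAX_CPUS,
struct context and the helper names are hypothetical; the sketch also
ignores CPUs that come online later, which a real runtime would have to
handle.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CPUS 64          /* models the number of "possible CPUs" */

struct context { int busy; };

/* One slot per possible CPU; only online CPUs get backing memory. */
static struct context *contexts[MAX_CPUS];

/* Release every allocated context and clear its slot. */
static void free_contexts(void)
{
    int cpu;

    for (cpu = 0; cpu < MAX_CPUS; cpu++) {
        free(contexts[cpu]);
        contexts[cpu] = NULL;
    }
}

/* Allocate a context for each CPU whose bit is set in online_mask.
 * Returns the number of contexts allocated, or -1 on failure. */
static int alloc_contexts(unsigned long long online_mask)
{
    int cpu, n = 0;

    for (cpu = 0; cpu < MAX_CPUS; cpu++) {
        if (!(online_mask & (1ULL << cpu)))
            continue;        /* offline: leave the slot empty */
        contexts[cpu] = calloc(1, sizeof(struct context));
        if (contexts[cpu] == NULL) {
            free_contexts();
            return -1;
        }
        n++;
    }
    return n;
}

/* Look up the context for a CPU; NULL means the CPU was offline. */
static struct context *get_context(int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    return contexts[cpu];
}
```

With 64 possible CPUs and a mask of 0x5, the model allocates two
contexts instead of 64.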
Optimize: runtime: print: avoid allocating string buffers for offline CPUs
We used to allocate string buffers for all the "possible CPUs", which
is quite wasteful.
Some VM hypervisors like VMware assign a large number of "possible CPUs"
to their guests by default, which might lead to a huge amount of memory
being allocated in the stap ko module.
Frank Ch. Eigler [Fri, 19 Aug 2022 19:00:22 +0000 (15:00 -0400)]
PR29507: generalize sample python tapset for loose python{2,3} library versions
We can rely on stap 4.2+'s probe-context passing to functions to make
it unnecessary to decorate each @cast() with a libpython path name.
This lets these tests work on a range of python libraries.
These helper functions really should go into the standard python tapset,
rather than sit here in the examples, but that's for later.
Martin Cermak [Wed, 20 Jul 2022 10:50:00 +0000 (12:50 +0200)]
Fix failing nfsd.createv3 in testsuite/buildok/nfsd-all-probes.stp
* tapset/linux/nfsd.stp: Make nfsd.createv3 and nfsd.createv3.return
optional in nfsd.entries, since the underlying probe point no longer
exists in kernels 5.19+ per kernel commit 1c388f27759c5d9271d4fca0.
This fixes `stap -p4 testsuite/buildok/nfsd-all-probes.stp`.
* testsuite/buildok/nfsd-detailed.stp: Make nfsd.createv3 tests
optional.
Note 1: testsuite/buildok/nfsd-all-probes.stp tries to compile
something like:
probe nfsd.* , nfsd.*.* , nfsd.*.*.* { ... }
which means that the testcase overrides the tree level ? optionality, and
forces each level of the tree to carry ? also.
Note 2: this update is analogous to PR18856 / 3fc11ed07bad37.
Stan Cox [Wed, 13 Jul 2022 13:49:51 +0000 (09:49 -0400)]
python 3.11 removed direct access to PyFrameObject members
Take into account the change in PyFrameObject definition to allow
building systemtap with python 3.11. Additional support for python
3.11 is forthcoming.
William Cohen [Wed, 13 Jul 2022 16:09:26 +0000 (12:09 -0400)]
Make variable initializer work with RHEL6 compiler
The gcc 4.4 compiler in RHEL 6 does not understand initializers that
use ".field=". Adjusted the variable initialization to work with the
older compiler.
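One portable way to avoid ".field=" initializers is to declare the
variable and assign its members afterwards. The sketch below shows that
pattern on a made-up struct; struct probe_opts and make_opts() are
hypothetical, and the construct actually changed by this commit is not
shown in the log.

```c
#include <assert.h>

/* Made-up structure, for illustration only. */
struct probe_opts {
    int level;
    const char *name;
};

/* Builds the same value that
 *     struct probe_opts o = { .level = 3, .name = "demo" };
 * would produce, without using a designated initializer. */
static struct probe_opts make_opts(void)
{
    struct probe_opts o;

    o.level = 3;
    o.name = "demo";
    return o;
}
```

Member-by-member assignment stays correct if fields are later
reordered, which a positional initializer such as { 3, "demo" } would
not.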
William Cohen [Tue, 12 Jul 2022 01:08:46 +0000 (21:08 -0400)]
Update sleeptime.stp to work with newer kernels and tracepoint syscalls
Newer kernels use syscall.clock_nanosleep instead of
syscall.nanosleep. In some cases tracepoint implementations of
syscall.* are used, which do not allow the use of @entry(). The revised
code uses an explicit associative array to track the syscall entry time
rather than @entry() in the syscall.*.return handler.
William Cohen [Mon, 11 Jul 2022 22:10:01 +0000 (18:10 -0400)]
Extract the exit_reason from trace_kvm_exit vcpu argument on newer kernels
For x86_64 processors, newer kernels change where the exit_reason
information is located. In older kernels the exit_reason was a
parameter of trace_kvm_exit. In newer kernels exit_reason
is a field buried in a member of the vcpu argument. Make
kvm_service_time.stp pick the appropriate location for exit_reason.
William Cohen [Fri, 24 Jun 2022 21:09:39 +0000 (17:09 -0400)]
PR29037 Handling gcc11 bitfields
The newer DWARF5 output provided by GCC11 no longer has a
DW_AT_data_member_location attribute describing where the bitfield is
located. This information needs to be extracted from
DW_AT_data_bit_offset.
The patch maps the newer DWARF5 DW_AT_data_bit_offset information
internally to a format that matches up with the
DW_AT_data_member_location information, because the dwarf_getlocation_addr
function does not understand DW_AT_data_bit_offset. An equivalent
DW_AT_data_member_location attribute is generated from the size of the
underlying type used to store the bitfield and the
DW_AT_data_bit_offset information.
The get_bitfield function was also modified to determine
the shifts and masking operations using the DW_AT_data_bit_offset.
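The arithmetic involved can be sketched as below. This is not the code
from the patch: the struct and function names are hypothetical, the
target is assumed to be little-endian (so bit offsets count up from the
least significant bit), and the bitfield is assumed not to straddle two
storage units.

```c
#include <assert.h>
#include <stdint.h>

/* Where a bitfield lives, expressed the "old" way. */
struct bitfield_loc {
    uint64_t byte_offset;  /* equivalent DW_AT_data_member_location */
    unsigned shift;        /* right shift within the storage unit */
    uint64_t mask;         /* mask applied after shifting */
};

/* Derive byte offset, shift and mask from DW_AT_data_bit_offset,
 * the bit size, and the byte size of the underlying storage type. */
static struct bitfield_loc
locate_bitfield(uint64_t data_bit_offset, unsigned bit_size,
                unsigned storage_bytes)
{
    struct bitfield_loc loc;
    unsigned unit_bits = storage_bytes * 8;

    assert(unit_bits > 0 && unit_bits <= 64);
    assert(bit_size > 0 && bit_size <= unit_bits);

    /* Start of the storage unit that contains the field. */
    loc.byte_offset = (data_bit_offset / unit_bits) * storage_bytes;
    /* Position of the field inside that unit (little-endian). */
    loc.shift = (unsigned)(data_bit_offset % unit_bits);
    assert(loc.shift + bit_size <= unit_bits);
    loc.mask = (bit_size == 64) ? ~0ULL : ((1ULL << bit_size) - 1);
    return loc;
}

/* Extract the field value from an already-loaded storage unit. */
static uint64_t extract_bitfield(uint64_t unit, struct bitfield_loc loc)
{
    return (unit >> loc.shift) & loc.mask;
}
```

For example, a 3-bit field with a data bit offset of 37 stored in a
4-byte unit lands in the unit at byte offset 4, with a shift of 5 and a
mask of 0x7.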
William Cohen [Thu, 26 May 2022 20:45:52 +0000 (16:45 -0400)]
Filter out aarch64 mapping symbols
Like 32-bit ARM, aarch64 also has mapping symbols in
the binaries to mark the start of A64 code ("$x") and data ("$d").
The code for 32-bit ARM has been extended to handle aarch64.
This improves the backtraces from: