William Cohen [Mon, 11 Oct 2021 20:26:06 +0000 (16:26 -0400)]
Update syscall_num.stp mappings between syscall number and name
Generate new versions of the syscall_num.stp files that include
the preprocessor architecture guards so multiple syscall_num.stp
files can be used by BPF code at the same time.
William Cohen [Mon, 11 Oct 2021 20:17:41 +0000 (16:17 -0400)]
Allow multiple architecture syscall_num.stp files to be used by bpf backend
SystemTap's BPF backend still has the concept of a target architecture and
isn't write-once, run-anywhere for BPF code. Thus, to make the syscall_any
tapset work, the tapset needs to have the initialization code for all the
different architectures available and select the appropriate one based on
the architecture. To that end, each syscall_num.stp file has preprocessing
guards that make it empty unless the architecture matches.
William Cohen [Mon, 11 Oct 2021 19:49:33 +0000 (15:49 -0400)]
Update dump-syscalls.sh to work with newer strace-code
There have been some changes in the strace-code:
-Header files now in strace-code/src/linux/*/syscallent.h
-mips headers use BASE_NR and file needs to be massaged to get numeric value
-corrected the riscv architecture name to riscv64
There are lots of races when printing warnings/errors/dbugs in staprun
because multiple eprintf() calls are used to print a single message and
stderr is not line-buffered. As a result, warnings/errors/dbugs race with
the relayfs reader threads printing to stdout and with other stap scripts
running concurrently in the same PTY. This causes the messages printed to
stderr and stdout to be garbled.
Fix all of this by using a single eprintf() for each warning/error/dbug
message, and by making stderr line-buffered so that we don't need to worry
about differing libc implementations potentially flushing a single message
in chunks rather than flushing the whole message in one go.
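A minimal sketch of the two ideas above, written as standalone C rather than taken from staprun: stderr is switched to line-buffered mode with setvbuf(), and each message is fully formatted into one buffer before a single stdio call emits it. The names init_stderr, vformat_msg, format_msg and emit_warning are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Make stderr line-buffered so a finished line is flushed as a unit. */
int init_stderr(void)
{
    return setvbuf(stderr, NULL, _IOLBF, BUFSIZ);
}

/* Build "<prefix><text>\n" in out. Returns the length written, or -1 if
 * the whole message (plus newline and NUL) does not fit in n bytes. */
int vformat_msg(char *out, size_t n, const char *prefix,
                const char *fmt, va_list ap)
{
    assert(out != NULL && n > 0);
    int used = snprintf(out, n, "%s", prefix);
    if (used < 0 || (size_t)used >= n)
        return -1;
    int body = vsnprintf(out + used, n - (size_t)used, fmt, ap);
    if (body < 0 || (size_t)used + (size_t)body + 1 >= n)
        return -1;
    out[used + body] = '\n';
    out[used + body + 1] = '\0';
    return used + body + 1;
}

/* Variadic convenience wrapper around vformat_msg(). */
int format_msg(char *out, size_t n, const char *prefix, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int len = vformat_msg(out, n, prefix, fmt, ap);
    va_end(ap);
    return len;
}

/* Emit one warning with exactly one stdio call, so the text cannot be
 * interleaved with output from another writer part-way through. */
void emit_warning(const char *fmt, ...)
{
    char line[512];
    va_list ap;
    va_start(ap, fmt);
    int len = vformat_msg(line, sizeof line, "WARNING: ", fmt, ap);
    va_end(ap);
    if (len > 0)
        fputs(line, stderr);
}
```

Building the complete line first is what makes the single call possible; the line buffering then covers libc implementations that might otherwise flush a message in pieces.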
Sultan Alsawaf [Thu, 30 Sep 2021 02:42:26 +0000 (19:42 -0700)]
runtime: make _stp_vlog() more robust to avoid truncating log messages
Currently, _stp_vlog() very readily drops or truncates warnings, errors,
and debug messages. In the case of warnings and errors, this is quite
problematic because these messages are of high importance and, as such,
are even sent to stapio via the control channel rather than the relay
transport.
The reason why _stp_vlog() truncates and even drops these messages so
easily is twofold: the normal print buffer is used directly without any
attempt to flush it when there isn't enough space, and it's used as
temporary storage for warnings and errors.
When warnings and errors are sent to the control channel, they are
copied into a new buffer, which is wasteful due to the copy operation
and the effort put into scrounging for space in the print buffer.
Instead of using a temporary buffer to construct warnings and errors,
it's more reliable and efficient to construct the message in one of the
control channel's buffers that would've been used anyway to send the
message.
In the case of debug messages, the print buffer can take appropriate
steps to ensure there's enough space via _stp_reserve_bytes(). Now, the
length of a debug message is calculated before it's generated, making it
possible to use _stp_reserve_bytes().
Altogether, this makes _stp_vlog() very resistant to losing both normal
debug messages and high-priority warning and error messages.
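The measure-then-reserve idea for debug messages can be sketched in userspace C as follows. This is not the runtime's code: print_buf, buf_flush, buf_reserve and buf_printf are hypothetical stand-ins, with buf_reserve playing the role the message gives to _stp_reserve_bytes(). The length is computed with vsnprintf(NULL, 0, ...) before any bytes are written, so the buffer can be flushed first if the message would not fit.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 64

struct print_buf {
    char data[BUF_SIZE];
    size_t len;        /* bytes currently queued */
    unsigned flushes;  /* number of times the buffer was drained */
};

/* Stand-in for handing the queued bytes to the transport. */
void buf_flush(struct print_buf *b)
{
    b->len = 0;
    b->flushes++;
}

/* Reserve n bytes, flushing first if they do not fit right now.
 * Returns NULL only when n can never fit in the buffer. */
char *buf_reserve(struct print_buf *b, size_t n)
{
    assert(b->len <= BUF_SIZE);
    if (n > BUF_SIZE)
        return NULL;
    if (b->len + n > BUF_SIZE)
        buf_flush(b);
    char *p = b->data + b->len;
    b->len += n;
    return p;
}

/* Measure first, then reserve, then format in place.
 * Returns the number of bytes queued, or -1 if the message was rejected. */
int buf_printf(struct print_buf *b, const char *fmt, ...)
{
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);
    int need = vsnprintf(NULL, 0, fmt, ap);   /* length without the NUL */
    va_end(ap);

    char *dst = need < 0 ? NULL : buf_reserve(b, (size_t)need + 1);
    if (dst != NULL) {
        vsnprintf(dst, (size_t)need + 1, fmt, ap2);
        b->len--;                              /* give back the NUL byte */
    }
    va_end(ap2);
    return dst != NULL ? need : -1;
}
```

Because the size is known up front, a message is either queued whole or rejected whole; it is never truncated to whatever space happened to be left.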
Stan Cox [Thu, 30 Sep 2021 20:11:29 +0000 (16:11 -0400)]
PR27829: Support floating point values passed through sdt.h markers
Add the type to the individual arg entries in the .notes.stapsdt section;
currently SP@A, where S is optional '-' sign, P is precision of type and A is
address. Revised format is SPT@A where T is optional 'f' for float variables.
Add x86_64 float registers xmm8 - xmm15 and aarch64 float registers v8 - v31.
Parse the type field; result is currently ignored. asm statements are
restricted to 30 arguments; sdt probes can have up to 12 arguments. To fit
this into a single asm statement, precision and type are encoded into a single
field: 0xSSTT where SS is the precision and TT is the type as encoded by
__builtin_classify_type. The sign S, precision P, and type T are decoded by
_SDT_SIGN, _SDT_SIZE, and _SDT_TYPE. Test that the revised
.notes.stapsdt section interacts correctly with eu-elfutils and gdb.
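To make the two encodings above concrete, here is a small sketch of the 0xSSTT packing and the SPT@A text form. The real definitions live in sdt.h; the helper names below (pack_size_type, unpack_size, unpack_type, format_arg) are hypothetical, the type code is treated as an opaque 8-bit value, and how the sign is carried in the packed field is not modelled because the message does not spell it out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pack a precision (in bytes) and a type code into one 0xSSTT field. */
unsigned pack_size_type(unsigned size, unsigned type)
{
    return ((size & 0xffu) << 8) | (type & 0xffu);
}

/* Recover the SS (precision) half of the field. */
unsigned unpack_size(unsigned field)
{
    return (field >> 8) & 0xffu;
}

/* Recover the TT (type code) half of the field. */
unsigned unpack_type(unsigned field)
{
    return field & 0xffu;
}

/* Render one argument descriptor as SPT@A: optional '-' sign, precision,
 * optional 'f' for float, then '@' and the address/operand text.
 * Returns the snprintf() result. */
int format_arg(char *out, size_t n, int is_signed, unsigned size,
               int is_float, const char *addr)
{
    return snprintf(out, n, "%s%u%s@%s",
                    is_signed ? "-" : "", size, is_float ? "f" : "", addr);
}
```

For example, an 8-byte float operand held in an xmm register would render as 8f@%xmm0, while a signed 4-byte integer keeps the older -4@... shape, which is why readers that ignore the type field continue to parse existing notes.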
Make this test case operable without kernel debuginfo by using
@cast( ..., "kernel<header>") to extract iocb / iovec decls
from headers. Rename IO_CMD_* values to IOCB_CMD_* to match
linux aio_abi.h
Sultan Alsawaf [Fri, 17 Sep 2021 21:17:16 +0000 (14:17 -0700)]
Fix races in perf probe task finder callback
The task finder callback for a perf probe can run concurrently across
different tasks' workers, resulting in redundant registrations since
_stp_perf_init() and _stp_perf_del() lack synchronization. This results
in the redundant perf events never being removed and a use-after-free
scenario occurring in the kernel's core perf code after the stap module
is unloaded. Adding a mutex lock to the task finder callback fixes the
issue.
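The shape of the fix can be illustrated with a userspace analogue using a pthread mutex; the kernel code uses its own locking primitives, and struct perf_probe with its attach/detach helpers below is hypothetical. The point is that the "already registered?" check and the registration itself happen under one lock, so two concurrent callbacks cannot both register.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct perf_probe {
    pthread_mutex_t lock;
    bool registered;   /* has the event been created already? */
    int live_events;   /* number of underlying events currently alive */
};

void perf_probe_init(struct perf_probe *p)
{
    pthread_mutex_init(&p->lock, NULL);
    p->registered = false;
    p->live_events = 0;
}

/* Called from any worker that discovers the target task. */
void perf_probe_attach(struct perf_probe *p)
{
    pthread_mutex_lock(&p->lock);
    if (!p->registered) {          /* check and create atomically */
        p->live_events++;
        p->registered = true;
    }
    pthread_mutex_unlock(&p->lock);
}

/* Called when the target task goes away or the module unloads. */
void perf_probe_detach(struct perf_probe *p)
{
    pthread_mutex_lock(&p->lock);
    if (p->registered) {
        p->live_events--;
        p->registered = false;
    }
    pthread_mutex_unlock(&p->lock);
}
```

Without the lock, two callers could both observe registered == false and each create an event, leaving one that is never torn down.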
Di Chen [Thu, 2 Sep 2021 04:52:47 +0000 (12:52 +0800)]
The /* pc=0x... */ is no longer printed by "stap -v -L 'kernel.function("*")'"
The /* pc=0x... */ annotation disappeared because the function
"dwarf_derived_probe::printsig_nonest" was never implemented. As a
result, "p->printsig_nonest(sig)" in main.cxx ends up calling
"derived_probe::printsig_nonest", even though the type of "p" is
(gdb) ptype /m p
type = /* real type = dwarf_derived_probe * */
This patch adds "dwarf_derived_probe::printsig_nonest" to print the
PC value.
William Cohen [Tue, 14 Sep 2021 01:32:38 +0000 (21:32 -0400)]
Use task_state tapset function to avoid task_struct changes
The Linux 5.14 kernel's task_struct changed the state field to
__state. The task_state tapset function selects the appropriate
version. Make the scheduler.stp tapset and schedtimes.stp example use
the task_state function rather than directly trying to access the
task_struct state field (and get it wrong for newer kernels).
William Cohen [Mon, 17 May 2021 01:00:14 +0000 (09:00 +0800)]
Avoid generating problematic asynchronous unwind tables on RISC-V
By default SystemTap turns on the generation of asynchronous
unwind tables for all processors. When this was enabled for RISC-V,
kernel modules would be generated, but the resulting kernel modules
would fail to load because the kernel's module loader could not
handle the R_RISCV_32_PCREL relocations in the .ko files.
Disabling the asynchronous unwind tables for RISC-V is a
workaround to get things functioning for RISC-V.
There's a circular dependency because _stp_cleanup_and_exit() may send a
message over the control channel to indicate an issue, like with
_stp_warn(). The reordering thus causes this crash:
The original bug that the reorder attempted to fix is already fixed by
166a95089 ("runtime: fix panics when polling on the control channel
while unloading"). That commit doesn't allow delete_module() to run when
there are open file descriptors to the control channel, which guarantees
that the control channel cannot be in use during module cleanup. Thus,
we can simply restore the old ordering.
Sultan Alsawaf [Wed, 25 Aug 2021 02:27:43 +0000 (19:27 -0700)]
runtime: fix panics when polling on the control channel while unloading
When the stapio pselect() runs while the given stap module is unloading,
there's a use-after-free opportunity in do_select(). This occurs because
the control channel's poll function, _stp_ctl_poll_cmd(), passes a
pointer to a global variable along to do_select(), which can then
dereference the pointer after the stap module is unloaded.
Normally, this wouldn't be a problem because do_select() uses get_file()
and fput(), which respectively grab and release references to the module
owner specified in `file->f_op->owner`. However, procfs doesn't provide
any interface to pass in a module owner, and instead all procfs files
use an internal `struct file_operations` declared in fs/proc/inode.c.
As a result, we cannot bolster procfs files with module reference
count protection through any normal means, so we must inject a module
owner the hard way.
A module owner is now patched into the control channel's file ops when
the file is opened by making a copy of the existing file ops and then
setting the module owner inside the copy, which then replaces the old
`file->f_op` pointer. This neatly fixes the race because procfs *does*
guarantee that none of the procfs callback functions are still running
after an entry is removed, and because _stp_ctl_poll_cmd() cannot be
reached without first passing through _stp_ctl_open_cmd().
Since delete_module() can now return EWOULDBLOCK, we must make staprun
aware that it's not a fatal error and that the module deletion should
be retried. EWOULDBLOCK simply indicates that a pselect() on the control
channel has yet to finish, so it will go away after a brief wait.
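The retry behaviour described in the last paragraph might look roughly like the following. This is a sketch, not staprun's code: delete_with_retry, the delete_fn callback type, the 1 ms pause and the retry limit are all assumptions, and fake_delete only simulates a module that stays busy for a few attempts.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical deletion callback: returns 0 on success, or -1 with
 * errno set, mirroring the usual syscall convention. */
typedef int (*delete_fn)(void *ctx);

/* Retry while the failure is EWOULDBLOCK; any other error is fatal. */
int delete_with_retry(delete_fn del, void *ctx, int max_tries)
{
    struct timespec pause = { 0, 1000000 };   /* 1 ms between attempts */
    for (int i = 0; i < max_tries; i++) {
        if (del(ctx) == 0)
            return 0;
        if (errno != EWOULDBLOCK)
            return -1;                         /* real error: give up */
        nanosleep(&pause, NULL);
    }
    errno = EWOULDBLOCK;                       /* still busy after all tries */
    return -1;
}

/* Test double: ctx points to the number of attempts that should still
 * report "busy" before the deletion succeeds. */
int fake_delete(void *ctx)
{
    int *busy = ctx;
    if (*busy > 0) {
        (*busy)--;
        errno = EWOULDBLOCK;
        return -1;
    }
    return 0;
}
```

Treating EWOULDBLOCK as "try again shortly" rather than as a failure matches the explanation above: the condition clears on its own once the in-flight pselect() returns.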
This can be easily reproduced by having a background thread quickly loop
on trying to rmmod any stap modules, resulting in the module's exit
routine running concurrently with the STP_START command from stapio.
Closing the control channel before attempting clean-up fixes this race.
pr23478 WIP: rework bpf foreach to handle multi-key array
The major addition is a new ELF section giving details on each
foreach loop in the program, including where the sort column
is located within a composite key.
Previously all the foreach info was packed into a uint64_t
flags parameter to stapbpf's map_get_next_key userspace-only
helper function, which would not work for this nor for future
foreach work (sort_aggr, array slicing).
* bpf-internal.h (BPF_MAXKEYLEN,BPF_MAXKEYLEN_PLUS): new const.
(SORT_FLAGS etc): deprecate, but still read from old .bo files.
(globals::foreach_info): new struct with info for bpfinterp.cxx.
(globals::foreach_loop_info): vector of foreach_info structs.
(typedef interned_foreach_info): serialized foreach_info.
({intern,deintern}_loop_info): [de]serialize foreach_info.
(loop_idx): alias for index into foreach_loop_info vector.
* bpf-shared-globals.h ({intern,deintern}_loop_info): crudely
[de]serialize foreach_info by putting the fields in a vector.
* bpf-translate.cxx (bpf_unparser::visit_foreach_loop): change
to generate foreach_info, handle multi-key array iteration.
(output_foreach_info): serialize foreach_loop_info into a
new 'stapbpf_foreach_loop_info' ELF section.
(translate_bpf_pass): add 'stapbpf_foreach_loop_info' ELF section.
* stapbpf/stapbpf.cxx (foreach_loop_info): global table
of foreach loop information.
(load_bpf_file): load 'stapbpf_foreach_loop_info' ELF section.
(init_perf_transport): add foreach_loop_info to bpf_transport_context.
(main): add foreach_loop_info to bpf_transport_context.
* stapbpf/bpfinterp.h (struct bpf_transport_context): add
field for storing foreach_loop_info data.
(bpf_transport_context::bpf_transport_context): add
field for storing foreach_loop_info data.
* stapbpf/bpfinterp.cxx (struct foreach_state): represent an
in-progress foreach loop iteration including sorted values.
(foreach_info): alias for bpf::globals::foreach_info.
(foreach_state_add): new function.
(foreach_cmp_{str,int}): new functions.
(foreach_state_sort): new function.
(foreach_state_empty): new function.
(convert_key): new function.
(_foreach_state_next,foreach_state_next): new functions.
(foreach_state_cleanup): new function. Avoid C++ destructor.
(typedef foreach_stack): stack of foreach_state, used for
handling nested foreach loop iteration.
(map_get_next_key): rewrite to use foreach_info,foreach_state
and handle multi-key arrays (storing large keys in map_values).
(bpf_interpret): pass map_values to map_get_next_key, to be
used for storing composite keys in addition to values.
(struct map_keys): replaced by struct foreach_state.
(convert_{int,str}_{key,kp}): deleted functions.
(convert_{key,kp}): deleted functions.
(computed_key_size): deleted function.
(map_sort,map_next): deleted functions.
The testsuite has been observed to intermittently hang on 5.13+
generation kernels. This is caused by some test binaries (especially
recv*.c) suffering a segv and terminating, but their forked child
process pals still hanging around (indefinitely). This patch adds an
alarm(30) to each such test, so the children will burn twice as
bright, but half as long, or something.
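As an illustration of the alarm() safeguard, the sketch below forks a child that would otherwise wait forever but arms an alarm first, so the default SIGALRM action terminates it. The function name run_child_with_deadline is hypothetical and the test programs themselves are not reproduced here.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that blocks indefinitely after arming alarm(secs).
 * Returns the signal that ended the child, 0 if it exited normally,
 * or -1 if fork() or waitpid() failed. */
int run_child_with_deadline(unsigned secs)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGALRM, SIG_DFL);  /* make sure the default action applies */
        alarm(secs);               /* the kernel delivers SIGALRM later */
        for (;;)
            pause();               /* simulate a child stuck on its peer */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The deadline lives in the child itself, so it still fires when the parent has already crashed and nobody is left to clean up.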
releng: ditch custom pie/ssp CFLAGS engine in configure.ac
Just inherit the desired c*flags from autoconf via environment
variables from the distro spec files. This lets us automatically
benefit from centralized hardening flags on some distros. OTOH
distros without that now will need to add such settings to the build
scripts that invoke this configure script.
Linux commit ab3257042c2 makes it necessary for us to stop overriding
CONFIG_STACK_VALIDATION= (originally a workaround for a 2016 rawhide
bug). This fixes the tracepoints.stp test case.
Linux kbuild commit d936eb23874 sets $subject CFLAGS, so to play
catch-up, we also need to use gcc attribute(fallthrough) to label such
spots in switch() statements in our runtime / tapset. Tested on
linux5.14 gcc11 rawhide and linux3.10 gcc4 rhel7.
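A small sketch of labelling intentional fallthrough with the gcc attribute while staying buildable on compilers that lack it. The FALLTHROUGH macro name and the flags_for_level function are hypothetical; the runtime and tapset may spell this differently.

```c
#include <assert.h>

/* Use the attribute where the compiler knows it; otherwise expand to a
 * harmless empty statement. The nested #if keeps preprocessors without
 * __has_attribute from choking on the call syntax. */
#if defined(__has_attribute)
# if __has_attribute(fallthrough)
#  define FALLTHROUGH __attribute__((fallthrough))
# endif
#endif
#ifndef FALLTHROUGH
# define FALLTHROUGH do { } while (0)
#endif

/* Each level enables its own flag plus every lower level's flag. */
int flags_for_level(int level)
{
    int flags = 0;
    switch (level) {
    case 2:
        flags |= 4;
        FALLTHROUGH;
    case 1:
        flags |= 2;
        FALLTHROUGH;
    case 0:
        flags |= 1;
        break;
    default:
        break;
    }
    return flags;
}
```

With an implicit-fallthrough warning enabled, each unlabelled drop from one case into the next would be reported; the attribute documents that the drop is deliberate.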
William Cohen [Tue, 20 Jul 2021 15:32:27 +0000 (11:32 -0400)]
PR27984: Adjust the address so dwfl_module_addrinfo finds correct function name
PR27984 discovered that the logic to determine when a location was
part of a partially inlined function was not operating correctly for
shared libraries. The existing systemtap.base/partial-inline.exp
verified that the test worked for executables, but shared libraries
include a non-zero bias that needs to be added in. Added code to get
the required bias and add it to the address so the correct name is
returned by dwfl_module_addrinfo.
PR27934: give fuller diagnosis for pass-5 probe-registration errors
While we cannot solve or prevent runtime probe registration errors, we
can help users understand them. Add a new warning::pass5 man page,
and point registration error messages at it.
PR27820 tapset/bpf/logging.stp: implement abort() tapset function
A more obsessively accurate implementation would add a check
equivalent to if (c->aborted) to the start of each probe,
but it would be an imperfect solution in any case.
PR27820 tapset/bpf/logging.stp: move bpf versions of functions
Follows the same scheme as what I instituted earlier for the
uconversions.stp tapsets: toplevel has functions common to
all 3 backends or common to lkm+dyninst (guarded by a
runtime != "bpf" conditional). BPF implementations are
moved to tapset/bpf.
* tapset/bpf/logging.stp: New file.
* logging.stp: Move bpf versions of functions.
Sultan Alsawaf [Mon, 12 Jul 2021 20:31:36 +0000 (15:31 -0500)]
task_finder_vma: add autoconf check for hlist_add_tail_rcu()
The 3.10 version check for hlist_add_tail_rcu() only works for RHEL
kernels. Kernels older than 4.7 that lack the hlist_add_tail_rcu() backport
won't compile (such as Debian kernels). Add an autoconf stub to know for
certain if hlist_add_tail_rcu() is present.
Sultan Alsawaf [Mon, 12 Jul 2021 20:01:57 +0000 (16:01 -0400)]
Don't fail vma tracking mmap callback if module is already known.
An -EEXIST returned by stap_add_vma_map_info() just indicates that the
module is currently in stap's vma cache; it isn't a real issue. Calling
_stp_error() when this occurs causes stap to exit when there isn't a
real bug. Ignore the -EEXIST error to avoid breakage.
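The change in error handling can be pictured with a toy cache. Everything here is a hypothetical stand-in (struct vma_cache, cache_add, on_mmap); only the idea of mapping -EEXIST to success in the callback comes from the message above.

```c
#include <assert.h>
#include <errno.h>

#define MAX_MODS 4

struct vma_cache {
    unsigned long base[MAX_MODS];
    int count;
};

/* Add a module base address; report -EEXIST if it is already cached. */
int cache_add(struct vma_cache *c, unsigned long base)
{
    for (int i = 0; i < c->count; i++)
        if (c->base[i] == base)
            return -EEXIST;
    if (c->count == MAX_MODS)
        return -ENOMEM;
    c->base[c->count++] = base;
    return 0;
}

/* mmap callback: an already-known module is not an error. */
int on_mmap(struct vma_cache *c, unsigned long base)
{
    int rc = cache_add(c, base);
    if (rc == -EEXIST)
        rc = 0;        /* already tracked, nothing to do */
    return rc;         /* other failures still propagate */
}
```

Only -EEXIST is swallowed; a genuine failure such as running out of space is still returned to the caller.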
William Cohen [Tue, 6 Jul 2021 02:56:09 +0000 (22:56 -0400)]
Update list of reasons for latencytap.stp example
Backtraces change over time. Added additional function names to
monitor in the backtrace and map to reasons. This should reduce the
number of lines with blank reasons when using newer Linux kernels.
rhbz1972805: add basic syscall-in-ptregs support for s390x
Akin to commit 7be7af0fda36 for ARM, add basic syscalls via
tracepoints / CONTEXT->sregs support for s390x. The argno=6 case is
funny because for syscalls they travel in registers, whereas normally
they hop onto the stack.
Frank Ch. Eigler [Thu, 24 Jun 2021 17:30:38 +0000 (13:30 -0400)]
rhbz1972828: tapsets: iommu tracepoints
Disable detection of intel-iommu tracepoint family on non-x86
platforms, because the 5.13ish kernel headers for this tracepoint
include references to functions like clcache_flush_range which don't
exist on all non-x86.
Stan Cox [Thu, 3 Jun 2021 21:19:49 +0000 (17:19 -0400)]
Get the enumerator's enumeration type
Enumeration values are ultimately treated as constants, but the path
literal_stmt_for_local -> find_variable_and_frame_base (-> dwarf_get_enum)
-> translate_final_fetch_or_store often assumes there is a
type DIE. Have dwarf_get_enum also get the type from the enumeration
type and percolate it along.
Sevan Janiyan [Wed, 2 Jun 2021 18:55:53 +0000 (14:55 -0400)]
testsuite/systemtap.base/perf.sh drop bashism
You don't need to use == to check for equality in a test statement; a
single equals sign is sufficient. The use of a double equals sign is a
bashism which doesn't always translate as intended on other shells.
[fche checked other scripts in the source tree that used [ == ]; they
are all marked /bin/bash so can stay as is.]
Frank Ch. Eigler [Sun, 23 May 2021 18:12:32 +0000 (14:12 -0400)]
stap-prep: switch to using main vmlinuz file as debuginfod test download
Using only the vdso* files as the debuginfod test gives us false
positives on platforms where the installed vmlinuz files cannot be used
as a basis for debuginfod queries, because they're neither ELF nor
compressed ELF. This new stap-prep tries to download the vmlinuz
debuginfo itself. It's large, but at least once it's here, it's here!
And if it fails (as it will on those few platforms), the user is
advised to do a full platform package-manager debuginfo download.
Timm Bäder [Wed, 19 May 2021 20:38:30 +0000 (16:38 -0400)]
Fix -Woverloaded-virtual warnings when building with clang
Satisfy clang by removing option for non-nested signature printing from
implementations of printsig and declare derived_probe::printsig with 'override'.
Add function derived_probe::printsig_nonest to perform non-nested signature
printing.
Timm Bäder [Wed, 19 May 2021 20:28:29 +0000 (16:28 -0400)]
Add missing copy constructors to set1_ref and set1_const_ref
Clang complains about the missing copy constructors if a user-defined
copy assignment operator exists, e.g.:
./bpf-bitset.h:108:19: error: definition of implicit copy constructor for 'set1_const_ref' is deprecated because it has a user-declared copy assignment operator [-Werror,-Wdeprecated-copy]
set1_const_ref& operator= (const set1_const_ref &); // not present
^
./bpf-bitset.h:256:12: note: in implicit copy constructor for 'bpf::bitset::set1_const_ref' first required here
return set1_const_ref(data + w2 * i, w2);
Timm Bäder [Wed, 19 May 2021 20:23:06 +0000 (16:23 -0400)]
util.cxx: Use abs() instead of labs()
Taking the absolute value of unsigned values is pointless, as reported
by clang:
util.cxx:1545:28: error: taking the absolute value of unsigned type 'unsigned long' has no effect [-Werror,-Wabsolute-value]
unsigned min_score = labs(target.size() - it->size());
^
util.cxx:1545:28: note: remove the call to 'labs' since unsigned values cannot be negative
unsigned min_score = labs(target.size() - it->size());
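The subtraction in the diagnostic is carried out in unsigned arithmetic, so a smaller size minus a larger one wraps around rather than going negative. One way to sidestep the question entirely, which is not what this commit does, is to order the operands before subtracting. The helper name abs_diff is hypothetical and the sketch is in C rather than the C++ of util.cxx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Absolute difference of two unsigned sizes without ever computing a
 * value that would be negative. */
size_t abs_diff(size_t a, size_t b)
{
    return a > b ? a - b : b - a;
}
```

For instance, abs_diff(3, 10) and abs_diff(10, 3) both yield 7, whereas the raw expression (size_t)3 - (size_t)10 wraps to SIZE_MAX - 6.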
Frank Ch. Eigler [Wed, 19 May 2021 01:03:17 +0000 (21:03 -0400)]
systemtap.spec: python3 tweaks
Embrace build configurations where python3 is not installed by
default, so needs an explicit BuildRequires; also ones where
python3-probes are not built, ergo stap-exporter isn't packaged.
(The latter is automake-conditionalized on the wrong parameter,
HAVE_PYTHON3_PROBES rather than HAVE_PYTHON3, but this doesn't
matter on normal distro builds.)