Sultan Alsawaf [Thu, 28 Apr 2022 01:59:53 +0000 (18:59 -0700)]
buildrun.cxx: skip objtool processing for tracequery and typequery modules
The tracequery and typequery modules are never loaded, so objtool's
instruction rewrites for things like jump targets aren't needed. Since
objtool is slow and uses a lot of memory, skip it when compiling the
tracequery and typequery modules.
William Cohen [Wed, 27 Apr 2022 18:14:17 +0000 (14:14 -0400)]
PR29094: Include rpm/rpmcrypto.h when required
rpm-4.18.0 moved the prototypes for rpmFreeCrypto() into a new header,
/usr/include/rpm/rpmcrypto.h. Have the configure check for it
and include it when required.
Sultan Alsawaf [Wed, 27 Apr 2022 01:24:10 +0000 (18:24 -0700)]
runtime: fix tracepoint entry leak on error when add_probe() fails
When add_probe() in stp_tracepoint_probe_register() fails on a tracepoint
entry that's just been created, the refcount of the freshly-made tracepoint
entry will be zero by the time stp_tracepoint_exit() runs, at which point
stp_kernel_tracepoint_remove() will skip freeing the tracepoint because its
refcount won't be one. Furthermore, since stp_tracepoint_probe_unregister()
isn't called for a stp_tracepoint_probe_register() that fails, tracepoints
which are registered for internal stap use (like the utrace ones) cannot
be cleaned up on error by stp_tracepoint_exit(), so removing the refcount
check in stp_kernel_tracepoint_remove() won't always fix this.
As such, fix the leak by removing the tracepoint entry immediately on error
when it has a refcount of zero.
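The fix described above can be pictured with a small model. This is only a sketch in Python, not the runtime's C code; the TracepointTable class, its refcount dictionary and the failing_add_probe helper are hypothetical names made up for illustration.

```python
class TracepointTable:
    """Toy model: maps a tracepoint name to its refcount."""

    def __init__(self):
        self.refcount = {}

    def register(self, name, add_probe):
        # Create the entry on first use, starting at refcount zero.
        self.refcount.setdefault(name, 0)
        try:
            add_probe(name)
        except RuntimeError:
            # Drop an entry that nobody holds, so it cannot leak.
            if self.refcount[name] == 0:
                del self.refcount[name]
            raise
        self.refcount[name] += 1


def failing_add_probe(name):
    # Stand-in for an add_probe() call that fails.
    raise RuntimeError("add_probe failed for " + name)


table = TracepointTable()
table.register("sched_switch", lambda name: None)
try:
    table.register("sys_enter", failing_add_probe)
except RuntimeError:
    pass
print(table.refcount)
```

Only the entry whose registration succeeded remains in the table; the freshly created entry for the failed registration is removed right away rather than waiting for a later exit path.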
William Cohen [Tue, 26 Apr 2022 15:56:45 +0000 (11:56 -0400)]
PR29028: Support Linux kernels with CONFIG_RETHOOK set
The Linux 5.18.0 kernels added function exit_handler to fprobe
(https://lkml.org/lkml/2022/1/28/616). kretprobe makes use of that
infrastructure if it is available. However, this use of fprobe
infrastructure changes the member field location depending on
CONFIG_RETHOOK. Access to ret_addr field needs to be done through a
William Cohen [Tue, 26 Apr 2022 14:11:19 +0000 (10:11 -0400)]
Adjust ioblock.stp tapset includes for Linux 5.18.0
Linux kernel commit 322cbb50de711814c42fb088f6d31901502c711a moved the
contents of genhd.h into blkdev.h and eliminated genhd.h. Use genhd.h
for pre-5.18.0 kernels and blkdev.h for 5.18.0 and later.
William Cohen [Mon, 25 Apr 2022 19:02:15 +0000 (15:02 -0400)]
Avoid gcc-12 -Werror=format= issues in staprun/monitor.c
The %*s format in the wprintw takes a pair of arguments, an int and a
pointer to a string. The width array supplying the first argument
was declared as size_t. On rawhide gcc-12 would flag those with
errors like the following:
monitor.c:450:27: error: field width specifier ‘*’ expects argument of type ‘int’, but argument 3 has type ‘size_t’ {aka ‘long unsigned int’} [-Werror=format=]
450 | wprintw(status, "\n%*s\t%*s\t%*s\t%*s\t%*s\t%*s\t%s\n",
| ~^~
| |
| int
451 | width[p_index], HIGHLIGHT("index", p_index, comp_fn_index),
| ~~~~~~~~~~~~~~
| |
| size_t {aka long unsigned int}
The %*s makes use of the integer sign to indicate whether to left
justify or right justify the output, so the cautious compiler flags
passing in the long unsigned int. To follow the %*s conventions, the
width array was made an int array, which eliminates these errors.
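The sign convention for the %*s width can be seen in a short sketch. Python is used here purely for illustration, because its printf-style % operator applies the same rule for a `*` width as C does; this is not the code in monitor.c, and the pad() helper is a made-up name.

```python
# Illustration only: with "%*s" the field width comes from the argument
# list, and a negative width means "left-justify". That is why the
# width argument has to be a signed int rather than a size_t.
def pad(width, text):
    return "%*s" % (width, text)


right = pad(6, "idx")    # positive width: right-justified
left = pad(-6, "idx")    # negative width: left-justified
print(repr(right), repr(left))
```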
High-message-rate stap scripts more easily lose message synch or bog
down if the subbuf size is large. PAGE_SIZE appears to be a sweet
spot, so let's fix that. (At least one subbuf is used per probe hit
that produces output. Allocation occurs at the subbuf granularity, so
making it smaller is apparently of no advantage.) stap -s and
free-memory still affect transport memory allocation, but only as to
the number of subbufs.
Sultan Alsawaf [Fri, 22 Apr 2022 23:06:45 +0000 (16:06 -0700)]
runtime: fix timing stat leaks when module init fails partway through
When systemtap_module_init() fails partway through, cleanup isn't done for
stp_session_init(), which allocates memory for probe and refresh timing
stat collection. Fix it by adding the appropriate cleanup on error to
systemtap_module_init().
Sultan Alsawaf [Thu, 21 Apr 2022 20:58:58 +0000 (13:58 -0700)]
runtime: use RCU-protected get_mm_exe_file() on old kernels that have it
Some old kernels (such as the one in CentOS 7) have the RCU-protected
get_mm_exe_file() patch backported to them, in which case it's preferable
to make use of the RCU optimization to avoid sporadic failures from the
down_read_trylock() due to mmap_sem contention. Since the commit that adds
the RCU protection to get_mm_exe_file() also adds a get_file_rcu() macro,
we can just check for the existence of get_file_rcu() on kernels < 4.1. If
the macro doesn't exist for some reason despite the old kernel having the
RCU optimization, we just fall back to using down_read_trylock() the same
as before. If the old kernel has get_file_rcu() despite lacking the RCU
protection that goes along with it, then said kernel has bigger problems.
Sultan Alsawaf [Thu, 21 Apr 2022 00:11:37 +0000 (17:11 -0700)]
staprun: interpret a non-zero systemtap_module_init() return as an error
Errors returned from systemtap_module_init() can often be positive, and
tracking down all sources of the positive return values is error-prone.
Instead, simply interpret any non-zero return from systemtap_module_init()
as an error so that staprun doesn't poll forever on waiting for a dead
stap module to do something.
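A minimal sketch of the difference, assuming the earlier check only looked for negative values (the message does not show the old code, so init_failed_old is a guess used for contrast; both function names are hypothetical):

```python
def init_failed_old(rc):
    # Assumed earlier behaviour: only negative errno-style values count.
    return rc < 0


def init_failed(rc):
    # Behaviour described above: any non-zero return is an error.
    return rc != 0


for rc in (0, -12, 12):
    print(rc, init_failed_old(rc), init_failed(rc))
```

A positive error code such as 12 slips past the assumed older check but is caught by the non-zero test.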
Sultan Alsawaf [Wed, 20 Apr 2022 23:49:40 +0000 (16:49 -0700)]
runtime: clean up when starting the task finder fails partway through
When the task finder fails to start, systemtap_module_exit() won't be
called to handle the cleanup because systemtap_module_init() will have
returned an error. This becomes lethal when the task finder errors out
*after* initializing utrace, since that means utrace won't be stopped and
thus the utrace tracepoint callbacks will remain registered after the stap
module is unloaded, causing the kernel to explode spectacularly upon
executing code in memory that's been freed.
To fix this, make stap_start_task_finder() handle partial cleanup itself
when there's an error, since systemtap_module_exit() won't be the one to do
it. This also reorders the task finder starting process to make the hardest
item to clean up (utrace init) come last, and removes a bogus decrement on
the task finder state variable on error since we now know the hard way that
stap_stop_task_finder() won't actually be called to do cleanup when there's
a failure partway through stap_start_task_finder().
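The general pattern, a start routine that undoes its own completed steps when a later step fails, can be sketched as follows. This is a Python model and not the runtime code; start_all, make_step and the step names "a", "b" and "c" are hypothetical.

```python
def start_all(steps):
    """Run (name, start, stop) steps in order; undo finished ones on failure."""
    done = []
    try:
        for name, start, stop in steps:
            start()
            done.append(stop)
    except Exception:
        # Nobody else will clean up after a failed start, so unwind here,
        # newest step first.
        for stop in reversed(done):
            stop()
        raise


log = []


def make_step(name, fail=False):
    def start():
        if fail:
            raise RuntimeError(name + " failed")
        log.append("start " + name)

    def stop():
        log.append("stop " + name)

    return name, start, stop


steps = [make_step("a"), make_step("b"), make_step("c", fail=True)]
try:
    start_all(steps)
except RuntimeError as err:
    log.append("error: " + str(err))
print(log)
```

Placing the step that is hardest to undo last, as the commit does with utrace init, means a failure in any earlier step never has to unwind it.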
Sultan Alsawaf [Tue, 12 Apr 2022 21:00:47 +0000 (14:00 -0700)]
runtime: fix race between different stap modules creating /proc/systemtap
Since stap modules operate independently of one another, there's a race
between the first stap modules loaded on a system where they try to create
/proc/systemtap and all but one fail, leading to the losing stap modules
either failing to load on 3.19+ kernels or loading successfully on <3.19
kernels but leaking an inode and directory refcount, with both cases
additionally producing a WARN.
To fix this, we abuse `module_mutex` in the kernel to synchronize between
all stap modules, which resolves the race completely. However, on 5.12+
kernels, `module_mutex` is no longer an exported symbol and therefore we
cannot find its address and use it unless the host kernel is built with
CONFIG_KALLSYMS_ALL=y and the address of kallsyms_lookup_name() is resolved
in a way that doesn't require the transport to be active (since, right now,
staprun sends the address of kallsyms_lookup_name() via the transport).
This lack of coverage on 5.12+ turns out to be alright though because the
only real issues we're concerned about fixing are the leaks on <3.19
kernels and the module load failure on 3.19+ kernels. Since the lack of
synchronization on 5.12+ kernels will only lead to a cosmetic WARN at
worst, we simply ignore any error from proc_mkdir() when making
/proc/systemtap and thus the module load failure is avoided. Nonetheless,
we still optimistically avoid the cosmetic WARN on kernels >3.19 and <5.12
by using `module_mutex` if it's exported.
Since we don't own `module_mutex`, we elide it after the race window passes
in order to limit the scope of our abuse. Once the race window passes, the
overhead in _stp_mkdir_proc_module() goes back to exactly how it was prior
to this change; i.e., the average case will still be just the single check
for the existence of /proc/systemtap and nothing more.
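The "ignore the error from the losing creator" part of this fix can be modelled in user space. The sketch below uses an ordinary temporary directory rather than /proc, and it narrows the ignored error to "already exists", whereas the commit ignores any proc_mkdir() error; ensure_dir is a hypothetical helper.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path; losing the creation race is not treated as an error."""
    try:
        os.mkdir(path)
        return True       # this caller created the directory
    except FileExistsError:
        return False      # someone else got there first; carry on


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "systemtap")
    first = ensure_dir(target)
    second = ensure_dir(target)
print(first, second)
```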
Stan Cox [Tue, 12 Apr 2022 15:21:01 +0000 (11:21 -0400)]
Have the stap server mok sign modules using stap --sign-module=PATH
Add --sign-module=PATH for use by stap server to pass a specific client
fingerprint to stap for mok signing a module. Add mok path to mok_sign_file,
sign_module, and mok_dir_valid_p. Use mok path to differentiate --sign-module
vs --sign-module=PATH. Without PATH, fingerprints that are considered are those
present in $SYSTEMTAP_DIR/.systemtap/ssl/server/moks that are also listed by
'mokutil -l'
New tool to profile a process or userspace generally, then produce a
hit-counted annotated version of all the relevant sources.
Downloading all the debuginfo & source files requires a working
debuginfod-find with a set $DEBUGINFOD_URLS.
Includes tests and man page.
Signed-off-by: Noah Sanci <nsanci@redhat.com>
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
William Cohen [Wed, 6 Apr 2022 19:12:55 +0000 (15:12 -0400)]
Adjust threadstacks.stp to work with newer versions of glibc
Newer versions of glibc have moved the allocate_stack function from
libpthread.so.* to libc.so.*. Similarly, the default stack size has
been moved to a different target variable. The threadstacks.stp
script needed to be adjusted to use the new probe point and target
variable.
Martin Cermak [Tue, 5 Apr 2022 19:48:20 +0000 (21:48 +0200)]
The faccessat2 and adjtimex syscall updates
- compat_unistd.h: Add missing defines for faccessat2.
- compile_flags.exp: Omit -m64 on aarch64, where GCC doesn't recognize
such a cmdline switch (using it causes a compile
time error).
- adjtimex.c: Testcase update for modern glibc and kernel.
- systemtap.syscall/tapset/syscall.stp: clock_adjtime64 user persp alias.
William Cohen [Mon, 4 Apr 2022 23:23:30 +0000 (19:23 -0400)]
Add riscv specific ptrace support functions
The riscv linux kernel does not add any other ptrace functionality in
addition to the kernel's base ptrace_request function. Thus, the
_arch_ptrace_argstr and _ptrace_return_arch_prctl_addr functions do
very little. They are defined to allow systemtap scripts
instrumenting ptrace syscalls to compile on riscv.
Stan Cox [Tue, 29 Mar 2022 01:08:34 +0000 (21:08 -0400)]
Add --sign-module to enable users to mok sign their own modules
Add sign-module option. Move MOK_CONFIG_TEXT, mok_dir_valid_p, mok_sign_file,
generate_mok from stap-serverd.cxx to cscommon.cxx. Add sign_module function
to cscommon.cxx. Move MOK_PRIVATE_CERT_NAME, MOK_PRIVATE_CERT_FILE,
MOK_CONFIG_FILE to cscommon.h. Add report_error parameter to generate_mok,
sign_module, mok_dir_valid_p so they can be called from server or client. If
sign-module is requested then call sign_module from passes_0_4. stap-server
continues to mok sign using the same code path.
Sultan Alsawaf [Tue, 22 Mar 2022 23:00:31 +0000 (16:00 -0700)]
PR28974: initialize the VMA tracker before all probes
Now that task finder targets are added to __stp_task_finder_list in the
correct order, a new problem is present since the vma tracker isn't always
the first task finder target that gets registered. In these cases, the
aforementioned ordering fix actually breaks what was once working fine. One
such example is a stap script which contains only a probe.begin and uses
@var on a PIE binary; in this case, the VMA tracker's mmap callback won't
be conveniently chained onto a probe that runs earlier (since there isn't
one), and will instead run after the probe.begin's callback.
To fix this, simply initialize the VMA tracker before all probes by
decoupling it from the task finder and putting it into its own derived
probe group, which is then placed before all other derived probe groups in
all_session_groups().
William Cohen [Tue, 22 Mar 2022 17:39:45 +0000 (13:39 -0400)]
PR28958: Fix tapset macros to allow nfsd-trace.stp and task_paths.stp to work
The task_dentry_path function in the tapset was getting read faults
instead of returning valid strings describing a filesystem path. This
caused the nfsd-trace.stp example and task_paths.stp in the testsuite
to not function properly.
The root cause of the problem was some cast operators in macros were
not accessing the proper struct debuginfo due to a missing "kernel" in
the cast operator. The cast operators were corrected and scripts
using the task_dentry_path function now function properly.
William Cohen [Thu, 17 Mar 2022 20:01:36 +0000 (16:01 -0400)]
Fix deviceseeks.stp example to explicitly cast queue variable
The deviceseeks.stp example was failing to build because several
uses of the queue variable were not explicitly cast, resulting
in the following error message:
semantic error: autocast variable '' may not be used as a structure: operator '->' at testsuite/systemtap.examples/io/deviceseeks.stp:26:8
source: queue->limits->logical_block_size :
^
Used @q_cast(queue) in place of the plain queue to correctly
cast the variable and eliminate the error.
William Cohen [Thu, 17 Mar 2022 14:58:15 +0000 (10:58 -0400)]
Remove unneeded include of <linux/nfsd/nfsfh.h> from nfsd.stp tapset
The nfsfh.h header file was removed from the kernel in 2014. The
nfsd.stp tapset attempting to include the header will cause the stap
module to fail to build. The nfsd.stp tapset does not require
anything from the nfsfh.h header file, so the include can be safely
removed. This will eliminate the following systemtap example
failures on newer kernels:
Sultan Alsawaf [Thu, 17 Mar 2022 02:13:18 +0000 (19:13 -0700)]
PR28974: runtime: add task finder targets to __stp_task_finder_list in order
There's an issue with probes which require the VMA tracker where a
probe.begin can run before the VMA tracker does, leading to unexpected
results. With a reproducer using @var to print the value of a variable in a
PIE binary, this leads to the @var failing with a NULL pointer dereference,
since the VMA tracker's mmap callback runs right after the probe.begin
rather than prior.
The VMA tracker's mmap callback is chained onto the "inode-uprobes"
stapiu_consumer, and the probe.begin's callback is chained onto the
"lifecycle tracking" stap_utrace_probe. While the order in which these are
initialized is correct (stapiu_consumer init comes before stap_utrace_probe
init), the corresponding quiescent state workers are executed by utrace in
reverse order. This happens because stapiu_consumers and stap_utrace_probes
are attached to utrace in reverse order, which is done by the task finder.
Although the task finder iterates forward through __stp_task_finder_list to
create utrace attachments, all of its targets are added to
__stp_task_finder_list in reverse order via list_add() in the first place.
To fix this, simply use list_add_tail() instead of list_add() when adding
task finder targets to __stp_task_finder_list, so that they are processed
in the order with which they are initialized.
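The ordering effect is easy to reproduce with a toy list. The sketch below is a Python model in which a deque stands in for the kernel's linked list: appendleft() plays the role of list_add() (insert at head) and append() plays the role of list_add_tail(). The attach_order function is a hypothetical name.

```python
from collections import deque


def attach_order(targets, add_at_head):
    """Return the forward-iteration order after adding targets one by one."""
    lst = deque()
    for target in targets:
        if add_at_head:
            lst.appendleft(target)   # like list_add(): insert at the head
        else:
            lst.append(target)       # like list_add_tail(): insert at the tail
    return list(lst)


init_order = ["inode-uprobes", "lifecycle tracking"]
before_fix = attach_order(init_order, add_at_head=True)
after_fix = attach_order(init_order, add_at_head=False)
print(before_fix)
print(after_fix)
```

With head insertion, a forward walk visits the targets in the reverse of their initialization order; with tail insertion the walk matches it.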
Frank Ch. Eigler [Fri, 11 Mar 2022 15:29:55 +0000 (10:29 -0500)]
sys/sdt.h: set x86-64 STAP_SDT_ASM_CONSTRAINT back to "nor"
It turns out the kernel and some other sdt consumers haven't learned
how to use %xmm registers in sdt operands. So under this duress, stap
will go back to the old school integer register set "nor" as a
default. We'll revisit this in the future, though this egg might not
turn into a chicken.
PR28923: dtrace.in: add atexit removal & timeout to .dtrace-temp file
On erroneous inputs or other error cases, it was possible to leave
behind a .dtrace-temp*.c file at exit. That would indefinitely block
a subsequent dtrace job, due to excessive optimism in commit cfabd38cfdd75e. Now we time out, and we try harder to remove the
temp file.
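One way to combine a bounded wait with exit-time cleanup is sketched below. It is an assumption about the general shape of such a fix, not the code in dtrace.in; wait_for_absence, remove_quietly, the 0.2 second timeout and the file name are all hypothetical.

```python
import atexit
import os
import tempfile
import time


def wait_for_absence(path, timeout=1.0, poll=0.05):
    """Wait for path to disappear; give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while os.path.exists(path):
        if time.monotonic() >= deadline:
            return False          # stale file: stop waiting and move on
        time.sleep(poll)
    return True


def remove_quietly(path):
    """Remove path, tolerating the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


with tempfile.TemporaryDirectory() as root:
    stale = os.path.join(root, ".dtrace-temp.example.c")
    open(stale, "w").close()
    atexit.register(remove_quietly, stale)   # cleanup even on error exits
    timed_out = not wait_for_absence(stale, timeout=0.2)
    remove_quietly(stale)
    gone = wait_for_absence(stale, timeout=0.2)
print(timed_out, gone)
```

The bounded wait keeps a leftover file from blocking a later job forever, and the tolerant removal makes repeated cleanup attempts harmless.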
William Cohen [Tue, 1 Mar 2022 21:03:54 +0000 (16:03 -0500)]
Avoid triggering error with -Werror=unused-value
Fedora RPMs are compiled using -Wall which includes
-Werror=unused-value. On architectures that do not have Dyninst
support, the flush_analysis_caches define in analysis.h would trigger
the following error:
In file included from elaborate.cxx:20:
elaborate.cxx: In function ‘void build_no_more(systemtap_session&)’:
analysis.h:30:34: error: statement has no effect [-Werror=unused-value]
30 | #define flush_analysis_caches() (0)
| ~^~
elaborate.cxx:1871:3: note: in expansion of macro ‘flush_analysis_caches’
1871 | flush_analysis_caches();
| ^~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
Tweaked the flush_analysis_caches define to avoid creating an unused
value.
William Cohen [Fri, 4 Feb 2022 19:18:46 +0000 (14:18 -0500)]
Clear out the Dyninst-related data structures after analysis finishes
The liveness analysis for SystemTap uses Dyninst to examine the
binaries. For large binaries such as the Linux kernel this can
consume quite a bit of memory. Once the analysis is done, the code
needs to clean up as much of that as possible.
dann frazier [Tue, 1 Mar 2022 16:02:27 +0000 (11:02 -0500)]
PR28923: dtrace: Use hash-based scheme for predictable file generation
commit c245153 ("dtrace: Allow for reproducible .o file builds.")
introduced a condition where 2 dtrace processes can race when
generating the same file. Since both processes now use the same
temporary file name, one may delete the temporary .c file the other
is still processing:
--------------------------------------------------------------------
user@host:~/foo$ make -j2
dtrace -o foo.out -G -s /dev/null
dtrace -o foo.out -G -s /dev/null
Traceback (most recent call last):
File "/usr/bin/dtrace", line 455, in <module>
sys.exit(main())
File "/usr/bin/dtrace", line 440, in main
os.remove(fname)
FileNotFoundError: [Errno 2] No such file or directory: 'foo.out.dtrace-temp.c'
make: *** [Makefile:4: ../foo/foo.out] Error 1
--------------------------------------------------------------------
This can happen when a Makefile processes a pattern rule for two different
targets that happen to map to the same file, but addressed by different
relative paths. I discovered this in a real world case involving libvirt,
but here's a contrived reproducer:
It would be ideal if we could inject a null .file directive, then we could
just use a mkstemp() file and keep the build reproducible by avoiding a
record of the source file path in the binary at all, but I can't find a
straightforward way of passing a .file through to the assembler. So,
instead, let's create a reproducible filename by building a hash of the
input and output paths. Note: this still leaves open a race in the case of
2 dtrace processes with identical input/output paths. But, at least in my
testing, GNU Make is smart enough to detect this case and not create
duplicate jobs.
Fixes: Commit c245153 ("dtrace: Allow for reproducible .o file builds.")
Signed-off-by: dann frazier <dann.frazier@canonical.com>
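A minimal sketch of a hash-derived temporary name follows. The choice of SHA-256, the NUL separator, the 16-character truncation and the exact file-name pattern are assumptions for illustration; the commit message only says that a hash of the input and output paths is used.

```python
import hashlib


def temp_name(input_path, output_path):
    """Derive a stable temp-file name from the input and output paths."""
    key = (input_path + "\0" + output_path).encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:16]
    return ".dtrace-temp.%s.c" % digest


# Same output file reached through two different relative paths.
name_a = temp_name("/dev/null", "foo.out")
name_b = temp_name("/dev/null", "../foo/foo.out")
name_a_again = temp_name("/dev/null", "foo.out")
print(name_a)
print(name_b)
```

Identical path pairs always yield the same name, which keeps builds reproducible, while the two jobs from the Makefile case get distinct names and no longer delete each other's file.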
Two variables in bpf-translate.cxx can trigger -Werror=maybe-uninitialized.
The code is designed so that uninitialized uses are not actually possible,
but to convince gcc of this we move a throw statement and initialize one
of the variables with a value.
Frank Ch. Eigler [Fri, 25 Feb 2022 01:05:41 +0000 (20:05 -0500)]
gcc12 warning suppression
The translator emits a pair of type declarations that alternate
between a char[] and a char*, depending on the size of strings
involved. The polymorphic client code includes pointer null-checking,
which -Waddress code rejects for the char[] case. The simplest
workaround is just to disable that particular diagnostic.
Stan Cox [Fri, 28 Jan 2022 20:28:27 +0000 (15:28 -0500)]
Attempt to access string in userspace if kernel access fails
Add kernel_or_user_string_quoted(_utf16 _utf32) tapsets to handle
situations where a kernelspace access was assumed but string is in
userspace. Add new kernel_user_var test. Page in the utf strings in the
utf_pretty test.
Frank Ch. Eigler [Wed, 26 Jan 2022 19:10:38 +0000 (14:10 -0500)]
PR28804: tune default stap -s ## buffer size on small RAM machines
Insert a forgotten division by num_online_cpu() to adjust downward the
calculated bufsize. Tweak normal defaults back to 128 * 2 * 64K
(16MB) per CPU, as the stap man page indicates. This may need further
tweaking when balancing against staprun consumption performance, but
at least we have the docs lined up with the code at the moment.
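The arithmetic behind the 16MB figure can be checked directly. In the sketch below only the numbers 128, 2 and 64K come from the message; the per_cpu_share helper and the 256 MiB budget are hypothetical and merely illustrate what dividing a memory budget by the number of online CPUs does.

```python
KIB = 1024
MIB = 1024 * KIB

# Default per-CPU buffer size quoted in the message: 128 * 2 * 64K.
default_per_cpu = 128 * 2 * 64 * KIB


def per_cpu_share(budget_bytes, online_cpus):
    """Hypothetical helper: split a memory budget evenly across CPUs."""
    return budget_bytes // online_cpus


share = per_cpu_share(256 * MIB, 32)
print(default_per_cpu // MIB, share // MIB)
```

On a machine where the per-CPU share of the budget falls below the default, forgetting the division would overstate the usable buffer size by a factor equal to the CPU count.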
Serhei Makarov [Fri, 21 Jan 2022 23:21:46 +0000 (18:21 -0500)]
gcc12 c++ compatibility re-tweak for rhel6: use function pointer instead of lambdas instead of ptr_fun<>
Saving 2 lines in ltrim/rtrim is probably not a good reason to drop
compatibility with the RHEL6 system compiler. Actually declaring a
named function and passing the function pointer is compatible with
everything.
William Cohen [Tue, 18 Jan 2022 03:02:44 +0000 (22:02 -0500)]
Graceful continuation when not enough memory available for liveness analysis
The dyninst parsing of binaries can take a significant amount of
memory. On machines without enough memory to parse a large binary we
want the analysis to fail gracefully with a warning that the liveness
analysis was unable to run and continue on rather than immediately
exiting with a std::bad_alloc exception.
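The intended behaviour, warn and carry on instead of aborting, looks roughly like the following sketch. It is a Python model in which MemoryError stands in for std::bad_alloc; run_liveness and too_big are hypothetical names, and the real translator code is C++.

```python
import warnings


def run_liveness(parse, binary):
    """Return the analysis result, or None with a warning if memory runs out."""
    try:
        return parse(binary)
    except MemoryError:
        warnings.warn("liveness analysis skipped for %s: out of memory" % binary)
        return None


def too_big(_binary):
    raise MemoryError  # stand-in for std::bad_alloc during parsing


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = run_liveness(too_big, "vmlinux")
print(result, len(caught))
```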
William Cohen [Fri, 14 Jan 2022 19:00:02 +0000 (14:00 -0500)]
configure finds appropriate default 32-bit or 64-bit Dyninst libraries
Earlier versions of the systemtap configuration would just include
two -L paths to both 32-bit and 64-bit versions of the Dyninst
libraries. However, attempting to link a 32-bit library with a 64-bit
build (and vice versa) may cause the build to fail. This revision of
the configure tests determines which default Dyninst library works
with the compiler being used and selects it.
The configure can't blindly use ${libdir}/dyninst to select the path to
the default Dyninst library as this only selects the ${prefix}/lib64 on
appropriate machines if the prefix is /usr. If the prefix is set to
something else, ${libdir} is always ${prefix}/lib. This would cause
the build to attempt to link with nonexistent Dyninst libraries in the
${prefix} directory. If systemtap needs to use a version of
Dyninst in a non-standard place, the --with-dyninst=<path_to_dyninst>
option should be used.
Stan Cox [Fri, 7 Jan 2022 17:01:49 +0000 (12:01 -0500)]
Standardize dyninst include file use.
Change stapdyn to use the more standard '#include <dyninst/*.h>' form
and the standard include path. Adjust configure so that
--with-dyninst=PATH works, assuming PATH is a /usr style path laid out
in standard linux form.
It appears that various versions of gcc continue to show signs of
confusion at our newly offered asm-operand alternatives for floating
point sdt.h marker parameters.
Stan Cox [Wed, 1 Dec 2021 21:19:22 +0000 (16:19 -0500)]
Handle user supplied sdt probe argument template
User supplied templates were erroneously removed by commit eaa15b047,
which complicated the template expansion. To do the above the
expansion of STAP_PROBE_ASM(provider, fooprobe,
STAP_PROBE_ASM_TEMPLATE(3)) adds an unused argument:
STAP_PROBE_ASM(provider, fooprobe, /*template expansion*/ "%[SDT..]..",
"use _SDT_ASM_TEMPLATE") A supplied template
STAP_PROBE_ASM(provider, fooprobe, "4@%rdx 8@%rax") is left alone. If
the varargs has 2 args (the fake "use ..") then macro expansion
inserts the expanded string, otherwise "4@.." becomes an ascii op.
Frank Ch. Eigler [Sat, 20 Nov 2021 03:22:45 +0000 (22:22 -0500)]
configury: let python3 be python3
Our baroque heuristics for identifying python2/3 under their various
historical aliases are showing their age. On some modern distros,
/usr/bin/python is to be positively NOT used. Fixing configure.ac
$PYTHON3 search to only look for python3, and not even consider
$PYTHON_UNKNOWN. At some point we'll want to simplify further, and
get rid of python2 remnants.