1. Merge stapbpf.cxx log_level with verbose.
2. Alias stapbpf.cxx module_name to __name__ (module_name is better).
3. Put back double-space in usage messages.
Serhei Makarov [Thu, 25 Feb 2021 16:54:00 +0000 (11:54 -0500)]
stapbpf PR25177/27032: handle process exit (SIGCHLD)
* stapbpf.cxx (target_pid_failed_p): New variable.
(sigchld): New function, handle child process exit and signal bpf program.
(main): Set up and disable SIGCHLD handler and propagate failure exit.
Serhei Makarov [Wed, 24 Feb 2021 19:48:41 +0000 (14:48 -0500)]
stapbpf PR25177/27032: link start_cmd() into stapbpf, launch the process
This patch takes care of launching the process; the next patch must take
care of listening for the process to finish running.
TODO some redundant declarations in stapbpf.cxx should be merged.
TODO in general staprun uses much nicer print functions,
should stapbpf borrow them too?
* buildrun.cxx (make_bpf_run_command): pass s.cmd to the stapbpf command.
* stapbpf/stapbpf.cxx (verbose): TODO redundant decl for start_cmd.c.
(read_stdin): TODO unused decl for start_cmd.c.
(__name__): TODO redundant decl for start_cmd.c.
(target_pid): now shared with C code in start_cmd.c.
(target_cmd): now shared with C code in start_cmd.c.
(eprintf): decl for start_cmd.c.
(start_cmd): decl for start_cmd.c.
(resume_cmd): decl for start_cmd.c.
(load_bpf_file): populate TODO redundant __name__ decl.
(usage): add '-c' option.
(main): add '-c' option, use start_cmd()/resume_cmd() to launch the process.
* stapbpf/bpfinterp.h (target_pid): now shared with C code in start_cmd.c.
* stapbpf/Makefile.{am,in}: add ../staprun/start_cmd.c to sources.
Serhei Makarov [Tue, 23 Feb 2021 21:12:47 +0000 (16:12 -0500)]
PR25177/27032 staprun: factor out start_cmd() for stapbpf usage
Most/all of the start_cmd() code in staprun should apply
to stapbpf '-c' support as well. Factor it out into an object
file that can be linked by both staprun and stapbpf.
* staprun/start_cmd.c: New file to be linked into both staprun and stapbpf.
* staprun/staprun.h: Declare functions in start_cmd.c.
* common.c (closefrom): Moved to start_cmd.c.
(OPEN_MAX): Moved to start_cmd.c.
* mainloop.c (signal_usr1): Moved to start_cmd.c.
(start_cmd): Moved to start_cmd.c.
(stp_main_loop): Move signal code to resume_cmd() in start_cmd.c.
Frank Ch. Eigler [Sat, 20 Feb 2021 00:37:30 +0000 (19:37 -0500)]
usage message: explain [man FOO] notation right there
Maybe hinting right during the naked stap startup will work!
% stap
A script must be specified.
Try '-i' for building a script interactively.
Try '--help' for more information. [man stap]
A message like [man foo] means for more info, run % man foo
Part of the work in commit 88db3a197 (version 4.3)
caused a regression with --dump-functions, wherein
optimizers would eliminate uncalled functions, even
in dump mode. We now suppress this again and extend
the test case to look for a few more functions.
PR27361 part 1: enable typequery result memoization & operation on linux 5.11
Linux 5.11 rejects previous kmod_typequery .c files because they don't
have a MODULE_LICENSE bit. In addition, stap tries and tries and tries
building kmod_typequery files, up to thousands of times (!). We solve the
first by adding the MODULE_LICENSE. We solve the second by memoizing
typequery build results (whether they succeed or fail) in the
systemtap_session object.
The utrace structs suffer from a number of use-after-free scenarios,
which cause panics and other mayhem. They can be used in queued-up task
workers after being freed, fetched from task_utrace_struct() and quickly
freed after while still in use, or fetched from task_utrace_struct()
after having already been freed. And utrace->task can be used after the
task struct it points to is freed.
A number of changes are made to remedy the widespread use-after-free
issues. Firstly, utrace structs now have a reference counter to protect
them from being abruptly freed while in use. This guarantees that a
utrace struct will only be freed after everything using it, including
task workers, is finished.
The task struct assigned to each utrace struct is now pinned as well,
making all utrace->task usage safe while the utrace struct is alive.
When utrace exits, all of the possible utrace workers are now canceled
instead of only some of them. Although this mass cancel isn't strictly
necessary, it speeds up utrace exit so that exiting doesn't have to
wait for all of the in-flight workers to finish running. We
are also guaranteed that utrace->task will be valid if the utrace struct
has a positive reference count since the task struct is pinned.
To accommodate the reference counting, rename get_utrace_bucket() to
find_utrace_bucket() since it doesn't grab any references, and rename
task_utrace_struct() to get_utrace_struct() to indicate that it grabs a
reference on the returned utrace struct.
The utrace_flags member of the utrace struct is reduced from a long to
an int to make room for an atomic reference counter without increasing
the size of the utrace struct. This is fine since there aren't enough
utrace flags to require the use of a long.
Sultan Alsawaf [Fri, 5 Feb 2021 22:07:17 +0000 (14:07 -0800)]
stp_utrace: always assume in_atomic() in utrace_report_death()
As we've established in the past, in_atomic() only works when PREEMPT is
enabled. On non-PREEMPT kernels, in_atomic() will return false even when
used inside a spin-locked critical section. Therefore, in order to make
utrace_report_death() safe on all configurations, always use a task
worker to perform the report work.
Sultan Alsawaf [Fri, 5 Feb 2021 21:22:16 +0000 (13:22 -0800)]
stp_utrace: remove useless RCU read locks
These RCU read locks don't actually protect anything. The task struct
pointer is not actually dereferenced in get_utrace_lock(), and the RCU
read lock in utrace_reset() serves no purpose at all.
Stan Cox [Fri, 5 Feb 2021 21:20:22 +0000 (16:20 -0500)]
Add support for accessing enumerators
Adds dwarf_get_enum, similar to dwarf_getscopevar, except that it
searches for a matching DW_TAG_enumerator. For a scoped variable, the
enumerator DIE and its attributes are used; translate_location will
convert them to a constant. For a struct member, translate_components
first folds the component locations and then uses translate_location.
Serhei Makarov [Wed, 3 Feb 2021 15:36:33 +0000 (10:36 -0500)]
stapbpf PR27030 WIP :: new bpf/uconversions.stp tapset
Tentative version of user_long_error() for bpf, needs more testing.
NEXT, the __bpf_probe_read_user_error() helper can be used to implement
the other user_{char,short,int} tapset functions.
This patch starts a new practice for tapset layout for tapset functions
that have a lkm/dyninst implementation and a separate bpf implementation:
- The lkm/dyninst implementation is placed in the toplevel tapset/
directory, surrounded by %( runtime != "bpf" %? ... %).
- The bpf implementation is placed in the tapset/bpf/ directory.
Once applied to the rest of the tapsets, this practice should work to
eliminate the current proliferation of two-way
%( runtime != "bpf" %? implementation1 %: implementation2 %)
conditionals in the runtime code, and allow bpf versions of numerous
remaining tapset functions to be implemented and cleanly placed
into separate files under bpf/.
buildrun: stop adding our own -Wframe-larger-than=XYZ CFLAGS
linux kbuild has included a CONFIG_FRAME_WARN option for some time,
which adds a -Wframe-larger-than=$(...) CFLAGS. Until this patch,
systemtap used to add one of its own, on the theory that our generated
code really should not have large stack frames. This is partly
assured by use of preallocated per-cpu "struct context" structures, so
our warning was a belt-and-suspenders protection. However, this lower
limit is not sufficient any more.
On rawhide (linux 5.11, gcc 11), KASAN etc. makes the stack-frame of
the unwind_frame() function about 700 bytes, which is easily within
the CONFIG_FRAME_WARN limit but larger than 512. We now just defer to
kbuild's wisdom.
William Cohen [Mon, 1 Feb 2021 03:16:49 +0000 (22:16 -0500)]
Add CONFIG_COMPAT 32-bit support for aarch64 and powerpc
The aarch64 and powerpc kernels may be built with support for 32-bit
user-space applications. On Fedora 33 the aarch64 kernel is
configured with CONFIG_COMPAT enabled, so some support was needed in
_stp_is_compat_task2 for aarch64. The Fedora 33 ppc64le kernels do
not have CONFIG_COMPAT enabled by default, but a similar check is
included for powerpc in case locally built kernels enable it.
Frank Ch. Eigler [Fri, 29 Jan 2021 03:12:48 +0000 (22:12 -0500)]
PR27273: port to linux 5.11
Main change is removal/movement of TIF_IA32 in linux commit ff170cd05953
and nearby. Now using the central wrapper functions _stp_is_compat_task()
and _stp_is_compat_task2(), instead of sprinkling
test_tsk_thread_flag(...) around the code base.
Also, suppressing CONFIG_DEBUG_INFO_BTF_MODULES generation for stap
modules, for diagnostic noise reduction.
Frank Ch. Eigler [Tue, 26 Jan 2021 16:09:53 +0000 (11:09 -0500)]
PR27251: support buildid probing on --x--x--x perm binaries
We need to delegate executable searches to debuginfod for binaries
that exist on the local system but are unreadable. Correct the
util.cxx access(3) checks from X_OK -> R_OK to avoid relying on
unreadable links/binaries, which would leave stap unable to probe
them at all. This can come up when probing some setuid programs.
Serhei Makarov [Mon, 25 Jan 2021 18:29:58 +0000 (13:29 -0500)]
bpf-translate.cxx WIP bugfix for PR27030: delay adjusted_toks deallocation
Identified more diagnostic data (adjusted BPF assembler tokens)
that the BPF translator was deallocating prematurely on exception throw.
For now, fix the problem by keeping deallocation disabled
entirely. The only other solution I can think of is to catch and
re-throw any semantic error, copying the adjusted token at that point
to keep it from being deallocated.
Frank Ch. Eigler [Sun, 24 Jan 2021 19:45:54 +0000 (14:45 -0500)]
PR27067: set procfs traceNN files' uid/gid too
commit e3d03db828 neglected to include the proper calls to set the
procfs traceNN files to the correct uid/gid ownership. With those
files left as uid/gid=0/0, stapio running with a user with
stapusr/stapdev privileges couldn't fopenat() those files. Now they
can again. This problem became obvious after commit 4706ab3ca5c0,
which makes STAP_TRANS_PROCFS the default.
bugfix: unwinder: expr: DW_OP_push*: we forgot to push the result to the dwarf stack.
Most of the assembly routines in OpenSSL's libcrypto would produce unwinder failures
and the warning "WARNING: DWARF expression stack underflow in CFI" due to this bug.
Sultan Alsawaf [Fri, 22 Jan 2021 23:19:33 +0000 (15:19 -0800)]
runtime: utilize relay subbufs as much as possible
The relay subbufs used for printing are used very inefficiently, causing
print messages to be frequently dropped. The cause for this inefficiency
is that every time a print flush occurs, the current subbuf is switched
out even if it isn't filled. We can't wait for a subbuf to fill up
before switching it out either, or messages will be delayed.
To remedy this, we instead check to see if there's any data in any
subbuf and use that as an indicator to staprun to tell if there's data
available to read. Then when staprun attempts to read the data out, we
can switch out the current subbuf if it has data in it. This lets us
squeeze out every bit of storage from the subbufs.
Any print drops experienced after this patch should be fixed by
increasing the subbuf count (_stp_nsubbufs).
Sultan Alsawaf [Fri, 22 Jan 2021 18:46:22 +0000 (10:46 -0800)]
runtime: default to using procfs for the transport
Using debugfs for the transport results in a multitude of bugs when
running stap modules in parallel. The bugs include debugfs outright
failing, warnings hit in fs/inode.c, and kernel panics. Since procfs has
been more stable over the years, default to using it for the transport
instead of debugfs. Using procfs resolves the issues faced with debugfs.
Sultan Alsawaf [Wed, 20 Jan 2021 20:55:40 +0000 (12:55 -0800)]
stp_utrace: remove unneeded RCU-freed field from struct utrace
We're only using RCU on struct utrace in order to allow non-blocking
iteration through the hashlists. RCU is not used to manage the lifetime
of utrace structs; once a utrace struct is ready to be freed, nothing
will try to grab it anymore, so we can remove the unneeded utrace->freed
check.
Sultan Alsawaf [Wed, 20 Jan 2021 20:51:19 +0000 (12:51 -0800)]
stp_utrace: remove kmem cache usage
Some kernels appear to have trouble registering the same kmem_cache in
parallel, resulting in the following error as well as some other mayhem,
such as staprun hangs and kernel freezes:
sysfs: cannot create duplicate filename '/kernel/slab/:0000144'
This occurs when stap modules are registered in parallel with one
another.
The justification for using kmem caches in utrace is that the utrace
struct sizes are not powers of 2, and a lot of them can be allocated, so
leaving them to the kernel's default kmem caches can waste quite a bit
of memory. However, this is only a problem for the utrace struct, and
not really the utrace_engine struct, as the utrace_engine struct is 56
bytes on 64-bit, and can be allocated by the kernel's internal 64-byte
kmem cache with only 8 bytes wasted per allocation.
The same cannot be said for the utrace struct, since it's 144 bytes. It
would therefore be allocated from the 256-byte kmem cache, resulting in
112 bytes wasted per allocation. We can remedy this by reusing existing
memory in the struct for the 16-byte RCU callback head, bringing the
overall struct size down to 128 bytes and thus eliminating the need for
a kmem cache. This is safe because the reused struct members are no
longer used once the struct is ready to be freed.
This also eliminates a pesky rcu_barrier() that is no longer required.
Martin Cermak [Wed, 20 Jan 2021 21:09:49 +0000 (22:09 +0100)]
systemtap-service onboot: Skip updating the bootloader
It turns out that just modifying the default initrd is good enough;
there is no need to call kernel-install or new-kernel-pkg. This speeds up
the systemtap-service onboot operation.
Sultan Alsawaf [Tue, 19 Jan 2021 23:36:24 +0000 (15:36 -0800)]
stp_utrace: add missing rcu_read_unlock() in get_utrace_lock() error path
This bug was introduced in 619f6940d ("PR26697: fix NULL pointer deref
in get_utrace_lock()"), which was my first stap commit, and subsequently
where this project started going downhill as a result.
Sultan Alsawaf [Tue, 19 Jan 2021 23:00:50 +0000 (15:00 -0800)]
stp_utrace: remove kmem cache usage
Some kernels appear to have trouble registering the same kmem_cache in
parallel, resulting in the following error as well as some other mayhem,
such as staprun hangs and kernel freezes:
sysfs: cannot create duplicate filename '/kernel/slab/:0000144'
This occurs when stap modules are registered in parallel with one
another.
The justification for using kmem caches in utrace is that the utrace
struct sizes are not powers of 2, and a lot of them can be allocated, so
leaving them to the kernel's default kmem caches can waste quite a bit
of memory. However, this is only a problem for the utrace struct, and
not really the utrace_engine struct, as the utrace_engine struct is 56
bytes on 64-bit, and can be allocated by the kernel's internal 64-byte
kmem cache with only 8 bytes wasted per allocation.
The same cannot be said for the utrace struct, since it's 144 bytes. It
would therefore be allocated from the 256-byte kmem cache, resulting in
112 bytes wasted per allocation. We can remedy this by reusing existing
memory in the struct for the 16-byte RCU callback head, bringing the
overall struct size down to 128 bytes and thus eliminating the need for
a kmem cache. This is safe because the reused struct members are no
longer used once the struct is ready to be freed.
This also eliminates a pesky rcu_barrier() that is no longer required.
Craig Ringer [Sat, 16 Jan 2021 00:08:55 +0000 (19:08 -0500)]
PR27185: conversions.exp test enhancements
I've adjusted the stress tests referenced in
systemtap.stress/conversions.exp to make them more concise and allow
all the test definitions to be shared across each variant.
In order to avoid a use-after-free bug, we must remove the pde before
doing the path_put(), even though this contradicts what one would
naturally think the ordering should be.
This fixes the following KASAN splat:
BUG: KASAN: use-after-free in proc_remove+0x7b/0x80
Read of size 8 at addr ffff8882dc3b00b8 by task staprun-d/6732
CPU: 7 PID: 6732 Comm: staprun-d Tainted: G OE --------- - - 4.18.0-259.el8.x86_64+debug #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
dump_stack+0x8e/0xd0
print_address_description.constprop.3+0x1f/0x300
__kasan_report.cold.7+0x76/0xbf
? proc_remove+0x7b/0x80
kasan_report+0xe/0x20
proc_remove+0x7b/0x80
__stp_procfs_relay_remove_buf_file_callback+0x254/0x346 [orxray_libc_usleep_XX_6587]
? __stp_procfs_relay_remove_buf_file_callback+0x346/0x346 [orxray_libc_usleep_XX_6587]
? __stp_relay_remove_buf_file_callback+0x28/0x29 [orxray_libc_usleep_XX_6587]
? relay_close_buf+0xe0/0x130
? relay_close+0x14f/0x470
? _stp_transport_data_fs_close+0x1b/0x27 [orxray_libc_usleep_XX_6587]
? _stp_procfs_transport_fs_close+0xa/0xb [orxray_libc_usleep_XX_6587]
? _stp_transport_fs_close+0x24/0x26 [orxray_libc_usleep_XX_6587]
? _stp_transport_close+0x1e/0x24 [orxray_libc_usleep_XX_6587]
? cleanup_module+0xa/0xb [orxray_libc_usleep_XX_6587]
? __x64_sys_delete_module+0x2cc/0x4a0
? __ia32_sys_delete_module+0x4a0/0x4a0
? lockdep_hardirqs_on_prepare+0x343/0x4f0
? do_syscall_64+0x22/0x420
? do_syscall_64+0xa5/0x420
? entry_SYSCALL_64_after_hwframe+0x6a/0xdf
Allocated by task 6587:
save_stack+0x19/0x80
__kasan_kmalloc.constprop.10+0xc1/0xd0
kmem_cache_alloc+0xfe/0x350
__proc_create+0x1f6/0x740
proc_create_reg+0x61/0x100
proc_create_data+0x79/0xf0
__stp_procfs_relay_create_buf_file_callback+0xcb/0x43e [orxray_libc_usleep_XX_6587]
Freed by task 6732:
save_stack+0x19/0x80
__kasan_slab_free+0x125/0x170
kmem_cache_free+0xcd/0x360
proc_evict_inode+0x73/0x100
evict+0x29e/0x590
__dentry_kill+0x326/0x5a0
dentry_kill+0x94/0x410
dput+0x3b0/0x4a0
path_put+0x2d/0x60
__stp_procfs_relay_remove_buf_file_callback+0x24c/0x346 [orxray_libc_usleep_XX_6587]
The buggy address belongs to the object at ffff8882dc3b0000
which belongs to the cache proc_dir_entry of size 512
The buggy address is located 184 bytes inside of
512-byte region [ffff8882dc3b0000, ffff8882dc3b0200)
The buggy address belongs to the page:
page:ffffea000b70ec00 refcount:1 mapcount:0 mapping:ffff8881061d4f00 index:0x0 compound_mapcount: 0
flags: 0x17ffffc0008100(slab|head)
raw: 0017ffffc0008100 dead000000000100 dead000000000200 ffff8881061d4f00
raw: 0000000000000000 0000000080190019 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
 ffff8882dc3aff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8882dc3b0000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8882dc3b0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                        ^
 ffff8882dc3b0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8882dc3b0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
stap commit b71d20af819 fixed error messages in bpf assembly
(being broken by implicit deallocation of stack vars on exception throw)
but neglected the fact that visit_embeddedcode can recurse
(via emit_functioncall) such that a single bpf_unparser field for
storing asm_stmts will get overwritten by the recursive call.
Alloc/free a separate statement list per visit_embeddedcode call instead.
William Cohen [Mon, 11 Jan 2021 03:36:21 +0000 (22:36 -0500)]
Update test for bpf raw tracepoints to work with Linux 5.7 kernels
The kernel commit 70ed506c3bbcfa846d4636b23051ca79fa4781f7 in Linux
5.7 and newer replaced the bpf_raw_tracepoint_release function with
bpf_raw_tp_link_release. This change in function names would cause
SystemTap's test for BPF raw tracepoint support to fail. Updated the
check to look for the newer alternative function name.
Sven Wegener [Sat, 9 Jan 2021 21:40:02 +0000 (16:40 -0500)]
Remove non-posix == operators from configure.ac
The configure.ac script contains test commands with the == operator,
which is supported by most shells, but fails if /bin/sh has a test
built-in which is strictly posix-compliant.
Sultan Alsawaf [Fri, 8 Jan 2021 21:09:34 +0000 (13:09 -0800)]
Don't warn about freeing a NULL pointer for functions that tolerate it
Passing a NULL pointer to kfree(), vfree(), and free_percpu() is fine
and supported behavior; these functions will just return early when
given a NULL pointer. However, DEBUG_MEM doesn't know about this, and
warns about it even though it isn't a problem. This mutes the warning
from _stp_mem_debug_free() when these functions receive a NULL pointer.
Signed-off-by: Sultan Alsawaf <sultan@openresty.com>
Stan Cox [Fri, 8 Jan 2021 20:38:42 +0000 (15:38 -0500)]
Add stapdyn VMA-tracking.
To handle VMA-tracking in stapdyn: 1) do not emit pragma:vma, so the
kernel VMA-tracker is not enabled; 2) add a stapdyn version of
_stp_umodule_relocate, which 3) uses dwfl_linux_proc_report to find the
appropriate module start and 4) relocates the offset. This makes TLS
possible, so it is enabled for stapdyn.
Sultan Alsawaf [Wed, 30 Dec 2020 23:47:58 +0000 (15:47 -0800)]
task_finder2: fix task worker race on module unload
Unfortunately, __stp_tf_cancel_all_task_work() does not guarantee that
all of the task finder's task workers will be finished executing when it
returns. In this case, we rely on the stp_task_work API to prevent the
module from being unloaded while there are task workers in-flight, which
works, but the stp_task_work API is notified of a task worker finishing
before it actually finishes. Inside __stp_tf_task_worker_fn(), the
call to the task worker's function (tf_work->func) is where the final
refcount in the stp_task_work API could be put, but there will still be
instructions left in the task worker that will be executing for a short
time after that. In that short time, there can be a race where the
module is unloaded before the task worker finishes executing all of its
instructions, especially if the task worker gets preempted during this
time on a PREEMPT kernel.
To remedy this, we must ensure that the last instruction in
__stp_tf_task_worker_fn() is where the stp_task_work API is notified of
a task worker finishing.
Sultan Alsawaf [Wed, 30 Dec 2020 23:42:11 +0000 (15:42 -0800)]
task_finder2: fix list corruption in __stp_tf_cancel_all_task_work()
The previous commit (b26b4e2c2 "task_finder2: fix panics due to broken
task work cancellation") made it possible for the next node in the task
work list to be free, which would made list_for_each_entry_safe() not so
safe anymore. Using list_for_each_entry_safe() is still the fastest
approach here, so when the next node in the list happens to be freed, we
should just restart iteration on the list.
Sultan Alsawaf [Wed, 30 Dec 2020 22:21:42 +0000 (14:21 -0800)]
task_finder2: fix panics due to broken task work cancellation
The task_work_cancel() API uses function pointers to uniquely identify
task work structs, so there's no guarantee that a specific task work
struct we want to cancel is the one that will actually get canceled.
This issue would cause task work structs to be freed while they were
still queued up on the task's task-worker list.
This is an example of one such panic, where the DEBUG_MEM feature
reported that a __stp_tf_task_work struct (56 bytes) wasn't freed,
because that specific task worker got canceled and instead an active
task worker got freed!
William Cohen [Wed, 23 Dec 2020 21:33:03 +0000 (16:33 -0500)]
Work around kernel claims of a function("input_event").inline probe point
Newer Fedora Linux kernels (F32/F33) are claiming a
function("input_event").inline probe point exists and has no
arguments. The builds of the stapgames block and eater fail because
the needed arguments are not found. Examined where the claimed inline
input_event functions are with:
stap -v -L 'kernel.function("input_event").*'
There appears to be a bogus one listed inside the callable input_event
function itself. Worked around this by setting the game.input tapset
probe to function("input_event").call to exclude the bogus inline
version. It was verified that the games got input from the keyboard
with this patch.
William Cohen [Wed, 23 Dec 2020 20:19:42 +0000 (15:19 -0500)]
Adjust enospc.stp example to work with Linux 5.9 kernels
The Linux 5.9 kernels changed the type of the inode argument passed
into the btrfs_check_data_free_space function. The example needs to
check whether the new struct btrfs_inode or the old struct inode is
being used, and use a different field to get the s_dev value if
required.
William Cohen [Tue, 22 Dec 2020 21:17:02 +0000 (16:17 -0500)]
Allow ioblock.request to work on Linux 5.9 and newer kernels
In the Linux 5.9 kernel the generic_make_request function was renamed
to submit_bio_noacct by commit ed00aabd5e. Adjust the
ioblock.request probe to use the new name if it is available.
Sultan Alsawaf [Wed, 16 Dec 2020 22:46:36 +0000 (14:46 -0800)]
PR26844: remove trailing space from printed backtraces
When _STP_SYM_POST_SPACE is used, a trailing space is left over in the
log buffer which is then copied to the output for the backtrace print.
This issue was exposed by commit fd93cf71d.
Sultan Alsawaf [Wed, 16 Dec 2020 21:03:47 +0000 (13:03 -0800)]
session.cxx: fix print error dupe-elimination for chained errors
Commit 0e1d5b7eb397 introduced an issue where error messages would be
duplicated, like so:
Before:
--------------------8<--------------------
semantic error: type mismatch (long): identifier 'a' at test.stp:8:5
source: a = 32;
^
semantic error: type was first inferred here (string): identifier 'a' at :4:5
source: a = "stringcheese";
^
The first message would be duplicated because the wrong seen_errors
set was checked inside the loop, after which that first message would
be printed again outside the loop. This fixes the issue by using the
same error counter throughout.
Frank Ch. Eigler [Mon, 14 Dec 2020 02:05:23 +0000 (21:05 -0500)]
PR23512: fix staprun/stapio operation via less-than-root privileges
Commit 7615cae790c899bc8a82841c75c8ea9c6fa54df3 for PR26665 introduced
a regression in handling stapusr/stapdev/stapsys gid invocation of
staprun/stapio. This patch simplifies the relevant code in
staprun/ctl.c, init_ctl_channel(), to rely on openat/etc. to populate
and use the relay_basedir_fd as much as possible. Also, we now avoid
unnecessary use of access(), which was checking against the wrong
(real rather than effective) uid/gid.
Frank Ch. Eigler [Fri, 11 Dec 2020 23:06:36 +0000 (18:06 -0500)]
staprun: handle more and fewer cpus better
NR_CPUS was a hard-coded minimum and maximum on the number of CPUs
worth of trace$N files staprun/stapio would open at startup. While a
constant is useful for array sizing (and so might as well be really
large), the actual iteration should be informed by get_nprocs_conf(3).
This patch replaces NR_CPUS with MAX_NR_CPUS (now 1024, why not), and
limits open/thread iterations to the actual number of processors. It
even prints an error if a behemoth >1K-core machine comes into being.
Frank Ch. Eigler [Fri, 11 Dec 2020 20:39:29 +0000 (15:39 -0500)]
relay transport: comment on STP_BULK message
While we've eliminated any STP_BULKMODE effects from the way relayfs
files are used ("always bulkmode"), staprun/stapio still need to know
whether the user intended "stap -b" or not, so they can save the
stpd_cpu* files separately.
Sultan Alsawaf [Thu, 10 Dec 2020 01:22:20 +0000 (17:22 -0800)]
always use per-cpu bulkmode relayfs files to communicate with userspace
Using a mutex_trylock() in __stp_print_flush() leads to a lot of havoc,
for numerous reasons. Firstly, since __stp_print_flush() can be called from IRQ
context, holding the inode mutex from here would make the mutex owner
become nonsense, since mutex locks can only be held in contexts backed
by the scheduler. Secondly, the mutex_trylock implementation has a
spin_lock() inside of it that leads to two issues: IRQs aren't disabled
when acquiring this spin_lock(), so using it from IRQ context can lead
to a deadlock, and since spin locks can have tracepoints via
lock_acquire(), the spin_lock() can recurse on itself inside a stap
probe and deadlock, like so:
The reason the mutex_trylock() was needed in the first place was because
staprun doesn't properly use the relayfs API when reading buffers in
non-bulk mode. It tries to read all CPUs' buffers from a single thread,
when it should be reading each CPU's buffer from a thread running on
said CPU in order to utilize relayfs' synchronization guarantees, which
are made by disabling IRQs on the local CPU when a buffer is modified.
This change makes staprun always use per-CPU threads to read print
buffers so that we don't need the mutex_trylock() in the print flush
routine, which resolves a wide variety of serious bugs.
We also need to adjust the transport sub-buffer count to accommodate
frequent print flushing. The sub-buffer size is now reduced to match the
log buffer size, which is 8192 by default, and the number of sub-buffers
is increased to 256. This uses exactly the same amount of memory as
before.
Frank Ch. Eigler [Thu, 10 Dec 2020 03:29:43 +0000 (22:29 -0500)]
PR27044: fix lock loop for conditional probes
Emit a nested block carefully so that the "goto out;" from a failed
stp_lock_probe() call in that spot near the epilogue of a
probe-handler goes downward, not upward.
Sultan Alsawaf [Wed, 9 Dec 2020 20:55:10 +0000 (12:55 -0800)]
PR26844: fix off-by-one error when copying printed backtraces
Since log->buf isn't null-terminated, log->len represents the total
number of bytes present in the log buffer for copying. The use of
strlcpy() here with log->len as its size results in only log->len - 1
bytes being copied, with the log->len'th byte of the output buffer
being set to zero to terminate the string. Use memcpy() instead to
remedy this, while ensuring that the output buffer has room for the
null terminator it still needs.
This test case stresses nesting of heavy duty processing (backtrace
printing) within kernel interrupt processing paths. It seems to
sometimes trigger problems - so let's make the test harder, so that
latent problems are more likely to show up. Instead of quitting after the
first irq_* function hit, stick around for 10 seconds.
Guillaume Morin [Fri, 4 Dec 2020 17:18:44 +0000 (12:18 -0500)]
PR27001: fix runtime/transport/transport.c lockdown build problem
On some kernel/configs, CONFIG_SECURITY_LOCKDOWN_LSM !=
STAPCONF_LOCKDOWN_DEBUGFS, which broke the runtime build.
Using the matching macro as detected by autoconf to fix.
Sultan Alsawaf [Thu, 3 Dec 2020 20:57:34 +0000 (12:57 -0800)]
runtime: fix print races in IRQ context and during print cleanup
Prints can race when there's a print called from IRQ context or a print
called while print cleanup takes place, which can lead to garbled print
messages, out-of-bounds memory accesses, and memory use-after-free. This
is one example of racy modification of the print buffer len in IRQ
context which caused a panic due to an out-of-bounds memory access:
This patch resolves the IRQ print races by disabling IRQs on the local
CPU when accessing said CPU's print buffer, and resolves the cleanup
races with a lock. We also protect against data corruption and panics
from prints inside NMIs now by checking if the current CPU was accessing
the log buffer when an NMI fired; in this case, the NMI's prints will be
dropped, as there is no way to safely service them without creating a
dedicated log buffer for them. This is achieved by forbidding reentrancy
with respect to _stp_print_trylock_irqsave() when the runtime context
isn't held. Reentrancy is otherwise allowed when the runtime context is
held because the runtime context provides reentrancy protection.
Note the deadlock due to _stp_transport_trylock_relay_inode recursing
onto itself via mutex_trylock.
This is a temporary fix for the issue until a proper patch is made to
remove the mutex_trylock from __stp_print_flush. This should be reverted
when that patch lands (it will have something to do with bulkmode).
Sultan Alsawaf [Wed, 2 Dec 2020 19:27:47 +0000 (11:27 -0800)]
task_finder_vma: add kfree_rcu() compat for old kernels
Newer RHEL 6 kernels have kfree_rcu(), but older ones do not. Using
kfree_rcu() is beneficial because it lets the RCU subsystem know that
the queued RCU callback is low-priority, and can be deferred, hence why
we don't replace kfree_rcu() with call_rcu() outright. Luckily,
kfree_rcu() is a macro so we can just #ifdef with it.
Alice Zhang [Fri, 27 Nov 2020 18:45:41 +0000 (13:45 -0500)]
Conscious language initiatives: replaced whitelist->passlist,
blacklist->blocklist, master->main/primary. Some occurrences of master
and slave may not be replaceable at this point, e.g. when the term is
part of established terminology or another program's interface.
Sultan Alsawaf [Wed, 2 Dec 2020 02:47:04 +0000 (18:47 -0800)]
runtime_context: replace _stp_context_lock with an atomic variable
We can't use any lock primitives here, such as spin locks or rw locks,
because lock_acquire() has tracepoints inside of it. This can cause a
deadlock, so we have to roll our own synchronization mechanism using an
atomic variable.
Sultan Alsawaf [Tue, 1 Dec 2020 17:54:07 +0000 (09:54 -0800)]
runtime_context: synchronize _stp_context_stop more strictly
We're only reading _stp_context_stop while the read lock is held, so we
can move the modification of it to inside the write lock to ensure
strict memory ordering. As such, it no longer needs to be an atomic_t
variable.
We also don't need to disable IRQs when holding the write lock because
only read_trylock is used from IRQ context, not read_lock, so there's no
possibility of a deadlock occurring.