sourceware.org Git - systemtap.git/log

commit | commitdiff | tree

Serhei Makarov [Wed, 13 Jan 2021 19:28:20 +0000 (14:28 -0500)]

stapbpf (for PR27030): bugfix the b71d20af bugfix

stap commit b71d20af819 fixed error messages in bpf assembly
(being broken by implicit deallocation of stack vars on exception throw)
but neglected the fact that visit_embeddedcode can recurse
(via emit_functioncall) such that a single bpf_unparser field for
storing asm_stmts will get overwritten by the recursive call.

Alloc/free a separate statement list per visit_embeddedcode call instead.

commit | commitdiff | tree

Frank Ch. Eigler [Wed, 13 Jan 2021 18:31:03 +0000 (13:31 -0500)]

NEWS: correct arch names for recent tls code

Clarify that it's the 64-bit cpus that have tls var enablement.

commit | commitdiff | tree

William Cohen [Mon, 11 Jan 2021 03:36:21 +0000 (22:36 -0500)]

Update test for bpf raw tracepoints to work with Linux 5.7 kernels

The kernel commit 70ed506c3bbcfa846d4636b23051ca79fa4781f7 in Linux
5.7 and newer replaced the bpf_raw_tracepoint_release function with
bpf_raw_tp_link_release. This change in function names would cause
SystemTap's test for BPF raw tracepoint support to fail. Updated the
check to look for the newer alternative function name.

commit | commitdiff | tree

Sven Wegener [Sat, 9 Jan 2021 21:40:02 +0000 (16:40 -0500)]

Remove non-posix == operators from configure.ac

The configure.ac script contains test commands with the == operator,
which is supported by most shells, but fails if /bin/sh has a test
built-in which is strictly posix-compliant.

commit | commitdiff | tree

Tom Stellard [Sat, 9 Jan 2021 21:38:50 +0000 (16:38 -0500)]

Add BuildRequires: make

https://fedoraproject.org/wiki/Changes/Remove_make_from_BuildRoot

commit | commitdiff | tree

Sultan Alsawaf [Fri, 8 Jan 2021 21:09:34 +0000 (13:09 -0800)]

Don't warn about freeing a NULL pointer for functions that tolerate it

Passing a NULL pointer to kfree(), vfree(), and free_percpu() is fine
and supported behavior; these functions will just return early when
given a NULL pointer. However, DEBUG_MEM doesn't know about this, and
warns about it even though it isn't a problem. This mutes the warning
from _stp_mem_debug_free() when these functions receive a NULL pointer.
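
A minimal userspace sketch of the idea, assuming a hypothetical single-pointer tracker (`debug_alloc`, `debug_free`, and `warnings_emitted` are illustrative names, not SystemTap's DEBUG_MEM code). A NULL argument returns early without a warning, mirroring how free() and kfree() treat NULL, while an unknown non-NULL pointer still warns:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter standing in for a DEBUG_MEM-style warning log. */
static int warnings_emitted = 0;

/* Hypothetical tracker: remembers one allocation to keep the sketch short. */
static void *tracked_ptr = NULL;

static void *debug_alloc(size_t n)
{
    tracked_ptr = malloc(n);
    return tracked_ptr;
}

static void debug_free(void *ptr)
{
    /* Freeing NULL is a defined no-op, so there is nothing to look up
     * and nothing to warn about. */
    if (ptr == NULL)
        return;

    if (ptr != tracked_ptr) {
        /* Still suspicious: a pointer the tracker never handed out. */
        fprintf(stderr, "debug_free: unknown pointer %p\n", ptr);
        warnings_emitted++;
        return;
    }

    tracked_ptr = NULL;
    free(ptr);
}
```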

Signed-off-by: Sultan Alsawaf <sultan@openresty.com>

commit | commitdiff | tree

Stan Cox [Fri, 8 Jan 2021 20:38:42 +0000 (15:38 -0500)]

Add stapdyn VMA-tracking.

To handle VMA-tracking in stapdyn: 1) do not emit pragma:vma so the
kernel VMA-tracker is not enabled 2) add a stapdyn version of
_stp_umodule_relocate which 3) uses dwfl_linux_proc_report to find the
appropriate module start 4) relocate the offset. This makes tls
possible so it is enabled for stapdyn.

commit | commitdiff | tree

Martin Cermak [Thu, 7 Jan 2021 21:59:52 +0000 (22:59 +0100)]

Conditionally define ASYNC_SIZE in stack-s390.c

Upstream commit ce3dc44749 removed ASYNC_SIZE.

commit | commitdiff | tree

Sultan Alsawaf [Wed, 30 Dec 2020 23:47:58 +0000 (15:47 -0800)]

task_finder2: fix task worker race on module unload

Unfortunately, __stp_tf_cancel_all_task_work() does not guarantee that
all of the task finder's task workers will be finished executing when it
returns. In this case, we rely on the stp_task_work API to prevent the
module from being unloaded while there are task workers in-flight, which
works, but the stp_task_work API is notified of a task worker finishing
before it actually finishes. Inside __stp_tf_task_worker_fn(), the
call to the task worker's function (tf_work->func) is where the final
refcount in the stp_task_work API could be put, but there will still be
instructions left in the task worker that will be executing for a short
time after that. In that short time, there can be a race where the
module is unloaded before the task worker finishes executing all of its
instructions, especially if the task worker gets preempted during this
time on a PREEMPT kernel.

To remedy this, we must ensure that the last instruction in
__stp_tf_task_worker_fn() is where the stp_task_work API is notified of
a task worker finishing.

commit | commitdiff | tree

Sultan Alsawaf [Wed, 30 Dec 2020 23:42:11 +0000 (15:42 -0800)]

task_finder2: fix list corruption in __stp_tf_cancel_all_task_work()

The previous commit (b26b4e2c2 "task_finder2: fix panics due to broken
task work cancellation") made it possible for the next node in the task
work list to be freed, which would make list_for_each_entry_safe() not so
safe anymore. Using list_for_each_entry_safe() is still the fastest
approach here, so when the next node in the list happens to be freed, we
should just restart iteration on the list.
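
The restart pattern can be modelled in userspace roughly as follows. This is a sketch under assumptions: the `node` layout, the `buddy` field (an entry freed as a side effect of cancelling another), and `cancel_all` are hypothetical and do not reproduce the task finder's data structures. The loop saves the successor as a "safe" iteration would, and restarts from the head whenever that saved successor has just been freed:

```c
#include <stdbool.h>
#include <stdlib.h>

struct node {
    int id;
    bool cancel;        /* should this entry be cancelled? */
    int buddy;          /* id of an entry freed together with it, or -1 */
    struct node *next;
};

static struct node *push_front(struct node *head, int id, bool cancel, int buddy)
{
    struct node *n = malloc(sizeof(*n));
    if (!n)
        abort();
    n->id = id;
    n->cancel = cancel;
    n->buddy = buddy;
    n->next = head;
    return n;
}

/* Unlink and free the node with the given id. Returns 1 if one was freed. */
static int remove_id(struct node **head, int id)
{
    for (struct node **pp = head; *pp; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            struct node *dead = *pp;
            *pp = dead->next;
            free(dead);
            return 1;
        }
    }
    return 0;
}

/* Cancel every flagged entry and its buddy. Returns the number freed. */
static int cancel_all(struct node **head)
{
    int freed = 0;
    struct node *cur, *next;

restart:
    for (cur = *head; cur; cur = next) {
        next = cur->next;               /* saved successor */
        if (!cur->cancel)
            continue;

        int id = cur->id;
        int buddy = cur->buddy;
        bool next_freed = next && next->id == buddy;

        freed += remove_id(head, id);
        if (buddy >= 0)
            freed += remove_id(head, buddy);

        /* The saved successor is now dangling; start over from the head. */
        if (next_freed)
            goto restart;
    }
    return freed;
}

static int length(const struct node *head)
{
    int n = 0;
    for (; head; head = head->next)
        n++;
    return n;
}
```

Each restart follows at least one removal, so the loop terminates; entries that were already inspected and left alone are simply skipped again.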

commit | commitdiff | tree

Sultan Alsawaf [Wed, 30 Dec 2020 22:21:42 +0000 (14:21 -0800)]

task_finder2: fix panics due to broken task work cancellation

The task_work_cancel() API uses function pointers to uniquely identify
task work structs, so there's no guarantee that a specific task work
struct we want to cancel is the one that will actually get canceled.
This issue would cause task work structs to be freed while they were
still queued up on the task's task-worker list.

This is an example of one such panic, where the DEBUG_MEM feature
reported that a __stp_tf_task_work struct (56 bytes) wasn't freed,
because that specific task worker got canceled and instead an active
task worker got freed!

orxray_resty_mem_X_35062: ERROR: Memory ffff8809ed388620 len=56 allocation type: kmalloc. Not freed.
BUG: unable to handle kernel paging request at ffffffffa0570877
IP: [<ffffffffa0570877>] 0xffffffffa0570876
PGD 1abd067 PUD 1abe063 PMD 1028286067 PTE 0
Oops: 0010 [#1] SMP
CPU: 3 PID: 1338 Comm: nginx Tainted: G           OE  ------------   3.10.0-514.10.2.el7.x86_64.debug #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
task: ffff880eae2d0000 ti: ffff880eaf2e4000 task.ti: ffff880eaf2e4000
RIP: 0010:[<ffffffffa0570877>]  [<ffffffffa0570877>] 0xffffffffa0570876
RSP: 0018:ffff880eaf2e7d78  EFLAGS: 00010282
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff8809ed388640 RSI: 0000000000000000 RDI: ffff8809ed388640
RBP: ffff880eaf2e7da0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: ffffffffff90001c R12: ffffffff8248b1c0
R13: ffff880eae2d0818 R14: ffff880eae2d0000 R15: 00007eff3d2490b0
FS:  00007eff3dcd2740(0000) GS:ffff881037c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0570877 CR3: 0000000ebce67000 CR4: 00000000003406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffffffff810c6544 ffff880eaf2e7f58 ffff880eaf2e7e70 ffff880eae2d0000
00007eff3dcb3338 ffff880eaf2e7e38 ffffffff810b31ba ffff880eaf2e7dc0
ffffffff8106c279 ffff880eaf2e7e50 ffff880ef8a792c0 ffff880eaf2e7df8
Call Trace:
[<ffffffff810c6544>] ? task_work_run+0xb4/0xe0
[<ffffffff810b31ba>] get_signal_to_deliver+0x85a/0x960
[<ffffffff8106c279>] ? kvm_sched_clock_read+0x9/0x20
[<ffffffff810e7b4d>] ? sched_clock_local+0x1d/0x80
[<ffffffff810e7dd8>] ? sched_clock_cpu+0xb8/0xe0
[<ffffffff810324a7>] do_signal+0x57/0x6e0
[<ffffffff8176dba6>] ? int_very_careful+0x5/0xd
[<ffffffff81032b8f>] do_notify_resume+0x5f/0xb0
[<ffffffff8176dbfd>] int_signal+0x12/0x17
Code:  Bad RIP value.
RIP  [<ffffffffa0570877>] 0xffffffffa0570876
RSP <ffff880eaf2e7d78>
CR2: ffffffffa0570877
---[ end trace 1cdf8e5b522b246e ]---
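
The ambiguity described above can be illustrated with a small userspace model. It is a sketch only: `struct work`, `queue_add`, and `cancel_by_func` are hypothetical names and the kernel's task_work internals are not reproduced. When two queued items share a callback, cancelling by callback identity may hand back a different item than the one the caller meant to cancel:

```c
#include <stddef.h>

struct work {
    void (*func)(struct work *);
    struct work *next;
};

/* Shared callback: every queued item points at the same function. */
static void worker_fn(struct work *w)
{
    (void)w;
}

static void queue_add(struct work **head, struct work *w)
{
    w->next = *head;
    *head = w;
}

/* Cancel by callback identity: unlinks and returns the first queued item
 * whose func matches, whichever item that happens to be. */
static struct work *cancel_by_func(struct work **head,
                                   void (*func)(struct work *))
{
    for (struct work **pp = head; *pp; pp = &(*pp)->next) {
        if ((*pp)->func == func) {
            struct work *hit = *pp;
            *pp = hit->next;
            hit->next = NULL;
            return hit;
        }
    }
    return NULL;
}
```

A caller that frees "its" item after such a cancel can end up freeing an item that is still queued, which matches the failure mode the message describes.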

commit | commitdiff | tree

William Cohen [Wed, 23 Dec 2020 21:33:03 +0000 (16:33 -0500)]

Work around kernel claims of a function("input_event").inline probe point

Newer Fedora Linux kernels (F32/F33) are claiming a
function("input_event").inline probe point exists and has no
arguments.  The builds of the stapgames block and eater games fail because
the needed arguments are not found.  Examined where the claimed inline
input_event functions are with:

  stap -v -L 'kernel.function("input_event").*'

There appears to be a bogus one listed inside the callable input_event
function itself.  Worked around this by setting the game.input tapset
probe to function("input_event").call to exclude the bogus inline
version.  It was verified that the games got input from the keyboard
with this patch.

commit | commitdiff | tree

William Cohen [Wed, 23 Dec 2020 20:19:42 +0000 (15:19 -0500)]

Adjust enospc.stp example to work with Linux 5.9 kernels

The Linux 5.9 kernels changed the type of the inode argument passed
into the btrfs_check_data_free_space function. Need to check to see
if it is the new struct btrfs_inode being used or the old struct inode
and use a different field to get the s_dev value if required.

commit | commitdiff | tree

William Cohen [Tue, 22 Dec 2020 21:17:02 +0000 (16:17 -0500)]

Allow ioblock.request to work with Linux 5.9 and newer kernels

In the Linux 5.9 kernel the generic_make_request function was renamed
to submit_bio_noacct by commit ed00aabd5e. Adjust ioblock.request
to use the new name if it is available.

commit | commitdiff | tree

Sultan Alsawaf [Wed, 16 Dec 2020 23:17:24 +0000 (15:17 -0800)]

Revert "PR26844: remove trailing space from printed backtraces"

This reverts commit 3d888f650fd13c6051b16573d0f84243689bf999.

Bad toast. This is a hack.

commit | commitdiff | tree

Sultan Alsawaf [Wed, 16 Dec 2020 22:46:36 +0000 (14:46 -0800)]

PR26844: remove trailing space from printed backtraces

When _STP_SYM_POST_SPACE is used, a trailing space is left over in the
log buffer which is then copied to the output for the backtrace print.
This issue was exposed by commit fd93cf71d.

commit | commitdiff | tree

Frank Ch. Eigler [Wed, 16 Dec 2020 21:46:29 +0000 (16:46 -0500)]

testsuite regrediag: new test for diagnostic duplication

Little test case confirms the correct number of semantic error
lines from the test case for commit 405f69a11a6.

commit | commitdiff | tree

Sultan Alsawaf [Wed, 16 Dec 2020 21:03:47 +0000 (13:03 -0800)]

session.cxx: fix print error dupe-elimination for chained errors

Commit 0e1d5b7eb397 introduced an issue where error messages would be
duplicated, like so:
Before:
--------------------8<--------------------
semantic error: type mismatch (long): identifier 'a' at test.stp:8:5
        source:     a = 32;
                    ^

semantic error: type was first inferred here (string): identifier 'a' at :4:5
        source:     a = "stringcheese";
                    ^

Pass 2: analysis failed.  [man error::pass2]
-------------------->8--------------------

After:
--------------------8<--------------------
semantic error: type mismatch (long): identifier 'a' at test.stp:8:5
        source:     a = 32;
                    ^

semantic error: type mismatch (long): identifier 'a' at :8:5
        source:     a = 32;
                    ^

semantic error: type was first inferred here (string): identifier 'a' at :4:5
        source:     a = "stringcheese";
                    ^

Pass 2: analysis failed.  [man error::pass2]
-------------------->8--------------------

The first message was duplicated because the wrong seen_errors was
checked inside the loop after that first message had already been printed
outside the loop. This fixes the issue by using the same error counter
throughout.

commit | commitdiff | tree

Serhei Makarov [Tue, 15 Dec 2020 20:42:23 +0000 (15:42 -0500)]

stapbpf (for PR27030) tentative regalloc fix: zero unused regs

It can happen with some programs that the register allocator
restores a register whose spill slot was never written.

Tentatively, zero the stack to avoid a verifier error in this case.

commit | commitdiff | tree

Sultan Alsawaf [Mon, 14 Dec 2020 21:20:34 +0000 (13:20 -0800)]

staprun: use the correct out_fd when bulkmode and fsize_max aren't used

When bulkmode and fsize_max aren't used, there is only one output fd and
it is stored at out_fd[avail_cpus[0]].
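
As a rough illustration of the indexing rule, assuming a hypothetical helper (`select_out_fd` and its parameter names are not staprun's): with per-CPU output each CPU uses its own slot, otherwise the single descriptor lives in the slot of the first available CPU.

```c
#include <stdbool.h>

/* Hypothetical helper; names are illustrative, not staprun's. */
static int select_out_fd(const int *out_fd, const int *avail_cpus,
                         int cpu, bool per_cpu_output)
{
    if (per_cpu_output)
        return out_fd[cpu];         /* one descriptor per CPU */
    return out_fd[avail_cpus[0]];   /* the only descriptor in use */
}
```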

commit | commitdiff | tree

Serhei Makarov [Mon, 14 Dec 2020 17:39:51 +0000 (12:39 -0500)]

stapbpf (for PR27030): bugfix error messages in bpf assembly

Need to retain the asm_stmts vector so that bpf assembly tokens
are not deallocated on exception throw.

Otherwise, printing semantic errors from bpf assembly causes segfault.

* bpf-translate.cxx (struct bpf_unparser): retain asm_stmts vector.
(bpf_unparser::visit_embeddedcode): reuse retained asm_stmts vector.

commit | commitdiff | tree

Frank Ch. Eigler [Mon, 14 Dec 2020 02:19:15 +0000 (21:19 -0500)]

PR27067 <<< corrected bug# for previous commit

commit | commitdiff | tree

Frank Ch. Eigler [Mon, 14 Dec 2020 02:05:23 +0000 (21:05 -0500)]

PR23512: fix staprun/stapio operation via less-than-root privileges

Commit 7615cae790c899bc8a82841c75c8ea9c6fa54df3 for PR26665 introduced
a regression in handling stapusr/stapdev/stapsys gid invocation of
staprun/stapio. This patch simplifies the relevant code in
staprun/ctl.c, init_ctl_channel(), to rely on openat/etc. to populate
and use the relay_basedir_fd as much as possible. Also, we now avoid
unnecessary use of access(), which was checking against the wrong
(real rather than effective) uid/gid.

commit | commitdiff | tree

Frank Ch. Eigler [Fri, 11 Dec 2020 23:06:36 +0000 (18:06 -0500)]

staprun: handle more and fewer cpus better

NR_CPUS was a hard-coded minimum and maximum on the number of CPUs
worth of trace$N files staprun/stapio would open at startup. While a
constant is useful for array sizing (and so might as well be really
large), the actual iteration should be informed by get_nprocs_conf(3).

This patch replaces NR_CPUS with MAX_NR_CPUS (now 1024, why not), and
limits open/thread iterations to the actual number of processors. It
even prints an error if a behemoth >1K-core machine comes into being.
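
A minimal sketch of the split between a sizing constant and a runtime count, assuming glibc's get_nprocs_conf(3). The helper names and the decision to clamp after printing the error are assumptions of this sketch; the message only states that an error is printed.

```c
#include <stdio.h>
#include <sys/sysinfo.h>   /* get_nprocs_conf(), a glibc extension */

#define MAX_NR_CPUS 1024   /* array-sizing constant named in the message */

/* Hypothetical helper: bound a reported CPU count by the array size. */
static int clamp_cpu_count(int reported)
{
    if (reported > MAX_NR_CPUS) {
        fprintf(stderr, "too many cpus: %d > %d\n", reported, MAX_NR_CPUS);
        return MAX_NR_CPUS;   /* clamping is this sketch's assumption */
    }
    return reported;
}

/* Number of per-CPU files or threads worth iterating over. */
static int cpus_to_service(void)
{
    return clamp_cpu_count(get_nprocs_conf());
}
```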

commit | commitdiff | tree

Cosmin Tanislav [Thu, 10 Dec 2020 21:48:54 +0000 (16:48 -0500)]

bugfix: runtime: transport: handle more error cases in module init

Signed-off-by: Sultan Alsawaf <sultan@openresty.com>

commit | commitdiff | tree

Frank Ch. Eigler [Fri, 11 Dec 2020 20:39:29 +0000 (15:39 -0500)]

relay transport: comment on STP_BULK message

While we've eliminated any STP_BULKMODE effects from the way relayfs
files are used ("always bulkmode"), staprun/stapio still need to know
whether the user intended "stap -b" or not, so they can save the
stpd_cpu* files separately.

commit | commitdiff | tree

Sultan Alsawaf [Fri, 11 Dec 2020 20:31:25 +0000 (12:31 -0800)]

transport: set is_global to zero even when bulkmode is disabled

This is needed now that we always want per-cpu logger threads. When
is_global is set to a non-zero value, relay won't create per-cpu log
files.

commit | commitdiff | tree

Stan Cox [Fri, 11 Dec 2020 16:52:50 +0000 (11:52 -0500)]

Support pointer_arg in dyninst mode

Obey STAPCONF_X86_UNIREGS in _stp_arg2 since dyninst uses the standard
ptrace.h which defines, e.g. rax instead of ax

commit | commitdiff | tree

Sultan Alsawaf [Thu, 10 Dec 2020 01:22:27 +0000 (17:22 -0800)]

Revert "REVERTME: tapset-timers: work around on-the-fly deadlocks caused by mutex_trylock"

This reverts commit 6a27888b118b7a94650a68aae028957cdd5fb5f5.

No longer needed. As promised, we're reverting this.

commit | commitdiff | tree

Sultan Alsawaf [Thu, 10 Dec 2020 01:22:20 +0000 (17:22 -0800)]

always use per-cpu bulkmode relayfs files to communicate with userspace

Using a mutex_trylock() in __stp_print_flush() leads to a lot of havoc,
for numerous reasons. Firstly, since __stp_print_flush() can be called from IRQ
context, holding the inode mutex from here would make the mutex owner
become nonsense, since mutex locks can only be held in contexts backed
by the scheduler. Secondly, the mutex_trylock implementation has a
spin_lock() inside of it that leads to two issues: IRQs aren't disabled
when acquiring this spin_lock(), so using it from IRQ context can lead
to a deadlock, and since spin locks can have tracepoints via
lock_acquire(), the spin_lock() can recurse on itself inside a stap
probe and deadlock, like so:

#0 [ffff88017f6d7a08] kvm_wait at ffffffff81079f5a
#1 [ffff88017f6d7a30] __pv_queued_spin_lock_slowpath at ffffffff8114f51e
#2 [ffff88017f6d7a70] queued_spin_lock_slowpath at ffffffff810e842b
#3 [ffff88017f6d7a80] mutex_trylock at ffffffff81882b1b
#4 [ffff88017f6d7ab8] _stp_transport_trylock_relay_inode at ffffffffc0c599df [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#5 [ffff88017f6d7ad8] __stp_print_flush at ffffffffc09b6483 [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#6 [ffff88017f6d7b10] probe_7879 at ffffffffc0a98c85 [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#7 [ffff88017f6d7b38] enter_real_tracepoint_probe_1543 at ffffffffc0c3b757 [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#8 [ffff88017f6d7b70] enter_tracepoint_probe_1543 at ffffffffc09b117e [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#9 [ffff88017f6d7b80] lock_acquire at ffffffff811460ba

The reason the mutex_trylock() was needed in the first place was because
staprun doesn't properly use the relayfs API when reading buffers in
non-bulk mode. It tries to read all CPUs' buffers from a single thread,
when it should be reading each CPU's buffer from a thread running on
said CPU in order to utilize relayfs' synchronization guarantees, which
are made by disabling IRQs on the local CPU when a buffer is modified.

This change makes staprun always use per-CPU threads to read print
buffers so that we don't need the mutex_trylock() in the print flush
routine, which resolves a wide variety of serious bugs.

We also need to adjust the transport sub-buffer count to accommodate
frequent print flushing. The sub-buffer size is now reduced to match the
log buffer size, which is 8192 by default, and the number of sub-buffers
is increased to 256. This uses exactly the same amount of memory as
before.

commit | commitdiff | tree

Frank Ch. Eigler [Thu, 10 Dec 2020 03:29:43 +0000 (22:29 -0500)]

PR27044: fix lock loop for conditional probes

Emit a nested block carefully so that the "goto out;" from a failed
stp_lock_probe() call in that spot near the epilogue of a
probe-handler goes downward, not upward.

commit | commitdiff | tree

Sultan Alsawaf [Wed, 9 Dec 2020 20:55:10 +0000 (12:55 -0800)]

PR26844: fix off-by-one error when copying printed backtraces

Since log->buf isn't null-terminated, log->len represents the total
number of bytes present in the log buffer for copying. The use of
strlcpy() here with log->len as its size results in log->len - 1 bytes
being copied, with the log->len'th byte of the output buffer being set
to zero to terminate the string. Use memcpy() instead to remedy this,
while ensuring that the output buffer has space for null termination,
since the output buffer needs to be terminated.
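
The off-by-one can be reproduced in userspace with a short sketch. Both helpers are hypothetical: `copy_like_strlcpy` only imitates the truncation behaviour of strlcpy() for a source at least `size - 1` bytes long (strlcpy() itself is not available in every libc), and `copy_log` is an illustrative name for the memcpy()-based replacement.

```c
#include <stddef.h>
#include <string.h>

/* Imitates strlcpy() truncation: with size == len, only len - 1 payload
 * bytes are copied and the last slot holds the terminator. */
static void copy_like_strlcpy(char *dst, const char *src, size_t size)
{
    if (size == 0)
        return;
    memcpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/* Copies all len bytes of an unterminated buffer, then terminates.
 * Returns -1 if dst cannot hold len bytes plus the terminator. */
static int copy_log(char *dst, size_t dst_size, const char *buf, size_t len)
{
    if (len + 1 > dst_size)
        return -1;
    memcpy(dst, buf, len);
    dst[len] = '\0';
    return 0;
}
```

With a four-byte unterminated buffer, the first helper called with size 4 keeps three bytes, while the second keeps all four as long as the destination has room for five.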

commit | commitdiff | tree

Frank Ch. Eigler [Sat, 5 Dec 2020 02:33:21 +0000 (21:33 -0500)]

dyninst transport: add _stp_print_*lock_irq* stubs

Recent code on the transport/linux side needs a few new (stub)
functions and type decls.

commit | commitdiff | tree

Frank Ch. Eigler [Sat, 5 Dec 2020 00:47:25 +0000 (19:47 -0500)]

configury noise: regenerate using fedora-33 auto*

autoreconf here and there, nothing of consequence

commit | commitdiff | tree

Frank Ch. Eigler [Sat, 5 Dec 2020 00:33:22 +0000 (19:33 -0500)]

testsuite pr14536.stp: toughen

This test case stresses nesting of heavy duty processing (backtrace
printing) within kernel interrupt processing paths. It seems to
sometimes trigger problems - so let's make the test harder to make
latent problems more likely to show up. Instead of quitting after the
first irq_* function hit, stick around for 10 seconds.

commit | commitdiff | tree

Guillaume Morin [Fri, 4 Dec 2020 17:18:44 +0000 (12:18 -0500)]

PR27001: fix runtime/transport/transport.c lockdown build problem

On some kernel/configs, CONFIG_SECURITY_LOCKDOWN_LSM !=
STAPCONF_LOCKDOWN_DEBUGFS, which broke the runtime build.
Using the matching macro as detected by autoconf to fix.

commit | commitdiff | tree

Sultan Alsawaf [Thu, 3 Dec 2020 20:57:34 +0000 (12:57 -0800)]

runtime: fix print races in IRQ context and during print cleanup

Prints can race when there's a print called from IRQ context or a print
called while print cleanup takes place, which can lead to garbled print
messages, out-of-bounds memory accesses, and memory use-after-free. This
is one example of racy modification of the print buffer len in IRQ
context which caused a panic due to an out-of-bounds memory access:

BUG: unable to handle kernel paging request at ffffe8ffff621000
IP: [<ffffffffc05da0f3>] _stp_vsprint_memory+0x83/0x950 [stap_2c44636dfda18135ca3012a752599da6_13_533]
PGD 174b90067 PUD 174b8f067 PMD 174b93067 PTE 0
Oops: 0002 [#1] SMP
CPU: 12 PID: 3468 Comm: cat Kdump: loaded Tainted: G           OE  ------------   3.10.0-1127.19.1.el7.x86_64.debug #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
task: ffff88001f4f0000 ti: ffff88004ea5c000 task.ti: ffff88004ea5c000
RIP: 0010:[<ffffffffc05da0f3>]  [<ffffffffc05da0f3>] _stp_vsprint_memory+0x83/0x950 [stap_2c44636dfda18135ca3012a752599da6_13_533]
RSP: 0018:ffff88004ea5f9a8  EFLAGS: 00010082
RAX: ffffe8ffff621001 RBX: ffffe8ffff620ff2 RCX: fffffffffffffffe
RDX: 000000000000006e RSI: ffffffffffffffff RDI: ffffc90002c23730
RBP: ffff88004ea5fa28 R08: 00000000ffffffff R09: 0000000000000073
R10: ffffc90002c243d7 R11: 0000000000000001 R12: ffffc90002c2373f
R13: ffffe8ffff621004 R14: 0000000000000012 R15: 00000000fffffffe
FS:  00007f8a9b1d4740(0000) GS:ffff880179e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffe8ffff621000 CR3: 00000000b3e3c000 CR4: 0000000000360fe0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff8103eb89>] ? sched_clock+0x9/0x10
[<ffffffff8114036f>] ? lock_release_holdtime.part.30+0xf/0x1a0
[<ffffffffc05dcb80>] function___global_trace__overload_0+0x5b0/0x1220 [stap_2c44636dfda18135ca3012a752599da6_13_533]
[<ffffffffc05d8993>] ? stp_lock_probe+0x53/0xe0 [stap_2c44636dfda18135ca3012a752599da6_13_533]
[<ffffffff8188d879>] ? kretprobe_trampoline_holder+0x9/0x9
[<ffffffffc05e0662>] probe_7118+0x82/0xe0 [stap_2c44636dfda18135ca3012a752599da6_13_533]
[<ffffffffc05de866>] enter_kretprobe_common+0x256/0x490 [stap_2c44636dfda18135ca3012a752599da6_13_533]
[<ffffffff813489f1>] ? proc_sys_open+0x51/0x60
[<ffffffffc05dead0>] enter_kretprobe_probe+0x10/0x20 [stap_2c44636dfda18135ca3012a752599da6_13_533]
[<ffffffff8188e1d8>] trampoline_handler+0x148/0x220
[<ffffffff813489f1>] ? proc_sys_open+0x51/0x60
[<ffffffff8188d89e>] kretprobe_trampoline+0x25/0x57
[<ffffffff813489f1>] ? proc_sys_open+0x51/0x60
[<ffffffff8188d879>] kretprobe_trampoline_holder+0x9/0x9
[<ffffffff81384702>] ? security_inode_permission+0x22/0x30
[<ffffffff813489a0>] ? sysctl_head_finish+0x50/0x50
[<ffffffff812ac11d>] vfs_open+0x5d/0xb0
[<ffffffff812bb74a>] ? may_open+0x5a/0x120
[<ffffffff812c0af5>] do_last+0x285/0x15b0
[<ffffffff812bf18e>] ? link_path_walk+0x27e/0x8c0
[<ffffffff812c1ef0>] path_openat+0xd0/0x5d0
[<ffffffff8107a7f3>] ? kvm_clock_read+0x33/0x40
[<ffffffff812c38ad>] do_filp_open+0x4d/0xb0
[<ffffffff81889497>] ? _raw_spin_unlock+0x27/0x40
[<ffffffff812d5a9b>] ? __alloc_fd+0xfb/0x270
[<ffffffff812ad784>] do_sys_open+0x124/0x220
[<ffffffff812ad89e>] SyS_open+0x1e/0x20
[<ffffffff8188d879>] kretprobe_trampoline_holder+0x9/0x9

This patch resolves the IRQ print races by disabling IRQs on the local
CPU when accessing said CPU's print buffer, and resolves the cleanup
races with a lock. We also protect against data corruption and panics
from prints inside NMIs now by checking if the current CPU was accessing
the log buffer when an NMI fired; in this case, the NMI's prints will be
dropped, as there is no way to safely service them without creating a
dedicated log buffer for them. This is achieved by forbidding reentrancy
with respect to _stp_print_trylock_irqsave() when the runtime context
isn't held. Reentrancy is otherwise allowed when the runtime context is
held because the runtime context provides reentrancy protection.
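
The "drop the nested print" rule can be modelled in userspace with a C11 atomic flag. This is only a sketch: it does not disable interrupts, the nested call stands in for an NMI, and every name here (`print_trylock`, `try_print`, `fake_nmi`, `dropped`) is hypothetical rather than taken from the runtime.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static atomic_flag buf_busy = ATOMIC_FLAG_INIT;
static char log_buf[64];
static size_t log_len;
static int dropped;

/* Succeeds only if nobody is currently updating the buffer. */
static bool print_trylock(void)
{
    return !atomic_flag_test_and_set(&buf_busy);
}

static void print_unlock(void)
{
    atomic_flag_clear(&buf_busy);
}

/* Appends msg to the buffer. `interrupt`, if non-NULL, is invoked in the
 * middle of the update to simulate an NMI arriving at that point. */
static bool try_print(const char *msg, void (*interrupt)(void))
{
    if (!print_trylock()) {
        dropped++;              /* reentrant print: drop it */
        return false;
    }

    size_t n = strlen(msg);
    if (log_len + n <= sizeof(log_buf)) {
        if (interrupt)
            interrupt();
        memcpy(log_buf + log_len, msg, n);
        log_len += n;
    }

    print_unlock();
    return true;
}

/* Simulated NMI handler that also tries to print. */
static void fake_nmi(void)
{
    try_print("nmi", NULL);
}
```

Because the nested call fails the trylock, it never touches `log_len` while the outer update is in progress, so the buffer stays consistent at the cost of losing the nested message.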

commit | commitdiff | tree

Frank Ch. Eigler [Thu, 3 Dec 2020 20:24:20 +0000 (15:24 -0500)]

post-release prep: bump version number to 4.5

commit | commitdiff | tree

Alice Zhang [Thu, 3 Dec 2020 16:55:43 +0000 (11:55 -0500)]

Merge branch 'master' of ssh://sourceware.org/git/systemtap into concious_language

commit | commitdiff | tree

Sultan Alsawaf [Thu, 3 Dec 2020 02:09:17 +0000 (18:09 -0800)]

REVERTME: tapset-timers: work around on-the-fly deadlocks caused by mutex_trylock

The following deadlock exists due to tracepoints existing inside a lock
that is used both inside probe context and outside probe context:
#0 [ffff88017f6d7a08] kvm_wait at ffffffff81079f5a
#1 [ffff88017f6d7a30] __pv_queued_spin_lock_slowpath at ffffffff8114f51e
#2 [ffff88017f6d7a70] queued_spin_lock_slowpath at ffffffff810e842b
#3 [ffff88017f6d7a80] mutex_trylock at ffffffff81882b1b
#4 [ffff88017f6d7ab8] _stp_transport_trylock_relay_inode at ffffffffc0c599df [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#5 [ffff88017f6d7ad8] __stp_print_flush at ffffffffc09b6483 [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#6 [ffff88017f6d7b10] probe_7879 at ffffffffc0a98c85 [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#7 [ffff88017f6d7b38] enter_real_tracepoint_probe_1543 at ffffffffc0c3b757 [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#8 [ffff88017f6d7b70] enter_tracepoint_probe_1543 at ffffffffc09b117e [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#9 [ffff88017f6d7b80] lock_acquire at ffffffff811460ba
#10 [ffff88017f6d7be8] mutex_trylock at ffffffff81882a27
#11 [ffff88017f6d7c20] _stp_transport_trylock_relay_inode at ffffffffc0c599df [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#12 [ffff88017f6d7c40] __stp_print_flush at ffffffffc09b6483 [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#13 [ffff88017f6d7c78] _stp_vlog at ffffffffc09b8d32 [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#14 [ffff88017f6d7cd8] _stp_dbug at ffffffffc09ba43b [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#15 [ffff88017f6d7d38] systemtap_module_refresh at ffffffffc09ba51d [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#16 [ffff88017f6d7d50] module_refresher at ffffffffc09ba53e [stap_47650d3377d05db0ab7cbbaa25765809__11657]
#17 [ffff88017f6d7d60] process_one_work at ffffffff810da9cc
#18 [ffff88017f6d7de8] worker_thread at ffffffff810dafe6
#19 [ffff88017f6d7e48] kthread at ffffffff810e44cf
#20 [ffff88017f6d7f50] ret_from_fork_nospec_begin at ffffffff818958dd

Note the deadlock due to _stp_transport_trylock_relay_inode recursing
onto itself via mutex_trylock.

This is a temporary fix for the issue until a proper patch is made to
remove the mutex_trylock from __stp_print_flush. This should be reverted
when that patch lands (it will have something to do with bulkmode).

commit | commitdiff | tree

Sultan Alsawaf [Wed, 2 Dec 2020 19:27:47 +0000 (11:27 -0800)]

task_finder_vma: add kfree_rcu() compat for old kernels

Newer RHEL 6 kernels have kfree_rcu(), but older ones do not. Using
kfree_rcu() is beneficial because it lets the RCU subsystem know that
the queued RCU callback is low-priority, and can be deferred, hence why
we don't replace kfree_rcu() with call_rcu() outright. Luckily,
kfree_rcu() is a macro so we can just #ifdef with it.

commit | commitdiff | tree

Sultan Alsawaf [Wed, 2 Dec 2020 19:07:11 +0000 (11:07 -0800)]

runtime_context: disable preempt while holding the context

After the context lock was converted to an atomic in the previous
commit, the preempt disable logic disappeared. Add it back.

commit | commitdiff | tree

Alice Zhang [Fri, 27 Nov 2020 18:45:41 +0000 (13:45 -0500)]

Conscious language initiatives: replaced whitelist->passlist, blacklist->blocklist, master->main/primary. Some occurrences of master and slave cannot be replaced at this point, e.g. established terminology or the interfaces of other programs.

commit | commitdiff | tree

Sultan Alsawaf [Wed, 2 Dec 2020 02:47:04 +0000 (18:47 -0800)]

runtime_context: replace _stp_context_lock with an atomic variable

We can't use any lock primitives here, such as spin locks or rw locks,
because lock_acquire() has tracepoints inside of it. This can cause a
deadlock, so we have to roll our own synchronization mechanism using an
atomic variable.
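
One way such a lock-free scheme could look, sketched with C11 atomics in userspace. The message does not describe the actual design, so the holder counter, the stop flag, and all function names here are assumptions. Holders announce themselves with an atomic increment and back out if teardown has begun; teardown sets the flag and then checks whether any holder remains.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int holders;     /* number of current context holders */
static atomic_bool stopping;   /* set once teardown has begun */

/* Try to take the context without any lock primitive. */
static bool context_tryget(void)
{
    atomic_fetch_add(&holders, 1);
    if (atomic_load(&stopping)) {
        atomic_fetch_sub(&holders, 1);   /* too late: back out */
        return false;
    }
    return true;
}

static void context_put(void)
{
    atomic_fetch_sub(&holders, 1);
}

/* Begin teardown. Returns true if no holder is left; a real caller
 * would keep polling until that is the case. */
static bool context_stop(void)
{
    atomic_store(&stopping, true);
    return atomic_load(&holders) == 0;
}
```

The default sequentially consistent ordering is what makes the increment-then-check on one side and the store-then-check on the other safe against each other.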

commit | commitdiff | tree

Sultan Alsawaf [Tue, 1 Dec 2020 17:54:07 +0000 (09:54 -0800)]

runtime_context: synchronize _stp_context_stop more strictly

We're only reading _stp_context_stop while the read lock is held, so we
can move the modification of it to inside the write lock to ensure
strict memory ordering. As such, it no longer needs to be an atomic_t
variable.

We also don't need to disable IRQs when holding the write lock because
only read_trylock is used from IRQ context, not read_lock, so there's no
possibility of a deadlock occurring.

commit | commitdiff | tree

Ding Hui [Fri, 27 Nov 2020 18:27:34 +0000 (13:27 -0500)]

PR26958: avoid null-deref in buildid verification failure diagnostic

We can't traverse tsk-> if it's NULL.

Tested by <fche> on hand-modified "stap -kp4" stap-symbols.c file.

commit | commitdiff | tree

Sultan Alsawaf [Tue, 24 Nov 2020 18:50:10 +0000 (10:50 -0800)]

runtime_context: factor out RCU usage using a rw lock

We can factor out the RCU insanity in here by just adding in a rw lock
and using that to synchronize _stp_runtime_contexts_free() with any code
that has the runtime context held.

commit | commitdiff | tree

Frank Ch. Eigler [Tue, 17 Nov 2020 21:34:59 +0000 (16:34 -0500)]

PR26665 detect rhel8 (4.18) era kernel_is_locked_down() as procfs trigger

A different older kernel API needs to be probed for rhel8 era detection
of lockdown in effect. Added an (undocumented) $SYSTEMTAP_NOSIGN env
var to override automatic --use-server on lockdown, so that one can
inspect runtime/autoconf* operation locally, without stap-server.

commit | commitdiff | tree

Sultan Alsawaf [Tue, 17 Nov 2020 19:03:53 +0000 (11:03 -0800)]

task_finder: call _stp_vma_done() upon error to fix memory leak

The memory allocated inside stap_initialize_vma_map() is not freed upon
error when the task finder is started because a call to _stp_vma_done()
in the error path is missing. Add it to fix the leak.

commit | commitdiff | tree

Jamie Bainbridge [Tue, 17 Nov 2020 17:50:04 +0000 (12:50 -0500)]

examples: add timestamp to dropwatch.stp

When using dropwatch.stp to troubleshoot packet drops, it is often done
with additional troubleshooting such as packet captures and collections
of other commands like "ethtool -S" or "netstat -s".

To correlate traffic loss events across the various outputs, these
should all have timestamps.

Add ctime timestamp to dropwatch to enable this. Update documentation to
show example timestamp collection.

Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>

commit | commitdiff | tree

Frank Ch. Eigler [Mon, 16 Nov 2020 23:54:11 +0000 (18:54 -0500)]

PR26665: mokutil output parsing tweaks

We encountered secureboot keys in the wild that didn't live up
to the expectations of the current little state machine. Tweaked
regexps to accept Issuer: O= as well as Issuer: CN= lines. At higher
verbosity, it produces output about the parsing process.

commit | commitdiff | tree

Frank Ch. Eigler [Fri, 13 Nov 2020 17:36:07 +0000 (12:36 -0500)]

RHBZ1892179: double default UTRACE_TASK_WORKPOOL

Some workloads were observed to exhaust the previous limit of 288.

commit | commitdiff | tree

Sultan Alsawaf [Tue, 10 Nov 2020 18:03:34 +0000 (10:03 -0800)]

stp_utrace: disable IRQs when holding the bucket spin lock

This lock can be acquired from inside an IRQ, leading to a deadlock:

WARNING: inconsistent lock state
4.14.35-1902.6.6.el7uek.x86_64.debug #2 Tainted: G           OE
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
sh/15779 [HC1[1]:SC0[0]:HE0:SE1] takes:
(&(lock)->rlock#3){?.+.}, at: [<ffffffffc0c080b0>] _stp_mempool_alloc+0x35/0xab [orxray_lj_lua_fgraph_XXXXXXX]
{HARDIRQ-ON-W} state was registered at:
  lock_acquire+0xe0/0x238
  _raw_spin_lock+0x3d/0x7a
  utrace_task_alloc+0xa4/0xe3 [orxray_lj_lua_fgraph_XXXXXXX]
  utrace_attach_task+0x136/0x194 [orxray_lj_lua_fgraph_XXXXXXX]
  __stp_utrace_attach+0x57/0x216 [orxray_lj_lua_fgraph_XXXXXXX]
  stap_start_task_finder+0x12e/0x33f [orxray_lj_lua_fgraph_XXXXXXX]
  systemtap_module_init+0x114d/0x11f0 [orxray_lj_lua_fgraph_XXXXXXX]
  _stp_handle_start+0xea/0x1c5 [orxray_lj_lua_fgraph_XXXXXXX]
  _stp_ctl_write_cmd+0x28d/0x2d1 [orxray_lj_lua_fgraph_XXXXXXX]
  full_proxy_write+0x67/0xbb
  __vfs_write+0x3a/0x170
  vfs_write+0xc7/0x1c0
  SyS_write+0x58/0xbf
  do_syscall_64+0x7e/0x22c
  entry_SYSCALL_64_after_hwframe+0x16e/0x0
irq event stamp: 9454
hardirqs last  enabled at (9453): [<ffffffffa696c960>] _raw_write_unlock_irqrestore+0x40/0x67
hardirqs last disabled at (9454): [<ffffffffa6a05417>] apic_timer_interrupt+0x1c7/0x1d1
softirqs last  enabled at (9202): [<ffffffffa6c00361>] __do_softirq+0x361/0x4e5
softirqs last disabled at (9195): [<ffffffffa60aeb76>] irq_exit+0xf6/0x102

other info that might help us debug this:
Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(lock)->rlock#3);
  <Interrupt>
    lock(&(lock)->rlock#3);

*** DEADLOCK ***

no locks held by sh/15779.

stack backtrace:
CPU: 16 PID: 15779 Comm: sh Tainted: G           OE   4.14.35-1902.6.6.el7uek.x86_64.debug #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x81/0xb6
print_usage_bug+0x1fc/0x20d
? check_usage_backwards+0x130/0x12b
mark_lock+0x1f8/0x27b
__lock_acquire+0x6e7/0x165a
? sched_clock_local+0x18/0x81
? perf_swevent_hrtimer+0x136/0x151
lock_acquire+0xe0/0x238
? _stp_mempool_alloc+0x35/0xab [orxray_lj_lua_fgraph_XXXXXXX]
_raw_spin_lock_irqsave+0x55/0x97
? _stp_mempool_alloc+0x35/0xab [orxray_lj_lua_fgraph_XXXXXXX]
_stp_mempool_alloc+0x35/0xab [orxray_lj_lua_fgraph_XXXXXXX]
_stp_ctl_get_buffer+0x69/0x215 [orxray_lj_lua_fgraph_XXXXXXX]
_stp_ctl_send+0x4e/0x169 [orxray_lj_lua_fgraph_XXXXXXX]
_stp_vlog+0xac/0x143 [orxray_lj_lua_fgraph_XXXXXXX]
? _stp_utrace_probe_cb+0xa4/0xa4 [orxray_lj_lua_fgraph_XXXXXXX]
_stp_warn+0x6a/0x88 [orxray_lj_lua_fgraph_XXXXXXX]
function___global_warn__overload_0+0x60/0xac [orxray_lj_lua_fgraph_XXXXXXX]
probe_67+0xce/0x10e [orxray_lj_lua_fgraph_XXXXXXX]
_stp_hrtimer_notify_function+0x2db/0x55f [orxray_lj_lua_fgraph_XXXXXXX]
__hrtimer_run_queues+0x132/0x5c5
hrtimer_interrupt+0xb7/0x1ca
smp_apic_timer_interrupt+0xa5/0x35a
apic_timer_interrupt+0x1cc/0x1d1
</IRQ>

commit | commitdiff | tree

Alice Zhang [Tue, 10 Nov 2020 18:11:13 +0000 (13:11 -0500)]

PR13838: Add float32 support and corresponding test cases

runtime/softfloat.* & runtime/softfloat/: add f32 support and f32 to f64
conversion
tapset/floatingpoint.stp: fixed some documentation typos & add f32_tp_f64
tapset function
testsuite/buildok/floatingpoint.stp: add f32 related test cases
main.cxx: add float parameter to sdt_benchmark_thread function for test purpose

runtime/softfloat.c & tapset/floatingpoint.stp : delete unnecessary functions
to keep the code concise

commit | commitdiff | tree

Frank Ch. Eigler [Wed, 11 Nov 2020 03:13:53 +0000 (22:13 -0500)]

RHBZ1892179: handle exhausted stp_task_work structs

In utrace_report_syscall_entry and _exit, there is a possibility of
dereferencing a NULL pointer, in case __stp_utrace_alloc_task_work
exhausts UTRACE_TASK_WORK_POOL_SIZE live elements. While OOM is
still a possibility, this patch handles it more gracefully.
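
A minimal user-space sketch of the pattern, assuming a fixed-size pool whose allocator returns NULL when every slot is live. `POOL_SIZE`, `pool_alloc` and `report_event` are hypothetical names chosen for illustration, not the runtime's code.

```c
#include <stddef.h>

#define POOL_SIZE 4   /* hypothetical; stands in for UTRACE_TASK_WORK_POOL_SIZE */

struct work_item { int in_use; int payload; };

static struct work_item pool[POOL_SIZE];

/* Return a free slot, or NULL when every slot is live. */
static struct work_item *pool_alloc(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            return &pool[i];
        }
    }
    return NULL;
}

static void pool_free(struct work_item *w)
{
    w->in_use = 0;
}

/* The caller checks for exhaustion instead of dereferencing blindly.
 * Returns 0 on success, -1 if the pool was exhausted. */
static int report_event(int payload)
{
    struct work_item *w = pool_alloc();
    if (w == NULL)
        return -1;            /* degrade gracefully: skip this report */
    w->payload = payload;
    return 0;
}
```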

commit | commitdiff | tree

Frank Ch. Eigler [Wed, 11 Nov 2020 01:23:46 +0000 (20:23 -0500)]

releng: update-po

regenerate po/* files

commit | commitdiff | tree

Frank Ch. Eigler [Wed, 11 Nov 2020 01:15:48 +0000 (20:15 -0500)]

PR26665: relayfs-on-procfs megapatch, rhel6 tweaks

A few more compatibility macros needed to be moved over to transport/procfs.c.

commit | commitdiff | tree

Frank Ch. Eigler [Tue, 10 Nov 2020 03:06:15 +0000 (22:06 -0500)]

pre-release: version timestamping, NEWS tweaks

commit | commitdiff | tree

Frank Ch. Eigler [Tue, 10 Nov 2020 02:13:20 +0000 (21:13 -0500)]

pre-release: regenerate example index

commit | commitdiff | tree

Frank Ch. Eigler [Tue, 10 Nov 2020 01:45:09 +0000 (20:45 -0500)]

pre-release: update-docs

Regenerate man pages and pdf docs.

commit | commitdiff | tree

Frank Ch. Eigler [Tue, 10 Nov 2020 01:18:31 +0000 (20:18 -0500)]

testsuite tweak: buildok/floatingpoint.stp chmod a+x

commit | commitdiff | tree

Frank Ch. Eigler [Tue, 10 Nov 2020 00:18:19 +0000 (19:18 -0500)]

PR26665: relayfs-on-procfs megapatch

On platforms/configurations where debugfs is inaccessible (I'm
side-eyeing at you, secureboot + kernel_lockdown), the stap runtime
needs another way to hook up the relayfs / .cmd files to talk to
staprun/stapio in userspace.  kernel relayfs users all rely on
debugfs (tied closely to struct dentry*), and filesystems where
dentry*'s are not immediately available are SOL.

Until now.  This gigapatch forks pieces of runtime/transport/transport.c
into debugfs and procfs alternatives. The debugfs fork is just like
before. The procfs fork is new, and uses a proc_dir_entry <-> struct
path look-up table to map between procfs objects and the dentry*'s
that relayfs so loves.

The debugfs alternative is default, except when lockdown mode is
detected; then the runtime chooses procfs_p at the strategic moment.
stap -DSTAP_TRANS_PROCFS or -DSTAP_TRANS_DEBUGFS lets the user
override this heuristic.  (Going to a procfs default is worth
considering at some point.)

The staprun/stapio userspace is updated to search both
/sys/kernel/debug/systemtap and /proc/systemtap for the relay/.cmd
file endpoints.

Most of this gigapatch is moving code around in runtime/transport/ so
relay_v2 is agnostic to its enclosing filesystem, going through hooks
in transport.c to either procfs.c or debugfs.c.  The old
runtime/procfs.c file is stripped down to move common bits around a
little.

Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
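
The selection heuristic described above might look roughly like this. It is a sketch under assumptions: the struct, enum and function names are invented for illustration, and only the two directory paths come from the commit message.

```c
#include <stddef.h>

/* Hypothetical hook table; the real transport hooks are not shown here. */
struct transport_ops {
    const char *name;
    const char *root;   /* where staprun/stapio would look for endpoints */
};

static const struct transport_ops debugfs_ops = {
    "debugfs", "/sys/kernel/debug/systemtap"
};
static const struct transport_ops procfs_ops = {
    "procfs", "/proc/systemtap"
};

/* Models -DSTAP_TRANS_DEBUGFS / -DSTAP_TRANS_PROCFS being given or not. */
enum override { OVERRIDE_NONE, OVERRIDE_DEBUGFS, OVERRIDE_PROCFS };

/* debugfs by default, procfs under lockdown, unless the user overrides. */
static const struct transport_ops *
choose_transport(int lockdown, enum override o)
{
    if (o == OVERRIDE_DEBUGFS) return &debugfs_ops;
    if (o == OVERRIDE_PROCFS)  return &procfs_ops;
    return lockdown ? &procfs_ops : &debugfs_ops;
}
```

Keeping the filesystem-specific parts behind one table is what lets the relay code stay agnostic to its enclosing filesystem.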

commit | commitdiff | tree

Frank Ch. Eigler [Thu, 5 Nov 2020 18:51:49 +0000 (13:51 -0500)]

transport relay_v2: drop "dropped" facility

Nothing's consuming the "dropped" debugfs file as per
-D_STP_USE_DROPPED_FILE, so drop this logic for simplicity.

Signed-off-by: Frank Ch. Eigler <fche@redhat.com>

commit | commitdiff | tree

William Cohen [Mon, 9 Nov 2020 18:01:06 +0000 (13:01 -0500)]

Initialize variable in runtime/softfloat.c to avoid RHEL8 -Werror issue

Make sure that the variable is initialized to something to avoid the
following error when running the testsuite on RHEL8:

attempting command stap -p4 floatingpoint.stp -c "stap --benchmark-sdt"
OUT In file included from /tmp/stapBRN9va/stap_825f154f474bfd5b2080a28426f65178_4743_src.c:37:
/usr/share/systemtap/runtime/softfloat.c: In function 'softfloat_shiftRightJamM':
/usr/share/systemtap/runtime/softfloat.c:132:34: error: 'ptr' may be used uninitialized in this function [-Werror=maybe-uninitialized]
     uint32_t wordJam, wordDist, *ptr;
                                  ^~~
cc1: all warnings being treated as errors
make[3]: *** [scripts/Makefile.build:315: /tmp/stapBRN9va/stap_825f154f474bfd5b2080a28426f65178_4743_src.o] Error 1
make[2]: *** [Makefile:1544: _module_/tmp/stapBRN9va] Error 2
WARNING: kbuild exited with status: 2
Pass 4: compilation failed.  [man error::pass4]
child process exited abnormally
RC 1
FAIL: systemtap.examples/general/floatingpoint build
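
For readers unfamiliar with this class of warning, here is a small illustrative function. It is not the softfloat routine from the log; `last_nonzero` is a hypothetical example of a pointer that is assigned only on some paths.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only. `last` is assigned only inside the loop, so a
 * compiler that cannot prove the loop body runs may warn under
 * -Wmaybe-uninitialized. An initial value avoids the warning and makes
 * the empty case explicit. */
static uint32_t last_nonzero(const uint32_t *words, size_t n)
{
    const uint32_t *last = NULL;   /* the initializer is the fix */

    for (size_t i = 0; i < n; i++) {
        if (words[i] != 0)
            last = &words[i];
    }
    return last ? *last : 0;
}
```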

commit | commitdiff | tree

Sultan Alsawaf [Fri, 6 Nov 2020 21:58:29 +0000 (13:58 -0800)]

task_finder2: fix memory leak when task workers fail to get added

None of the error paths for the __stp_tf_task_work_add() calls free the
tf_work allocation when the task_work_add fails. This fixes that.

This also makes a nitpick to __stp_tf_task_worker_fn() to reduce the
critical section of __stp_tf_task_work_list_lock.

Reported-by: Frank Ch. Eigler <fche@redhat.com>
Signed-off-by: Yichun Zhang (agentzh) <yichun@openresty.com>

commit | commitdiff | tree

Aaron Merey [Fri, 6 Nov 2020 21:56:56 +0000 (16:56 -0500)]

man/stapprobes.3stap: Mention nd_syscall argument writing.

commit | commitdiff | tree

Aaron Merey [Fri, 6 Nov 2020 19:31:56 +0000 (14:31 -0500)]

prerelease: update-docs

commit | commitdiff | tree

Aaron Merey [Fri, 6 Nov 2020 19:29:39 +0000 (14:29 -0500)]

tapset/uconversions.stp: Fix format of user_string_n_nofault

Function description needs to be on one line in order for
doc generation to work.

commit | commitdiff | tree

Aaron Merey [Fri, 6 Nov 2020 18:52:24 +0000 (13:52 -0500)]

tapset/uconversions.stp: Fix user_string_n_nofault description.

Fix description to correctly state that the empty string is returned
when userspace data is not accessible

commit | commitdiff | tree

Aaron Merey [Fri, 6 Nov 2020 18:08:47 +0000 (13:08 -0500)]

prerelease: AUTHORS bump

commit | commitdiff | tree

Aaron Merey [Fri, 6 Nov 2020 18:08:13 +0000 (13:08 -0500)]

prerelease: update-po

commit | commitdiff | tree

Sultan Alsawaf [Thu, 5 Nov 2020 21:39:30 +0000 (13:39 -0800)]

PR26144: task_finder2: execute task workers in order

The task finder's task workers need to be executed in the order that
they are added, but the kernel's task_work API doesn't make any ordering
guarantees, so task workers end up getting executed out of order. This
becomes a problem when the mmap callback worker runs after the other two
workers the task finder uses, even though it gets queued beforehand.

We can make the task finder's task workers run in order by wrapping the
task worker API with our own routines to dequeue task workers from a
global list and run them in the correct order. A lot of the scaffolding
needed to achieve this is already present, so this change is not too
invasive.

Signed-off-by: Yichun Zhang (agentzh) <yichun@openresty.com>
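
The ordering idea can be modeled in user space as follows. This is a sketch, not the task_finder2 code: `work_add`, `work_run_next` and `record` are hypothetical, and locking is omitted. Each trigger runs the oldest queued item rather than the item the trigger was registered for.

```c
#include <stddef.h>

/* Hypothetical user-space model: a FIFO list of callbacks. */
struct work {
    void (*fn)(struct work *);
    struct work *next;
    int id;
};

static struct work *head, *tail;

/* Append to the tail so insertion order is preserved. */
static void work_add(struct work *w)
{
    w->next = NULL;
    if (tail)
        tail->next = w;
    else
        head = w;
    tail = w;
}

/* Whatever triggered us, always run the oldest queued item.
 * Returns 1 if an item ran, 0 if the list was empty. */
static int work_run_next(void)
{
    struct work *w = head;
    if (!w)
        return 0;
    head = w->next;
    if (!head)
        tail = NULL;
    w->fn(w);
    return 1;
}

/* Records the order in which callbacks actually ran. */
static int order[8];
static int order_len;

static void record(struct work *w)
{
    order[order_len++] = w->id;
}
```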

commit | commitdiff | tree

Alice Zhang [Thu, 5 Nov 2020 20:51:56 +0000 (15:51 -0500)]

PR13838: update fp systemtap example

testsuite/systemtap.examples/general/floatingpoint.stp&floatingpoint.txt: fixed
typo and add print for initial fp a, b, c

commit | commitdiff | tree

Alice Zhang [Thu, 5 Nov 2020 18:15:21 +0000 (13:15 -0500)]

PR13838: Fix previous commit message (c80f1453eba9430921edd4dc10e93f8d993042da)

commit | commitdiff | tree

Alice Zhang [Thu, 5 Nov 2020 04:07:11 +0000 (23:07 -0500)]

PR13838: add floating point to systemtap.examples

testsuite/systemtap.examples/general/floatingpoint.*: add a demo for extracting
fp and performing some basic fp operations.

instead of printing out results of every operation, print out combinations of two to three
operations.

commit | commitdiff | tree

Alice Zhang [Thu, 5 Nov 2020 04:07:11 +0000 (23:07 -0500)]

PR13838: add floating point to systemtap.examples

testsuite/systemtap.examples/general/floatingpoint.*: add a demo for extracting
fp and performing some basic fp operations.

printing out results of every operation, print out combinations of two to three
operations.

commit | commitdiff | tree

Aaron Merey [Thu, 5 Nov 2020 17:46:31 +0000 (12:46 -0500)]

Makefile.am: Install runtime/softfloat/

Previously the runtime/softfloat directory was not installed when
building systemtap. This led to errors when trying to use systemtap's
floating point facilities.

Modify Makefile.am so that this directory is installed during a build.

commit | commitdiff | tree

Sultan Alsawaf [Mon, 2 Nov 2020 23:53:09 +0000 (15:53 -0800)]

PR26846: task_finder2: fix kernel panics by eliminating in_atomic() usage

With non-PREEMPT kernels (i.e., kernels with CONFIG_PREEMPT=n),
in_atomic() cannot detect when the current context is within a spin lock
or RCU read-side critical section. Since the syscall tracepoints are
executed from within an RCU read-side critical section (see
__DO_TRACE()), this means that in_atomic() won't know that the current
context doesn't allow sleeping. When this happens, we see kernel panics
occurring in stap's registered tracepoints, like this one:

kernel tried to execute NX-protected page - exploit attempt? (uid: 99)
BUG: unable to handle kernel paging request at ffffffffc1ea7040
IP: [<ffffffffc1ea7040>] _stp_module_3+0x0/0xffffffffffed9fc0 [orxray_c_fgraph_XX_3673]
PGD 1c1814067 PUD 1c1816067 PMD 486e4067 PTE 8000000164606063
Oops: 0011 [#1] SMP
CPU: 39 PID: 6934 Comm: sh Kdump: loaded Tainted: G           OE  ------------ T 3.10.0-1062.4.2.el7.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
task: ffff943dc3d5b150 ti: ffff943dc27d4000 task.ti: ffff943dc27d4000
RIP: 0010:[<ffffffffc1ea7040>]  [<ffffffffc1ea7040>] _stp_module_3+0x0/0xffffffffffed9fc0 [orxray_c_fgraph_XX_3673]
RSP: 0018:ffff943dc27d7ea8  EFLAGS: 00010282
RAX: ffffffffc1ea7040 RBX: ffff943dc3d5b150 RCX: ffff943d537f4300
RDX: 0000000000001b16 RSI: ffff943dc3d5b150 RDI: 0000000000000000
RBP: ffff943dc27d7f28 R08: 0000000000000000 R09: 0000000180490016
R10: ffff943d537f4300 R11: ffff943d5cd62930 R12: ffff943dc4e38000
R13: 0000000000001b16 R14: 0000000000001b16 R15: ffff943e519351d0
FS:  0000000000000000(0000) GS:ffff943f76fc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc1ea7040 CR3: 000000016d4b8000 CR4: 0000000000340fe0
Call Trace:
[<ffffffffa6e52c64>] ? do_execve_common.isra.24+0x7e4/0x880
[<ffffffffa6e52f99>] SyS_execve+0x29/0x30
[<ffffffffa738d478>] stub_execve+0x48/0x80

Note that the panic occurs from the execve syscall, where stap has a
tracepoint registered:

rc = STP_TRACE_REGISTER(sched_process_exec, utrace_report_exec);

Panics like this occur in all of stap's registered tracepoints. To fix
them, just defer the mmap callbacks to a task worker all the time. That
way, we never need to worry about handling them in a safe context.

Signed-off-by: Yichun Zhang (agentzh) <yichun@openresty.com>

commit | commitdiff | tree

Frank Ch. Eigler [Wed, 4 Nov 2020 21:27:08 +0000 (16:27 -0500)]

translator: disambiguate runtime errors better

If routine runtime errors occur during execution, the c->last_stmt
variable is printed to the user as the best suspected script location
of the failure.  As an optimization, this variable is not set at every
little point during statement/expression evaluation that are not
likely to cause errors.  But we overlooked one spot where it's
absolutely needed: around function calls, especially into synthetic
embedded-c functions that process $context variables.  That meant that
error messages could misidentify some other recent but nonspecific
point for an error.

Now we add a c->last_stmt set immediately before each function call,
after its actual arguments are executed.  This placement also covers
the case where the arguments themselves might fail during evaluation.
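
As a rough model of the placement, assume a context struct with a `last_stmt` string. Everything except the `last_stmt` field name is hypothetical here, including the helper and the location strings.

```c
#include <stddef.h>

struct context {
    const char *last_stmt;    /* script location reported on error */
    const char *last_error;   /* NULL while no error has occurred */
};

/* Hypothetical embedded-C style helper that fails. */
static long deref_context_var(struct context *c, long addr)
{
    (void)addr;
    c->last_error = "read fault";
    return 0;
}

static void probe_body(struct context *c)
{
    c->last_stmt = "script.stp:3";     /* some earlier statement */
    /* ... cheap statements that do not update last_stmt ... */

    long addr = 0x10;                  /* argument evaluated first */
    c->last_stmt = "script.stp:7";     /* set immediately before the call */
    (void)deref_context_var(c, addr);
}
```

If the second assignment were missing, the error raised by the helper would be attributed to line 3 in this model.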

commit | commitdiff | tree

Serhei Makarov [Wed, 4 Nov 2020 20:50:49 +0000 (15:50 -0500)]

PR26811 WIP: adapt to set_fs() removal in linux 5.10+

WIP since there are still a few faults in evidence e.g. on check.exp whythefail

Introduce STAPCONF_SET_FS to identify if set_fs is present.

After kernel 5.10 on arches removing set_fs(), kernel
addresses should be read/written with get_kernel_nofault and
copy_to_kernel_nofault while user addresses are still read/written
with __get_user and __put_user. So we have wrapper macros
__stp_{get,put}_either which do the right thing on all kernel
versions.

Also, since KERNEL_DS and USER_DS are no longer available, introduce
STP_KERNEL_DS and STP_USER_DS. These map to KERNEL_DS and USER_DS on
older kernels.

Also, modify loc2c-runtime.h dereferencing functions and lookup_bad_addr
to take STP_KERNEL_DS/STP_USER_DS parameters specifying the address space
to dereference in.

commit | commitdiff | tree

Sultan Alsawaf [Sat, 31 Oct 2020 07:02:12 +0000 (00:02 -0700)]

stp_task_work: don't busy poll in stp_task_work_exit()

Instead of doing a busy poll and forcefully sleeping for one jiffy every
time stp_task_work_exit() checks to see if all the task workers are
finished, just use a wait event and have the last task worker wake up
stp_task_work_exit() when it's finished. This is faster and more
efficient, since there's no uninterruptible sleeping for exactly one
jiffy at a time, and there's no polling involved.

Signed-off-by: Yichun Zhang (agentzh) <yichun@openresty.com>

commit | commitdiff | tree

Sultan Alsawaf [Sat, 31 Oct 2020 06:53:17 +0000 (23:53 -0700)]

stp_utrace: reset the correct atomic var when resume work fails to queue

Signed-off-by: Yichun Zhang (agentzh) <yichun@openresty.com>

commit | commitdiff | tree

Martin Cermak [Fri, 30 Oct 2020 17:50:42 +0000 (18:50 +0100)]

Adapt debugpath.exp to the debuginfod feature.

commit | commitdiff | tree

Alice Zhang [Fri, 30 Oct 2020 06:33:01 +0000 (02:33 -0400)]

PR13838: Added basic floating point support to systemtap

runtime/softfloat.*: including floating point type definition
runtime/softfloat/*: all other required auxiliary functions

These are from https://github.com/ucb-bar/berkeley-softfloat-3
by John R. Hauser, thanks!

tapset/floatingpoint.stp: including fp conversion, fp arithmetic and
comparison functions
testsuite/buildok/floatingpoint.stp: including testcase for corresponding
floatingpoint tapset
main.cxx: changed sdt_benchmark part of code for a demo of extracting
floating point

Systemtap supports 64-bit floating point (double type) under IEEE 754.
Conversions (fp <-> long, fp <-> string), arithmetic (add, sub, div, mul, sqrt)
and comparisons between fp values (less than, less than or equal to, equal) are
supported; corresponding tapset functions and test cases are provided as well.
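
Since script-level integers are 64 bits wide, one plausible way to carry a double through them is as its raw bit pattern. The sketch below shows that reinterpretation in plain C. It assumes an IEEE 754 host and is not taken from the tapset; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a double as a 64-bit integer and back.
 * memcpy avoids the undefined behavior of pointer type punning. */
static int64_t double_to_bits(double d)
{
    int64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

static double bits_to_double(int64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Note that this differs from a numeric conversion: the bit pattern of 1.0 is 0x3FF0000000000000, not the integer 1.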

commit | commitdiff | tree

amerey [Fri, 7 Aug 2020 22:58:33 +0000 (18:58 -0400)]

PR26015: Make syscall arguments writable again

Make syscall arguments writable again in non-DWARF probes on kernels
that use syscall wrappers to pass arguments via pt_regs (currently
x86_64 4.17+ and aarch64 4.19+).

For non-DWARF syscall probes also add an additional probe variable
for each syscall string parameter that holds an unquoted version
of the string parameter. Modifying this variable within the probe
will cause the string it holds to be written to the userspace string
buffer that was passed to the syscall.

commit | commitdiff | tree

Sagar Patel [Thu, 29 Oct 2020 23:34:41 +0000 (19:34 -0400)]

PR26015: Add @probewrite predicate.

The @probewrite predicate checks whether an identifier has been
written to in the probe handler body. The identifier can be either
a script variable or target variable. @probewrite(var) returns 1
if var has been written to in the probe handler body, else 0.

For example,

probe foo = begin { var = 0 }, { if (@probewrite(var)) println(var) }
probe foo { var = 1 }

The @probewrite predicate would resolve to 1 in this case and the
new value of var would be printed.

1) Added probewrite_op.
2) Designed probewrite_evaluator to resolve @probewrite checks.
3) Designed symuse_collecting_visitor (similar to varuse_collecting_visitor).
4) Updated several other visitors accordingly.
5) Added test cases.
6) Updated NEWS.

commit | commitdiff | tree

Aaron Merey [Mon, 20 Jul 2020 18:22:34 +0000 (14:22 -0400)]

Allow individual probes to have both a prologue and epilogue.

Also add new syntax for defining combined prologue and epilogue:
'probe ALIAS = PROBE { <prologue> }, { <epilogue> }'

commit | commitdiff | tree

Yichun Zhang (agentzh) [Thu, 29 Oct 2020 23:23:06 +0000 (16:23 -0700)]

NEWS: mentioned the utrace task hash table optimization

Also mentioned the default hash table size increase.

commit | commitdiff | tree

Sultan Alsawaf [Thu, 29 Oct 2020 18:25:53 +0000 (11:25 -0700)]

task_finder2: change the default engine action to UTRACE_INTERRUPT

There is a race condition where, right after an engine is attached, a
reporting pass will occur before the engine can actually request what it
wants from the target process. In this case, the action that the engine
used when it was first attached will be carried out during the reporting
pass. When the default action is UTRACE_STOP, this means that the
reporting pass will think the newly-attached engine wants to stop the
target process, at which point the target process will be moved into the
TASK_TRACED state (visible via `ps aux | grep ' t '`) and will be
halted forever (until it receives a SIGKILL) because the engine will
never send a UTRACE_RESUME request to bring the target process back to
life. This seems to be an issue with the UTRACE_STOP machinery; it's not
clear how *any* process entering the UTRACE_STOP state can exit that
state naturally. It's also dubious whether the UTRACE_STOP state is even
needed, since tracing is done from within task workers that run inside
the context of the process we're trying to analyze, which allows us to
safely analyze the process without needing to stop it.

Regardless, it's clear that a newly-attached engine would definitely not
want to stop the process it's trying to analyze; after all, there's
nothing interesting to see if the process is just halted. The common
engine action seems to be UTRACE_INTERRUPT, so let's set that to be the
default instead of UTRACE_STOP.

Signed-off-by: Yichun Zhang (agentzh) <yichun@openresty.com>

commit | commitdiff | tree

Yichun Zhang (agentzh) [Thu, 29 Oct 2020 18:06:42 +0000 (11:06 -0700)]

task_finder2: don't attach to forked children when the target PID is specified

When we have a PID specified for tracing and a fork occurs from our
target PID, the forked child will have the same exe as our target and
will subsequently get matched and attached to by
__stp_utrace_attach_match_filename(). Attaching to these children is not
productive though, since we are only interested in a specific process.

Therefore, as an optimization, only bother trying to attach to forked
children when the target PID is *not* specified. When the target PID is
specified (via -x PID) and match_tsk != path_tsk, we know that a fork
just occurred and match_tsk is the child of path_tsk, in which case
we should just skip attaching to match_tsk.

Signed-off-by: Yichun Zhang (agentzh) <yichun@openresty.com>
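
The condition described above reduces to a small predicate. In this sketch tasks are modeled as plain integers and `should_attach` is a hypothetical name; only `match_tsk`, `path_tsk` and the -x PID option come from the commit message.

```c
#include <stdbool.h>

/* Hypothetical model: tasks are identified by plain integers here, and
 * target_pid == 0 stands for "no -x PID given". */
static bool should_attach(int target_pid, int match_tsk, int path_tsk)
{
    if (target_pid != 0 && match_tsk != path_tsk)
        return false;   /* freshly forked child of the target: skip it */
    return true;
}
```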

commit | commitdiff | tree

Sultan Alsawaf [Thu, 29 Oct 2020 08:40:49 +0000 (01:40 -0700)]

Bug: deadlocks might happen in the spinlocks when -DDEBUG_MEM is specified

Now we always save the irq state in our debug mem allocator's spinlocks.

One sample CPU soft lockup backtrace in the stap ko:

https://gist.github.com/agentzh/68d4ef9574f69595c5d19da3688b8981

Signed-off-by: Yichun Zhang (agentzh) <yichun@openresty.com>

commit | commitdiff | tree

Sultan Alsawaf [Thu, 29 Oct 2020 08:24:43 +0000 (01:24 -0700)]

task_finder: error out when we cannot attach to _stp_target

In order to avoid sleeping, stap_find_exe_file() does a trylock attempt
on an mm's mmap semaphore and returns NULL when the lock is contended.
When this happens, it can cause the task finder to not attach to a
desired target process. This is especially noticeable when a target PID
is specified, in which case the target PID itself can get skipped over
by the task finder.

Therefore, we should treat failures to get the exe file for a specific
target PID as fatal, since that means the target PID will never get
attached. Note that we must return a negative value from
stap_start_task_finder() in order for the fatal error to be honored, so
we shouldn't negate PTR_ERR(mmpath).

Signed-off-by: Yichun Zhang (agentzh) <yichun@openresty.com>
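
The sign convention at issue can be shown with user-space imitations of the kernel's error-pointer helpers. These are simplified stand-ins written for this sketch, and `find_exe` and `start` are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* User-space imitations of the kernel's ERR_PTR-style helpers. */
#define MAX_ERRNO 4095

static void *err_ptr(long err)      { return (void *)(intptr_t)err; }
static long  ptr_err(const void *p) { return (long)(intptr_t)p; }
static int   is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup that fails with an encoded -ENOENT. */
static void *find_exe(int found)
{
    static int exe;
    return found ? (void *)&exe : err_ptr(-ENOENT);
}

static int start(int found)
{
    void *p = find_exe(found);
    if (is_err(p))
        return (int)ptr_err(p);   /* already negative: do not negate again */
    return 0;
}
```

Negating the decoded value would turn it into a positive number, which a caller that tests for `rc < 0` would not treat as fatal.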

commit | commitdiff | tree

Frank Ch. Eigler [Wed, 28 Oct 2020 00:02:18 +0000 (20:02 -0400)]

testsuite: current.stp module("*") defang

Like for server_concurrency*, the current.stp test case has excessive
debuginfo requirements. We still want -some- decent workload, so
chose usb* as the module wildcard. Far smaller than the "*" there
formerly.

commit | commitdiff | tree

Sultan Alsawaf [Tue, 27 Oct 2020 22:00:49 +0000 (15:00 -0700)]

stp_utrace: replace task_utrace_lock with non-blocking RCU read locks

The global task_utrace_lock is highly contended and results in a lot of
CPU time wasted spinning on it, especially since it's not a r/w lock.

It turns out we can replace all of the task_utrace_lock usage with
non-blocking RCU read locks instead to improve performance. Now, reads
to any of the hash list buckets containing the utrace entries do not
block and can occur concurrently with other readers, and writes to any
hash list won't block readers thanks to the magic of RCU. The only
locking needed is between concurrent writes to a single hash list, and
a per-bucket spin lock is used to achieve this instead of a sprawling
global lock.

Signed-off-by: Yichun Zhang (agentzh) <yichun@openresty.com>

commit | commitdiff | tree

Stan Cox [Tue, 27 Oct 2020 21:21:44 +0000 (17:21 -0400)]

man stapprobes.3stap: Document tls context variable support

commit | commitdiff | tree

Stan Cox [Tue, 27 Oct 2020 15:23:06 +0000 (11:23 -0400)]

Merge branch 'scox/tls': Add tls support.

This merges support for accessing implicit tls variables.

Given a DW_OP_GNU_push_tls_address dwarf entry,
tls.stp::__push_tls_address handles navigating the tls data
structures. stp_tls.h contains minimal versions of a few essential
tls data structures.

commit | commitdiff | tree

Frank Ch. Eigler [Tue, 27 Oct 2020 14:38:11 +0000 (10:38 -0400)]

testsuite: reduce server_concurrency* debuginfo requirements

These tests were using super broad module-name wildcards,
which puts unnecessary stress on debuginfo provision.

commit | commitdiff | tree

Yichun Zhang (agentzh) [Tue, 27 Oct 2020 03:33:09 +0000 (20:33 -0700)]

NEWS: added an entry for the VMA map RCU lock changes

This is for commit 4b937c5e9.

commit | commitdiff | tree

Frank Ch. Eigler [Tue, 27 Oct 2020 02:01:24 +0000 (22:01 -0400)]

stap-prep: check for debuginfod capability

Test /usr/bin/debuginfod-find for a vdso*so in the kernel. If
successful, avoid downloading big kernel debuginfo files now,
assuming that the debuginfo server(s) will remain available.

commit | commitdiff | tree

Stan Cox [Mon, 26 Oct 2020 19:53:54 +0000 (15:53 -0400)]

Support tls variables on s390
