William Cohen [Thu, 2 May 2019 14:41:59 +0000 (10:41 -0400)]
Force correct order of evaluation of macro arguments in check_*register macros
Noted that a number of tests were failing on x86 machines with errors
like the following:
ERROR: register access fault [man error::fault] near identifier 'module_name' at
/usr/share/systemtap/tapset/linux/context.stp:392:10
The problem was traced to the maxregno argument for the macro having a
?: operator which has lower precedence than || or >. This caused the
conditional tests in check_fetch_register and check_store_register for
error reporting to incorrectly trigger. Used ()'s in the conditionals
to force the correct order of evaluation.
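The precedence pitfall can be reproduced outside C. What follows is a minimal Python sketch, not the actual check_*register macro: it emulates textual macro substitution with str.replace() and eval(). It relies on Python's conditional expression, which binds more loosely than "or" and ">" in the same way that C's ?: binds more loosely than || and >. The grouping is analogous to the C case, not identical. The template text, the MAXREGNO placeholder, and the limits 15 and 7 are hypothetical.

```python
# Emulate textual macro expansion: the argument text is pasted in verbatim,
# as a preprocessor would do. All names and limits here are illustrative.
TEMPLATE_UNSAFE = "regno < 0 or regno > MAXREGNO"
TEMPLATE_SAFE = "regno < 0 or regno > (MAXREGNO)"

# Hypothetical macro argument that contains a conditional expression.
MAXREGNO_ARG = "15 if is_64bit else 7"


def expand(template, arg):
    """Paste the argument text into the template without adding parentheses."""
    return template.replace("MAXREGNO", arg)


def is_fault(template, regno, is_64bit):
    """Evaluate the expanded range check; a truthy result means 'report a fault'."""
    expr = expand(template, MAXREGNO_ARG)
    return bool(eval(expr, {}, {"regno": regno, "is_64bit": is_64bit}))
```

With the unparenthesized template, the expansion groups as "(regno < 0 or regno > 15) if is_64bit else 7", so a valid register number on the non-64-bit path yields the truthy value 7 and a spurious fault. The parenthesized template keeps the conditional expression together as the upper bound.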
William Cohen [Tue, 23 Apr 2019 19:08:08 +0000 (15:08 -0400)]
Adjust syscall_get_arguments to match kernel's implementation
The syscall_get_arguments function arguments changed due to
Linux git commit 32d9258662. Remove the unused arguments
to match the expected arguments for syscall_get_arguments
when needed.
William Cohen [Wed, 10 Apr 2019 18:55:05 +0000 (14:55 -0400)]
Disable kprobe optimization again
On x86 processors running the Linux 5.0 kernel, the uprobes_onthefly.exp
test would trigger an RCU hang (PR24416). Disable the kprobes
optimization until these problems reported in RHBZ1697531 get fixed in
the kernel.
David Ward [Mon, 11 Feb 2019 17:25:38 +0000 (12:25 -0500)]
overload.py: Fix python version 2/3 compatibility
The modified XML tree is output either as a bytearray with UTF-8
encoding in python version 3, or as a string in python version 2.
Handle this by writing the bytearray directly to sys.stdout.buffer,
or the string directly to sys.stdout, respectively.
Remove what appears to be "troubleshooting code" that was added in
commit 616ec7a0b, which dumps a large amount of unnecessary output
to stderr.
Call this script using the configured program name for python.
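The stdout handling described above can be sketched as follows. This is an illustration under assumptions, not the code in overload.py: it uses the standard library's xml.etree.ElementTree (the script may use a different XML library), and write_tree() and capture() are hypothetical names.

```python
import io
import sys
import xml.etree.ElementTree as ET


def write_tree(root, stream=None):
    """Serialize an element tree as UTF-8 and write it to a stdout-like stream."""
    if stream is None:
        stream = sys.stdout
    data = ET.tostring(root, encoding="utf-8")
    # Python 3: data is a bytes object, so write to the underlying binary
    # stream. Python 2: data is a str and sys.stdout has no .buffer attribute,
    # so the stream itself is used.
    target = getattr(stream, "buffer", stream)
    target.write(data)
    target.flush()


def capture(root):
    """Demonstration helper: run write_tree() against an in-memory stream."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="utf-8")  # stands in for sys.stdout
    write_tree(root, wrapper)
    return raw.getvalue()
```

Using getattr() keeps one code path for both interpreter versions instead of branching on sys.version_info.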
David Ward [Mon, 11 Feb 2019 17:25:37 +0000 (12:25 -0500)]
configure: Fix handling of python versions 2 and 3
When python version 2 is not found, AM_PROG_PYTHON sets the output
variable PYTHON to ":" (which is intentional; see "man 1P colon").
Fix incorrect tests that compared PYTHON to an empty string.
Use the same behavior for python version 3: when it is not found,
set the output variable PYTHON3 to ":" and test that accordingly.
Pass the variables "python3" and "py3execdir" to the subconfigure
unconditionally, just like the variables "python" and "pyexecdir".
When a program named "python" exists, fix a conditional that tests
if it is python version 3.
Do not guess the name of the python-config script. Simply append
"-config" to the program name for the python interpreter.
wcohen discovered that the (guru-mode) @*register operator doesn't
sufficiently check the context it is run in, possibly dereferencing
null context->*regs pointers, or going out-of-bounds with register
numbering. This code adds checking via a generic runtime/**/loc2c*
check_register_{fetch,store} macro. It is used as a wrapper for
all architectures for both kernel and user register
fetch/store ops.
William Cohen [Mon, 1 Apr 2019 15:25:06 +0000 (11:25 -0400)]
Add needed arch_syscall0_prefix define for arm64
On x86_64, a function that implements a syscall with no arguments
is used for both the 32-bit and 64-bit versions of the system call and
there are aliases for the same function. To avoid having handlers run
twice the arch_syscall0_prefix only instruments the 64-bit versions.
However, this prefix was not being set for arm64 and on the arm64 the
syscalls with no arguments would fall back to the tracepoint versions.
Added the arch_syscall0_prefix define to have the syscall tapsets use
the non-dwarf function probes for those syscalls with no arguments on
arm64.
In order to ensure a more welcoming environment for vegans and cows,
all instances of 'deadbeef' in the stapbpf interpreter's memory space
have been replaced by an exhortation to 'ea7bee75' ('eat beets').
William Cohen [Sun, 31 Mar 2019 19:47:36 +0000 (15:47 -0400)]
Update _stp_sockopt_optname_list[] to match current socket.h defines
There have been a number of updates and additions to the Linux
kernel's include/uapi/asm-generic/socket.h defines since the code in
aux_syscalls.stp for _stp_sockopt_optname_list[] was initially
created. Defines such as SO_RCVTIMEO, SO_SNDTIMEO, and SO_TIMESTAMP
may be replaced by SO_RCVTIMEO_NEW, SO_SNDTIMEO_NEW, and
SO_TIMESTAMP_NEW. Before this patch systemtap scripts would fail to
build with very new 5.1.0-rc kernels due to the missing defines.
Frank Ch. Eigler [Tue, 26 Mar 2019 20:31:24 +0000 (16:31 -0400)]
PR24239 redux: testsuite / dump fallout on incremental resolution
Earlier PR24239 work made global / function resolution incremental
(transitive, starting from references in end-user scripts) rather than
tapset-wide (selecting entire tapsets en masse). This also affected
--dump-functions mode (which should be unselective), and
global-printing mode (the ordering of the output variables changed).
Updated the test suite to tolerate some different orderings, and
updated the translator to fix dumping & pragma/c variable arity.
Serhei Makarov [Tue, 26 Mar 2019 17:05:26 +0000 (13:05 -0400)]
PR23875 :: support string map keys in foreach iteration
* bpf-translate.cxx (bpf_unparser::visit_foreach_loop): Add code to handle
and correctly allocate space for string map keys.
* bpf-interp.cxx (as_ptr): New overload yielding void * from char *,
to return cached string map keys into bpf registers.
(typedef map_int_keys): Renamed from map_keys, only handles caching integer keys.
(typedef map_str_keys): New typedef, handles caching string keys.
(struct map_keys): New struct (XXX pseudo-union) for either int or str key iteration.
(map_get_next_key): Rewrite to support both int and str key iteration,
take bpf_transport_context.
(bpf_interpret): Pass bpf_transport_context to map_get_next_key.
Serhei Makarov [Fri, 22 Mar 2019 19:28:48 +0000 (15:28 -0400)]
stapbpf PR24329,PR23816 :: Properly allocate space for map value lookup.
* stapbpf/bpfinterp.cxx (bpf_interpret): new vector map_values for map value storage,
get rid of lookup_tmp, replace return with branch to cleanup code for map_values,
properly allocate a correctly-sized buffer for each bpf_lookup_elem() operation,
cleanup map_values() on exit.
* stapbpf/bpfinterp.h (bpf_transport_context::map_attrs): new field used to pass
map size information to bpf_interpret.
(bpf_transport_context::bpf_transport_context): take map_attrs argument.
* stapbpf/stapbpf.cxx (init_perf_transport): pass map_attrs to bpf_transport_context.
(main): pass map_attrs to bpf_transport_context.
William Cohen [Fri, 22 Mar 2019 17:51:37 +0000 (13:51 -0400)]
Fix the speculate.stp test
The speculate.stp test used target variables for syscall.*.return
probes. The changes to the syscall tapsets to use non-dwarf
probes in most cases broke this example. Added appropriate probes
on syscall entries to record the needed information.
mcermak noticed that recent PR24239 work broke resolution for script
globals referenced only from embedded-c functions, because at
symresolution-time, we didn't look inside those. We now do, just
once, and store the resolved vardecl*'s in new embeddedcode fields.
This also means that we never have to search those strings for later
pragma:read:* and pragma:write:* again, just use the vectors produced
earlier.
NB: same transformation is still needed for the embedded_expr case.
Martin Cermak [Wed, 20 Mar 2019 13:17:48 +0000 (14:17 +0100)]
Make testcase bz1027459.exp less sensitive to dwarf quality.
This update tries to avoid the following type of problem:
=======
spawn stap /root/.mcermak/systemtap/testsuite/systemtap.base/bz1027459.stp
semantic error: not accessible at this address (pc: 0xc000000000a14f28) [man error::dwarf]: identifier '$call' at /usr/local/share/systemtap/tapset/linux/sysc_accept.stp:68:14
dieoffset: 0x57216fd from /usr/lib/debug/lib/modules/4.14.0-115.el7a.ppc64le/vmlinux
function: SyS_socketcall at net/socket.c:2438
alternative locations: [0xc000000000a14f30,0xc000000000a14f7c], [0xc000000000a14f7c,0xc000000000a153e8]
source: if (__int32($call) != @const("SYS_ACCEPT")) next;
=======
This one was happening e.g. with kernel-4.14.0-115.el7a.ppc64le.
Commit 126c384c causes bz1027459.stp to warn about cross-file global variable
reference to identifier 'syscall_string_trunc' at linux/syscalls_cfg_trunc.stp.
After this update, the testcase ignores warnings like this.
Martin Cermak [Wed, 20 Mar 2019 08:50:47 +0000 (09:50 +0100)]
Fix trivial ppc64* compilation issue.
Before this update, GCC on powerpc complained about stapbpf.cxx:1504:35: error:
format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type
‘__u64 {aka long unsigned int}’.
William Cohen [Tue, 19 Mar 2019 20:04:43 +0000 (16:04 -0400)]
Optimize nettop.stp example
Restructured the script to eliminate the two foreach loops in
print_activity() that summed transmit and receive entries. Instead, just
tallied those directly into global statistical aggregate ifmerge
entries, then just have a single foreach loop in print_activity() to
print the entries. Eliminated the unneeded ternary operators, since
the @sum() of a nonexistent entry is 0.
Also adjusted the formatting so long device names do not skew the
columns.
William Cohen [Tue, 19 Mar 2019 18:48:02 +0000 (14:48 -0400)]
Optimize sig_by_pid.stp, sig_by_proc.stp, and syscalls_by_proc.stp
The probe points being used in these examples could be hit quite
frequently. Using the statistical aggregates can reduce the amount of
locking required in the associated probe handlers and allow the probe
handlers to complete more quickly, reducing the amount of overhead
that the instrumentation introduces.
William Cohen [Tue, 19 Mar 2019 18:18:31 +0000 (14:18 -0400)]
Use statistical aggregates in iodevstats.stp, iostats.stp, and iotop.stp
The vfs.read.return and vfs.write.return probe points being used in
these examples could be hit quite frequently. Using the statistical
aggregates can reduce the amount of locking required in the associated
probe handlers and allow the probe handlers to complete more quickly,
reducing the amount of overhead that the instrumentation introduces.
This merges a simple transport layer design which uses
perf_events to send each printf as a series of messages,
freeing stapbpf from the deliberately-oppressive constraints
of the trace_printk() mechanism.
e.g. printf("%8d %8d %6s\n", x, y, foo) is sent as
- STP_PRINTF_START(3)
- STP_PRINTF_FORMAT(#1) -> index into a table of format strings
- STP_PRINTF_ARG_LONG(x)
- STP_PRINTF_ARG_LONG(y)
- STP_PRINTF_ARG_STR(foo) -> requires an strcpy() to compose the message
- STP_PRINTF_END()
Which involves six calls to BPF_perf_event_output from
the kernel-side BPF program. Each CPU has its own perf_events
buffer to avoid entangling parallel printfs().
For now, use separate messages to avoid putting pressure on the
very limited stack and avoid complicated packing code. Further
down the line, we could combine small (LONG) arguments into
a several-argument message, or merge STP_PRINTF_START and STP_PRINTF_FORMAT.
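The message sequence above can be modelled in a few lines. This is a hedged Python sketch of the protocol shape only, not stapbpf's implementation: the STP_PRINTF_* names come from the text, while encode_printf(), TransportContext, and the tuple encoding of a message are hypothetical.

```python
# Message kinds named in the description above.
STP_PRINTF_START = "STP_PRINTF_START"
STP_PRINTF_FORMAT = "STP_PRINTF_FORMAT"
STP_PRINTF_ARG_LONG = "STP_PRINTF_ARG_LONG"
STP_PRINTF_ARG_STR = "STP_PRINTF_ARG_STR"
STP_PRINTF_END = "STP_PRINTF_END"


def encode_printf(format_table, fmt, args):
    """Sender side: split one printf into a sequence of (kind, payload) messages."""
    msgs = [(STP_PRINTF_START, len(args)),
            (STP_PRINTF_FORMAT, format_table.index(fmt))]
    for arg in args:
        kind = STP_PRINTF_ARG_STR if isinstance(arg, str) else STP_PRINTF_ARG_LONG
        msgs.append((kind, arg))
    msgs.append((STP_PRINTF_END, None))
    return msgs


class TransportContext:
    """Receiver side: state for one in-progress printf (one context per CPU)."""

    def __init__(self, format_table):
        self.format_table = format_table
        self.format_no = None
        self.expected = 0
        self.args = []

    def handle(self, msg):
        """Consume one message; return the formatted text on END, else None."""
        kind, payload = msg
        if kind == STP_PRINTF_START:
            self.expected, self.format_no, self.args = payload, None, []
        elif kind == STP_PRINTF_FORMAT:
            if not 0 <= payload < len(self.format_table):
                raise ValueError("format index out of bounds")
            self.format_no = payload
        elif kind in (STP_PRINTF_ARG_LONG, STP_PRINTF_ARG_STR):
            self.args.append(payload)
        elif kind == STP_PRINTF_END:
            if self.format_no is None or len(self.args) != self.expected:
                raise ValueError("incomplete printf message sequence")
            return self.format_table[self.format_no] % tuple(self.args)
        return None
```

Keeping one context per CPU buffer means a half-received printf from one CPU cannot be interleaved with arguments arriving from another.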
William Cohen [Tue, 19 Mar 2019 00:23:19 +0000 (20:23 -0400)]
Use statistical aggregates to reduce overhead and contention for global array
Statistical aggregates record and update information on a per cpu
basis. This avoids the lock needed by the ++ operation previously
used. For monitoring of high frequency events such as system calls
across the entire machine this can improve performance.
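The difference between a locked counter and a per-cpu aggregate can be pictured with the following Python sketch. It only illustrates the data layout; Python threads do not model per-CPU execution, and LockedCounter and PerCpuCount are hypothetical names rather than SystemTap runtime types.

```python
import threading


class LockedCounter:
    """One shared value: every increment has to take the same lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def add(self, n=1):
        with self._lock:
            self.value += n


class PerCpuCount:
    """One slot per CPU: writers touch only their own slot, readers merge."""

    def __init__(self, ncpus):
        self._slots = [0] * ncpus

    def add(self, cpu, n=1):
        # No shared lock is needed: the slot belongs to the updating CPU.
        self._slots[cpu] += n

    def total(self):
        # The merging cost is paid only when the value is read.
        return sum(self._slots)
```

The trade-off is that reads become more expensive, which suits counters that are updated on every event but printed only occasionally.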
Serhei Makarov [Fri, 15 Mar 2019 20:00:16 +0000 (16:00 -0400)]
stapbpf PR22330 fix :: support for non-contiguous active cpus
This is rather rudimentary (e.g. would be good to listen for cpus
coming online / going offline during the script execution). But
there is a minimum fix needed to ensure stapbpf doesn't break completely
if the CPU arrangement is unusual.
* stapbpf/stapbpf.cxx (cpu_online): new vector tracking active cpus.
(CPUFS, CPUS_ONLINE, CPUS_POSSIBLE): new defines pointing to /sys/devices/system/cpu/.
(mark_active_cpus): new function to parse /sys/devices/system/cpu/online.
Parsing a diagnostic file is not my favourite approach
but I haven't found a better API for this yet.
(count_active_cpus): new function to return number of active (marked) cpus.
(instantiate_maps): invoke mark_active_cpus() when setting up perf_events map.
(init_perf_transport): skip perf transport setup for inactive cpus.
(perf_event_loop): only set up poll_fds for active cpus.
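For reference, the contents of /sys/devices/system/cpu/online are a comma-separated list of CPU numbers and ranges, such as "0-3,5". A minimal Python sketch of the parsing step follows; it is not the C++ code in stapbpf.cxx, and parse_cpu_list() and mark_active() are hypothetical names.

```python
def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-3,5,7-8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or a stray comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def mark_active(text, ncpus_possible):
    """Return a boolean vector with True at the index of each online CPU."""
    online = set(parse_cpu_list(text))
    return [cpu in online for cpu in range(ncpus_possible)]
```

A boolean vector indexed by CPU number lets later setup code skip inactive CPUs with a simple lookup.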
Jafeer Uddin [Wed, 13 Mar 2019 20:12:45 +0000 (16:12 -0400)]
PR24327: Remove printing of unused synthetic globals
The handling of DW_OP_GNU_entry_value introduces synthetic
global variables that after optimizations may end up not
being used. The default behaviour is to print out the values
of unused globals in user scripts. This printing is intended
for user defined globals and shouldn't occur with the generated
globals.
William Cohen [Wed, 13 Mar 2019 19:24:32 +0000 (15:24 -0400)]
Make varwatch.stp work regardless of kprocess.release and a bit more flexible
With newer kernels the kprocess.release is inlined in some places and
the desired argument $p is unavailable. Make the script simply
overwrite older entries in the var associative array.
The script would only work for variables that resulted in strings.
Someone may want to use this script to monitor a single variable that
is an integer. Having the "probe never" force the associative array
to store strings would cause a problem when monitoring a single integer
variable. Removed the "probe never", as the normal use cases of this
script will have something set the type of the associative array. If
nothing does, there aren't any value changes to monitor.
Serhei Makarov [Tue, 12 Mar 2019 17:45:45 +0000 (13:45 -0400)]
stapbpf PR22330 fixes :: identify format types of pe_unknown arguments
* bpf-translate.cxx (bpf_unparser::emit_print_format): take tok for diagnostics,
extract non-literal printf components and pass them to printf_arg_type;
signal an error if the number of printf components doesn't match the number of args.
(printf_arg_type): take format_component argument and use it to infer the
format type in the case of pe_unknown arguments.
(bpf_unparser::emit_transport_msg): pick format_type based on perf_event_type,
which was inferred from the format string in the case of pe_unknown arguments.
(bpf_unparser::visit_embeddedcode): pass tok to emit_print_format.
(bpf_unparser::visit_print_format): remove some old TODOs, pass tok to emit_print_format.
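Inferring an argument's transport type from its format component can be sketched as below. This is a simplified Python illustration, not the logic in bpf-translate.cxx: the regular expression covers only common printf conversions, and infer_arg_types() and check_printf() are hypothetical names.

```python
import re

# One conversion: flags, width, precision, optional length modifier, conversion.
_CONVERSION = re.compile(r"%[-+ #0]*\d*(?:\.\d+)?(?:ll|l|h)?([diouxXcsp%])")


def infer_arg_types(fmt):
    """Return 'str' or 'long' for each argument-consuming conversion in fmt."""
    types = []
    for match in _CONVERSION.finditer(fmt):
        conv = match.group(1)
        if conv == "%":
            continue  # '%%' is a literal percent sign and consumes no argument
        types.append("str" if conv == "s" else "long")
    return types


def check_printf(fmt, args):
    """Pair each argument with its inferred type, or fail on a count mismatch."""
    types = infer_arg_types(fmt)
    if len(types) != len(args):
        raise ValueError("format expects %d args, got %d" % (len(types), len(args)))
    return list(zip(types, args))
```

The count check mirrors the error described above for a mismatch between printf components and arguments.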
Serhei Makarov [Mon, 11 Mar 2019 19:29:32 +0000 (15:29 -0400)]
stapbpf emit_string_copy() fix :: handle the case of src == NULL
Sometimes a string variable can end up containing NULL (e.g. on map lookup).
This upsets the verifier terribly if the string is passed to emit_string_copy().
NULL case was previously unhandled as trace_printk() did not require a
string copy and did not care about being given a NULL argument
(although it did have an annoying behaviour of silently dropping
the entire printf).
Replace this with a more robust assumption that NULL ~~ "".
* bpf-translate.cxx (emit_simple_literal_str): take const string as input.
(emit_string_copy): rename join_block to return_block,
copy "" string if src == NULL (using zero-pad code if zero_pad is true
and using a call to emit_simple_literal_str otherwise).
* bpf-internal.h (emit_simple_literal_str): take const string as input.
Serhei Makarov [Mon, 11 Mar 2019 17:51:33 +0000 (13:51 -0400)]
stapbpf PR22330 WIP :: transport layer bugfixes and cleanup, round 1 of n
There are still testsuite regressions that need to be fixed after this.
* bpf-translate.cxx (bpf_unparser::emit_transport_msg): Fix for non-literal string args,
fix to code for ensuring double-word alignment of transport message.
* stapbpf/bpfinterp.cxx (bpf_sprintf): TODO Need to support more sprintf arguments.
(bpf_handle_transport_msg): remove debug print, make sure to flush after printing.
* stapbpf.cxx (perf_event_loop): handle EINTR as an ordinary occurrence (from ^C),
common code between exit from EINTR and exit from STP_EXIT message.
(main): Reorder ioctl to enable kprobes, now after launching perf_events listener.
* bpfinterp.cxx (bpf_handle_transport_msg): Fix to handle LONG arguments properly!!!
Serhei Makarov [Mon, 11 Mar 2019 17:45:26 +0000 (13:45 -0400)]
stapbpf PR22330 WIP :: assembler tweaks to support _send_exit_msg()
Fixed _send_exit_msg() with a couple of assembly aids to make the code less error-prone.
* bpf_translate.cxx (asm_stmt::align_alloc): New field to force alignment.
(bpf_unparser::parse_imm): Support BPF_F_CURRENT_CPU as a constant.
(bpf_unparser::parse_asm_stmt): Optional 3rd (align|noalign) argument for alloc.
(bpf_unparser::emit_asm_arg): Support BPF_F_CURRENT_CPU as a constant.
(bpf_unparser::visit_embeddedcode): Align to double-word if align_alloc == true.
Serhei Makarov [Fri, 8 Mar 2019 22:53:48 +0000 (17:53 -0500)]
stapbpf PR22330 WIP :: change exit.stp to send perf_events message
* tapset/bpf/exit.stp (_set_exit_status): comment fix.
(_send_exit_msg): new embedded-code function sending STP_EXIT.
* tapset/logging.stp (exit): switch to using _send_exit_msg.
Serhei Makarov [Fri, 8 Mar 2019 21:48:13 +0000 (16:48 -0500)]
stapbpf PR22330 WIP :: add interned strings section to the ELF file, invoke fprintf
* bpf-internal.h (globals::intern_string): move from standalone intern_str() in bpf-translate.cxx.
(globals::interned_strings): rename from interned_str_map, change to vector.
(globals::interned_str_map): rename from interned_strings.
* bpf-translate.cxx (globals::intern_string): move from standalone intern_str() in bpf-translate.cxx,
adapt to changing fields in globals.
(bpf_unparser::emit_transport_msg): remove USE_INTERNED_STR define (we will always do this),
use BPF_TRANSPORT_ARG instead of magic number,
adapt to moved globals::intern_string.
(output_stapbpf_script_name): populate missing data->d_type.
(output_interned_strings): new function generating pseudo-STRTAB-ish section with format strings.
(bpf_unparser::add_prologue): call output_interned_strings() after codegen.
* stapbpf/stapbpf.cxx (interned_strings): new vector -- global table of interned strings.
(init_perf_transport): pass interned_strings at transport_context creation.
(load_bpf_file): locate and parse stapbpf_interned_strings pseudo-STRTAB-ish section,
populate global interned_strings.
(main): pass interned_strings at transport_context creation.
* stapbpf/bpfinterp.h (bpf_transport_context::interned_strings): new field referencing global strings table.
(bpf_transport_context::printf_format): delete unneeded field.
(bpf_transport_context::bpf_transport_context): adapt to changing fields.
* stapbpf/bpfinterp.cxx (bpf_handle_transport_msg): check for ctx->format_no out of bounds,
retrieve format_str from ctx->interned_strings,
generate an ugly hardcoded call to fprintf with 32 arguments.
* stapbpf/bpfinterp.h (bpf_transport_context): new data structure
to hold in-progress transport calls and references to needed global state.
(bpf_handle_transport_msg): new function.
(bpf_interpret): change to take a bpf_transport_context.
* stapbpf/bpfinterp.cxx (bpf_handle_transport_msg): WIP new function;
decodes and handles transport message.
(bpf_interpret): change to take a bpf_transport_context;
add a handler for BPF_FUNC_perf_event_output.
* stapbpf/stapbpf.cxx (transport_contexts): new array;
holds a transport_context for each perf_events fd to disentangle printf() from different threads.
(init_internal_globals): fix errno usage.
(init_perf_transport): new function (separated out from init_internal_globals);
in addition to creating perf_event fds, create a transport_context for each one.
(struct perf_event_sample): new structure; apparently this is the format we get perf_events in.
(perf_event_handle): fill in implementation inspired by kernel tools/testing/selftests/bpf/trace_helpers.c
(perf_event_loop): change error handling,
add STP_EXIT handling,
avoid issuing multiple warnings.
(main): add init_perf_transport() call,
use modified bpf_interpret() interface,
clean up transport_contexts.
* bpf-internal.h (BPF_TRANSPORT_VAL): new define for message format.
(BPF_TRANSPORT_ARG): new define for message format.
(globals::STP_PRINTF_ARG_LONG): specify arg type; replaces STP_PRINTF_ARG.
(globals::STP_PRINTF_ARG_STR): specify arg type; replaces STP_PRINT_ARG.
* bpf-translate.cxx (bpf_unparser::emit_transport_msg): use BPF_TRANSPORT_* defines.
(printf_arg_type): new function; choose correct STP_PRINTF_ARG_*.
(bpf_unparser::emit_print_format): use printf_arg_type.
Jafeer Uddin [Tue, 5 Mar 2019 14:48:28 +0000 (09:48 -0500)]
PR16596, PR24224: rework handling of DW_OP_GNU_entry_value
The DW_OP_GNU_entry_value handling introduced in commit 68bd23fd0cc
wasn't complete. The value being passed in for addr in dwarf_derived_probe
was not correct for all cases and gave 'inconsistent-relocation-address'
errors. Also the implementation was only able to place entry probes at the
beginning of the function being probed and doesn't handle the case where
you may need to add an entry probe for a different function.
William Cohen [Thu, 7 Mar 2019 14:37:53 +0000 (09:37 -0500)]
Do the search for save_stack_trace_regs() in the stap module's initialization
Previously, the search for save_stack_trace_regs() was done the first
time the _stp_stack_print_fallback was called. However, we want to avoid
calling kallsyms_lookup_name() in some arbitrary probe handler as it
may take a significant amount of time to search for
save_stack_trace_regs(). Move this search to where other
initialization operations are done and there are fewer constraints on
running it.
William Cohen [Wed, 6 Mar 2019 21:39:12 +0000 (16:39 -0500)]
Use kallsyms_lookup_name to find save_stack_trace_regs() for fallback unwind
This simplifies the fallback unwinding code to use
save_stack_trace_regs() if it is available. The save_stack_trace_regs()
function is available on more architectures. This eliminates the code using the
x86 specific unwind_start(), unwind_done(), and
unwind_get_return_address() functions.
Serhei Makarov [Tue, 5 Mar 2019 20:16:41 +0000 (15:16 -0500)]
stapbpf PR22330 WIP :: deliver perf_events on correct cpu, poll with timeout
* stapbpf/stapbpf.cxx (instantiate_maps): TODO issue with CPU numbering.
(init_internal_globals): zero pe_attr every time,
notify on the very first event,
fix typo in bpf_map_update call,
add diagnostic output for log_level>2.
(perf_event_loop): poll at an interval (1000),
add diagnostic output for log_level>2.
William Cohen [Mon, 4 Mar 2019 16:14:31 +0000 (11:14 -0500)]
If available, use kernel's save_stack_trace_regs() for fallback stack unwind
When the dwarf information is unavailable SystemTap resorts to
alternative mechanisms to unwind the stack. Only the x86 kernels have
unwind_start(), unwind_done(), and unwind_get_return_address(). A
number of architectures including the s390, powerpc, and openrisc
currently export save_stack_trace_regs(), and it is expected that this will be
available on additional architectures such as arm64 in the future.
Add a test to check for the export of the function and use it where
possible.
William Cohen [Wed, 27 Feb 2019 17:22:34 +0000 (12:22 -0500)]
Handle name change of do_brk to do_brk_flags in the newer kernels
Kernel commit 16e72e9b309 changed do_brk to do_brk_flags. Need to
make the memory tapset allow the use of that alternative function name
in newer Linux kernels.
Serhei Makarov [Mon, 25 Feb 2019 23:24:12 +0000 (18:24 -0500)]
stapbpf PR22330 incomplete WIP :: rudiments of perf_events transport layer
* bpf-internal.h (BPF_MAXPRINTFARGS): New define.
(value::format_type): New field.
(value::value): Set default format_type in constructor.
(globals::perf_event_map_idx): New variable denoting perf_events map.
(globals::NUM_CPUS_PLACEHOLDER): New constant replaced by # CPUs at load time.
(globals::perf_event_type): Initial set of transport msgs.
(globals::interned_str_map): WIP String constants to embed in the ELF file.
(globals::interned_strings): Tracks already interned strings.
* bpf-translate.cxx (print_format_add_tag): Delete both variants.
(bpf_unparser::this_in_arg0): Bugfix -- default to NULL.
(bpf_unparser::emit_transport_msg): WIP New function.
(bpf_unparser::visit_embeddedcode): Allow more printf args, don't add tag.
(intern_string): New function to create an interned string.
(bpf_unparser::emit_print_format): Use helper call for sprintf and perf_event
messages for regular printf.
(bpf_unparser::visit_print_format): Allow more printf args, don't add tag.
(build_internal_globals): Create a perf_events map.
(bpf_unparser::add_prologue): Use globals::internal_map_idx for consistency.
* stapbpf.cxx (perf_fds): New variable (one perf fd per CPU).
(perf_headers): New variable (one ring buffer per CPU).
(perf_event_page_size): New variable.
(perf_event_page_count): New variable.
(perf_event_mmap_size): New variable.
(instantiate_maps): Replace NUM_CPUS_PLACEHOLDER for perf_event maps.
(init_internal_globals): Initialize perf_fds and perf_headers.
(perf_event_handle): WIP Will handle transport messages.
(perf_event_loop): WIP Listens for transport messages.
(print_trace_output): WIP Will be deleted.
(main): Replace print_trace_output with perf_event_loop.
* libbpf.h (bpf_perf_event_ret): New enum from libbpf.
(bpf_perf_event_print_t): New typedef from libbpf.
(bpf_perf_event_read_simple): New function from libbpf.
* libbpf.c (ring_buffer_read_head): New function imitating ring_buffer.h.
(ring_buffer_write_tail): New function imitating ring_buffer.h.
(bpf_perf_event_read_simple): New function from libbpf.
William Cohen [Thu, 21 Feb 2019 17:21:13 +0000 (12:21 -0500)]
Add more __NR_* for missing defines on aarch64 (and ppc64)
There are a number of syscall defines (__NR_*) on the x86_64 that are
not on aarch64 (or ppc64). They need to have something defined for
those undefined constants, so the kernel module code for the fallback
tp_syscall.* and tp_syscall.*.return probes compile and do not throw
"'__NR_blah' undeclared" errors.
Most of the added __NR_* defines are for aarch64, but the ppc64 also
needs the __NR_compat_bdflush.
Frank Ch. Eigler [Wed, 20 Feb 2019 22:51:43 +0000 (17:51 -0500)]
PR24239: avoid symbol/type resolution of unused globals/functions
From the earliest days of stap tapset support, the logic in the
symbol-resolution pass has been to select the entire contents of a
tapset file for processing, if any single part was referenced from the
end-user script. This meant that if just one string-processing
function, one time-fetcher, or one syscall flag decoder was needed,
stap still spent time symbol-processing all their neighbours ... only
to throw them away in semantic_pass_opt1.
New code avoids doing that, by moving individual globals & functions
to the session.globals/.functions list during symbol resolution's
find_var() / find_functions() calls. Some conflict resolution must
similarly be pulled to this point, but that's minor.
There should be no semantic change visible to scripts. Diagnostics
at verbosity 3 now trace resolution a little better.
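The change from whole-file selection to incremental selection amounts to a reachability walk starting at the user script. A minimal Python sketch of that idea follows; it is not the translator's code, and resolve_incrementally() and its dictionary representation of tapset definitions are hypothetical.

```python
def resolve_incrementally(script_refs, tapset_defs):
    """Return the tapset names reachable from the script, in discovery order.

    script_refs: names referenced directly by the end-user script.
    tapset_defs: maps each tapset global/function name to the names it references.
    """
    selected = []
    seen = set()
    worklist = list(script_refs)
    while worklist:
        name = worklist.pop()
        if name in seen or name not in tapset_defs:
            continue  # already selected, or not defined by any tapset
        seen.add(name)
        selected.append(name)
        # Only the references of a selected definition are followed, so
        # unrelated neighbours in the same tapset file are never touched.
        worklist.extend(tapset_defs[name])
    return selected
```

Anything that is never reached from the script is simply never processed, rather than being processed and discarded later.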
William Cohen [Wed, 20 Feb 2019 19:42:21 +0000 (14:42 -0500)]
Correct the at_register.exp test
The fix for PR23359 in commit c664daa requires guru mode for
@kregister use. Turning guru mode on so the at_register.stp test will
build and run. Also corrected the register name to get the correctly
sized values for i386 (32-bit) and x86_64 (64-bit).
William Cohen [Tue, 19 Feb 2019 21:15:37 +0000 (16:15 -0500)]
Group initialization using the same string literal together in syscall_num.stp
On arm64 and powerpc instruction sequences are required to create a
pointer to an arbitrary memory address. The compiler attempts to
minimize the number of times a particular address value is created and
will store that value in the stack frame rather than regenerating it.
There are a lot of pointer to literal strings being used in the
syscall_num.stp initialization code. This results in large stack
frames being created. The kernel module will then fail to compile due
to an error like the following:
[root@apm-mustang-b0-03 general]# stap -kp4 stopwatches.stp
/tmp/stapCwv6n4/stap_17506_src.c: In function ‘probe_6233’:
/tmp/stapCwv6n4/stap_17506_src.c:19796:1: error: the frame size of 2480 bytes is larger than 512 bytes [-Werror=frame-larger-than=]
}
^
cc1: all warnings being treated as errors
make[1]: *** [scripts/Makefile.build:291: /tmp/stapCwv6n4/stap_17506_src.o] Error 1
make: *** [Makefile:1566: _module_/tmp/stapCwv6n4] Error 2
WARNING: kbuild exited with status: 2
The dump-syscalls.sh script now groups all the initialization using
the same string literal together and localizes its use. This reduces
the size of the stack frame on arm64 and powerpc.
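The grouping step can be sketched as follows. This is a hedged Python illustration of the reordering only; the real generator is the dump-syscalls.sh shell script, and the table names, the emitted line format, and the function names here are hypothetical.

```python
from collections import OrderedDict


def group_by_literal(entries):
    """Group (table, number, name) entries by name, keeping first-seen order."""
    groups = OrderedDict()
    for table, number, name in entries:
        groups.setdefault(name, []).append((table, number))
    return groups


def emit_init(entries):
    """Emit one assignment per entry, with all uses of a literal kept adjacent."""
    lines = []
    for name, uses in group_by_literal(entries).items():
        for table, number in uses:
            lines.append('%s[%d] = "%s"' % (table, number, name))
    return lines
```

With every use of a given literal adjacent, the compiler has less reason to keep each string's address alive in the stack frame across the whole initialization function.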
William Cohen [Mon, 18 Feb 2019 14:58:37 +0000 (09:58 -0500)]
Regenerate the syscall mapping information to add aarch32 to aarch64 syscalls
The kernel CONFIG_COMPAT flag enables aarch32 syscalls on aarch64.
Need to generate those mappings as they are needed for some kernels.
As a side effect of the regeneration all the syscall mappings will get
the recently added kernel syscalls.
William Cohen [Mon, 18 Feb 2019 14:52:49 +0000 (09:52 -0500)]
Update dump-syscalls.sh to generate 32-bit syscalls for aarch64
The aarch64 kernel can support aarch32 syscalls when CONFIG_COMPAT is set
in the kernel configuration (https://cateee.net/lkddb/web-lkddb/COMPAT.html).
This is set on the aarch64 Fedora 29 kernels and causes builds to
fail with:
OUT semantic error: unresolved arity-1 global array __syscall_32_num2name, missing global declaration?: identifier '__syscall_32_num2name' at /usr/share/systemtap/tapset/linux/syscall_table.stp:8:16
source: return __syscall_32_num2name[num]
^
Pass 2: analysis failed. [man error::pass2]
Adding generation of the 32-bit syscall information for aarch64 to
address this.
William Cohen [Mon, 18 Feb 2019 04:05:12 +0000 (23:05 -0500)]
Match arm64 non-dwarf syscall probe points
The x86_64 linux 4.17 prefixes the syscall function names. The
systemtap tapsets use the arch_syscall_prefix macro to get the proper
prefix. The arm64 kernel also has prefixes for the syscall function
names, but the arch_syscall_prefix needs to be __arm64_ not __x86_64_
or __ia32_. Adjusted syscalls.stpm to pick the appropriate prefix for
the architecture. This patch makes it easier to do the same for any
other architectures that prefix the syscall names.
Frank Ch. Eigler [Fri, 15 Feb 2019 19:44:47 +0000 (14:44 -0500)]
PR24199: don't use exceptions to signal type-resolution failures
Most callbacks in typeresolution_info::visit_FOO() do the right thing
when there is a type check/inference error: call down to mismatch() or
such, incrementing an error count and printing a message. However, a
group of $context-var-related callbacks have copied a pattern of
throwing semantic-errors. The impact of that is to stop the error
reporting process at the -first- type error. This could hide
useful messages if for some reason they would have come temporally
behind junky ones (miscellaneous tapset function problems).
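The two error-signalling styles contrasted above can be sketched in Python. This is an illustration only, not the typeresolution_info code: mismatch() is named after the function mentioned in the text, while the class, its fields, and the tuple format of the checked items are hypothetical.

```python
class TypeResolver:
    """Counts and records mismatches so that checking can continue."""

    def __init__(self):
        self.num_errors = 0
        self.messages = []

    def mismatch(self, name, expected, actual):
        self.num_errors += 1
        self.messages.append(
            "type mismatch for %s: expected %s, got %s" % (name, expected, actual))

    def check_all(self, items):
        """items: iterable of (name, expected_type, actual_type) tuples."""
        for name, expected, actual in items:
            if expected != actual:
                self.mismatch(name, expected, actual)  # record it and keep going
        return self.num_errors == 0


def check_all_throwing(items):
    """The pattern being removed: the first mismatch aborts the whole pass."""
    for name, expected, actual in items:
        if expected != actual:
            raise TypeError("type mismatch for %s" % name)
    return True
```

With the counting style, a later and more useful message is still reported even when an earlier, less relevant one comes first.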
Frank Ch. Eigler [Thu, 14 Feb 2019 17:49:12 +0000 (12:49 -0500)]
PR24199: at pass-2 verbosity > 3, trace $var error-chaining events
The $context variable mappings may incur errors that are routinely
quietly absorbed, as within @defined() conditionals or "?" probe
points. At verbosity > 3, there is now a trace message printed
into the firehose. For example:
chaining to identifier '$count' at [...] tapset/linux/vfs.stp:982:18
semantic error: conditional branches not supported in DWARF expression [8] at 33 (40: 1, 0)
thrown from: ../systemtap2/loc2stap.cxx:433
William Cohen [Wed, 13 Feb 2019 15:20:46 +0000 (10:20 -0500)]
Adjust noptrace.stp to avoid modifying ptrace syscall arguments
SystemTap has multiple mechanisms to instrument the syscalls. For newer
kernels the non-dwarf versions are more likely to be used than the
dwarf-based ones. Target variables are not available in the non-dwarf
version, making it impossible to change the value of a target variable
$request in syscall.ptrace. An alternative way of forcing the ptrace
calls to fail is now used in noptrace.stp. The capability checks are
instrumented. When the ptrace syscall calls the capability check, the
capability checks are forced to return -EPERM, causing the ptrace
syscalls to fail.
William Cohen [Tue, 12 Feb 2019 21:36:10 +0000 (16:36 -0500)]
Update pfiles.stp to work with Linux 4.17 and newer
The ABI for getname was changed in Linux 4.17 by commit 9b2c45d479d to
eliminate passing in a pointer to the store length information. For
the newer version of the getname function negative return values are
errors and non-negative return values are the length. The changes to
the pfiles.stp script pick the appropriate ABI based on the kernel
version.
Scripts that probe syscall.*.return involve @entry() computations due
to recent syscall machinery changes. These @entry() features in turn
expand to script-level global variables, by the hundreds (one or two
per probe). Then, the probe-condition processing logic that matches
up all globals read in conditions to all globals written in probes is
forced to make O(n**2) searches.
Some improvements:
- previous string-substring-search memoization code
- globals created for @entry() are marked with vardecl->synthetic
- vardecl->unmangled_name set & asserted more frequently
- those globals are excluded from probe-condition processing, since
they can't have been referred to by a user-given condition expression
- condition processing as a whole is shortcut in the typical case of
there being no probe conditions at all
Altogether, these essentially remove the condition processing pass
from the profile of this script, and take -p2 processing time from the
original 70+ seconds to 13.
Martin Cermak [Tue, 12 Feb 2019 15:34:14 +0000 (16:34 +0100)]
Conditionally define __NR_bdflush in systemtap runtime.
The kernel-headers-4.17.0-0.rc6.1.el8+7.aarch64.rpm dropped
__NR_bdflush define, so let's define it conditionally in
compat_unistd.h so that stap scripts can -p4 without
complaints on those kernels.
It turns out that scripts like "--example errsnoop.stp" involve tens
of millions (!) of substring searches, as combinatorial & nesting
factors cause many embedded-c function/statement bodies to be searched
for "/* pragma */" type tags.
We probably do this too much - see the number (166417+70699) of times
varuse_collecting_visitor::visit_embedded* end up being called in
optimization/relaxation loops. But even without delving into that, we
can improve the constant factor: the actual string searching speed.
This patch adds a little memoization widget to interned_string
substring searching. It takes pass-2 runtime from 71s to 22s on my
workstation and costs practically no extra memory. (Storing copies of
interned_strings is cheap.)
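The memoization idea can be sketched in Python as below. This is a hedged illustration, not the C++ widget: how the real cache is keyed and stored is not stated here, and the InternedString class, its cache layout, and the searches counter are hypothetical.

```python
class InternedString:
    """Immutable string wrapper that remembers earlier substring-search results."""

    def __init__(self, text):
        self._text = text
        self._find_cache = {}  # needle -> result of str.find()
        self.searches = 0      # number of real (uncached) searches performed

    def find(self, needle):
        """Return the index of needle, or -1, searching at most once per needle."""
        if needle not in self._find_cache:
            self.searches += 1
            self._find_cache[needle] = self._text.find(needle)
        return self._find_cache[needle]

    def contains(self, needle):
        return self.find(needle) != -1
```

Because the underlying text never changes, a cached result stays valid for the lifetime of the string, so repeated visits in optimization loops cost one dictionary lookup each.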
David Ward [Wed, 30 Jan 2019 05:37:56 +0000 (00:37 -0500)]
Handle installation without stapusr group
Do not cause "make install" to return an error if the stapusr group
cannot be found or created (even as root); continue without setting
the ownership or mode of the installed executables. This may happen
when building distribution packages using fakeroot (it was observed
on Arch Linux). This step is often performed directly in the build
files of the distribution package instead (such as systemtap.spec).