William Cohen [Fri, 10 Aug 2018 20:21:01 +0000 (16:21 -0400)]
Allow syscallerrorsbypid.stp to track syscall 0
The test in the sys_exit tracepoint handler would cause errors for
syscalls numbered 0 to be ignored. On i386 and x86_64 machines
syscall 0 is the read syscall, which we would really like to have
error information about. Adjusted the test to properly handle
syscalls numbered 0.
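
The kind of bug described above can be sketched in Python, where 0 is also falsy. This is only an illustration: the function names and the event tuples are hypothetical and are not taken from syscallerrorsbypid.stp. It shows how a truthiness test on the syscall number silently drops syscall 0, and how an explicit comparison keeps it.

```python
from collections import Counter


def record_errors_buggy(events):
    """Tally failing syscalls, testing the syscall number for truthiness."""
    errors = Counter()
    for pid, nr, ret in events:
        if nr and ret < 0:  # bug: nr == 0 is falsy, so syscall 0 is skipped
            errors[(pid, nr)] += 1
    return errors


def record_errors_fixed(events):
    """Tally failing syscalls, treating 0 as a valid syscall number."""
    errors = Counter()
    for pid, nr, ret in events:
        if nr is not None and nr >= 0 and ret < 0:
            errors[(pid, nr)] += 1
    return errors


# Hypothetical (pid, syscall number, return value) events.
# Syscall 0 is read on x86_64; negative return values are errors.
events = [(100, 0, -11), (100, 1, -9), (100, 0, 5)]
```

With these events, the buggy version records only the failing syscall 1, while the fixed version also records the failing read.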
William Cohen [Thu, 9 Aug 2018 19:41:32 +0000 (15:41 -0400)]
Use the flexible Prometheus formatting to lower cost of recording data
We want to keep the recording code as simple as possible to reduce the
overhead. Recording the syscall number avoids making a function call,
generating a string for each syscall, and has simpler associative
array indexing.
The one downside of this approach shows up on 64-bit machines, where
32-bit and 64-bit syscall numbering and names differ: 32-bit code is
going to get the 64-bit syscall names, which are wrong for the 32-bit
syscalls.
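
A minimal Python sketch of the idea, not the stap script itself: the recording path only increments an integer-keyed counter, and names are looked up when the Prometheus text is generated. The metric name, the function names, and the four-entry name table are assumptions made for illustration.

```python
from collections import Counter

# Tiny excerpt of the x86_64 syscall table; a real table is much larger.
X86_64_NAMES = {0: "read", 1: "write", 2: "open", 3: "close"}

counts = Counter()


def record(pid, nr):
    """Hot path: one integer-keyed increment, no string formatting."""
    counts[(pid, nr)] += 1


def syscall_name(nr):
    """Map a syscall number to a name at output time only."""
    return X86_64_NAMES.get(nr, "unknown_%d" % nr)


def dump(metric="syscall_count"):
    """Render the counters as Prometheus text-format lines."""
    lines = []
    for (pid, nr), n in sorted(counts.items()):
        lines.append('%s{pid="%d",syscall="%s"} %d'
                     % (metric, pid, syscall_name(nr), n))
    return lines


record(100, 0)
record(100, 0)
record(200, 3)
```

The downside mentioned above is visible here too: on i386, syscall 3 is read, so a 32-bit process issuing syscall 3 would be labeled "close" by this 64-bit table.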
William Cohen [Thu, 9 Aug 2018 19:26:27 +0000 (15:26 -0400)]
Allow more flexible Prometheus output formatting
There are cases where we would like to adjust the output of the data
being generated in Prometheus format. For example, storing syscall
numbers to minimize the overhead of recording the information and then
mapping syscall numbers to more symbolic names when generating the
Prometheus formatted data.
The existing prometheus_dump_array* macros work as before and there is
now a matching set of prometheus_dump_array_map* macros that have
additional arguments to pass in mapping functions. For unmodified
fields the sprint function is used. Below is a use where the "count"
and "pid" fields are printed out as the default
@prometheus_dump_array2 would print them and the "syscall" field is
translated from a number to the syscall name by the syscall_name
function.
Serhei Makarov [Tue, 7 Aug 2018 21:33:29 +0000 (17:33 -0400)]
BZ1610289: drop rpm dependency on 'initscripts', standalone systemtap-service
Instead of an initscript, prefer a systemd unit file where systemd is available.
The old initscript is retained as a new utility command 'systemtap-service'
since it includes functionality that can't be controlled by systemd's interface.
* systemtap.service: New unit file for systemd.
* systemtap.spec: Remove dependency on 'initscripts' unless systemd is absent;
install old init script to %{_sbindir} as 'systemtap-service'; include new
unit file for systemd.
* man/systemtap.8.in: document the change.
* NEWS: document the change.
Useful for the fastest compilation speed during development (for example,
compiling elaborate.cxx with -O0 is 42.6% faster than -O1 on my mid-2015
MBP).
syscalls tapset: use (void*)(uintptr_t) cast sequence for ->sregs
We need to be able to take 64-bit ints and plop them even into measly
32-bit pointers, without the compiler having a cow. So cast through
(uintptr_t), like elsewhere.
testsuite: support installcheck-parallel in build=src trees
The installcheck* series of Makefile rules both prereq and
may nuke the site.exp file. For some reason, this hits
build=src tree configurations immediately, and is probably
a race in others. We now explicitly remake that file after
the nested "$(MAKE) clean".
PR23488: support CONFIG_DEBUG_INFO_REDUCED kernels for typequery/tracequery .ko's
This kconfig parameter kills use of @cast() and probably some
kernel.trace() usage, so we override it in those Makefiles.
(PS. Real friends stop friends from reducing debuginfo.)
diagnostics: handle -vvvv better for staptrees mid-elision
Several of the dead-statement type elision passes temporarily
substitute 0 pointers for actual staptree nodes. If coupled with
-vvvv pretty-printing, these 0's had a way of triggering segvs.
Now more of these pretty-printers explicitly test for 0.
Stan Cox [Thu, 2 Aug 2018 02:43:34 +0000 (22:43 -0400)]
Add https handling to http client.
The existing nss server certificate database and access routines are used
except without the assistance of avahi. Server is specified
via --use-http-server=https://HOST:PORT
* configure.ac (openssl): Add openssl_LIBS
* configure: Regenerate
* config.in: Regenerate
* Makefile.am (*_LDADD): Add openssl_LIBS
* Makefile.in: Regenerate
* client-http.cxx (http_client::download_pem_cert)
(http_client::add_server_cert_to_client)
(http_client::check_trust): New
(http_client_backend::find_and_connect_to_server): Call new
methods to do https handling.
(http_client_backend::fill_in_server_info)
(http_client_backend::trust_server_info): Likewise.
* nss_funcs.cxx (nss_get_server_cert_info): Also return the cert pem
(nss_get_server_pw_info): Do the private key handling via spawn
of pk12util and openssl.
* server.cxx (base_dir_rh::GET): Add certificate
(server::start): Handle the certificate and private key.
* nss-server-info.cxx (get_server_info_from_db): Move host name
fetch to get_host_name.
(isDomain): Allow for https prefix
(resolve_host): Also set unresolved_host_name to capture the
original host_name
(nss_get_or_keep_online_server_info): Get servers for https case.
* nsscommon.cxx (get_host_name): Moved from
get_server_info_from_db.
(get_pem_cert_is_valid, cvt_nss_to_pem, get_pem_cert)
(have_san_match): New
(testsuite/lib/http_server.exp): Pause to allow pk12util/openssl
spawn to complete.
(testsuite/systemtap.server_trust.exp): New
David Smith [Mon, 23 Jul 2018 19:13:39 +0000 (14:13 -0500)]
Fix a http client bug with a non-existing executable path.
* client-http.cxx (http_client_backend::include_file_or_directory): If we
can't canonicalize a user path, don't pretend we included the file
successfully.
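
The behaviour described in this entry can be sketched in Python as follows. This is a loose analogue, not the C++ code from client-http.cxx; the function name include_path and the list-based bookkeeping are hypothetical.

```python
from pathlib import Path


def include_path(path, included):
    """Add the canonical form of path to included.

    Returns False, and records nothing, when the path cannot be
    canonicalized (for example because it does not exist).
    """
    try:
        canonical = Path(path).resolve(strict=True)
    except (OSError, RuntimeError):
        return False  # do not pretend the file was included
    included.append(str(canonical))
    return True
```

A caller can then treat a False return as an error instead of continuing with a file that was never added.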
PR21888 WIP: basic bpf variants of logging functions
The prior patch for PR23407 allows some progress on this.
Not everything works yet: PR23435 means some output could get
swallowed in a probe that calls exit(). Moreover assert() does
not work yet -- need to check what must be changed to allow
the string to be passed into a nested call.
* tapset/logging.stp (log): add bpf variant.
(warn): add basic bpf variant.
(error): actually print the error in bpf variant.
David Smith [Fri, 20 Jul 2018 17:46:27 +0000 (12:46 -0500)]
Fix a http server POST data handling bug.
* httpd/server.cxx (connection_info::postdataiterator): Handle POST data
being broken up into several calls. Remove some too verbose status
messages.
* httpd/api.cxx: Remove some too verbose status messages.
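
As a rough illustration of the POST handling fix, the Python sketch below accumulates data for a field across several callback invocations instead of assuming each field arrives in one piece. The class and method names are hypothetical, and the sketch assumes only that the server's callback may be invoked more than once per field.

```python
class PostDataCollector:
    """Collect POST field data that may arrive in several pieces."""

    def __init__(self):
        self._chunks = {}

    def feed(self, key, data):
        """Called once per piece; appends rather than overwrites."""
        self._chunks.setdefault(key, []).append(data)

    def finish(self):
        """Join the pieces of every field once the request is complete."""
        return {key: b"".join(parts) for key, parts in self._chunks.items()}


collector = PostDataCollector()
collector.feed("script", b"probe be")
collector.feed("script", b"gin {}")
collector.feed("kver", b"4.17")
fields = collector.finish()
```

Overwriting on each call instead of appending would leave only the last piece of "script", which is the kind of failure such a fix guards against.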
PR23407 WIP: stapbpf support for strings as first class values
This is a basic patch which defines the STR value_type, denoting
string constants, which are lowered to pointers to literal strings on
the stack by a pass in bpf-opt.cxx. Currently, space for strings is
allocated using the program::use_tmp_space() mechanism. More than
one string literal can be stored on the stack at a time.
Limitations are 256 bytes for format strings, 64 bytes for other strings.
TODO: The code to allocate literal strings can later be integrated
with register allocation, in order to make more efficient use of
limited (512 bytes) stack space. Currently it's a bit greedy.
The next step is to support storing strings in global data structures
(bpf maps). Since bpf map helpers automatically copy data from the stack
to the map value, this should not be difficult to accomplish.
* bpf-internal.h (BPF_MAXSTRINGLEN, BPF_MAXFORMATLEN): New defines.
(enum value::value_type): New value_type STR denoting string constant.
(value::str_val): New field.
(value::value): Add option to set str_val.
(value::mk_str): New method.
(value::is_str): New method.
(value::str): New method.
(program::str_map): New field.
(program::new_str): New method.
* bpf-base.cxx (value::print): Print STR values.
(program::~program): XXX Should clean up str_map.
(program::new_str): New method.
* bpf-opt.cxx (emit_literal_string): Allocate space for a string
literal on the stack, then emit code to store the string in 4-byte chunks.
(lower_str_values): New function. See explanation at the top of the
commit message.
(program::generate): Add lower_str_values pass.
* bpf-translate.cxx (struct bpf_unparser): triage required visitor
functions by comparison with translate.cxx.
(translate_escapes): New function.
(visit_literal_string): New function, convert literal string to STR value.
(visit_compound_expression): BONUS - trying an implementation of this.
(visit_print_format): Create an STR value instead of emitting the
format string code immediately.
* stapbpf/bpfinterp.cxx (remove_tag): Added sanity check while debugging.
* testsuite/systemtap.bpf/bpf_tests/string1.stp: New file (WIP).
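
To make the "store the string in 4-byte chunks" step concrete, here is a small Python sketch that splits a string literal into the 32-bit words that would be written to the stack. It is not the code from bpf-opt.cxx. The function name, the little-endian byte order, and the choice to count the terminating NUL against the 64-byte limit are all assumptions.

```python
import struct

BPF_MAXSTRINGLEN = 64  # limit quoted in the commit message


def string_to_words(s, max_len=BPF_MAXSTRINGLEN):
    """Return the 32-bit words needed to store s, NUL-terminated."""
    data = s.encode("utf-8") + b"\0"
    if len(data) > max_len:
        raise ValueError("string literal too long for the stack slot")
    data += b"\0" * (-len(data) % 4)  # pad up to a multiple of 4 bytes
    return [word for (word,) in struct.iter_unpack("<I", data)]
```

For example, "abcd" needs two words: four bytes of text, then the NUL terminator padded out to a second word. Each word corresponds to one 4-byte store.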
Victor Kamensky [Mon, 9 Jul 2018 16:31:19 +0000 (09:31 -0700)]
dwflpp::function_entrypc avoid usage of uninitialized memory
A failure was observed on the 3.3 release. The failure was elusive and
disappeared after a seemingly random configure option change, or when
the code was compiled with -O1 or -O0 (vs. the default -O2). Running
the failing test case under valgrind memcheck pointed to a couple of
places where 'Conditional jump or move depends on uninitialised
value(s)' occurred. After addressing these in two places in
dwflpp::function_entrypc, the valgrind memcheck run is clean and the
original issue is fixed.
Signed-off-by: Victor Kamensky <kamensky@cisco.com>
William Cohen [Thu, 12 Jul 2018 20:35:30 +0000 (16:35 -0400)]
Add the also_ran.stp script to the stap-export examples
The also_ran.stp script provides a tally of the executables and shared
libraries run on the system. The counts provide some indication of
how frequently particular executables and shared libraries are
used. The executables and shared libraries could be mapped back to the
packages to give an indication of what software packages are being
used on the system.
William Cohen [Thu, 12 Jul 2018 15:56:12 +0000 (11:56 -0400)]
Adjust stap-exporter probe points to work with newer Linux 4.17 kernels
The syscall functions have changed with the Linux 4.17 kernel.
Adjusted example1.stp and example2.stp in the same way the non-dwarf
syscall tapsets were adjusted for similar calls.
stapbpf: add sprintf support to user space interpreter.
This patch implements sprintf for probes that run in stapbpf's user
space BPF interpreter. String support is still very limited. The
return value of sprintf can be assigned to a local variable and
passed to printf as an argument, but not much else. The purpose
of this patch is to add just enough string support for stapbpf
procfs probes to be useful (see PR23285).
* bpf-internal.h: add bpf_func_id for sprintf.
* bpf-translate.cxx (visit_print_format): Add logic for returning
the string instead of calling trace_printk.
* bpfinterp.cxx (bpf_interpret, bpf_sprintf): Add handler for sprintf
call to interpreter.
PR23284 + extra: stapbpf logs loaded BPF programs to dmesg.
The name of the original stap script has been added to the .bo file
generated by bpf-translate.cxx as a new ELF section 'stapbpf_script_name'.
* bpf-internal.h (BPF_MAXSTRINGLEN): New constant, may be configurable in future.
* bpf-translate.cxx (output_stapbpf_script_name): New function.
(translate_bpf_pass): Generate new 'stapbpf_script_name' section.
* stapbpf/stapbpf.cxx (prog_load): Log a notification to dmesg before
loading the BPF program. This is analogous to _stp_print_kernel_info
in the default stap backend.
(load_bpf_file): Obtain module_basename (from module_name) and
script_name (from 'stapbpf_script_name' ELF section).
(main): Open /dev/kmesg as a way to output to dmesg.
stap-exporter: remove wait_for_sess_init, use more descriptive http return codes.
* exporter.py: no longer wait for a stap session to begin after receiving a
request to launch it. Respond with code 301 after launching sessions instead of 200.
Respond with code 501 if the session's procfs file cannot be found.
PR23359: impose security constraints on @kderef, @kregister
* parse.cxx: add privilege check for @kderef and @kregister
* testsuite/parseko/at_kderef.stp: New file to test privilege check
* testsuite/parseko/at_kregister.stp: New file to test privilege check
* testsuite/parseok/at_kderef.stp: New file to test privilege check
* testsuite/parseok/at_kregister.stp: New file to test privilege check
Frank Ch. Eigler [Sat, 30 Jun 2018 21:00:20 +0000 (17:00 -0400)]
PR23356: stap-serverd: switch back to dbm: nss databases
The sqlite sql: nss ones seem to have different
authentication/login/password protocols as on rawhide, and our logic
can't quite play right with them. Maybe it's due to possible
concurrent access from a stap-serverd and stap client (when both run
as root). Whatever it is, plain dbm: seems to restore function,
even if with a warning about "legacy database".
Frank Ch. Eigler [Sat, 30 Jun 2018 03:04:48 +0000 (23:04 -0400)]
PR23356: nsscommon: adopt new defaults/requirements of rawhide nss
The previous key size of 1024 has become invalid on rawhide, where
2048 is the new minimum. Use 4096 in generate_private_key(). Also,
switch add_cert_db_prefix() to the sql: cert-db prefix, in order to
nuke the -8015 SEC_ERROR_LEGACY_DATABASE error.
Previous code made it possible for an incoming request to be processed
in a handle_connection() thread at the same time as the privkey to it
would be disposed-of in the accept_connections() loop. (This could
happen if the latter encountered a cert-validation error immediately
after the handle_connection thread started, thus its loop was exited.)
We now pass a SECKEY_CopyPrivateKey(privKey) to the thread, which it
will dispose of itself. Thus the main thread can do its prior thing.
Frank Ch. Eigler [Fri, 29 Jun 2018 23:58:49 +0000 (19:58 -0400)]
PR23356: stap-serverd: close client socket upon error
setupSSLSocket() was designed to return NULL on error, but it didn't
clean up its partially constructed / given sockets at all. They'd
stay open, and a remote client would be left to wait indefinitely.
Frank Ch. Eigler [Fri, 29 Jun 2018 23:25:45 +0000 (19:25 -0400)]
PR23356: improve stap-serverd nss diagnostics
On rawhide, we can get certificate verification errors that previous
control flow kept quiet about. When it did talk about them, the error
codes were out of nssError()'s assumed range. Now nssError() tries
the PR_ErrorTo* functions regardless of error number range.
David Smith [Thu, 28 Jun 2018 19:26:51 +0000 (14:26 -0500)]
Update fedora_install_package.py to handle Centos and Fedora packages.
* httpd/docker/fedora_install_package.py (split_nvra): New function.
(build_id_symlink_is_valid): Ditto.
(PkgSystem.build_id_is_valid): If we don't have a build id, pretend it
matched. Handle Fedora vs. Centos build id differences.
(PkgSystem.pkg_install): Don't run scripts when installing rpms. They
can cause issues. Handle the case where 'dnf debuginfo' installs the
wrong version of the debuginfo rpm.
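
Splitting a package string of the form name-version-release.arch can be sketched as below. This is not the split_nvra from fedora_install_package.py; the name split_nvra_sketch and the error handling are hypothetical, and the sketch assumes the arch is the text after the last dot and that version and release contain no hyphens.

```python
def split_nvra_sketch(nvra):
    """Split 'name-version-release.arch' into (name, version, release, arch)."""
    rest, _, arch = nvra.rpartition(".")
    rest, _, release = rest.rpartition("-")
    name, _, version = rest.rpartition("-")
    if not (name and version and release and arch):
        raise ValueError("not in name-version-release.arch form: %r" % nvra)
    return name, version, release, arch


parts = split_nvra_sketch("kernel-debuginfo-4.17.3-200.fc28.x86_64")
```

Working from the right matters because package names may themselves contain hyphens, as "kernel-debuginfo" does.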
William Cohen [Wed, 27 Jun 2018 14:56:44 +0000 (10:56 -0400)]
Simplify the initialization logic for cpu_throttle.stp
Rather than ensuring there is an instance for each of the cpus on the
machine just make sure there is an instance for cpu0 to avoid having
an empty array. As processors are throttled the other instances will
be added.
William Cohen [Wed, 27 Jun 2018 03:16:20 +0000 (23:16 -0400)]
Add the cpu_throttle.stp example to stap-exporter scripts
The cpu_throttle.stp script monitors Intel x86 cpu throttling due to
power and thermal constraints. The output is formatted to be
consumable by prometheus.
Frank Ch. Eigler [Thu, 28 Jun 2018 02:32:34 +0000 (22:32 -0400)]
PR23160,PR14690: support /* guru */ embedded-C for CONTEXT->sregs
For the newfangled syscall support, an embedded-C function is needed
to set the CONTEXT->sregs field from any of the new gajillion syscall
probe aliases. Since we want that ->sregs field trustworthy, and not
callable by the user with some random junk pointer, we want this
function to be only callable from the tapset, but not replicated (so
not "private" in each file).
To do this, we extend the parser/elaborator to extend the logic of /*
guru */ markup in embedded-C functions. Namely, calls to such
functions from the tapset (which is parsed with privileged flags) are
now permitted, kind of as if they were a private copy in each tapset
file that called them. The stapfile privileged field, which has
fallen into disuse, is brought back to life and propagates the
parse-time pf_guru flag. It's used with a new visitor that checks
each caller-callee relationship in function calls, moving some similar
code from staptree.cxx.
The syscall.read aliases are adapted to use the new single copy of
__set_syscall_pt_regs() in aux2_syscalls.stp.
Aaron Merey [Wed, 27 Jun 2018 20:51:57 +0000 (16:51 -0400)]
tapset: add stap-exporter utility macros and probe alias
* prometheus.stp: defines a "prometheus" probe as an alias for a procfs read probe
* prometheus.stpm: defines utility macros that create metrics from arrays and writes
them to a procfs file (intended for use within prometheus probes).
Frank Ch. Eigler [Wed, 27 Jun 2018 01:16:27 +0000 (21:16 -0400)]
rhbz1595178: add some inter-subrpm cross-version Conflicts:
Some files (e.g. some man pages, some localization) shared between
subrpms make it impossible to upgrade only some subrpms across
versions, due to file conflicts. We now add some Conflicts: so as to
prevent concurrent installation of different versions of some subrpms.
Frank Ch. Eigler [Mon, 25 Jun 2018 16:34:43 +0000 (12:34 -0400)]
PR23160,PR14690: 32-on-64 bit fixes
After reported crashes with the syscalls.* test cases, found that
32-on-64 bits were b0rked, because pt_regs* addresses were being
truncated and yet later dereferenced in kernel space.
To simplify analysis, added a pt_regs *sregs to the common probe
context, which signifies 'syscall mode' register dumps. This is
different from normal kregs (kernel-space, normal abi, 64-bit-only)
and uregs (user-space, normal abi, either 32- or 64-bit), and needs
custom processing in _stp_syscall_nr and especially _stp_arg2.
The sregs-setter embedded-C function __set_syscall_pt_regs(r)
needs to be private/tapset-guru, but we lack proper /* markup */
for that particular mode. So that function currently needs to
be replicated as private inside each sysc_* file, ugh. Not
for long though.
After this patch, this doesn't quite pass, but the read
parts look good:
sudo make installcheck RUNTESTFLAGS=nd_syscall.exp\ syscall.exp CHECK_ONLY="readwrite"
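
The truncation problem behind this fix can be shown with plain integer arithmetic. The address below is a made-up example value and the helper name is hypothetical; the sketch only shows that squeezing a 64-bit kernel address through a 32-bit value loses the upper half, so the result no longer refers to the original pt_regs.

```python
def truncate_to_32(addr):
    """Keep only the low 32 bits, as a 32-bit-wide value would."""
    return addr & 0xFFFFFFFF


# Made-up 64-bit kernel-space address of a pt_regs structure.
pt_regs_addr = 0xFFFF880012345678

truncated = truncate_to_32(pt_regs_addr)
survived = truncated == pt_regs_addr
```

Here truncated is 0x12345678 and survived is False, which is why dereferencing such a value later in kernel space can crash.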
William Cohen [Thu, 21 Jun 2018 18:06:26 +0000 (14:06 -0400)]
Clean up the exporter.py formatting with autopep8
The formatting in exporter.py did not follow the PEP8 style guide and
pylint would complain about a lot of minor formatting issues. Ran
exporter.py through autopep8 to eliminate those warnings.
William Cohen [Wed, 20 Jun 2018 20:35:03 +0000 (16:35 -0400)]
Use equivalent non-dwarf probe points for example scripts
The dwarf-based kernel.function probes require the installation of
kernel-debuginfo. The examples can be just as easily implemented with
the non-dwarf kprobe.function probes and eliminate the need to install
kernel-debuginfo.
Frank Ch. Eigler [Wed, 20 Jun 2018 18:00:15 +0000 (14:00 -0400)]
tapset: introduce tapset/linux/aux2_syscalls.stp for leaf embedded-c functions
The _stp_syscall_nr embedded-C function may well be needed from the
tracepoint-flavoured syscall probe aliases. For this, it would be
undesirable to necessarily drag in all of the material in
aux_syscalls.stp. So split out a new little aux2_syscalls.stp with
just the basics that minimal nd2_syscall or tp_syscall jobs might
need.
Frank Ch. Eigler [Mon, 18 Jun 2018 18:57:21 +0000 (14:57 -0400)]
PR23160,PR14690: uregs setter macro for use from kernel context
When a syscall interjection mechanism gives us a pt_regs*
structure for the syscall parameters/context, we can pretend
as though it were a user-space probe.
Frank Ch. Eigler [Sat, 16 Jun 2018 08:36:00 +0000 (04:36 -0400)]
pass2 elaboration: tweak diagnostics
While debugging 4.17-style syscalls, it was tricky to figure out the
probe derivation process while it was underway, with nested aliases
and optional/sufficient probe points. Tweak verbosity numbers and
messages to give a good overview at the --vp 02 level.
While in the vicinity, introduced a session
suppress_costly_diagnostics counter, which is used to suppress
levenshtein suggestions for optional/sufficient probe points.
These probe points are expected to fail, and no messages will
be printed for them anyway, so the levenshtein stuff was a pure
waste. stap -p2 run time for scripts like
David Smith [Thu, 14 Jun 2018 18:18:24 +0000 (13:18 -0500)]
No longer run the http server as 'root'.
* httpd/backends.cxx (container_backend::generate_module): Handle python
versions correctly. Add "sudo" to all "buildah" command lines.
* httpd/main.cxx (main): Make sure we're not running as root.
* testsuite/lib/systemtap.exp (systemtap_check_users): Check for the
'stap-server' user.
* testsuite/lib/http_server.exp: Start the http server as the
'stap-server' user, not root.
* configure.ac: Add defines to determine if python 2 and python 3 exist on
the system.
* httpd/Makefile.am: Installs sudoers rule file.
* configure: Regenerated.
* config.in: Ditto.
* httpd/Makefile.in: Ditto.
* util.cxx (get_distro_info): Fixed bug where the version number and
release number got combined.
* httpd/stap-server.sudoers: New file.