PR23284 + extra: stapbpf logs loaded BPF programs to dmesg.
The name of the original stap script has been added to the .bo file
generated by bpf-translate.cxx as a new ELF section 'stapbpf_script_name'.
* bpf-internal.h (BPF_MAXSTRINGLEN): New constant, may be configurable in future.
* bpf-translate.cxx (output_stapbpf_script_name): New function.
(translate_bpf_pass): Generate new 'stapbpf_script_name' section.
* stapbpf/stapbpf.cxx (prog_load): Log a notification to dmesg before
loading the BPF program. This is analogous to _stp_print_kernel_info
in the default stap backend.
(load_bpf_file): Obtain module_basename (from module_name) and
script_name (from 'stapbpf_script_name' ELF section).
(main): Open /dev/kmsg as a way to output to dmesg.
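Here is a minimal sketch of the dmesg logging idea, written in Python rather than the C++ of stapbpf.cxx. Each write(2) to /dev/kmsg becomes one kernel log record, and a leading "<N>" sets its syslog priority. The helper names, the ".bo" suffix stripping and the message wording are hypothetical. They are not what prog_load() actually prints.

```python
import os

def module_basename(module_name: str) -> str:
    """Hypothetical: '/tmp/stap_abc.bo' -> 'stap_abc'."""
    base = os.path.basename(module_name)
    if base.endswith(".bo"):
        base = base[: -len(".bo")]
    return base

def format_kmsg_record(module_name: str, script_name: str, level: int = 6) -> bytes:
    # "<6>" is KERN_INFO; the kernel strips the prefix and uses it as the
    # record's priority. The text after it is illustrative only.
    text = f"<{level}>stapbpf: loading {module_basename(module_name)} (script: {script_name})\n"
    return text.encode()

def log_to_kmsg(fd: int, record: bytes) -> None:
    # One write() per record; /dev/kmsg does not merge partial writes.
    os.write(fd, record)
```

On a real system the descriptor would come from os.open("/dev/kmsg", os.O_WRONLY), which typically requires root. The sketch accepts any writable descriptor, so it can also be exercised with a pipe.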
stap-exporter: remove wait_for_sess_init, use more descriptive http return codes.
* exporter.py: no longer wait for a stap session to begin after receiving a
request to launch it. Respond with code 301 after launching sessions instead of 200.
Respond with code 501 if the session's procfs file cannot be found.
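The new response policy can be sketched as follows. This is a hypothetical simplification, not exporter.py's actual handler. `sessions`, `launch` and `procfs_path` are assumed names. As the entry above describes, the session is started without waiting for it to initialize.

```python
import os

def handle_request(script, sessions, launch, procfs_path):
    """Return an (HTTP status, body) pair for a metrics request."""
    if script not in sessions:
        # Start the stap session but do not wait for it to come up;
        # the client is expected to retry later.
        sessions[script] = launch(script)
        return 301, f"launched {script}\n"
    path = procfs_path(script)
    if not os.path.exists(path):
        # The session exists, but its procfs file is missing.
        return 501, f"procfs file for {script} not found\n"
    with open(path) as f:
        return 200, f.read()
```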
PR23359: impose security constraints on @kderef, @kregister
* parse.cxx: add privilege check for @kderef and @kregister
* testsuite/parseko/at_kderef.stp: New file to test privilege check
* testsuite/parseko/at_kregister.stp: New file to test privilege check
* testsuite/parseok/at_kderef.stp: New file to test privilege check
* testsuite/parseok/at_kregister.stp: New file to test privilege check
Frank Ch. Eigler [Sat, 30 Jun 2018 21:00:20 +0000 (17:00 -0400)]
PR23356: stap-serverd: switch back to dbm: nss databases
The sqlite sql: nss ones seem to have different
authentication/login/password protocols on rawhide, and our logic
can't quite play right with them. Maybe it's due to possible
concurrent access from a stap-serverd and stap client (when both run
as root). Whatever it is, plain dbm: seems to restore functionality,
albeit with a warning about "legacy database".
Frank Ch. Eigler [Sat, 30 Jun 2018 03:04:48 +0000 (23:04 -0400)]
PR23356: nsscommon: adopt new defaults/requirements of rawhide nss
The previous key size of 1024 has become invalid on rawhide, where
2048 is the new minimum. Use 4096 in generate_private_key(). Also,
switch add_cert_db_prefix() to the sql: cert-db prefix, in order to
nuke the -8015 SEC_ERROR_LEGACY_DATABASE error.
Previous code made it possible for an incoming request to be processed
in a handle_connection() thread at the same time as the privkey for it
was being disposed of in the accept_connections() loop. (This could
happen if the latter encountered a cert-validation error immediately
after the handle_connection thread started, causing its loop to exit.)
We now pass a SECKEY_CopyPrivateKey(privKey) to the thread, which it
will dispose of itself. Thus the main thread can keep disposing of its
own privKey as before.
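The ownership rule can be illustrated with a small Python model. The `Key` class is a hypothetical stand-in for SECKEYPrivateKey*. Its `copy()` and `destroy()` methods play the roles of SECKEY_CopyPrivateKey() and SECKEY_DestroyPrivateKey(). The worker thread receives its own copy and frees it. The main thread can then free the original at any time without pulling the key out from under the worker.

```python
import threading

class Key:
    """Hypothetical stand-in for an NSS private key handle."""
    def __init__(self, name):
        self.name = name
        self.destroyed = False

    def copy(self):            # role of SECKEY_CopyPrivateKey()
        return Key(self.name + "-copy")

    def destroy(self):         # role of SECKEY_DestroyPrivateKey()
        assert not self.destroyed, "double free"
        self.destroyed = True

def handle_connection(key, results):
    # The worker owns its private copy: use it, then dispose of it.
    results.append(key.destroyed)   # records whether the key was still valid
    key.destroy()

def accept_one(priv_key):
    results = []
    t = threading.Thread(target=handle_connection, args=(priv_key.copy(), results))
    t.start()
    priv_key.destroy()              # safe: the worker never touches the original
    t.join()
    return results
```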
Frank Ch. Eigler [Fri, 29 Jun 2018 23:58:49 +0000 (19:58 -0400)]
PR23356: stap-serverd: close client socket upon error
setupSSLSocket() was designed to return NULL on error, but it didn't
clean up its partially constructed / given sockets at all. They'd
stay open, and a remote client would be left to wait indefinitely.
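In short, the fix means that whatever returns NULL must close what it was given. Here is a Python sketch of that contract. The `wrap` callable is a hypothetical stand-in for the NSS SSL setup steps, and None plays the role of NULL.

```python
def setup_wrapped_socket(sock, wrap):
    """Return wrap(sock), or None on error after closing sock.

    Closing on the error path means the remote peer sees EOF instead
    of waiting indefinitely on a half-set-up connection.
    """
    try:
        return wrap(sock)
    except OSError:
        sock.close()
        return None
```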
Frank Ch. Eigler [Fri, 29 Jun 2018 23:25:45 +0000 (19:25 -0400)]
PR23356: improve stap-serverd nss diagnostics
On rawhide, we can get certificate verification errors that previous
control flow kept quiet about. When it did talk about them, the error
codes were out of nssError()'s assumed range. Now nssError() tries
the PR_ErrorTo* functions regardless of error number range.
David Smith [Thu, 28 Jun 2018 19:26:51 +0000 (14:26 -0500)]
Update fedora_install_package.py to handle Centos and Fedora packages.
* httpd/docker/fedora_install_package.py (split_nvra): New function.
(build_id_symlink_is_valid): Ditto.
(PkgSystem.build_id_is_valid): If we don't have a build id, pretend it
matched. Handle Fedora vs. Centos build id differences.
(PkgSystem.pkg_install): Don't run scripts when installing rpms. They
can cause issues. Handle the case where 'dnf debuginfo' installs the
wrong version of the debuginfo rpm.
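For reference, an rpm "name-version-release.arch" string can be split from the right, because package names may themselves contain dashes. This is a sketch of the general technique, not necessarily how split_nvra() is written. It also ignores epochs.

```python
def split_nvra(nvra: str):
    """Split 'name-version-release.arch' into (name, version, release, arch)."""
    nvr, arch = nvra.rsplit(".", 1)                 # arch follows the last dot
    name, version, release = nvr.rsplit("-", 2)     # last two dashes separate V and R
    return name, version, release, arch
```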
William Cohen [Wed, 27 Jun 2018 14:56:44 +0000 (10:56 -0400)]
Simplify the initialization logic for cpu_throttle.stp
Rather than ensuring there is an instance for each of the cpus on the
machine, just make sure there is an instance for cpu0 to avoid having
an empty array. As processors are throttled the other instances will
be added.
William Cohen [Wed, 27 Jun 2018 03:16:20 +0000 (23:16 -0400)]
Add the cpu_throttle.stp example to stap-exporter scripts
The cpu_throttle.stp script monitors Intel x86 cpu throttling due to
power and thermal constraints. The output is formatted to be
consumable by prometheus.
Frank Ch. Eigler [Thu, 28 Jun 2018 02:32:34 +0000 (22:32 -0400)]
PR23160,PR14690: support /* guru */ embedded-C for CONTEXT->sregs
For the newfangled syscall support, an embedded-C function is needed
to set the CONTEXT->sregs field from any of the new gajillion syscall
probe aliases. Since we want that ->sregs field to be trustworthy, the
setter must not be callable by the user with some random junk pointer:
it should be callable only from the tapset, yet not replicated (so
not "private" in each file).
To do this, we extend the parser/elaborator's handling of /* guru */
markup in embedded-C functions. Namely, calls to such
functions from the tapset (which is parsed with privileged flags) are
now permitted, kind of as if they were a private copy in each tapset
file that called them. The stapfile privileged field, which has
fallen into disuse, is brought back to life and propagates the
parse-time pf_guru flag. It's used with a new visitor that checks
each caller-callee relationship in function calls, moving some similar
code from staptree.cxx.
The syscall.read aliases are adapted to use the new single copy of
__set_syscall_pt_regs() in aux2_syscalls.stp.
Aaron Merey [Wed, 27 Jun 2018 20:51:57 +0000 (16:51 -0400)]
tapset: add stap-exporter utility macros and probe alias
* prometheus.stp: defines a "prometheus" probe as an alias for a procfs read probe
* prometheus.stpm: defines utility macros that create metrics from arrays and write
them to a procfs file (intended for use within prometheus probes).
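Such macros ultimately emit the Prometheus text exposition format, which has one `metric{label="value"} number` line per sample. Below is a hypothetical Python rendering of one stap-style array. The metric and label names are illustrative, and label values are not escaped.

```python
def format_metrics(name: str, array: dict, label: str) -> str:
    """Render {key: value} as Prometheus exposition lines, one per key."""
    lines = []
    for key in sorted(array):                # stable output order
        lines.append(f'{name}{{{label}="{key}"}} {array[key]}')
    return "\n".join(lines) + "\n"
```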
Frank Ch. Eigler [Wed, 27 Jun 2018 01:16:27 +0000 (21:16 -0400)]
rhbz1595178: add some inter-subrpm cross-version Conflicts:
Some files (e.g. some man pages, some localization) shared between
subrpms make it impossible to upgrade only some subrpms across
versions, due to file conflicts. We now add some Conflicts: so as to
prevent concurrent installation of different versions of some subrpms.
Frank Ch. Eigler [Mon, 25 Jun 2018 16:34:43 +0000 (12:34 -0400)]
PR23160,PR14690: 32-on-64 bit fixes
After crashes were reported with the syscalls.* test cases, we found that
32-on-64 bit support was b0rked, because pt_regs* addresses were being
truncated and yet later dereferenced in kernel space.
To simplify analysis, added a pt_regs *sregs to the common probe
context, which signifies 'syscall mode' register dumps. This is
different from normal kregs (kernel-space, normal abi, 64-bit-only)
and uregs (user-space, normal abi, either 32- or 64-bit), and needs
custom processing in _stp_syscall_nr and especially _stp_arg2.
The sregs-setter embedded-C function __set_syscall_pt_regs(r)
needs to be private/tapset-guru, but we lack proper /* markup */
for that particular mode. So that function currently needs to
be replicated as private inside each sysc_* file, ugh. Not
for long though.
After this patch, the following doesn't quite pass yet, but the read
parts look good:
sudo make installcheck RUNTESTFLAGS=nd_syscall.exp\ syscall.exp CHECK_ONLY="readwrite"
William Cohen [Thu, 21 Jun 2018 18:06:26 +0000 (14:06 -0400)]
Clean up the exporter.py formatting with autopep8
The formatting in exporter.py did not follow the PEP8 style guide and
pylint would complain about a lot of minor formatting issues. Ran
exporter.py through autopep8 to eliminate those warnings.
William Cohen [Wed, 20 Jun 2018 20:35:03 +0000 (16:35 -0400)]
Use equivalent non-dwarf probe points for example scripts
The dwarf-based kernel.function probes require the installation of
kernel-debuginfo. The examples can be implemented just as easily with
the non-dwarf kprobe.function probes, which eliminates the need to install
kernel-debuginfo.
Frank Ch. Eigler [Wed, 20 Jun 2018 18:00:15 +0000 (14:00 -0400)]
tapset: introduce tapset/linux/aux2_syscalls.stp for leaf embedded-c functions
The _stp_syscall_nr embedded-C function may well be needed from the
tracepoint-flavoured syscall probe aliases. For this, it would be
undesirable to necessarily drag in all of the material in
aux_syscalls.stp. So split out a new little aux2_syscalls.stp with
just the basics that minimal nd2_syscall or tp_syscall jobs might
need.
Frank Ch. Eigler [Mon, 18 Jun 2018 18:57:21 +0000 (14:57 -0400)]
PR23160,PR14690: uregs setter macro for use from kernel context
When a syscall interjection mechanism gives us a pt_regs*
structure for the syscall parameters/context, we can treat it
as though it came from a user-space probe.
Frank Ch. Eigler [Sat, 16 Jun 2018 08:36:00 +0000 (04:36 -0400)]
pass2 elaboration: tweak diagnostics
While debugging 4.17-style syscalls, it was tricky to figure out the
probe derivation process while it was underway, with nested aliases
and optional/sufficient probe points. Tweak verbosity numbers and
messages to give a good overview at the --vp 02 level.
While in the vicinity, introduced a session
suppress_costly_diagnostics counter, which is used to suppress
Levenshtein suggestions for optional/sufficient probe points.
These probe points are expected to fail, and no messages will
be printed for them anyway, so the Levenshtein stuff was pure
waste. stap -p2 run time for scripts like
David Smith [Thu, 14 Jun 2018 18:18:24 +0000 (13:18 -0500)]
No longer run the http server as 'root'.
* httpd/backends.cxx (container_backend::generate_module): Handle python
versions correctly. Add "sudo" to all "buildah" command lines.
* httpd/main.cxx (main): Make sure we're not running as root.
* testsuite/lib/systemtap.exp (systemtap_check_users): Check for the
'stap-server' user.
* testsuite/lib/http_server.exp: Start the http server as the
'stap-server' user, not root.
* configure.ac: Add defines to determine if python 2 and python 3 exist on
the system.
* httpd/Makefile.am: Install the sudoers rule file.
* configure: Regenerated.
* config.in: Ditto.
* httpd/Makefile.in: Ditto.
* util.cxx (get_distro_info): Fixed bug where the version number and
release number got combined.
* httpd/stap-server.sudoers: New file.
Martin Cermak [Thu, 14 Jun 2018 12:35:26 +0000 (14:35 +0200)]
Improve the foreach_limit(2).exp test results.
Without this update, one can observe the following issue with rhel7
powerpc kernels:
=======
# stap -p4 testsuite/systemtap.maps/foreach_limit.stp
...
/usr/local/share/systemtap/runtime/map.c:275:26: error: ‘a’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
struct mlist_head *c, *a, *last, *tmp;
^
/usr/local/share/systemtap/runtime/map.c:275:26: error: ‘a’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
cc1: all warnings being treated as errors
=======
This problem started happening after the powerpc kernel
build system switched from -O2 to -O3, as one can see in
http://vault.centos.org/7.5.1804/os/Source/SPackages/kernel-3.10.0-862.el7.src.rpm
=======
# powerpc is compiled with -O3, via specfile rpmbuild -- see rhbz1051067.
# we need to keep consistency here, however, for out of tree kmod builds --
# see rhbz1431029 for reference
ifeq ($(SRCARCH), powerpc)
KBUILD_CFLAGS += -O3
else
KBUILD_CFLAGS += -O2
endif
=======
Reverting this change (using -O2 instead of -O3) works around the problem,
as does this systemtap-side update. For more details, see rhbz1591267.
* exporter.conf: lets the user specify which scripts the server can run and
options such as automatic session timeouts and whether sessions are
launched at startup.
* exporter.py: parse config file, periodically check for sessions that
need to be terminated due to timeout.
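The periodic timeout check can be sketched as follows. `started` (a map from session name to start time), `timeout` and `terminate` are hypothetical names. They are not exporter.conf keys or exporter.py functions.

```python
import time

def expired_sessions(started, timeout, now):
    """Names of sessions that have been running for at least `timeout` seconds."""
    return sorted(name for name, t0 in started.items() if now - t0 >= timeout)

def reap(started, timeout, terminate, now=None):
    """Terminate and forget every expired session; meant to be called periodically."""
    if now is None:
        now = time.monotonic()
    # Iterate over a separate list, so deleting from `started` is safe.
    for name in expired_sessions(started, timeout, now):
        terminate(name)
        del started[name]
```

You could call reap() from a timer (for example, a threading.Timer re-armed every few seconds) to get the "periodically check" behaviour. time.monotonic() keeps the check unaffected by wall-clock changes.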
Paulo Andrade [Tue, 12 Jun 2018 22:48:26 +0000 (18:48 -0400)]
rhbz1547238: adapt vfs.add_to_page_cache probes
The add_to_page_cache_lru variant should also be probed, along with
kernel.function("add_to_page_cache_locked"); but if the latter is present,
the add_to_page_cache variant should not be. This backward compatibility needed a
little bit of tapset probe point operator jiujitsu to go beyond
Paulo's initial patch.
David Smith [Tue, 12 Jun 2018 19:19:01 +0000 (14:19 -0500)]
Fix (and centralize) temporary directory creation in the http server.
* httpd/api.cxx (build_collection_rh::POST): Always ensure there is a
server temporary directory (otherwise files would be written
to the root directory). Use make_temp_directory() to create the client
temporary directory.
* httpd/utils.cxx (make_temp_dir): New function.
* httpd/utils.h: Added make_temp_dir() prototype.
* httpd/server.cxx (connection_info::postdataiterator): Use
make_temp_directory().
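Here is a Python sketch of the centralized helper. It assumes the helper should guarantee that the parent (server) directory exists and then create a uniquely named private subdirectory inside it. The C++ make_temp_dir() may differ in naming and permissions, and the "stap-" prefix is invented.

```python
import os
import tempfile

def make_temp_dir(parent: str, prefix: str = "stap-") -> str:
    """Create a private, uniquely named directory under `parent`."""
    # Ensure the parent exists, so nothing falls back to writing into "/".
    os.makedirs(parent, mode=0o700, exist_ok=True)
    # mkdtemp creates the directory atomically with mode 0700.
    return tempfile.mkdtemp(prefix=prefix, dir=parent)
```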
David Smith [Mon, 11 Jun 2018 16:13:11 +0000 (11:13 -0500)]
Simplify temporary directory creation in the http container backend.
* httpd/backends.cxx (container_backend::generate_module): Instead of
creating a new temporary directory for the docker file, just create a
subdirectory of the server temporary directory.
David Smith [Thu, 7 Jun 2018 19:51:55 +0000 (14:51 -0500)]
Implement caching for the http container backend.
* httpd/backends.cxx (class container_image_cache): New class.
(container_backend::generate_module): Implement a caching scheme for
"buildah". If the hash for a docker file matches an existing image,
reuse the image.
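The caching scheme can be sketched as a map from docker-file hash to image. This hypothetical Python version uses SHA-256 from hashlib, whereas the C++ code uses the MD4 implementation in mdfour.c. The `build` callable stands in for the actual "buildah" invocation, and the "stap-build-" name prefix is invented.

```python
import hashlib

def file_hash(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

class ContainerImageCache:
    def __init__(self, build):
        self._build = build      # callable(dockerfile_path, image_name) -> image id
        self._images = {}        # docker file hash -> image id

    def get_image(self, dockerfile: str) -> str:
        digest = file_hash(dockerfile)
        if digest not in self._images:
            # Put the hash in the image name, so identical files map to one image.
            name = f"stap-build-{digest[:16]}"
            self._images[digest] = self._build(dockerfile, name)
        return self._images[digest]
```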
Jafeer Uddin [Thu, 7 Jun 2018 17:41:11 +0000 (13:41 -0400)]
PR23226: Added ability to run sample scripts without typing the whole path
* cmdline.h: Added new command line option '--example'
* cmdline.cxx: Added new command line option '--example'
* session.h: Added new flag 'run_example' to track example option
* session.cxx: Updated constructors to initialize new flag and added
logic to set run_example flag in parse_cmdline()
* main.cxx: If '--example' is specified then search for script within
example directory and run it if one hit is found
* man/stap.1.in: Added new entry for '--example' option
* testsuite/buildko/example.stp: New file to test new option
* testsuite/buildok/example.stp: New file to test new option
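The lookup behind '--example' can be sketched as a recursive search of the examples directory that succeeds only when there is exactly one hit. The function name and the decision to raise on zero or multiple matches are assumptions made for illustration.

```python
import os

def find_example(example_dir: str, script: str) -> str:
    """Return the unique path of `script` under example_dir, or raise LookupError."""
    hits = []
    for root, _dirs, files in os.walk(example_dir):
        if script in files:
            hits.append(os.path.join(root, script))
    if len(hits) != 1:
        raise LookupError(f"expected exactly one match for {script!r}, found {len(hits)}")
    return hits[0]
```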
David Smith [Tue, 5 Jun 2018 16:32:42 +0000 (11:32 -0500)]
Add the docker file hash to the container image name.
* httpd/backends.cxx (container_backend::generate_module): Add the docker
file hash to the container image name.
* httpd/utils.cxx (get_file_hash): New function.
* httpd/utils.h: Add get_file_hash() prototype.
* httpd/Makefile.am: Add ../mdfour.c to the list of sources.
* httpd/Makefile.in: Regenerated.
David Smith [Fri, 1 Jun 2018 20:09:23 +0000 (15:09 -0500)]
Started switching the http container backend to use 'buildah'.
* httpd/backends.cxx (container_backend): Initial steps of switching from
'docker' to 'buildah'.
* httpd/backends.h (backend_base): Updated prototype.
* httpd/api.cxx (build_info): Remove the 'tmp_dir' member.
(build_info::~build_info): No longer remove the temporary directory.
(response build_collection_rh::POST): Create the 'client_dir' temporary
directory.
* httpd/api.h (client_request_data): Rename the 'base_dir' member to
'server_dir' and add a 'client_dir' member.
(build_info::module_build): Unzip the 'client.zip' file in the client
directory, not the server directory. Be sure to look for the module in
the client directory.
(client_request_data::~client_request_data): Remove both the client
directory and the server directory.
(client_request_data::get_json_object): Update output.
* httpd/docker/centos.json: Tweaked to not run "yum/dnf clean all" after
the first RUN command.
* httpd/docker/fedora.json: Ditto.
David Smith [Thu, 31 May 2018 15:24:04 +0000 (10:24 -0500)]
Change the http container backend to do all the image building itself.
* httpd/backends.cxx (container_backend::generate_module): Just use the
python script to build the docker file, then build the docker image.
* httpd/docker/stap_build_docker_file.py: Renamed from
'stap_build_docker_image.py' and only builds the docker file, not the
docker image.
* httpd/docker/Makefile.am: Rename 'stap_build_docker_image.py' to
'stap_build_docker_file.py'.
* httpd/docker/Makefile.in: Regenerated.
David Smith [Wed, 30 May 2018 16:32:59 +0000 (11:32 -0500)]
Rename the http docker backend to the http container backend.
* httpd/backends.cxx: No real code change, but in preparation for the
switch from using "docker" to "buildah", rename the docker backend to
the container backend.
Serhei Makarov [Tue, 29 May 2018 17:33:49 +0000 (13:33 -0400)]
Merge branch 'serhei/rt-fixes-clean'
Initial round of fixes for RHBZ1272304 to make systemtap work better on the
realtime (CONFIG_PREEMPT_RT_FULL) kernel. These fixes do not solve all of the
rule violations that occur (and get reported to dmesg on kernel-rt-debug) but
they do make it possible for SystemTap to get through the entire testsuite on
kernel-rt without locking up the system.
Merging rather than rebasing since only the final commit of the branch is 'safe'.
Jafeer Uddin [Tue, 29 May 2018 13:28:17 +0000 (09:28 -0400)]
Added ability to send test results via http to a url
* testsuite/Makefile.am: Expanded check for DEJAZILLA to distinguish
between email address and url, and send test results to the corresponding
destination.
* testsuite/Makefile.in: Regenerated.
* testsuite/configure.ac: Updated messages to reflect the added feature.
* testsuite/configure: Regenerated.
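The real check lives in testsuite/Makefile.am as shell logic. The classification it performs is roughly the following hypothetical Python, which assumes a URL is recognized by an http:// or https:// prefix.

```python
import re

def dejazilla_destination(value):
    """Classify a DEJAZILLA setting as ('http', url) or ('email', address)."""
    if re.match(r"https?://", value):    # check URLs first: they may contain '@'
        return ("http", value)
    if "@" in value:
        return ("email", value)
    raise ValueError(f"DEJAZILLA is neither a URL nor an email address: {value!r}")
```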
Jeff Moyer [Fri, 11 May 2018 19:25:52 +0000 (15:25 -0400)]
io_submit.stp: let the user know when the script is loaded
I often find myself checking lsmod to see when the script is finally
ready to collect data. Just print a message from the begin probe to
make it obvious when the script is ready.
Jeff Moyer [Fri, 11 May 2018 19:25:51 +0000 (15:25 -0400)]
io_submit.stp: use an accumulator for traces
On very large systems, we get a lot of skipped probes due to lock
contention on the traces array. The end result is that we don't
get any data for such systems. Simply converting the traces array
to an accumulator resolves this issue in testing (on a highly-
loaded 288 cpu system).
probe syscall.io_submit.return {
/* this assumes a given proc will do lots of io_submit calls, and
* so doesn't do the more expensive "delete in_iosubmit[p]". If
* there are lots of procs doing small number of io_submit calls,
* the hash may grow pretty big, so using delete may be better
*/
in_iosubmit[tid()] = 0
}
However, the test to see if a thread is currently executing in io_submit
is performed using the membership operator 'in':
if (tid() in in_iosubmit)
This is obviously wrong. We can do one of two things:
1) change the test to if (in_iosubmit[tid()] == 1) or
2) just perform the delete in the return probe
While I agree that we typically have a small number of threads performing
io_submit, I don't believe there is substance to the performance claims
for the delete operator. So, I've opted for solution 2.
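The membership bug is easy to reproduce with a Python dict. For this purpose the dict behaves like the stap associative array as the commit describes it: assigning 0 keeps the key, so an `in` test still succeeds. The function names are illustrative.

```python
in_iosubmit = {}

def on_entry(tid):
    in_iosubmit[tid] = 1

def buggy_return(tid):
    in_iosubmit[tid] = 0            # key stays present

def fixed_return(tid):
    in_iosubmit.pop(tid, None)      # analogous to "delete in_iosubmit[tid()]"

def executing_io_submit(tid):
    return tid in in_iosubmit       # the membership test the script uses
```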
David Smith [Wed, 23 May 2018 17:37:07 +0000 (12:37 -0500)]
Change the http docker backend to run systemtap in the container image.
* httpd/backends.cxx (docker_backend::generate_module): Switch back to
running systemtap inside the container. Trying to use the container
image as a sysroot didn't work well. Trying to run systemtap on the
sysroot worked fine, but running gcc 8 (f28) against a centos 7
sysroot's kernel source failed. Trying to run the centos 7 gcc from f28
kept crashing. So, we're back to running systemtap in the container
image.