Josh Stone [Tue, 26 Jul 2011 19:25:34 +0000 (12:25 -0700)]
PR12895: Use NOSTDINC_FLAGS in kernel stapconf checks
We should never be looking in /usr/include/ for headers when building
for the kernel. This particularly bit us in a case where RHEL6 gained
blk_types.h in newer kernels. So if the system had kernel-headers.rpm
with this new header in /usr/include/, but was still running an older
kernel that lacked it in /lib/modules/`uname -r`/build/, then we'd
misidentify that header's availability in stapconf.
* buildrun.cxx (compile_pass): Add NOSTDINC_FLAGS to CHECK_BUILD.
Josh Stone [Mon, 25 Jul 2011 15:48:31 +0000 (11:48 -0400)]
CVE-2011-2503: read instead of mmap to load modules
As staprun is preparing to load a kernel module, we first mmap the whole
module as MAP_PRIVATE. Then we proceed with our security checks,
including a trusted-signature validation on the mapped region, and if
all checks out, we'll call init_module() with that same mapped region.
However, MMAP(2) says of MAP_PRIVATE, "It is unspecified whether changes
made to the file after the mmap() call are visible in the mapped
region." From my testing, it appears that file changes do indeed show
up in our mapped memory. This means we have a TOCTOU race between
verifying the signature of that memory and then calling init_module().
By using read() instead of mmap(), we ensure that we have a fully
private copy of the module to verify and load, without fear of change.
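A minimal sketch of the idea, in Python rather than staprun's C. The function names are invented for illustration, and a SHA-256 comparison stands in for the real trusted-signature validation. The module is read once into a private buffer, and both the check and the load use that same buffer, so a later change to the file cannot slip in between the two steps.

```python
import hashlib
import os


def read_module(path):
    """Read the whole file into a private bytes object using read()."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:  # an empty read means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


def verify_then_load(path, trusted_digest, load):
    """Verify and load from the same private copy.

    trusted_digest and load are hypothetical stand-ins for the
    trusted-signature check and for init_module().
    """
    image = read_module(path)
    if hashlib.sha256(image).hexdigest() != trusted_digest:
        raise PermissionError("module failed verification")
    # The file on disk may change from here on, but 'image' cannot.
    return load(image)
```

Because bytes objects are immutable, rewriting the file after read_module() returns has no effect on the data that is verified and loaded.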
Josh Stone [Mon, 25 Jul 2011 13:54:28 +0000 (09:54 -0400)]
CVE-2011-2502: Don't allow path-based auth for uprobes
For users that are only members of stapusr, and not stapdev, we only
allow loading modules that are either signed with a trusted certificate
or located in controlled paths. For the script itself, that path is
/lib/modules/.../systemtap/, and for uprobes it is the runtime. When
this policy was first written, uprobes only ever came from the runtime
path, so the path check just returned 1 always.
Later, commit 474d17ad added an optional argument to staprun -u, to
allow the user to specify their own signed copy of uprobes to load.
Unfortunately, if presented with an unsigned module, that would still
fall back to the path check, which blissfully approved it anyway.
Our policy is now that stapusr can only load a signed uprobes.ko, so the
path check for uprobes now unconditionally returns 0.
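The resulting policy for stapusr-only users can be modelled roughly as follows. This is a Python sketch with invented function names and a simplified "kind" argument, not the staprun code itself.

```python
def path_check(kind, in_trusted_path):
    """Return 1 if the module's location alone is enough to approve it."""
    if kind == "uprobes":
        return 0  # uprobes is never approved by path any more
    return 1 if in_trusted_path else 0


def stapusr_may_load(kind, signed, in_trusted_path):
    """A stapusr-only user needs a trusted signature or a passing path check."""
    return bool(signed or path_check(kind, in_trusted_path))
```

With this model an unsigned uprobes module is refused wherever it lives, while an unsigned script module in the controlled path is still accepted.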
When invoking stap_run2 with multiple optional stap arguments, the
extra arguments are captured in the tcl list $args. When stap_run2
calls down to stap_run3 to do the real work, it gets packaged as a
single quoted string instead of the original list of options.
We need to unpack this list to pass it on, e.g. via tcl eval.
This impacts test cases that pass multiple parameters, such as
memory1.exp, const_value.exp, process_by_cmd.exp.
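As an analogy only: the test suite is Tcl, and the function bodies below are invented. The same pitfall and fix look like this with Python's variadic arguments.

```python
def stap_run3(test_name, *options):
    """Stand-in for the worker: report the options it received."""
    return list(options)


def stap_run2_broken(test_name, *args):
    # Bug: the list collapses into one string, so the worker sees one option.
    return stap_run3(test_name, " ".join(args))


def stap_run2_fixed(test_name, *args):
    # Fix: unpack the list so each option arrives as its own argument.
    return stap_run3(test_name, *args)
```

Calling stap_run2_broken("t", "-g", "-v") hands the worker the single option "-g -v", whereas stap_run2_fixed passes "-g" and "-v" separately.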
Dave Brolley [Thu, 21 Jul 2011 21:34:14 +0000 (17:34 -0400)]
Don't allow the compile server client to honour -I/
- Found by server_args.exp fuzzing tests
- Would require special case code to handle but is a bad bad idea anyway
so don't allow it.
- Update test suite with the offending test case and some similar ones.
William Cohen [Thu, 21 Jul 2011 19:09:07 +0000 (15:09 -0400)]
Add basic functionality for ARM architecture support of nd_syscall.*
The no dwarf syscalls tapset needs some code to access the syscall parameters.
This is a first pass to add the support for the ARM architecture. This
basic support only handles the first 4 arguments on ARM. Argument 5 and
later are on the stack and are not handled.
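A rough model of the limitation, assuming the standard ARM procedure call convention in which the first four word-sized arguments travel in r0 through r3. The register snapshot is a plain dictionary and the function name is hypothetical; the real tapset code is not shown here.

```python
ARM_ARG_REGISTERS = ("r0", "r1", "r2", "r3")


def arm_arg(regs, n):
    """Return the n-th (1-based) argument from a register snapshot."""
    if n < 1:
        raise ValueError("argument numbers start at 1")
    if n > len(ARM_ARG_REGISTERS):
        # Arguments 5 and later live on the stack; not handled in this sketch.
        raise NotImplementedError("argument %d is on the stack" % n)
    return regs[ARM_ARG_REGISTERS[n - 1]]
```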
David Smith [Thu, 21 Jul 2011 16:41:44 +0000 (11:41 -0500)]
Added notes that the STP_OOB_DATA prefixes shouldn't be translated.
* runtime/transport/control.c (_stp_ctl_alloc_special_buffers): Added a
note that the STP_OOB_DATA prefixes ("WARNING:" and "ERROR:") shouldn't
be translated.
(_stp_ctl_get_buffer): Ditto.
William Cohen [Thu, 21 Jul 2011 14:29:00 +0000 (10:29 -0400)]
Make loc2c-runtime.h treat ARM architecture as a 32-bit architecture
ARM is a 32-bit architecture, so it should perform the kread() and
kwrite() operations in the same manner as other 32-bit architectures
such as i386.
David Smith [Thu, 21 Jul 2011 13:34:30 +0000 (08:34 -0500)]
Avoid "unknown type" errors on unused parameters.
* tapset/arm/aux_syscalls.stp: Help the translator out by specifying
types on '_ptrace_return_arch_prctl_addr' unused parameters.
* tapset/i386/aux_syscalls.stp: Ditto.
* tapset/ia64/aux_syscalls.stp: Ditto.
* tapset/powerpc/aux_syscalls.stp: Ditto.
On machines that enjoy a sacred zen-like quality of
doing nothing but run systemtap tests, the memory1 test
case can wait, wait, wait, and wait yet more. Nae, it
can wait indefinitely, until some other Godot thread
comes and runs a syscall.open. No syscall.open - no
script exit().
Fix this in two separate ways. First, let the script itself time out.
Second, run the script with a meaningful, profound workload consisting
of "/bin/sh </dev/null", which while pondering the nothingness of it
all, does run at least one open(2).
* testsuite/lib/systemtap.exp (start_server): Locate
stap based on $SYSTEMTAP_PATH; plop in $installed_stap.
(setup_server): Use that location rather than which(1).
Josh Stone [Wed, 20 Jul 2011 22:39:57 +0000 (15:39 -0700)]
Normalize the arch in systemtap_session::clone
* session.cxx (systemtap_session::clone): Normalize the incoming arch
name, so it can be consistently compared to both this->architecture
and other cloned subsessions.
David Smith [Wed, 20 Jul 2011 21:32:30 +0000 (16:32 -0500)]
Improved prcwildcard.exp and cmd_parse.exp tests.
* testsuite/systemtap.base/prcwildcard.exp: If we're testing a stripped
stap, don't bother running the function test, which needs debuginfo.
* testsuite/systemtap.base/cmd_parse.exp: Increase timeout.
* testsuite/lib/systemtap.exp (stripped_p): New function to
determine if an executable is stripped.
Dave Brolley [Wed, 20 Jul 2011 17:46:08 +0000 (13:46 -0400)]
Fix "Unable to shutdown NSS/NSS is not initialized" on RHEL5.
Could also occur for any build with HAVE_NSS && ! HAVE_LIBRPMIO.
Here the rpm finder must attempt to shut down NSS (sometimes initialized
by librpm) without knowing whether it was actually initialized. We now
tolerate failure to shut down NSS if the error is
SEC_ERROR_NOT_INITIALIZED.
Dave Brolley [Wed, 20 Jul 2011 14:37:54 +0000 (10:37 -0400)]
PR 12888 - stap-serverd should be weaned from -k
- stap-serverd no longer passes -k to stap.
- -k specified on client no longer passed on to stap on the server side.
- -k specified to stap-serverd on startup instructs the server to save
its temp dir (contains client request and server response).
- server version 1.6 no longer packs uprobes.ko twice, unless the client
version is < 1.6.
- client version 1.6 looks for uprobes.ko in <response>/stap000000/uprobes
unless server version is < 1.6.
- Update/modify testsuite.
William Cohen [Wed, 20 Jul 2011 14:52:43 +0000 (10:52 -0400)]
Factor out code to normalize the architecture names and add arm arch
A few tests need to know the generic architecture name rather than
the specific variant. This patch factors out the code into
testsuite/lib/systemtap.exp and adds entries for the arm architecture
variants.
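The kind of normalization described above might look like the sketch below. It is written in Python rather than the test suite's Tcl, and the exact variant patterns (i386 through i686, armv-prefixed names) are assumptions, not a copy of the real table.

```python
import re


def normalize_arch(machine):
    """Map a specific machine name to its generic architecture name."""
    if re.fullmatch(r"i[3-6]86", machine):
        return "i386"
    if re.fullmatch(r"arm(v\d+\w*)?", machine):
        return "arm"
    return machine  # anything else is assumed to be generic already
```

For example, normalize_arch("armv7l") and normalize_arch("armv5tel") both give "arm", while "x86_64" passes through unchanged.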
Mark Wielaard [Wed, 20 Jul 2011 14:05:31 +0000 (16:05 +0200)]
Always look for .note.stapsdt sections in the main elf file.
In dwflpp::iterate_over_notes we really want the actual elf file,
not the dwarf .debug file. Older binutils had a bug where they
mangled the SHT_NOTE type during --keep-debug.
Mark Wielaard [Tue, 19 Jul 2011 20:56:17 +0000 (22:56 +0200)]
Depend on elfutils 0.142+. Remove various workarounds.
We really need at least 0.142 to support quick dwarf unwinding.
Also, earlier versions had various bugs that we sometimes worked
around, but not always, which could lead to mysterious failures
when a bias was miscalculated.
David Smith [Tue, 19 Jul 2011 20:15:45 +0000 (15:15 -0500)]
Improve buildid.exp error handling.
* testsuite/systemtap.base/buildid.exp: Once 'error_handler' is called
cleanup has occurred, so the following objcopy commands will fail. Just
return instead.
The PR10854 test case uses a tight loop of staprun and a nested loop
of pkills, written in a way that counts on staprun's pre-PR12890
"insert; unload; retry insert" module-handling heuristic. With this
heuristic gone (and error messages properly generated), the PR10854
test case goes woozy and hangs in the while { ... pkill ... } tcl
loop. Now we don't loop in there any more.
Mark Wielaard [Mon, 18 Jul 2011 18:55:38 +0000 (20:55 +0200)]
PR10189 and PR12960 reserve system cmd messages for delivery.
runtime/transport/control.c kept one pool for all cmd messages that
the module had to deliver to staprun/io. This pool could become
empty, which meant essential control messages would not be delivered,
leading to the module not properly starting and/or exiting.
We now set aside buffers for one time messages (STP_START, STP_EXIT,
STP_TRANSPORT, STP_REQUEST_EXIT) and "overflow" messages that get
delivered whenever one of the dynamically allocated messages cannot
get a free slot from the pool (STP_OOB_DATA - warnings and errors,
STP_SYSTEM and STP_REALTIME_DATA).
The type field is used to mark whether or not a special pre-allocated
buffer is currently unused. This needs careful locking using a new
&_stp_ctl_special_msg_lock that is used in the new helper functions
_stp_ctl_get_buffer and _stp_ctl_free_buffer.
Now when we run out of message buffers we just drop the message and
printk. stapio will have received either the one time message or an
overflow message, so there is nothing more we can do.
The STP_DEFAULT_BUFFERS for debugfs.c got decreased again to allow
8 pre-allocated and 32 dynamic (pending) cmd messages.
A new testcase testsuite/systemtap.base/warn_overflow.exp was added.
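The buffer scheme can be pictured with this simplified Python model. The class and method names are invented and a threading.Lock stands in for the kernel spinlock. In this model every reserved message type falls back to its pre-allocated buffer only when the dynamic pool is empty; the real code may treat one time messages differently.

```python
import threading


class ControlBufferPool:
    def __init__(self, dynamic_slots, reserved_types):
        self._lock = threading.Lock()
        self._free_dynamic = dynamic_slots
        # One pre-allocated buffer per reserved type; False means unused.
        self._reserved_in_use = {t: False for t in reserved_types}
        self.dropped = 0

    def get_buffer(self, msg_type):
        """Return 'dynamic', 'reserved', or None when the message is dropped."""
        with self._lock:
            if self._free_dynamic > 0:
                self._free_dynamic -= 1
                return "dynamic"
            if self._reserved_in_use.get(msg_type) is False:
                self._reserved_in_use[msg_type] = True
                return "reserved"
            self.dropped += 1  # nothing left: drop and count it
            return None

    def free_buffer(self, msg_type, kind):
        """Give a buffer back to the pool or mark a reserved one unused."""
        with self._lock:
            if kind == "dynamic":
                self._free_dynamic += 1
            elif kind == "reserved":
                self._reserved_in_use[msg_type] = False
```

Once the dynamic slots are exhausted, a reserved type such as STP_EXIT still gets through once; a second STP_EXIT before the first is freed is dropped and counted.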
Chris Meek [Mon, 18 Jul 2011 20:39:44 +0000 (16:39 -0400)]
LTTng TMF Custom Text Parser Example
Added proc_snoop_parser to
src/testsuite/systemtap.examples/process/
Follow the instructions in:
src/testsuite/systemtap.examples/process/proc_snoop_parser_instructions.txt
to try out the eclipse plugin tracefile parser.
Mark Wielaard [Fri, 15 Jul 2011 21:54:47 +0000 (23:54 +0200)]
PR12960 Don't msleep in _stp_ctl_send when out of memory.
This is mainly a documentation patch to better explain the transport
layers and the interaction between _stp_ctl_read_cmd, _stp_ctl_send and
_stp_ctl_write.
It also contains the first step to resolve PR12960. The msleep() in
_stp_ctl_send() has been replaced with a loop that checks whether there
are messages on the queue, tries to wake up _stp_ctl_read_cmd so stapio
has a chance to read some of the pending messages, and then does a small
mdelay (which is safe, because it doesn't actually sleep or schedule). It
only prevents the crash and slightly reduces the possibility of losing
control messages. A followup patch will introduce special buffers to hold
messages that must not be lost, so the module will always be able to
properly shut down.
STP_DEFAULT_BUFFERS for debugfs also got increased a little from 50 to 64.
Josh Stone [Fri, 15 Jul 2011 20:47:45 +0000 (13:47 -0700)]
syscall.*execve: Fix argv access on newer kernels
Kernel commits ba2d0162 and 0e028465, merged in 3.0, refactored the
arguments of do_execve and compat_do_execve, such that "__argv"
is now the name of the incoming pointer, and "argv" is a local
struct user_arg_ptr. Our tapset must adapt to the new names.
* tapset/syscalls.stp (syscall.execve, syscall.compat_execve): Use
@defined to set an internal local __argv to either $__argv or $argv,
then use that for the other __get_argv calls.
* testsuite/buildok/twentyseven.stp: Update for $__argv vs. $argv.
* testsuite/systemtap.base/pointer_array.stp: Ditto.
Josh Stone [Thu, 14 Jul 2011 22:32:42 +0000 (15:32 -0700)]
rhbz717136: Fix SDT relocations in prelinked modules
* tapsets.cxx (sdt_query::handle_probe_entry): The debuginfoless SDT
addresses are relative to the ELF file, so get only that bias. The
DWARF bias is not interesting here.
(sdt_query::setup_note_probe_entry): Add the ELF bias to the semaphore
address too, so record_semaphore can completely relocate it.
(sdt_query::record_semaphore): SDT V3 semaphores need relocation too,
now removing both the bias and prelinking effects.
Petr Muller [Wed, 13 Jul 2011 16:41:47 +0000 (18:41 +0200)]
stap-serverd.cxx: fix memory and resource leaks
While playing with the cppcheck tool, I found a few resource leaks in stap-serverd.cxx:
- handleRequest: arg was not freed if opening/reading argfile failed
- handleRequest: argfile was not fclosed when reading from it failed
- spawn_and_wait: dotfd was not closed if chdir failed (macro expanded to accommodate resource release)
- spawn_and_wait: cleaned some whitespace up around the fix itself
Josh Stone [Wed, 13 Jul 2011 20:46:29 +0000 (13:46 -0700)]
PR12890 remote: Heed the capabilities of the other side
If the remote side is < 1.6, then it won't know staprun -R, so we'll
have to live without module renaming on that host.
* buildrun.cxx (make_run_command): Add a version parameter, defaulted to
the current VERSION. Don't add -R unless >= 1.6.
* remote.cxx (stapsh::set_child_fds): Save the handshake version.
(stapsh::start): Pass the remote's version to make_run_command.
(ssh_legacy_remote::start): Pass version 1.3 to make_run_command,
treating all "legacy" hosts as somewhat old.
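The version gating could be sketched like this in Python. The real make_run_command is C++ and builds a much longer command line; the default of "1.6" for the current VERSION and the argument layout are assumptions for illustration.

```python
def parse_version(text):
    """Turn '1.6' into (1, 6) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def make_run_command(module, version="1.6"):
    """Build a staprun command, adding -R only for a capable remote."""
    cmd = ["staprun"]
    if parse_version(version) >= (1, 6):
        cmd.append("-R")  # module renaming needs staprun 1.6 or later
    cmd.append(module)
    return cmd
```

Comparing tuples rather than strings keeps a version such as "1.10" correctly ordered after "1.6".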
PR12890 cont'd: autoconf elfutils usage in staprun
* configure.ac, Makefile.am: Look for system elfutils.
Check for modern enough version (0.142+), set HAVE_ELF_GETSHDRSTRNDX.
* staprun_funcs.c (rename_module): Conditionally stub out.
* common.c (usage): Conditionally bury -R flag.
* staprun.cxx (init_staprun): Avoid advising people who can't to use -R.
* configure, config.in, aclocal.m4, Makefile.in: Regenerated on F15.
Josh Stone [Tue, 12 Jul 2011 19:08:46 +0000 (12:08 -0700)]
stapsh: Check staprun X_OK and increase error verbosity
* runtime/staprun/stapsh.c (do_run): Explicitly check that we have
execute permissions on staprun before spawning, so we can give a
better error message than just a non-zero status code.
* remote.cxx (stapsh::send_file): Report errors with any verbosity.
(stapsh::start): Report errors with any verbosity, and close handles
on failure so we don't try to wait for further activity.
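A small Python sketch of the "check X_OK first" idea. The function name and message wording are invented; stapsh itself is C and uses access(2) directly.

```python
import os
import subprocess


def run_staprun(staprun_path, args):
    """Spawn staprun, but fail early with a clear message if it can't run."""
    if not os.access(staprun_path, os.X_OK):
        raise PermissionError("can't execute %s" % staprun_path)
    return subprocess.call([staprun_path] + list(args))
```

Checking up front lets the caller report exactly which binary could not be executed, rather than surfacing only a non-zero exit status.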
Lukas Berk [Tue, 12 Jul 2011 19:06:09 +0000 (15:06 -0400)]
PR12729: Improve stap error message
Now report when the user doesn't have permission to run staprun
or if posix_spawnp is unable to launch the process
remote.cxx - finish now reports failure to launch
trycatch.exp - account for the new warning
util.cxx - report if staprun isn't executable or if stap_waitpid failed
Stan Cox [Tue, 12 Jul 2011 02:25:42 +0000 (22:25 -0400)]
PR6954 Add a used variables set for use by automatic global printing.
* staptree.h (varuse_collecting_visitor::used): New.
* staptree.cxx (varuse_collecting_visitor::visit_symbol): Use previous
method for setting read and write sets. Also set used set.
* elaborate.cxx (add_global_var_display): Use the used set.
* global_end.exp (global_end_var): Initialize to non-zero values.
* global_end.stp (global_end_var): Likewise.