Dave Brolley [Fri, 27 May 2011 19:29:26 +0000 (15:29 -0400)]
Improve the chances of using the same port when the compile server is restarted.
1) Keep the socket open when the server takes itself down internally in order
to regenerate its certificate.
2) Use PR_SocketOpt_reuseaddr so that a specified port can be reused when the
server is externally halted and restarted.
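The second point can be illustrated with the POSIX counterpart of the NSPR option. This is only a minimal sketch, assuming plain BSD sockets and setsockopt(SO_REUSEADDR) rather than NSPR's PR_SetSocketOption; open_reusable_listener is a hypothetical helper, not a function from stap-serverd.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

// Hypothetical helper (not from stap-serverd): open a TCP listener on
// 127.0.0.1:port with SO_REUSEADDR set, so that a restarted server can
// bind the same port while old connections are still in TIME_WAIT.
// Returns the listening descriptor, or -1 on failure.
int open_reusable_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    // The option must be set before bind() to have any effect.
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0
        || listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Passing port 0 asks the kernel for any free port; a server that wants the same port across restarts would pass the previously used number instead.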
Lukas Berk [Wed, 25 May 2011 20:51:17 +0000 (16:51 -0400)]
PR12743 stap -L 'process("PATH").syscall' now reports context variables
NEWS - added news comment
tapset-utrace.cxx - added utrace_derived_probe::getargs to push
back the context variables to be listed
David Smith [Mon, 23 May 2011 21:24:17 +0000 (16:24 -0500)]
Avoid some bad testsuite fails because of unhandled child exit status warnings.
* testsuite/systemtap.base/target_set.exp: Handle warning about child
process exiting with a bad status.
* testsuite/systemtap.context/uprobe_stmt_num.exp: Ditto.
* testsuite/systemtap.context/uprobe_uaddr.exp: Ditto.
* testsuite/systemtap.context/usymbols.c: Explicitly exit with a status of
0 (to avoid warnings about bad child exit status).
Josh Stone [Sat, 21 May 2011 01:41:39 +0000 (18:41 -0700)]
PR12770: In loc2c, discontiguify based on total_bytes
The total_bytes number is the part that really matters in determining
whether a given dereference is greater than native size. In particular,
using loc->byte_size from a loc_address was telling us that 4 bytes is
ok (sizeof pointer), even though the pointee was 8 total_bytes.
* loc2c.c (discontiguify::pieces_small_enough): Compare total_bytes to
the platform word size to decide whether to split into pieces.
(discontiguify): For loc_constant, we have nothing to do as they're
always copied byte-wise, but we should truncate the constant block to
total_bytes.
Josh Stone [Fri, 20 May 2011 23:29:21 +0000 (16:29 -0700)]
Let the @sum of an empty aggregate be 0
It's handy to have a defined meaning for the @sum of an empty aggregate, so
one doesn't need to check for @count > 0 every time. For example, now a
pattern like "x += y; ... println(x)" can be more directly converted to
aggregates with "x <<< y; ... println(@sum(x))".
* translate.cxx (c_unparser::visit_stat_op): Let the @sum be read even
on empty aggregates, but keep the old behavior for compatibility.
* testsuite/systemtap.maps/absentstats.exp: Check that both @count and
@sum are allowed by default, and only @count for stap <= 1.4.
* testsuite/systemtap.maps/ix_clear*.stp: Use @min for empty failure.
David Smith [Fri, 20 May 2011 22:17:58 +0000 (17:17 -0500)]
Fixed source_context.stp test.
* testsuite/semko/source_context.stp: Moved from parseko, since this is a
pass 2 test, not a pass 1 test. It was successfully failing before, but
only because the shell was trying to execute it instead of systemtap.
David Smith [Wed, 18 May 2011 15:32:56 +0000 (10:32 -0500)]
Fix pipe paths in the logger PMDA.
* pcp/src/pmdas/logger/event.c (event_create): Handle non-blocking reads.
* pcp/src/pmdas/logger/util.c (start_cmd): Close the correct side of the
pipe in the parent process.
Josh Stone [Tue, 17 May 2011 20:38:49 +0000 (13:38 -0700)]
Make sure that sigaction always starts zeroed
On some systems, stapsh's child process was getting automatically
reaped, which caused an error when we tried to waitpid(). This was
because we weren't fully zeroing the sigaction, so sa_flags was
uninitialized (and happened to contain SA_NOCLDWAIT).
This patch sprinkles memset-0 on sigactions throughout.
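The pattern described above can be sketched as follows. This is an illustration only, not the code from the patch; install_handler is a hypothetical helper name.

```cpp
#include <signal.h>
#include <cstring>

// Hypothetical helper (not from the patch): install `handler` for
// `signum`, starting from a fully zeroed struct sigaction so that
// sa_flags cannot contain stray bits such as SA_NOCLDWAIT.
// Returns 0 on success, -1 on failure (as sigaction() does).
int install_handler(int signum, void (*handler)(int))
{
    struct sigaction sa;
    std::memset(&sa, 0, sizeof(sa));   // sa_flags == 0 from here on
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);          // block no extra signals in the handler
    return sigaction(signum, &sa, nullptr);
}
```

With an uninitialized struct, whatever happened to be on the stack ends up in sa_flags; if that includes SA_NOCLDWAIT on a SIGCHLD action, children are reaped automatically and a later waitpid() fails with ECHILD.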
Frank Ch. Eigler [Tue, 17 May 2011 10:53:34 +0000 (06:53 -0400)]
PR12211: rework testsuite with failed-subtask tolerance
After PR12211, a "stap -c CMD" run where the CMD fails (rc != 0)
now results in an overall stap failure. This is undesirable with
some of the example scripts that use failed CMDs.
* systemtap.examples/general/para-callgraph-verbose.meta (test_installcheck):
Add a "|| true" to the sh -c command string to ensure CMD is deemed to
succeed.
* systemtap.examples/general/para-callgraph.meta: Ditto.
* systemtap.examples/process/noptrace.meta: Ditto.
Frank Ch. Eigler [Tue, 17 May 2011 10:50:45 +0000 (06:50 -0400)]
example: make badname demo work with euid=0
The stap testsuite may be run under euid=0, so make sure the
test triggers the name filtering.
* sysadmin.examples/general/badname.stp: Drop euid()==0 filtering.
Use a more specific file name substring, to prevent accidental
interference with host system during testsuite.
* sysadmin.examples/general/badname.meta: Adjust accordingly.
Frank Ch. Eigler [Sat, 14 May 2011 15:26:17 +0000 (11:26 -0400)]
PR12729: make childprocess spawning/waitpid both -vv verbose
With recent code, -vv was enough to get a "Running ...." message for
child processes of the translator, but not their "Spawn waitpid ..."
return codes. That required -vvv. Make them consistent.
* util.cxx (stap_waitpid): Report at verbosity > 1.
Frank Ch. Eigler [Sat, 14 May 2011 14:50:54 +0000 (10:50 -0400)]
--ldd: turn bad-interpreter findings into warnings
A semantic_error is too heavy. We can just skip the --ldd processing
for these binaries and move on.
* dwflpp.cxx (iterate_over_libraries): Print (suppressible) warning
instead of throwing a semantic_error if the module interpreter is
not in our whitelist.
Frank Ch. Eigler [Sat, 14 May 2011 14:49:43 +0000 (10:49 -0400)]
--ldd: improve error handling
* translate.cxx (add_unwindsym_ldd): Use delete, not free() on c++ object.
(prepare_translate_pss): Add an exception catch around prepare_symbol_data()
Sometimes stuff happens to probe handlers, and they refuse to clean up
cleanly (resetting their contexts to non-busy state). This can cause
module shutdowns to hang with a busy-waiting stapio. This patch lets
a user override the infinite loop with -DSTAP_OVERRIDE_STUCK_CONTEXT
to let the module be cleaned up. Without that set, at least the
infinite loop is made to spin less tightly.
* translate.cxx (c_unparser::emit_module_exit): Hang looser
upon stuck contexts.
gcc 4.6 generates more warnings, which made some runtime/autoconf
tests falsely fail. The worst example of this was a crash in the
context.exp test case (due to false negative STAPCONF_WALK_STACK).
This patch adds some dummy code to the tests to make the warnings
go away, and thus let the tests pass even with -Werror.
Dave Brolley [Thu, 12 May 2011 19:50:14 +0000 (15:50 -0400)]
Handle resource limit violations of stap when called by stap-serverd more elegantly.
- Catch SIGXFSZ and SIGXCPU in stap and in stap-serverd.
- Exit gracefully from stap when caught.
- In stap-serverd, compare the current limits against the original limits
and continue if the current limits are less (i.e. are limits intended for stap).
- Set/restore limits around stap_spawn instead of spawn_and_wait.
Nathan Scott [Thu, 12 May 2011 09:41:01 +0000 (19:41 +1000)]
Move (back?) to a select-based PMDA with custom main.
An observed problem with doing log reads only during the
metric fetch callback is that we start getting behind in
events once the volume starts ramping up from small files
to the point that the PMDA is overwhelmed. This is caused
by two issues:
- only reading new events based solely on client fetch
intervals means for relatively long intervals (say once
every few minutes) the consumption doesn't keep up with
the event generation.
- only reading once (i.e. one read(2) call) per fetch,
which made the above even more severe and noticeable.
We address these issues by using a custom PMDA loop which
is awoken on either readable file descriptors or expiry of
a timer (iow server side driven event reading, not client).
Whenever we wake, we consume all available events at that
time for each file descriptor.
When probing a process with corrupted DWARF information, it has been
possible to create a kernel-side division-by-zero. This fixes it.
Handle DW_OP_div/mod divide by zero. DW_OP_mod should work unsigned.
* loc2c.c (translate): Use helper functions div_op and mod_op for
DW_OP_div and DW_OP_mod operands. Set used_deref = true.
* translate.cxx (translate_runtime): Emit STAP_MSG_LOC2C_03 define.
* runtime/loc2c-runtime.h: Define dwarf_div_op and dwarf_mod_op macros.
* runtime/unwind.c (compute_expr): Check for zero before executing
DW_OP_mod or DW_OP_div.
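The guard described above can be sketched in user space as follows. This is not the runtime's dwarf_div_op/dwarf_mod_op macros, whose bodies are not shown here; checked_div and checked_mod are hypothetical stand-ins that report the error with an exception instead of a kernel-side fault path. The INT64_MIN / -1 check is an extra precaution that the commit message does not mention.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a DW_OP_div helper: signed division that
// refuses a zero divisor instead of trapping.
int64_t checked_div(int64_t a, int64_t b)
{
    if (b == 0)
        throw std::domain_error("divide by zero in DWARF expression");
    // Assumption beyond the commit text: this quotient overflows and
    // also traps on some CPUs, so it is rejected as well.
    if (a == INT64_MIN && b == -1)
        throw std::domain_error("division overflow in DWARF expression");
    return a / b;
}

// Hypothetical stand-in for a DW_OP_mod helper: the operands are
// treated as unsigned, as the commit message requires.
uint64_t checked_mod(uint64_t a, uint64_t b)
{
    if (b == 0)
        throw std::domain_error("modulo by zero in DWARF expression");
    return a % b;
}
```

Taking the operands of checked_mod as uint64_t means a value that would be negative as a signed integer is reduced as a large unsigned number, which is the behavior the "should work unsigned" note asks for.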
Lukas Berk [Wed, 11 May 2011 19:20:48 +0000 (15:20 -0400)]
Gettext a few lines, update /po and Makefiles
Makefile.am - don't include config.h runtime/staprun/config.h or git_version.h
Makefile.in - likewise
nsscommon.cxx - gettexted a few strings
po/* - regenerated files
David Smith [Wed, 11 May 2011 17:52:34 +0000 (12:52 -0500)]
Merge branch 'master' of git://oss.sgi.com/nathans/systemtap
* 'master' of git://oss.sgi.com/nathans/systemtap:
Resolve a couple of issues from missing log file handling code.
Minor cleanups after writing QA tests.
Add code to deal with log file rotation.
Unify separate tables for tracking log files, simpler code.
Several additional metrics for log file PCP agent.
Update the domain number comment now that one is reserved.
Uncomment the seek-to-end-of-log-file code in pmdalogger.
Remove further shared library remnants in pmdalogger build.
Frank Ch. Eigler [Wed, 11 May 2011 17:40:53 +0000 (13:40 -0400)]
dtrace python i18n: make work with autoconf wackiness
autoconf likes to expand some @vars@ in terms of shell-script-like
constructs like @LOCALEDIR@ = "${datarootdir}/locale" and
@datarootdir@ = "${prefix}/share". Since python doesn't interpolate
strings the same way as /bin/sh, this no workie. So we hard-code the
interpolation with a sequence of string.replace calls.
Confirmed working with LANG=fr_FR strace python ./dtrace |& grep /fr
* configure.ac: AC_SUBST a few more values.
* dtrace.in: Specially process ENABLE_NLS and similar values.
Josh Stone [Wed, 11 May 2011 03:01:01 +0000 (20:01 -0700)]
remote: Add tests for manually-specified hosts
To run a basic test on hosts foo and bar, use:
make installcheck RUNTESTFLAGS=remote.exp TESTREMOTES=foo,bar
* testsuite/systemtap.base/remote.exp: New test of --remote hosts.
* testsuite/systemtap.base/remote.stp: New.
* testsuite/Makefile.am: Add TESTREMOTES control of remote.exp.
* testsuite/Makefile.in: Regenerate.
Josh Stone [Wed, 11 May 2011 00:33:59 +0000 (17:33 -0700)]
Create a signal-safe type for tracking spawned pids
* util.cxx (spawned_pids_t): New type which wraps a set<pid_t>, masking
signals on each access to ensure consistency in and out of the signal
handler. The spawned_pids global is the only instance.
(stap_waitpid): Use !contains(pid) rather than count(pid)==0.
(kill_stap_spawn): Use spawned_pids_t::killall().
Josh Stone [Wed, 11 May 2011 00:25:47 +0000 (17:25 -0700)]
Consolidate signal-masking into a utility class
* util.h (stap_sigmasker): New, masks our usual signals for the life of
the stap_sigmasker object.
* remote.cxx (direct_stapsh::direct_stapsh): Use stap_sigmasker while
spawning the stapsh child process.
(ssh_remote::connect): Ditto for the ssh process.
(remote::run): Use stap_sigmasker around the polling loop.
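A scope-bound signal masker of the kind described above might look like the sketch below. It is not the stap_sigmasker definition from util.h; the class name scoped_sigmask and the exact set of blocked signals (SIGHUP, SIGPIPE, SIGINT, SIGTERM) are assumptions made for illustration.

```cpp
#include <signal.h>

// Hypothetical RAII guard (illustrative, not util.h's stap_sigmasker):
// blocks a fixed set of signals on construction and restores the
// previous mask on destruction.
class scoped_sigmask {
    sigset_t old_;
public:
    scoped_sigmask()
    {
        sigset_t mask;
        sigemptyset(&mask);
        // Assumed signal set; the real class may mask different signals.
        sigaddset(&mask, SIGHUP);
        sigaddset(&mask, SIGPIPE);
        sigaddset(&mask, SIGINT);
        sigaddset(&mask, SIGTERM);
        sigprocmask(SIG_BLOCK, &mask, &old_);
    }
    ~scoped_sigmask() { sigprocmask(SIG_SETMASK, &old_, nullptr); }

    // Non-copyable: copying would restore the saved mask twice.
    scoped_sigmask(const scoped_sigmask&) = delete;
    scoped_sigmask& operator=(const scoped_sigmask&) = delete;
};
```

Declaring one such object at the top of a block keeps the signals blocked until the block exits by any path, including early returns and exceptions, which is the point of tying the mask to an object's lifetime.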
Josh Stone [Tue, 10 May 2011 23:45:54 +0000 (16:45 -0700)]
remote: Disambiguate the private target names
It's conceivable, however unlikely, that a user may have an actual host
named "direct" or "stapsh", which would conflict with our internal
methods if used as a --remote. Such a user could say "ssh://direct" to
be explicit, but we can also hide ours a little better. Those internal
names are now tested as proper URI schemes, e.g. "direct:...", so they
should never conflict with a user's legitimate target.
* remote.cxx (remote::create): Test for "direct" and "stapsh" only as
the scheme of a decoded URI.
* main.cxx (main): Use "direct:" for non-remote use.
* testsuite/systemtap.base/stapsh.exp: Use "stapsh:" for testing.
Josh Stone [Tue, 10 May 2011 22:03:02 +0000 (15:03 -0700)]
PR12749: Replace popen calls with stap_spawn_piped
The new form has the advantages that child processes are managed by
signals to stap, and that arguments are provided in a vector so they
don't need to be escaped.
* dwflpp.cxx (dwflpp::iterate_over_libraries): Convert popen call to
stap_spawn_piped, followed by fdopen so the same FILE* operations are
still supported. Finish with fclose+stap_waitpid instead of pclose.
* tapsets.cxx (symbol_table::read_from_elf_file): Ditto.
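The argument-vector approach can be sketched with posix_spawnp. This is not the stap_spawn_piped implementation, whose signature is not shown here; spawn_piped is a hypothetical analogue that returns the child pid and hands back a descriptor for the child's stdout.

```cpp
#include <spawn.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <string>
#include <vector>

extern char **environ;

// Hypothetical analogue of a piped spawn (not stap's own function):
// run args[0] with the given argument vector, no shell involved, and
// store the read end of a pipe connected to the child's stdout in
// *out_fd. Returns the child pid, or -1 on failure.
pid_t spawn_piped(const std::vector<std::string>& args, int* out_fd)
{
    if (args.empty() || out_fd == nullptr)
        return -1;

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    // In the child: stdout becomes the write end, both originals close.
    posix_spawn_file_actions_t fa;
    posix_spawn_file_actions_init(&fa);
    posix_spawn_file_actions_adddup2(&fa, fds[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&fa, fds[0]);
    posix_spawn_file_actions_addclose(&fa, fds[1]);

    // Arguments are passed verbatim, so nothing needs shell escaping.
    std::vector<char*> argv;
    for (const std::string& a : args)
        argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = -1;
    int rc = posix_spawnp(&pid, argv[0], &fa, nullptr, argv.data(), environ);
    posix_spawn_file_actions_destroy(&fa);
    close(fds[1]);                      // parent keeps only the read end
    if (rc != 0) {
        close(fds[0]);
        return -1;
    }
    *out_fd = fds[0];
    return pid;
}
```

A caller would wrap the returned descriptor with fdopen(fd, "r") to keep using FILE* operations, then finish with fclose followed by waitpid on the returned pid, mirroring the fclose+stap_waitpid sequence in the entry above.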
Josh Stone [Tue, 10 May 2011 22:00:58 +0000 (15:00 -0700)]
Remove the unused git_revision()
The use of this function had been commented out for some time now, and
it contained an unescaped call to popen. Rather than trying to fix dead
code, just remove it altogether.
Dave Brolley [Tue, 10 May 2011 18:37:17 +0000 (14:37 -0400)]
Systemtap Compile Server Integration (rewrite):
- Rewrite stap-serverd in C++
- Rewrite related tools (stap-gen-cert, stap-authorize-cert, stap-sign-module)
in C++. Integrate functionality into stap-serverd.
- Remove stap-server-connect (integrated into stap-serverd).
- Move all common NSS related code into nsscommon.cxx (renamed from nsscommon.c).
- Rename modsign.cxx to stap-sign-module.cxx.
- Update test suite with new expected messages.
- Update man pages.
- Remove obsolete tools (scripts).
- Remove test for certutil from configuration.
Josh Stone [Fri, 6 May 2011 23:31:08 +0000 (16:31 -0700)]
uprobes: impedance match insn tables with test_bit()
The kernel's test_bit expects its bitmap to be const volatile, but we
had ours as simply const. On Fedora 15 with gcc 4.6, compiling uprobes
gave a few warnings like this:
arch/x86/include/asm/bitops.h:319:2: warning: use of memory input
without lvalue in asm operand 1 is deprecated [enabled by default]
That line is the asm statement in variable_test_bit().
The symptom noticed was that handle_riprel_insn was reading need_modrm:0
for opcode 0x89, when our table says it should be 1. Who knows what
other havoc ensued...
When our instruction tables are set const volatile to match test_bit(),
the warning goes away, and need_modrm is now computed correctly.