Bug 10849 - make MAXSKIPPED overflow trigger an error message
Summary: make MAXSKIPPED overflow trigger an error message
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: translator (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Charley
URL:
Keywords:
Depends on:
Blocks: blockers-1.1
  Show dependency treegraph
 
Reported: 2009-10-26 17:16 UTC by Frank Ch. Eigler
Modified: 2009-11-09 20:06 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Simplified patch (370 bytes, text/plain)
2009-10-28 18:20 UTC, Charley
Details
Updated patch, generated via git diff (546 bytes, patch)
2009-10-28 20:13 UTC, Charley
Details | Diff
patch for displaying error on MAXSKIPPED exceeded (558 bytes, patch)
2009-10-30 20:29 UTC, Roland Grunberg
Details | Diff
mimic functionality of atomic_cmpxchg (768 bytes, patch)
2009-11-05 22:05 UTC, Roland Grunberg
Details | Diff
Additional changes to pseudo_atomic_cmpxchg in runtime.h (760 bytes, patch)
2009-11-06 18:59 UTC, Roland Grunberg
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Frank Ch. Eigler 2009-10-26 17:16:53 UTC
Since we abort a script when num_skipped > MAXSKIPPED, we should
emit an ERROR message.  It should name MAXSKIPPED, and could mention
rerunning with "stap -t" for greater skip detail info.
Comment 1 Roland Grunberg 2009-10-26 20:08:44 UTC
Error is generated mentioning MAXSKIPPED and info about running again with -t
option.

diff --git a/tapsets.cxx b/tapsets.cxx
index 17e6c6c..008a7f9 100644
--- a/tapsets.cxx
+++ b/tapsets.cxx
@@ -216,6 +216,7 @@ common_probe_entryfn_epilogue (translator_output* o,
   o->newline(1) << "_stp_softerror (\"%s\", c->last_error);";
   o->indent(-1);
   o->newline() << "atomic_inc (& error_count);";
+  o->newline() << "c->last_error = \"Skipped too many probes, check MAXSKIPPED
or try again with stap -t for more details.\";";
   o->newline() << "if (atomic_read (& error_count) > MAXERRORS) {";
   o->newline(1) << "atomic_set (& session_state, STAP_SESSION_ERROR);";
   o->newline() << "_stp_exit ();";
@@ -229,7 +230,7 @@ common_probe_entryfn_epilogue (translator_output* o,
   // Check for excessive skip counts.
   o->newline() << "if (unlikely (atomic_read (& skipped_count) > MAXSKIPPED)) {";
   o->newline(1) << "atomic_set (& session_state, STAP_SESSION_ERROR);";
-  o->newline() << "_stp_exit ();";
+  o->newline() << "_stp_error (\"Skipped too many probes, check MAXSKIPPED or
try again with stap -t for more details.\");";
   o->newline(-1) << "}";

   o->newline() << "#if INTERRUPTIBLE";
Comment 2 Frank Ch. Eigler 2009-10-28 12:49:03 UTC
The second hunk (the _stp_error() call) should do the trick; the first
(just setting c->last_error) probably doesn't do what you want here.

Can you describe your testing?
Comment 3 Charley 2009-10-28 18:20:42 UTC
Created attachment 4333 [details]
Simplified patch

(In reply to comment #2)
> The second hunk (the _stp_error() call) should do the trick; the first
> (just setting c->last_error) probably doesn't do what you want here.
> 
> Can you describe your testing?
> 

Roland and I tried setting the MAXUPROBES and MAXSKIPPED directives to force
SystemTap to trigger the MAXSKIPPED error message seems to work for testing
purposes. The current proposal should print the error message once if the
number of skipped probes ever exceeds MAXSKIPPED.

Command:
stap -DMAXSKIPPED=0 -DMAXUPROBES=0 -c 'userBin' userScript.stp

Where userScript.stp probes a number of functions in the target binary.

Results:
ERROR: Skipped too many probes, check MAXSKIPPED or try again with stap -t for
more details.
WARNING: Number of errors: 0, skipped probes: 26

[result of probe begin, if any]

Attached a version of the patch (generated via git diff) with only the single
line of code inserted :)

-Charley
Comment 4 Charley 2009-10-28 20:13:39 UTC
Created attachment 4334 [details]
Updated patch, generated via git diff

[Message summary, if you don't want to read it all: Trying to always generate
the skipped message, no matter what other errors occur nor what the script
says]

Previous patch does not work if there is no probe end {} in the script. Created
a patch that does work with probe end.

Just adding the error messages to the appropriate places results in a println
for each skipped probe, so I used an 'atomic_read(& session_state)' and a
comparison to STAP_SESSION_ERROR to determine whether or not to print the
error.

The problem, though, is that if another error sets session state to
STAP_SESSION_ERROR, then the skipped message will not be printed. The only
error that does this is when error_count > MAXERROR -- here if we encounter
error_count > MAXERROR before skipped > MAXSKIPPED then the latter will not
print any error message.

-Charley
Comment 5 Frank Ch. Eigler 2009-10-28 23:17:49 UTC
> Previous patch does not work if there is no probe end {} in the script. 

Why is that?


> Just adding the error messages to the appropriate places results in a println
> for each skipped probe, so I used an 'atomic_read(& session_state)' and a
> comparison to STAP_SESSION_ERROR to determine whether or not to print the
> error.

A better way might be to use 

atomic_compxchg(& skipped_count, STAP_SESSION_RUNNING, STAP_SESSION_ERROR)


> The problem, though, is that if another error sets session state to
> STAP_SESSION_ERROR, then the skipped message will not be printed. The only
> error that does this is when error_count > MAXERROR -- here if we encounter
> error_count > MAXERROR before skipped > MAXSKIPPED then the latter will not
> print any error message.

That's not so bad.  What was worse is there appearing only a warning to
indicate exceeding MAXSKIPPED.
Comment 6 Roland Grunberg 2009-10-30 20:29:11 UTC
Created attachment 4346 [details]
patch for displaying error on MAXSKIPPED exceeded

Used function atomic_cmpxchg to check the value of the state, then change to
ERROR, and only launch the print statement if the state was originally RUNNING.
Comment 7 Frank Ch. Eigler 2009-11-05 14:46:43 UTC
just need a rhel4 atomic_compxchg implementation/hack
Comment 8 Roland Grunberg 2009-11-05 22:05:55 UTC
Created attachment 4363 [details]
mimic functionality of atomic_cmpxchg

This is the proposed implementation to mimic 'atomic_cmpxchg'. Have not been
able to confirm it works for RHEL 4 as it is has been difficult finding various
kernel debuginfo requirements. If anyone else is able to confirm it works
before either Charley or myself, please let us know.
Comment 9 Roland Grunberg 2009-11-06 18:59:44 UTC
Created attachment 4365 [details]
Additional changes to pseudo_atomic_cmpxchg in runtime.h

Changed assignment to/from atomic_t to use atomic_read/set. Changed function to
be a macro. There are other places in the systemtap code where the result of
atomic_read is assumed to be of type int. ie. current HEAD,
translate.cxx:1426-1441.

Tested successfully on RHEL 4.6
Comment 10 Frank Ch. Eigler 2009-11-09 20:06:31 UTC
committed