To recap the discussion, we need a benchmark suite to measure both translation-time performance and script execution performance. Microbenchmarks focused on particular scripting constructs (loops, conditionals, string operations) should be included. The suite should be in the form of a dejagnu test that anyone can run to look for performance regressions. Eye candy is optional.
It would be a good idea to have the stap translator be able to report the time spent in its various phases, e.g. parse, elaborate, translate, compile, and run script -- something similar to gcc's -ftime-report option. -Will
Added code to print timing information in verbose mode.

  # stap -p4 -v [...] 2>&1 | grep 'Pass.*in.*ms'
  Pass 1: parsed user script and 10 library script(s) in 380+10 (u+s) ms.
  Pass 2: analyzed user script. 595 probe(s), 32 function(s), 2 global(s) in 4480+600 (u+s) ms.
  Pass 3: translated to C into "/tmp/stapYStmvr/stap_21070.c" in 620+150 (u+s) ms.
  Pass 4: compiled C into "stap_21070.ko" in 45510+1040 (u+s) ms.

Will work on making verbosity a gradated measure, so a plain "-v" prints just those lines, and additional "-v"s produce more detail.
I was thinking of having a separate "-t" option for the timing information, to minimize the amount of additional output that scripts have to parse to get the timing data. However, the current scheme with the graduated "-v" levels already keeps the amount of extraneous information down. It would also be worthwhile to include wall-clock time, to get some idea of how long the user would perceive things as taking.
Now collecting latency information for the various phases of the systemtap translator. Modified tests/testsuite/lib/stap_run.exp and stap_run2.exp to make use of the timing information generated by "stap -v". The information is written into systemtap.log when the dejagnu tests are run. It can be extracted in tab-separated form with the simple script in tests/tools/stap_metrics; the resulting data can be imported into a spreadsheet and analyzed. There is also a tests/tools/stap_profile script that runs the dejagnu systemtap tests under OProfile, which provides some measurement of where the hotspots in the instrumentation system are.
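The stap_metrics script itself is not shown here; as an illustrative sketch (hypothetical, not the actual tests/tools/stap_metrics code), the "stap -v" timing lines in systemtap.log could be converted to tab-separated pass / user-ms / system-ms columns with a single sed invocation:

```shell
# Hypothetical sketch of stap_metrics-style extraction; the sample log
# lines are taken from the "stap -p4 -v" output shown above.
cat > systemtap.log <<'EOF'
Pass 1: parsed user script and 10 library script(s) in 380+10 (u+s) ms.
Pass 2: analyzed user script. 595 probe(s), 32 function(s), 2 global(s) in 4480+600 (u+s) ms.
EOF

# Emit "pass<TAB>user_ms<TAB>sys_ms" for each timing line.
sed -nE 's/^Pass ([0-9]+):.* in ([0-9]+)\+([0-9]+) \(u\+s\) ms\.$/\1\t\2\t\3/p' systemtap.log
```

The tab-separated output can then be pasted straight into a spreadsheet for comparison across runs.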
Frank asked me to add some support for getting timing information about the SystemTap probe actions and for providing statistics about them. The thought is to have a "-t" (benchmark timing) option for stap that instruments the probe prologues and epilogues. As a quick sanity check, I wrote something this afternoon as a plain systemtap script, with no modifications to the systemtap translator. One thing I noticed is that @avg does not handle the no-entries case gracefully; it just gives "ERROR: divide by zero".

global _kf_sys_getuid
global _begin

probe kernel.function("sys_getuid") {
  start = get_cycles()
  /* FIXME action goes in here */
  stop = get_cycles()
  _kf_sys_getuid <<< (stop - start)
}

probe begin {
  start = get_cycles()
  printf("starting\n")
  stop = get_cycles()
  _begin <<< (stop - start)
}

probe end {
  printf("kernel.function(\"sys_getuid\") %d@%d\n",
         @count(_kf_sys_getuid), @avg(_kf_sys_getuid))
  printf("begin %d@%d\n", @count(_begin), @avg(_begin))
}
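Until @avg copes with an empty aggregate, one possible script-level workaround (an untested sketch, assuming the script above) is to keep a plain integer counter alongside each aggregate and only read the extractors when that counter is nonzero; a plain global is always safe to read, regardless of how the stat object behaves when empty:

global _begin, _begin_hits

probe begin {
  start = get_cycles()
  printf("starting\n")
  stop = get_cycles()
  _begin <<< (stop - start)
  _begin_hits++   /* plain counter: safe to read even if the stat is empty */
}

probe end {
  if (_begin_hits > 0)
    printf("begin %d@%d\n", @count(_begin), @avg(_begin))
  else
    printf("begin: no samples\n")
}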
Created attachment 993 [details]
a little start to -DBENCHMARK

Additional stuff needed includes:
- code in c_unparser::emit_probe to emit a Stat object for each probe
- code in emit_probe_epilogue to add the cycles_elapsed into that Stat object
- code in c_unparser::emit_module_exit, within the unregister_probe_* loop, to print a brief summary of the computed Stat values for each probe
Created attachment 997 [details]
collect and print out cycle information

There is a macro, STP_TIMING, that can be defined to enable the measurements.
Created attachment 1009 [details]
Patch to map timing information back to script

The attached patch provides file and line-number information, to make it easier to figure out which parts of the script the timing information refers to.
Created attachment 1011 [details] include symbolic location information
Go ahead and commit; we can tweak the format logic in plain cvs.
If you don't intend to do any more work on this, please close the bug.
Basic script timing code was committed long ago. We still lack an organized benchmark suite in dejagnu - something to run whenever we make code-generation changes.
One possible place to look for microbenchmark ideas is in the runtime/bench2 directory. I used the scripts there when doing some optimization work in the past.
stap -t is well baked. Individual suspected performance bottlenecks are already being tracked separately.