Typical compile times for simple scripts probing kernel.syscall.* are 1 to 2 minutes:

~> time stap -p2 sys.stp > foo
real    0m1.570s
user    0m1.518s
sys     0m0.052s

~> time stap -p3 sys.stp > foo
real    0m3.282s
user    0m2.136s
sys     0m1.148s

~> wc -l foo
183458 foo

~> time stap -p4 sys.stp
real    1m27.217s
user    1m23.365s
sys     0m4.691s

So we have a 183458-line C file to compile. The context struct itself is over 14000 lines long and includes stuff like:

struct function__module_flags_str_locals {
        int64_t f;
        union {
                struct { };
                struct { };
                struct { };
                struct { };
                struct { };
                struct { };
                struct { };
                struct { };
                struct { };
                struct { };
                struct { };
                struct { };
        };
        string_t __retvalue;
} function__module_flags_str;

Everything else in the C file looks normal at first glance. Very repetitive, obviously.
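For what it's worth, those union members appear to be the locals of nested lexical blocks overlaid on one another, which would make the twelve empty "struct { };" entries noise from blocks that declare nothing. A minimal sketch of the presumable intent (names and sizes here are hypothetical, not taken from the generated file):

#include <stdint.h>

typedef char string_t[128];   /* assumed; the real size differs */

struct function__example_locals {
        int64_t f;            /* the function's parameter */
        union {               /* nested blocks overlay one another, so
                                 the struct is sized by the largest
                                 block, not the sum of all blocks */
                struct {
                        int64_t tmp1;    /* locals of one nested block */
                };
                struct {
                        string_t tmp2;   /* locals of another block */
                };
        };
        string_t __retvalue;  /* the function's return value */
};

Under that reading, the overlay scheme itself is fine; the question is why so many empty members are emitted at all.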
Created attachment 804: my test case
Are you sure you're running CVS systemtap? Graydon made a big improvement in just this area of the code a few days ago: bug #1931
Never mind, misunderstood your timings. Needs further study.
This looks decidedly wrong. Offhand I can't tell why. It's possible that we're simply generating too much code -- maybe 200 syscalls times a handful of parameter-accessor functions makes "too much code" -- but it also looks like we're generating junk as well.
Experiments ongoing. Counterintuitively, it seems like the probe handler bodies are *not* the dominant factor. With all ~500 of them commented out, the compile time is still just as long. Judging by the resulting function/symbol sizes, I infer that the module_init/module_exit functions are stressing the C compiler most, and therefore will look there first.
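For intuition, the generated init function is presumably one straight-line routine that registers every probe point, with an unwind label per registration; with ~500 probes that is a single huge function, and compiler passes that scale poorly with function size will suffer. A rough sketch of the shape, with hypothetical names (not the actual generated code):

#include <linux/kprobes.h>

/* One struct per probe point; fields (symbol_name etc.) are filled
   in elsewhere in the generated code. */
static struct kprobe probe_0000, probe_0001;

static int _stp_module_init(void)   /* hypothetical name */
{
        int rc;

        rc = register_kprobe(&probe_0000);
        if (rc)
                goto fail_0;
        rc = register_kprobe(&probe_0001);
        if (rc)
                goto fail_1;
        /* ... hundreds more registration blocks, one per probe ... */
        return 0;

fail_1:
        unregister_kprobe(&probe_0000);
fail_0:
        return rc;
}

If this is roughly what is emitted, registering an array of kprobes in a loop instead would keep the function size constant in the number of probes.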
Patches just committed appear to improve this significantly.
BEFORE:

~> time stap -p4 sys.stp
real    1m40.500s
user    1m35.334s
sys     0m5.947s

AFTER:

~> time stap -p4 sys.stp
real    0m47.287s
user    0m46.979s
sys     0m1.393s

That was a big improvement. Still, I hope we can eventually improve upon this. I suggest keeping this open at a lowered priority.
Right. I anticipate further improvements are possible along these lines:

- reducing the amount of code generated (duh), particularly:
  - collecting the activity-count additions and especially the checks
  - reducing the frequency of last_stmt assignments and last_error checks
- raising some global-variable locking/unlocking code up to the outermost
  nesting level of probe/function bodies, as sketched below; beyond
  simplifying the emitted C code, this could reduce potential concurrency,
  but it would kill a bunch of race conditions
- adjusting the kbuild CFLAGS to lessen optimization
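To make the lock-lifting item concrete, here is a hedged before/after sketch; the global, its rwlock, and the function names are illustrative, not the actual generated code:

#include <linux/types.h>
#include <linux/spinlock.h>

static DEFINE_RWLOCK(global_count_lock);   /* illustrative global + lock */
static s64 global_count;

/* Before: every statement that touches the global takes and drops
   the lock around just that statement. */
static void probe_body_before(s64 n)
{
        write_lock(&global_count_lock);
        global_count++;
        write_unlock(&global_count_lock);

        write_lock(&global_count_lock);
        global_count += n;
        write_unlock(&global_count_lock);
}

/* After lifting: one acquire/release pair at the outermost level of
   the body covers all accesses. Less emitted C and fewer runtime lock
   operations; holding the lock longer reduces potential concurrency,
   but the window between the two updates (a race) is gone. */
static void probe_body_after(s64 n)
{
        write_lock(&global_count_lock);
        global_count++;
        global_count += n;
        write_unlock(&global_count_lock);
}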
*** Bug 1159 has been marked as a duplicate of this bug. ***
*** Bug 1330 has been marked as a duplicate of this bug. ***
This will include lock lifting, unused-$target elimination, and one or two other optimizations.
Mostly done; only lock lifting remains.
Lock lifting done. Other future improvements are possible; they will be tracked separately.