I'm playing with adding userspace markers to glib, and while it works fine the marker code really doesn't look very efficient. (Using gcc-4.4.2-20.fc12.x86_64 and systemtap-1.0-2.fc12.x86_64 here) Here is an example on instrumenting g_object_ref() which is essentially an atomic inc (and some type checks in this debug build). http://www.gnome.org/~alexl/ref-notracing.s Here is how it looks when adding a marker that stores the address the object type and the old refcount: http://www.gnome.org/~alexl/ref-tracing.s There is a whole bunch of code before the nop that'll be the breakpoint, and additionally there is extra code in at function entry/exit in order to handle the stack usage that was added. This is not really acceptable to ship in a production build of glib, since something like g_object_ref() is called a lot of times. The main cause of this is the volatile used for arg1-arg3 in STAP_PROBE3(), which cause all these values to be forced out to memory, and then re-read when we want them in registers for the tracepoint. This is kinda weird, as just having them in either the stack or in the registers should be enough. And of those two, only having it in the registers is obviously more efficient. Removing the volatiles helps a lot, code wise: http://www.gnome.org/~alexl/ref-tracing-novolatile.s However, doing this on a larger scales shows why this doesn't work: semantic error: failed to retrieve location attribute for local 'arg2' (dieoffset: 0x26af9): identifier '$arg2' at /gnome/INSTALL/share/systemtap/tapset/gobject.stp:31:27 source: type = gtypenames[pid(),$arg2]; In this particular case the marker looks like: 2: nop /* %rbp 32(%rsp) */ Obviously the debug info produces by gcc for this was unable to find the storage position for $arg2. While I can sort of understand the debug info having problems with this I still think its an unacceptable cost to do what the code does atm which means this is not really suitable for being enabled in production code. Furthermore, I have a generic issue with the way the markers are implemented, which is somewhat related to this. Systemtap is able to look up the position of the probe via the references in the .probe section and enable the breakpoint in the right place. However, to read the actual probe arguments it need debugging information in the binary, even though the information required is already read into registers. This means: a) You can't use static markers on stripped code b) You have to forever fight debug info generation vs optimization But, a single look at the asm generated for the probe shows us that all the required information is already availible. Take the above example: nop /* %rbp 32(%rsp) */ This is from sdt.h: __asm__ volatile ("2:\n" STAP_NOP "/* %0 %1 %2 */" :: "g"(arg1), "g"(arg2), "g"(arg3)); Clearly all you have to do is instead of putting the args in a comment just do add something like this to the asm: .section .probes,"aw" .align 8 1: .asciz "%0 %1 %2" .align 8 .quad 1b .previous Then all you have to do is add some simple asm argument parser to systemtap and you'll immediately get rid off having to rely on debug info.
Obviously you also need to know the data type, but as only primitive types are supported anyway we just need to insert sizeof(argN) into the data.
committed, subject to more tweaks for minimum gcc version numbering
Implemented no debuginfo support upstream, currently turned on via -DSTAP_SDT_V2
Stan: You are my hero!
Thanks for the good suggestion!