This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: stptracer-20060828 has released.
Frank Ch. Eigler wrote:
Hi -
On Fri, Sep 01, 2006 at 11:47:05AM +0800, Li Guanglei wrote:
[...] But I have some questions about compiled approach. stap will
treat all integer data as 64-bit, but most binary trace integer data
items need only 1 or 2 bytes. [...] And will it use a fix length for
all string items? Some string trace data items only have a few
chars.
The translator could apply obvious policies to this question. If the
formatting string includes "%1b", it would write just the low-order
byte of the integer. If the formatting string includes "%64s", this
would result in a fixed-width substring field. For plain "%s", it
would be a dynamic-width field, which alone would not be ideal for
binary format streams.
The idea here is to make this compiled tracing a pure optimization:
not to change the script input nor data output, but just to produce it
quicker.
Thanks. I see.
[...] But in some situations I will put the print/trace statement in
embedded c codes to avoid the calls to function__dwarf_tvar_get*.
If you don't call those tvar_get functions, how are you safely
extracting target data?
One example is the struct scsi_cmnd in scsi trace hooks in LKET. I
need to retrieve 10 arguments from this struct. In the embedded c
codes they are only 10 assignments while in stap script they will be
extended to 10 function calls.
In exchange for those function calls, you're getting safe
dereferencing of all those values. How much slowdown is
that worth?
I run a series of tests to figure it out. I use the same single-thread
application named app_getsid with will call getsid() in a loop, the
results are:
<1> without probe
cpu 0: loops=5000000, average=392 ns
<2> probe using LKET(2.stp): log 4 integers
cpu 0: loops=5000000, average=2280 ns
<3> probe using LKET(3.stp): log 4 integers + visit 5 target variables
cpu 0: loops=5000000, average=2445 ns
<4> probe using LKST(4.stp): log 4 integers
cpu 0: loops=5000000, average=1710 ns
<5> probe using LKST(5.stp): log 4 integers + visit 5 target variables
cpu 0: loops=5000000, average=1859 ns
The testing shows that visiting 5 target variables will cost ~150ns,
and LKST is ~570ns faster than LKET if only log 4 integers. I think
the compiled approach should have almost the same speed as <5>, so it
will roughly be at least ~420ns faster than LKET.
But it's not common that the trace handler will visit a number of data
members of a single struct. In most cases it will just visit different
target variables and in such case LKET will also need the same number
of tvar_get functions as the compiled approach does in script level.
Most target variables can be visited safely unless there is a bug in
Kernel. Those target variables that could potentially have invalid
values need explicitly checking before use them, and LKET needs a
piece of c codes to check them. But we can't make sure every one
doesn't miss some checking codes. So considering the benefit of
tvar_get() and its cost, I think it is worth to use tvar_get().
The following is the stp scripts.
2.stp:
function foo()
%{
_lket_trace(1, 1, "%8b%8b%8b%8b", 111,22,333,444);
%}
probe syscall.getsid {
foo()
}
3.stp:
function foo(a:long, b:long, c:long, d:long, e:long)
%{
_lket_trace(1, 1, "%8b%8b%8b%8b", 111,22,333,444);
%}
probe syscall.getsid {
foo($pid, $pid, $pid, $pid, $pid)
}
4.stp:
probe syscall.getsid {
lkst_trace_n (4, 1, 12, 222, 333, 4444)
}
5.stp:
function foo:long(a:long, b:long, c:long, d:long, e:long)
%{
THIS->__retvalue = 0;
%}
probe syscall.getsid {
foo($pid, $pid, $pid, $pid, $pid)
lkst_trace_n (4, 1, 12, 222, 333, 4444)
}
- Guanglei