This is the mail archive of the
systemtap@sources.redhat.com
mailing list for the systemtap project.
Re: Script tapsets.
Frank Ch. Eigler wrote:
I just
committed a partial new tapset section in the archpaper directory,
which may finally illuminate the promise of script-only tapsets.
- FChE
Thanks for starting a write-up about script-based tapsets.
I went through your script tapset write-up; here are my observations.
You mentioned that tapsets are stored in a library used in the elaboration
phase. When you say library, does this mean compiled .o or .a form, or
just the script sources themselves? How about we let folks write the tapset
functions in scripts or "C", but generate "C" code from them
and compile it into a module that can be loaded independently? If not a
module, at least make them a library that we can link into
the end-user generated code. The main reason is that this form is easy to
package and test for a given release.
1) Useful auxiliary functions: You have introduced the idea that one can
write useful auxiliary functions centrally. One can do these as easily
in C as in systemtap language. In fact we have to have a systemtap
tapset that has start and end functions to do the setup and cleanup. We
can make these start/end and any other useful functions as a module that
every other systemtap module depends on. I am not sure of any advantage
of writing these library functions in a scripting language versus "C".
2) Automagic global variables: As above, we can implement the same
features in the systemtap module and export them as GPL
symbols, so anyone can access them from anywhere. I am not sure I see
any advantage of doing this in a script.
3) Probe alias: Probe alias is a useful feature. It can be implemented
in "C" as well, using a lookup table that maps the advertised name to
the original name. I am not sure I see any particular
advantage of doing it in a script.
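As a rough illustration of the lookup-table idea in point 3, here is a
minimal user-space sketch. The names probe_alias, alias_table,
resolve_probe_alias and the advertised name "vm.oom" are all hypothetical,
invented only for this example; nothing here is existing systemtap code.

```c
#include <stddef.h>
#include <string.h>

/* One alias entry: the advertised name and the probe point it stands for. */
struct probe_alias {
    const char *advertised;  /* name exposed to tapset users (hypothetical) */
    const char *original;    /* underlying probe point specification */
};

/* Hypothetical table; the single entry is illustrative only. */
static const struct probe_alias alias_table[] = {
    { "vm.oom",
      "kernel.statement(\"do_page_fault\").label(\"out_of_memory\")" },
};

/*
 * Return the original probe point for an advertised name.
 * A name that is not in the table is returned unchanged.
 */
const char *resolve_probe_alias(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(alias_table) / sizeof(alias_table[0]); i++) {
        if (strcmp(alias_table[i].advertised, name) == 0)
            return alias_table[i].original;
    }
    return name;
}
```

A caller would pass the advertised name and register the probe at whatever
string comes back, so the table only renames probe points. It does not
supply the extra variables that the script form of an alias can define.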
In your write-up you mentioned "The following script defines a new
"event" and supplies some variables for use by its handlers".
I am thinking "event" in the above statement means a "probe point"; is
that right?
Let us look at the real script you have mentioned in the doc.
kernel.statement("do_page_fault").label("out_of_memory") {
    if ($tsk->uid == 0) next;
    victim_tgid = $tsk->tgid;
    victim_pid = $tsk->pid;
    victim_uid = $tsk->uid;
    victim_fault_addr = $address
}
The main interesting piece in the above is the code generated to read the
local variables tsk and address.
If these local variables were made available, say through a
function or macro call, writing the above code in C would be trivial as well.
The problems with this script-based approach that I can see are:
1) These scripts are going to live outside the kernel code, hence
maintenance is a major problem. The problem is even more severe because we
access data structures directly, not through an advertised API.
2) It is difficult, if not impossible, to convince kernel developers to
learn a new scripting language for dynamic tracing when they have the
luxury of rebuilding the kernel at will. I personally think that without the
help of kernel developers we cannot come up with good tapsets in all
areas, as we are not experts in each subsystem.
3) Another problem is that if the types needed in the probe handlers are
declared local to the "C" files, script-based tapsets cannot be used.
While working on the VM tapset we encountered this problem with struct
scan_control, which is local to its file. We have encountered similar
problems in the filesystem area.
4) Taking the above example, it is not clear to me how we are going
to figure out which header file has the definition of struct task, and
which dependent header files we have to include in order
to compile the generated code.
5) The example you have provided is simple enough that it doesn't
really matter whether we write it in "C" or script. But for a complicated
one, where one might have to do some locking, traverse a list,
compute some values, etc., I am not sure it is easy to express that in
the limited systemtap language.
If you look at it from an existing kprobes user's point of view, as they are
potential tapset writers, what they want out of systemtap is:
1) A convenient way to access local variables and arguments anywhere in
the function. Access at function entry is achieved through jprobes now.
2) An enhancement to the kprobes API so that they can specify the probe
point location in a more portable fashion than the current hex address
format.
The second one can be solved more easily through an API such as
tapset_function_register, similar to the kprobes register_kprobe function,
wherein they can describe the location in terms similar to our systemtap
coordinate system.
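To make the idea concrete, here is a minimal user-space sketch of what
such a registration call might look like. Everything in it is an
assumption made for illustration: tapset_function_register is only
proposed in this mail, and struct tapset_probe, its fields, oom_handler
and the fixed-size table are invented here. A real implementation would
have to resolve the symbolic location to an address (for example from
debug data) and then register a kprobe there; this mock only records the
request.

```c
#include <stddef.h>

struct pt_regs;  /* opaque in this sketch */

/* Hypothetical handler signature, following the fixed API discussed here. */
typedef void (*tapset_handler_t)(struct pt_regs *regs, void *buf, int bufsize);

/* Hypothetical symbolic description of a probe location. */
struct tapset_probe {
    const char *function;      /* e.g. "do_page_fault" */
    const char *label;         /* e.g. "out_of_memory"; NULL for function entry */
    tapset_handler_t handler;  /* called when the probe point is hit */
};

#define MAX_TAPSET_PROBES 16
static const struct tapset_probe *registered[MAX_TAPSET_PROBES];
static int nr_registered;

/*
 * Record a probe request. Returns 0 on success, -1 on a bad description
 * or a full table. Address resolution and kprobe registration are
 * deliberately left out of this mock.
 */
int tapset_function_register(const struct tapset_probe *p)
{
    if (p == NULL || p->function == NULL || p->handler == NULL)
        return -1;
    if (nr_registered >= MAX_TAPSET_PROBES)
        return -1;
    registered[nr_registered++] = p;
    return 0;
}

/* Example handler and probe description (illustrative only). */
static void oom_handler(struct pt_regs *regs, void *buf, int bufsize)
{
    (void)regs;
    (void)buf;
    (void)bufsize;
}

static const struct tapset_probe oom_probe = {
    "do_page_fault", "out_of_memory", oom_handler
};
```

A tapset writer would then call tapset_function_register(&oom_probe) from
module init code, without ever naming a hex address.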
If we can implement most of our probe handlers through jprobes and return
probes, then the first problem is taken care of; otherwise it is the more
difficult one to solve.
One way to solve the first problem is to let systemtap consult the debug data
and pass the required variables to the handlers. The above example would
look like the following.
void dopgflt_outofmem_handler (struct pt_regs *regs, struct task *tsk,
        unsigned long addr, void *buf, int bufsize) {
    /* skip root, as the script version does */
    if (tsk->uid != 0)
    {
        /* copy relevant variables to buf */
    }
}
In the above approach, the code that consults the debug data lives
completely outside the kernel, hence handlers can be written like any other
functions. The disadvantage is that the API for communication between
the handlers and systemtap is not fixed: the arguments vary from handler
to handler.
Another approach: if there were a way to provide an API to get
local variables, like GET_LOCAL_VAR(varname, vartype), then kernel
developers could write the handlers like any other functions.
For example, if there were such a function/macro to get locals, the above
code could easily be written in C as follows, with a standard API between
systemtap and handlers.
void dopgflt_outofmem_handler (struct pt_regs *regs, void *buf, int bufsize) {
    struct task *task;
    unsigned long addr;
    /* fetch the probed function's locals by name and type */
    task = GET_LOCAL_VAR(tsk, struct task *);
    addr = GET_LOCAL_VAR(address, unsigned long);
    if (task->uid != 0)
    {
        /* copy relevant variables to buf */
    }
}
I think it is not possible to provide such an API directly, as we need to
consult debug data to generate the code that accesses locals and args. One
roundabout way may be to run a tool that consults the debug data and expands
these local variable access macros after the rest of the kernel is compiled;
then we can compile all the handlers. I am not a compiler expert, so
please correct me if I am wrong.
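The roundabout approach can be mocked up in user space to show the shape
of what such a tool might emit. This is only a sketch under heavy
assumptions: mock_regs, mock_frame, oom_record and the LOCAL_OFFSET_*
names are invented here, the offsets come from offsetof on a fake frame
rather than from real debug data, and real locals may live in registers or
be optimized away, which this model ignores. The macro expects a variable
named regs to be in scope.

```c
#include <stddef.h>
#include <string.h>

struct task { int uid; int pid; };  /* stand-in type, named as in this mail */

/* Fake stack frame of the probed function, for illustration only. */
struct mock_frame { struct task *tsk; unsigned long address; };

/* Stand-in for pt_regs: just a pointer to the probed frame. */
struct mock_regs { const unsigned char *frame_base; };

/* What the hypothetical tool would generate from debug data. */
#define LOCAL_OFFSET_tsk     offsetof(struct mock_frame, tsk)
#define LOCAL_OFFSET_address offsetof(struct mock_frame, address)

/* Read a local of the probed function; uses `regs` from the caller's scope. */
#define GET_LOCAL_VAR(varname, vartype) \
    (*(vartype const *)(regs->frame_base + LOCAL_OFFSET_##varname))

struct oom_record { int pid; int uid; unsigned long fault_addr; };

/*
 * Returns the number of bytes written to buf, 0 if the task is root
 * (skipped, as in the script), or -1 if buf is too small.
 */
int dopgflt_outofmem_handler(struct mock_regs *regs, void *buf, int bufsize)
{
    struct task *task = GET_LOCAL_VAR(tsk, struct task *);
    unsigned long addr = GET_LOCAL_VAR(address, unsigned long);
    struct oom_record rec;

    if (task->uid == 0)
        return 0;
    if (bufsize < (int)sizeof(rec))
        return -1;

    rec.pid = task->pid;
    rec.uid = task->uid;
    rec.fault_addr = addr;
    memcpy(buf, &rec, sizeof(rec));
    return (int)sizeof(rec);
}
```

The handler body stays ordinary C with a fixed signature; only the
LOCAL_OFFSET_* definitions would need regenerating when the kernel is
rebuilt.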
Looking at the DTrace papers, I feel fairly certain that DTrace
providers are written in C, but I don't know how they solved this problem
of accessing local variables, unless all their provider functions are
like jprobes only.
Just to summarize my long posting: I think we need to get the kernel
community on board for us to be successful with tapsets and their long-term
maintenance. I think we should look at all possible solutions
to make that happen. I would even suggest that once we have
concrete ideas we post them on LKML and solicit kernel developers'
opinions on which one they prefer; they might also have a better
suggestion that we all can live with.