This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.


Re: Script tapsets.

From: Vara Prasad <prasadav at us dot ibm dot com>
To: "Frank Ch. Eigler" <fche at redhat dot com>
Cc: "Chen, Brad" <brad dot chen at intel dot com>, systemtap at sources dot redhat dot com
Date: Fri, 06 May 2005 13:48:56 -0700
Subject: Re: Script tapsets.
References: <75EC4D5486CAC247B84AAAA6F96AA55804AB6B19@orsmsx402.amr.corp.intel.com> <y0mhdhk6ra7.fsf@toenail.toronto.redhat.com>

Frank Ch. Eigler wrote:


I just committed a partial new tapset section in the archpaper directory,
which may finally illuminate the promise of script-only tapsets.

- FChE

Thanks for starting a write-up about script-based tapsets.

I went through your script tapset write-up; here are my observations.

You mentioned that tapsets are stored in a library used in the elaboration phase. When you say library, does this mean compiled .o or .a form, or just the script sources themselves? How about we let folks write the tapset functions in script or in C, but generate C code from them and compile it into a module that can be loaded independently? If not a module, at least build them as a library that we can link with the code generated for the end user. The main reason is that this form is easy to package and test for a given release.

1) Useful auxiliary functions: You introduced the idea that one can write useful auxiliary functions centrally. One can do this as easily in C as in the systemtap language. In fact, we have to have a systemtap tapset that has start and end functions to do the setup and cleanup. We can put these start/end functions and any other useful functions in a module that every other systemtap module depends on. I am not sure there is any advantage in writing these library functions in a scripting language versus C.

2) Automagic global variables: As above, we can implement the same feature in the systemtap module and export the variables as GPL symbols, so anyone can access them from anywhere. I am not sure I see any advantage in doing this in a script.

3) Probe alias: Probe aliases are a useful feature. They can be implemented in C as well, using a lookup table that maps each advertised name to its original name. I am not sure I see any particular advantage in doing it in a script.
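
To make the lookup-table idea in point 3 concrete, here is a minimal user-space sketch in C. The alias names ("vm.oom", "syscall.open"), the table contents, and the function resolve_alias are all made up for illustration; nothing like this exists in systemtap today.

```c
#include <stddef.h>
#include <string.h>

/* One alias entry: the advertised (friendly) name and the probe point it stands for. */
struct probe_alias {
    const char *advertised;
    const char *original;
};

/* Hypothetical table contents, for illustration only. */
static const struct probe_alias alias_table[] = {
    { "vm.oom", "kernel.statement(\"do_page_fault\").label(\"out_of_memory\")" },
    { "syscall.open", "kernel.function(\"sys_open\")" },
};

/* Return the original probe point for an advertised name, or NULL if unknown. */
const char *resolve_alias(const char *advertised)
{
    size_t i;

    for (i = 0; i < sizeof(alias_table) / sizeof(alias_table[0]); i++) {
        if (strcmp(alias_table[i].advertised, advertised) == 0)
            return alias_table[i].original;
    }
    return NULL;
}
```

A linear scan is enough for a table this small; a real implementation would presumably also have to carry the variables each alias makes available, which this sketch leaves out.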

In your write-up you mentioned, "The following script defines a new "event" and supplies some variables for use by its handlers." I take "event" in that statement to mean a "probe point"; is that right?

Let us look at the real script you have mentioned in the doc.

kernel.statement("do_page_fault").label("out_of_memory") {
  if ($tsk->uid == 0) next;

  victim_tgid = $tsk->tgid;
  victim_pid = $tsk->pid;
  victim_uid = $tsk->uid;
  victim_fault_addr = $address
}

The main interesting piece of the above is the code generated to fetch the local variables tsk and address. If these local variables were made available through, say, a function or macro call, writing the above in C would be trivial as well.

The problems with this script-based approach that I can see are:

1) These scripts are going to live outside the kernel code, hence maintenance is a major problem. The problem is even more severe because we access data structures directly, not through an advertised API.

2) It is difficult, if not impossible, to convince kernel developers to learn a new scripting language for dynamic tracing when they have the luxury of rebuilding the kernel at will. I personally think that without the help of kernel developers we cannot come up with good tapsets in all the areas, as we are not experts in each subsystem.

3) If the variables needed in the probe handlers are of types declared local to the C files, script-based tapsets cannot be used. While working on the VM tapset we encountered this problem with struct scan_control, which is local to its file. We have encountered similar problems in the filesystem area.

4) Taking the above example, it is not clear to me how we are going to figure out which header file has the definition of struct task_struct, and which dependent header files we have to include in order to compile the generated code.

5) The example you have provided is simple enough that it doesn't really matter whether we write it in C or in script. But for a complicated one, where one might have to do some locking, traverse a list, compute some values, etc., I am not sure it is easy to express in the limited systemtap language.

Look at it from the point of view of existing kprobes users, since they are the potential tapset writers. What they want out of systemtap is: 1) A convenient way to access local variables and arguments anywhere in a function. (Access at function entry is already possible through jprobes.) 2) An enhancement to the kprobes API so that they can specify the probe point location in a more portable fashion than the current hex address format.

The second one can be solved more easily, through an API such as tapset_function_register, similar to the kprobes registration function (register_kprobe), in which they can describe the location in terms of our systemtap coordinate system.
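
As a rough illustration of what such a registration call might look like, here is a user-space mock in C. Only the name tapset_function_register comes from the proposal above; the struct layout, the handler signature, the table size, and the return codes are assumptions made for this sketch, and nothing is actually resolved or inserted into a kernel.

```c
#include <stddef.h>

struct pt_regs;  /* opaque in this sketch; the kernel defines the real type */

/* Assumed handler signature, following the handlers shown in this mail. */
typedef void (*tapset_handler_t)(struct pt_regs *regs, void *buf, int bufsize);

/* A probe described by a symbolic location instead of a hex address. */
struct tapset_probe {
    const char *location;      /* e.g. a systemtap-style coordinate string */
    tapset_handler_t handler;
};

#define MAX_TAPSET_PROBES 16

struct tapset_probe *registered_probes[MAX_TAPSET_PROBES];
int num_registered;

/* Hypothetical registration call: returns 0 on success, -1 on bad input
 * or a full table. A real version would resolve the location to an
 * address (from debug data) and hand it to kprobes. */
int tapset_function_register(struct tapset_probe *p)
{
    if (p == NULL || p->location == NULL || p->handler == NULL)
        return -1;
    if (num_registered >= MAX_TAPSET_PROBES)
        return -1;
    registered_probes[num_registered++] = p;
    return 0;
}

/* Example handler and probe description (contents are illustrative). */
void oom_handler(struct pt_regs *regs, void *buf, int bufsize)
{
    (void)regs;
    (void)buf;
    (void)bufsize;
}

struct tapset_probe oom_probe = {
    "kernel.statement(\"do_page_fault\").label(\"out_of_memory\")",
    oom_handler,
};
```

The point of the sketch is only that the caller names the location symbolically; how that string gets turned into an address is the part systemtap would have to supply.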

If we can implement most of our probe handlers through jprobes and return probes, then the first problem is taken care of; otherwise it is the more difficult one to solve.

One way to solve the first problem is to let systemtap consult the debug data and pass the required variables to the handlers. The above example would then look like the following.

dopgflt_outofmem_handler (struct pt_regs *regs, struct task_struct *tsk, unsigned long addr, void *buf, int bufsize) {
   if (tsk->uid != 0)
       {
        copy relevant variables to buf;
       }
}

In the above approach, consulting the debug data happens completely outside the kernel, hence handlers can be written like any other functions. The disadvantage is that the API between the handlers and systemtap is not fixed: the arguments vary from handler to handler.

Another approach: if there were a way to provide an API for fetching local variables, such as GET_LOCAL_VAR(varname, vartype), then kernel developers could write the handlers like any other functions.

For example, if there were such a function or macro to get locals, the above code could easily be written in C as follows, with a standard API between systemtap and the handlers.

dopgflt_outofmem_handler (struct pt_regs *regs, void *buf, int bufsize) {
   struct task_struct *task;
   unsigned long addr;

   task = GET_LOCAL_VAR(tsk, struct task_struct *);
   addr = GET_LOCAL_VAR(address, unsigned long);
   if (task->uid != 0)
       {
        copy relevant variables to buf;
       }
}

I think it is not possible to provide such an API directly, as we need to consult debug data to generate the code that accesses locals and args. One roundabout way may be to run a tool that consults the debug data and expands these local variable access macros after the rest of the kernel is compiled; then we can compile all the handlers. I am not a compiler expert, so please correct me if I am wrong.
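
To show roughly what the expanded macros could amount to, here is a user-space simulation in C. Everything in it is an assumption for illustration: the demo_* types stand in for the probed function's stack frame and task structure, the LOCAL_OFFSET_* constants stand in for offsets a tool would generate from debug data, and the macro relies on a variable named frame_base being in scope. Real locals may live in registers or be optimized away, which this sketch ignores.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for the kernel task structure and the probed function's frame. */
struct demo_task {
    int uid;
    int pid;
};

struct demo_frame {
    struct demo_task *tsk;
    unsigned long address;
};

/* In the scheme described above, a tool would emit these offsets after
 * reading debug data. Here they are derived from the demo frame layout. */
#define LOCAL_OFFSET_tsk      offsetof(struct demo_frame, tsk)
#define LOCAL_OFFSET_address  offsetof(struct demo_frame, address)

/* Read a named local of the given type from the frame.
 * Expects a pointer named frame_base in the calling scope. */
#define GET_LOCAL_VAR(name, type) \
    (*(type *)((char *)frame_base + LOCAL_OFFSET_##name))

/* Record copied out to the caller's buffer. */
struct demo_record {
    int pid;
    int uid;
    unsigned long fault_addr;
};

/* Handler in the style shown above: skip uid 0, otherwise copy the
 * relevant variables to buf. Returns the number of bytes written. */
int demo_outofmem_handler(void *frame_base, void *buf, int bufsize)
{
    struct demo_task *task = GET_LOCAL_VAR(tsk, struct demo_task *);
    unsigned long addr = GET_LOCAL_VAR(address, unsigned long);
    struct demo_record rec;

    if (task->uid == 0 || bufsize < (int)sizeof(rec))
        return 0;

    rec.pid = task->pid;
    rec.uid = task->uid;
    rec.fault_addr = addr;
    memcpy(buf, &rec, sizeof(rec));
    return (int)sizeof(rec);
}
```

The handler body itself contains no debug-data knowledge; all of it is confined to the generated offsets, which is what would let handlers be written like ordinary functions.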

Looking at the DTrace papers, I feel fairly certain that DTrace providers are written in C, but I don't know how they solved this problem of accessing local variables, unless all their provider functions are like jprobes only.

Just to summarize my long posting: I think we need to get the kernel community on board for us to be successful with tapsets and their long-term maintenance. I think we should look at all the possible solutions to make that happen. I would even suggest that, once we have concrete ideas, we post to LKML and solicit kernel developers' opinions on which one they might prefer; they might also have a better suggestion that we can all live with.

Follow-Ups:
- Re: Script tapsets.
  - From: Frank Ch. Eigler

References:
- RE: separating policy and mechanism
  - From: Chen, Brad
- Re: separating policy and mechanism
  - From: Frank Ch. Eigler
