This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.

Re: Script tapsets.

From: "Frank Ch. Eigler" <fche at redhat dot com>
To: Vara Prasad <prasadav at us dot ibm dot com>
Cc: systemtap at sources dot redhat dot com
Date: Sun, 8 May 2005 10:45:43 -0400
Subject: Re: Script tapsets.
References: <75EC4D5486CAC247B84AAAA6F96AA55804AB6B19@orsmsx402.amr.corp.intel.com> <y0mhdhk6ra7.fsf@toenail.toronto.redhat.com> <427BD838.4030206@us.ibm.com>

Hi -

It seems to me that there is a large philosophical divide between your
and my preferred approaches to writing systemtap extensions.  You
expect people (some subset of users or developers) to prefer raw C as
much as possible, and I expect the opposite.  You would like a rich C
interface that makes script unnecessary; I would like a rich script
interface that makes C unnecessary.  You ask for justification for
doing something in script when also perhaps possible in C, and I vice
versa.  You appear interested in systemtap as being a library for
writing kprobes routines in C; I am interested in kprobes as one of a
number of implicit backends for writing scripts.

Luckily, we don't have to agree to make progress.

varap wrote:
> [...]
> You mentioned tapsets are stored in a  library used in elaboration 
> phase.  When you say library does this mean compiled .o or .a form or 
> just script sources themselves? 

Just the script sources, since there is no "partial compilation"
facility being contemplated for scripts.  This makes sense since
the translator needs global program information in order to
perform type inferences, to compute proper declarations for
all the supporting structs, and probably for other reasons.

> How about we let folks write the tapset functions in scripts or "C"
> but we will generate the code to "C" form and compile it into a
> module that can be loaded independently [or] at least make them in
> the form of a library [...]

Once a C interface to the translator/runtime is designed, such
packaging options are likely to be supported.

> [...]  In your write up you mentioned "The following script defines
> a new "event" and supplies some variables for use by its handlers" I
> am thinking "event" in the above statement means a "probe point", is
> that right.

I was referring to the event of a probe point being fired.

> [...]
>   victim_tgid = $tsk->tgid;
> [...]
> The main interesting piece of code in the above is code generated to get 
> local variables tsk and address.
> If  these local variables are made available let us say through a  
> function or macro call writing the above code in C is trivial as well.

As you agree later, this is far from trivial.  An introspection
library for C is beyond what systemtap needs to offer to script users.

> The problems with this script based approach that i can see are
> 1) These scripts are going to live outside the kernel code hence 
> maintenance is a major problem. 

It would be good to gather data to support and quantify this
hypothesis.

> The problem is even more severe as we access datastructures directly
> not through an advertised API.

True, but at the same time some advertised APIs are not suitable for
use from within contexts such as interrupt handlers.

> 2) It is not easy, if not impossible, to convince kernel developers to 
> learn a new scripting language for dynamic tracing when they have the 
> luxury of rebuilding the kernel at will.

And yet I expect even they would prefer to avoid a rebuild/reboot,
other things being equal.

> I personally think without the help of kernel developers we can not
> come up with good tapsets in all the areas as we are not experts in
> each subsystem.

The group of "kernel developers" is too amorphous to agree or disagree
with this.  The set of experts for any given area may or may not match
the set of people who might refuse to use script, or who may be
willing to maintain instrumentation code in their area.

> 3) Another problem is if the variables needed in the probe handlers are 
> declared local to the "C" files, script based tapsets can not be used. 
> [...]

Why do you think so?  Consider the model of a debugger supervising a
stopped program, not one C program linking to another.  A debugger
can make references to "local" (static?) variables.  For systemtap, we
just need a syntax for making references to symbols outside the
default lookup algorithm.

> 4)  If you take the above example it is not clear to me how are we going 
> to figure out which header file has the definition of struct task and 
> what are all the dependent headerfiles that we have to include in order  
> to compile the above generated code.

To access "$globalptr->field", the translator need emit *no* #includes
for the declarations of typeof(globalptr).  That's because this
dereference operation would be expanded to the same sort of dwarf
walking expression already shown to access function parameters and
locals.  Field names and types would be resolved within the
translator, and would show up in the generated code only as machine
level pointer/offset/dereference operations.
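
To make that concrete, here is a rough sketch in plain C of what such
resolved-offset access could look like.  Everything in it is made up
for illustration: "struct demo_task", "struct field_ref" and
"fetch_signed_field" are hypothetical names, and offsetof() merely
stands in for the offset the translator would read from the debugging
information.  It is not what the translator actually emits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a kernel structure.  In the real case the
 * generated code would never see this declaration at all. */
struct demo_task {
    int  pid;
    int  tgid;
    char comm[16];
};

/* What the translator would have resolved ahead of time: where the
 * field lives and how wide it is. */
struct field_ref {
    size_t offset;
    size_t size;
};

/* Assumption: offsetof() plays the role of the debug-info lookup. */
static const struct field_ref demo_tgid_ref = {
    offsetof(struct demo_task, tgid),
    sizeof(int)
};

/* The "generated" access: only a base pointer, an offset and a size.
 * No field names or type declarations are needed at this point.
 * Unsupported sizes yield 0 in this sketch. */
static int64_t fetch_signed_field(const void *base, struct field_ref ref)
{
    const char *addr = (const char *)base + ref.offset;
    int64_t out = 0;

    if (ref.size == sizeof(int32_t)) {
        int32_t v;
        memcpy(&v, addr, sizeof v);
        out = v;
    } else if (ref.size == sizeof(int64_t)) {
        memcpy(&out, addr, sizeof out);
    }
    return out;
}
```

Calling fetch_signed_field(&some_task, demo_tgid_ref) returns the tgid
value without the caller naming the field.  A real probe would also
have to guard against bad pointers, which this sketch does not attempt.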

> 5) The example you have provided is simple enough hence it doesn't 
> really matter if we write in "C" or script but if we have a complicated 
> one where one might have to do some locking and traverse a list and 
> compute some values etc., I am not sure it is easy to express that in 
> systemtap's limited language.

Yes, these operations are still missing.  They may end up with some
respectable expression in the script language, or else may force
descent into C.

> If you look from an existing kprobes user's point of view, as they are 
> potential tapset writers, what they want out of systemtap is
> 1) A convenient way to access local variables and arguments anywhere in 
> the function. Function entry is achieved through jprobes now.
> 2) An enhancement to Kprobes API so that they can specify the probe 
> point location in a more portable fashion than the current hex address 
> format.
> [...]

...  and yet neither of these is practical without access to the
debugging information.  Is this the sort of person for whom dprobes
was written?

> [...]
> One way to solve the first problem is let systemtap consult debug data 
> and pass the required variable to the handlers. The above example would 
> look like the following.
>
> dopgflt_outofmem_handler (struct pt_regs *regs, struct task *tsk, 
> unsigned long addr, void *buf, int bufsize) {
>   
>    if (tsk->uid != 0)
>        {
>         /* copy relevant variables to buf */
>        }
> }

Your examples need more meat.  If this probe handler were written in C,
how is the translator supposed to know what variables it might like to
have extracted?  How is it supposed to decode the "buf" contents?  How
may the script code call it, or be called by it?  And so on.

This is part of the burden of championing a two-language solution to
the problem: you must work out how it should look from both ends, what
information is available on each side, how concepts map, what the
build implications are, and so on.

> [...]  Looking at Dtrace papers it makes me feel fairly certain that
> Dtrace providers are written in C but I don't know how they solved
> this problem of accessing local variables [...]

They do not access general local variables.  They can access certain
specially designated variables: function arguments and return values
(since the ABI fixes their location), and others identified by a
static instrumentation macro.  Look up the dtrace "probe site"
mechanism that uses the DTRACE_PROBE* family of macros.  A variant of
this can be supported by systemtap, even without the extra
"provider { ... }" declarations on the script side.
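
For the general idea of a static probe site, here is a toy sketch in
C.  It is not the DTRACE_PROBE* implementation, nor anything systemtap
provides: "DEMO_PROBE2", "demo_probe_handler", "demo_record" and
"demo_handle_fault" are all hypothetical names.  The point is only that
the macro hands over exactly the values named at the call site, so
other locals stay invisible to the handler.

```c
#include <stddef.h>

/* Hypothetical handler type: receives the probe name and two values. */
typedef void (*demo_probe_fn)(const char *name, long arg0, long arg1);

/* NULL means the probe is disabled; the site then costs one test. */
static demo_probe_fn demo_probe_handler = NULL;

/* Hypothetical static instrumentation macro.  Only the expressions
 * listed here are exposed to the handler. */
#define DEMO_PROBE2(name, a0, a1)                                  \
    do {                                                           \
        if (demo_probe_handler)                                    \
            demo_probe_handler(#name, (long)(a0), (long)(a1));     \
    } while (0)

/* A trivial handler that records what it was given. */
static long demo_last_arg0, demo_last_arg1;
static int  demo_hits;

static void demo_record(const char *name, long arg0, long arg1)
{
    (void)name;
    demo_last_arg0 = arg0;
    demo_last_arg1 = arg1;
    demo_hits++;
}

/* Instrumented function: "scaled" is a local that the probe cannot
 * see, because it was not passed to the macro. */
static int demo_handle_fault(int pid, unsigned long address)
{
    int scaled = pid * 2;

    DEMO_PROBE2(fault__entry, pid, address);
    return scaled;
}
```

Setting demo_probe_handler = demo_record enables the site; until then
demo_handle_fault() runs without calling anything.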

- FChE

References:
- RE: separating policy and mechanism
  - From: Chen, Brad
- Re: separating policy and mechanism
  - From: Frank Ch. Eigler
- Re: Script tapsets.
  - From: Vara Prasad
