This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.


tapset proposal

From: Vara Prasad <prasadav at us dot ibm dot com>
To: systemtap at sources dot redhat dot com
Date: Wed, 06 Apr 2005 23:11:04 -0700
Subject: tapset proposal

Hi,

Last week I started a discussion about tapsets; I am attaching some more details. Sorry, I have not kept up with the mailing list while working on this design, so I might not have addressed some of the issues already raised.

Please let me know your comments.

bye,
Vara Prasad

The main idea of a tapset, as I see it, is that an expert in a given
subsystem knows what data is important for understanding the inner
workings of that subsystem. That expert exports such vital data in the
form of a tapset, using one or more functions that can be called from
a probe. In other words, the expert exports the data and also announces
the API for getting it, so that not everyone has to become an expert in
every area of the kernel, yet everyone can get the vital data of the
system.

Tapsets, just like probes, can be either tied to the execution path,
that is program counter (PC) based, or asynchronous event based.

Terminology:
Let me define some terminology so we are all on the same page for the
rest of the discussion.

Instrumentation function: A function that an expert provides and that
can be called at the specified address.

Probe: A predetermined combination of an address and
the corresponding function that can be called in a probe handler.
This function itself is not the probe handler, but it can be called
from one.

Tapset: A set of probes in a related functional area of the system,
such as the scheduler, VM, etc.

Tapset author: An expert in some part of the kernel who understands
how to write kernel modules.

Script writer: A user who may or may not have expertise in any area
of the kernel but knows how to express queries in a script.

What do tapset authors do?

Experts write functions in their area of functionality and associate
each function with an address in the execution path. This means the
function can be called in a probe handler for that location, but it
does not always have to be called there. The example section covers
both the case where it is called and the case where it is not.
When the function is called, the stack is set up so that all the
arguments and local variables of the function being probed are
available.
Experts will publish what data is exported when this function is
called. These functions take an allocated buffer as one of their
arguments, and the data that the function exports is returned in that
buffer. Experts also provide a function that unpacks the data from
the buffer. The unpack function is used in the probe handler whenever
the supplied function is used.
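
To make the export/unpack pairing concrete, here is a minimal
user-space C sketch. Everything in it is an assumption made for
illustration: the record layout, the 64-byte filename limit, and the
names sys_read_tapset_pack and sys_read_tapset_unpack are hypothetical
and not part of any existing systemtap interface.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout of the data a read() tapset function exports. */
struct read_record {
    char filename[64];   /* assumed fixed limit, always NUL-terminated */
    long long offset;    /* offset in the file */
    size_t size;         /* requested read size */
};

/*
 * Tapset function (sketch): packs the exported data into a buffer
 * supplied by the caller. Returns the number of bytes written, or 0
 * if an argument is missing or the buffer is too small.
 */
size_t sys_read_tapset_pack(const char *filename, long long offset,
                            size_t size, void *out, size_t out_len)
{
    struct read_record rec;

    if (filename == NULL || out == NULL || out_len < sizeof rec)
        return 0;
    memset(&rec, 0, sizeof rec);
    strncpy(rec.filename, filename, sizeof rec.filename - 1);
    rec.offset = offset;
    rec.size = size;
    memcpy(out, &rec, sizeof rec);
    return sizeof rec;
}

/*
 * Companion unpack function (sketch): copies the record back out of
 * the buffer. Returns 0 on success, -1 on bad arguments.
 */
int sys_read_tapset_unpack(const void *in, size_t in_len,
                           struct read_record *rec)
{
    if (in == NULL || rec == NULL || in_len < sizeof *rec)
        return -1;
    memcpy(rec, in, sizeof *rec);
    rec->filename[sizeof rec->filename - 1] = '\0';
    return 0;
}
```

A generated probe handler would call the pack function only when the
script refers to the exported values, and then call the unpack
function on the same buffer before running the script body.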

Probe handler
The probe handler combines the code the script writer specifies for a
given probe point with some runtime systemtap functions and the
expert-provided function. Probe handlers are generated by the
systemtap compiler.

Tapset restrictions:
Tapsets cannot do I/O operations, as that can lead to sleeping and to
locking issues. Tapsets cannot disable interrupts. I think we should
not allow tapsets to do memory allocations either; however, they can
use preallocated memory that systemtap might make available through
an API.
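
One way such a preallocated-memory API could look is a fixed pool
that hands out chunks without ever allocating or sleeping. The
following user-space C sketch is only an illustration: the names
tapset_pool_get and tapset_pool_reset and the 4096-byte pool size are
hypothetical, and a real in-kernel version would additionally need
per-CPU pools or atomic updates to be safe against concurrent probes.

```c
#include <stddef.h>

#define TAPSET_POOL_SIZE 4096               /* hypothetical pool size */
#define TAPSET_ALIGN     sizeof(long long)  /* chunk alignment */

/* The union forces the byte array to be suitably aligned. */
static union {
    unsigned char bytes[TAPSET_POOL_SIZE];
    long long force_align;
} tapset_pool;

static size_t tapset_pool_used;

/*
 * Hand out a chunk from the preallocated pool. Never allocates and
 * never blocks; returns NULL when the request cannot be satisfied.
 */
void *tapset_pool_get(size_t len)
{
    size_t rounded;
    void *p;

    if (len == 0 || len > TAPSET_POOL_SIZE)
        return NULL;
    /* Round up so that every chunk starts on an aligned boundary. */
    rounded = (len + TAPSET_ALIGN - 1) & ~(size_t)(TAPSET_ALIGN - 1);
    if (rounded > TAPSET_POOL_SIZE - tapset_pool_used)
        return NULL;
    p = tapset_pool.bytes + tapset_pool_used;
    tapset_pool_used += rounded;
    return p;
}

/* Release every chunk at once, e.g. after a probe handler returns. */
void tapset_pool_reset(void)
{
    tapset_pool_used = 0;
}
```

Failing with NULL instead of waiting for memory keeps the tapset
within the restrictions above; the caller simply skips exporting data
when the pool is exhausted.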

Example
In the following example, the expert has provided an instrumentation
function, sys_read_tapset_func1, at a line number in the read system
call. This function exports the filename, the offset in the file, and
the read size.

Let us say the user is only interested in monitoring the reads done by
a particular process, as specified in the following script.

probe kernel.syscall.sys_read.line(xxx)
{
  if (($PID == 399) && (args2 > 1024)) {
         numkreads++;
         print(args2);
  }
}

The generated handler code would look like the following; it is
executed whenever any program in the system executes the read system
call.

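handler(fd, buf, size, pt_regs)
{
   if (current->pid == 399) {
      /* tapset_buf: allocated buffer that receives the exported data */
      sys_read_tapset_func1(fd, buf, size, tapset_buf);
      /* unpack the exported data into args0, args1, args2 */
      extract_args(tapset_buf, args0, args1, args2);
      if (args2 > 1024) {
         numkreads++;
         systap_print(args2);
      }
   }
}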

Now let us say the user-specified script doesn't refer to any of the
exported arguments, as in the following.
probe kernel.syscall.sys_read.line(xxx)
{
  if ($PID == 399) {
         num_sysreads++;
  }
}

We don't really need to call the expert-provided function at all in
this case; the generated code looks like the following.
handler(fd, buf, size, pt_regs)
{
   if (current->pid == 399) {
         num_sysreads++;
   }
}
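
The decision between the two generated handlers could be made by
checking whether the script body mentions any exported argument. The
C sketch below does this with a naive text scan for argsN tokens; the
function name script_uses_tapset_args is hypothetical, and a real
translator would presumably inspect its parse tree instead of raw
text.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if the script body refers to an exported argument such
 * as args0 or args2, and 0 otherwise. A match must start a token, so
 * an identifier like myargs2 does not count.
 */
int script_uses_tapset_args(const char *body)
{
    const char *p = body;

    if (body == NULL)
        return 0;
    while ((p = strstr(p, "args")) != NULL) {
        int starts_token = (p == body) ||
            !(isalnum((unsigned char)p[-1]) || p[-1] == '_');

        if (starts_token && isdigit((unsigned char)p[4]))
            return 1;
        p += 4;  /* continue the scan after this occurrence */
    }
    return 0;
}
```

When this returns 0, the generated handler can leave out the call to
the expert-provided function and its unpack step entirely.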

The main advantage of the above is that tapset functions are used at
an API level, hence we are not accessing any unsafe data without our
knowledge. Callers don't have to worry about concurrency issues, as
the experts will have taken them into account. In safe mode it is very
easy for users to write scripts. The disadvantage is that the tapset
developer has the burden of providing the extract functions.

After I discussed the above with Jim, he suggested that experts don't
really have to write the code for tapset functions. They can specify,
in a well-defined format, the address at which they would like to put
the probe and the variables they would like to access. Systemtap, as
part of its elaboration, can take that description and generate the
probe handler along with the user-specified code.
It is not clear how a developer would specify the need to traverse
lists or do any kind of computation in the probe handler. I guess
developers could also write those as scripts using the systemtap
language. This idea needs further discussion.

Data Structure based probes:
These probes are similar to PE probes, except that the probe points
are specified not as an address but as a data structure and a
routine in that data structure. TODO: confirm with Will whether this
works for drivers and, if not, how we accomplish that.
These probes are same as PE based probes except that 

TODO: I think that, with safety checks verifying that the pointers
being traversed are not NULL, it should be safe to allow this access.
If there are any safety issues that our translator cannot guard
against, we need to address them. One useful language construct for
list pointers would be an iterator, where the iterator function
verifies the pointer before traversing.
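
A checked iterator of this kind might look like the following C
sketch. The node layout and the names safe_iter_init, safe_iter_next
and sum_list are hypothetical. Besides the NULL check mentioned above,
the sketch also bounds the number of steps so that a corrupted,
cyclic list cannot loop forever; that bound is my own addition, not
part of the proposal.

```c
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node {
    int value;
    struct node *next;
};

/* Iterator state: current position and remaining step budget. */
struct safe_iter {
    const struct node *cur;
    unsigned remaining;
};

void safe_iter_init(struct safe_iter *it, const struct node *head,
                    unsigned max_steps)
{
    it->cur = head;
    it->remaining = max_steps;
}

/*
 * Returns the next node, or NULL when the list ends or the step
 * budget is used up. The pointer is verified before it is followed.
 */
const struct node *safe_iter_next(struct safe_iter *it)
{
    const struct node *n;

    if (it == NULL || it->cur == NULL || it->remaining == 0)
        return NULL;
    n = it->cur;
    it->cur = n->next;
    it->remaining--;
    return n;
}

/* Example use: sum the values of at most max_steps nodes. */
long sum_list(const struct node *head, unsigned max_steps)
{
    struct safe_iter it;
    const struct node *n;
    long sum = 0;

    safe_iter_init(&it, head, max_steps);
    while ((n = safe_iter_next(&it)) != NULL)
        sum += n->value;
    return sum;
}
```

In user space a NULL check is all that can be verified this way; an
in-kernel iterator would also have to guard against pointers that are
non-NULL but invalid.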

Asynchronous event based probes:
These probes cover asynchronous events such as timers, memory
accesses, etc. These kinds of probes can be called from any context.
In these probes, the only variables that are safe to access are
kernel globals. No kernel functions can be called from these probes,
except for some systemtap APIs that don't need any context. In these
probes, whatever the user specifies is converted directly into the
probe handler.

Systemtap tapset:
We will have one default tapset called systemtapset. It will have
three functions that can be called in probes: init, final (or finish),
and error. The main idea of this tapset's init() function is to
initialize the systemtap-specific data structures that are needed.
final() dumps out all the collected data and does any processing and
cleanup. error() is called when we find any runtime error; this can
lead to the script not finishing and the module being unloaded. For
this tapset, these functions get executed during module
initialization, error handling, and cleanup.
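
The lifecycle of these three functions can be sketched in user-space
C as follows. The struct systemtapset_ops table, the
systemtapset_run driver and the demo callbacks are all hypothetical
names. The sketch assumes that final() is skipped when init() itself
fails and still runs for cleanup after a runtime error; the proposal
does not settle either point.

```c
#include <stddef.h>

/* Hypothetical table of the three systemtapset entry points. */
struct systemtapset_ops {
    int  (*init)(void *state);              /* set up data structures */
    void (*final)(void *state);             /* dump data and clean up */
    void (*error)(void *state, int code);   /* report a runtime error */
};

/*
 * Runs body between init() and final(). A nonzero return from init()
 * or body is treated as a runtime error and passed to error().
 */
int systemtapset_run(const struct systemtapset_ops *ops, void *state,
                     int (*body)(void *state))
{
    int rc;

    if (ops == NULL || ops->init == NULL || ops->final == NULL ||
        ops->error == NULL || body == NULL)
        return -1;
    rc = ops->init(state);
    if (rc != 0) {
        ops->error(state, rc);   /* nothing was set up: skip final() */
        return rc;
    }
    rc = body(state);
    if (rc != 0)
        ops->error(state, rc);
    ops->final(state);           /* cleanup runs after errors too */
    return rc;
}

/* Demo callbacks that only record which stages ran. */
struct demo_state { int inited; int finalized; int last_error; };

static int demo_init(void *s)
{
    ((struct demo_state *)s)->inited = 1;
    return 0;
}

static void demo_final(void *s)
{
    ((struct demo_state *)s)->finalized = 1;
}

static void demo_error(void *s, int code)
{
    ((struct demo_state *)s)->last_error = code;
}

static int demo_body_ok(void *s)   { (void)s; return 0; }
static int demo_body_fail(void *s) { (void)s; return -5; }

const struct systemtapset_ops demo_ops = {
    demo_init, demo_final, demo_error
};
```

In a real module, the body would correspond to the period during
which the probes are armed, with init() and final() called from the
module's load and unload paths.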

Follow-Ups:
- Re: tapset proposal
  - From: Frank Ch. Eigler
