This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: [Ksummit-2008-discuss] DTrace


Hi -


On Tue, Jul 01, 2008 at 07:13:27PM -0400, Theodore Tso wrote:

> [...]  And one of the major flaws of the Linux's RAS tools is that
> the LKML development community doesn't use them; and to the extent
> that tapsets would be written more quickly if they are easy for
> kernel developers who aren't depending on distro packaging and
> distro building of systemtap.  [...]

Please excuse my return to this point, but it meshes with something
else:

> probe kernel.function ("vfs_write"),
>       kernel.function ("vfs_read")
> {
>   dev_nr = $file->f_dentry->d_inode->i_sb->s_dev
>   inode_nr = $file->f_dentry->d_inode->i_ino
> 
>   if (dev_nr == ($1 << 20 | $2) # major/minor device
>       && inode_nr == $3)
>     printf ("%s(%d) %s 0x%x/%u\n",
>       execname(), pid(), probefunc(), dev_nr, inode_nr)
> }

So, one way a kernel developer could help write a tapset piece for us
is to encapsulate this into a tapset script fragment:

probe vfs.read = kernel.function ("vfs_read")
  {
    dev_nr = $...expression
    inode_nr = $...expression
  }

Then this definition would be shipped with the kernel or systemtap,
tested in one or the other build system for currency.  (Not by
coincidence, something much like that is already in our tapset; it just
lacks those two values.)

Then the end user just does

   probe vfs.read { if (dev_nr == MKDEV(2,3)) printf ("whatever you want to print") }
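
For readers unfamiliar with the device-number arithmetic, here is a
minimal C sketch of the major/minor packing that the "$1 << 20 | $2"
expression and MKDEV rely on.  It assumes the kernel-internal dev_t
layout with the minor number in the low 20 bits; the toy_* names are
hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Assumption: kernel-internal dev_t layout, minor in the low 20 bits. */
#define TOY_MINORBITS 20u
#define TOY_MINORMASK ((1u << TOY_MINORBITS) - 1u)

/* Pack a major/minor pair into one device number (MKDEV-style). */
static inline uint32_t toy_mkdev(uint32_t major, uint32_t minor)
{
    return (major << TOY_MINORBITS) | minor;
}

/* Recover the major number from a packed device number. */
static inline uint32_t toy_major(uint32_t dev)
{
    return dev >> TOY_MINORBITS;
}

/* Recover the minor number from a packed device number. */
static inline uint32_t toy_minor(uint32_t dev)
{
    return dev & TOY_MINORMASK;
}
```

With this layout, major 2 / minor 3 packs to 0x200003, which is the
value a script would compare against s_dev.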


****  or  ****


Kernel maintainers could add a marker or two right into their C code:

vfs_read() 
{
    /* ... */
    trace_mark (vfs_read, "dev %u inode %u whatever %s",
                          expression1, expression2, whatever);
    /* ... */
}

And that's it.  It's compiled-in, and checked as a part of your
routine builds.  Then the systemtap-side interpretation code is trivial,
and anyone can write it.  And it doesn't require debugging data.

   probe vfs.read = kernel.mark("vfs_read") { dev_nr = $arg1; inode_nr = $arg2 }
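
To make the idea concrete, here is a rough userspace toy of a marker
site.  It is not the kernel's marker implementation, and every toy_*
name is invented for this sketch.  It only illustrates the shape of
the mechanism: the call site passes a name, a format string and
arguments, and does nothing unless some tool has attached a probe.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical probe callback type; stands in for an attached tool. */
typedef void (*toy_probe_fn)(const char *name, const char *fmt, va_list args);

/* NULL while no probe is attached, so marker sites fall straight through. */
static toy_probe_fn toy_probe;

/* Toy marker site: hands name, format and arguments to the probe, if any. */
static void toy_mark(const char *name, const char *fmt, ...)
{
    va_list args;

    if (!toy_probe)
        return;
    va_start(args, fmt);
    toy_probe(name, fmt, args);
    va_end(args);
}

/* Example probe: formats the marker arguments into a buffer. */
static char toy_log[128];

static void toy_log_probe(const char *name, const char *fmt, va_list args)
{
    int n = snprintf(toy_log, sizeof toy_log, "%s: ", name);

    if (n < 0 || (size_t)n >= sizeof toy_log)
        return;
    vsnprintf(toy_log + n, sizeof toy_log - (size_t)n, fmt, args);
}

/* Stand-in for an instrumented function such as vfs_read(). */
static void toy_vfs_read(unsigned dev, unsigned inode)
{
    toy_mark("vfs_read", "dev %u inode %u", dev, inode);
}
```

Calling toy_vfs_read() before setting toy_probe leaves toy_log empty;
after "toy_probe = toy_log_probe", the same call records the name and
the formatted arguments.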


If people could get over the funny look of the markers (since
performance effects have been shown to be negligible), they could make
a significant contribution to this problem, with just a few lines of C
code.


- FChE

