Bug 2035

Summary: investigate boot-time probing
Product: systemtap
Component: runtime
Reporter: Frank Ch. Eigler <fche>
Assignee: Unassigned <systemtap>
Status: RESOLVED FIXED
Severity: normal
Priority: P2
CC: jlebon, mhiramat
Version: unspecified
Target Milestone: ---
Host:
Target:
Build:
Last reconfirmed:
Bug Depends on: 1145
Bug Blocks:
Attachments: systemtap-initscript documentation patch for boot-time probing

Description Frank Ch. Eigler 2005-12-12 20:04:25 UTC
Action item from face-to-face meeting.  The task is to investigate the following
idea, to estimate effort required to implement it.

Idea: make it possible to link in a systemtap probe module, perhaps statically,
into the kernel, for execution early during boot time.  In contrast with the
usual behavior, this would likely require the module to *not* rely on stpd
chitchat in order to register and activate probes.  Likewise, it may be
appropriate to use normal module unload hooks to unregister / clean up.

I/O early during boot is of course tricky.  Perhaps the runtime could buffer a
large amount of data in procfs-bound buffers, even before a user-level process
comes along, opens procfs and pulls the data out.  Perhaps the runtime could
flush buffered data to printk if no user-space program comes along by the time
it's time to shut down.

When complete, this work might allow stpd to be partially deprecated, by using
the module-init / module-exit hooks for normal operations, and something as
simple as "cat" to fetch probe output (procfs) data.
Comment 1 Martin Hunt 2005-12-13 20:26:48 UTC
Subject: Re:  New: investigate boot-time probing

On Mon, 2005-12-12 at 20:04 +0000, fche at redhat dot com wrote:
> Action item from face-to-face meeting.  The task is to investigate the following
> idea, to estimate effort required to implement it.
> 
> Idea: make it possible to link in a systemtap probe module, perhaps statically,
> into the kernel, for execution early during boot time.  

So we generate a module, run mkinitrd and reboot. Problems with that are
1. IO and
2. we can't probe into modules because we don't know their addresses.
That is why this depends on 1145.

The IO solution is likely going to be the same as what I am working on
for another problem. (need BZ) I am playing with support for network IO
and want to also look at serial ports. This would be for uses like in a
flight data recorder where we need data right up to the moment the
system crashes. The cost of this approach is likely to be some
flexibility and performance.

> When complete, this work might allow stpd to be partially deprecated, by using
> the module-init / module-exit hooks for normal operations, and something as
> simple as "cat" to fetch probe output (procfs) data.
That is a strange goal. Do you have some reason for it?




Comment 2 Frank Ch. Eigler 2005-12-13 20:34:41 UTC
(In reply to comment #1)
> > > When complete, this work might allow stpd to be partially deprecated [...]
> That is a strange goal. Do you have some reason for it?

Nothing deep or urgent, just desiderata like the reduction in number of
interacting components.
Comment 3 William Cohen 2006-01-19 19:44:14 UTC
Dan Berrange mentioned that he has developed some techniques to do boot probes.
The following is the link to his people page with the information:

http://people.redhat.com/berrange/systemtap/bootprobe/
Comment 4 Frank Ch. Eigler 2006-01-20 03:03:19 UTC
Berrange's method is simple & neat, and may be good enough for some users.

My guess is that we will still need something initrd-based & stpd-less, for
those who need to deal with debugging of device drivers, kernel initialization,
and the like.
Comment 5 Masami Hiramatsu 2008-03-27 16:29:35 UTC
(In reply to comment #4)
> My guess is that we will still need something initrd-based & stpd-less, for
> those who need to deal with debugging of device drivers, kernel initialization,
> and the like.

I'm still interested in this feature for debugging initialization bugs in device
drivers. Sometimes those bugs cannot be reproduced, or they happen randomly, so I
think initrd-based tracing would be very helpful.

Fortunately, we already have the attach/detach feature (bz3857) and the crash
extension, so I/O is not a problem.
Comment 6 Jonathan Lebon 2013-11-15 21:29:51 UTC
I've started looking into this. Thankfully dracut helps out a lot here. It provides us with a straightforward method of hooking scripts into the various stages of the bootup (see dracut.bootup(7)).

After some work and research, I successfully inserted a module in the initramfs that gets loaded as early as possible.

I'll just go over the steps of what I did:

First, we need to create a dracut module. This is no more than a directory and some shell scripts that will get pulled in whenever dracut is called to create a new image.

Create the directory e.g. '01stap' under '/lib/dracut/modules.d'. There, we have two simple scripts. The first script is module-setup.sh, which contains functions dracut will call during image creation.

$ cat module-setup.sh
#!/bin/bash

check() {
    return 0
}

depends() {
    echo ""
}

install() {
    inst_hook cmdline 01 "$moddir/start-staprun.sh"
}
$

The 'inst_hook' line is telling dracut that we want the 'start-staprun.sh' script to be executed at the cmdline hook, which is the earliest one. In 'start-staprun.sh' we have:

$ cat start-staprun.sh
. /lib/dracut-lib.sh
/home/vm/codebase/systemtap/install/bin/staprun -L module_watcher
$

The kernel module 'module_watcher' is a simple stap test script I wrote to showcase what kind of things we could do if running at boot. It watches for module load/unload events. Here is the script:

$ cat module_watcher.stp
global args

probe begin {
   println("0.000000 - Started module_watcher")
   start_stopwatch("timer")
}

probe syscall.init_module {
   args = uargs
}

probe kernel.function("do_init_module") {
   timer = read_stopwatch_us("timer")
   printf("%d.%.6d - ", timer/1000000, timer%1000000)
   printf("Loading module %s", kernel_string($mod->name))
   if (args != "")
      printf(" with args %s ", args)
   println("")
}

probe syscall.delete_module {
   timer = read_stopwatch_us("timer")
   printf("%d.%.6d - ", timer/1000000, timer%1000000)
   printf("Unloading module %s with flags %x\n", user_string($name_user), flags);
}

probe end {
   println("Exiting module_watcher")
}
$
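
As a quick check of the timestamp arithmetic used in the two printf calls above, here is a minimal shell sketch of the same split of a microsecond count into whole seconds and a zero-padded six-digit remainder. The function name fmt_us is made up for illustration; it is not part of SystemTap.

```shell
#!/bin/bash
# Hypothetical helper mirroring printf("%d.%.6d", timer/1000000, timer%1000000)
# from the stap script: integer seconds, then microseconds padded to 6 digits.
fmt_us() {
    printf '%d.%06d\n' "$(( $1 / 1000000 ))" "$(( $1 % 1000000 ))"
}

fmt_us 102458     # prints 0.102458
fmt_us 1205084    # prints 1.205084
```

The padding matters: without it a reading of 2000688 us would print as 2.688 rather than 2.000688.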

I compiled the script and placed the resulting module in /lib/modules/`uname -r`/systemtap, then ran depmod.

All that's left to do is to create a new image in which the compiled SystemTap kernel module we want, staprun and stapio are included:

# dracut --force --install '/home/vm/codebase/systemtap/install/bin/staprun /home/vm/codebase/systemtap/install/libexec/systemtap/stapio' --add-drivers module_watcher
# 

I think we could forgo '--add-drivers module_watcher' if we instead specify it in the install() function of our module-setup.sh script. Haven't played with that yet.
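
An untested sketch of what that might look like, assuming the dracut helpers instmods (copy a kernel module into the image) and inst_multiple (copy binaries along with their libraries; older dracut releases call it dracut_install), and assuming dracut's installkernel() hook is the right place for the module. STAP_PREFIX is a made-up variable holding the install prefix from the example above.

```shell
#!/bin/bash
# Sketch only: a module-setup.sh that pulls in the stap module and binaries
# itself, so the dracut command line needs no --install / --add-drivers.
# STAP_PREFIX is a hypothetical variable; adjust it to the real install prefix.
STAP_PREFIX=/home/vm/codebase/systemtap/install

check() {
    return 0
}

depends() {
    echo ""
}

installkernel() {
    # Copy the compiled SystemTap module into the image.
    instmods module_watcher
}

install() {
    # Copy staprun and stapio, then register the boot hook as before.
    inst_multiple "$STAP_PREFIX/bin/staprun" \
                  "$STAP_PREFIX/libexec/systemtap/stapio"
    inst_hook cmdline 01 "$moddir/start-staprun.sh"
}
```

With something like this in place, a plain 'dracut --force' should be enough, though the helper names need checking against the installed dracut version.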

Reboot, open a terminal, and reconnect to the module (note we need to do this as root because the module belongs to root, otherwise staprun won't be happy):

# /home/vm/codebase/systemtap/install/bin/staprun -A module_watcher
0.000000 - Started module_watcher
0.102458 - Loading module pata_acpi
0.103298 - Loading module ata_generic
0.108215 - Loading module i2c_core
0.108700 - Loading module virtio_net
0.112446 - Loading module drm
0.139114 - Loading module virtio_blk
0.139537 - Loading module ttm
0.142624 - Loading module vmwgfx
1.205084 - Loading module uinput
1.271417 - Loading module mperf
1.273269 - Loading module acpi_cpufreq
1.291843 - Loading module i2c_piix4
1.298882 - Loading module microcode
1.304980 - Loading module virtio_balloon
1.307476 - Loading module ghash_clmulni_intel
1.315539 - Loading module serio_raw
1.326341 - Loading module crc32c_intel
1.335179 - Loading module crc32_pclmul
1.967812 - Loading module iptable_raw
1.979030 - Loading module iptable_security
1.990345 - Loading module iptable_mangle
1.994897 - Loading module nf_conntrack
1.996680 - Loading module nf_nat
1.997620 - Loading module nf_nat_ipv4
1.999513 - Loading module nf_defrag_ipv4
2.000688 - Loading module nf_conntrack_ipv4
2.001808 - Loading module iptable_nat
2.006181 - Loading module ip6_tables
2.013440 - Loading module ip6table_filter
2.053304 - Loading module ip6table_raw
2.065209 - Loading module ip6table_security
2.080719 - Loading module ip6table_mangle
2.119615 - Loading module nf_nat_ipv6
2.124040 - Loading module nf_defrag_ipv6
2.125342 - Loading module nf_conntrack_ipv6
2.126901 - Loading module ip6table_nat
2.193750 - Loading module ebtables
2.199242 - Loading module ebtable_filter
2.207249 - Loading module llc
2.208277 - Loading module stp
2.212893 - Loading module bridge
2.215973 - Loading module ebtable_broute
2.225101 - Loading module ebtable_nat
2.481361 - Loading module rfkill
2.487218 - Loading module bluetooth
2.491427 - Loading module bnep
2.500985 - Loading module xt_conntrack
2.606265 - Loading module ip6t_REJECT
2.786338 - Loading module ipt_MASQUERADE
2.831228 - Loading module nf_conntrack_broadcast
2.832305 - Loading module nf_conntrack_netbios_ns
^CExiting module_watcher
#

And voila!

Now, the goal is to streamline this process and skip the parts that can be skipped.

One issue which comes to mind right away is permissions, since the module gets inserted as real root. But then again, if users are privileged enough to be allowed to have a module inserted at bootup time, maybe we don't have to change anything.

For the interface, I was thinking of introducing a 'list' paradigm. E.g. we could have the following switches for stap:
--boot-add --> add script to list of boot scripts
--boot-list --> list all scripts currently enabled on boot
--boot-remove --> remove script from list of boot scripts
--boot-once --> add script to list just for the next boot

Of course in the background, we would be rerunning dracut whenever necessary. This can be a time-consuming process (in my VM, it can take up to 30s).

Not sure yet how much that makes sense. I guess it depends on how people will use the feature.
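
To make the 'list' idea a bit more concrete, here is a rough shell sketch of the bookkeeping behind the proposed switches. Everything in it is hypothetical: the switches do not exist, the BOOT_LIST location is invented, and the step that would rerun dracut is only marked by a comment.

```shell
#!/bin/bash
# Hypothetical bookkeeping for the proposed --boot-add/--boot-list/--boot-remove.
# BOOT_LIST is an invented path; nothing in stap reads this file.
BOOT_LIST=${BOOT_LIST:-/tmp/stap-boot.list}

boot_add() {
    # Add a script name once; duplicates are ignored.
    touch "$BOOT_LIST"
    grep -qxF -- "$1" "$BOOT_LIST" || echo "$1" >> "$BOOT_LIST"
    # A real implementation would rerun dracut here.
}

boot_list() {
    # Print the scripts currently enabled on boot, one per line.
    if [ -f "$BOOT_LIST" ]; then cat "$BOOT_LIST"; fi
}

boot_remove() {
    # Drop exact matches of the given name from the list.
    [ -f "$BOOT_LIST" ] || return 0
    grep -vxF -- "$1" "$BOOT_LIST" > "$BOOT_LIST.tmp" || true
    mv "$BOOT_LIST.tmp" "$BOOT_LIST"
    # A real implementation would rerun dracut here as well.
}
```

Since every change implies a dracut run of up to 30s, batching several adds before regenerating the image might be worth considering.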

Also, do we want to support platforms pre-dracut (e.g. RHEL5)? I haven't played with mkinitrd, although I do see that it has a --preload=MODULE option which might do the trick.
Comment 7 Jonathan Lebon 2013-12-04 22:44:17 UTC
Created attachment 7312 [details]
systemtap-initscript documentation patch for boot-time probing

This is how I'm thinking of changing the interface of the initscript.
Comment 8 Jonathan Lebon 2014-01-08 21:24:54 UTC
I've just created a branch (jlebon/boot-time) containing the work so far. The easiest way to test it is to 'make rpm' and install the rpms. However, if you'd like to do it manually, you will need to run make and then:
- Copy initscript/99stap/{module-setup.sh,start-staprun.sh} into /lib/dracut/modules.d/99stap (remove {prefix})
- Copy initscript/systemtap to /etc/init.d/ (also remove {prefix})

Once installed, you can test it by doing the following (also see section 5.9 of initscript/README.systemtap):

# cat > /etc/systemtap/script.d/hello.stp
probe begin {
   println("Hello World!")
}
# service systemtap onboot hello
# reboot
---
# staprun -A hello
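
If the attach fails, it may help to first confirm that the module really was loaded during boot. A small sketch, relying on the fact that loaded kernel modules show up as directories under /sys/module; the function name and the SYSFS_MODULES override are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the named module is currently loaded.
# SYSFS_MODULES exists purely so the check can be pointed at a test directory.
module_loaded() {
    [ -d "${SYSFS_MODULES:-/sys/module}/$1" ]
}
```

Usage would be along the lines of 'module_loaded hello && staprun -A hello'.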
Comment 9 Jonathan Lebon 2014-01-22 21:16:55 UTC
Now in master (commit 527e696 and previous ones).