This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
nits to please a kernel hacker
- From: Roland McGrath <roland at redhat dot com>
- To: systemtap at sources dot redhat dot com
- Date: Mon, 29 Sep 2008 16:01:07 -0700 (PDT)
- Subject: nits to please a kernel hacker
Hi guys. So today I have my random kernel hacker hat on. I have a
thing to debug in my branch of the upstream kernel du jour, and kgdb
still doesn't all the way work, so I thought I'd try systemtap.
Here's how it went. After this email, I've done my thinking about
systemtap for the day and will get on with debugging my thing; next
time I have a thing to debug, I'll think about systemtap again and
try the same things from zero again to see if systemtap has achieved
utility in helping me do such work. So if the follow-up to any of
these is to ask me to do anything (including think about bugzilla),
other than "here read this, it answers your question" or "here, try
this again, it might be fixed now", then you lose.
I did a fresh git pull and then configure;make in a separate build dir.
That much any hacker does before even thinking about how you actually use
the thing. (It went fine.) Naturally, I'm going to use it out of the
build dir and not install it at all. I just built it.
I'd used stap before, so I knew it had to be able to find some installed
files. Nothing immediately obvious told me the list of these to set.
I looked in Makefile and surmised from the 'make check' usage that it's:
SYSTEMTAP_RUNTIME=/my/builds/stap/runtime \
SYSTEMTAP_TAPSET=/my/builds/stap/tapset \
/my/builds/stap/stap [options...]
[sic...more to come]
This should be said up front in README, or at least HACKING. Even from the
stap(1) man page I have to tease these out from mentions in the FILES
section. It should have an ENVIRONMENT section that pops right out for
easy reference.
In the glibc build, we write a tiny script called testrun.sh in the build
directory that execs tests with all the magic path options set correctly
for that build directory. Maybe there could be a test-stap.sh or some such
file written by configure/make. It could have build/src paths wired in,
and/or do something clever with $0. Just so after configure;make, one can do:
/my/builds/stap/test-stap.sh -e 'probe me.baby() { real=$good }'
Ok, so now I have stap and I think I know how to run it. Naturally, I am
using my own hand-built kernel and I don't properly install it anywhere.
(I don't use any modules, and only care about finding debuginfo for vmlinux
itself.) So I use (env vars etc from above):
.../stap -r /my/builds/linux-foobar -e 'probe kernel.function("foo") {}'
This actually works fine as far as finding debuginfo, because the libdwfl
calls take a string starting with / as an absolute directory to use in
place of /lib/modules/something. Here it is in reality:
-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/build/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/build/systemtap/tapset $HOME/build/systemtap/stap -r ~/build/linux-utrace -e 'probe kernel.function("tracehook_report_clone") {}'
WARNING: side-effect-free probe 'probe_1': keyword at <input>:1:1
Module directory /lib/modules//home/roland/build/linux-utrace/build check failed: No such file or directory
Make sure kernel devel is installed.
Pass 4: compilation failed. Try again with more '-v' (verbose) options.
-bash-3.2$
Ok, first peeve. If you tell me one more time to use "more" -v options,
I'm going to beat you with a rusty tire-iron. HOW MANY MORE???? The
message should say exactly the option that will give information about
the kind of failure that just happened.
When there is one, that is. For this particular failure, from -v all
the way up to -vvvvvv give me no more information on what the critical
failure was. If the failure was in pass 4, then there should be a way
to say "make pass 4 very verbose" without making passes 1-3 spew 100s of
lines of verbosity I have to skip over looking for "Pass 4". Being sure
I had all the info I could get from the phase that is actually failing
would have made me give up quicker and realize that:
Module directory /lib/modules//home/roland/build/linux-utrace/build check failed: No such file or directory
Make sure kernel devel is installed.
was in fact the fatal message and meant it had nothing else to explain or
show going wrong in detail. Since this message stands out less than a
"WARNING:" that I already knew I didn't care about, it was far from
immediately obvious that this was "FATAL ERROR" and it's still only me
surmising now that this means "I didn't even try anything else in pass 4,
there are no more error messages to see."
So I now I see that -r /... does not mean the same thing to two different
parts of systemtap--one worked, and now the other is looking for
/lib/modules//... just like you expect from code that doesn't grok /... as
an absolute directory name. So that's a stupid bug, but I know from stupid
bugs. So seeing that, I do:
ln -s /home/roland/build/linux-utrace /lib/modules/foobar
And try again.
-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/build/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/build/systemtap/tapset $HOME/build/systemtap/stap -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
WARNING: side-effect-free probe 'probe_1': keyword at <input>:1:1
Module directory /lib/modules/foobar/build check failed: No such file or directory
Make sure kernel devel is installed.
Pass 4: compilation failed. Try again with more '-v' (verbose) options.
-bash-3.2$
Oh right, I know from stupid code.
ln -s . /home/roland/build/linux-utrace/build
And try again.
-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/build/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/build/systemtap/tapset $HOME/build/systemtap/stap -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
WARNING: side-effect-free probe 'probe_1': keyword at <input>:1:1
Pass 4: compilation failed. Try again with more '-v' (verbose) options.
Now I am raising my rusty tire-iron.
-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/build/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/build/systemtap/tapset $HOME/build/systemtap/stap -v -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
Warning: changing last pass to 4 since cross-compiling
Pass 1: parsed user script and 0 library script(s) in 0usr/0sys/0real ms.
WARNING: side-effect-free probe 'probe_1': keyword at <input>:1:1
Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 300usr/110sys/422real ms.
Pass 3: translated to C into "/tmp/staprRLw6c/stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.c" in 530usr/1250sys/1779real ms.
Pass 4: compiled C into "stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.ko" in 670usr/1060sys/1771real ms.
Pass 4: compilation failed. Try again with more '-v' (verbose) options.
Now I am beating you with my rusty tire-iron.
-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/build/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/build/systemtap/tapset $HOME/build/systemtap/stap -vv -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
Warning: changing last pass to 4 since cross-compiling
SystemTap translator/driver (version 0.7.1/0.133 git branch master, commit 338e2309)
Copyright (C) 2005-2008 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
Session arch: x86_64 release: foobar
Created temporary directory "/tmp/stapHqdXIq"
Pass 1: parsed user script and 0 library script(s) in 0usr/0sys/0real ms.
probe tracehook_report_clone@/home/roland/redhat/linux/utrace/include/linux/tracehook.h:301 kernel reloc=.dynamic section=.text pc=0xffffffff80240f96
WARNING: side-effect-free probe 'probe_1': keyword at <input>:1:1
Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 330usr/90sys/424real ms.
probe_1 locks nothing
dump_unwindsyms kernel index=0 base=0xffffffff80200000
Pass 3: translated to C into "/tmp/stapHqdXIq/stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.c" in 520usr/1290sys/1804real ms.
Running make -C "/lib/modules/foobar/build" M="/tmp/stapHqdXIq" modules >/dev/null
/tmp/stapHqdXIq/stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.c:41:21: error: runtime.h: No such file or directory
/tmp/stapHqdXIq/stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.c:42:18: error: regs.c: No such file or directory
/tmp/stapHqdXIq/stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.c:43:19: error: stack.c: No such file or directory
[...]
Luckily for your cranium, I can put down the tire-iron after only two -v
increments. Imagine the violence and bloodshed if this were a problem
explained only at -vvvvvvvvv.
At this point I decide those rumors about systemtap's runtime not being
compatible with current upstream kernels are true, and prepare to give up
on the exercise for the day. A random glance makes me realize that in
transcribing the 'make check' command lines into my actual situation, I
made the obvious mistake. (They are $srcdir references, not build dir
references.) Phew. A less attentive kernel hacker who was no less error
prone to begin with than I am would probably have given up and then
reinforced the rumor about the runtime. All because there isn't an obvious
explanation you can't miss about how to correctly run stap out of a fresh
build directory.
So, operator error due to inadequate automation discovered and corrected.
We can resume our story (rumors are false and the current stap runtime is
fine with the current upstream Linus tree).
-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
/home/roland/.systemtap/cache/88/stap_8820c6519888bb04e4fc2e61ef2f1f11_325.ko
-bash-3.2$
Um, what?
-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -v -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
Warning: changing last pass to 4 since cross-compiling
Pass 1: parsed user script and 45 library script(s) in 320usr/50sys/391real ms.
WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 340usr/110sys/442real ms.
/home/roland/.systemtap/cache/88/stap_8820c6519888bb04e4fc2e61ef2f1f11_325.ko
Pass 3: using cached /home/roland/.systemtap/cache/88/stap_8820c6519888bb04e4fc2e61ef2f1f11_325.c
Pass 4: using cached /home/roland/.systemtap/cache/88/stap_8820c6519888bb04e4fc2e61ef2f1f11_325.ko
-bash-3.2$
After extra seconds because the eyes are drawn to "WARNING:" that I already
know to ignore, I notice:
Warning: changing last pass to 4 since cross-compiling
I'm cross-compiling? Since when, buddy? Oh, you mean magically considered
"cross-compiling" just because I used -r to tell you where to find the
kernel build. So when the usage says:
-r RELEASE cross-compile to kernel RELEASE, instead of 2.6.27-rc7.utrace-00170-gbe4ac41
even though 2.6.27-rc7.utrace-00170-gbe4ac41 is my `uname -r`, i.e. my
running kernel, it means that just giving -r changes the fundamental
control flow of what the command means. It's not just giving the directory
to find the kernel stuff in, like -R does. I never would have guessed
that. (It's a little hard for me to tell, but I suspect it's only because
I do secretly know lots of stap internals that the .ko file name output was
not just a complete mystery, and that it even occurred to me when there was
apparently no error message mentioning it to try -v next.)
Ok, I know from cockamamy. I'm not going to try to figure out what stap
isn't doing, but I can guess how to fool it.
mv /lib/modules/foobar /lib/modules/`uname -r`
(That's my symlink from before.)
-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -e 'probe kernel.function("tracehook_report_clone") {}'
WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
sh: /usr/local/bin/staprun: No such file or directory
Pass 5: run failed. Try again with more '-v' (verbose) options.
-bash-3.2$
Crikey! Is there another variable for finding staprun? I looked at the
docs, and even grepped the source for "getenv" and I sure don't see one!
I think I tracked it down in buildrun.cxx:run_pass and it's frickin'
hard-wired to the configure-chosen $(bindir)/staprun at compile time.
Give me a break!
-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -v -e 'probe kernel.function("tracehook_report_clone") {}'
Pass 1: parsed user script and 45 library script(s) in 340usr/40sys/393real ms.
WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 330usr/110sys/441real ms.
Pass 3: using cached /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.c
Pass 4: using cached /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
Pass 5: starting run.
sh: /usr/local/bin/staprun: No such file or directory
Pass 5: run completed in 0usr/0sys/9real ms.
Pass 5: run failed. Try again with more '-v' (verbose) options.
-bash-3.2$
Warming up the tire-iron...
-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -vv -e 'probe kernel.function("tracehook_report_clone") {}'
SystemTap translator/driver (version 0.7.1/0.133 git branch master, commit 338e2309)
Copyright (C) 2005-2008 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
Session arch: x86_64 release: 2.6.27-rc7.utrace-00170-gbe4ac41
Created temporary directory "/tmp/stapXpCOU6"
Searched '/home/roland/redhat/systemtap/tapset/x86_64/*.stp', found 2
Searched '/home/roland/redhat/systemtap/tapset/*.stp', found 43
Pass 1: parsed user script and 45 library script(s) in 330usr/50sys/450real ms.
probe tracehook_report_clone@/home/roland/redhat/linux/utrace/include/linux/tracehook.h:301 kernel reloc=.dynamic section=.text pc=0xffffffff80240f16
WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 330usr/870sys/2202real ms.
Pass 3: using cached /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.c
Pass 4: using cached /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
Pass 5: starting run.
Running /usr/local/bin/staprun -v /tmp/stapXpCOU6/stap_32273b197bad7e53129d892d463a65cd_351.ko
sh: /usr/local/bin/staprun: No such file or directory
Pass 5: run completed in 0usr/10sys/11real ms.
Pass 5: run failed. Try again with more '-v' (verbose) options.
Running rm -rf /tmp/stapXpCOU6
-bash-3.2$
Ok, there is finally. Two -v's just to see the command line it tried to run?
Really? Even though you deserved it for that smarmy damn error message,
I'm feeling a little bad about the tire-iron. So, I'm working with ya!
-bash-3.2$ $HOME/build/systemtap/staprun -v /tmp/stapXpCOU6/stap_32273b197bad7e53129d892d463a65cd_351.ko
ERROR: The effective user ID of staprun must be set to the root user.
Check permissions on staprun and ensure it is a setuid root program.
Ok, so stap was actually using sudo to run staprun but didn't say the
actual command it ran in the -vv "Running ..." message. Still letting the
tire-iron cool down, we can do this.
-bash-3.2$ sudo $HOME/build/systemtap/staprun -v /tmp/stapXpCOU6/stap_32273b197bad7e53129d892d463a65cd_351.ko
ERROR: Error opening '/tmp/stapXpCOU6/stap_32273b197bad7e53129d892d463a65cd_351.ko': No such file or directory
-bash-3.2$
What? Oh, no more temps. Hmm. Hey, maybe if I screw myself the other way
this will add up to goodness!
-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -r `uname -r` -e 'probe kernel.function("tracehook_report_clone") {}'
WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
-bash-3.2$ sudo $HOME/build/systemtap/staprun -v /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
/usr/local/libexec/systemtap/stapio: No such file or directory
-bash-3.2$
Crispy Fried Jeebus. Man, I could have had such a nice lunch in the time
it took to get this far. Ok, we've been through this drill. Almost there!
Stay on target! Damn, let's skip a step and go straight to three -v's!
bash-3.2$ sudo $HOME/build/systemtap/staprun -vvv /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
staprun:parse_modpath:170 inpath=/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
staprun:main:249 modpath="/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko", modname="stap_32273b197bad7e53129d892d463a65cd_351"
staprun:init_staprun:207 init_staprun
staprun:insert_module:47 inserting module
staprun:insert_module:66 module options: _stp_bufsize=0
Error inserting module '/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko': File exists
Wait, what? Does that mean what I think it means?
-bash-3.2$ /sbin/lsmod
Module Size Used by
stap_32273b197bad7e53129d892d463a65cd_351 706120 0
-bash-3.2$
What a low-rent cheeseball freshman maneuver. Too bad I don't drink, or
some kernel hackers would get an earful of fresh rumors about the mental
acuity of stap hackers at the bar tonight! Ok, staprun, let me get a baby
wipe there, I'm feeling parental.
-bash-3.2$ sudo /sbin/rmmod stap_32273b197bad7e53129d892d463a65cd_351
-bash-3.2$ sudo $HOME/build/systemtap/staprun -vvv /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
staprun:parse_modpath:170 inpath=/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
staprun:main:249 modpath="/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko", modname="stap_32273b197bad7e53129d892d463a65cd_351"
staprun:init_staprun:207 init_staprun
staprun:insert_module:47 inserting module
staprun:insert_module:66 module options: _stp_bufsize=0
staprun:init_ctl_channel:30 Opening /sys/kernel/debug/systemtap/stap_32273b197bad7e53129d892d463a65cd_351/.cmd
execing: /usr/local/libexec/systemtap/stapio -vvv /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
/usr/local/libexec/systemtap/stapio: No such file or directory
-bash-3.2$
Hey that time the failing to clean up the module after a failure is handy,
we can pretend it's a feature!
-bash-3.2$ sudo $HOME/build/systemtap/stapio -vvv /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
stapio:parse_modpath:170 inpath=/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
stapio:main:37 modpath="/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko", modname="stap_32273b197bad7e53129d892d463a65cd_351"
stapio:init_stapio:224 init_stapio
stapio:init_ctl_channel:30 Opening /sys/kernel/debug/systemtap/stap_32273b197bad7e53129d892d463a65cd_351/.cmd
stapio:stp_main_loop:313 in main loop
stapio:stp_main_loop:320 nb=4
stapio:init_relayfs:124 initializing relayfs
stapio:init_relayfs:148 attempting to open /sys/kernel/debug/systemtap/stap_32273b197bad7e53129d892d463a65cd_351/trace0
stapio:init_relayfs:148 attempting to open /sys/kernel/debug/systemtap/stap_32273b197bad7e53129d892d463a65cd_351/trace1
stapio:init_relayfs:154 ncpus=1, bulkmode = 0
stapio:init_relayfs:204 starting threads
stapio:stp_main_loop:320 nb=12
stapio:stp_main_loop:360 probe_start() returned 0
Holy crap! I think it's working!
[hit C-c]
^Cstapio:signal_thread:36 sigproc 2 (Interrupt)
stapio:stp_main_loop:320 nb=4
stapio:stp_main_loop:353 got STP_EXIT
stapio:cleanup_and_exit:271 detach=0
stapio:close_relayfs:221 closing
stapio:reader_thread:108 exiting thread 0
stapio:close_relayfs:240 done
stapio:cleanup_and_exit:284 closing control channel
stapio:cleanup_and_exit:290 removing stap_32273b197bad7e53129d892d463a65cd_351
stap_32273b197bad7e53129d892d463a65cd_351: No such file or directory
Huh? What does that mean?
-bash-3.2$ /sbin/lsmod
Module Size Used by
stap_32273b197bad7e53129d892d463a65cd_351 706120 0
-bash-3.2$
Oh, I guess that's not actually what it meant (who the hell knows).
I guess that rmmod is what staprun would have done after if it had
been able to run stapio. So this is normal. I think it really worked.
Alrighty, then.
I wonder if I want to repeat all that with a script that actually does
something. Nah, I'm starving and in the meantime I rebuilt a kernel
where kgdb works a little more. I think after lunch I'll try that instead.
Thanks,
Roland