Bug 10991

Summary: freeze on centos5.3+preempt x86_64 for most scripts
Product: systemtap
Reporter: Steve Fink <sphink>
Component: kprobes
Assignee: Unassigned <systemtap>
Status: RESOLVED INVALID    
Severity: critical    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host:
Target:
Build:
Last reconfirmed:

Description Steve Fink 2009-11-20 01:36:07 UTC
I am running a preemptive x86_64 kernel (pretty much just the CentOS 5.3 kernel
with preemption enabled). Most systemtap scripts hang the machine. I have also
tried running it from within a VirtualBox VM, and get similar results.

The version I am using on actual hardware and in the VM is
systemtap-0.9.7-5.el5. Within the VM, I have also compiled the snapshot from
20091114. It managed to run for a little while before hanging.

Specific example: this script

probe kernel.function("*@net/socket.c")
{
  printf ("%s -> %s\n", thread_indent(1), probefunc())
}
probe kernel.function("*@net/socket.c").return
{
  printf ("%s <- %s\n", thread_indent(-1), probefunc())
}

run as stap -v socket-tree.stap

On the hardware, it just hangs immediately after it starts running. On the VM,
it does the same for 0.9.7, but with the snapshot version it manages to print
out (after "Pass 5: starting run."):

   0 ntpd(1903): -> do_sock_read
1000834 ntpd(1902): <- do_sock_read

and then hangs the VM. The same thing on my 32-bit desktop prints out a bunch of
stuff and then exits cleanly after 1.2 seconds. (Without me pressing Ctrl-C,
which I don't understand yet, but that's different.)

Earlier, I also got an oops on the VM with 0.9.7, but I can't remember what
exactly I was doing. (It was something simple like stap -v -e 'probe syscall.* {
printf("I am alive\n"); exit(); }')

I can post my .config. I didn't change a whole lot in it. It is not built to be
relocatable.

Comment 1 Steve Fink 2009-11-20 01:45:53 UTC
Actually, it's kind of random how far the VM makes it. I rebooted and ran the
same thing, and this time only the 1st line made it out.

Comment 2 Frank Ch. Eigler 2009-11-20 02:05:13 UTC
I suspect we need to get hold of a kernel configured this way
in order to diagnose the situation.

Comment 3 Masami Hiramatsu 2009-11-20 20:21:23 UTC
(In reply to comment #0)
> I am running a preemptive x86_64 kernel (pretty much just the CentOS 5.3 kernel
> with preemption enabled).

Does this happen on an upstream kernel too (with preemption enabled)?

Comment 4 Steve Fink 2009-11-25 19:17:07 UTC
Whoops! My kernel does have one local change applied: the atop accounting
patches. Removing those fixes the problem. I'll go try to file a bug on atop
instead. (Fortunately, I really don't need those patches.)