Summary: | system crash when running "./systemtap.stress/current.stp" on power | ||
---|---|---|---|
Product: | systemtap | Reporter: | Jim Keniston <jkenisto> |
Component: | kprobes | Assignee: | Ananth Mavinakayanahalli <ananth> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Host: | | Target: | |
Build: | | Last reconfirmed: | |
Attachments: | Patch against 2.6.9-42.EL |
Description
Jim Keniston
2005-12-22 02:42:33 UTC
Subject: Re: New: system crash when running "./systemtap.stress/current.stp" on power
>> My suggestions for diagnosing this bug include:
> 1. Try the same thing without the entry probes.
> 2. Try the same thing without the return probes.
> 3. Run "stap -p3 xxx.stp > xxx.c" and extract the list of kretprobe probe
> addresses (dwarf_kprobe_1[]?). Build a C module that establishes entry kprobes
> and/or return probes for all these functions. See if that crashes. If so, keep
> removing functions from the list until you get a module that doesn't cause a
> crash. Keep playing with the list until you figure out a minimal list to
> demonstrate the bug.
Thanks for your suggestions. I've tried 1 and 2, and no crashes. I'll try 3
to minimize the list.
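The list reduction in suggestion 3 can be done mechanically. What follows is only a rough sketch of one way to do it, not anything taken from systemtap or from the tests above: the function name minimize_probe_list and the still_crashes callback are hypothetical, and the callback stands in for "build a module that probes exactly these functions, load it, and report whether the machine went down". It also assumes the crash is reproducible for a given list, which may not hold for an intermittent failure.

```python
from typing import Callable, List, Sequence


def minimize_probe_list(
    probes: Sequence[str],
    still_crashes: Callable[[List[str]], bool],
) -> List[str]:
    """Shrink `probes` while `still_crashes` keeps returning True.

    Chunks of the list are dropped one at a time; the chunk size is
    halved after each pass. The loop ends after a pass with chunk
    size 1 in which nothing could be removed, so dropping any single
    remaining entry no longer reproduces the crash.
    """
    current = list(probes)
    chunk = max(1, len(current) // 2)
    while True:
        removed_any = False
        start = 0
        while start < len(current):
            candidate = current[:start] + current[start + chunk:]
            # An empty probe list cannot crash anything, so skip it.
            if candidate and still_crashes(candidate):
                current = candidate      # this chunk was not needed
                removed_any = True
            else:
                start += chunk           # keep the chunk, move on
        if chunk == 1 and not removed_any:
            return current
        chunk = max(1, chunk // 2)


if __name__ == "__main__":
    # Hypothetical stand-in: the "crash" needs func_3 and func_17 together.
    all_probes = [f"func_{i}" for i in range(20)]
    culprits = {"func_3", "func_17"}
    print(minimize_probe_list(all_probes, lambda s: culprits <= set(s)))
```

In practice each call to still_crashes costs a reboot, so dropping large chunks first keeps the number of trials well below removing one function at a time.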
With Anil's fix for bz#2071 applied to kernel v2.6.15-rc5, and with systemtap.stress/current.stp modified (probe module("*") commented out, since it does not work on ppc64), I was able to run the test on Power 5. Here's the output of the test:

    systemtap starting systemtap ending probe
    count = 6535502 sum = 22080034 min = 2 max = 15 avg = 3
    systemtap test success
    systemtap test success
    WARNING: Number of errors: 0, skipped probes: 47933
    Running rm -rf /tmp/stapU0gKN6

I tried to recreate this problem on a POWER4 LPAR:

    [root@llm16 systemtap.stress]# cat /proc/cpuinfo
    processor : 0
    cpu : POWER4+ (gq)
    clock : 1200.791720MHz
    revision : 18.3 (pvr 0038 1203)
    processor : 1
    cpu : POWER4+ (gq)
    clock : 1200.791720MHz
    revision : 18.3 (pvr 0038 1203)
    timebase : 150098965
    platform : pSeries
    machine : CHRP IBM,7028-6C4

The test ran just fine, for two iterations at least:

    [root@llm16 systemtap.stress]# stap -g current.stp
    systemtap starting probe
    systemtap ending probe
    count = 5889610 sum = 40002649 min = 3 max = 11 avg = 6
    systemtap test success
    systemtap test success
    [root@llm16 systemtap.stress]# stap -g current.stp
    systemtap starting probe
    systemtap ending probe
    count = 16514488 sum = 112075474 min = 3 max = 15 avg = 6
    systemtap test success
    systemtap test success
    WARNING: Number of errors: 0, skipped probes: 440
    [root@llm16 systemtap.stress]#

This machine is running RHEL (U3):

    [root@llm16 systemtap.stress]# cat /etc/redhat-release
    Red Hat Enterprise Linux AS release 4 (Nahant Update 3)

But the kernel running at the time is upstream 2.6.18-rc3 (compiled with pseries_defconfig):

    [root@llm16 systemtap.stress]# uname -a
    Linux llm16.in.ibm.com 2.6.18-rc3 #4 SMP Wed Aug 2 17:50:14 IST 2006 ppc64 ppc64 ppc64 GNU/Linux

I'll try increasing the test run duration to see if the problem can be recreated. Mike, Jian Gui, was the problem seen only with the RHEL4-Ux kernel? Can you please try the same test with the upstream kernel?
Ananth

We changed our machines soon after the original bug report, so I had to run the same test on my Power5 LPAR. The test runs successfully. I think this bug has been fixed, as Hien mentioned above, and we can close it. The environment is again RHEL4_U3 with kernel 2.6.18-rc3 (compiled with pseries_defconfig).

    root:systemtap.stress>cat /proc/cpuinfo
    ...
    processor : 7
    cpu : POWER5 (gr)
    clock : 1502.496000MHz
    revision : 2.2 (pvr 003a 0202)
    timebase : 188044000
    platform : pSeries
    machine : CHRP IBM,9124-720
    root:systemtap.stress>stap -g current.stp
    systemtap starting probe
    systemtap ending probe
    count = 53479304 sum = 316413369 min = 4 max = 11 avg = 5
    systemtap test success
    systemtap test success
    root:systemtap.stress>stap -g current.stp
    systemtap starting probe
    systemtap ending probe
    count = 13909012 sum = 74516691 min = 2 max = 15 avg = 5
    systemtap test success
    systemtap test success
    WARNING: Number of errors: 0, skipped probes: 2020

This bug shouldn't be closed yet. The fix Hien mentions in comment #2 was not accepted into the kernel. We're currently seeing this problem with SLES 10 on power4 and the latest systemtap snapshot. We are not seeing the problem on power5, although I don't think this is a power4 vs power5 issue per se. I think it's partially related to which functions the wildcards resolve to, especially for modules. I'll update this report with more details later today.

Somebody from IBM needs to follow up on this.

Subject: Re: system crash when running "./systemtap.stress/current.stp" on power
On Thu, Nov 16, 2006 at 01:15:33AM -0000, jkenisto at us dot ibm dot com wrote:
>
> ------- Additional Comments From jkenisto at us dot ibm dot com 2006-11-16 01:15 -------
> Somebody from IBM needs to follow up on this.
>
> --
> What |Removed |Added
> ----------------------------------------------------------------------------
> AssignedTo|systemtap at sources dot |ananth at in dot ibm dot com
> |redhat dot com |
> Status|NEW |ASSIGNED
>
>
> http://sourceware.org/bugzilla/show_bug.cgi?id=2091
Amit will be looking into this issue.
Ananth
Created attachment 1425
Patch against 2.6.9-42.EL
Patch that fixed the power4-only itrace bug. I don't know for sure if this is
needed on RHEL4, but it's worth a test.
A survey of the weekly snapshot tests for powerpc shows that current.stp is working fine. Closing this bug as WORKSFORME. If this problem is seen again, we can reopen this bug.