Bug 17126 - tracepoints.exp testcase causing stalls/hang on ppc64
Summary: tracepoints.exp testcase causing stalls/hang on ppc64
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: runtime
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Frank Ch. Eigler
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-08 14:32 UTC by David Smith
Modified: 2014-11-20 21:32 UTC
2 users

See Also:
Host:
Target: ppc64
Build:
Last reconfirmed:


Attachments

Description David Smith 2014-07-08 14:32:50 UTC
When I run the tracepoints.exp testcase on RHEL7 ppc64 (3.10.0-123.el7.ppc64), the testcase never finishes and can't be interrupted. The console output reports several processor stalls:

====
[  517.597905] stap_1fc11586ceeef8b8c42a1c32930268cd_4_3498: module verification failed: signature and/or required key missing - tainting kernel
[  577.656336] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 17, t=6003 jiffies, g=614, c=613, q=0)
[  577.656360] Task dump for CPU 0:
[  577.656364] swapper/0       R  running task        0     0      0 0x00000000
[  577.656372] Call Trace:
[  577.656378] [c000000001273820] [0000007e43fbfed6] 0x7e43fbfed6 (unreliable)
[  577.656389] [c0000000012739f0] [c000000001273aa0] init_thread_union+0x3aa0/0x4000
[  577.656396] [c000000001273a80] [c000000001273b10] init_thread_union+0x3b10/0x4000
[  637.676283] INFO: rcu_sched detected stalls on CPUs/tasks: { 7 12 13 16} (detected by 17, t=6002 jiffies, g=615, c=614, q=0)
[  637.676306] Task dump for CPU 7:
[  637.676311] swapper/7       R  running task        0     0      1 0x00000804
[  637.676318] Call Trace:
[  637.676323] [c0000003ccbc3870] [0000006d368b1ce6] 0x6d368b1ce6 (unreliable)
[  637.676331] [c0000003ccbc3a40] [0000000000000c00] 0xc00
[  637.676336] Task dump for CPU 12:
[  637.676340] swapper/12      R  running task        0     0      1 0x00000800
[  637.676347] Call Trace:
[  637.676351] [c0000003ccbd7870] [000000592709e938] 0x592709e938 (unreliable)
[  637.676359] [c0000003ccbd7a40] [0000000000000c00] 0xc00
[  637.676364] Task dump for CPU 13:
[  637.676367] swapper/13      R  running task        0     0      1 0x00000800
[  637.676374] Call Trace:
[  637.676379] [c0000003ccbdb870] [c00000000011b63c] .find_busiest_group+0x2ec/0x930 (unreliable)
[  637.676387] [c0000003ccbdba40] [0000000000000c00] 0xc00
[  637.676392] Task dump for CPU 16:
[  637.676395] stapio          R  running task        0  3498   3497 0x00008084
[  637.676402] Call Trace:
[  637.676406] [c0000003c43ab540] [c0000003d3ec3d00] 0xc0000003d3ec3d00 (unreliable)
[  637.676415] [c0000003c43ab710] [c000000000c50380] decrementers+0x0/0x100
[  817.746133] INFO: rcu_sched detected stalls on CPUs/tasks: { 7 12 13 16} (detected by 17, t=24008 jiffies, g=615, c=614, q=0)
[  817.746157] Task dump for CPU 7:
[  817.746161] swapper/7       R  running task        0     0      1 0x00000804
[  817.746169] Call Trace:
[  817.746174] [c0000003ccbc3870] [0000006d368b1ce6] 0x6d368b1ce6 (unreliable)
[  817.746182] [c0000003ccbc3a40] [0000000000000c00] 0xc00
[  817.746187] Task dump for CPU 12:
[  817.746191] swapper/12      R  running task        0     0      1 0x00000800
[  817.746197] Call Trace:
[  817.746201] [c0000003ccbd7870] [000000592709e938] 0x592709e938 (unreliable)
[  817.746209] [c0000003ccbd7a40] [0000000000000c00] 0xc00
[  817.746214] Task dump for CPU 13:
[  817.746217] swapper/13      R  running task        0     0      1 0x00000800
[  817.746224] Call Trace:
[  817.746229] [c0000003ccbdb870] [c00000000011b63c] .find_busiest_group+0x2ec/0x930 (unreliable)
[  817.746237] [c0000003ccbdba40] [0000000000000c00] 0xc00
[  817.746242] Task dump for CPU 16:
[  817.746245] stapio          R  running task        0  3498   3497 0x00008084
[  817.746252] Call Trace:
[  817.746256] [c0000003c43ab540] [c0000003d3ec3d00] 0xc0000003d3ec3d00 (unreliable)
[  817.746264] [c0000003c43ab710] [c000000000c50380] decrementers+0x0/0x100
[  997.785985] INFO: rcu_sched detected stalls on CPUs/tasks: { 7 12 13 16} (detected by 15, t=42013 jiffies, g=615, c=614, q=0)
[  997.786018] Task dump for CPU 7:
[  997.786024] swapper/7       R  running task        0     0      1 0x00000804
[  997.786034] Call Trace:
[  997.786043] [c0000003ccbc3870] [0000006d368b1ce6] 0x6d368b1ce6 (unreliable)
[  997.786053] [c0000003ccbc3a40] [0000000000000c00] 0xc00
[  997.786060] Task dump for CPU 12:
[  997.786065] swapper/12      R  running task        0     0      1 0x00000800
[  997.786074] Call Trace:
[  997.786079] [c0000003ccbd7870] [000000592709e938] 0x592709e938 (unreliable)
[  997.786089] [c0000003ccbd7a40] [0000000000000c00] 0xc00
[  997.786095] Task dump for CPU 13:
[  997.786100] swapper/13      R  running task        0     0      1 0x00000804
[  997.786109] Call Trace:
[  997.786117] [c0000003ccbdb870] [c00000000011b63c] .find_busiest_group+0x2ec/0x930 (unreliable)
[  997.786130] [c0000003ccbdba40] [0000000000000c00] 0xc00
[  997.786137] Task dump for CPU 16:
[  997.786141] stapio          R  running task        0  3498   3497 0x00008084
[  997.786150] Call Trace:
[  997.786155] [c0000003c43ab540] [c0000003d3ec3d00] 0xc0000003d3ec3d00 (unreliable)
[  997.786166] [c0000003c43ab710] [c000000000c50380] decrementers+0x0/0x100
[ 1177.835828] INFO: rcu_sched detected stalls on CPUs/tasks: { 7 12 13 16} (detected by 17, t=60018 jiffies, g=615, c=614, q=0)
[ 1177.835853] Task dump for CPU 7:
[ 1177.835857] swapper/7       R  running task        0     0      1 0x00000804
[ 1177.835865] Call Trace:
[ 1177.835870] [c0000003ccbc3870] [0000006d368b1ce6] 0x6d368b1ce6 (unreliable)
[ 1177.835878] [c0000003ccbc3a40] [0000000000000c00] 0xc00
[ 1177.835883] Task dump for CPU 12:
[ 1177.835887] swapper/12      R  running task        0     0      1 0x00000800
[ 1177.835893] Call Trace:
[ 1177.835897] [c0000003ccbd7870] [000000592709e938] 0x592709e938 (unreliable)
[ 1177.835905] [c0000003ccbd7a40] [0000000000000c00] 0xc00
[ 1177.835909] Task dump for CPU 13:
[ 1177.835913] swapper/13      R  running task        0     0      1 0x00000804
[ 1177.835919] Call Trace:
[ 1177.835924] [c0000003ccbdb870] [c00000000011b63c] .find_busiest_group+0x2ec/0x930 (unreliable)
[ 1177.835932] [c0000003ccbdba40] [0000000000000c00] 0xc00
[ 1177.835937] Task dump for CPU 16:
[ 1177.835941] stapio          R  running task        0  3498   3497 0x00008084
[ 1177.835947] Call Trace:
[ 1177.835951] [c0000003c43ab540] [c0000003d3ec3d00] 0xc0000003d3ec3d00 (unreliable)
[ 1177.835959] [c0000003c43ab710] [c000000000c50380] decrementers+0x0/0x100
[ 1357.905677] INFO: rcu_sched detected stalls on CPUs/tasks: { 7 12 13 16} (detected by 17, t=78024 jiffies, g=615, c=614, q=0)
[ 1357.905702] Task dump for CPU 7:
[ 1357.905706] swapper/7       R  running task        0     0      1 0x00000804
[ 1357.905713] Call Trace:
[ 1357.905718] [c0000003ccbc3870] [0000006d368b1ce6] 0x6d368b1ce6 (unreliable)
[ 1357.905726] [c0000003ccbc3a40] [0000000000000c00] 0xc00
[ 1357.905732] Task dump for CPU 12:
[ 1357.905735] swapper/12      R  running task        0     0      1 0x00000800
[ 1357.905742] Call Trace:
[ 1357.905746] [c0000003ccbd7870] [000000592709e938] 0x592709e938 (unreliable)
[ 1357.905753] [c0000003ccbd7a40] [0000000000000c00] 0xc00
[ 1357.905758] Task dump for CPU 13:
[ 1357.905762] swapper/13      R  running task        0     0      1 0x00000804
[ 1357.905768] Call Trace:
[ 1357.905773] [c0000003ccbdb870] [c00000000011b63c] .find_busiest_group+0x2ec/0x930 (unreliable)
[ 1357.905781] [c0000003ccbdba40] [0000000000000c00] 0xc00
[ 1357.905786] Task dump for CPU 16:
[ 1357.905790] stapio          R  running task        0  3498   3497 0x00008084
[ 1357.905796] Call Trace:
[ 1357.905800] [c0000003c43ab540] [c0000003d3ec3d00] 0xc0000003d3ec3d00 (unreliable)
[ 1357.905808] [c0000003c43ab710] [c000000000c50380] decrementers+0x0/0x100
[ 1538.005524] INFO: rcu_sched detected stalls on CPUs/tasks: { 7 12 13 16} (detected by 17, t=96034 jiffies, g=615, c=614, q=0)
[ 1538.005548] Task dump for CPU 7:
[ 1538.005553] swapper/7       R  running task        0     0      1 0x00000804
[ 1538.005560] Call Trace:
[ 1538.005565] [c0000003ccbc3870] [0000006d368b1ce6] 0x6d368b1ce6 (unreliable)
[ 1538.005573] [c0000003ccbc3a40] [0000000000000c00] 0xc00
[ 1538.005578] Task dump for CPU 12:
[ 1538.005582] swapper/12      R  running task        0     0      1 0x00000800
[ 1538.005589] Call Trace:
[ 1538.005592] [c0000003ccbd7870] [000000592709e938] 0x592709e938 (unreliable)
[ 1538.005600] [c0000003ccbd7a40] [0000000000000c00] 0xc00
[ 1538.005605] Task dump for CPU 13:
[ 1538.005608] swapper/13      R  running task        0     0      1 0x00000804
[ 1538.005615] Call Trace:
[ 1538.005620] [c0000003ccbdb870] [c00000000011b63c] .find_busiest_group+0x2ec/0x930 (unreliable)
[ 1538.005628] [c0000003ccbdba40] [0000000000000c00] 0xc00
[ 1538.005633] Task dump for CPU 16:
[ 1538.005636] stapio          R  running task        0  3498      1 0x00008084
[ 1538.005643] Call Trace:
[ 1538.005647] [c0000003c43ab540] [c0000003d3ec3d00] 0xc0000003d3ec3d00 (unreliable)
[ 1538.005655] [c0000003c43ab710] [c000000000c50380] decrementers+0x0/0x100
====
Comment 1 David Smith 2014-07-08 19:55:52 UTC
From some more debugging, the problem appears to involve tracepoints whose names start with 'h'. If I change tracepoints.stp to probe 'kernel.trace("[a-gi-z]*")', the testcase works fine.

Here are the tracepoints that start with 'h':

====
# stap -l 'kernel.trace("h*")'
kernel.trace("hcall_entry")
kernel.trace("hcall_exit")
kernel.trace("host1x_cdma_begin")
kernel.trace("host1x_cdma_end")
kernel.trace("host1x_cdma_push")
kernel.trace("host1x_cdma_push_gather")
kernel.trace("host1x_channel_open")
kernel.trace("host1x_channel_release")
kernel.trace("host1x_channel_submit")
kernel.trace("host1x_channel_submit_complete")
kernel.trace("host1x_channel_submitted")
kernel.trace("host1x_syncpt_load_min")
kernel.trace("host1x_syncpt_wait_check")
kernel.trace("host1x_wait_cdma")
kernel.trace("hrtimer_cancel")
kernel.trace("hrtimer_expire_entry")
kernel.trace("hrtimer_expire_exit")
kernel.trace("hrtimer_init")
kernel.trace("hrtimer_start")
====

So there are basically three sets: hcall_*, host1x_*, and hrtimer_*. Each of those sets works fine individually; I haven't yet found the combination that causes the stall.
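One way to hunt for the offending combination is pairwise bisection over the three prefixes. A hypothetical helper (my sketch, not part of the testcase; it only prints the stap commands, which should be run by hand on a scratch machine, since a bad pair can stall the box):

```shell
# Emit one stap command per unordered pair of the three tracepoint
# prefixes listed above; run each printed command manually.
pairs="hcall_:host1x_ hcall_:hrtimer_ host1x_:hrtimer_"
for pair in $pairs; do
  a=${pair%%:*}   # first prefix of the pair
  b=${pair#*:}    # second prefix of the pair
  echo "stap -e 'probe kernel.trace(\"${a}*\"), kernel.trace(\"${b}*\") { printf(\"%s\\n\", ppfunc()) }'"
done
```

As comment 2 below shows, this kind of bisection converged on the hcall_* + hrtimer_* pair.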
Comment 2 David Smith 2014-07-09 14:03:09 UTC
After some more testing, it appears that any combination of the hrtimer_* tracepoints with the hcall_* tracepoints causes the stall/hang.

The hcall_* tracepoints are ppc64-specific. I believe 'hcall' is short for 'hypervisor call'.
Comment 3 David Smith 2014-07-09 21:02:37 UTC
Using "perf" (via the following command) doesn't show the stall.

     # perf record -e 'timer:hrtimer_*' -e 'powerpc:hcall_*' -a ls
Comment 4 David Smith 2014-07-10 15:43:49 UTC
Using "stap -DSTP_ALIBI" (which ifdefs out probe bodies) doesn't show the stall either. This means that something the probe bodies are doing is causing the hang.

Over in __trace_hcall_entry() (in arch/powerpc/platforms/pseries/lpar.c), I see the following comment:

	/*
	 * We cannot call tracepoints inside RCU idle regions which
	 * means we must not trace H_CEDE.
	 */

I wonder if a specific hcall opcode triggers this stall.
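To test that hypothesis, one could count which opcodes actually fire. A hypothetical sketch (mine, not from the report; it assumes this kernel's hcall_entry tracepoint exposes an 'opcode' argument, which should be verified first with `stap -L 'kernel.trace("hcall_entry")'`):

```systemtap
# Count hits per hcall opcode; if H_CEDE (or one specific opcode)
# dominates just before the stall, that points at the culprit.
global ops
probe kernel.trace("hcall_entry") { ops[$opcode] <<< 1 }
probe end {
  foreach (op in ops)
    printf("opcode 0x%x: %d hits\n", op, @count(ops[op]))
}
```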
Comment 5 hemkshaw@in.ibm.com 2014-07-21 09:08:02 UTC
Hi,
I am trying to reproduce this bug on a RHEL7 system, but for some reason stap is unable to find the hcall-related tracepoints.
# uname -r
3.10.0-123.el7.ppc64

And the debug and devel pkgs for this kernel are also installed :
# rpm -qa | grep kernel-de
kernel-devel-3.10.0-123.el7.ppc64
kernel-debug-3.10.0-123.el7.ppc64

But,
# stap -l 'kernel.trace("hcall_*")'
shows nothing!

perf, however, is able to show them:
# perf list tracepoints | grep hcall
  powerpc:hcall_entry                                [Tracepoint event]
  powerpc:hcall_exit                                 [Tracepoint event]

However, stap is able to show the hrtimer-related events and is also able to probe them:

# stap -l 'kernel.trace("hrtimer_*")'
kernel.trace("hrtimer_cancel")
kernel.trace("hrtimer_expire_entry")
kernel.trace("hrtimer_expire_exit")
kernel.trace("hrtimer_init")
kernel.trace("hrtimer_start")

Since the problem is caused by probing hrtimer_* and hcall_* in combination, I am unable to reproduce this bug, as systemtap fails to probe hcall_*:

# stap -v -e 'probe kernel.trace("hcall_*") { printf("hcall hit!"); }'
Pass 1: parsed user script and 97 library script(s) using 153600virt/35968res/7936shr/24640data kb, in 220usr/10sys/233real ms.
semantic error: while resolving probe point: identifier 'kernel' at <input>:1:7
        source: probe kernel.trace("hcall_*") { printf("hcall hit!"); }
                      ^

semantic error: no match (similar functions: kfree, kmalloc, console, kvm_fpu, cpu_idle)
Pass 2: analyzed script: 0 probe(s), 0 function(s), 0 embed(s), 0 global(s) using 164416virt/46144res/11648shr/28608data kb, in 60usr/180sys/243real ms.
Pass 2: analysis failed.  [man error::pass2]

Thanks,
Hemant
Comment 6 Frank Ch. Eigler 2014-07-21 11:38:10 UTC
Please install the kernel-debuginfo-3.10.0-123.el7.ppc64 package,
not the kernel-debug-3.10.0-123.el7.ppc64 package.  The former
contains kernel sources and debugging object files; the latter
contains kernel binaries for a completely separate configuration/build.
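The distinction can be double-checked with a quick sketch like the following (mine, not Frank's; it assumes an RPM-based system, as in this report):

```shell
# The debuginfo package must match the *running* kernel exactly;
# kernel-debug-<ver> is a separately configured kernel build, not
# debugging data.
kver=$(uname -r)
echo "running kernel:   $kver"
echo "package to check: kernel-debuginfo-$kver"
# then e.g.: rpm -q "kernel-debuginfo-$kver"
```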
Comment 7 hemkshaw@in.ibm.com 2014-07-28 12:02:57 UTC
Thanks Frank, I installed the kernel-debuginfo and kernel-debuginfo-common packages for the required kernel.
But even after installing the kernel-debuginfo-3.10.0-123.el7.ppc64 package, the problem persists.

# rpm -qa | grep kernel-debuginfo
kernel-debuginfo-3.10.0-123.el7.ppc64
kernel-debuginfo-common-ppc64-3.10.0-123.el7.ppc64

# stap -l' kernel.trace("hcall_*")'
#

I looked into /proc/kallsyms to find where the hcall-related symbols differ from the others, but the hcall-related and hrtimer-related symbols look similar.
/proc/kallsyms has entries for both __tracepoint_ptr_hcall_entry and __tracepoint_ptr_hrtimer_init.

I strace'd stap to try to find out where it is failing to find the hcall_* events, but stap goes through all the usual files it needs. So far I haven't noticed anything unusual in the strace output.

Thanks,
Hemant
Comment 8 David Smith 2014-07-28 15:00:03 UTC
(In reply to hemkshaw@in.ibm.com from comment #7)
> Thanks Frank, I installed the kernel-debuginfo and kernel-debuginfo-common
> packages for the required kernel.
> But, even after installation of kernel-debuginfo-3.10.0-123.el7.ppc64
> package, the problem still persists.
> 
> # rpm -qa | grep kernel-debuginfo
> kernel-debuginfo-3.10.0-123.el7.ppc64
> kernel-debuginfo-common-ppc64-3.10.0-123.el7.ppc64
> 
> # stap -l' kernel.trace("hcall_*")'
> #

Does "perf list tracepoint" show the hcall tracepoints on your system?
Comment 9 hemkshaw@in.ibm.com 2014-07-29 09:33:52 UTC
Hi David,
(In reply to David Smith from comment #8)
> (In reply to hemkshaw@in.ibm.com from comment #7)
> > Thanks Frank, I installed the kernel-debuginfo and kernel-debuginfo-common
[SNIP]
> > # stap -l' kernel.trace("hcall_*")'
> > #
> 
> Does "perf list tracepoint" show the hcall tracepoints on your system?

Yes, "perf list tracepoints" shows the hcall events:

# perf list tracepoints | grep hcall
  powerpc:hcall_entry                                [Tracepoint event]
  powerpc:hcall_exit                                 [Tracepoint event]

Thanks,
Hemant
Comment 10 Frank Ch. Eigler 2014-07-29 15:46:50 UTC
Can you run

% stap --poison-cache -k --vp 05000 -L 'kernel.trace("hcall_*")'

and check out the resulting messages & /var/tmp/stapFOO directory,
as to processing of the arch/powerpc/include/asm/trace.h tracepoints?
Comment 11 hemkshaw@in.ibm.com 2014-10-20 11:28:04 UTC
Hi,
If I try to reproduce this bug with systemtap-2.6.0 and upstream kernel v3.17, I can't reproduce it.

# cat my_probes.stp
probe kernel.trace("hcall_entry") { printf("hcall_entry\n"); }
probe kernel.trace("hcall_exit") { printf("hcall_exit\n"); }
probe kernel.trace("hrtimer*") { printf("hrtimer_\n"); }

# stap -v my_probes.stp

hrtimer_
hrtimer_
hrtimer_
...
Comment 12 David Smith 2014-10-20 20:18:37 UTC
I just tried tracepoints.exp on 3.10.0-188.el7.ppc64 with HEAD systemtap. I still get the hang, both with the tracepoints.exp testcase and with the following one-line script:

# stap -ve 'probe kernel.trace("hcall_*"), kernel.trace("hrtimer_*") { printf("%s\n", ppfunc()) }'

Note, however, that I get different behavior. With the tracepoints.exp testcase, the system's console reported a backtrace similar to comment #1. With the one-liner, I get a hard system hang with nothing reported on the console.
Comment 13 David Smith 2014-10-20 20:29:53 UTC
I don't see any changes in __trace_hcall_entry()/__trace_hcall_exit() between RHEL7's source and the latest kernel sources.
Comment 14 hemkshaw@in.ibm.com 2014-10-21 15:46:48 UTC
(In reply to David Smith from comment #13)
> I don't see any changes in __trace_hcall_entry()/__trace_hcall_exit()
> between RHEL7's source and the latest kernel sources.

OK, I didn't see this bug with upstream kernel 3.17, although I was able to reproduce it on RHEL7 ppc64 (3.10.0-123.el7.ppc64), both with the same version of systemtap.
Comment 15 David Smith 2014-10-21 17:03:59 UTC
(In reply to hemkshaw@in.ibm.com from comment #14)
> (In reply to David Smith from comment #13)
> > I don't see any changes in __trace_hcall_entry()/__trace_hcall_exit()
> > between RHEL7's source and the latest kernel sources.
> 
> Ok, I didn't see this bug with kernel 3.17 (upstream). Although, I was able
> to reproduce this bug in RHEL7 ppc64 (3.10.0-123.el7.ppc64). Both with the
> same versions of systemtap.

I'm glad to know you were able to reproduce it on RHEL7, which I don't think I knew. Since you didn't see it on 3.17, I noted that I don't see any difference in the source for the hcall trace functions between the two kernels. But it does potentially sound like a RHEL7 kernel bug if you've used the same version of systemtap on both the crashing and non-crashing kernels.

Thanks for continuing to look at this.
Comment 16 Frank Ch. Eigler 2014-11-20 21:32:58 UTC
commit c4a048331603 works around the kernel problem via a translator blacklist.