Bug 18577 - on rhel7, listing_mode_sanity.exp always gets a failure when doing 'stap -l **'
Summary: on rhel7, listing_mode_sanity.exp always gets a failure when doing 'stap -l **'
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: testsuite (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-22 18:25 UTC by David Smith
Modified: 2015-06-25 15:14 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description David Smith 2015-06-22 18:25:20 UTC
On all RHEL7 architectures (x86_64, ppc64, and s390x), the listing_mode_sanity.exp test case always gets a failure when doing 'stap -l **':

====
FAIL: listing_mode_sanity (stap -l ** exited badly)
====

At first I thought this was because of tapset errors - but the tapsets don't have any errors on x86_64 RHEL7. So, I went a bit further. The failure occurs because the command is taking too long. The testsuite uses an rc file to ensure that systemtap doesn't take too long. So, in effect, when doing a 'stap -l **' we are really doing the following the following:

====
# time stap -v --rlimit-cpu=300 -l '**'
Pass 1: parsed user script and 112 library script(s) using 219868virt/37520res/2928shr/34884data kb, in 280usr/50sys/330real ms.
Killed

real	5m34.124s
user	5m32.627s
sys	0m1.075s
====

So, how long does doing 'stap -l **' take? On this VM (2 cpus, 2GB of ram), it takes over 15 minutes:

====
time stap -v -l '**' > stap.log
Pass 1: parsed user script and 112 library script(s) using 219868virt/37520res/2924shr/34884data kb, in 270usr/40sys/325real ms.
Pass 2: analyzed script: 9649 probe(s), 2892 function(s), 122 embed(s), 404 global(s) using 572132virt/388464res/4396shr/387148data kb, in 927390usr/2750sys/931367real ms.

real	15m31.976s
user	15m27.846s
sys	0m2.894s
====

15 minutes is too long. We could work around this by extending the rlimit value in the testsuite, but really we should look into ways to speed this up. Pass 1 is short, Pass 2 takes too long.

In addition, I see some odd/unexpected output (which may be part of the reason this is taking so long). For example, in the networking tapset, there is the following probe alias:

====
probe netdev.receive
	=  kernel.function("netif_receive_skb")
====

This is the only place in the tapsets where the netif_receive_skb() kernel function is referenced. However, in the 'stap -l **' output I see the following:

====
kernel.function("netif_receive_skb@net/core/dev.c:3592").call
kernel.function("netif_receive_skb@net/core/dev.c:3592").callee("__net_timestamp@include/linux/skbuff.h:2603")
kernel.function("netif_receive_skb@net/core/dev.c:3592").callee("__stack_chk_fail@kernel/panic.c:483")
kernel.function("netif_receive_skb@net/core/dev.c:3592").callee("enqueue_to_backlog@net/core/dev.c:3096")
kernel.function("netif_receive_skb@net/core/dev.c:3592").callee("get_rps_cpu@net/core/dev.c:2929")
kernel.function("netif_receive_skb@net/core/dev.c:3592").callee("skb_defer_rx_timestamp@net/core/timestamping.c:104")
kernel.function("netif_receive_skb@net/core/dev.c:3592").callee("static_key_false@include/linux/jump_label.h:93")
kernel.function("netif_receive_skb@net/core/dev.c:3592").exported
kernel.function("netif_receive_skb@net/core/dev.c:3592").return
====

I wouldn't have expected to see netif_receive_skb() in the 'stap -l **' output at all, and certainly not all its variants. In the 'stap -l **' output I see similar output for several (if not all) kernel functions that a probe alias uses.
Comment 1 Josh Stone 2015-06-22 18:47:22 UTC
(In reply to David Smith from comment #0)
> ====
> probe netdev.receive
> 	=  kernel.function("netif_receive_skb")
> ====
[...]
> I wouldn't have expected to see netif_receive_skb() in the 'stap -l **'
> output at all, and certainly not all its variants.

The '**' will end up giving you 'netdev.receive.**' too, which is why you're seeing all these .call/.return/.callee suffixes.

I agree it's not a great way they're listed though.  On F22 I see:

$ stap -l 'netdev.receive**'
kernel.function("netif_receive_skb@net/core/dev.c:3834").call
kernel.function("netif_receive_skb@net/core/dev.c:3834").exported
kernel.function("netif_receive_skb@net/core/dev.c:3834").return
netdev.receive

Ideally those should look like 'netdev.receive.call' etc.  It's tricky to reconcile though.  If someone wrote 'netdev.receive.callee("skb*")' then only the lower-level dwarf processing knows how to expand that wildcard, but that part only deals with "kernel.function.callee" and doesn't know how to leave the alias intact.
Comment 2 David Smith 2015-06-24 17:17:59 UTC
So, I compiled everything with '-pg' to enable profiling with grpof. Running "stap -l '**'" took ~2.5 hours with profiling enabled. Here's the top of the gprof output:

====
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 49.45    250.48   250.48 120081098     0.00     0.00  levenshtein(std::string const&, std::string const&)
 22.85    366.22   115.74    21077     0.01     0.02  levenshtein_suggest(std::string const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, unsigned int, unsigned int)
  7.19    402.66    36.44        1    36.44    43.34  semantic_pass_conditions(systemtap_session&)
====

It looks like we need to look at how we are using the levenshtein functions (which suggest alternatives), since between the 2 functions they took ~72% of the execution time.
Comment 3 David Smith 2015-06-25 15:14:48 UTC
Fixed in commit cff6336.

That commit fixes 2 sources of 'stap -l **' slowness:

1) If we're in either of the listing modes (-l/-L) and the verbose level is less than 2, we're generating the alternatives list... and then throwing it away.

So, to make things faster, if we're in either of the listing modes and the verbose level is less than 2, we no longer try to generate the alternatives list at all.

2) So, why are we trying to generate alternative lists in the first place? Because we've got semantic errors in the tapsets. But, we don't really have semantic errors in the tapsets, only because of the way 'stap -l **' expands everything. Here's an example (with the probe's guts removed):

====
probe syscall.pipe = __syscall.pipe2 ?, __syscall.ia64_pipe ?, __syscall.pipe
{}
probe __syscall.pipe2 = kernel.function("sys_pipe2").call
{}
probe __syscall.ia64_pipe = kernel.function("sys_ia64_pipe").call
{}
probe __syscall.pipe = kernel.function("sys_pipe").call
{}
====

So, the syscall.pipe probe alias has 3 possible locations, 2 of them being optional. So, if you use syscall.pipe you'll be fine. However, if you used __syscall.pipe2 and your kernel doesn't have sys_pipe2, you'll get an error. The same thing would happen with __syscall.ia64_pipe.

'stap -l **' tries to expand all probe aliases, including the "internal" ones starting with '__'.

To workaround this, I changed the above to:

====
probe syscall.pipe = __syscall.pipe2 ?, __syscall.ia64_pipe ?, __syscall.pipe
{}
probe __syscall.pipe2 = kernel.function("sys_pipe2").call ?
{}
probe __syscall.ia64_pipe = kernel.function("sys_ia64_pipe").call ?
{}
probe __syscall.pipe = kernel.function("sys_pipe").call
{}
====

That works by making __syscall.pipe2 and __syscall.ia64_pipe optional. So, there aren't any errors there that need alternatives listed.

However, there are some probe aliases that have a similar problem that can't be fixed:

====
probe nfs.aop.write_end = __nfs.aop.write_end !,
                          __nfs.aop.commit_write
{}                                                                              
probe __nfs.aop.commit_write = kernel.function ("nfs_commit_write") !,
                               module("nfs").function ("nfs_commit_write")
{} 
====

Here, I can't make module("nfs").function("nfs_commit_write") optional, since I want an error here if no targets can be found for the nfs.aop.write_end probe alias.


With the above 2 fixes, I'm now seeing the following times for 'stap -l **' on that same RHEL7 VM:

====
# time stap -l '**' > /dev/null ; echo $?
real	2m33.532s
user	2m48.206s
sys	0m19.263s
0
dsmith@kvm-el7-64-1: rhel7-64 50 -> time stap -l '**' > /dev/null ; echo $?

real	1m46.183s
user	1m43.278s
sys	0m2.750s
0
====

So, we're seeing run times less than 3 minutes. That's much better than 15 minutes.