Bug 15775 - relay host/guest functionality is broken on RHEL6
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: runtime
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Unassigned
Reported: 2013-07-23 19:02 UTC by David Smith
Modified: 2016-02-23 19:01 UTC (History)

Description David Smith 2013-07-23 19:02:35 UTC
On RHEL6, the systemtap.printf/sharedbuf.exp test fails:

====
Running /es/scratch/dsmith/systemtap/src/testsuite/systemtap.printf/sharedbuf.exp ...
spawn stap /es/scratch/dsmith/systemtap/src/testsuite/systemtap.printf/sharedbuf.stp -DRELAY_HOST=test1
Host: begin
PASS: shared buffer hosting
WARNING: "stp_print_flush_test1" [/tmp/stap3qLPPW/stap_3afb871f7052c2f8b13ef4bc30d1f01a_1033.ko] undefined!
ERROR: Couldn't insert module '/tmp/stap3qLPPW/stap_3afb871f7052c2f8b13ef4bc30d1f01a_1033.ko': Unknown symbol in module
WARNING: /usr/local/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]
PASS: shared buffer guest
FAIL: buffer sharing (1, 0)
testcase /es/scratch/dsmith/systemtap/src/testsuite/systemtap.printf/sharedbuf.exp completed in 127 seconds
====

In /var/log/messages, you'll see this:

====
Jul 23 13:03:09 kvm-el6-64-1 kernel: stap_2a284551e36db9394c1f2b499dfa812_12005: no symbol version for stp_print_flush_test1
Jul 23 13:03:09 kvm-el6-64-1 kernel: stap_2a284551e36db9394c1f2b499dfa812_12005: Unknown symbol stp_print_flush_test1
====

I believe this is happening because of kernel symbol versioning. When the 2nd module gets built, it doesn't have access to the Module.symvers from the 1st module, so the build has no version information for the symbol the 1st module exports.
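
One way to check that hypothesis is to look for the missing symbol in the first module's Module.symvers. The following is a minimal sketch, not part of systemtap: the function name `symvers_has_symbol` is made up for illustration, and it assumes the usual tab-separated Module.symvers layout with the CRC in column 1 and the symbol name in column 2.

```shell
# Hypothetical helper (not a systemtap or kbuild tool): succeed if SYMBOL
# is listed in the given Module.symvers file.
# Assumption: the file is tab-separated, CRC in column 1, symbol in column 2.
# Exit status: 0 = listed, 1 = not listed, 2 = file not readable.
symvers_has_symbol() {
    symvers_file=$1
    symbol=$2
    [ -r "$symvers_file" ] || return 2
    awk -F '\t' -v sym="$symbol" '
        $2 == sym { found = 1 }
        END { exit !found }
    ' "$symvers_file"
}
```

For example, `symvers_has_symbol /tmp/stapie3sXR/Module.symvers stp_print_flush_test1 && echo listed` would report whether a host module built with `-k` recorded the export.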

Here's what the kernel's Documentation/kbuild/modules.txt file says about symbol versioning:

====
--- 6.3 Symbols From Another External Module

	Sometimes, an external module uses exported symbols from
	another external module. kbuild needs to have full knowledge of
	all symbols to avoid spitting out warnings about undefined
	symbols. Three solutions exist for this situation.

	NOTE: The method with a top-level kbuild file is recommended
	but may be impractical in certain situations.

	Use a top-level kbuild file
		If you have two modules, foo.ko and bar.ko, where
		foo.ko needs symbols from bar.ko, you can use a
		common top-level kbuild file so both modules are
		compiled in the same build. Consider the following
		directory layout:

		./foo/ <= contains foo.ko
		./bar/ <= contains bar.ko

		The top-level kbuild file would then look like:

		#./Kbuild (or ./Makefile):
			obj-y := foo/ bar/

		And executing

			$ make -C $KDIR M=$PWD

		will then do the expected and compile both modules with
		full knowledge of symbols from either module.

	Use an extra Module.symvers file
		When an external module is built, a Module.symvers file
		is generated containing all exported symbols which are
		not defined in the kernel. To get access to symbols
		from bar.ko, copy the Module.symvers file from the
		compilation of bar.ko to the directory where foo.ko is
		built. During the module build, kbuild will read the
		Module.symvers file in the directory of the external
		module, and when the build is finished, a new
		Module.symvers file is created containing the sum of
		all symbols defined and not part of the kernel.

	Use "make" variable KBUILD_EXTRA_SYMBOLS
		If it is impractical to copy Module.symvers from
		another module, you can assign a space separated list
		of files to KBUILD_EXTRA_SYMBOLS in your build file.
		These files will be loaded by modpost during the
		initialization of its symbol tables.
====

Based on that, I tried the 3rd solution (KBUILD_EXTRA_SYMBOLS).

In one terminal for the server:

====
# stap -p4 -k testsuite/systemtap.printf/sharedbuf.stp  -DRELAY_HOST=test1
stap_14594.ko
Keeping temporary directory "/tmp/stapie3sXR"
# staprun /tmp/stapie3sXR/stap_14594.ko 
Host: begin
HelloWorld
====

In another terminal for the client. Notice that I first try without setting 'KBUILD_EXTRA_SYMBOLS', which fails, and then with it set, which works:

====
# stap testsuite/systemtap.printf/hello.stp -DRELAY_GUEST=test1
ERROR: Couldn't insert module '/tmp/stapYX78X6/stap_2a284551e36db9394c1f2b499dfa812b_1006.ko': Unknown symbol in module
WARNING: /usr/local/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]
# KBUILD_EXTRA_SYMBOLS=/tmp/stapie3sXR/Module.symvers stap --disable-cache -v testsuite/systemtap.printf/hello.stp -DRELAY_GUEST=test1
Pass 1: parsed user script and 95 library script(s) using 200112virt/26232res/2936shr/23760data kb, in 180usr/20sys/196real ms.
Pass 2: analyzed script: 1 probe(s), 1 function(s), 0 embed(s), 0 global(s) using 200640virt/27016res/3208shr/24288data kb, in 0usr/0sys/5real ms.
Pass 3: translated to C into "/tmp/stap7CL6zk/stap_14836_src.c" using 200764virt/27252res/3428shr/24412data kb, in 0usr/0sys/1real ms.
Pass 4: compiled C into "stap_14836.ko" in 6950usr/1220sys/8715real ms.
Pass 5: starting run.
Pass 5: run completed in 20usr/30sys/318real ms.
====
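
Since KBUILD_EXTRA_SYMBOLS takes a space-separated list of files, the manual step above could be wrapped in a small helper. This is only a sketch under assumptions: `extra_symbols_list` is a hypothetical name, and it assumes each relay host module was built with `-k` so that its temporary directory (and the Module.symvers inside it) still exists.

```shell
# Hypothetical helper: print a space-separated list of the Module.symvers
# files found in the given build directories, in the order given.
# Directories without a Module.symvers are skipped. Because the list is
# space-separated, directory names containing spaces are not supported.
extra_symbols_list() {
    list=
    for dir in "$@"; do
        if [ -f "$dir/Module.symvers" ]; then
            # Add a separating space only when the list is non-empty.
            list="${list:+$list }$dir/Module.symvers"
        fi
    done
    printf '%s\n' "$list"
}
```

With the host directory from the transcript above, the guest could then be started as `KBUILD_EXTRA_SYMBOLS=$(extra_symbols_list /tmp/stapie3sXR) stap --disable-cache testsuite/systemtap.printf/hello.stp -DRELAY_GUEST=test1`.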

Strangely enough, sharedbuf.exp passes on rawhide (3.11.0-0.rc0.git7.1.fc20.x86_64).
Comment 1 Frank Ch. Eigler 2015-05-19 14:39:50 UTC
(not a high-priority functionality)
Comment 2 David Smith 2016-02-22 19:31:52 UTC
Note that the relay host/guest functionality doesn't work on RHEL6, RHEL7, or any current Fedora or rawhide kernel. The only place it works is on RHEL5-era kernels.

Perhaps we should just deprecate this functionality (and remove its documentation from the manpage)?

The least we could do here would be to change the sharedbuf.exp test case to issue KFAILs (and reference this bug report).

(One *ugly* solution would be to bypass the module exporting mechanism and use kallsyms_lookup_name() to find the function. The problem with this approach is that it wouldn't update the relay host module's reference count, making it possible to remove the relay host module before the relay guest module. We might be able to update the relay host module's reference count ourselves, but we don't really know the module's name, and there are probably several time-of-check-to-time-of-use (TOCTTOU) problems lurking there.)
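
To illustrate the module-name part of that problem from user space only: /proc/kallsyms lists module symbols with the owning module in square brackets. The sketch below is an assumption-laden illustration, not the in-kernel lookup discussed above. `kallsyms_owner` is a hypothetical name, it assumes lines of the form `address type name [module]`, and it does nothing about the reference count or the TOCTTOU concerns.

```shell
# Hypothetical helper: print the name of the module that owns SYMBOL,
# according to a table in /proc/kallsyms format.
# Assumption: module symbols appear as "address type name [module]";
# symbols from the core kernel have no bracketed fourth field.
# Exit status: 0 = owner printed, 1 = symbol not found in any module.
kallsyms_owner() {
    symbol=$1
    table=${2:-/proc/kallsyms}
    awk -v sym="$symbol" '
        $3 == sym && $4 ~ /^\[.*\]$/ {
            # Strip the surrounding brackets from the module field.
            print substr($4, 2, length($4) - 2)
            found = 1
            exit
        }
        END { exit !found }
    ' "$table"
}
```

While a relay host module is loaded, `kallsyms_owner stp_print_flush_test1` would be expected to print that module's name. The answer can be stale by the time it is used, which is the same race described above.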
Comment 3 David Smith 2016-02-23 19:01:50 UTC
We decided to remove this functionality. Fixed in commit 3d7e775.