Systemtap testcase for vfs (testsuite/buildok/vfs_testcase.stp)fails on custom kernel on RHEL 5.3 on ppc64 uname -a : Linux 2.6.30-rc5 #1 SMP Tue May 12 17:15:22 IST 2009 ppc64 ppc64 ppc64 GNU/Linux SystemTap translator/driver (version 0.9.7/0.137 non-git sources) (detailed output attached) On reading the vmlinux with readelf, one finds that the function parameters/context variables (which systemtap is unable to resolve) have valid location details. The error persists even on systemtap git sources built with elfutils-0.140. Even a simple systemtap probe such as the following : # stap -vve 'probe kernel.function("generic_file_llseek"){printf("\n %p",$file);}' Fails with the following msg : ------------------------------------------------------------------- SystemTap translator/driver (version 0.9.7/0.137 non-git sources) Copyright (C) 2005-2009 Red Hat, Inc. and others This is free software; see the source for copying conditions. Session arch: ppc64 release: 2.6.30-rc5 Created temporary directory "/tmp/stapED7o0s" Searched '/usr/local/share/systemtap/tapset/ppc64/*.stp', found 2 Searched '/usr/local/share/systemtap/tapset/*.stp', found 50 Pass 1: parsed user script and 52 library script(s) in 500usr/10sys/562real ms. probe generic_file_llseek@fs/read_write.c:89 kernel reloc=.dynamic section=.text pc=0xc00000000014e004 semantic error: not accessible at this address: identifier '$file' at <input>:1:61 source: probe kernel.function("generic_file_llseek"){printf("\n %p",$file);} ^ Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 550usr/960sys/1763real ms. Pass 2: analysis failed. Try again with another '--vp 01' option. Running rm -rf /tmp/stapED7o0s --------------------------------------------------------------------- The testcase runs successfully with distro kernel. It also passes on same kernel built on a ppc64 fedora system.
Created attachment 3980 [details] output Output when running : #stap -v /usr/share/systemtap/testsuite/buildok/vfs_testcase.stp
The problem is not unique to vfs. I am seeing that any variable reference fails: [ananth@... ~]$ stap -ve 'probe kernel.function("do_fork") { printf("%x\n", $clone_flags) exit() }' Pass 1: parsed user script and 52 library script(s) in 490usr/20sys/781real ms. semantic error: not accessible at this address: identifier '$clone_flags' at <input>:1:51 source: probe kernel.function("do_fork") { printf("%x\n", $clone_flags) exit() } ^ Pass 2: analyzed script: 1 probe(s), 1 function(s), 0 embed(s), 0 global(s) in 560usr/1470sys/9074real ms. Pass 2: analysis failed. Try again with another '--vp 01' option. Could this be something to do with the toolchain/debuginfo created thereof? Even $$parms fails thus: [ananth@llm27lp1 ~]$ stap -ve 'probe kernel.function("do_fork") { printf("%s\n", $$parms) exit() }' Pass 1: parsed user script and 52 library script(s) in 500usr/10sys/566real ms. Pass 2: analyzed script: 1 probe(s), 1 function(s), 0 embed(s), 0 global(s) in 560usr/960sys/1774real ms. Pass 3: translated to C into "/tmp/stapmuqFAA/stap_68f940748294de8d2965c98af4c68f48_836.c" in 1180usr/970sys/2441real ms. Pass 4: compiled C into "stap_68f940748294de8d2965c98af4c68f48_836.ko" in 7100usr/950sys/9487real ms. Pass 5: starting run. clone_flags=? stack_start=? regs=? stack_size=? parent_tidptr=? child_tidptr=? Pass 5: run completed in 20usr/60sys/8336real ms.
I don't have a ppc machine for testing, but I was not able to reproduce this on i686 or x86_64. I tried both 0.9.7/0.137 and 0.9.7/0.141, and everything looked fine. This is all with the normal el5 toolchain too (gcc-4.1.2-44.el5, binutils-2.17.50.0.6-9.el5). I configured my kernel 2.6.30-rc5 by copying config-2.6.18-128.1.10.el5, then running "make oldconfig" with all default answers. Can you provide your config for the kernel that's not working?
Created attachment 3985 [details] .config of the powerpc kernel .config attached. This is a streamlined config for the machine in question and will probably need to be tweaked for other machines. Also, this problem has only been seen on RHEL5.x powerpc boxes -- Fedora powerpc works fine. Further, I am now fairly convinced this isn't a debuginfo mismatch issue. I built a new kernel from scratch and new SystemTap with REHL5.3 elfutils: [ananth@... linux-2.6.30-rc8]$ stap -vvve 'probe kernel.function("do_fork") { printf("%s\n", $clone_flags) exit() }' SystemTap translator/driver (version 0.9.7/0.137 non-git sources) Copyright (C) 2005-2009 Red Hat, Inc. and others This is free software; see the source for copying conditions. Session arch: ppc64 release: 2.6.30-rc8 Created temporary directory "/tmp/stapkxanHQ" Searched '/usr/local/share/systemtap/tapset/ppc64/*.stp', found 3 Searched '/usr/local/share/systemtap/tapset/*.stp', found 52 Pass 1: parsed user script and 55 library script(s) in 640usr/10sys/724real ms. blacklist regexps: blfn: ^(atomic_notifier_call_chain|default_do_nmi|__die|die_nmi|do_debug|do_general_protection|do_int3|do_IRQ|do_page_fault|do_sparc64_fault|do_trap|dummy_nmi_callback|flush_icache_range|ia64_bad_break|ia64_do_page_fault|ia64_fault|io_check_error|mem_parity_error|nmi_watchdog_tick|notifier_call_chain|oops_begin|oops_end|program_check_exception|single_step_exception|sync_regs|unhandled_fault|unknown_nmi_error|.*raw_.*lock.*|.*read_.*lock.*|.*write_.*lock.*|.*spin_.*lock.*|.*rwlock_.*lock.*|.*rwsem_.*lock.*|.*mutex_.*lock.*|raw_.*|.*seq_.*lock.*|atomic_.*|atomic64_.*|get_bh|put_bh|.*apic.*|.*APIC.*|.*softirq.*|.*IRQ.*|.*_intr.*|__delay|.*kernel_text.*|get_current|current_.*|.*exception_tables.*|.*setup_rt_frame.*|.*preempt_count.*|preempt_schedule)$ blfn_ret: ^(do_exit|sys_exit|sys_exit_group)$ blfile: ^(kernel/kprobes.c|arch/.*/kernel/kprobes.c|include/asm/io.h|include/asm/bitops.h|arch/.*/include/asm/io.h|arch/.*/include/asm/bitops.h|drivers/ide/ide-iops.c)$ parsed 'do_fork' -> func 'do_fork' focused on module 'kernel = [0xc000000000000000-0xc00000000093cfb0, bias 0x0] file /boot/vmlinux-2.6.30-rc8 ELF machine |ppc64 (code 21) focused on module 'kernel' selected function do_fork probe do_fork@kernel/fork.c:1347 kernel reloc=.dynamic section=.text pc=0xc000000000082ad0 finding location for local 'clone_flags' near address 0xc000000000082ad0, module bias 0x0 Eliding unused function error Eliding unused function log Eliding unused function stp_print_binary Eliding unused function warn Resolution problem with probe probe_1839 { printf("%s\\n", $clone_flags) exit() } semantic error: not accessible at this address: identifier '$clone_flags' at <input>:1:51 source: probe kernel.function("do_fork") { printf("%s\n", $clone_flags) exit() } ^ Pass 2: analyzed script: 1 probe(s), 1 function(s), 0 embed(s), 0 global(s) in 550usr/990sys/1794real ms. Pass 2: analysis failed. Try again with another '--vp 01' option. Running rm -rf /tmp/stapkxanHQ [ananth@... linux-2.6.30-rc8]$ grep do_fork /proc/kallsyms c000000000082ad0 T .do_fork
Further: binutils-2.17.50.0.6-9.el5 gcc-4.1.2-44.el5
You will need to examine the generated vmlinux file a little deeper, such as "eu-readelf -w vmlinux", searching for the functions, variables, and especially location lists in question. You could also try siccing gdb onto that file, asking it to print locals ("info scope FUNCTION"). Also you can run stap with "--vp 03" or so to make sure it finds the particular vmlinux file you expect it to.
[ananth@... ~]$ gdb latest/linux-2.6.30-rc8/vmlinux ... (gdb) info scope do_fork Scope for do_fork: Symbol clone_flags is a variable with multiple locations, length 8. Symbol stack_start is a variable with multiple locations, length 8. Symbol regs is a variable with multiple locations, length 8. Symbol stack_size is a variable with multiple locations, length 8. Symbol parent_tidptr is a variable with multiple locations, length 8. Symbol child_tidptr is a variable with multiple locations, length 8. Symbol p is a variable with multiple locations, length 8. Symbol trace is a variable with multiple locations, length 4. Symbol nr is a variable with multiple locations, length 8. Symbol count is in static storage at address 0xc00000000080a2d8, length 4. Also using $$parms, stap is able to decode parameter names: [ananth@... ~]$ stap -ve 'probe kernel.function("do_fork") { printf("%s\n", $$parms) exit() }' Pass 1: parsed user script and 55 library script(s) in 640usr/10sys/716real ms. Pass 2: analyzed script: 1 probe(s), 1 function(s), 0 embed(s), 0 global(s) in 550usr/980sys/1779real ms. Pass 3: translated to C into "/tmp/staprB28ja/stap_af541a9acb56640cfdb92d9022e5fbf7_780.c" in 1370usr/990sys/2756real ms. Pass 4: compiled C into "stap_af541a9acb56640cfdb92d9022e5fbf7_780.ko" in 6790usr/1000sys/12479real ms. Pass 5: starting run. clone_flags=? stack_start=? regs=? stack_size=? parent_tidptr=? child_tidptr=? Pass 5: run completed in 10usr/50sys/8486real ms.
Without specific readelf data that shows the location lists of a sample variable, plus symbol table information to confirm stap's placement of the probe, there is not enough information. You could try posting your vmlinux file somewhere if you wish to delegate this diagnosis further.
Sure Frank. http://www.4shared.com/file/111192529/8827deeb/vmlinux-2630-rc8.html is the bz2 of the vmlinux in question.
Blame GCC. The location list's first defined range is at .do_fork+0x48, but the PC of interest is .do_fork+0 (the entry point). Relevant snippets: [643eda] subprogram sibling [6446e5] external name "do_fork" decl_file 1 decl_line 1347 prototyped type [630f4a] low_pc 0xc000000000082ad0 <.do_fork> high_pc 0xc000000000082f14 <.get_task_mm> frame_base location list [ c71f5] [643f00] formal_parameter name "clone_flags" decl_file 1 decl_line 1341 type [630ecd] location location list [ c7242] [ c7242] 0xc000000000082b18 <.do_fork+0x48>..0xc000000000082b38 <.do_fork+0x68> [ 0] reg3 0xc000000000082b38 <.do_fork+0x68>..0xc000000000082f14 <.get_task_mm> [ 0] reg26
"Blame GCC" is right. I think the systemtap problem is just yet another symptom of the generally worsening state of gcc debug info over the 4.x series. Out of curiousity I tried compiling with 4.3.4 and 4.5.0 20090521 compilers I had lying around, and they both exhibited similar problems to the 4.1.2 compiler reportedly used.
(In reply to comment #11) > "Blame GCC" is right. I think the systemtap problem is just yet another symptom > of the generally worsening state of gcc debug info over the 4.x series. > > Out of curiousity I tried compiling with 4.3.4 and 4.5.0 20090521 compilers I > had lying around, and they both exhibited similar problems to the 4.1.2 compiler > reportedly used. With gcc-VTA now nearly ready, this could be a good demo of its worth.
So, do we designate this a 'feature' and document it some place?
Some more info: The gcc problem with debug location lists for function parameters not starting at the beginning of the function is triggered by -mno-sched-epilog. -mno-sched-epilog stops function prologue scheduling as well as epilogue, and does so by emitting the function prologue/epilogue very late in the compilation process. The prologue insns so emitted are located before the "start of function" as seen by the rest of the compiler. This confuses variable tracking. I think this bug against systemtap ought to be closed and another opened against gcc.
I've opened http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40473
(In reply to comment #14) > Some more info: The gcc problem with debug location lists for function > parameters not starting at the beginning of the function is triggered by > -mno-sched-epilog. -mno-sched-epilog stops function prologue scheduling as well. In this case, it may be worth trying to use "stap -P", to activate the heuristic prologue-end searching code for probe placement.
Unfortunately, running stap in prologue-searching mode still doesn't locate these variables on ppc.
Last I checked, Alan Modra was working on http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40473 for a possible fix.
Fix applied to mainline, 4.4 and 4.3 gcc branches. Please verify and close this bug.
Verified on Rawhide - Alan's patch fixes the problem. [ananth@p550lp3 ~]$ gcc --version gcc (GCC) 4.4.1 20090925 (Red Hat 4.4.1-17) [root@p550lp3 ~]# stap -ve 'probe kernel.function("do_fork") { printf("%s\n", $$parms) exit() }' Pass 1: parsed user script and 59 library script(s) in 390usr/50sys/1096real ms. Pass 2: analyzed script: 1 probe(s), 7 function(s), 0 embed(s), 0 global(s) in 1040usr/340sys/3468real ms. Pass 3: using cached /root/.systemtap/cache/d1/stap_d1515867ee23dbef085a1d33aacdd040_2401.c Pass 4: using cached /root/.systemtap/cache/d1/stap_d1515867ee23dbef085a1d33aacdd040_2401.ko Pass 5: starting run. clone_flags=0x1200011 stack_start=0xff89d130 regs=0xc000000034af3ea0 stack_size=0x0 parent_tidptr=0x0 child_tidptr=0xf7cff068 Pass 5: run completed in 0usr/80sys/13318real ms. Thanks Alan!