As reported in: <http://sourceware.org/ml/systemtap/2013-q2/msg00249.html>

On a hand-built systemtap, the at_var_mark.exp test works. However, with an rpm build, using '@var("morehelp@session.cxx")' fails, even with systemtap-debuginfo installed.

Here's an even smaller test:

====
/usr/bin/stap -ve 'probe process.mark("pass*") { printf("%p\n", @var("morehelp@session.cxx")) }' -c '/usr/bin/stap --help'
Pass 1: parsed user script and 103 library script(s) using 220184virt/35776res/3056shr/33168data kb, in 240usr/90sys/331real ms.
semantic error: target-symbol requires debuginfo: operator '@var' at <input>:1:46
        source: probe process.mark("pass*") { printf("%p\n", @var("morehelp@session.cxx")) }
                                                             ^
Pass 2: analyzed script: 15 probe(s), 13 function(s), 0 embed(s), 0 global(s) using 247928virt/41720res/5136shr/36880data kb, in 40usr/10sys/58real ms.
Pass 2: analysis failed. [man error::pass2]
====

Also note that gdb can find the symbol:

====
# gdb /usr/bin/stap
GNU gdb (GDB) Fedora (7.6-32.fc20)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/stap...Reading symbols from /usr/lib/debug/usr/bin/stap.debug...done.
done.
(gdb) p &systemtap_session::morehelp
$1 = (const char **) 0x466028 <systemtap_session::morehelp>
====

This might be related to the flags that the rpm build process passes to the compiler/linker, like:

-m64 -mtune=generic -march=x86-64 -g -O2 -fPIE -fexceptions -fstack-protector -fstack-protector-all --param ssp-buffer-size=4
Here's the information I see after running "eu-readelf -N --debug-dump=info /usr/lib/debug/usr/bin/stap.debug":

====
...
 [ a381f]  member
           name            (strp) "morehelp"
           decl_file       (data1) 2
           decl_line       (data1) 125
           type            (GNU_ref_alt) [ 7c]
           external        (flag_present) Yes
           declaration     (flag_present) Yes
...
 [ d8509]  variable
           specification   (ref_udata) [ a381f]
           decl_file       (data1) 10
           decl_line       (data2) 2012
           linkage_name    (strp) "_ZN17systemtap_session8morehelpE"
           location        (exprloc)
            [ 0] addr +0x466028
...
====
/usr/bin/stap -ve 'probe process.mark("pass*") { printf("%p\n", @var("morehelp@session.cxx")) }' -c '/usr/bin/stap --help'

Seems to work on RHEL6 with systemtap-1.8-7.el6.x86_64 (although at runtime the @var seems to resolve to NULL).

Gives "semantic error: target-symbol requires debuginfo: operator '@var'" on Fedora 19 pre-release with systemtap-2.2.1-1.fc19.x86_64.

It also fails with the git version of the translator/driver (version 2.3/0.155, commit release-2.2.1-175-gb5ca36b), but only when the target is the system-installed /usr/bin/stap, not when the git build is run on itself. (On itself it also works at runtime.)
In my case morehelp isn't a global variable, and it hasn't been since commit e2012a. So I guess the error message is actually correct: in the cases that fail for me, the stap binary really doesn't have a "morehelp" global variable. That doesn't seem to match David's observation in comment #1.
Actually that doesn't explain it. Now I am somewhat confused why a git version of systemtap works on itself. That also shouldn't work. @var("morehelp@session.cxx") shouldn't resolve in that case either. Why does it?
So looks like we also match on the "short name" of a C++ class variable, messy, but convenient in this case. It also works (on itself) with --enable-pie. So maybe what is happening is that the @var() (type) lookup gets confused by the dwz split in the installed package.
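The "short name" match is consistent with the eu-readelf dump earlier in this thread: the variable DIE at [d8509] has no name of its own, only a specification pointing at the member DIE [a381f], whose name is "morehelp". A lookup that follows DW_AT_specification when reading the name (as libdw's dwarf_attr_integrate does) therefore sees the unqualified member name. Below is a minimal sketch of that idea. The Die struct and both function names are hypothetical, and whether systemtap's @var lookup takes exactly this path is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified model of a DWARF DIE: just the two attributes
// that matter for this illustration.
struct Die {
    std::optional<std::string> name;      // DW_AT_name, if present on this DIE
    const Die* specification = nullptr;   // DW_AT_specification target, if any
};

// Return the name of a DIE, following the specification chain when the DIE
// itself has no DW_AT_name (the same idea as dwarf_attr_integrate).
std::optional<std::string> integrated_name(const Die& die) {
    for (const Die* d = &die; d != nullptr; d = d->specification) {
        if (d->name) return d->name;
    }
    return std::nullopt;
}

// True if the DIE's integrated name equals the requested variable name.
bool matches_var(const Die& die, const std::string& wanted) {
    std::optional<std::string> n = integrated_name(die);
    return n.has_value() && *n == wanted;
}
```

With a member DIE named "morehelp" and a nameless variable DIE whose specification points at it, matches_var(variable, "morehelp") is true, while the qualified "systemtap_session::morehelp" does not match.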
(In reply to Mark Wielaard from comment #5)
> So maybe what is happening is
> that the @var() (type) lookup gets confused by the dwz split in the
> installed package.

That seems to be it, because I can trigger the same failure by running dwz -m on the local build. I am at a loss as to where the lookup failure originates in the source code, though.
FWIW, agentzh's patch for PR11096 reworks @var a fair bit, and the example here works just fine. But I've also determined that $var can show the issue, and this is also only a problem in the mark("pass5*"), not mark("pass[12346]*").

$ ./run-stap -e 'probe process.mark("pass4*") { println($$name, $s->verbose) }' -c /usr/bin/stap -p4
/home/jistone/.systemtap/cache/e0/stap_e0bc68b472d4982509234edc9c5ee57e_1907.ko

$ ./run-stap -e 'probe process.mark("pass5*") { println($$name, $s->verbose) }' -c /usr/bin/stap -p4
semantic error: target-symbol requires debuginfo: identifier '$s' at <input>:1:48
        source: probe process.mark("pass5*") { println($$name, $s->verbose) }
                                                               ^
Pass 2: analysis failed. [man error::pass2]
AFAICS the problem starts in query_addr(), which calls dwflpp::getscopes(addr) and gets nothing back, so it returns without creating the probe. Then sdt_query::handle_probe_entry() takes that as a sign to create a "debuginfoless" probe with a NULL scope_die, and without a scope the $var expansion isn't even attempted.
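To make that chain of events concrete, here is a minimal, self-contained model of the fallback. It is not systemtap code: ScopeDie, ProbeResult, make_mark_probe, resolve_target_var and the two lookup functions are all hypothetical names that only mirror the roles of getscopes(), the "debuginfoless" probe, and the $var expansion described above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a scope DIE returned by a getscopes-style lookup.
struct ScopeDie { std::string name; };

// Hypothetical probe record: an empty scope plays the role of a NULL scope_die.
struct ProbeResult {
    bool has_debuginfo;
    std::optional<ScopeDie> scope;
};

// Models getscopes(addr): the scopes enclosing addr, innermost first.
using ScopeLookup = std::vector<ScopeDie> (*)(std::uint64_t addr);

// A lookup that behaves correctly, and one that returns nothing
// (the symptom seen on dwz-processed debuginfo).
std::vector<ScopeDie> healthy_lookup(std::uint64_t) { return {ScopeDie{"passes"}}; }
std::vector<ScopeDie> empty_lookup(std::uint64_t) { return {}; }

// If no scope is found, fall back to a probe without debuginfo.
ProbeResult make_mark_probe(std::uint64_t addr, ScopeLookup getscopes) {
    std::vector<ScopeDie> scopes = getscopes(addr);
    if (scopes.empty()) return ProbeResult{false, std::nullopt};
    return ProbeResult{true, scopes.front()};
}

// $var expansion is only attempted when the probe carries a scope.
std::string resolve_target_var(const ProbeResult& probe, const std::string& var) {
    if (!probe.scope) {
        return "semantic error: target-symbol requires debuginfo: identifier '$" + var + "'";
    }
    return "resolved $" + var + " in scope " + probe.scope->name;
}
```

The point of the sketch is that the error message is produced far from the real failure: the empty scope list silently downgrades the probe, and the complaint only surfaces later when a target variable is requested.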
Also, while dwflpp does a lot of caching, dwflpp::getscopes(Dwarf_Addr pc) is a straight call to dwarf_getscopes(cu, pc, &return_ptr). That cu came from query_addr's earlier dwflpp::query_cu_containing_address(addr), which just goes to dwfl_module_addrdie(). There is a difference that dwarf_getscopes' pc has the module bias removed, but we've already asserted this bias matches that returned by dwfl_module_addrdie(). (And besides, the other pass* marks don't have any bias issue.)
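For readers less familiar with the bias: libdwfl works with addresses as the module is loaded, while libdw functions such as dwarf_getscopes expect addresses as they appear in the DWARF data, so the bias reported by dwfl_module_addrdie() has to be subtracted first. A minimal sketch of that arithmetic follows. The CuRange struct, both helper names, and the example addresses are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical address range of a compilation unit, in DWARF (unbiased) terms.
struct CuRange {
    std::uint64_t low_pc;
    std::uint64_t high_pc;  // exclusive upper bound
};

// Convert a module (biased) address into the pc value a libdw-level lookup expects.
std::uint64_t to_dwarf_pc(std::uint64_t module_addr, std::uint64_t bias) {
    return module_addr - bias;
}

// True if the unbiased pc falls inside the compilation unit's range.
bool cu_contains(const CuRange& cu, std::uint64_t dwarf_pc) {
    return dwarf_pc >= cu.low_pc && dwarf_pc < cu.high_pc;
}
```

For a non-PIE executable the bias is typically zero, so both address forms coincide. This is in line with the observation above that the bias is not what distinguishes pass5 from the other marks.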
This was caused by two bugs in elfutils' dwarf_getscopes. Patch posted: https://lists.fedorahosted.org/pipermail/elfutils-devel/2013-June/003110.html
I'm sure Mark did test it, but I can confirm that his second elfutils patch does work for me. (I never got around to testing the first patch.) Side note: our --with-elfutils build doesn't --enable-dwz for elfutils; should we start doing that by default? I think configure will just ignore the option in pre-0.155 builds which didn't have dwz support. But it confused me for a bit why almost nothing was working with my own elfutils build...
(In reply to Josh Stone from comment #11)
> I'm sure Mark did test it, but I can confirm that his second elfutils patch
> does work for me. (I never got around to testing the first patch.)
>
> Side note: our --with-elfutils build doesn't --enable-dwz for elfutils;
> should we start doing that by default? I think configure will just ignore
> the option in pre-0.155 builds which didn't have dwz support. But it
> confused me for a bit why almost nothing was working with my own elfutils
> build...

Thanks for the extra testing. And yes we should.

commit 90491495d6e1d94b12b70f1a92f5bec68857a729
Author: Mark Wielaard <mjw@redhat.com>
Date: Thu Jun 27 05:48:15 2013 -0400

    When configuring --with-elfutils use elfutils configure --enable-dwz.

    Just add the configure option unconditionally. It will be ignored with
    older elfutils releases with an harmless warning.
    configure: WARNING: unrecognized options: --enable-dwz
The problem does not reproduce any more.