This is a continuation of bug #10289 comment #1:

> I don't yet have a test because this fix ends up failing with an unrelated
> problem: kernel read fault at 0xfffffffffffffe15 (addr) near identifier '$arg1'

This might or might not be related to bug #10305 where I see a failure to resolve the arguments of a mark probe (after fixing the bias offsets that made the finding of probes fail). Stan, with which testcase did you see the above?
Here is an example from exelib.exp:

sourcing: /home/mark/src/systemtap/testsuite/systemtap.exelib/mark.tcl for uprobesgcc-O0-m32-debug-uprobeslibgcc-O0-m32-debug_uprobeslibgcc-O0-m32-debug
executing: stap /home/mark/src/systemtap/testsuite/systemtap.exelib/mark.stp ./uprobesgcc-O0-m32-debug-uprobeslibgcc-O0-m32-debug_exe ./libuprobeslibgcc-O0-m32-debug.so -c ./uprobesgcc-O0-m32-debug-uprobeslibgcc-O0-m32-debug_exe
FAIL: mark-uprobesgcc-O0-m32-debug-uprobeslibgcc-O0-m32-debug_uprobeslibgcc-O0-m32-debug line 1: expected "main_count: 3"
Got "ERROR: kernel read fault at 0xfffffffffffffff4 (addr) near identifier '$arg1' at /home/mark/src/systemtap/testsuite/systemtap.exelib/mark.stp:5:30"
Relevant verbose output from:

$ stap -vvv /home/mark/src/systemtap/testsuite/systemtap.exelib/mark.stp ./uprobesgcc-O0-m32-debug-uprobeslibgcc-O0-m32-debug_exe ./libuprobeslibgcc-O0-m32-debug.so -c ./uprobesgcc-O0-m32-debug-uprobeslibgcc-O0-m32-debug_exe

focused on module '/home/mark/src/systemtap/testsuite/uprobesgcc-O0-m32-debug-uprobeslibgcc-O0-m32-debug_exe'
selected function main_func
probe main_func@/home/mark/src/systemtap/testsuite/systemtap.exelib/uprobes_exe.c:22 process=/home/mark/src/systemtap/testsuite/uprobesgcc-O0-m32-debug-uprobeslibgcc-O0-m32-debug_exe reloc=.absolute section=.text pc=0x80484c3
finding location for local 'arg1' near address 0x80484c3, module bias 0x0
focused on module '/home/mark/src/systemtap/testsuite/libuprobeslibgcc-O0-m32-debug.so = [0x10000-0x116e4, bias 0x0] file /home/mark/src/systemtap/testsuite/libuprobeslibgcc-O0-m32-debug.so ELF machine i?86|x86_64 (code 3)
focused on module '/home/mark/src/systemtap/testsuite/libuprobeslibgcc-O0-m32-debug.so'
selected function lib_func
probe lib_func@/home/mark/src/systemtap/testsuite/systemtap.exelib/uprobes_lib.c:19 process=/home/mark/src/systemtap/testsuite/libuprobeslibgcc-O0-m32-debug.so reloc=.dynamic pc=0x467
finding location for local 'arg1' near address 0x467, module bias 0x10000

loc2c-test doesn't want to play along though...

$ ../loc2c-test -e ./uprobesgcc-O0-m32-debug-uprobeslibgcc-O0-m32-debug_exe 0x80484c3 arg1
../loc2c-test: fetch supported only for base type or pointer
(In reply to comment #2)
> loc2c-test doesn't want to play along though...
> $ ../loc2c-test -e ./uprobesgcc-O0-m32-debug-uprobeslibgcc-O0-m32-debug_exe
> 0x80484c3 arg1
> ../loc2c-test: fetch supported only for base type or pointer

loc2c-test needed to resolve through const and volatile types. I pushed a patch for that. Now it does play along:

$ ../loc2c-test -e ./uprobesgcc-O0-m32-debug-uprobeslibgcc-O0-m32-debug_exe 0x80484c3 arg1
#define PROBEADDR 0x80484c3ULL
static void print_value(struct pt_regs *regs)
{
  intptr_t value;
  {
    intptr_t addr;
    intptr_t frame_base;
    { // DWARF expression: 0x75(8)
      {
        intptr_t s0;
        s0 = fetch_register (5) + 8L;
        frame_base = s0;
      }
    }
    { // DWARF expression: 0x91(-20)
      {
        intptr_t s0;
        s0 = frame_base + -20L;
        addr = s0;
      }
    }
    { int32_t value = deref (4, addr);value = value; }
  }
  printk (" ---> %ld\n", (unsigned long) value);
  return;

deref_fault:
  printk (" => BAD ACCESS\n");
}
I suspect we do something wrong with the "fetch_register (5)". gdb seems able to resolve the arg1 variable fine:

Breakpoint 1, 0x080484c3 in main_func (foo=3) at /home/mark/src/systemtap/testsuite/systemtap.exelib/uprobes_exe.c:25
25 STAP_PROBE1(test, main_count, foo);
(gdb) print arg1
$1 = 3
(gdb) print &arg1
$2 = (volatile int *) 0xffffd2ec
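As a quick sanity check on that suspicion, here is the arithmetic from the generated code written out as plain C. It is only a sketch: arg1_addr is a made-up helper name, and the register value 0xffffd2f8 is not taken from any log but back-computed from gdb's &arg1, assuming the same frame layout. The point is that a register value of 0 lands exactly on the faulting address from the exelib failure, while a plausible user-mode value lands on the address gdb reports. That is consistent with fetch_register (5) returning the wrong thing, but it does not prove it.

```c
#include <stdint.h>

/* Mirrors the two DWARF steps in the loc2c-test output above:
   frame_base = reg5 + 8 (0x75(8)), addr = frame_base - 20 (0x91(-20)).
   arg1_addr is a hypothetical helper, not part of loc2c or the runtime. */
static int64_t arg1_addr(int64_t reg5)
{
    int64_t frame_base = reg5 + 8;   /* s0 = fetch_register (5) + 8L */
    return frame_base + -20;         /* s0 = frame_base + -20L */
}
```

With reg5 == 0 the result is -12, which as a 64-bit unsigned value is 0xfffffffffffffff4, the address in the "kernel read fault" message. With reg5 == 0xffffd2f8 the result is 0xffffd2ec, matching gdb.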
The kernel-mode definitions of fetch_register() et al (not to mention deref!) are wholly inappropriate for dealing with user-mode register states, especially 32-bit ones on 64-bit kernels. You need an entirely different regime of runtime calls (should use user_regset calls) for dealing with user-mode registers.
Created attachment 4023
parameterize loc2c with a callback to emit what now are deref and store_deref macro uses

We discussed this a bit on irc (transcribed in this comment - mostly roland talking).

The attached patch by roland is a sketch for the start of the first bit: parameterize loc2c with a callback to emit what now are deref and store_deref macro uses. You could also nix the used_deref tracking in loc2c and just make stap's emit_deref callback set its tracking flag.

Notice how in the patch emit_deref has the "size" value at translation time (expr[i].number is the size, where expr[i] is the DW_OP_deref or whatnot that we are translating), so the callback can get an int rather than just a string of the size number as we emit now. The point being that the new callbacks should take that int, rather than it being hidden as a literal string of C or whatnot.

deref/store_deref are the easy ones. For a proper interface in loc2c, it should have a callback to emit. But in fact, the stap callback will just differ in the name of the macro it emits for kernel vs user.

Next, parameterize where it emits fetch_register and store_register macros so it uses a callback to emit that takes the register number as an int arg. Hopefully this patch illustrates how to tease apart the loc2c impl macros like push(), where you need to split up what was a simple push("x = fetch_register (%d)", blah); line to have a callback in the middle.

After the parameterization, deref et al are simple: we just want a different runtime macro that does checking like the vanilla get_user()/put_user() macros do. For fetch/store_register, what you want at runtime is calls to user_regset functions, which take a byte offset and length for the regset layout. What you have at translation time is a DWARF register number, so ideally you want to translate that to a regset+offset at translation time and emit code that calls the runtime for user_regset fetches from an emitted literal-number offset.
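To make the deref part of this concrete, here is a rough sketch of what the callback split might look like. It is not taken from the attached patch: struct emit_ctx, emit_deref_fn, stap_emit_deref, translate_deref and the "user_deref" macro name are all invented for illustration. Only "deref" and the used_deref flag come from the discussion. The callback receives the size as an int, picks the macro name for kernel vs user, and sets the tracking flag itself.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical per-translation context owned by the caller (stap). */
struct emit_ctx {
    const char *deref_macro; /* "deref" for kernel; a user name is made up */
    int used_deref;          /* tracking flag, set by the callback itself */
};

/* Hypothetical callback type: size arrives as an int, not as C text. */
typedef int (*emit_deref_fn)(void *arg, char *out, size_t outlen,
                             int size, const char *addr_expr);

/* The stap-side callback: same shape for kernel and user, only the
   emitted macro name differs. */
static int stap_emit_deref(void *arg, char *out, size_t outlen,
                           int size, const char *addr_expr)
{
    struct emit_ctx *ctx = arg;
    ctx->used_deref = 1;
    return snprintf(out, outlen, "%s (%d, %s)",
                    ctx->deref_macro, size, addr_expr);
}

/* Stand-in for the loc2c side: what used to be one formatted push is
   split into prefix, callback, suffix. Returns -1 if out is too small. */
static int translate_deref(emit_deref_fn emit, void *arg,
                           char *out, size_t outlen, int size)
{
    int n = snprintf(out, outlen, "s0 = ");
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    int m = emit(arg, out + n, outlen - (size_t)n, size, "addr");
    if (m < 0 || (size_t)(n + m) >= outlen)
        return -1;
    n += m;
    int k = snprintf(out + n, outlen - (size_t)n, ";");
    if (k < 0 || (size_t)(n + k) >= outlen)
        return -1;
    return n + k;
}
```

Calling translate_deref with a context whose deref_macro is "deref" and size 4 produces the text "s0 = deref (4, addr);". Swapping the context for one with a user-mode macro name changes only the emitted name, which is the whole kernel vs user difference on the stap side.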
libebl has the DWARF reg # -> regset layout mapping, but it is not exported prettily now. It is a fixed ABI, so for now it is probably easiest to put some hard-coded tables in stap for some arches (later elfutils will make this translation easy for stap; for now the backends/*_corenote.c files have the tables you could translate to hard-code something). The user case callback for emit-register would map reg# to regset+offset, then emit a "fetch_from_regset(regset, offset)" runtime call or suchlike.
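A hard-coded table of that kind might look roughly like the following for 32-bit x86. This is a sketch under stated assumptions, not the elfutils tables: the struct and function names are invented, regset index 0 is assumed to mean the general-purpose regset, and the byte offsets assume the i386 user_regs_struct ordering (ebx, ecx, edx, esi, edi, ebp, eax, ..., esp). They should be checked against backends/i386_corenote.c before anyone relies on them. fetch_from_regset is the placeholder name from the comment above.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical translation-time record: which regset, and the byte
   offset of the register inside it. */
struct regset_loc {
    int regset;
    int offset;
};

/* i386 DWARF register numbers 0..7. Offsets are an assumption based on
   the i386 user_regs_struct layout with 4-byte slots. */
static const struct regset_loc i386_dwarf_to_regset[] = {
    { 0, 24 }, /* 0: eax */
    { 0,  4 }, /* 1: ecx */
    { 0,  8 }, /* 2: edx */
    { 0,  0 }, /* 3: ebx */
    { 0, 60 }, /* 4: esp */
    { 0, 20 }, /* 5: ebp */
    { 0, 12 }, /* 6: esi */
    { 0, 16 }, /* 7: edi */
};

/* User-case emit-register callback: map the DWARF reg# at translation
   time and emit a runtime call with literal numbers. Returns -1 for a
   register the table does not cover. */
static int emit_user_fetch_register(char *out, size_t outlen, int dwarf_regno)
{
    size_t count = sizeof i386_dwarf_to_regset / sizeof i386_dwarf_to_regset[0];
    if (dwarf_regno < 0 || (size_t)dwarf_regno >= count)
        return -1;
    const struct regset_loc *loc = &i386_dwarf_to_regset[dwarf_regno];
    return snprintf(out, outlen, "fetch_from_regset (%d, %d)",
                    loc->regset, loc->offset);
}
```

For the frame-base register in the failing test (DWARF register 5), this emits "fetch_from_regset (0, 20)" in place of "fetch_register (5)". Registers outside the table are rejected at translation time rather than producing a bogus offset.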
Not sure whether this is another instance of this bug. It's on FC11, latest git source.

$stap -vve 'probe kernel.function("sys_read"){print($fd)}'
SystemTap translator/driver (version 0.9.8/0.137 commit release-0.9.8-146-gec6fdef)
Copyright (C) 2005-2009 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
Session arch: i686 release: 2.6.29.4-167.fc11.i686.PAE
Created temporary directory "/tmp/stapZvtGh0"
Searched '/usr/local/share/systemtap/tapset/i686/*.stp', found 3
Searched '/usr/local/share/systemtap/tapset/*.stp', found 51
Pass 1: parsed user script and 54 library script(s) in 90usr/40sys/317real ms.
probe sys_read@fs/read_write.c:372 kernel reloc=.dynamic section=.text pc=0xc04a94c9
Pass 2: analyzed script: 1 probe(s), 1 function(s), 0 embed(s), 0 global(s) in 740usr/1220sys/9097real ms.
Pass 3: using cached /home/wjhuang/.systemtap/cache/66/stapconf_66ed81ed4078196cbba85d4ef02c9350_446.h
probe_1746 locks nothing
dump_unwindsyms kernel index=0 base=0xc0400000
Found build-id in kernel, length 20, end at 0xc071afd4
Pass 3: translated to C into "/tmp/stapZvtGh0/stap_99279a38a8b4c7286c326f231792e7bb_984.c" in 1010usr/890sys/22950real ms.
Running make -C "/lib/modules/2.6.29.4-167.fc11.i686.PAE/build" M="/tmp/stapZvtGh0" modules >/dev/null
Pass 4: compiled C into "stap_99279a38a8b4c7286c326f231792e7bb_984.ko" in 2370usr/1860sys/21529real ms.
Copying /tmp/stapZvtGh0/stapconf_66ed81ed4078196cbba85d4ef02c9350_446.h to /home/wjhuang/.systemtap/cache/66/stapconf_66ed81ed4078196cbba85d4ef02c9350_446.h
Copying /tmp/stapZvtGh0/stap_99279a38a8b4c7286c326f231792e7bb_984.ko to /home/wjhuang/.systemtap/cache/99/stap_99279a38a8b4c7286c326f231792e7bb_984.ko
Copying /tmp/stapZvtGh0/stap_99279a38a8b4c7286c326f231792e7bb_984.ko.sgn to /home/wjhuang/.systemtap/cache/99/stap_99279a38a8b4c7286c326f231792e7bb_984.ko.sgn
Copying /tmp/stapZvtGh0/stap_99279a38a8b4c7286c326f231792e7bb_984.c to /home/wjhuang/.systemtap/cache/99/stap_99279a38a8b4c7286c326f231792e7bb_984.c
Pass 5: starting run.
Running /usr/local/bin/staprun -v /tmp/stapZvtGh0/stap_99279a38a8b4c7286c326f231792e7bb_984.ko
ERROR: kernel read fault at 0x009880c0 (addr) near identifier '$fd' at <input>:1:41
WARNING: Number of errors: 1, skipped probes: 1
stapio:cleanup_and_exit:371 detach=0
stapio:cleanup_and_exit:388 closing control channel
Pass 5: run completed in 0usr/70sys/431real ms.
Running rm -rf /tmp/stapZvtGh0
(In reply to comment #7)
> Not sure whether this is another instance of this bug. It's on FC11,
> latest git source.
>
> $stap -vve 'probe kernel.function("sys_read"){print($fd)}'
> SystemTap translator/driver (version 0.9.8/0.137 commit release-0.9.8-146-gec6fdef)
> Copyright (C) 2005-2009 Red Hat, Inc. and others
> This is free software; see the source for copying conditions.
> Session arch: i686 release: 2.6.29.4-167.fc11.i686.PAE
> [...]
> Pass 5: starting run.
> Running /usr/local/bin/staprun -v
> /tmp/stapZvtGh0/stap_99279a38a8b4c7286c326f231792e7bb_984.ko
> ERROR: kernel read fault at 0x009880c0 (addr) near identifier '$fd' at <input>:1:41
> WARNING: Number of errors: 1, skipped probes: 1

This is most likely related to bug #10408
PR10601 was the root cause. *** This bug has been marked as a duplicate of bug 10601 ***