Summary: | inode-uprobes should compute proper SET_REG_IP before probe invocation | ||
---|---|---|---|
Product: | systemtap | Reporter: | Negreanu Adrian <adrian.m.negreanu> |
Component: | runtime | Assignee: | Unassigned <systemtap> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | dsmith, fche, jistone, mark |
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Attachments: |
readelf -a on the debug file
readelf -a on the target executable output from stap and staprun output from stap and staprun when UNW_PC_FROM_RA=1 readelf --debug-dump=frames target_executable gdb output when the same breakpoint is added dataAlign patch (bad) readelf --debug-dump=frames gcc-4.6-target_executable patch that sets IP before calling probe handler |
Created attachment 6373 [details]
readelf -a on the target executable
Created attachment 6374 [details]
output from stap and staprun
Created attachment 6375 [details]
output from stap and staprun when UNW_PC_FROM_RA=1
Created attachment 6376 [details]
readelf --debug-dump=frames target_executable
Created attachment 6377 [details]
gdb output when the same breakpoint is added
The patch below made it work for me: diff --git a/runtime/unwind.c b/runtime/unwind.c index e440177..3ac7f8d 100644 --- a/runtime/unwind.c +++ b/runtime/unwind.c @@ -610,7 +610,7 @@ static int processCFI(const u8 *start, const u8 *end, unsigned long targetLoc, break; } dbug_unwind(1, "targetLoc=%lx state->loc=%lx\n", targetLoc, state->loc); - if (ptr.p8 > end) + if (ptr.p8 >= end) result = 0; if (result && targetLoc != 0 && targetLoc < state->loc) return 1; (In reply to comment #6) > The patch below made it work for me: > > > diff --git a/runtime/unwind.c b/runtime/unwind.c > index e440177..3ac7f8d 100644 > --- a/runtime/unwind.c > +++ b/runtime/unwind.c > @@ -610,7 +610,7 @@ static int processCFI(const u8 *start, const u8 *end, > unsigned long targetLoc, > break; > } > dbug_unwind(1, "targetLoc=%lx state->loc=%lx\n", targetLoc, > state->loc); > - if (ptr.p8 > end) > + if (ptr.p8 >= end) > result = 0; > if (result && targetLoc != 0 && targetLoc < state->loc) > return 1; I am not sure that is correct. In principle ptr being equal to end should be fine (see also the last return condition in ProcessCFI). It means we read everything, the last instruction and arguments (and nothing more). If you hit this case then processCFI () will fail, which will make unwind_frame () also fail. So I am somewhat surprised this seems to work for you. Do you have debug output for before/after this patch? 00000040 00000014 00000000 CIE Version: 1 Augmentation: "zR" Code alignment factor: 1 Data alignment factor: -4 Return address column: 8 Augmentation data: 1b DW_CFA_def_cfa: r4 (esp) ofs 4 DW_CFA_offset: r8 (eip) at cfa-4 DW_CFA_nop DW_CFA_nop (gdb) disassemble 0x8074810 Dump of assembler code for function chvt_reinit: 0x08074810 <+0>: push %ebp 0x08074811 <+1>: mov %esp,%ebp 0x08074813 <+3>: lea -0x8(%esp),%esp 0x08074817 <+7>: mov %ebx,(%esp) 0x0807481a <+10>: mov %esi,0x4(%esp) 0x0807481e <+14>: call 0x8049fc5 <__x86.get_pc_thunk.bx> 0x08074823 <+19>: add $0x1a559,%ebx 0x08074829 <+25>: call 0x8049ac0 <getpid@plt> 0x0807482e <+30>: mov 0x1e0(%ebx),%edx 0x08074834 <+36>: mov 0x14c(%ebx),%esi 0x0807483a <+42>: mov %eax,0x14(%edx) 0x0807483d <+45>: mov %eax,0x4(%esi) 0x08074840 <+48>: call 0x8049a70 <geteuid@plt> 0x08074845 <+53>: mov %eax,0xc(%esi) 0x08074848 <+56>: call 0x8049bf0 <getpgrp@plt> 0x0807484d <+61>: mov %eax,0x8(%esi) 0x08074850 <+64>: call 0x8049d80 <getppid@plt> 0x08074855 <+69>: mov %eax,0x10(%esi) 0x08074858 <+72>: mov (%esp),%ebx 0x0807485b <+75>: mov 0x4(%esp),%esi 0x0807485f <+79>: mov %ebp,%esp 0x08074861 <+81>: pop %ebp 0x08074862 <+82>: ret End of assembler dump. (In reply to comment #7) > (In reply to comment #6) > > The patch below made it work for me: > > > > > > diff --git a/runtime/unwind.c b/runtime/unwind.c > > index e440177..3ac7f8d 100644 > > --- a/runtime/unwind.c > > +++ b/runtime/unwind.c > > @@ -610,7 +610,7 @@ static int processCFI(const u8 *start, const u8 *end, > > unsigned long targetLoc, > > break; > > } > > dbug_unwind(1, "targetLoc=%lx state->loc=%lx\n", targetLoc, > > state->loc); > > - if (ptr.p8 > end) > > + if (ptr.p8 >= end) > > result = 0; > > if (result && targetLoc != 0 && targetLoc < state->loc) > > return 1; > > I am not sure that is correct. In principle ptr being equal to end should be > fine (see also the last return condition in ProcessCFI). It means we read > everything, the last instruction and arguments (and nothing more). > > If you hit this case then processCFI () will fail, which will make unwind_frame > () also fail. So I am somewhat surprised this seems to work for you. Do you > have debug output for before/after this patch? You're right. I used the wrong binary. Sorry for that ! The change that was included was: if (result && targetLoc != 0 && targetLoc < state->loc) - return 1; + //return 1; And the FDE as discussed on irc: <groleo> 8302 00003dd4 00000024 00003d98 FDE cie=00000040 pc=08074810..08074863 <groleo> 8303 DW_CFA_advance_loc: 1 to 08074811 <groleo> 8304 DW_CFA_def_cfa_offset: 8 <groleo> 8305 DW_CFA_advance_loc: 2 to 08074813 <groleo> 8306 DW_CFA_offset: r5 (ebp) at cfa-8 <groleo> 8307 DW_CFA_def_cfa_register: r5 (ebp) <groleo> 8308 DW_CFA_advance_loc: 22 to 08074829 <groleo> 8309 DW_CFA_offset: r6 (esi) at cfa-12 I am wondering whether we are using the data_alignment correctly. Currently we are adjusting the offset in unwind_frame. Assuming state->dataAlign always has to be used. But the DWARF spec says that some DW_CFA operations (DW_CFA_def_cfa and DW_CFA_def_cfa_offset) take a non-factored offset. So we should factor the offset in processCFI() instead only when needed and not adjust in unwind_frame. Created attachment 6385 [details]
dataAlign patch (bad)
Bad patch implementing the above idea. It makes lots of exelib.exp tests fail, so it is not right.
(In reply to comment #13) > Created attachment 6385 [details] > dataAlign patch (bad) > > Bad patch implementing the above idea. It makes lots of exelib.exp tests fail, > so it is not right. Actually, not such a bad patch after all. It was something completely unrelated that made exelib.exp fail (fixed by commit fb6cf5 Fix PR13992 cont. Do iterate_over_modules, just not over libraries.). The patch is still somewhat ugly (will need some extra checks to make sure offsets are actually read correctly before applying adjustment), but it actually seems to produce the correct results on x86_64. Could you try it out for your testcase? I did try the patch out on a i686 setup and it seems to work OK there. But I don't have a reproducer yet for the original issue. I upgraded from gcc 4.4 to gcc 4.6 and it works as expected: 0x8075411 : chvt_reinit+0x1/0x60 [/system/bin/mksh] 0x8076f3b : mksh_init+0x1b/0xa80 [/system/bin/mksh] 0x8049f16 : main+0x36/0x70 [/system/bin/mksh] 0xb7724eae sh: DONE Created attachment 6390 [details]
readelf --debug-dump=frames gcc-4.6-target_executable
It is strange that the same binary compiled with gcc-4.6 does produce a good backtrace, while one compiled with gcc-.4.4 doesn't In both cases the FDE associated with the pc probe point points to a similar CIE. GCC 4.4 binary: 00003dd4 00000024 00003d98 FDE cie=00000040 pc=08074810..08074863 DW_CFA_advance_loc: 1 to 08074811 [...] GCC 4.6 binary: 00003d78 0000001c 00003d3c FDE cie=00000040 pc=08075410..08075463 DW_CFA_advance_loc: 4 to 08075414 [...] In both cases the CIE looks like: 00000040 00000014 00000000 CIE Version: 1 Augmentation: "zR" Code alignment factor: 1 Data alignment factor: -4 Return address column: 8 Augmentation data: 1b DW_CFA_def_cfa: r4 (esp) ofs 4 DW_CFA_offset: r8 (eip) at cfa-4 DW_CFA_nop DW_CFA_nop So the result should be the same since the first operand of the FDE is an DW_CFA_advance_loc that gets you past the probe point pc address (target loc). I finally understand what is happening, though I still don't understand how it can happen. The real problem is that we enter the unwinder with PC 8074811. While the probe point actually is 8074810. This is normally caused because when a breakpoint is taken the x86 increases the PC by one before invoking the interrupt handler. But in the uprobe handler we already take care of this: // Make it look like the IP is set as it would in the actual user // task when calling real probe handler. Reset IP regs on return, so // we don't confuse uprobes. PR10458 s.op->newline() << "{"; s.op->indent(1); s.op->newline() << "unsigned long uprobes_ip = REG_IP(c->uregs);"; s.op->newline() << "SET_REG_IP(regs, inst->vaddr);"; s.op->newline() << "(*sups->probe->ph) (c);"; s.op->newline() << "SET_REG_IP(regs, uprobes_ip);"; s.op->newline(-1) << "}"; In the case of GCC-4.4 the FDE has its first DW_CFA_advance_loc 1. So we think we need to process the instructions between 8074810 and 8074811. While with GCC-4.6 the FDE has its first DW_CFA_advance_loc 4, so the one-off in the PC address doesn't matter and we don't process extra FDE instructions. The question is why doesn't the above trick of adjusting the IP to compensate for the breakpoint PC increase work? To check whether this is the real issue, you can try this patch: diff --git a/runtime/stack-dwarf.c b/runtime/stack-dwarf.c index 9c55997..fbea35c 100644 --- a/runtime/stack-dwarf.c +++ b/runtime/stack-dwarf.c @@ -69,6 +69,7 @@ static void __stp_dwarf_stack_user_print(struct pt_regs *regs, int verbose, { struct unwind_frame_info *info = &uwcontext->info; arch_unw_init_frame_info(info, regs, ! uregs_valid); + info->call_frame = 1; /* XXX Always assume PC is off by one. */ while (levels) { int ret = unwind(uwcontext, 1); (In reply to comment #19) > I finally understand what is happening, though I still don't understand how it > can happen. > > The real problem is that we enter the unwinder with PC 8074811. While the probe > point actually is 8074810. The code you quote refers to the struct uprobe->vaddr as the intended original address of the breakpoint. Perhaps that address is being munged. Have we seen yet a version of the run with -DDEBUG_UPROBES and stap -p3 output? Which uprobes/utrace version was being used? (stap-report?) By the way, if inst->vaddr is incorrect for some reason, there is always sups->address to compare. (In reply to comment #20) > The code you quote refers to the struct uprobe->vaddr as the intended original > address of the breakpoint. Perhaps that address is being munged. I guess that could be happening, but how? This is done this way to have an architecture neutral way of setting REG_IP to the actual probe address. But if it is unreliable then we might just have to make this architecture specific and have some ADJUST_REG_IP_BREAKPOINT macro that does the right thing per architecture. This might be needed anyway for the inode based uprobe handler which currently has: // XXX: Can't set SET_REG_IP; we don't actually know the relocated address. > Have we seen yet a version of the run with -DDEBUG_UPROBES and stap -p3 output? > Which uprobes/utrace version was being used? (stap-report?) No, that would be useful. BTW. I just checked in some cleanup of the DWARF unwinder code: commit 44c8a286464a6196bbf44fdeaea9ccee23fa4b8c Author: Mark Wielaard <mjw@redhat.com> Date: Mon May 7 10:20:21 2012 +0200 Cleanups and fixes for DWARF unwinder. Add explict comments and set_*_rule functions for different states. Do data alignment in processCFI, not afterwards in unwind_frame. Remove unnecessary UNW_DEFAULT_RA. Make unwind_item state union explicit about (unsigned) reg versus (signed) offset. Same for unwind_reg_state cfa. It does fix some corner cases related to calculated offset rules, but that isn't the issue here. But it would be nice if future tests could be done against a tree that has this commit in, to make the results a little easier to interpret. (In reply to comment #22) > I guess that could be happening, but how? > > This is done this way to have an architecture neutral way of setting REG_IP to > the actual probe address. But if it is unreliable then we might just have to > make this architecture specific and have some ADJUST_REG_IP_BREAKPOINT macro > that does the right thing per architecture. This might be needed anyway for the > inode based uprobe handler which currently has: > > // XXX: Can't set SET_REG_IP; we don't actually know the relocated address. This is indeed "it"! <groleo> the latest k3.0 and the uprobes patch is the one here http://sourceware.org/ml/systemtap/2011-q3/msg00157.html This inode-uprobes shortcoming is biting other users too. Perhaps we can work around it in stap land by using the mmap callbacks that check build-ids to also go and calculate run-time virtual addresses of the inode-uprobes, so they can be filled in around probe execution time. Srikar advises that the uprobes_get_bkpt_addr() function may be just the ticket to compute the real probe virtual address at run time. Created attachment 6578 [details] patch that sets IP before calling probe handler (In reply to comment #25) > Srikar advises that the uprobes_get_bkpt_addr() function may be just the > ticket to compute the real probe virtual address at run time. It looks like uprobes_get_bkpt_addr() has been renamed uprobe_get_swbp_addr(). Here's a patch, that I've done a quick test on, that seems to work. Anyone got any ideas on the best way to test this? (In reply to comment #26) > Created attachment 6578 [details] > patch that sets IP before calling probe handler > > (In reply to comment #25) > > Srikar advises that the uprobes_get_bkpt_addr() function may be just the > > ticket to compute the real probe virtual address at run time. > > It looks like uprobes_get_bkpt_addr() has been renamed uprobe_get_swbp_addr(). > > Here's a patch, that I've done a quick test on, that seems to work. Anyone got > any ideas on the best way to test this? When exelib.exp is run, I see no difference with or without the patch. ==== Running ../../src/testsuite/systemtap.exelib/exelib.exp ... FAIL: uprobeslibgcc-O3-m64-prelink-debug prelink ./libuprobeslibgcc-O3-m64-prelink-debug.so FAIL: uprobeslibgcc-O3-m64-prelink-sep-debug prelink ./libuprobeslibgcc-O3-m64-prelink-sep-debug.so FAIL: uprobeslibgcc-O3-m64-prelink-sep-debug-after prelink ./libuprobeslibgcc-O3-m64-prelink-sep-debug-after.so === systemtap Summary === # of expected passes 185 # of unexpected failures 3 ==== The 3 failures are all prelink failures, of the form: ==== prelink failed: /usr/sbin/prelink: Could not set security context for ./libuprobeslibgcc-O3-m64-prelink-sep-debug-after.so: Operation not supported ==== (In reply to comment #27) > The 3 failures are all prelink failures, of the form: > > ==== > prelink failed: /usr/sbin/prelink: Could not set security context for > ./libuprobeslibgcc-O3-m64-prelink-sep-debug-after.so: Operation not supported > ==== Is that on NFS? I believe that case has a bug (either kernel/selinux or prelink). You might want to try testsuite/systemtap.context/uprobe_uaddr.exp which should test this case more precisely. Another testcase to try might be uprobe_stmt_num.exp (In reply to comment #28) > (In reply to comment #27) > > The 3 failures are all prelink failures, of the form: > > > > ==== > > prelink failed: /usr/sbin/prelink: Could not set security context for > > ./libuprobeslibgcc-O3-m64-prelink-sep-debug-after.so: Operation not supported > > ==== > > Is that on NFS? I believe that case has a bug (either kernel/selinux or > prelink). Yes, this was on NFS. > You might want to try testsuite/systemtap.context/uprobe_uaddr.exp which should > test this case more precisely. I get the same results with and without the patch. Looking at the test, it uses uretprobes, which of course don't work with inode uprobes. ==== Running ../../src/testsuite/systemtap.context/uprobe_uaddr.exp ... FAIL: uprobe_uaddr (0) FAIL: uprobe_uaddr match line 0 main+* FAIL: uprobe_uaddr match line 1 func+* FAIL: uprobe_uaddr match line 2 func2+* FAIL: uprobe_uaddr match line 3 func+* FAIL: uprobe_uaddr match line 4 main+* === systemtap Summary === # of expected passes 1 # of unexpected failures 6 ==== (In reply to comment #29) > Another testcase to try might be uprobe_stmt_num.exp Without the patch, I get: ==== Running ../../src/testsuite/systemtap.context/uprobe_stmt_num.exp ... FAIL: uprobe_stmt_num-run-statements FAIL: uprobe_stmt_num-run-statements-nolines === systemtap Summary === # of expected passes 2 # of unexpected failures 2 # of known failures 1 ==== With the patch, I get: ==== Running ../../src/testsuite/systemtap.context/uprobe_stmt_num.exp ... === systemtap Summary === # of expected passes 4 # of known failures 1 ==== (In reply to comment #30) > > Is that on NFS? I believe that case has a bug (either kernel/selinux or > > prelink). > > Yes, this was on NFS. That is https://bugzilla.redhat.com/show_bug.cgi?id=625544 "Cannot restore selinux context on nfs mounted filesystem" (In reply to comment #30) > I get the same results with and without the patch. Looking at the test, it > uses uretprobes, which of course don't work with inode uprobes. Boo! (In reply to comment #31) > With the patch, I get: > > Running ../../src/testsuite/systemtap.context/uprobe_stmt_num.exp ... > > === systemtap Summary === > > # of expected passes 4 > # of known failures 1 Hurray! Patch checked in as commit 39e63ba. |
Created attachment 6372 [details] readelf -a on the debug file On an android x86 target, I'm trying this script: probe process("/system/bin/mksh").function("chvt_reinit") { printf("%s: pid=%d\n", module_name(), pid()); printf("%s: ", module_name()); print_regs(); print_ubacktrace(); printf("%s: DONE\n",module_name()); } The backtrace consists of only addresses. I attached the output of stap + staprun, readelf -a on the target executable and of its .gnu_debuglink and also the "readelf -debug-dump=frames target_executable". One thing I noticed while trying to get more insight of what's happening, was that if I define UNW_PC_FROM_RA to be 1, in -debug-dump=frames the symbol is resolved but unwind() will only unwind the topmost frame.