Bug 5079

Summary: runtime/uprobes: stap module exit w/ outstanding uretprobe_instances
Product: systemtap Reporter: Jim Keniston <jkenisto>
Component: uprobesAssignee: Jim Keniston <jkenisto>
Status: RESOLVED FIXED    
Severity: critical CC: dsmith, fche, hunt
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Attachments: Here's a patch that implements the fix described in comment #4.

Description Jim Keniston 2007-09-28 23:34:53 UTC
Here's a situation where including the uprobes code as part of the
SystemTap-generated module messes us up.

Execute a program like this:
#include <unistd.h>
#include <stdio.h>

int sleeper()    /* set the retprobe here */
{
         sleep(1);
         return 1;
}

main()
{
         int ret;
         for (;;){
                 ret = sleeper();
                 printf("sleeper returns %d\n",ret);
         }
}
and then probe it with a stap module such as this:
probe begin {
        log("Probing...")
}
probe process($1).statement($2).absolute.return
{
        log (pp())
}
with a command such as
# stap sleeper.stp <pid> <sleeper_vaddr>

If you CTRL-C out of stap while sleeper() is running, you'll get an oops.

That's because unregister_uretprobe(), called by the module's cleanup function,
doesn't wait for the uretprobed function to return.  (It could be a LONG wait,
after all.)  Instead, it leaves the uprobe_process (and utrace_attached_engine)
in place until sleeper() returns and hits the breakpoint at the uretprobe
trampoline; uprobes's report_signal callback should then clean up.  (This is
pretty much how kretprobes works, too.)  Unfortunately, by that time, the
uprobes code no longer exists -- it disappeared with the module -- so utrace
calls a nonexistent callbck.

We could work around this on i386 and x86_64 by remembering the location, on the
stack, of the return address, and stuffing the real return address back into the
stack as part of unregister_uretprobe().  (That's how kretprobes was originally
implemented.)  For the other architectures, though, this won't work, I'm told.
Comment 1 Frank Ch. Eigler 2007-09-30 01:19:27 UTC
Would packaging this as a stacked module fix this?
(It could be built & insmod'd separately, from sources still kept
within the systemtap install tree (even its current location), then
insmod'd and referred-to from systemtap modules.)
Comment 2 David Wilder 2007-10-01 17:39:55 UTC
> We could work around this on i386 and x86_64 by remembering the location, on the
> stack, of the return address, and stuffing the real return address back into the
> stack as part of unregister_uretprobe().  (That's how kretprobes was originally
> implemented.)  For the other architectures, though, this won't work, I'm told.

I investigated Jim's approach for the s390x (64-bit application only).  It could
possible to use Jim's approach however, I there are some special cases that my
not be possible to handle.

Here is a typical preamble/prolog on s390 (64-bit)
-When the function is called the return address is in GR14.
-The first instruction of the function pushes GR14 along with any other
registers that need to be saved onto the stack.  
- <function body goes here.>
-The prolog pops the return address off the stack in to any available general
register.
-Branch the to the address in the GR above.

It should be a simple matter for uprobes to calculate the address on the stack
where the return address will be saved.  Therefor the original return address
can be restored when removing the return probe.  

However, the are several exception cases we will need to handle:
1) The gcc may optimize out the saving of the RA because GR14 is never
overwritten by the function body.  We can handle this case by flagging that the
RA is stored in GR14 not on the stack.  When the return is removed we then place
the original RA directly in GR14.
2) When removing the return probe there is a small window where the return
address has already been popped from the stack.  Uprobes will have no way of
knowing this has occurred other than to search for the address of the trampoline
in all the GRs.  But this may be problematic.  If a GR contains the address of
the trampoline it could be coincidental or a left over from some other operation. 

I don't feel that case #2 can be handled safely.  
Comment 3 Jim Keniston 2007-10-02 16:16:09 UTC
(In reply to comment #1)
> Would packaging this as a stacked module fix this?
> (It could be built & insmod'd separately, from sources still kept
> within the systemtap install tree (even its current location), then
> insmod'd and referred-to from systemtap modules.)
> 

[Here's the gist of my response from 9/30, which didn't make it into the bugzilla.]
Yes, that would simplify things greatly for us.  Uprobes-as-module has
been working for 3 months or so.  We also wouldn't need the uprobes_data hook.

I'm working on that approach now.
Comment 4 Jim Keniston 2007-10-02 18:31:09 UTC
Here's a proposed design for modifying stap to optionally build
and use a uprobes module whose source resides in runtime/uprobes.

Currently, runtime/uprobes contains code that is included
directly into the stap-generated module.  This suffers from
problems, as reported in this bugzilla.  It will be modified
to be built as a kernel module that staprun can insert, as
needed, before inserting the stap-generated module.

If stap emits calls to the uprobes API, it will also attempt to
build uprobes as a module, using the source and (new) Makefile
in runtime/uprobes.  Stap will also pass the -u ("uprobes") flag
to staprun.

To account for the possibility of uprobes being part of the kernel
build (either as a module or as a built-in component), stap will
generate code that looks something like:

#if defined(CONFIG_UPROBES) || defined(CONFIG_UPROBES_MODULE)
#include <linux/uprobes.h>
#else
#include "uprobes/uprobes.h"
#endif

When passed the -u flag, staprun will check to see whether (say)
unregister_uprobe already shows up in /proc/kallsyms.  If it doesn't,
staprun will try the following before loading the SystemTap-generated
module:
1. modprobe uprobes
2. Load uprobes.ko from runtime/uprobes.
Seems like we have to do #1 first because of the way the above
#includes work.  If the user wants to use the stap-runtime version
of uprobes.ko instead of the version in /lib/modules, then I
guess he/she should copy the stap-runtime version into
/lib/modules.

Questions:
- Should we avoid the make if (say) unregister_uprobe already shows
up in /proc/kallsyms?  What would be the implications for a "stap now,
staprun later" scenario?
- Should we avoid using /lib/modules/.../uprobes.ko, and instead
use only the stap runtime version?  If so, we'd take the
"defined(CONFIG_UPROBES_MODULE)" clause out of the above code;
and we'd skip the modprobe.
Comment 5 Jim Keniston 2007-10-05 20:46:14 UTC
Created attachment 2035 [details]
Here's a patch that implements the fix described in comment #4.

I'd appreciate it if stap/staprun maintainers could take a look at this before
I check it in.	It's been tested both on a kernel that include uprobes and on
one that doesn't.  (Your kernel still needs utrace and the usual exports --
access_process_vm needs to be added to RHEL5.1's exports, and __put_task_struct
needs to be added for kernel.org kernels.)
Comment 6 Jim Keniston 2007-10-09 00:02:58 UTC
Checked in the changes from Comment #5 today.