This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: thoughts about exception-handling requirements for kprobes
On Tue, Mar 21, 2006 at 12:27:19AM +0530, Prasanna S Panchamukhi wrote:
> On Mon, Mar 20, 2006 at 10:39:51AM -0800, Keshavamurthy Anil S wrote:
> > On Sun, Mar 19, 2006 at 09:24:54AM -0800, Prasanna S Panchamukhi wrote:
> > >
> > > On Fri, Mar 17, 2006 at 01:50:57PM -0800, Keshavamurthy Anil S wrote:
> > What I am saying is that we should look into kprobes to see
> > if we can support calling the user's pre/post handlers
> > without having to disable preempt.
> >
> > Currently we are calling the user's pre_handler() and post_handler()
> > with preempt disabled. If the user has put a probe on
> > syscalls, then when his pre/post handlers are called he is
> > bound to call copy_from_user(), which has a might_sleep() check.
> > might_sleep() calls the in_atomic() function, which checks preempt_count(),
> > and if preempt_count() is greater than zero (in our case it is indeed greater
> > than zero, since we are calling pre/post handlers with preempt disabled)
> > the kernel prints an error message
> > printk(KERN_ERR "Debug: sleeping function called from invalid"
> > " context at %s:%d\n", file, line);
>
> Are you saying that by allowing preemption in the
> kprobes handler, the above debug message can be avoided?
Not only that, I am asking that we take another look at kprobes to see if we
can reliably support calling pre/post handlers without having to
disable preemption. (I am aware that the lock-free RCU changes currently
depend on the fact that handlers are run with preempt disabled.)
If we can achieve calling pre/post handlers without having to disable preempt,
then the benefit of this work is huge.
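
To make the might_sleep() complaint concrete, here is a minimal user-space
sketch of the check. It is a model only, not kernel code: every model_* name
is hypothetical, and the real preempt_count() is tracked by the kernel per
task rather than in a plain global.

```c
#include <stdio.h>

/* User-space model only; the real counter lives in the kernel. */
static int model_preempt_count;
static int model_warnings;      /* number of times the debug check fired */

static void model_preempt_disable(void) { model_preempt_count++; }
static void model_preempt_enable(void)  { model_preempt_count--; }

/* Models the idea behind in_atomic(): a non-zero count means atomic context. */
static int model_in_atomic(void) { return model_preempt_count > 0; }

/* Models might_sleep(): complain if called while in atomic context. */
static void model_might_sleep(const char *file, int line)
{
    if (model_in_atomic()) {
        model_warnings++;
        fprintf(stderr, "Debug: sleeping function called from invalid"
                " context at %s:%d\n", file, line);
    }
}

/* Hypothetical stand-in for a user pre_handler that calls copy_from_user(). */
static void model_pre_handler(void)
{
    model_might_sleep(__FILE__, __LINE__);
}

/* Models kprobes invoking the handler with preemption disabled. */
static void model_run_handler_atomic(void)
{
    model_preempt_disable();
    model_pre_handler();
    model_preempt_enable();
}
```

Calling model_pre_handler() directly stays silent, while
model_run_handler_atomic() triggers the message, which is the situation
described above for probes placed on syscalls.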
>
> >
> > Also, if we want to fall back on the do_page_fault() function in kprobe_fault_handler() to
> > recover the page, then we should not be in a preempt-disabled state.
>
> We actually do not want to fall back on the system do_page_fault() because
> it might sleep. When a pre/post handler page faults, we can just try
> calling fixup_exception() (non-ia64 architectures) and avoid calling the
> actual do_page_fault(), because it might sleep.
Yes, by fixing up the exception (the fixup_exception() call) you avoid
falling back to do_page_fault(), but the copy_from_user() which generated
the exception will fail, so we cannot 100% reliably support copying from
user space in pre/post handlers, even for a valid address. Also, on a system
where memory pressure is high, we might see lots of user address
references from pre/post handlers failing.
Is this an acceptable solution for SystemTap? Do we know how reliable
DTrace is in this respect?
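
The reliability concern can be illustrated with a small user-space sketch.
It is a toy model under stated assumptions, not kernel code: struct
model_page and both model_copy_* functions are hypothetical, and a page
being "resident" is reduced to a flag. Like copy_from_user(), the functions
return the number of bytes that could not be copied (0 on success).

```c
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16

/* Toy "user page": a valid mapping that may not currently be in memory. */
struct model_page {
    int valid;                      /* address belongs to the process */
    int resident;                   /* page is currently in memory */
    char data[MODEL_PAGE_SIZE];
};

/* Fixup-only policy: any fault makes the copy fail; nothing is paged in. */
static size_t model_copy_fixup_only(char *dst, const struct model_page *pg,
                                    size_t n)
{
    if (n > MODEL_PAGE_SIZE || !pg->valid || !pg->resident)
        return n;                   /* nothing copied */
    memcpy(dst, pg->data, n);
    return 0;
}

/* Sleeping policy: a fault on a valid page brings it in, then the copy runs. */
static size_t model_copy_may_sleep(char *dst, struct model_page *pg, size_t n)
{
    if (n > MODEL_PAGE_SIZE || !pg->valid)
        return n;                   /* genuinely bad address */
    if (!pg->resident)
        pg->resident = 1;           /* stands in for do_page_fault() */
    memcpy(dst, pg->data, n);
    return 0;
}
```

In this model a valid but non-resident page fails under the fixup-only
policy and succeeds only once something else has brought the page in, which
is why the failure rate would be expected to rise with memory pressure.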
Thanks,
Anil