This is the mail archive of the mailing list for the systemtap project.
RE: BUG: sleeping function called from invalid context at kernel/rwsem.c:20
- From: Martin Hunt <hunt at redhat dot com>
- To: "Stone, Joshua I" <joshua dot i dot stone at intel dot com>
- Cc: "Keshavamurthy, Anil S" <anil dot s dot keshavamurthy at intel dot com>, systemtap at sourceware dot org
- Date: Mon, 11 Sep 2006 10:09:03 -0400
- Subject: RE: BUG: sleeping function called from invalid context at kernel/rwsem.c:20
- Organization: Red Hat Inc.
- References: <C56DB814FAA30B418C75310AC4BB279D93832E@scsmsx413.amr.corp.intel.com>
On Fri, 2006-09-08 at 15:02 -0700, Stone, Joshua I wrote:
> On Friday, September 08, 2006 11:38 AM, Keshavamurthy Anil S wrote:
> > On Fri, Sep 08, 2006 at 11:09:48AM -0700, Keshavamurthy Anil S wrote:
> > More debugging showed that Systemtap-generated code is
> > calling down_read() in the probe handler code path:
> > down_read()->might_sleep()->__might_sleep(__FILE__, __LINE__);
> > If CONFIG_DEBUG_SPINLOCK_SLEEP is turned off, then we don't see the
> > dump. But Red Hat's default kernel config has this option turned on.
> > Overall, it looks to me that Systemtap should not use rw_semaphore
> > calls in the probe handler code path in the first place.
> The call stack you listed before showed that you were in
> ia64_page_fault, preceded by user_string_quoted.
Actually, ia64_do_page_fault(), which is calling down_read(). But
reading the sources for the most recent stable kernel, I don't see how
that can happen. ia64_do_page_fault calls notify_die(). Shouldn't
kprobes return 1 from there to indicate a kprobe is active? That would
cause ia64_do_page_fault to return before attempting the semaphore.
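To make the expected control flow concrete, here is a rough userspace model of it. It is only a sketch: every name prefixed with model_ is hypothetical, and nothing here is the real ia64 or kprobes code. It assumes, as described above, that the notifier returns 1 while a kprobe is active and that the fault handler returns before reaching the semaphore in that case.

```c
/* Userspace model only: all model_* names are hypothetical, not kernel API. */
#include <assert.h>
#include <stdbool.h>

struct model_state {
    bool kprobe_active;   /* a probe handler is currently running */
    int  semaphore_taken; /* how many times the down_read() step was reached */
};

/* Stand-in for the notifier call: returns 1 when kprobes claims the fault. */
int model_notify_die(const struct model_state *s)
{
    return s->kprobe_active ? 1 : 0;
}

/* Stand-in for the arch page-fault handler.
 * Returns 1 if it bailed out early, 0 if it went on to take the semaphore. */
int model_do_page_fault(struct model_state *s)
{
    if (model_notify_die(s))
        return 1;          /* kprobes owns the fault: never touch the semaphore */
    s->semaphore_taken++;  /* models the down_read() call, which may sleep */
    return 0;
}

/* Small self-check of both paths; returns 0 when the model behaves as described. */
int model_demo(void)
{
    struct model_state s = { true, 0 };
    assert(model_do_page_fault(&s) == 1 && s.semaphore_taken == 0);
    s.kprobe_active = false;
    assert(model_do_page_fault(&s) == 0 && s.semaphore_taken == 1);
    return 0;
}
```

If the real handler behaved like this model, the might_sleep() warning could not be reached from inside a probe handler, which is why the reported stack is surprising.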
> That function is
> apparently not protecting against faults properly.
Systemtap functions cannot really protect against page faults. There is
no way to predict if one will be triggered or not. The protection
happens in the callback from the OS to kprobes, when kprobes must tell
the OS to not take the fault.
> calls _stp_text_str with the 'user' flag set. _stp_text_str only
> validates access_ok on the first byte of the string, and then it calls
> __get_user to read the rest. I thought that __get_user would catch
> faults, but maybe not...
It will handle faults cleanly if the OS is doing the right thing.
> The other user_string_* functions call _stp_strncpy_from_user, which
> checks access_ok on the length of the *destination* buffer. This also
> seems wrong, because the source might be a very short string, where
> reading a longer string would be invalid.
Unless you want to call access_ok on each attempted read of a byte, that
is the only way it can work. It's a tradeoff of efficiency for accuracy,
but we err on the side of safety.
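The tradeoff can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the runtime's code: MODEL_USER_LIMIT and the model_* functions are hypothetical, and model_access_ok() only imitates the general idea of a range check against an upper address limit.

```c
/* Userspace model only: all model_* names and MODEL_USER_LIMIT are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_USER_LIMIT ((uintptr_t)0x1000) /* assumed end of the valid range */

/* Range check in the spirit of access_ok(): is [addr, addr+len) below the limit? */
bool model_access_ok(uintptr_t addr, size_t len)
{
    return addr <= MODEL_USER_LIMIT && len <= MODEL_USER_LIMIT - addr;
}

/* Conservative variant: one check covering the whole destination-sized range. */
bool model_check_once(uintptr_t src, size_t dst_len)
{
    return model_access_ok(src, dst_len);
}

/* Precise variant: one check per byte actually read (string plus its NUL,
 * capped at dst_len). *checks reports how many range checks were performed. */
bool model_check_per_byte(uintptr_t src, size_t str_len, size_t dst_len,
                          size_t *checks)
{
    size_t n = (str_len + 1 < dst_len) ? str_len + 1 : dst_len;
    *checks = 0;
    for (size_t i = 0; i < n; i++) {
        (*checks)++;
        if (!model_access_ok(src + i, 1))
            return false;
    }
    return true;
}
```

In this model, a 7-character string starting 16 bytes below the limit is rejected by the single check against a 64-byte destination, while the per-byte variant accepts it at the cost of 8 separate checks. The single check can refuse a read that would have been valid, but it never permits one that is not.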