This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
RE: BUG: sleeping function called from invalid context atkernel/rwsem.c:20
- From: "Stone, Joshua I" <joshua dot i dot stone at intel dot com>
- To: "Martin Hunt" <hunt at redhat dot com>
- Cc: "Keshavamurthy, Anil S" <anil dot s dot keshavamurthy at intel dot com>, <systemtap at sourceware dot org>
- Date: Mon, 11 Sep 2006 16:43:27 -0700
- Subject: RE: BUG: sleeping function called from invalid context atkernel/rwsem.c:20
On Monday, September 11, 2006 7:09 AM, Martin Hunt wrote:
> ia64_do_page_fault calls notify_die(). Shouldn't
> kprobes return 1 from there to indicate a kprobe is active? That would
> cause ia64_do_page_fault to return before attempting the semaphore.
Prasanna apparantly to agree with you, so that seems to be the right
solution to the problem.
>> That function is
>> apparently not protecting against faults properly.
>
> Systemtap functions cannot really protect against page faults. There
> is no way to predict if one will be triggered or not. The protection
> happens in the callback from the OS to kprobes, when kprobes must tell
> the OS to not take the fault.
Sure, there's no way to predict a fault (=prevention), but we can
protect the system when a fault does happen by recovering nicely. We
can use kprobes to help us with this, but other methods work as well,
like the deref macros.
>> user_string_quoted
>> calls _stp_text_str with the 'user' flag set. _stp_text_str only
>> validates access_ok on the first byte of the string, and then it
>> calls __get_user to read the rest. I thought that __get_user would
>> catch faults, but maybe not...
>
> It will handle faults cleanly if the OS is doing the right thing.
Which apparantly it's not, as you and Prasanna point out.
>> The other user_string_* functions call _stp_strncpy_from_user, which
>> checks access_ok on the length of the *destination* buffer. This
>> also seems wrong, because the source might be a very short string,
>> where reading a longer string would be invalid.
>
> Unless you want to call access_ok on each attempted read of a byte,
> that is the only way it can work. It's a tradeoff of efficiency for
> accuracy but we error on the side of safety.
Understood. And as long as that tradeoff is known, I don't have a
problem with it. A comment in the code that documents this tradeoff
would be appreciated.
Josh