This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
RE: [2/3] Userspace probes prototype-take2
- From: "Zhang, Yanmin" <yanmin dot zhang at intel dot com>
- To: <prasanna at in dot ibm dot com>
- Cc: <systemtap at sources dot redhat dot com>
- Date: Wed, 15 Feb 2006 16:06:54 +0800
- Subject: RE: [2/3] Userspace probes prototype-take2
>>-----Original Message-----
>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Prasanna S Panchamukhi
>>Sent: 2006年2月8日 22:13
>>To: systemtap@sources.redhat.com
>>Subject: Re: [2/3] Userspace probes prototype-take2
>>
>>
>>This patch provides the feature to insert the probes on the pages
>>that are not present in the memory during registeration.
>>
>>To add readpage and readpages() hooks, two new elements
>>are added to uprobe_module object.
>>struct address_space_operations *ori_a_ops;
>>struct address_space_operations user_a_ops;
>>
>>User space probe also allows the probes to be inserted within the
>>pages even that are not present in the memory at the time of
>>registration. This is done by adding hooks to readpage and readpages
>>routines. During registration, the address space operation object is
>>modified by replacing with userspace probes's specific readpage and
>>readpages routines. So that when the pages are readinto the memory
>>through the readpage and readpages address space operations, the
>>probes can be automatically inserted into those pages. These user
>>space readpage and readpages routines internally call the original
>>readpage() and readpages() routines and then check if the probes are
>>to be added to these pages and then insert the probes on these pages.
After I double-checked it, I found a big problem. When readpage/readpages return,
it doesn't mean the page/pages are really already up-to-date. If the caller wants
to get up-to-date pages, it needs call lock_page or wait_on_page_locked to get the page lock.
I/O lower layer is asynchronous with the thread. Only when the device really finishes I/O,
Mostly, bio->bi_end_io is called to unlock the page and change its state to up-to-date.
If we use lock_page or wait_on_page_locked to wait the I/O complete in uprobe_readpage/uprobe_readpages,
It looks like we might break the implied rule that the caller needs call lock_page
or wait_on_page_locked to wait the page up-to-date.
>>Overhead of adding these hooks are limited to the application on
>>which probes are inserted.
>>
>>While unregiteration, care should be taken to replace the readpage and
>>readpages() hooks by original routines if no probes exists on that
>>application.
>>
>>Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
>>
>> include/linux/namei.h | 1
>>
>>diff -puN kernel/kprobes.c~kprobes_userspace_probes_hook_readpage kernel/kprobes.c
>>
>>diff -puN kernel/kprobes.c~kprobes_userspace_probes_hook_readpage kernel/kprobes.c
>>
>>
>> kernel/kprobes.c | 116 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 files changed, 116 insertions(+)
>>
>>diff -puN kernel/kprobes.c~kprobes_userspace_probes_hook_readpage kernel/kprobes.c
>>--- linux-2.6.16-rc1-mm5/kernel/kprobes.c~kprobes_userspace_probes_hook_readpage 2006-02-08 19:21:09.000000000 +0530
>>+++ linux-2.6.16-rc1-mm5-prasanna/kernel/kprobes.c 2006-02-08 19:21:09.000000000 +0530
>>@@ -769,6 +769,110 @@ static struct uprobe_module __kprobes *g
>> return NULL;
>> }
>>
>>+static inline void insert_readpage_uprobe(struct page *page,
>>+ struct address_space *mapping, struct uprobe *uprobe)
>>+{
>>+ if (find_page_probe(uprobe->offset >> PAGE_CACHE_SHIFT,
>>+ page->index << PAGE_CACHE_SHIFT)) {
The parameter of function find_page_probe looks incorrect.
uprobe->offset >> PAGE_CACHE_SHIFT
should be:
uprobe->offset
>>+ map_uprobe_page(page, uprobe, insert_kprobe_user);
>>+ flush_vma(mapping, page, uprobe);
>>+ }
>>+}
>>+
>>+/**
>>+ * This function hooks the readpages() of all modules that have active probes
>>+ * on them. The original readpages() is called for the given
>>+ * inode/address_space to actually read the pages into the memory. Then all
>>+ * probes that are specified on these pages are inserted.
>>+ */
>>+static int __kprobes uprobe_readpages(struct file *file,
>>+ struct address_space *mapping,
>>+ struct list_head *pages, unsigned nr_pages)
>>+{
>>+ int retval = 0;
>>+ struct page *page;
>>+ struct uprobe_module *umodule;
>>+ struct uprobe *uprobe = NULL;
>>+ struct hlist_node *node;
>>+
>>+ mutex_lock(&uprobe_mutex);
>>+
>>+ umodule = get_module_by_inode(file->f_dentry->d_inode);
>>+ if (!umodule) {
I mentioned a race condition before. If there is no solution, I think at least we need
put the issue on the TODO list. How about using file->f_dentry->d_inode->i_mapping->a_ops->readpages/readpage
when umodule==NULL?
>>+ printk("uprobe_readpages: we don't have a module \
>>+ associated with this file.. Aborting\n");
>>+ retval = -EINVAL;
>>+ goto out;
>>+ }
>>+
>>+ /* call original readpages() */
>>+ retval = umodule->ori_a_ops->readpages(file, mapping, pages, nr_pages);
>>+ if (retval < 0)
>>+ goto out;
>>+
>>+ /*
>>+ * TODO: Walk through readpages page list and get
>>+ * pages with probes instead of find_get_page().
>>+ */
>>+ mutex_lock(&kprobe_mutex);
>>+ hlist_for_each_entry(uprobe, node, &umodule->ulist_head, ulist) {
>>+ page = find_get_page(mapping,
>>+ uprobe->offset >> PAGE_CACHE_SHIFT);
>>+ if (!page)
>>+ continue;
>>+
>>+ if (!uprobe->kp.opcode)
>>+ insert_readpage_uprobe(page, mapping, uprobe);
>>+ page_cache_release(page);
>>+ }
>>+ mutex_unlock(&kprobe_mutex);
>>+
>>+out:
>>+ mutex_unlock(&uprobe_mutex);
>>+ return retval;
>>+}
>>+
>>+/**
>>+ * This function hooks the readpage() of all modules that have active probes
>>+ * on them. The original readpage() is called for the given inode/address_space
>>+ * to actually read the pages into the memory. Then all probes that are
>>+ * specified on this page are inserted.
>>+ */
>>+int __kprobes uprobe_readpage(struct file *file, struct page *page)
>>+{
>>+ int retval = 0;
>>+ struct uprobe_module *umodule;
>>+ struct uprobe *uprobe = NULL;
>>+ struct hlist_node *node;
>>+ struct address_space *mapping = file->f_dentry->d_inode->i_mapping;
>>+
>>+ mutex_lock(&uprobe_mutex);
>>+ umodule = get_module_by_inode(file->f_dentry->d_inode);
>>+ if (!umodule) {
>>+ printk("uprobe_readpages: we don't have a module \
>>+ associated with this file.. Aborting\n");
>>+ retval = -EINVAL;
>>+ goto out;
>>+ }
>>+
>>+ /* call original readpage() */
>>+ retval = umodule->ori_a_ops->readpage(file, page);
>>+ if (retval < 0)
>>+ goto out;
>>+
>>+ mutex_lock(&kprobe_mutex);
>>+ hlist_for_each_entry(uprobe, node, &umodule->ulist_head, ulist) {
>>+ if (!uprobe->kp.opcode)
>>+ insert_readpage_uprobe(page, mapping, uprobe);
>>+ }
>>+ mutex_unlock(&kprobe_mutex);
>>+
>>+out:
>>+ mutex_unlock(&uprobe_mutex);
>>+
>>+ return retval;
>>+}
>>+
>> /**
>> * Gets exclusive write access to the given inode to ensure that the file
>> * on which probes are currently applied does not change. Use the function,
>>@@ -799,9 +903,17 @@ static inline int ex_write_unlock(struct
>> static void __kprobes get_inode_ops(struct uprobe *uprobe,
>> struct uprobe_module *umodule)
>> {
>>+ struct address_space *as;
>>+
>> INIT_HLIST_HEAD(&umodule->ulist_head);
>> hlist_add_head(&uprobe->ulist, &umodule->ulist_head);
>> list_add(&umodule->mlist, &uprobe_module_list);
>>+ as = umodule->nd.dentry->d_inode->i_mapping;
>>+ umodule->ori_a_ops = as->a_ops;
>>+ umodule->user_a_ops = *as->a_ops;
>>+ umodule->user_a_ops.readpage = uprobe_readpage;
>>+ umodule->user_a_ops.readpages = uprobe_readpages;
>>+ as->a_ops = &umodule->user_a_ops;
>> }
>>
>> int __kprobes remove_kprobe_user(struct uprobe *uprobe, unsigned long *address,
>>@@ -811,6 +923,7 @@ int __kprobes remove_kprobe_user(struct
>> int cleanup_p;
>>
>> p = &uprobe->kp;
>>+ /*TODO: change it to spin lock, we enter here holding i_mmap_lock */
It could be resolved by moving below mutex_lock(&kprobe_mutex) to unregister_uprobe.
>> mutex_lock(&kprobe_mutex);
>> old_p = get_kprobe_user(uprobe->inode, uprobe->offset);
>> if (unlikely(!old_p)) {
>>@@ -887,6 +1000,9 @@ void __kprobes unregister_uprobe(struct
>> hlist_del(&uprobe->ulist);
>> if (hlist_empty(&umodule->ulist_head)) {
>> list_del(&umodule->mlist);
>>+ umodule->nd.dentry->d_inode->i_mapping->a_ops =
>>+ umodule->ori_a_ops;
>>+
>> ex_write_unlock(uprobe->inode);
>> path_release(&umodule->nd);
>> kfree(umodule);
>>
>>_
>>--
>>Prasanna S Panchamukhi
>>Linux Technology Center
>>India Software Labs, IBM Bangalore
>>Email: prasanna@in.ibm.com
>>Ph: 91-80-51776329