RE: [3/3] Userspace probes prototype-take2


Two main issues:
1) a task switch caused by an external interrupt during single-stepping;
2) multi-threading.

See my inline comments below.

Yanmin

>>-----Original Message-----
>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Prasanna S Panchamukhi
>>Sent: 8 February 2006 22:14
>>To: systemtap@sources.redhat.com
>>Subject: Re: [3/3] Userspace probes prototype-take2
>>
>>
>>This patch handles executing the registered callback
>>functions when a probe is hit.
>>
>>	Each userspace probe is uniquely identified by the
>>combination of inode and offset, hence during registration the inode
>>and offset combination is added to the kprobes hash table. When the
>>breakpoint instruction is hit, the kprobes hash table is looked up
>>for a matching inode and offset. The pre_handlers are called in
>>sequence if multiple probes are registered. The original instruction
>>is single stepped out-of-line, as with kernel probes. For kernel
>>probes, single stepping out-of-line is achieved by copying the
>>instruction to some location within kernel address space and then
>>single stepping from that location. But for userspace probes, an
>>instruction copied into kernel address space cannot be single
>>stepped, hence the instruction must be copied to user address space.
>>The solution is to find free space in the current process address
>>space, copy the original instruction there, and single step that
>>instruction.
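A minimal sketch of the lookup described above (names hypothetical, not
taken from this patch): assuming registration hashes each probe by its
(inode, offset) pair, the breakpoint handler can recover the probe
roughly like this:

	/* Hypothetical lookup of a user probe by (inode, offset).
	 * Assumes struct uprobe embeds its kprobe (kp) and records the
	 * inode/offset it was registered with. */
	static struct uprobe *find_uprobe(struct inode *inode,
					  unsigned long offset)
	{
		struct hlist_head *head;
		struct hlist_node *node;
		struct kprobe *p;

		head = &kprobe_table[hash_long((unsigned long)inode + offset,
					       KPROBE_HASH_BITS)];
		hlist_for_each_entry(p, node, head, hlist) {
			struct uprobe *u = container_of(p, struct uprobe, kp);
			if (u->inode == inode && u->offset == offset)
				return u;
		}
		return NULL;
	}
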
>>
>>User processes use stack space to store local variables, arguments and
>>return values. The unused stack space lies below the stack pointer if
>>the stack grows downwards, and above the stack pointer if the stack
>>grows upwards.
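Distilled from the check the patch performs later in copy_insn_onstack(),
for the common grows-down case:

	/* Free space on a grows-down stack is [page bottom, esp).
	 * Leave a small margin (sizeof(long long)) that the
	 * single-stepped instruction may itself consume. */
	unsigned long stack_addr = regs->esp;
	unsigned long page_addr = stack_addr & PAGE_MASK;
	int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);

	if ((stack_addr - sizeof(long long)) < (page_addr + size))
		return -ENOMEM;	/* not enough room below esp */
	/* otherwise the copy can be planted at page_addr */
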
>>
>>The instruction to be single stepped may itself modify the stack,
>>hence sufficient headroom must be left before using the unused stack
>>space. The instruction is copied to the bottom of the page, and a
>>check is made that the copied instruction does not cross the page
>>boundary. The copied instruction is then single stepped. Several
>>architectures do not allow instructions to be executed from stack
>>locations, since the no-exec bit is set for stack pages. On those
>>architectures, the page table entry corresponding to the stack page
>>is identified and its no-exec bit is cleared, allowing the
>>instruction on that stack page to be executed.
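A sketch of the no-exec handling described above, reusing the patch's
get_uprobe_pte() walk (the flush is an assumption on my part; the patch
itself still carries a "TODO: flush" comment):

	/* Make the stack page executable for the single step. The TLB
	 * entry for this address must be flushed so the cleared
	 * no-exec bit takes effect before returning to user mode. */
	pte_t *pte = get_uprobe_pte(addr);
	if (pte) {
		set_pte(pte, pte_mkexec(*pte));
		flush_tlb_page(vma, addr);
	}
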
>>
>>There are situations where even the unused free stack space is not
>>enough for the user instruction to be copied and single stepped. In
>>such situations, the virtual memory area (vma) can be expanded beyond
>>the current stack vma. This expanded stack can then be used to copy
>>the original instruction and single step out-of-line.
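A minimal sketch of this expansion path, matching what
copy_insn_onexpstack() below does for a grows-down stack:

	/* Grow the stack vma downwards by one instruction slot and use
	 * the newly exposed space for the out-of-line copy. */
	struct vm_area_struct *new_vma;

	new_vma = find_extend_vma(current->mm, vma->vm_start - size);
	if (!new_vma)
		return -ENOMEM;
	addr = new_vma->vm_start;	/* copy the instruction here */
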
>>
>>If the vma cannot be extended either, the instruction must be
>>executed inline, by replacing the breakpoint instruction with the
>>original instruction.
>>
>>TODO list
>>--------
>>1. This patch is not stable yet, but should work under most conditions.
>>
>>2. This patch works only with the PREEMPT config option disabled; to
>>work with PREEMPT enabled, the handlers must be rewritten and
>>separated out from kernel probes so that preemption can be allowed.
One of my earlier comments: an external device interrupt might arrive while the cpu is single-stepping the original instruction, and the task might then be switched to another cpu. Even if we disable irqs when exiting to user space to single step the instruction, the kernel might switch the task off right on the kernel-exit path. Two resources, 1) uprobe_page and 2) kprobe_ctlblk, therefore shouldn't be per-cpu, or we need another approach. How do you resolve the task switch issue?



>>
>>3. Insert probes on copy-on-write pages. Tracks all COW pages for the
>>page containing the specified probe point and inserts/removes all the
>>probe points for that page.
>>
>>4. Optimize the insertion of probes through readpage hooks. Identify
>>all the probes to be inserted on the read page and insert them at
>>once.
>>
>>5. Resume execution should handle setting proper eip and eflags
>>for special instructions, as kernel probes do.
>>
>>6. Single stepping out-of-line expands the stack if there is not
>>enough stack space to copy the original instruction. The expanded
>>stack should either be shrunk back to its original size after single
>>stepping, or be reused for single stepping out-of-line for other
>>probes.
>>
>>7. Wrapper routines to calculate the offset from the beginning of
>>the probed file. For a dynamic shared library, the offset is
>>calculated by subtracting the address at which the file is mapped
>>from the address of the probe point (see the sketch after this
>>list).
>>
>>8. Handling of page faults while in kprobe_handler() and while
>>single stepping.
>>
>>9. Accessing user space pages not present in memory, from the
>>registered callback routines.
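A sketch of the offset calculation in item 7, assuming the probe's
virtual address and the vma mapping the file are known (vm_pgoff is the
file offset of vm_start, in pages):

	/* File offset of a probe placed at vaddr inside a mapped file. */
	static unsigned long probe_offset(struct vm_area_struct *vma,
					  unsigned long vaddr)
	{
		return (vaddr - vma->vm_start) +
				(vma->vm_pgoff << PAGE_SHIFT);
	}
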
>>
>>Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
>>
>>
>> arch/i386/kernel/kprobes.c |  460 +++++++++++++++++++++++++++++++++++++++++++--
>> include/asm-i386/kprobes.h |   13 +
>> include/linux/kprobes.h    |    7
>> kernel/kprobes.c           |    3
>> 4 files changed, 468 insertions(+), 15 deletions(-)
>>
>>diff -puN arch/i386/kernel/kprobes.c~kprobes_userspace_probes_ss_out-of-line arch/i386/kernel/kprobes.c
>>--- linux-2.6.16-rc1-mm5/arch/i386/kernel/kprobes.c~kprobes_userspace_probes_ss_out-of-line	2006-02-08 19:26:09.000000000 +0530
>>+++ linux-2.6.16-rc1-mm5-prasanna/arch/i386/kernel/kprobes.c	2006-02-08 19:26:10.000000000 +0530
>>@@ -30,6 +30,7 @@
>>
>> #include <linux/config.h>
>> #include <linux/kprobes.h>
>>+#include <linux/hash.h>
>> #include <linux/ptrace.h>
>> #include <linux/preempt.h>
>> #include <asm/cacheflush.h>
>>@@ -38,8 +39,12 @@
>>
>> void jprobe_return_end(void);
>>
>>+static struct uprobe_page *uprobe_page;
>>+static struct hlist_head uprobe_page_table[KPROBE_TABLE_SIZE];
>> DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
>> DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
>>+DEFINE_PER_CPU(struct uprobe *, current_uprobe) = NULL;
>>+DEFINE_PER_CPU(unsigned long, singlestep_addr);
>>
>> /* insert a jmp code */
>> static inline void set_jmp_op(void *from, void *to)
>>@@ -125,6 +130,23 @@ void __kprobes arch_disarm_kprobe(struct
>> 			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
>> }
>>
>>+void __kprobes arch_disarm_uprobe(struct kprobe *p, kprobe_opcode_t *address)
>>+{
>>+	*address = p->opcode;
>>+}
>>+
>>+void __kprobes arch_arm_uprobe(unsigned long *address)
>>+{
>>+	*(kprobe_opcode_t *)address = BREAKPOINT_INSTRUCTION;
>>+}
>>+
>>+void __kprobes arch_copy_uprobe(struct kprobe *p, unsigned long *address)
>>+{
>>+	memcpy(p->ainsn.insn, (kprobe_opcode_t *)address,
>>+				MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
>>+	p->opcode = *(kprobe_opcode_t *)address;
>>+}
>>+
>> static inline void save_previous_kprobe(struct kprobe_ctlblk *kcb)
>> {
>> 	kcb->prev_kprobe.kp = kprobe_running();
>>@@ -151,15 +173,326 @@ static inline void set_current_kprobe(st
>> 		kcb->kprobe_saved_eflags &= ~IF_MASK;
>> }
>>
>>+struct uprobe_page __kprobes *get_upage_current(struct task_struct *tsk)
>>+{
>>+	struct hlist_head *head;
>>+	struct hlist_node *node;
>>+	struct uprobe_page *upage;
>>+
>>+	head = &uprobe_page_table[hash_ptr(current, KPROBE_HASH_BITS)];
>>+	hlist_for_each_entry(upage, node, head, hlist) {
>>+		if (upage->tsk == tsk)
>>+			return upage;
>>+        }
>>+	return NULL;
>>+}
>>+
>>+struct uprobe_page __kprobes *get_upage_free(struct task_struct *tsk)
>>+{
>>+	int cpu;
>>+
>>+	for_each_cpu(cpu) {
>>+		struct uprobe_page *upage;
>>+		upage = per_cpu_ptr(uprobe_page, cpu);
>>+		if (upage->status & UPROBE_PAGE_FREE)
>>+			return upage;
>>+	}
>>+	return NULL;
>>+}
>>+
>>+/**
>>+ * This routines get the pte of the page containing the specified address.
>>+ */
>>+static pte_t  __kprobes *get_uprobe_pte(unsigned long address)
>>+{
>>+	pgd_t *pgd;
>>+	pud_t *pud;
>>+	pmd_t *pmd;
>>+	pte_t *pte = NULL;
>>+
>>+	pgd = pgd_offset(current->mm, address);
>>+	if (!pgd)
>>+		goto out;
>>+
>>+	pud = pud_offset(pgd, address);
>>+	if (!pud)
>>+		goto out;
>>+
>>+	pmd = pmd_offset(pud, address);
>>+	if (!pmd)
>>+		goto out;
>>+
>>+	pte = pte_alloc_map(current->mm, pmd, address);
>>+
>>+out:
>>+	return pte;
>>+}
>>+
>>+/**
>>+ *  This routine check for space in the current process's stack address space.
>>+ *  If enough address space is found, it just maps a new page and copies the
>>+ *  new instruction on that page for single stepping out-of-line.
>>+ */
>>+static int __kprobes copy_insn_on_new_page(struct uprobe *uprobe ,
>>+			struct pt_regs *regs, struct vm_area_struct *vma)
>>+{
>>+	unsigned long addr, *vaddr, stack_addr = regs->esp;
>>+	int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
>>+	struct uprobe_page *upage;
>>+	struct page *page;
>>+	pte_t *pte;
>>+
>>+
>>+	if (vma->vm_flags & VM_GROWSDOWN) {
>>+		if (((stack_addr - sizeof(long long))) < (vma->vm_start + size))
>>+			return -ENOMEM;
>>+
>>+		addr = vma->vm_start;
>>+	} else if (vma->vm_flags & VM_GROWSUP) {
>>+		if ((vma->vm_end - size) < (stack_addr + sizeof(long long)))
>>+			return -ENOMEM;
>>+
>>+		addr = vma->vm_end - size;
>>+	} else
>>+		return -EFAULT;
>>+
The multi-thread case is not resolved here. In one typical multi-thread model, all threads share the same vma and every thread has an 8k stack. If 2 threads trigger uprobes at the same time (even if not the same uprobe), one thread might erase the single-step instruction of the other.



>>+	preempt_enable_no_resched();
>>+
>>+	pte = get_uprobe_pte(addr);
>>+	preempt_disable();
>>+	if (!pte)
>>+		return -EFAULT;
>>+
>>+	upage = get_upage_free(current);
>>+	upage->status &= ~UPROBE_PAGE_FREE;
>>+	upage->tsk = current;
>>+	INIT_HLIST_NODE(&upage->hlist);
>>+	hlist_add_head(&upage->hlist,
>>+		&uprobe_page_table[hash_ptr(current, KPROBE_HASH_BITS)]);
>>+
>>+	upage->orig_pte = pte;
>>+	upage->orig_pte_val =  pte_val(*pte);
>>+	set_pte(pte, (*(upage->alias_pte)));
>>+
>>+	page = pte_page(*pte);
>>+	vaddr = (unsigned long *)kmap_atomic(page, KM_USER1);
>>+	vaddr = (unsigned long *)((unsigned long)vaddr + (addr & ~PAGE_MASK));
>>+	memcpy(vaddr, (unsigned long *)uprobe->kp.ainsn.insn, size);
>>+	kunmap_atomic(vaddr, KM_USER1);
>>+	regs->eip = addr;
So the temp page, upage->alias_addr, replaces the original one on the stack. If the replaced instruction operates on the stack, such as "push eax", the result ends up on the new page. After the single step, the pte is restored to the original page, which doesn't have the value of eax.



>>+
>>+	return 0;
>>+}
>>+
>>+/**
>>+ * This routine expands the stack beyond the present process address space
>>+ * and copies the instruction to that location, so that processor can
>>+ * single step out-of-line.
>>+ */
>>+static int __kprobes copy_insn_onexpstack(struct uprobe *uprobe,
>>+			struct pt_regs *regs, struct vm_area_struct *vma)
It has the same issues as copy_insn_on_new_page().


>>+{
>>+	unsigned long addr, *vaddr, vm_addr;
>>+	int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
>>+	struct vm_area_struct *new_vma;
>>+	struct uprobe_page *upage;
>>+	struct mm_struct *mm = current->mm;
>>+	struct page *page;
>>+	pte_t *pte;
>>+
>>+
>>+	if (vma->vm_flags & VM_GROWSDOWN)
>>+		vm_addr = vma->vm_start - size;
>>+	else if (vma->vm_flags & VM_GROWSUP)
>>+		vm_addr = vma->vm_end + size;
>>+	else
>>+		return -EFAULT;
>>+
>>+	preempt_enable_no_resched();
>>+
>>+	/* TODO: do we need to expand stack if extend_vma fails? */
>>+	new_vma = find_extend_vma(mm, vm_addr);
>>+	preempt_disable();
>>+	if (!new_vma)
>>+		return -ENOMEM;
>>+
>>+	/*
>>+	 * TODO: Expanding stack for every probe is not a good idea, stack must
>>+	 * either be shrunk to its original size after single stepping or the
>>+	 * expanded stack should be kept track of, for the probed application,
>>+	 * so it can be reused to single step out-of-line
>>+	 */
>>+	if (new_vma->vm_flags & VM_GROWSDOWN)
>>+		addr = new_vma->vm_start;
>>+	else
>>+		addr = new_vma->vm_end - size;
>>+
>>+	preempt_enable_no_resched();
>>+	pte = get_uprobe_pte(addr);
>>+	preempt_disable();
>>+	if (!pte)
>>+		return -EFAULT;
>>+
>>+	upage = get_upage_free(current);
>>+	upage->status &= ~UPROBE_PAGE_FREE;
>>+	upage->tsk = current;
>>+	INIT_HLIST_NODE(&upage->hlist);
>>+	hlist_add_head(&upage->hlist,
>>+		&uprobe_page_table[hash_ptr(current, KPROBE_HASH_BITS)]);
>>+	upage->orig_pte = pte;
>>+	upage->orig_pte_val =  pte_val(*pte);
>>+	set_pte(pte, (*(upage->alias_pte)));
>>+
>>+	page = pte_page(*pte);
>>+	vaddr = (unsigned long *)kmap_atomic(page, KM_USER1);
>>+	vaddr = (unsigned long *)((unsigned long)vaddr + (addr & ~PAGE_MASK));
>>+	memcpy(vaddr, (unsigned long *)uprobe->kp.ainsn.insn, size);
>>+	kunmap_atomic(vaddr, KM_USER1);
>>+	regs->eip = addr;
>>+
>>+	return  0;
>>+}
>>+
>>+/**
>>+ * This routine checks for stack free space below the stack pointer and
>>+ * then copies the instructions at that location so that the processor can
>>+ * single step out-of-line. If there is no enough stack space or if
>>+ * copy_to_user fails or if the vma is invalid, it returns error.
>>+ */
>>+static int __kprobes copy_insn_onstack(struct uprobe *uprobe,
>>+			struct pt_regs *regs, unsigned long flags)
>>+{
>>+	unsigned long page_addr, stack_addr = regs->esp;
>>+	int  size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
>>+	unsigned long *source = (unsigned long *)uprobe->kp.ainsn.insn;
>>+
>>+	if (flags & VM_GROWSDOWN) {
>>+		page_addr = stack_addr & PAGE_MASK;
>>+
>>+		if (((stack_addr - sizeof(long long))) < (page_addr + size))
>>+			return -ENOMEM;
>>+
>>+		if (__copy_to_user_inatomic((unsigned long *)page_addr, source,
>>+									size))
>>+			return -EFAULT;
>>+
>>+		regs->eip = page_addr;
>>+	} else if (flags & VM_GROWSUP) {
>>+		page_addr = stack_addr & PAGE_MASK;
>>+
>>+		if (page_addr == stack_addr)
>>+			return -ENOMEM;
>>+		else
>>+			page_addr += PAGE_SIZE;
>>+
>>+		if ((page_addr - size) < (stack_addr + sizeof(long long)))
>>+			return -ENOMEM;
>>+
>>+		if (__copy_to_user_inatomic((unsigned long *)(page_addr - size),
>>+								source, size))
>>+			return -EFAULT;
>>+
>>+		regs->eip = page_addr - size;
>>+	} else
>>+		return -EINVAL;
>>+
>>+	return 0;
>>+}
>>+
>>+/**
>>+ * This routines get the page containing the probe, maps it and
>>+ * replaced the instruction at the probed address with specified
>>+ * opcode.
>>+ */
>>+void __kprobes replace_original_insn(struct uprobe *uprobe,
>>+				struct pt_regs *regs, kprobe_opcode_t opcode)
>>+{
>>+	kprobe_opcode_t *addr;
>>+	struct page *page;
>>+
>>+	page = find_get_page(uprobe->inode->i_mapping,
>>+					uprobe->offset >> PAGE_CACHE_SHIFT);
>>+	lock_page(page);
>>+
>>+	addr = (kprobe_opcode_t *)kmap_atomic(page, KM_USER0);
>>+	addr = (kprobe_opcode_t *)((unsigned long)addr +
>>+				 (unsigned long)(uprobe->offset & ~PAGE_MASK));
>>+	*addr = opcode;
>>+	/*TODO: flush vma ? */
>>+	kunmap_atomic(addr, KM_USER0);
>>+
>>+	unlock_page(page);
>>+
>>+	page_cache_release(page);
>>+	regs->eip = (unsigned long)uprobe->kp.addr;
>>+}
>>+
>>+/**
>>+ * This routine provides the functionality of single stepping out of line.
>>+ * If single stepping out-of-line cannot be achieved, it replaces with
>>+ * the original instruction allowing it to single step inline.
>>+ */
>>+static inline int uprobe_single_step(struct kprobe *p, struct pt_regs *regs)
>>+{
>>+	unsigned long stack_addr = regs->esp, flags;
>>+	struct vm_area_struct *vma = NULL;
>>+	struct uprobe *uprobe =  __get_cpu_var(current_uprobe);
>>+	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>+	int err = 0;
>>+
>>+	down_read(&current->mm->mmap_sem);
>>+
>>+	vma = find_vma(current->mm, (stack_addr & PAGE_MASK));
>>+	if (!vma) {
>>+		/* TODO: Need better error reporting? */
>>+		printk("No vma found\n");
>>+		up_read(&current->mm->mmap_sem);
>>+		return -ENOENT;
>>+	}
>>+	flags = vma->vm_flags;
>>+	up_read(&current->mm->mmap_sem);
>>+
>>+	kcb->kprobe_status |= UPROBE_SS_STACK;
>>+	err = copy_insn_onstack(uprobe, regs, flags);
>>+
>>+	down_write(&current->mm->mmap_sem);
>>+
>>+	if (err) {
>>+		kcb->kprobe_status |= UPROBE_SS_NEW_STACK;
>>+		err = copy_insn_on_new_page(uprobe, regs, vma);
>>+	}
>>+	if (err) {
>>+		kcb->kprobe_status |= UPROBE_SS_EXPSTACK;
>>+		err = copy_insn_onexpstack(uprobe, regs, vma);
>>+	}
>>+
>>+	up_write(&current->mm->mmap_sem);
>>+
>>+	if (err) {
>>+		kcb->kprobe_status |= UPROBE_SS_INLINE;
>>+		replace_original_insn(uprobe, regs, uprobe->kp.opcode);
>>+	}
>>+
>>+	 __get_cpu_var(singlestep_addr) = regs->eip;
>>+
>>+
>>+	return 0;
>>+}
>>+
>> static inline void prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
>> {
>> 	regs->eflags |= TF_MASK;
>> 	regs->eflags &= ~IF_MASK;
>> 	/*single step inline if the instruction is an int3*/
>>+
>> 	if (p->opcode == BREAKPOINT_INSTRUCTION)
>> 		regs->eip = (unsigned long)p->addr;
>>-	else
>>-		regs->eip = (unsigned long)&p->ainsn.insn;
>>+	else {
>>+		if (!kernel_text_address((unsigned long)p->addr))
>>+			uprobe_single_step(p, regs);
>>+		else
>>+			regs->eip = (unsigned long)&p->ainsn.insn;
>>+	}
>> }
>>
>> /* Called with kretprobe_lock held */
>>@@ -194,6 +527,7 @@ static int __kprobes kprobe_handler(stru
>> 	kprobe_opcode_t *addr = NULL;
>> 	unsigned long *lp;
>> 	struct kprobe_ctlblk *kcb;
>>+	unsigned seg = regs->xcs & 0xffff;
>> #ifdef CONFIG_PREEMPT
>> 	unsigned pre_preempt_count = preempt_count();
>> #endif /* CONFIG_PREEMPT */
>>@@ -208,14 +542,21 @@ static int __kprobes kprobe_handler(stru
>> 	/* Check if the application is using LDT entry for its code segment and
>> 	 * calculate the address by reading the base address from the LDT entry.
>> 	 */
>>-	if ((regs->xcs & 4) && (current->mm)) {
>>+
>>+	if (regs->eflags & VM_MASK)
>>+		addr = (kprobe_opcode_t *)(((seg << 4) + regs->eip -
>>+			sizeof(kprobe_opcode_t)) & 0xffff);
>>+	else if ((regs->xcs & 4) && (current->mm)) {
>>+		local_irq_enable();
>>+		down(&current->mm->context.sem);
>> 		lp = (unsigned long *) ((unsigned long)((regs->xcs >> 3) * 8)
>> 					+ (char *) current->mm->context.ldt);
>> 		addr = (kprobe_opcode_t *) (get_desc_base(lp) + regs->eip -
>> 						sizeof(kprobe_opcode_t));
>>-	} else {
>>+		up(&current->mm->context.sem);
>>+		local_irq_disable();
>>+	} else
>> 		addr = (kprobe_opcode_t *)(regs->eip - sizeof(kprobe_opcode_t));
>>-	}
>> 	/* Check we're not actually recursing */
>> 	if (kprobe_running()) {
>> 		p = get_kprobe(addr);
>>@@ -235,7 +576,6 @@ static int __kprobes kprobe_handler(stru
>> 			save_previous_kprobe(kcb);
>> 			set_current_kprobe(p, regs, kcb);
>> 			kprobes_inc_nmissed_count(p);
>>-			prepare_singlestep(p, regs);
>> 			kcb->kprobe_status = KPROBE_REENTER;
>> 			return 1;
>> 		} else {
>>@@ -307,8 +647,8 @@ static int __kprobes kprobe_handler(stru
>> 	}
>>
>> ss_probe:
>>-	prepare_singlestep(p, regs);
>> 	kcb->kprobe_status = KPROBE_HIT_SS;
>>+	prepare_singlestep(p, regs);
>> 	return 1;
>>
>> no_kprobe:
>>@@ -498,6 +838,33 @@ no_change:
>> 	return;
>> }
>>
>>+static void __kprobes resume_execution_user(struct uprobe *uprobe,
>>+				struct pt_regs *regs, struct kprobe_ctlblk *kcb)
>>+{
>>+	unsigned long delta;
>>+	struct uprobe_page *upage;
>>+
>>+	/*
>>+	 * TODO :need to fixup special instructions as done with kernel probes.
>>+	 */
>>+	delta = regs->eip - __get_cpu_var(singlestep_addr);
>>+	regs->eip = (unsigned long)(uprobe->kp.addr + delta);
>>+
>>+	if ((kcb->kprobe_status & UPROBE_SS_EXPSTACK) ||
>>+			(kcb->kprobe_status & UPROBE_SS_NEW_STACK)) {
>>+		upage = get_upage_current(current);
>>+		set_pte(upage->orig_pte, __pte(upage->orig_pte_val));
>>+		pte_unmap(upage->orig_pte);
>>+
>>+		upage->status = UPROBE_PAGE_FREE;
>>+		hlist_del(&upage->hlist);
>>+
>>+	} else if (kcb->kprobe_status & UPROBE_SS_INLINE)
>>+		replace_original_insn(uprobe, regs,
>>+				(kprobe_opcode_t)BREAKPOINT_INSTRUCTION);
>>+	regs->eflags &= ~TF_MASK;
>>+}
>>+
>> /*
>>  * Interrupts are disabled on entry as trap1 is an interrupt gate and they
>>  * remain disabled thoroughout this function.
>>@@ -510,16 +877,19 @@ static inline int post_kprobe_handler(st
>> 	if (!cur)
>> 		return 0;
>>
>>-	if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
>>-		kcb->kprobe_status = KPROBE_HIT_SSDONE;
>>+	if (!(kcb->kprobe_status & KPROBE_REENTER) && cur->post_handler) {
>>+		kcb->kprobe_status |= KPROBE_HIT_SSDONE;
>> 		cur->post_handler(cur, regs, 0);
>> 	}
>>
>>-	resume_execution(cur, regs, kcb);
>>+	if (!kernel_text_address((unsigned long)cur->addr))
>>+		resume_execution_user(__get_cpu_var(current_uprobe), regs, kcb);
>>+	else
>>+		resume_execution(cur, regs, kcb);
>> 	regs->eflags |= kcb->kprobe_saved_eflags;
>>
>> 	/*Restore back the original saved kprobes variables and continue. */
>>-	if (kcb->kprobe_status == KPROBE_REENTER) {
>>+	if (kcb->kprobe_status & KPROBE_REENTER) {
>> 		restore_previous_kprobe(kcb);
>> 		goto out;
>> 	}
>>@@ -547,7 +917,13 @@ static inline int kprobe_fault_handler(s
>> 		return 1;
>>
>> 	if (kcb->kprobe_status & KPROBE_HIT_SS) {
>>-		resume_execution(cur, regs, kcb);
>>+		if (!kernel_text_address((unsigned long)cur->addr)) {
>>+			struct uprobe *uprobe =  __get_cpu_var(current_uprobe);
>>+			/* TODO: Proper handling of all instruction */
>>+			replace_original_insn(uprobe, regs, uprobe->kp.opcode);
>>+			regs->eflags &= ~TF_MASK;
>>+		} else
>>+			resume_execution(cur, regs, kcb);
>> 		regs->eflags |= kcb->kprobe_old_eflags;
>>
>> 		reset_current_kprobe();
>>@@ -654,7 +1030,67 @@ int __kprobes longjmp_break_handler(stru
>> 	return 0;
>> }
>>
>>+static void free_alias(void)
>>+{
>>+	int cpu;
>>+
>>+	for_each_cpu(cpu) {
>>+		struct uprobe_page *upage;
>>+		upage = per_cpu_ptr(uprobe_page, cpu);
>>+
>>+		if (upage->alias_addr) {
>>+			set_pte(upage->alias_pte, __pte(upage->alias_pte_val));
>>+			kfree(upage->alias_addr);
>>+		}
>>+		upage->alias_pte = 0;
>>+	}
>>+	free_percpu(uprobe_page);
>>+	return;
>>+}
>>+
>>+static int alloc_alias(void)
>>+{
>>+	int cpu;
>>+
>>+	uprobe_page = __alloc_percpu(sizeof(struct uprobe_page));
[YM] Is this code trying to resolve the problem of a task switch during single-step? If so, the per-cpu data might still be used up, even though get_upage_free() walks the uprobe_pages of all cpus. I suggest allocating a pool of uprobe_pages, and allocating more when they are used up.




>>+
>>+	for_each_cpu(cpu) {
>>+		struct uprobe_page *upage;
>>+		upage = per_cpu_ptr(uprobe_page, cpu);
>>+		upage->alias_addr = kmalloc(PAGE_SIZE, GFP_USER);
[YM] Does kmalloc(PAGE_SIZE, ...) guarantee the result is page-aligned? How about using alloc_page?
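A sketch of that suggestion: alloc_page() hands back a whole, naturally
aligned page, so no alignment assumption about kmalloc is needed:

	struct page *pg = alloc_page(GFP_USER);
	if (!pg) {
		free_alias();
		return -ENOMEM;
	}
	/* lowmem page, so page_address() gives its page-aligned
	 * kernel virtual address */
	upage->alias_addr = page_address(pg);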


>>+		if (!upage->alias_addr) {
>>+			free_alias();
>>+			return -ENOMEM;
>>+		}
>>+		upage->alias_pte = lookup_address(
>>+					(unsigned long)upage->alias_addr);
>>+		upage->alias_pte_val = pte_val(*upage->alias_pte);
>>+		if (upage->alias_pte) {
[YM] If kmalloc returns a non-NULL address, upage->alias_pte will not be NULL, so this check can be deleted.


>>+			upage->status = UPROBE_PAGE_FREE;
>>+			set_pte(upage->alias_pte,
>>+						pte_mkdirty(*upage->alias_pte));
>>+			set_pte(upage->alias_pte,
>>+						pte_mkexec(*upage->alias_pte));
>>+			set_pte(upage->alias_pte,
>>+						 pte_mkwrite(*upage->alias_pte));
>>+			set_pte(upage->alias_pte,
>>+						pte_mkyoung(*upage->alias_pte));
>>+		}
>>+	}
>>+	return 0;
>>+}
>>+
>> int __init arch_init_kprobes(void)
>> {
>>+	int ret = 0;
>>+	/*
>>+	 * user space probes requires a page to copy the original instruction
>>+	 * so that it can single step if there is no free stack space, allocate
>>+	 * per cpu page.
>>+	 */
>>+
>>+	if ((ret = alloc_alias()))
>>+		return ret;
>>+
>> 	return 0;
>> }
>>diff -puN include/asm-i386/kprobes.h~kprobes_userspace_probes_ss_out-of-line include/asm-i386/kprobes.h
>>--- linux-2.6.16-rc1-mm5/include/asm-i386/kprobes.h~kprobes_userspace_probes_ss_out-of-line	2006-02-08 19:26:09.000000000 +0530
>>+++ linux-2.6.16-rc1-mm5-prasanna/include/asm-i386/kprobes.h	2006-02-08 19:26:10.000000000 +0530
>>@@ -42,6 +42,7 @@ typedef u8 kprobe_opcode_t;
>> #define JPROBE_ENTRY(pentry)	(kprobe_opcode_t *)pentry
>> #define ARCH_SUPPORTS_KRETPROBES
>> #define arch_remove_kprobe(p)	do {} while (0)
>>+#define UPROBE_PAGE_FREE 0x00000001
>>
>> void kretprobe_trampoline(void);
>>
>>@@ -74,6 +75,18 @@ struct kprobe_ctlblk {
>> 	struct prev_kprobe prev_kprobe;
>> };
>>
>>+/* per cpu uprobe page structure */
>>+struct uprobe_page {
>>+	struct hlist_node hlist;
>>+	pte_t *alias_pte;
>>+	pte_t *orig_pte;
>>+	unsigned long orig_pte_val;
>>+	unsigned long alias_pte_val;
[YM] I think the patch doesn't support CONFIG_X86_PAE: with CONFIG_X86_PAE=y, pte_t becomes 64 bits, so saving pte_val() into an unsigned long truncates the entry.
How about changing the above 2 members' types to pte_t directly?
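A sketch of that change; the restore in resume_execution_user() would
then pass the saved pte_t to set_pte() directly:

	struct uprobe_page {
		struct hlist_node hlist;
		pte_t *alias_pte;
		pte_t *orig_pte;
		pte_t orig_pte_val;	/* was unsigned long */
		pte_t alias_pte_val;	/* was unsigned long */
		void *alias_addr;
		struct task_struct *tsk;
		unsigned long status;
	};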



>>+	void *alias_addr;
>>+	struct task_struct *tsk;
>>+	unsigned long status;
>>+};
>>+
>> /* trap3/1 are intr gates for kprobes.  So, restore the status of IF,
>>  * if necessary, before executing the original int3/1 (trap) handler.
>>  */
>>diff -puN include/linux/kprobes.h~kprobes_userspace_probes_ss_out-of-line include/linux/kprobes.h
>>--- linux-2.6.16-rc1-mm5/include/linux/kprobes.h~kprobes_userspace_probes_ss_out-of-line	2006-02-08 19:26:09.000000000 +0530
>>+++ linux-2.6.16-rc1-mm5-prasanna/include/linux/kprobes.h	2006-02-08 19:26:10.000000000 +0530
>>@@ -45,11 +45,18 @@
>> #ifdef CONFIG_KPROBES
>> #include <asm/kprobes.h>
>>
>>+#define KPROBE_HASH_BITS 6
>>+#define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
>>+
>> /* kprobe_status settings */
>> #define KPROBE_HIT_ACTIVE	0x00000001
>> #define KPROBE_HIT_SS		0x00000002
>> #define KPROBE_REENTER		0x00000004
>> #define KPROBE_HIT_SSDONE	0x00000008
>>+#define UPROBE_SS_STACK		0x00000010
>>+#define UPROBE_SS_EXPSTACK	0x00000020
>>+#define UPROBE_SS_INLINE	0x00000040
>>+#define UPROBE_SS_NEW_STACK	0x00000080
>>
>> /* Attach to insert probes on any functions which should be ignored*/
>> #define __kprobes	__attribute__((__section__(".kprobes.text")))
>>diff -puN kernel/kprobes.c~kprobes_userspace_probes_ss_out-of-line kernel/kprobes.c
>>--- linux-2.6.16-rc1-mm5/kernel/kprobes.c~kprobes_userspace_probes_ss_out-of-line	2006-02-08 19:26:10.000000000 +0530
>>+++ linux-2.6.16-rc1-mm5-prasanna/kernel/kprobes.c	2006-02-08 19:26:10.000000000 +0530
>>@@ -42,9 +42,6 @@
>> #include <asm/errno.h>
>> #include <asm/kdebug.h>
>>
>>-#define KPROBE_HASH_BITS 6
>>-#define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
>>-
>> static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
>> static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
>> static struct list_head uprobe_module_list;
>>
>>_
>>--
>>Prasanna S Panchamukhi
>>Linux Technology Center
>>India Software Labs, IBM Bangalore
>>Email: prasanna@in.ibm.com
>>Ph: 91-80-51776329

