In new linux kernel version, kretprobe in IA32 is implemented in kretprobe_trampoline. And probepoint is removed from kretprobe_trampoline, instead trampoline_handler is called directly. Currently if kretprobe hander hit one trap which causes another kretprobe, there will be SPINLOCK recursive bug. This patch fixes this, and will skip trap during kretprobe handler execution. And the test case is the same in http://sourceware.org/bugzilla/show_bug.cgi?id=2071. --- 2.6.16-rc6-mm1.org/arch/i386/kernel/kprobes.c 2006-03-13 12:25:15.000000000 +0800 +++ 2.6.16-rc6-mm1/arch/i386/kernel/kprobes.c 2006-03-13 11:38:26.000000000 +0800 @@ -389,9 +389,11 @@ fastcall void *__kprobes trampoline_hand /* another task is sharing our hash bucket */ continue; - if (ri->rp && ri->rp->handler) + if (ri->rp && ri->rp->handler){ + __get_cpu_var(current_kprobe) = &ri->rp->kp; ri->rp->handler(ri, regs); - + __get_cpu_var(current_kprobe) = NULL; + } orig_ret_address = (unsigned long)ri->ret_addr; recycle_rp_inst(ri);
The patch looks like a hack and disables kprobe within the kret handler. A thorough approach is to change kretprobe_inst table as a list per task_struct and delete the spin lock in trampoline_handler, then estimate if it's a reentrancy for current task. This approach has better scalability and also solve this issue.
Another approach is to use a new per cpu data to record current in trampoline_handler and compare if it's a reentrancy of current task. It's simpler than the first approach.
Sorry, the second approach has a problem. When trampline_handler is entered, it must execute the corresponding kret handler.
My understanding is that Bibo has fixed this. Please append an update, Bibo. Thanks.
Currently if kretprobe handler hit one trap which causes another kretprobe, there will be SPINLOCK recursive bug. This patch fixes this, and will skip trap during kretprobe handler execution. And this patch is based on 2.6.16-rc6-mm2. --- arch/i386/kernel/kprobes.c.bak 2006-03-21 10:35:34.000000000 +0800 +++ arch/i386/kernel/kprobes.c 2006-03-21 10:37:44.000000000 +0800 @@ -390,8 +390,11 @@ fastcall void *__kprobes trampoline_hand /* another task is sharing our hash bucket */ continue; - if (ri->rp && ri->rp->handler) + if (ri->rp && ri->rp->handler){ + __get_cpu_var(current_kprobe) = &ri->rp->kp; ri->rp->handler(ri, regs); + __get_cpu_var(current_kprobe) = NULL; + } orig_ret_address = (unsigned long)ri->ret_addr; recycle_rp_inst(ri);
Bibo's fix below is in the upstream kernel... closing.