9740 – syscall.* probe causes kernel panic(double fault) on rawhide kernel/i686

Bug 9740 - syscall.* probe causes kernel panic(double fault) on rawhide kernel/i686

Summary: syscall.* probe causes kernel panic(double fault) on rawhide kernel/i686

Status:	RESOLVED FIXED

Alias:	None

Product:	systemtap
Classification:	Unclassified
Component:	kprobes (show other bugs)
Version:	unspecified

Importance:	P2 critical
Target Milestone:	---
Assignee:	Masami Hiramatsu

URL:
Keywords:

Depends on:
Blocks:

Reported:	2009-01-13 21:37 UTC by Masami Hiramatsu
Modified:	2009-02-06 16:38 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:

Attachments
First try log of git-bisect (failed) (939 bytes, text/plain) 2009-01-19 17:49 UTC, Masami Hiramatsu	Details
Second try log of git-bisect (succeeded) (837 bytes, text/plain) 2009-01-19 22:30 UTC, Masami Hiramatsu	Details
kconfig for reproducing this bug (18.47 KB, patch) 2009-01-23 16:14 UTC, Masami Hiramatsu	Details \| Diff
[PATCH] text_poke side bugfix (466 bytes, patch) 2009-01-26 21:30 UTC, Masami Hiramatsu	Details \| Diff
test module for this issue (2.82 KB, text/plain) 2009-01-28 02:09 UTC, Masami Hiramatsu	Details
Config on the test machine (17.85 KB, text/plain) 2009-02-02 17:04 UTC, Ananth Mavinakayanahalli	Details
call vm_unmap_aliases() after vunmap() in text_poke() (351 bytes, patch) 2009-02-04 13:16 UTC, Ananth Mavinakayanahalli	Details \| Diff
[PATCH] x86/kprobe : call kprobe_fault_handler after gpd fixup (566 bytes, patch) 2009-02-05 21:30 UTC, Masami Hiramatsu	Details \| Diff
Show Obsolete (1) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Masami Hiramatsu 2009-01-13 21:37:28 UTC

I've run stap -ve 'global pids; probe syscall.* {pids[pid()]<<<1}' on
2.6.29-0.31.rc1.git2.fc11.i686 and got below double fault.
---
PANIC: double fault, gdt at c34d3000 [255 bytes]
double fault, tss at c3619d80
eip = c06e4b9d, esp = f1a48f98
eax = f1a490d0, ebx = f8c5b1f0, ecx = 0000007b, edx = 00000000
esi = c0537f0e, edi = 00000000
---

I could reproduce this fault and it always happened the 2nd time of staprun. So,
how to reproduce is;
1. run stap -ve 'global pids; probe syscall.* {pids[pid()]<<<1}'
2. stop it (if you see a privilege error, don't care or remove stap module by
staprun -d)
3. run above command again.

And "c06e4b8f T do_page_fault".

c06e4b8f <do_page_fault>:
c06e4b8f:       55                      push   %ebp
c06e4b90:       89 e5                   mov    %esp,%ebp
c06e4b92:       57                      push   %edi
c06e4b93:       89 d7                   mov    %edx,%edi
c06e4b95:       56                      push   %esi
c06e4b96:       53                      push   %ebx
c06e4b97:       81 ec 24 01 00 00       sub    $0x124,%esp
c06e4b9d:       89 85 d0 fe ff ff       mov    %eax,-0x130(%ebp)

Comment 1 Masami Hiramatsu 2009-01-13 21:47:41 UTC

I disabled unregister_kprobes() and tested, but same double fault happened.

Comment 2 Masami Hiramatsu 2009-01-13 22:30:28 UTC

This might be related to synchronize_sched() changes.
Here, the syscall2.ko is the pre-compiled script of 'global pids; probe
syscall.* {pids[pid()]<<<1}' *without* batch unregister.

$ time staprun -d syscall2.ko

real	0m0.005s
user	0m0.000s
sys	0m0.003s

This machine has 4 cpu and syscall2.ko has 318 probes. this means when removing
syscall2.ko, kprobes may call synchronize_sched() 318 times. Usually, it needs
more than 100ms because it doesn't use unregister_kprobes().

Comment 3 Masami Hiramatsu 2009-01-13 23:07:12 UTC

Here, I checked unloading time on 2.6.28-mm1 on 2CPU machine.

$ time staprun -d syscall2.ko 

real	0m0.340s
user	0m0.000s
sys	0m0.014s

As you can see, real took over 340ms because synchronize_sched() waits scheduler
synchronization.

Comment 4 Masami Hiramatsu 2009-01-13 23:21:14 UTC

even if I use a vanilla 2.6.28-rc1 kernel(use same kconfig), double fault
occurred...

$ time staprun -d syscall2.ko 

real	0m0.006s
user	0m0.000s
sys	0m0.003s

Comment 5 Masami Hiramatsu 2009-01-14 16:57:32 UTC

My deeper investigation showed that this bug isn't related to synchronize_sched()...
I've gotten below result of 'staprun -d' on 2.6.29-rc1 but it caused double fault.

real	0m0.629s
user	0m0.000s
sys	0m0.025s

Comment 6 Masami Hiramatsu 2009-01-15 00:47:36 UTC

Hmm, I could narrowed down this bug occurs on the 2.6.29-rc1 kernel with
CONFIG_HIGHMEM4G=y, but not occur with CONFIG_HIGHMEM64G=y(and CONFIG_X86_PAE=y,
of course).

It is needed to check other configurations, e.g. 
- CONFIG_NOHIGHMEM=y case
- before 2.6.28 kernel.

Comment 7 Masami Hiramatsu 2009-01-15 04:46:13 UTC

(In reply to comment #6)
> It is needed to check other configurations, e.g. 
> - CONFIG_NOHIGHMEM=y case
> - before 2.6.28 kernel.

Unfortunately, this happened on 2.6.28 too.

Comment 8 Masami Hiramatsu 2009-01-15 05:29:51 UTC

(In reply to comment #7)
> (In reply to comment #6)
> > It is needed to check other configurations, e.g. 
> > - CONFIG_NOHIGHMEM=y case

This config didn't cause kernel panic. So, that the problem is CONFIG_HIGHMEM4G.

Comment 9 Masami Hiramatsu 2009-01-19 17:49:24 UTC

Created attachment 3671 [details]
First try log of git-bisect (failed)

I tried to narrow it down by git-bisect. Here is the log.
Unfortunately, the double fault still happened after reverting
7c88db0cb589df980acfb2f73c3595a0653004ec commit. So I'm trying bisect again...

Comment 10 Masami Hiramatsu 2009-01-19 22:30:31 UTC

Created attachment 3672 [details]
Second try log of git-bisect (succeeded)

OK, I succeeded to narrow down the bug. According to this bisect log, this bug
was caused by below commit;
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=db64fe02258f1507e13fe5212a989922323685ce


... that's a big patch.

Comment 11 Masami Hiramatsu 2009-01-20 00:15:43 UTC

I've run stap -ve 'probe syscall.* {}' on that kernel, and it also caused kernel
crash.

Comment 12 Masami Hiramatsu 2009-01-20 00:26:20 UTC

(In reply to comment #11)
> I've run stap -ve 'probe syscall.* {}' on that kernel, and it also caused kernel
> crash.

When I ran a module which just register/unregister 317 kprobes(based on
syscall.*), it also crashed kernel. So this bug is kprobes bug or kernel bug,
not systemtap.

Comment 13 Masami Hiramatsu 2009-01-20 00:46:50 UTC

(In reply to comment #12)
> When I ran a module which just register/unregister 317 kprobes(based on
> syscall.*), it also crashed kernel. So this bug is kprobes bug or kernel bug,
> not systemtap.

If I commented out kprobes related functions, it didn't cause kernel crash.

Comment 14 Masami Hiramatsu 2009-01-23 16:14:46 UTC

Created attachment 3682 [details]
kconfig for reproducing this bug

This bug still occur on 2.6.29-rc2 with attached kconfig.

Comment 15 Masami Hiramatsu 2009-01-23 18:09:12 UTC

kprobes uses vmap/vunmap only in text_poke(). So this function should trigger crash.
When I tested to change vmap/vunmap to vm_map_ram/vm_unmap_ram, the kernel
didn't crash.

Comment 16 Masami Hiramatsu 2009-01-23 18:49:34 UTC

(In reply to comment #14)
> This bug still occur on 2.6.29-rc2 with attached kconfig.
> 

Oops, this was my test failure(forgot to update test program).
sorry for confusion, this bug seems not to be reproducable on 2.6.29-rc2.

Comment 17 Masami Hiramatsu 2009-01-26 15:39:05 UTC

(In reply to comment #16)
> Oops, this was my test failure(forgot to update test program).
> sorry for confusion, this bug seems not to be reproducable on 2.6.29-rc2.

Hmm, from the result of deeper (bisect) investigation, this issue hasn't fixed
yet on 2.6.29-rc2. Sorry about that.

I'll post a bugfix patch. however, I still can't find what causes double fault.

Comment 18 Masami Hiramatsu 2009-01-26 21:30:39 UTC

Created attachment 3689 [details]
[PATCH] text_poke side bugfix

This patch fixes this issue. However, this patch might not fix the root cause
of this issue ultimately, because I haven't find that the mechanism of this
issue yet.
This patch just modifies text_poke() to work as in old days.

Comment 19 Ananth Mavinakayanahalli 2009-01-27 07:02:45 UTC

If this is a problem with vmap and its interaction with text_poke(), I guess
other users of text_poke will also run into problems. Maybe its good to bring
this issue to Mathieu's notice?

Maybe its time to move this issue to lkml.

Comment 20 Masami Hiramatsu 2009-01-27 14:21:57 UTC

(In reply to comment #19)
> If this is a problem with vmap and its interaction with text_poke(), I guess
> other users of text_poke will also run into problems. Maybe its good to bring
> this issue to Mathieu's notice?

Actually, other users of text_poke() are "alternative" functions which changes
spinlock code and so on. Since they have done while booting kernel, kernel might
be not affected so much. anyway, I agree that this issue is worth reporting to
Mathieu and Nick Piggin.

> Maybe its time to move this issue to lkml.

Sure. Thanks,

Comment 21 Masami Hiramatsu 2009-01-28 02:03:57 UTC

Hmm, I've found one cause of this bug helped by kdump & crash (thanks!)

I found that repeating page faults caused by combination of kprobe_fault_handler
and pte-fixup which is a special routine for x86 kernel with CONFIG_HIGHMEM4G.
Double fault and memory corruptions were collaterally caused by stack overflow.

f0e11e80:  f0e11e90 trace_hardirqs_off_caller+24 kpfh     kp+12480 
f0e11e90:  f0e11ef4 kpfh     kp+12480 do_page_fault 
f0e11ea0:  f0e11ef4 /error_code+119/ {kpfh     0000000e 
f0e11eb0:  [selinux_inode_security] kp+12480 [selinux_inode_security] f0e11ef4 
f0e11ec0:  kp+12480 0000007b 0000007b 000000d8 
f0e11ed0:  ffffffff kpfh(%eip)     00000060 00010082} <Nth page fault occurred
on fault handler...>
 f0e11ee0:  kprobe_fault_handler+167 0000000e kpfh     kpfh     
f0e11ef0:  00000000 [selinux_inode_security] do_page_fault+123 [selinux_inode_se
curity] 
f0e11f00:  f4949434 f00729c0 f4949400 00000000 
f0e11f10:  00000000 00000000 00000000 00000000 
f0e11f20:  00000000 00000000 00000000 00000000 
f0e11f30:  00000000 00000000 00000000 00000000 
f0e11f40:  00000000 00000000 00000000 00000000 
f0e11f50:  00000000 00000000 00000000 00000000 
f0e11f60:  00000000 00000000 00000000 00000000 
f0e11f70:  00000000 00000000 00000000 00000000 
...
f0e17ce0:  trace_hardirqs_off_caller+24 kpfh     kp+12480 f0e17d50 
f0e17cf0:  kpfh     kp+12480 do_page_fault f0e17d50 
f0e17d00:  /error_code+119/ {kpfh     0000000e f0e17e90 
f0e17d10:  kp+12480 f0e17e90 f0e17d50 kp+12480 
f0e17d20:  012d007b 0000007b [skbuff_fclone_cache] ffffffff 
f0e17d30:  kpfh(%eip)     00000060 00010082}(regs) <2nd page fault occurred on
fault handler due to pte-fixup!>

           kprobe_fault_handler+167 
f0e17d40:  0000000e kp+12480 kph      00000000 
f0e17d50:  f0e17e88 do_page_fault+123 f0e17e90 f4949434 
f0e17d60:  f00729c0 f4949400

---<below 256bytes are previous data on stack>---
           000005a8 00000000 
f0e17d70:  00000001 00000003 ac46cfa5 [TCP]    
f0e17d80:  00000246 f0e17dc0 00000246 f0e17db0 
f0e17d90:  00000246 release_sock+40 f0e17da8 00000002 
f0e17da0:  00000000 f4949464 f00729c0 00000002 
f0e17db0:  00000000 f4949464 f00729c0 f0e17dc4 
f0e17dc0:  local_bh_enable_ip+8 f0e17dd0 00000001 00000000 
f0e17dd0:  rcu_lock_map f00729c0 [skbuff_fclone_cache] [skbuff_fclone_cache] 
f0e17de0:  00000246 f4949464 f0e17e64 f0e17e10 
f0e17df0:  00000246 f4949464 f0e17e70 f0e17e20 
f0e17e00:  00000246 00000001 00000002 00000000 
f0e17e10:  00000000 might_fault+67 00000246 00000000 
f0e17e20:  00000246 0000000e b9df0980 0000000e 
f0e17e30:  b9df0980 f0e17e70 f0e17e4c set_fd_set+41 
f0e17e40:  00000004 00000001 0000000e f0e17f84 
f0e17e50:  core_sys_select+418 b9df0980 b9df0990 f0e17e60 
f0e17e60:  00002748 00000000 
----

           f0e17e78 trace_hardirqs_off_caller+24 
f0e17e70:  kp+12480 f0e17f7c f0e17ee0 kp+12480(%ebx) 
f0e17e80:  f0e17f7c(%esi) do_page_fault(%edi) f0e17ee0(%ebp) /error_code+119/
f0e17e90:  {kp+12480 kph      f0e17f7c f0e17f7c 
f0e17ea0:  cbf6d5e0 f0e17ee0 kp+12480 0000007b 
f0e17eb0:  0000007b 000000d8 ffffffff(orig_ax) kph(%eip)      
f0e17ec0:  00000060 00010086(flags)}(regs) <page fault occurred, due to pte fixup!>

---<this is normal kprobe handler call>---
           /kprobe_exceptions_notify+467/ f0e17f74 
f0e17ed0:  do_sync_write+171 kprobe_exceptions_nb ffffffff 00000000 
f0e17ee0:  f0e17f00 notifier_call_chain+43 f0e17f44 00000002 
f0e17ef0:  trace_die_notifier die_chain+36 f0e17f44 00000002 
f0e17f00:  f0e17f2c __atomic_notifier_call_chain+57 ffffffff 00000000 
f0e17f10:  00000002 00000001 00000000 __atomic_notifier_call_chain 
f0e17f20:  f0e17f7c 00000000 b7b28ff4 f0e17f3c 
f0e17f30:  atomic_notifier_call_chain+12 ffffffff 00000000 f0e17f58 
f0e17f40:  notify_die+45 f0e17f7c kallsyms_token_index+5776 00000000 
f0e17f50:  00000003 00000005 f0e17f74 do_int3+31 
f0e17f60:  00000000 00000003 00000005 00000000 
f0e17f70:  00000008 f0e17000 int3+44  00000000 
f0e17f80:  0002498e 0114d071 00000008 b7b28ff4 
f0e17f90:  f0e17000 000000af 0000007b 4000007b 
f0e17fa0:  000000d8 ffffffff sys_rt_sigprocmask+1 00000060 
f0e17fb0:  00000283 sysenter_do_call+18 00000000 bfa29638 
f0e17fc0:  bfa296b8 00000008 b7b28ff4 bfa29618 
f0e17fd0:  000000af 0000007b 0000007b _text    
f0e17fe0:  000000af b7fb9424 00000073 00000246 
f0e17ff0:  bfa2957c 0000007b 00000000 00000000

Comment 22 Masami Hiramatsu 2009-01-28 02:09:35 UTC

Created attachment 3692 [details]
test module for this issue

BTW, this plain-c code can cause memory corruption. Even if we disables fault
handler(please test it), kernel causes a crash.

Comment 23 Masami Hiramatsu 2009-01-29 16:38:56 UTC

Here are my test hw environments (the issue happened on both with uploaded kconfig)

- AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ (2core)
 - 4GB RAM
 - Fedora 10

- Intel(R) Xeon(R) CPU           E5410  @ 2.33GHz (4core)
 - 2GB RAM
 - Fedora 10

Comment 24 Ananth Mavinakayanahalli 2009-01-30 08:35:12 UTC

Masami,
I tested your module on a 4 proc x86-32 xeon box and without the text_poke
patch, I hit the kprobe fault handler recursive failure issue:
fault occurred on kprobes at c042a7fa(@c04413cb:14)
fault occurred on kprobes at c042a7fa(@c04413cb:14)
fault occurred on kprobes at c042a7fa(@c04413cb:14)
fault occurred on kprobes at c042a7fa(@c04413cb:14)

With the text_poke patch too, I see a hard hang. Haven't investigated yet what
the issue could be. So yes, there is a potential vm issue with HIGHMEM4G and a
kprobe fault handler issue on x86.

Comment 25 Masami Hiramatsu 2009-01-30 22:57:27 UTC

(In reply to comment #24)
> Masami,
> I tested your module on a 4 proc x86-32 xeon box and without the text_poke
> patch, I hit the kprobe fault handler recursive failure issue:
> fault occurred on kprobes at c042a7fa(@c04413cb:14)
> fault occurred on kprobes at c042a7fa(@c04413cb:14)
> fault occurred on kprobes at c042a7fa(@c04413cb:14)
> fault occurred on kprobes at c042a7fa(@c04413cb:14)

So did your kernel hang after that?

> With the text_poke patch too, I see a hard hang. Haven't investigated yet what
> the issue could be. So yes, there is a potential vm issue with HIGHMEM4G and a
> kprobe fault handler issue on x86.

Hmm, so my patch could not help your case.

Thanks!

Comment 26 Ananth Mavinakayanahalli 2009-02-02 09:10:33 UTC

Subject: Re:  syscall.* probe causes kernel panic(double
	fault) on rawhide kernel/i686

 
> So did your kernel hang after that?

Yes

> > With the text_poke patch too, I see a hard hang. Haven't investigated yet what
> > the issue could be. So yes, there is a potential vm issue with HIGHMEM4G and a
> > kprobe fault handler issue on x86.
> 
> Hmm, so my patch could not help your case.

Actually, it runs for some time before hanging. I am not sure what the
precise issue is, but the system does go unresponsive.

Ananth

Comment 27 Masami Hiramatsu 2009-02-02 13:45:53 UTC

(In reply to comment #26)
> Actually, it runs for some time before hanging. I am not sure what the
> precise issue is, but the system does go unresponsive.

Could you attach your kconfig if possible? I'd like to reproduce it with my patch.

Comment 28 Ananth Mavinakayanahalli 2009-02-02 15:24:41 UTC

Sorry for the false alarm Masami. Looks like the machine on which I tested the
patch earlier had some HDD issues that may have played into the failure.

I now tested with and without your patch on a 4 processor P3 with 4G RAM with
HIGHMEM4G set and the problem doesn't occur with your patch applied.

I'll try to test it on another machine tomorrow just to make sure the problem is
indeed fixed with your patch.

Thanks for your work!

Comment 29 Ananth Mavinakayanahalli 2009-02-02 16:40:36 UTC

Hmm.. the other box also has some hardware issues. However, in the failure case,
I was able to get more info on this box:

BUG: unable to handle kernel <1>BUG: Bad page map in process gdm-rh-security 
pte:d6c162e8 pmd:16c16067
addr:008b2cc6 vm_flags:08000075 anon_vma:(null) mapping:ef4c3b3c index:c
vma->vm_ops->fault: filemap_fault+0x0/0x34e
vma->vm_file->f_op->mmap: generic_file_mmap+0x0/0x3b
Pid: 4742, comm: gdm-rh-security Not tainted 2.6.29-rc3 #1
Call Trace:
 [<c0461dc8>] print_bad_pte+0x165/0x176
 [<c046319c>] handle_mm_fault+0x22b/0x56f
 [<c063508d>] do_page_fault+0x275/0x542
 [<c0634e18>] do_page_fault+0x0/0x542
 [<c0633792>] error_code+0x72/0x78
PANIC: double fault, gdt at c961f000 [255 bytes]
double fault, tss at c9623180
eip = c063463d, esp = d6800000
eax = d6800028, ebx = d6c0fd68, ecx = d6800028, edx = 0000000e
esi = f1fdbece, edi = 00000002
PANIC: double fault, gdt at c9604000 [255 bytes]
double fault, tss at c9608180
eip = c0633725, esp = de800000
eax = 000101fa, ebx = 00010082, ecx = c04eefb0, edx = 0001037a
esi = c04eefb0, edi = 00000009
paging request<0>PANIC: double fault, gdt at c960d000 [255 bytes]
double fault, tss at c9611180
eip = c04eefb0, esp = ec000004
eax = 000101fa, ebx = 00010082, ecx = c04eefb0, edx = 0001037a
esi = c04eefb0, edi = 00000000
 at 402f542c
IP: [<c04f1fd1>] rb_next+0x18/0x34
*pde = 00000000 
Oops: 0000 [#1] SMP

Comment 30 Ananth Mavinakayanahalli 2009-02-02 17:04:19 UTC

Created attachment 3703 [details]
Config on the test machine

Config from the test system. The m/c did have hardware issues, but this is the
config used on the machine.

Comment 31 Masami Hiramatsu 2009-02-03 01:28:35 UTC

Here is the test result on the latest kernel(2.6.29-rc3)

-UP means smp kernel & maxcpus=1.
-test1 means kprobes syscall probe module without fault handler
-test2 means kprobes syscall probe module with fault handler

Without my patch:
  UP
   - test1 passed 128 times load/unload.
   - test2 failed at 69th
  SMP
   - test1 failed at 7th
   - test2 failed at 4th

With my patch:
  UP
   - both tests passed 128 times load/unload.
  SMP
   - both tests passed 128 times load/unload.

So, the bug may hide in the gap between vunmap and vm_unmap_ram.

Comment 32 Masami Hiramatsu 2009-02-03 16:28:54 UTC

Hmm, calling vm_unmap_aliases() right after vunmap() in text_poke() could also
solve this bug.

Comment 33 Ananth Mavinakayanahalli 2009-02-04 09:42:14 UTC

Subject: Re:  syscall.* probe causes kernel panic(double
	fault) on rawhide kernel/i686

On Tue, Feb 03, 2009 at 04:28:54PM -0000, mhiramat at redhat dot com wrote:
> Hmm, calling vm_unmap_aliases() right after vunmap() in text_poke() could also
> solve this bug.

One other way of fixing (errr.. working around) the problem is to revert
back to our earlier mechanism that did not use text_poke(), atleast
until the actual problem gets fixed.

Comment 34 Ananth Mavinakayanahalli 2009-02-04 10:21:40 UTC

Actually, my suggestion above won't work since kernel text is now write
protected :-(

Comment 35 Ananth Mavinakayanahalli 2009-02-04 13:16:34 UTC

Created attachment 3713 [details]
call vm_unmap_aliases() after vunmap() in text_poke()

Masami,
You were right. Looking at Nick's patch as well as the usage of
vm_unmap_aliases(), it does seem as though just a vunmap() will leave stale
vmaps around for lazy flushing and that could be a problem.

I have tested the patch on a system that used to crash earlier and it now works
without problems.

Comment 36 Frank Ch. Eigler 2009-02-04 15:59:07 UTC

It seems to me that an urgent patch to LKML for 2.6.28 and 2.6.29-rc is in order.

Comment 37 Masami Hiramatsu 2009-02-04 16:43:58 UTC

Hi Ananth,

(In reply to comment #35)
> Created an attachment (id=3713)
> call vm_unmap_aliases() after vunmap() in text_poke()

This patch is good for me too.

Acked-by: Masami Hiramatsu <mhiramat@redhat.com>

Thanks!

Comment 38 Ananth Mavinakayanahalli 2009-02-05 05:16:14 UTC

Maybe I spoke too early...

Even though the above patch improves the situation (no double fault due to vmap)
and tests run longer, I do see a kprobe fault handler bug presisting:

registering...registered
unregistering...fault occu kprobes at c042a7fa(@c04413cb:14)
<4>fault occurred on kprobes at c042a7fa(@c04413cb:14)
fault occurred on kprobes at c042a7fa(@c04413cb:14)
fault occurred on kprobes at c042a7fa(@c04413cb:14)
fault occurred on kprobes at c042a7fa(@c04413cb:14)
fault occurred on kprobes at c042a7fa(@c04413cb:14)
fault occurred on kprobes at c042a7fa(@c04413cb:14)
fault occurred on kprobes at c042a7fa(@c04413cb:14)
fault occurred on kprobes at c042a7fa(@c04413cb:14)
...
The machine then goes unresponsive.

Comment 39 Ananth Mavinakayanahalli 2009-02-05 06:21:51 UTC

On a kernel with Masami's vm_(un)map_ram() and running tesmod in a loop, I hit
the following which could be the cause of the fault(?):

kernel BUG at mm/vmalloc.c:843!
invalid opcode: 0000 [#1] SMP 
last sysfs file:
/sys/devices/pci0000:05/0000:05:07.0/host0/target0:0:0/0:0:0:0/vendor
Modules linked in: genkprobe(+) autofs4 dm_mirror dm_region_hash dm_log
dm_multipath dm_mod sbs sbshc battery ac lp joydev sr_mod ide_cd_mod sg floppy
parport_pc i2c_piix4 button tg3 cdrom serio_raw parport e1000 ibmasm pcspkr
i2c_core libphy e100 mii usb_storage aic7xxx mptspi mptscsih mptbase
scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last
unloaded: genkprobe]

Pid: 10470, comm: insmod Not tainted (2.6.29-rc3 #2) eserver xSeries 235 -[86714AX]-
EIP: 0060:[<c0469cd4>] EFLAGS: 00210246 CPU: 1
EIP is at vm_unmap_ram+0x85/0x181
EAX: 00000000 EBX: fe39a000 ECX: 00000000 EDX: 00000001
ESI: 00000000 EDI: 00000001 EBP: 00000001 ESP: f664aef8
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process insmod (pid: 10470, ti=f664a000 task=f38a7200 task.ti=f664a000)
Stack:
 00000002 f664af30 fe39acba c0632166 f664af2f c04becb9 c10097c0 c10097e0
 00000000 f89f3ba0 0069bff4 00000000 c0632919 cc9f0071 c0633a54 00000000
 00000099 00000000 0069bff4 f89f0071 f89f00a4 f89f6480 c040112c 00000000
Call Trace:
 [<c0632166>] text_poke+0xe4/0x113
 [<c04becb9>] sys_msgrcv+0x0/0x4d
 [<c0632919>] arch_arm_kprobe+0x19/0x1b
 [<c0633a54>] register_kprobe+0x265/0x2ad
 [<f89f0071>] init_module+0x0/0x5f [genkprobe]
 [<f89f00a4>] init_module+0x33/0x5f [genkprobe]
fault occurred on kprobes at c042a68f(@f89f0010:14)

Comment 40 Masami Hiramatsu 2009-02-05 15:03:20 UTC

I think this is not only vm issue, but also fault handling issue.

Comment 41 Masami Hiramatsu 2009-02-05 17:47:44 UTC

I'd like to suggest a systemtap-side workaround for this bug.

Execute below code before registering kprobes.
--
dummy = alloc_vm_area(PAGE_SIZE);
free_vm_area(dummy);
--

Alloc_vm_area() calls vmalloc_sync_all() internally, which synchronizes all
process PGDs to the kernel PGD. Thus, no more on-demand PTE fixup occurs on the
kprobe handlers.

I've checked this worked fine on the 2.6.29-rc3 kernel.

Comment 42 Frank Ch. Eigler 2009-02-05 18:48:31 UTC

> dummy = alloc_vm_area(PAGE_SIZE);
> free_vm_area(dummy);

Workaround tested and committed, 482472c

Comment 43 Masami Hiramatsu 2009-02-05 21:30:19 UTC

Created attachment 3721 [details]
[PATCH] x86/kprobe : call kprobe_fault_handler after gpd fixup

This patch solves this issue as far as I tested. Please test it!

Comment 44 Ananth Mavinakayanahalli 2009-02-06 05:18:27 UTC

Verified that the patch above does fix the double fault problem without the need
for the vm fixes.

Comment 45 Masami Hiramatsu 2009-02-06 16:38:34 UTC

The kernel patch was merged to linus tree.

9be260a646bf76fa418ee519afa10196b3164681