I've recently added more tests to conversions.exp, testing invalid memory accesses from more contexts, like tracepoints, timer.profile probes, and perf probes.

On ia64 (2.6.18-308.1.1.el5), the tracepoint test in conversions.exp hangs and cannot be killed. Sysrq-t doesn't show anything interesting.

Platforms where conversions.exp passes correctly are:

x86_64: 2.6.9-100.EL, 2.6.18-308.el5, 2.6.32-220.13.1.el6.x86_64, 3.5.0-0.rc1.git0.1.fc18.x86_64
ia32: 2.6.18-308.8.2.el5, 2.6.32-220.13.1.el6.i686, 3.5.0-0.rc1.git0.1.fc18.i686.PAE
s390x: 2.6.18-308.el5, 2.6.32-278.el6.s390x
ppc64: 2.6.18-308.el5, 2.6.32-278.el6.ppc64
Here's some additional information. For all the tests, we test 3 addresses: 0, 0xffffffff, and 0xffffffffffffffff. I only get the hang with 0.

With a debug kernel (2.6.18-308.8.2.el5debug) I get a usable backtrace. Note that all 3 addresses listed above get a similar backtrace.

====
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1

Call Trace:
 [<a000000100013b40>] show_stack+0x40/0xa0
                                sp=e0000004efea7900 bsp=e0000004efea1550
 [<a000000100013bd0>] dump_stack+0x30/0x60
                                sp=e0000004efea7ad0 bsp=e0000004efea1538
 [<a000000100069440>] __might_sleep+0x1c0/0x1e0
                                sp=e0000004efea7ad0 bsp=e0000004efea1510
 [<a0000001000bb4e0>] down_read+0x20/0x60
                                sp=e0000004efea7ad0 bsp=e0000004efea14f0
 [<a000000100691130>] ia64_do_page_fault+0x110/0xa40
                                sp=e0000004efea7ad0 bsp=e0000004efea14a0
 [<a00000010000bfe0>] __ia64_leave_kernel+0x0/0x280
                                sp=e0000004efea7b80 bsp=e0000004efea14a0
 [<a000000207bbc720>] probe_2030+0x2e0/0x6e0 [stap_7da10598964d0c097738bae7f9532b0a_11484]
                                sp=e0000004efea7d50 bsp=e0000004efea1430
 [<a000000207bc69e0>] enter_real_tracepoint_probe_0+0x3e0/0x7c0 [stap_7da10598964d0c097738bae7f9532b0a_11484]
                                sp=e0000004efea7d50 bsp=e0000004efea1408
 [<a000000207bb0760>] enter_tracepoint_probe_0+0x20/0x40 [stap_7da10598964d0c097738bae7f9532b0a_11484]
                                sp=e0000004efea7d60 bsp=e0000004efea13e8
 [<a000000100687580>] schedule+0x1680/0x20e0
                                sp=e0000004efea7d60 bsp=e0000004efea1320
 [<a00000010007ede0>] do_syslog+0x240/0x8a0
                                sp=e0000004efea7df0 bsp=e0000004efea12d0
 [<a000000100219400>] kmsg_read+0x80/0xc0
                                sp=e0000004efea7e20 bsp=e0000004efea12a0
 [<a00000010020cef0>] proc_reg_read+0x130/0x180
                                sp=e0000004efea7e20 bsp=e0000004efea1250
 [<a000000100180fc0>] vfs_read+0x200/0x3a0
                                sp=e0000004efea7e20 bsp=e0000004efea1200
 [<a000000100181690>] sys_read+0x70/0xe0
                                sp=e0000004efea7e20 bsp=e0000004efea1180
 [<a00000010000bd70>] __ia64_trace_syscall+0xd0/0x110
                                sp=e0000004efea7e30 bsp=e0000004efea1180
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e0000004efea8000 bsp=e0000004efea1180
====
Try wrapping kread and friends in a pagefault_disable() / pagefault_enable() pair, which in theory should set in_atomic()=1 and thus send the fault down the no_context: branch of ia64_do_page_fault instead of into down_read(). At that point our exception handlers should handle it.
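To make the reasoning concrete, here is a small userspace model of that control flow. It is only a sketch and not kernel code: every `model_*` name is hypothetical, and it assumes that pagefault_disable() raises the count that in_atomic() tests, as the suggestion above expects.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for kernel state; nothing here is kernel API. */
static int model_preempt_count;   /* models the count that in_atomic() tests */

static void model_pagefault_disable(void) { model_preempt_count++; }
static void model_pagefault_enable(void)  { model_preempt_count--; }
static bool model_in_atomic(void)         { return model_preempt_count != 0; }

enum model_fault_path {
    MODEL_FAULT_MAY_SLEEP,  /* handler would call down_read(), which may sleep */
    MODEL_FAULT_FIXUP       /* handler takes no_context and uses the exception fixup */
};

/* Models only the first decision in the page fault handler. */
static enum model_fault_path model_page_fault(void)
{
    if (model_in_atomic())
        return MODEL_FAULT_FIXUP;
    return MODEL_FAULT_MAY_SLEEP;
}

/* Models a kread-style access to a bad address, wrapped as suggested. */
static enum model_fault_path model_guarded_read(void)
{
    enum model_fault_path path;

    model_pagefault_disable();
    path = model_page_fault();   /* the faulting access happens here */
    model_pagefault_enable();
    return path;
}
```

In this model an unguarded fault with a zero count reaches the sleeping path, which matches the in_atomic():0 line in the backtrace, while the guarded read always reaches the fixup path.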
On ia64, when in an atomic context (either in_atomic() or irqs_disabled() returns true), we now disable page faults when calling __stp_strncpy_from_user(), uderef(), or __stp_get_user(). Fixed in commit 6f8ab46.
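The shape of that conditional guard can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the code from the commit: `struct model_ctx` and all `model_*` functions are hypothetical stand-ins, and the actual macros in the commit may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the probe's execution context; not kernel state. */
struct model_ctx {
    bool in_atomic;          /* what in_atomic() would return */
    bool irqs_disabled;      /* what irqs_disabled() would return */
    int  pagefault_depth;    /* > 0 while page faults are disabled */
    bool last_read_guarded;  /* were page faults disabled during the last read? */
};

/* Stand-in for the raw access (e.g. a __stp_get_user()-style read). */
static void model_raw_read(struct model_ctx *c)
{
    c->last_read_guarded = (c->pagefault_depth > 0);
}

/* Disable page faults around the access only when the context is atomic. */
static void model_safe_read(struct model_ctx *c)
{
    bool guard = c->in_atomic || c->irqs_disabled;

    if (guard)
        c->pagefault_depth++;   /* models pagefault_disable() */
    model_raw_read(c);
    if (guard)
        c->pagefault_depth--;   /* models pagefault_enable() */
}
```

The case from the backtrace (in_atomic():0, irqs_disabled():1) is the one the `irqs_disabled` half of the condition covers: checking in_atomic() alone would have left that read unguarded.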