14230 – on ia64, the conversions.exp tracepoint test hangs

Status:	RESOLVED FIXED

Alias:	None

Product:	systemtap
Classification:	Unclassified
Component:	runtime
Version:	unspecified

Importance:	P2 normal
Target Milestone:	---
Assignee:	Unassigned

URL:
Keywords:

Depends on:
Blocks:

Reported:	2012-06-13 17:38 UTC by David Smith
Modified:	2012-06-25 20:16 UTC
CC List:	1 user

See Also:
Host:	ia64
Target:
Build:
Last reconfirmed:

Description David Smith 2012-06-13 17:38:11 UTC

I've recently added more tests to conversions.exp, testing invalid memory accesses from more contexts, like tracepoints, timer.profile probes, and perf probes.

On ia64 (2.6.18-308.1.1.el5), the tracepoint test in conversions.exp hangs and cannot be killed.  Sysrq-t doesn't show anything interesting.

Platforms where conversions.exp passes correctly are:

x86_64: 2.6.9-100.EL, 2.6.18-308.el5, 2.6.32-220.13.1.el6.x86_64,
        3.5.0-0.rc1.git0.1.fc18.x86_64
ia32: 2.6.18-308.8.2.el5, 2.6.32-220.13.1.el6.i686,
      3.5.0-0.rc1.git0.1.fc18.i686.PAE
s390x: 2.6.18-308.el5, 2.6.32-278.el6.s390x
ppc64: 2.6.18-308.el5, 2.6.32-278.el6.ppc64

Comment 1 David Smith 2012-06-21 19:03:00 UTC

Here's some additional information. For all the tests, we test 3 addresses: 0, 0xffffffff, and 0xffffffffffffffff.  I only get the hang with 0.

With a debug kernel (2.6.18-308.8.2.el5debug) I get a usable backtrace.  Note that all 3 addresses listed above get a similar backtrace.

====
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1

Call Trace:
 [<a000000100013b40>] show_stack+0x40/0xa0
                                sp=e0000004efea7900 bsp=e0000004efea1550
 [<a000000100013bd0>] dump_stack+0x30/0x60
                                sp=e0000004efea7ad0 bsp=e0000004efea1538
 [<a000000100069440>] __might_sleep+0x1c0/0x1e0
                                sp=e0000004efea7ad0 bsp=e0000004efea1510
 [<a0000001000bb4e0>] down_read+0x20/0x60
                                sp=e0000004efea7ad0 bsp=e0000004efea14f0
 [<a000000100691130>] ia64_do_page_fault+0x110/0xa40 
                                sp=e0000004efea7ad0 bsp=e0000004efea14a0
 [<a00000010000bfe0>] __ia64_leave_kernel+0x0/0x280
                                sp=e0000004efea7b80 bsp=e0000004efea14a0
 [<a000000207bbc720>] probe_2030+0x2e0/0x6e0 [stap_7da10598964d0c097738bae7f9532b0a_11484]
                                sp=e0000004efea7d50 bsp=e0000004efea1430
 [<a000000207bc69e0>] enter_real_tracepoint_probe_0+0x3e0/0x7c0 [stap_7da10598964d0c097738bae7f9532b0a_11484]
                                sp=e0000004efea7d50 bsp=e0000004efea1408
 [<a000000207bb0760>] enter_tracepoint_probe_0+0x20/0x40 [stap_7da10598964d0c097738bae7f9532b0a_11484]
                                sp=e0000004efea7d60 bsp=e0000004efea13e8
 [<a000000100687580>] schedule+0x1680/0x20e0
                                sp=e0000004efea7d60 bsp=e0000004efea1320
 [<a00000010007ede0>] do_syslog+0x240/0x8a0
                                sp=e0000004efea7df0 bsp=e0000004efea12d0
 [<a000000100219400>] kmsg_read+0x80/0xc0
                                sp=e0000004efea7e20 bsp=e0000004efea12a0
 [<a00000010020cef0>] proc_reg_read+0x130/0x180
                                sp=e0000004efea7e20 bsp=e0000004efea1250
 [<a000000100180fc0>] vfs_read+0x200/0x3a0
                                sp=e0000004efea7e20 bsp=e0000004efea1200
 [<a000000100181690>] sys_read+0x70/0xe0
                                sp=e0000004efea7e20 bsp=e0000004efea1180
 [<a00000010000bd70>] __ia64_trace_syscall+0xd0/0x110
                                sp=e0000004efea7e30 bsp=e0000004efea1180
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e0000004efea8000 bsp=e0000004efea1180
====

Comment 2 Frank Ch. Eigler 2012-06-22 01:49:25 UTC

Try wrapping kread and friends in a pagefault_disable() / pagefault_enable() pair, which in theory should make in_atomic() return 1, so that ia64_do_page_fault takes its no_context: branch, at which point our exception handlers should handle the fault.
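
The reasoning in this suggestion can be sketched as a small userspace model. This is not kernel or SystemTap code: every model_* name below is hypothetical, and the idea that disabling pagefaults raises the preempt count (which is what in_atomic() inspects) is an assumption about kernels of this era.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's preempt count. */
int model_preempt_count = 0;

/* Assumption: disabling pagefaults bumps the preempt count. */
void model_pagefault_disable(void) { model_preempt_count++; }

void model_pagefault_enable(void)
{
    assert(model_preempt_count > 0);   /* calls must be balanced */
    model_preempt_count--;
}

/* Modeled in_atomic(): true while the preempt count is nonzero. */
int model_in_atomic(void) { return model_preempt_count != 0; }

enum model_fault_path {
    MODEL_PATH_MAY_SLEEP,   /* normal path: takes a sleeping lock (down_read in the trace) */
    MODEL_PATH_NO_CONTEXT   /* no_context: branch, left to the exception fixup */
};

/* Modeled decision made by the page fault handler. */
enum model_fault_path model_choose_fault_path(void)
{
    return model_in_atomic() ? MODEL_PATH_NO_CONTEXT : MODEL_PATH_MAY_SLEEP;
}
```

In this model, a fault taken between model_pagefault_disable() and model_pagefault_enable() goes to the no_context path, while a fault taken outside that window goes to the path that may sleep.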

Comment 3 David Smith 2012-06-25 20:16:22 UTC

On ia64, when in an atomic context (either in_atomic() or irqs_disabled() returns true), we now disable pagefaults when calling __stp_strncpy_from_user(), uderef(), or __stp_get_user().

Fixed in commit 6f8ab46.
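
A minimal userspace sketch of the guard described above, assuming the shape "disable pagefaults only when in_atomic() or irqs_disabled() is true". It is not the code from the commit: the struct, the model_* functions, and the return codes are all hypothetical, and a NULL pointer stands in for the faulting address 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcomes of a modeled memory access. */
enum {
    MODEL_OK = 0,
    MODEL_EFAULT = -1,      /* fault caught by the exception fixup */
    MODEL_MAY_SLEEP = -2    /* fault handler would take a sleeping lock */
};

/* Hypothetical execution context: preempt count and IRQ state. */
struct model_ctx {
    int preempt_count;
    int irqs_off;
};

int model_in_atomic(const struct model_ctx *c)     { return c->preempt_count != 0; }
int model_irqs_disabled(const struct model_ctx *c) { return c->irqs_off != 0; }

/* Unguarded access: a NULL address models the fault at address 0. */
int model_raw_read(const struct model_ctx *c, const long *addr, long *out)
{
    if (addr != NULL) {
        *out = *addr;
        return MODEL_OK;
    }
    /* Faulting: the modeled handler only checks the preempt count. */
    return c->preempt_count != 0 ? MODEL_EFAULT : MODEL_MAY_SLEEP;
}

/* Guarded access: disable pagefaults first if sleeping is not allowed. */
int model_guarded_read(struct model_ctx *c, const long *addr, long *out)
{
    int guard = model_in_atomic(c) || model_irqs_disabled(c);
    int rc;

    if (guard)
        c->preempt_count++;     /* models pagefault_disable() */
    rc = model_raw_read(c, addr, out);
    if (guard)
        c->preempt_count--;     /* models pagefault_enable() */
    return rc;
}
```

With a context matching the backtrace in comment 1 (in_atomic() is 0, irqs_disabled() is 1), the unguarded read of the bad address reports MODEL_MAY_SLEEP, while the guarded read reports MODEL_EFAULT and leaves the preempt count unchanged afterwards.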