On a x86-64 rhel6 vm, the following was observed during a make installcheck against git stap: stap_fda8d28487545b13195688c63e0e9bc_13886: systemtap: 2.0/0.152, base: ffffffffa0705000, memory: 57data/29text/20ctx/2058net/34alloc kb, probes: 28 stap_ad67448ddd1c1c3d62320c81d6c3737_14107: systemtap: 2.0/0.152, base: ffffffffa0722000, memory: 57data/29text/20ctx/2058net/34alloc kb, probes: 28 stap_92468660093a64382aac155c65fae81c_14360: systemtap: 2.0/0.152, base: ffffffffa03ef000, memory: 78data/59text/20ctx/2058net/178alloc kb, probes: 28 BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1112 in_atomic(): 0, irqs_disabled(): 1, pid: 14360, name: stapio INFO: lockdep is turned off. irq event stamp: 0 hardirqs last enabled at (0): [<(null)>] (null) hardirqs last disabled at (0): [<ffffffff8106cf44>] copy_process+0x604/0x15d0 softirqs last enabled at (0): [<ffffffff8106cf44>] copy_process+0x604/0x15d0 softirqs last disabled at (0): [<(null)>] (null) Pid: 14360, comm: stapio Not tainted 2.6.32-279.5.2.el6.x86_64.debug #1 Call Trace: [<ffffffff810ab580>] ? print_irqtrace_events+0xd0/0xe0 [<ffffffff810582b7>] ? __might_sleep+0xf7/0x130 [<ffffffff810444a4>] ? __do_page_fault+0x114/0x4e0 [<ffffffff8118522a>] ? mem_cgroup_bad_page_check+0x1a/0x30 [<ffffffff8113b57f>] ? get_page_from_freelist+0x35f/0x8e0 [<ffffffff81534c9e>] ? do_page_fault+0x3e/0xa0 [<ffffffff81531dd5>] ? page_fault+0x25/0x30 [<ffffffffa03f01cd>] ? lookup_bad_addr+0xcd/0xe0 [stap_92468660093a64382aac155c65fae81c_14360] [<ffffffffa03f24d0>] ? probe_4103+0x230/0x2b0 [stap_92468660093a64382aac155c65fae81c_14360] [<ffffffffa03f2405>] ? probe_4103+0x165/0x2b0 [stap_92468660093a64382aac155c65fae81c_14360] [<ffffffffa03fa0d8>] ? enter_real_tracepoint_probe_0+0x148/0x300 [stap_92468660093a64382aac155c65fae81c_14360] [<ffffffffa03ef1ee>] ? enter_tracepoint_probe_0+0xe/0x10 [stap_92468660093a64382aac155c65fae81c_14360] [<ffffffff8152e3de>] ? thread_return+0x709/0x7db [<ffffffff8152fb55>] ? schedule_hrtimeout_range+0x145/0x170 [<ffffffffa03f0753>] ? _stp_ctl_poll_cmd+0x33/0x60 [stap_92468660093a64382aac155c65fae81c_14360] [<ffffffff811ac629>] ? poll_schedule_timeout+0x39/0x60 [<ffffffff811acf19>] ? do_select+0x5a9/0x6d0 [<ffffffff811ac970>] ? do_select+0x0/0x6d0 [<ffffffff811ac720>] ? __pollwait+0x0/0xf0 This was apparently the systemtap.stress/conversions_trace.stp test case.
With $SYSTEMTAP_SYNC, another run produced a similar crash: stap_2e79445e6d8698522a9f86c95ee78935_4327: systemtap: 2.0/0.152, base: ffffffffa042c000, memory: 57data/29text/20ctx/2058net/34alloc kb, probes: 28 stap_67912742e5bc7aca566eb18b6434082a__5618: systemtap: 2.0/0.152, base: ffffffffa0449000, memory: 78data/59text/20ctx/2058net/178alloc kb, probes: 28 BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1112 in_atomic(): 0, irqs_disabled(): 1, pid: 5618, name: stapio 3 locks held by stapio/5618: #0: (&rq->lock){-.-.-.}, at: [<ffffffff8152e355>] thread_return+0x680/0x7db #1: (& global.s_hits_lock){......}, at: [<ffffffffa0449e75>] stp_lock_probe+0x55/0xc0 [stap_67912742e5bc7aca566eb18b6434082a__5618] #2: (&mm->mmap_sem){++++++}, at: [<ffffffff81044489>] __do_page_fault+0xf9/0x4e0 irq event stamp: 102228 hardirqs last enabled at (102227): [<ffffffff81531230>] _spin_unlock_irqrestore+0x40/0x80 hardirqs last disabled at (102228): [<ffffffff8153152f>] _spin_lock_irq+0x1f/0x80 softirqs last enabled at (101522): [<ffffffff81077831>] __do_softirq+0x151/0x210 softirqs last disabled at (101261): [<ffffffff8100c30c>] call_softirq+0x1c/0x30 Pid: 5618, comm: stapio Not tainted 2.6.32-279.5.2.el6.x86_64.debug #1 Call Trace: [<ffffffff810ab580>] ? print_irqtrace_events+0xd0/0xe0 [<ffffffff810582b7>] ? __might_sleep+0xf7/0x130 [<ffffffff810444a4>] ? __do_page_fault+0x114/0x4e0 [<ffffffff810aec3f>] ? check_irq_usage+0x9f/0xf0 [<ffffffff810b07aa>] ? __lock_acquire+0xaca/0x1570 [<ffffffff810396d8>] ? pvclock_clocksource_read+0x58/0xd0 [<ffffffffa044a1cd>] ? lookup_bad_addr+0xcd/0xe0 [stap_67912742e5bc7aca566eb18b6434082a__5618] [<ffffffff810387cc>] ? kvm_clock_read+0x1c/0x20 [<ffffffff81012da9>] ? sched_clock+0x9/0x10 [<ffffffff8109e885>] ? sched_clock_local+0x25/0x90 where the instruction in question points to the _read_unlock() call. I wonder if perhaps this is a lockdep problem.
commit 8fee9cc