I've noticed the following lockdep error while running tests against the latest git master branch of systemtap (commit 71f89cccaf46): ``` Apr 26 14:29:36 fed32-dev kernel: CPU: 28 PID: 218395 Comm: staprun Tainted: G OE 5.11.22-100.fc32.x86_64+debug #1 Apr 26 14:29:36 fed32-dev kernel: ============================= Apr 26 14:29:36 fed32-dev kernel: WARNING: suspicious RCU usage Apr 26 14:29:36 fed32-dev kernel: 5.11.22-100.fc32.x86_64+debug #1 Tainted: G OE Apr 26 14:29:36 fed32-dev kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014 Apr 26 14:29:36 fed32-dev kernel: ----------------------------- Apr 26 14:29:36 fed32-dev kernel: Call Trace: Apr 26 14:29:36 fed32-dev kernel: kernel/sched/core.c:7927 Illegal context switch in RCU-bh read-side critical section! Apr 26 14:29:36 fed32-dev kernel: <IRQ> Apr 26 14:29:36 fed32-dev kernel: other info that might help us debug this: Apr 26 14:29:36 fed32-dev kernel: rcu_scheduler_active = 2, debug_locks = 0 Apr 26 14:29:36 fed32-dev kernel: dump_stack+0x8b/0xb0 Apr 26 14:29:36 fed32-dev kernel: save_trace.cold+0x13/0x18 Apr 26 14:29:36 fed32-dev kernel: ? unwind_next_frame+0x41e/0x560 Apr 26 14:29:36 fed32-dev kernel: 1 lock held by cc1/217697: Apr 26 14:29:36 fed32-dev kernel: ? create_prof_cpu_mask+0x20/0x20 Apr 26 14:29:36 fed32-dev kernel: #0: ffff888112c35930 Apr 26 14:29:36 fed32-dev kernel: mark_lock+0x1b2/0x810 Apr 26 14:29:36 fed32-dev kernel: ? link_path_walk.part.0+0x20d/0x340 Apr 26 14:29:36 fed32-dev kernel: __lock_acquire+0x7f4/0x1e60 Apr 26 14:29:36 fed32-dev kernel: ( Apr 26 14:29:36 fed32-dev kernel: ? add_lock_to_list.constprop.0+0x94/0xf0 Apr 26 14:29:36 fed32-dev kernel: lock_acquire+0xc8/0x380 Apr 26 14:29:36 fed32-dev kernel: ? __wake_up_common_lock+0x52/0x90 Apr 26 14:29:36 fed32-dev kernel: &mm->mmap_lock Apr 26 14:29:36 fed32-dev kernel: ? lock_acquire+0xc8/0x380 Apr 26 14:29:36 fed32-dev kernel: _raw_spin_lock_irqsave+0x4d/0x90 Apr 26 14:29:36 fed32-dev kernel: #2 Apr 26 14:29:36 fed32-dev kernel: ? __wake_up_common_lock+0x52/0x90 Apr 26 14:29:36 fed32-dev kernel: __wake_up_common_lock+0x52/0x90 Apr 26 14:29:36 fed32-dev kernel: ? _stp_request_exit+0x2f/0x2f [stap_96bbb4216c211c351bf940dda33dc74_218395] Apr 26 14:29:36 fed32-dev kernel: ){++++}-{3:3} Apr 26 14:29:36 fed32-dev kernel: ? _stp_request_exit+0x2f/0x2f [stap_96bbb4216c211c351bf940dda33dc74_218395] Apr 26 14:29:36 fed32-dev kernel: , at: do_user_addr_fault+0x125/0x4c0 Apr 26 14:29:36 fed32-dev kernel: _stp_ctl_work_callback+0x8a/0xb5 [stap_96bbb4216c211c351bf940dda33dc74_218395] Apr 26 14:29:36 fed32-dev kernel: stack backtrace: Apr 26 14:29:36 fed32-dev kernel: call_timer_fn+0xa9/0x2c0 Apr 26 14:29:36 fed32-dev kernel: __run_timers.part.0+0x1e4/0x2d0 Apr 26 14:29:36 fed32-dev kernel: ? kvm_clock_get_cycles+0x14/0x20 Apr 26 14:29:36 fed32-dev kernel: run_timer_softirq+0x26/0x50 Apr 26 14:29:36 fed32-dev kernel: __do_softirq+0xdd/0x4d5 Apr 26 14:29:36 fed32-dev kernel: asm_call_irq_on_stack+0xf/0x20 Apr 26 14:29:36 fed32-dev kernel: </IRQ> Apr 26 14:29:36 fed32-dev kernel: do_softirq_own_stack+0x5b/0x70 Apr 26 14:29:36 fed32-dev kernel: __irq_exit_rcu+0xe4/0x130 Apr 26 14:29:36 fed32-dev kernel: irq_exit_rcu+0xa/0x20 Apr 26 14:29:36 fed32-dev kernel: sysvec_apic_timer_interrupt+0x4b/0xa0 Apr 26 14:29:36 fed32-dev kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20 Apr 26 14:29:36 fed32-dev kernel: RIP: 0010:lock_is_held_type+0xe1/0x110 Apr 26 14:29:36 fed32-dev kernel: Code: 41 0f b6 46 32 45 31 c0 83 e0 03 44 39 e8 41 0f 94 c0 b8 ff ff ff ff 65 0f c1 05 ba a6 49 7e 83 f8 01 75 26 48 8b 3c 24 57 9d <0f> 1f 44 00 00 48 83 c4 08 44 89 c0 5b 5d 41 5c 41 5d 41 5e 41 5f Apr 26 14:29:36 fed32-dev kernel: RSP: 0018:ffffc900024abbe0 EFLAGS: 00000246 Apr 26 14:29:36 fed32-dev kernel: RAX: 0000000000000001 RBX: ffffffff82d8fbc0 RCX: 0000000000000001 Apr 26 14:29:36 fed32-dev kernel: RDX: 0000000000000000 RSI: ffffffff82d8fbc0 RDI: 0000000000000246 Apr 26 14:29:36 fed32-dev kernel: RBP: ffff8881c24c8000 R08: 0000000000000000 R09: 0000000000000001 Apr 26 14:29:36 fed32-dev kernel: R10: 0000000000000246 R11: 0000000000000000 R12: ffff8881c24c8de0 Apr 26 14:29:36 fed32-dev kernel: R13: 00000000ffffffff R14: ffff8881c24c8de0 R15: 0000000000000001 Apr 26 14:29:36 fed32-dev kernel: rcu_read_lock_sched_held+0x3f/0x80 Apr 26 14:29:36 fed32-dev kernel: ? dput+0xb4/0x480 Apr 26 14:29:36 fed32-dev kernel: lock_release+0x359/0x400 Apr 26 14:29:36 fed32-dev kernel: step_into+0xa6/0x750 Apr 26 14:29:36 fed32-dev kernel: ? kernfs_dop_revalidate+0xb8/0xd0 Apr 26 14:29:36 fed32-dev kernel: walk_component+0x62/0x190 Apr 26 14:29:36 fed32-dev kernel: link_path_walk.part.0+0x20d/0x340 Apr 26 14:29:36 fed32-dev kernel: path_openat+0x9f/0xa50 Apr 26 14:29:36 fed32-dev kernel: do_filp_open+0x75/0x100 Apr 26 14:29:36 fed32-dev kernel: ? do_raw_spin_unlock+0x4b/0xa0 Apr 26 14:29:36 fed32-dev kernel: ? _raw_spin_unlock+0x1f/0x30 Apr 26 14:29:36 fed32-dev kernel: ? alloc_fd+0xf6/0x1e0 Apr 26 14:29:36 fed32-dev kernel: do_sys_openat2+0x7b/0x130 Apr 26 14:29:36 fed32-dev kernel: __x64_sys_openat+0x46/0x70 Apr 26 14:29:36 fed32-dev kernel: do_syscall_64+0x33/0x40 Apr 26 14:29:36 fed32-dev kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Apr 26 14:29:36 fed32-dev kernel: RIP: 0033:0x7fac564dcfab Apr 26 14:29:36 fed32-dev kernel: Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 2b 0c 25 Apr 26 14:29:36 fed32-dev kernel: RSP: 002b:00007ffdf1017e90 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 Apr 26 14:29:36 fed32-dev kernel: RAX: ffffffffffffffda RBX: 00000000010cd440 RCX: 00007fac564dcfab Apr 26 14:29:36 fed32-dev kernel: RDX: 0000000000000000 RSI: 00000000010b0a40 RDI: 00000000ffffff9c Apr 26 14:29:36 fed32-dev kernel: RBP: 00000000010b0a40 R08: 0000000000000008 R09: 0000000000000001 Apr 26 14:29:36 fed32-dev kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 Apr 26 14:29:36 fed32-dev kernel: R13: 00000000010cd440 R14: 0000000000000001 R15: 0000000000000000 Apr 26 14:29:36 fed32-dev kernel: CPU: 2 PID: 217697 Comm: cc1 Tainted: G OE 5.11.22-100.fc32.x86_64+debug #1 Apr 26 14:29:36 fed32-dev kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014 Apr 26 14:29:36 fed32-dev kernel: Call Trace: Apr 26 14:29:36 fed32-dev kernel: dump_stack+0x8b/0xb0 Apr 26 14:29:36 fed32-dev kernel: ___might_sleep+0x180/0x190 Apr 26 14:29:36 fed32-dev kernel: do_user_addr_fault+0x145/0x4c0 Apr 26 14:29:36 fed32-dev kernel: exc_page_fault+0x67/0x2a0 Apr 26 14:29:36 fed32-dev kernel: ? asm_exc_page_fault+0x8/0x30 Apr 26 14:29:36 fed32-dev kernel: asm_exc_page_fault+0x1e/0x30 Apr 26 14:29:36 fed32-dev kernel: RIP: 0033:0x10655fa Apr 26 14:29:36 fed32-dev kernel: Code: 54 f8 03 00 00 41 8b 79 54 b9 01 00 00 00 31 d2 31 f6 48 83 c7 08 e8 65 9d e4 ff 49 8b 7c 24 18 49 8b 2f 49 89 c0 4c 8b 57 48 <4c> 89 10 48 89 47 48 48 8d 40 08 c7 47 50 28 00 00 00 4d 89 78 28 Apr 26 14:29:36 fed32-dev kernel: RSP: 002b:00007ffcf576c1b0 EFLAGS: 00010206 Apr 26 14:29:36 fed32-dev kernel: RAX: 00007fb0faabc000 RBX: 00007fb0fad98be0 RCX: 000000000000000c Apr 26 14:29:36 fed32-dev kernel: RDX: 0000000000000006 RSI: 0000000000000000 RDI: 00007fb0fada8850 Apr 26 14:29:36 fed32-dev kernel: RBP: 00007fb0fb88d600 R08: 00007fb0faabc000 R09: 0000000000003bb5 Apr 26 14:29:36 fed32-dev kernel: R10: 00007fb0faabb400 R11: 0000000000000003 R12: 00007fb0fb87abb0 Apr 26 14:29:36 fed32-dev kernel: R13: 0000000000000002 R14: 00007fb0faabb7c8 R15: 00007fb0fad98c68 ``` I'm not sure if this a kernel bug or a stap runtime bug. Any help here will be appreciated. Thanks!
This issue is relatively hard to reproduce, and thus, we cannot prepare a minimal test case for this.
Seems like _stp_ctl_work_callback -> _stp_request_exit -> _stp_ctl_send_notify is called from within an IRQ context which should never happen. So it looks like stap runtime bug?
Seems like the kernel backtrace is a little bit misleading. It actually has nothing to do with _stp_request_exit() after all: ``` (gdb) l *_stp_ctl_work_callback+0x8a 0x6cbb is in _stp_ctl_work_callback (/path/to/systemtap/runtime/transport/transport.c:559). 554 stp_spin_unlock_irqrestore(&_stp_ctl_ready_lock, flags); 555 556 _stp_runtime_entryfn_put_context(c); 557 558 if (do_io) 559 wake_up_interruptible(&_stp_ctl_wq); 560 561 /* if exit flag is set AND we have finished with systemtap_module_init() */ 562 if (unlikely(_stp_exit_flag && _stp_probes_started)) 563 _stp_request_exit(); ``` And I also checked the disassembly of the .ko file, it is indeed the `wake_up_interruptible()` call: ``` 0000000000006be1 <_stp_ctl_work_callback>: ... /path/to/systemtap/runtime/transport/transport.c:559 wake_up_interruptible(&_stp_ctl_wq); 6c50: b9 00 00 00 00 mov ecx,0x0 6c55: ba 01 00 00 00 mov edx,0x1 6c5a: be 01 00 00 00 mov esi,0x1 6c5f: 48 c7 c7 00 00 00 00 mov rdi,0x0 6c66: e8 00 00 00 00 call 6c6b <_stp_ctl_work_callback+0x8a> 6c6b: eb c6 jmp 6c33 <_stp_ctl_work_callback+0x52> ``` And _stp_ctl_work_callback+0x8a is the instruction at the address 6c6b above.
Here's my best theory: a user process has grabbed the mm map lock, and in its critical section, stap runtime's ctl work timer is fired; and inside the ctl work callback, stap calls wake_up_interruptible(), which may try to trigger scheduler context switches and hence that lockdep error "Illegal context switch in RCU-bh read-side critical section!". __wake_up_common_lock() appears in the lockdep backtrace, which is indeed called from __wake_up(), which is wake_up_interruptible(). Furthermore, both _stp_request_exit and _stp_ctl_work_callback are in the lockdep backtrace.
FWIW, I reproduced this lockdep error at least twice on Fedora's 5.11.22 debug kernel yesterday while testing the user access begin patch for stap. They are of the same or very similar lockdep error and backtrace.