Bug 30393 - Lockdep error in _stp_ctl_work_callback()/wake_up_interruptible() on a fedora debug kernel
Summary: Lockdep error in _stp_ctl_work_callback()/wake_up_interruptible() on a fedora...
Status: UNCONFIRMED
Alias: None
Product: systemtap
Classification: Unclassified
Component: runtime (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-04-26 21:47 UTC by agentzh
Modified: 2023-05-18 17:47 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description agentzh 2023-04-26 21:47:52 UTC
I've noticed the following lockdep error while running tests against the latest git master branch of systemtap (commit 71f89cccaf46):

```
Apr 26 14:29:36 fed32-dev kernel: CPU: 28 PID: 218395 Comm: staprun Tainted: G           OE     5.11.22-100.fc32.x86_64+debug #1
Apr 26 14:29:36 fed32-dev kernel: =============================
Apr 26 14:29:36 fed32-dev kernel: WARNING: suspicious RCU usage
Apr 26 14:29:36 fed32-dev kernel: 5.11.22-100.fc32.x86_64+debug #1 Tainted: G           OE    
Apr 26 14:29:36 fed32-dev kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
Apr 26 14:29:36 fed32-dev kernel: -----------------------------
Apr 26 14:29:36 fed32-dev kernel: Call Trace:
Apr 26 14:29:36 fed32-dev kernel: kernel/sched/core.c:7927 Illegal context switch in RCU-bh read-side critical section!
Apr 26 14:29:36 fed32-dev kernel:  <IRQ>
Apr 26 14:29:36 fed32-dev kernel: 
                                  other info that might help us debug this:
Apr 26 14:29:36 fed32-dev kernel: 
                                  rcu_scheduler_active = 2, debug_locks = 0
Apr 26 14:29:36 fed32-dev kernel:  dump_stack+0x8b/0xb0
Apr 26 14:29:36 fed32-dev kernel:  save_trace.cold+0x13/0x18
Apr 26 14:29:36 fed32-dev kernel:  ? unwind_next_frame+0x41e/0x560
Apr 26 14:29:36 fed32-dev kernel: 1 lock held by cc1/217697:
Apr 26 14:29:36 fed32-dev kernel:  ? create_prof_cpu_mask+0x20/0x20
Apr 26 14:29:36 fed32-dev kernel:  #0: ffff888112c35930
Apr 26 14:29:36 fed32-dev kernel:  mark_lock+0x1b2/0x810
Apr 26 14:29:36 fed32-dev kernel:  ? link_path_walk.part.0+0x20d/0x340
Apr 26 14:29:36 fed32-dev kernel:  __lock_acquire+0x7f4/0x1e60
Apr 26 14:29:36 fed32-dev kernel:  (
Apr 26 14:29:36 fed32-dev kernel:  ? add_lock_to_list.constprop.0+0x94/0xf0
Apr 26 14:29:36 fed32-dev kernel:  lock_acquire+0xc8/0x380
Apr 26 14:29:36 fed32-dev kernel:  ? __wake_up_common_lock+0x52/0x90
Apr 26 14:29:36 fed32-dev kernel: &mm->mmap_lock
Apr 26 14:29:36 fed32-dev kernel:  ? lock_acquire+0xc8/0x380
Apr 26 14:29:36 fed32-dev kernel:  _raw_spin_lock_irqsave+0x4d/0x90
Apr 26 14:29:36 fed32-dev kernel: #2
Apr 26 14:29:36 fed32-dev kernel:  ? __wake_up_common_lock+0x52/0x90
Apr 26 14:29:36 fed32-dev kernel:  __wake_up_common_lock+0x52/0x90
Apr 26 14:29:36 fed32-dev kernel:  ? _stp_request_exit+0x2f/0x2f [stap_96bbb4216c211c351bf940dda33dc74_218395]
Apr 26 14:29:36 fed32-dev kernel: ){++++}-{3:3}
Apr 26 14:29:36 fed32-dev kernel:  ? _stp_request_exit+0x2f/0x2f [stap_96bbb4216c211c351bf940dda33dc74_218395]
Apr 26 14:29:36 fed32-dev kernel: , at: do_user_addr_fault+0x125/0x4c0
Apr 26 14:29:36 fed32-dev kernel:  _stp_ctl_work_callback+0x8a/0xb5 [stap_96bbb4216c211c351bf940dda33dc74_218395]
Apr 26 14:29:36 fed32-dev kernel: 
                                  stack backtrace:
Apr 26 14:29:36 fed32-dev kernel:  call_timer_fn+0xa9/0x2c0
Apr 26 14:29:36 fed32-dev kernel:  __run_timers.part.0+0x1e4/0x2d0
Apr 26 14:29:36 fed32-dev kernel:  ? kvm_clock_get_cycles+0x14/0x20
Apr 26 14:29:36 fed32-dev kernel:  run_timer_softirq+0x26/0x50
Apr 26 14:29:36 fed32-dev kernel:  __do_softirq+0xdd/0x4d5
Apr 26 14:29:36 fed32-dev kernel:  asm_call_irq_on_stack+0xf/0x20
Apr 26 14:29:36 fed32-dev kernel:  </IRQ>
Apr 26 14:29:36 fed32-dev kernel:  do_softirq_own_stack+0x5b/0x70
Apr 26 14:29:36 fed32-dev kernel:  __irq_exit_rcu+0xe4/0x130
Apr 26 14:29:36 fed32-dev kernel:  irq_exit_rcu+0xa/0x20
Apr 26 14:29:36 fed32-dev kernel:  sysvec_apic_timer_interrupt+0x4b/0xa0
Apr 26 14:29:36 fed32-dev kernel:  asm_sysvec_apic_timer_interrupt+0x12/0x20
Apr 26 14:29:36 fed32-dev kernel: RIP: 0010:lock_is_held_type+0xe1/0x110
Apr 26 14:29:36 fed32-dev kernel: Code: 41 0f b6 46 32 45 31 c0 83 e0 03 44 39 e8 41 0f 94 c0 b8 ff ff ff ff 65 0f c1 05 ba a6 49 7e 83 f8 01 75 26 48 8b 3c 24 57 9d <0f> 1f 44 00 00 48 83 c4 08 44 89 c0 5b 5d 41 5c 41 5d 41 5e 41 5f
Apr 26 14:29:36 fed32-dev kernel: RSP: 0018:ffffc900024abbe0 EFLAGS: 00000246
Apr 26 14:29:36 fed32-dev kernel: RAX: 0000000000000001 RBX: ffffffff82d8fbc0 RCX: 0000000000000001
Apr 26 14:29:36 fed32-dev kernel: RDX: 0000000000000000 RSI: ffffffff82d8fbc0 RDI: 0000000000000246
Apr 26 14:29:36 fed32-dev kernel: RBP: ffff8881c24c8000 R08: 0000000000000000 R09: 0000000000000001
Apr 26 14:29:36 fed32-dev kernel: R10: 0000000000000246 R11: 0000000000000000 R12: ffff8881c24c8de0
Apr 26 14:29:36 fed32-dev kernel: R13: 00000000ffffffff R14: ffff8881c24c8de0 R15: 0000000000000001
Apr 26 14:29:36 fed32-dev kernel:  rcu_read_lock_sched_held+0x3f/0x80
Apr 26 14:29:36 fed32-dev kernel:  ? dput+0xb4/0x480
Apr 26 14:29:36 fed32-dev kernel:  lock_release+0x359/0x400
Apr 26 14:29:36 fed32-dev kernel:  step_into+0xa6/0x750
Apr 26 14:29:36 fed32-dev kernel:  ? kernfs_dop_revalidate+0xb8/0xd0
Apr 26 14:29:36 fed32-dev kernel:  walk_component+0x62/0x190
Apr 26 14:29:36 fed32-dev kernel:  link_path_walk.part.0+0x20d/0x340
Apr 26 14:29:36 fed32-dev kernel:  path_openat+0x9f/0xa50
Apr 26 14:29:36 fed32-dev kernel:  do_filp_open+0x75/0x100
Apr 26 14:29:36 fed32-dev kernel:  ? do_raw_spin_unlock+0x4b/0xa0
Apr 26 14:29:36 fed32-dev kernel:  ? _raw_spin_unlock+0x1f/0x30
Apr 26 14:29:36 fed32-dev kernel:  ? alloc_fd+0xf6/0x1e0
Apr 26 14:29:36 fed32-dev kernel:  do_sys_openat2+0x7b/0x130
Apr 26 14:29:36 fed32-dev kernel:  __x64_sys_openat+0x46/0x70
Apr 26 14:29:36 fed32-dev kernel:  do_syscall_64+0x33/0x40
Apr 26 14:29:36 fed32-dev kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 26 14:29:36 fed32-dev kernel: RIP: 0033:0x7fac564dcfab
Apr 26 14:29:36 fed32-dev kernel: Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 2b 0c 25
Apr 26 14:29:36 fed32-dev kernel: RSP: 002b:00007ffdf1017e90 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Apr 26 14:29:36 fed32-dev kernel: RAX: ffffffffffffffda RBX: 00000000010cd440 RCX: 00007fac564dcfab
Apr 26 14:29:36 fed32-dev kernel: RDX: 0000000000000000 RSI: 00000000010b0a40 RDI: 00000000ffffff9c
Apr 26 14:29:36 fed32-dev kernel: RBP: 00000000010b0a40 R08: 0000000000000008 R09: 0000000000000001
Apr 26 14:29:36 fed32-dev kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Apr 26 14:29:36 fed32-dev kernel: R13: 00000000010cd440 R14: 0000000000000001 R15: 0000000000000000
Apr 26 14:29:36 fed32-dev kernel: CPU: 2 PID: 217697 Comm: cc1 Tainted: G           OE     5.11.22-100.fc32.x86_64+debug #1
Apr 26 14:29:36 fed32-dev kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
Apr 26 14:29:36 fed32-dev kernel: Call Trace:
Apr 26 14:29:36 fed32-dev kernel:  dump_stack+0x8b/0xb0
Apr 26 14:29:36 fed32-dev kernel:  ___might_sleep+0x180/0x190
Apr 26 14:29:36 fed32-dev kernel:  do_user_addr_fault+0x145/0x4c0
Apr 26 14:29:36 fed32-dev kernel:  exc_page_fault+0x67/0x2a0
Apr 26 14:29:36 fed32-dev kernel:  ? asm_exc_page_fault+0x8/0x30
Apr 26 14:29:36 fed32-dev kernel:  asm_exc_page_fault+0x1e/0x30
Apr 26 14:29:36 fed32-dev kernel: RIP: 0033:0x10655fa
Apr 26 14:29:36 fed32-dev kernel: Code: 54 f8 03 00 00 41 8b 79 54 b9 01 00 00 00 31 d2 31 f6 48 83 c7 08 e8 65 9d e4 ff 49 8b 7c 24 18 49 8b 2f 49 89 c0 4c 8b 57 48 <4c> 89 10 48 89 47 48 48 8d 40 08 c7 47 50 28 00 00 00 4d 89 78 28
Apr 26 14:29:36 fed32-dev kernel: RSP: 002b:00007ffcf576c1b0 EFLAGS: 00010206
Apr 26 14:29:36 fed32-dev kernel: RAX: 00007fb0faabc000 RBX: 00007fb0fad98be0 RCX: 000000000000000c
Apr 26 14:29:36 fed32-dev kernel: RDX: 0000000000000006 RSI: 0000000000000000 RDI: 00007fb0fada8850
Apr 26 14:29:36 fed32-dev kernel: RBP: 00007fb0fb88d600 R08: 00007fb0faabc000 R09: 0000000000003bb5
Apr 26 14:29:36 fed32-dev kernel: R10: 00007fb0faabb400 R11: 0000000000000003 R12: 00007fb0fb87abb0
Apr 26 14:29:36 fed32-dev kernel: R13: 0000000000000002 R14: 00007fb0faabb7c8 R15: 00007fb0fad98c68
```

I'm not sure if this a kernel bug or a stap runtime bug. Any help here will be appreciated. Thanks!
Comment 1 agentzh 2023-04-27 17:36:19 UTC
This issue is relatively hard to reproduce, and thus, we cannot prepare a minimal test case for this.
Comment 2 agentzh 2023-04-28 00:48:15 UTC
Seems like _stp_ctl_work_callback -> _stp_request_exit -> _stp_ctl_send_notify is called from within an IRQ context which should never happen. So it looks like stap runtime bug?
Comment 3 agentzh 2023-05-01 18:19:12 UTC
Seems like the kernel backtrace is a little bit misleading. It actually has nothing to do with _stp_request_exit() after all:

```
(gdb) l *_stp_ctl_work_callback+0x8a
0x6cbb is in _stp_ctl_work_callback (/path/to/systemtap/runtime/transport/transport.c:559).
554		stp_spin_unlock_irqrestore(&_stp_ctl_ready_lock, flags);
555
556		_stp_runtime_entryfn_put_context(c);
557
558		if (do_io)
559			wake_up_interruptible(&_stp_ctl_wq);
560
561		/* if exit flag is set AND we have finished with systemtap_module_init() */
562		if (unlikely(_stp_exit_flag && _stp_probes_started))
563			_stp_request_exit();
```

And I also checked the disassembly of the .ko file, it is indeed the `wake_up_interruptible()` call:

```
0000000000006be1 <_stp_ctl_work_callback>:
...
/path/to/systemtap/runtime/transport/transport.c:559
		wake_up_interruptible(&_stp_ctl_wq);
    6c50:	b9 00 00 00 00       	mov    ecx,0x0
    6c55:	ba 01 00 00 00       	mov    edx,0x1
    6c5a:	be 01 00 00 00       	mov    esi,0x1
    6c5f:	48 c7 c7 00 00 00 00 	mov    rdi,0x0
    6c66:	e8 00 00 00 00       	call   6c6b <_stp_ctl_work_callback+0x8a>
    6c6b:	eb c6                	jmp    6c33 <_stp_ctl_work_callback+0x52>
```

And _stp_ctl_work_callback+0x8a is the instruction at the address 6c6b above.
Comment 4 agentzh 2023-05-18 17:43:50 UTC
Here's my best theory: a user process has grabbed the mm map lock, and in its critical section, stap runtime's ctl work timer is fired; and inside the ctl work callback, stap calls wake_up_interruptible(), which may try to trigger scheduler context switches and hence that lockdep error "Illegal context switch in RCU-bh read-side critical section!".

__wake_up_common_lock() appears in the lockdep backtrace, which is indeed called from __wake_up(), which is wake_up_interruptible(). Furthermore, both _stp_request_exit and _stp_ctl_work_callback are in the lockdep backtrace.
Comment 5 agentzh 2023-05-18 17:47:47 UTC
FWIW, I reproduced this lockdep error at least twice on Fedora's 5.11.22 debug kernel yesterday while testing the user access begin patch for stap. They are of the same or very similar lockdep error and backtrace.