Bug 30405 - Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2/6.1 debug kernels
Summary: Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2/6.1 debug ker...
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: runtime (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-04-29 22:03 UTC by agentzh
Modified: 2023-05-09 19:53 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description agentzh 2023-04-29 22:03:00 UTC
We noted that when using Fedora 36 x86_64's latest kernel-debug kernel, 6.2.12-100.fc36.x86_64+debug, when using a minimal kprobes stap script, the kernel always reports the following error in dmesg:

```
[   89.347060] stap_f93d809e35e31d9f81df52024bfed1b5__230 (a.stp): systemtap: 4.9/0.188, base: ffffffffc06ec000, memory: 52data/28text/25ctx/524390net/389alloc kb, probes: 1
[   89.347073] BUG: sleeping function called from invalid context at kernel/kallsyms.c:305
[   89.347076] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1831, name: stapio
[   89.347078] preempt_count: 1, expected: 0
[   89.347080] RCU nest depth: 0, expected: 0
[   89.347081] 3 locks held by stapio/1831:
[   89.347083]  #0: ffff88810d1ceef8 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x52/0x60
[   89.347094]  #1: ffff88816f73c498 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x74/0xf0
[   89.347104]  #2: ffffffffc06f9330 (cmd_mutex){+.+.}-{3:3}, at: _stp_ctl_write_cmd+0xe2/0xe10 [stap_f93d809e35e31d9f81df52024bfed1b5__230]
[   89.347116] Preemption disabled at:
[   89.347117] [<ffffffffc06f0de8>] _stp_ctl_write_cmd+0xd48/0xe10 [stap_f93d809e35e31d9f81df52024bfed1b5__230]
[   89.347124] CPU: 20 PID: 1831 Comm: stapio Tainted: G           OE      6.2.12-100.fc36.x86_64+debug #1
[   89.347127] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
[   89.347129] Call Trace:
[   89.347131]  <TASK>
[   89.347133]  dump_stack_lvl+0x71/0x90
[   89.347143]  __might_resched+0x1c2/0x2e0
[   89.347148]  ? __pfx_stapkp_symbol_callback+0x10/0x10 [stap_f93d809e35e31d9f81df52024bfed1b5__230]
[   89.347154]  kallsyms_on_each_symbol+0x6a/0xf0
[   89.347184]  _stp_ctl_write_cmd+0xd62/0xe10 [stap_f93d809e35e31d9f81df52024bfed1b5__230]
[   89.347189]  ? lock_acquire+0xe2/0x2c0
[   89.347196]  proc_reg_write+0x53/0xa0
[   89.347200]  vfs_write+0xea/0x530
[   89.347203]  ? __fdget_pos+0x52/0x60
[   89.347211]  ksys_write+0x74/0xf0
[   89.347215]  do_syscall_64+0x58/0x80
[   89.347221]  ? kvm_sched_clock_read+0x14/0x40
[   89.347224]  ? sched_clock_cpu+0xb/0xc0
[   89.347227]  ? lock_release+0x15d/0x400
[   89.347230]  ? _raw_spin_unlock_irq+0x24/0x50
[   89.347237]  ? _raw_spin_unlock_irq+0x24/0x50
[   89.347240]  ? lockdep_hardirqs_on+0x7d/0x100
[   89.347245]  ? _raw_spin_unlock_irq+0x34/0x50
[   89.347248]  ? syscall_exit_to_user_mode+0xe/0x50
[   89.347252]  ? do_syscall_64+0x67/0x80
[   89.347254]  ? lockdep_hardirqs_on+0x7d/0x100
[   89.347256]  ? do_syscall_64+0x67/0x80
[   89.347259]  ? lockdep_hardirqs_on+0x7d/0x100
[   89.347261]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[   89.347267] RIP: 0033:0x7f0d4ccdbc6f
[   89.347270] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 09 76 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 5c 76 f8 ff 48
[   89.347272] RSP: 002b:00007ffe6bb215c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[   89.347275] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f0d4ccdbc6f
[   89.347277] RDX: 000000000000000c RSI: 00007ffe6bb215f0 RDI: 0000000000000004
[   89.347278] RBP: 000000000000000c R08: 0000000000000000 R09: 00007ffe6bb2077f
[   89.347280] R10: 0000000000000008 R11: 0000000000000293 R12: 00007ffe6bb21a50
[   89.347281] R13: 00007ffe6bb23c80 R14: 0000000000000001 R15: 00007ffe6bb21ad4
[   89.347291]  </TASK>
```

The a.stp file is defined as

```
probe kprobe.function("finish_task_switch") ?, kprobe.function("finish_task_switch.*") ? {
    println("Hit");
    exit();
}
```

I'm using the latest master branch of the upstream systemtap repo as of this wring (commit 418f0a45ca447).
Comment 1 agentzh 2023-04-30 05:51:49 UTC
The following patch seems to fix the error itself:

```
diff --git a/runtime/linux/kprobes.c b/runtime/linux/kprobes.c
index c0f4f5c0f..72f4b148e 100644
--- a/runtime/linux/kprobes.c
+++ b/runtime/linux/kprobes.c
@@ -819,9 +819,7 @@ stapkp_init(struct stap_kprobe_probe *probes,
 #ifdef STAPCONF_MODULE_MUTEX
        mutex_lock(&module_mutex);
 #endif
-       preempt_disable();
        kallsyms_on_each_symbol(stapkp_symbol_callback, &sd);
-       preempt_enable();
 #ifdef STAPCONF_MODULE_MUTEX
        mutex_unlock(&module_mutex);
 #endif
```

But it seems to effectively revert the previous commit for fixing a softlockup in PR20735:

```
commit 58c9bd1563aece0c020d9c8da5cf8db8ef028439
Author: David Smith <dsmith@redhat.com>
Date:   Thu Nov 10 09:59:16 2016 -0600

    Fix PR20735 by updating kprobes.c to avoid a soft lockup.

    * runtime/linux/kprobes.c: Instead of grabbing the module mutex around
      calling kallsyms_on_each_symbol(), just disable/enable preemption
      instead to avoid a soft lockup.
```

What should we do then?
Comment 2 agentzh 2023-05-02 22:45:20 UTC
This problem also happens on Fedora 36 x86_64's 6.1.18 kernel.
Comment 3 agentzh 2023-05-06 22:23:16 UTC
Got a related CPU lockup on older kernel (5.11.22) on x86_64. The dmesg error is here:

https://gist.github.com/agentzh/cbe640331de5f849b360b9415a6d312e

And the following patch fixes this lockup (as well as the original "sleeping function called from invalid context" error:

https://gist.github.com/agentzh/ba69ca1acac1b475bbf0951c7c09a245
Comment 4 agentzh 2023-05-09 19:53:11 UTC
Patch committed as commit 13c18518da.