On recent kernels, a probe requested on module .init.* functions no longer receives hits. This appears to be because the kernel MODULE_STATE_COMING notification is coming too early (before the module_sect_attrs are set), so we cannot relocate symbols against .init.*, and thus cannot place probes. The bz6503.* test case was too permissive to let us discover this immediately. The kernel change that broke stap has not yet been identified.
commit 0caa975190e4d2b19e7b65a0b23a7700522aa5d7 commit 2278079efc01124dd509241f6c6eadbd6e19cb2a commit 2e67c14dad1c661d2ce0b0ed218b371c1af218ba commit 578f5f6f6792fc92d74539a927cd85de5bcbb4dd
*** Bug 19186 has been marked as a duplicate of this bug. ***
Unfortunately, the fix for this bug caused bug #19186 (getting a 'BUG: sleeping fujnction called from invalid context' error on rawhide). The kprobes registration function sleeps, and you can't sleep in a tracepoint handler. We'll have to fix this bug a different way.
Commit fb54a58b93f37aba51d0ecd5b8924e617b229523 reverts original commit 2e67c14dad1c661d2ce0b0ed218b371c1af218ba (the 3rd commit listed in comment #1). Commit 04ffc2184ffedbb3daa44ffbc0cc2ac956d304be reverts original commit 2278079efc01124dd509241f6c6eadbd6e19cb2a (the 2nd commit listed in comment #1). The other 2 commits in the original fix are fine, the first adds some debug messages and the last modifies some test cases. The new fix in commit 24897818cdcf1994a52890ffe756226422df75f3 switches to using symbol_name+offset probing for kernel modules. This appears to be the only way to be able to probe module initializers in recent kernels. Note that this work is partially based on a patch by James Bottomley from the following link: <https://sourceware.org/ml/systemtap/2008-q3/msg00182.html>
Sigh. I've found a complication here. On 3.10.0-229.el7.x86_64 (RHEL 7.1), modules_out_of_tree.exp works correctly. On 3.10.0-327.el7.x86_64 (RHEL 7.2), modules_out_of_tree.exp fails: ==== Running ../../src/testsuite/systemtap.base/modules_out_of_tree.exp ... PASS: modules_out_of_tree (compilation) PASS: modules_out_of_tree (listing) spawn stap -e probe module("/root/rhel7-64/testsuite/modules_out_of_tree/stap_modules_out_of_tree.ko").function("stap_modules_out_of_tree_init") { println("hit") } -c insmod /root/rhel7-64/testsuite/modules_out_of_tree/stap_modules_out_of_tree.ko && rmmod /root/rhel7-64/testsuite/modules_out_of_tree/stap_modules_out_of_tree.ko FAIL: modules_out_of_tree (probing) testcase ../../src/testsuite/systemtap.base/modules_out_of_tree.exp completed in 3 seconds ==== Basically the module init function isn't getting a probe hit. What's odd here is that on this kernel, the new code I added shouldn't be getting used - it only kicks in at kernel version 3.16. I've tried switching to the new code for the failing kernel, but that causes the system to reset. I need to do some more debugging here to figure out what is going on.
For reference's sake, here's the console output I get if I switch to the new code on 3.10.0-327.el7.x86_64: ==== [14388.754591] BUG: unable to handle kernel paging request at ffffffffa0493978 [14388.762378] IP: [<ffffffff810ee2f9>] module_kallsyms_lookup_name+0x39/0xc0 [14388.770066] PGD 194d067 PUD 194e063 PMD 4664c0067 PTE 800000085e077161 [14388.777395] Oops: 0003 [#1] SMP [14388.781016] Modules linked in: stap_b83341283b00405072ddbc8efa3988d_54172(OE) intel_powerclamp coretemp intel_rapl kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ipmi_ssif iTCO_vendor_support mei_me pcspkr sb_edac sg lpc_ich mei i2c_i801 edac_core mfd_core shpchp wmi ipmi_si ipmi_msghandler acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm ixgbe crct10dif_pclmul ahci mdio crct10dif_common libahci ptp drm crc32c_intel pps_core libata i2c_core dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_1f03fd0c37a653862845287dd5fe236_51524] [14388.857926] CPU: 29 PID: 54172 Comm: stapio Tainted: G OE ------------ 3.10.0-327.el7.x86_64 #1 [14388.868900] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0044.R00.1501191641 01/19/2015 [14388.880261] task: ffff880865385080 ti: ffff8808243ec000 task.ti: ffff8808243ec000 [14388.888616] RIP: 0010:[<ffffffff810ee2f9>] [<ffffffff810ee2f9>] module_kallsyms_lookup_name+0x39/0xc0 [14388.899021] RSP: 0018:ffff8808243efce8 EFLAGS: 00010246 [14388.904951] RAX: ffffffffa0493978 RBX: ffffffffa0493978 RCX: 0000000000000000 [14388.912917] RDX: 000000000000003a RSI: 0000000000000000 RDI: ffffffffa0493960 [14388.920882] RBP: ffff8808243efd08 R08: 0000000000000001 R09: ffffffff8184d42c [14388.928848] R10: ffffffff8184d42c R11: ffff8808243efb5e R12: ffffffffa0493960 [14388.936814] R13: ffffffffa0493960 R14: 00000000000118dd R15: 0000000000000007 [14388.944780] FS: 00007f566bac2740(0000) GS:ffff88046f5e0000(0000) knlGS:0000000000000000 [14388.953815] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [14388.960229] CR2: ffffffffa0493978 CR3: 0000000464aa1000 CR4: 00000000001407e0 [14388.968195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [14388.976160] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [14388.984126] Stack: [14388.986367] 0000000000000246 00000000c69f8649 00000000000118dd 00000000000d2e14 [14388.994666] ffff8808243efdc0 ffffffff810eeeb8 696c5f6b72625f5f 615f74670074696d [14389.002965] 6e660000636f6c6c 7473690000000000 61635f7473696c00 0036000064656c6c [14389.011265] Call Trace: [14389.013996] [<ffffffff810eeeb8>] kallsyms_lookup_name+0x88/0xb0 [14389.020704] [<ffffffff816416e5>] kprobe_addr+0x35/0x60 [14389.026536] [<ffffffff8164326e>] register_kprobe+0x1e/0x5d0 [14389.032855] [<ffffffffa048d5fb>] stapkp_register_probe+0x10b/0x180 [stap_b83341283b00405072ddbc8efa3988d_54172] [14389.044218] [<ffffffffa048fc01>] _stp_handle_start.constprop.29+0x281/0x5e0 [stap_b83341283b00405072ddbc8efa3988d_54172] [14389.056455] [<ffffffffa0490242>] _stp_ctl_write_cmd+0x2e2/0x570 [stap_b83341283b00405072ddbc8efa3988d_54172] [14389.067526] [<ffffffff811de5cd>] vfs_write+0xbd/0x1e0 [14389.073262] [<ffffffff811df06f>] SyS_write+0x7f/0xe0 [14389.078903] [<ffffffff81645909>] system_call_fastpath+0x16/0x1b ====
It appears that adding a probe with perf on the module initializer works on 3.10.0-327.el7.x86_64: ==== # perf probe --module=./stap_modules_out_of_tree.ko --add=stapmod_init_perf=stap_modules_out_of_tree_init Added new event: probe:stapmod_init_perf (on stap_modules_out_of_tree_init in stap_modules_out_of_tree) You can now use it in all perf tools, such as: perf record -e probe:stapmod_init_perf -aR sleep 1 # perf probe --list probe:stapmod_init_perf (on stap_modules_out_of_tree_init in stap_modules_out_ # perf record --event=probe:stapmod_init_perf -a insmod ./stap_modules_out_of_tree.ko ; rmmod stap_modules_out_of_tree [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.231 MB perf.data (1 samples) ] # perf report --stdio # To display the perf.data header info, please use --header/--header-only option # # Samples: 1 of event 'probe:stapmod_init_perf' # Event count (approx.): 1 # # Overhead Command Shared Object Symbol # ........ ....... ................ ...................... # 100.00% insmod [sysfillrect] [k] 0x0000000000005071 # # (For a higher level overview, try: perf report --sort comm,dso) # ==== So, in theory anyway it ought to be possible for systemtap to do it.
OK, here's why I'm getting a crash when switching to the new code (mentioned in comment #6). The version of module_kallsyms_lookup_name() in 3.10.0-327.el7.x86_64 does this: ==== unsigned long module_kallsyms_lookup_name(const char *name) { struct module *mod; char *colon; unsigned long ret = 0; /* Don't lock: we're in enough trouble already. */ preempt_disable(); if ((colon = strchr(name, ':')) != NULL) { *colon = '\0'; if ((mod = find_module(name)) != NULL) ret = mod_find_symname(mod, colon+1); *colon = ':'; } ... ==== i.e. it actually modifies the memory at 'name'. In systemtap, that symbol name is in initialized data, so you can't write to that memory. In later versions of the kernel, module_kallsyms_lookup_name() is changed to no longer change the input string. perf uses kstrdup() to make a copy of the symbol name, which has the side-effect that you can write to that memory.
It looks like the following kernel commit fixed module_kallsyms_lookup_name(): ==== commit 4f6de4d51f4a3ab06a85e91e708cc89a513ef30c Author: Mathias Krause <minipli@googlemail.com> Date: Tue Jul 2 15:35:11 2013 +0930 module: don't modify argument of module_kallsyms_lookup_name() If we pass a pointer to a const string in the form "module:symbol" module_kallsyms_lookup_name() will try to split the string at the colon, i.e., will try to modify r/o data. That will, in fact, fail on a kernel with enabled CONFIG_DEBUG_RODATA. Avoid modifying the passed string in module_kallsyms_lookup_name(), modify find_module_all() instead to pass it the module name length. ==== This commit first appeared in kernel version 3.11.
Re-re-fixed in commit c788eb1. This commit switches kernels >= 3.10 (all RHEL7 kernels) to the new module refresh code and adds a workaround for the kernel bug mentioned in comment #6 and comment #8. We work around the bug by 2 methods: 1) For kernels that support kallsyms_on_each_symbol(), we use that to convert symbol_name+offset probes into address-based probes when the module is loaded. This should be faster as well (see bug #11514 for details). 2) For kernels that don't support kallsyms_on_each_symbol(), which are ppc64 kernels, we workaround the problem by making a copy of the symbol_name in read-write memory before calling register_kprobe().