On Fedora 33 with kernel 5.9.16-200.fc33.x86_64 and stap 4.4/0.182, rpm 4.4-2.fc33, I noticed that staprun can't seem to attach to module I/O anymore. It fails with

    ERROR: Cannot attach to module stap_nnn control channel; not running?
    ERROR: 'stap_nnn' is not a zombie systemtap module.

The issue arises with no explicit transport selection or with -DSTAP_TRANS_DEBUGFS explicitly requested.

This has also been reported by a colleague on Debian who is using "Systemtap translator/driver (version 4.4/0.176, Debian version 4.4-1~bpo10+1 (buster-backports))".

Running stap with sudo has no effect.

WORKAROUND
==========

I have been able to work around the error with:

    sudo mount /sys/kernel/debug -o remount,mode=755

DETAIL
======

I found the above in the ancient Debian bug report https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=706817 from 2013, which references a 2012 Linux kernel patch http://lkml.iu.edu/hypermail/linux/kernel/1201.3/00626.html . See also https://github.com/torvalds/linux/commit/82aceae4f0d42f03d9ad7d1e90389e731153898f .

Remounting debugfs solves this issue, even though the last relevant debugfs changes I can find are 7 or 8 years old.

It might be related to the changes in systemtap intended to work around the kernel lockdown issues and add procfs transport support, per https://bugzilla.redhat.com/show_bug.cgi?id=1873492 in commit 7615cae79 ?

Running with -DSTAP_TRANS_PROCFS instead works, but only when stap itself is run as root, even though staprun is setuid root. It is unclear why; perhaps that is a separate issue.

I don't see the same problem when running a self-compiled build of git master @ HEAD = d7ea535c6, but I don't yet know whether that's due to differences in configuration and installation between package and source builds, or to a code change since the 4.4 release. I'll compile 4.4 to see.

Sample output for various runs is attached.
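To see which state a machine is in before and after the remount, the mode of the debugfs mount point can be inspected. The following is a minimal Python sketch, not anything shipped with systemtap: the helper names `others_can_traverse` and `debugfs_traversable` are hypothetical, the path /sys/kernel/debug is assumed to be the debugfs mount point, and the check looks only at the "other" search bit (it ignores group membership and ACLs).

```python
import os
import stat


def others_can_traverse(mode: int) -> bool:
    """Return True if the 'other' execute (search) bit is set in a directory mode."""
    return bool(mode & stat.S_IXOTH)


def debugfs_traversable(path: str = "/sys/kernel/debug") -> bool:
    """Stat the mount point and test its permission bits.

    The default path is an assumption: the usual debugfs location.
    """
    return others_can_traverse(stat.S_IMODE(os.stat(path).st_mode))


if __name__ == "__main__":
    path = "/sys/kernel/debug"
    try:
        ok = debugfs_traversable(path)
    except OSError as exc:
        print(f"cannot stat {path}: {exc}")
    else:
        state = "traversable" if ok else "not traversable"
        print(f"{path}: {state} by other users")
```

With mode=700 the helper reports the directory as not traversable by other users; after the mode=755 remount it reports it as traversable.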
Created attachment 13143
Output from various stap runs

Sample output without verbose logging, showing the behaviour of various runs, then the change after remounting debugfs.
Created attachment 13144
stap -vvvv -DSTAP_TRANS_DEBUGFS output

Output from

    $ sudo mount /sys/kernel/debug -o remount,mode=700
    $ stap -vvvv -DSTAP_TRANS_DEBUGFS -e 'probe begin { printf("started\n"); exit(); }'

on stap 4.4 from systemtap-4.4-2.fc33.x86_64.

I filtered it with

    egrep -v '^(Skipping tapset|Processing)'

to reduce irrelevant spam.
OK, bit more digging done.

If I build the module with

    $ stap -m test_transport -p4 -DSTAP_TRANS_DEBUGFS \
        -e 'probe begin { printf("started\n"); exit(); }'

then on the default mount mode:

    $ sudo mount /sys/kernel/debug -o remount,mode=700

running staprun as non-root fails:

    $ staprun -R -v test_transport.ko
    staprun:insert_module:191 Module test_tr_380170 inserted from file /home/craig/projects/2Q/systemtap/test_transport.ko
    ERROR: Cannot attach to module test_tr_380170 control channel; not running?
    ERROR: Cannot attach to module test_tr_380170 control channel; not running?
    ERROR: 'test_tr_380170' is not a zombie systemtap module.

but as root works:

    $ sudo staprun -R -v test_transport.ko
    staprun:insert_module:191 Module test_tr_380194 inserted from file /home/craig/projects/2Q/systemtap/test_transport.ko
    started
    stapio:cleanup_and_exit:536 detach=0
    stapio:cleanup_and_exit:553 closing control channel
    staprun:remove_module:284 Module test_tr_380194 removed.

And I see the same behaviour from running stap itself:

    $ stap -vp 00005 -DSTAP_TRANS_DEBUGFS -e 'probe begin { printf("started\n"); exit(); }'
    Pass 1: parsed user script and 496 library scripts using 337684virt/95784res/12788shr/82644data kb, in 140usr/30sys/176real ms.
    Pass 2: analyzed script: 1 probe, 1 function, 0 embeds, 0 globals using 339268virt/97432res/12860shr/84228data kb, in 10usr/0sys/6real ms.
    Pass 3: using cached /home/craig/.systemtap/cache/67/stap_67a78d650b7aca78a67c0155dbf23b64_1010.c
    Pass 4: using cached /home/craig/.systemtap/cache/67/stap_67a78d650b7aca78a67c0155dbf23b64_1010.ko
    Pass 5: starting run.
    ERROR: Cannot attach to module stap_67a78d650b7aca78a67c0155dbf23b_380255 control channel; not running?
    ERROR: Cannot attach to module stap_67a78d650b7aca78a67c0155dbf23b_380255 control channel; not running?
    ERROR: 'stap_67a78d650b7aca78a67c0155dbf23b_380255' is not a zombie systemtap module.
    WARNING: /usr/bin/staprun exited with status: 1
    Pass 5: run completed in 0usr/0sys/3real ms.
    Pass 5: run failed.  [man error::pass5]

    $ sudo stap -vp 00005 -DSTAP_TRANS_DEBUGFS -e 'probe begin { printf("started\n"); exit(); }'
    Pass 1: parsed user script and 496 library scripts using 330692virt/95728res/12740shr/82480data kb, in 150usr/20sys/172real ms.
    Pass 2: analyzed script: 1 probe, 1 function, 0 embeds, 0 globals using 332276virt/97376res/12812shr/84064data kb, in 10usr/0sys/6real ms.
    Pass 3: translated to C into "/tmp/stapDRx9sx/stap_df533545ae591f1f9eca48caaf372255_1007_src.c" using 332408virt/97376res/12812shr/84196data kb, in 0usr/0sys/0real ms.
    Pass 4: compiled C into "stap_df533545ae591f1f9eca48caaf372255_1007.ko" in 1800usr/620sys/2244real ms.
    Pass 5: starting run.
    started
    Pass 5: run completed in 10usr/30sys/377real ms.

Repeating the same with debugfs mode 755, I see that the non-root runs now work too:

    $ sudo mount /sys/kernel/debug -o remount,mode=755

    $ staprun -R -v test_transport.ko
    staprun:insert_module:191 Module test_tr_381074 inserted from file /home/craig/projects/2Q/systemtap/test_transport.ko
    started
    stapio:cleanup_and_exit:536 detach=0
    stapio:cleanup_and_exit:553 closing control channel
    staprun:remove_module:284 Module test_tr_381074 removed.

    $ sudo stap -vp 00005 -DSTAP_TRANS_DEBUGFS -e 'probe begin { printf("started\n"); exit(); }'
    Pass 1: parsed user script and 496 library scripts using 330692virt/95384res/12400shr/82480data kb, in 150usr/10sys/172real ms.
    Pass 2: analyzed script: 1 probe, 1 function, 0 embeds, 0 globals using 332276virt/97032res/12472shr/84064data kb, in 10usr/0sys/5real ms.
    Pass 3: using cached /root/.systemtap/cache/df/stap_df533545ae591f1f9eca48caaf372255_1007.c
    Pass 4: using cached /root/.systemtap/cache/df/stap_df533545ae591f1f9eca48caaf372255_1007.ko
    Pass 5: starting run.
    started
    Pass 5: run completed in 0usr/20sys/377real ms.
I believe you are looking for commit e3d03db82853049f.
bug #27067
That's

```
commit e3d03db82
Author: Frank Ch. Eigler <fche@redhat.com>
Date: Sun Dec 13 21:05:23 2020 -0500

    PR23512: fix staprun/stapio operation via less-than-root privileges

    Commit 7615cae790c899bc8a82841c75c8ea9c6fa54df3 for PR26665 introduced
    a regression in handling stapusr/stapdev/stapsys gid invocation of
    staprun/stapio. This patch simplifies the relevant code in
    staprun/ctl.c, init_ctl_channel(), to rely on openat/etc. to populate
    and use the relay_basedir_fd as much as possible. Also, we now avoid
    unnecessary use of access(), which was checking against the wrong
    (real rather than effective) uid/gid.
```

and sounds about right.

Fine to close this. Workaround is documented by this bug now.

*** This bug has been marked as a duplicate of bug 27067 ***
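The real-versus-effective distinction named in that commit message can be illustrated outside of staprun. The sketch below is an assumption-laden Python illustration, not the code in staprun/ctl.c: `check_access` is a hypothetical helper that runs the same permission check twice, once the way POSIX access() does it (against the real uid/gid) and once against the effective ids. In a setuid-root program started by an ordinary user, the real uid is the invoking user while the effective uid is root, so the two answers can differ for a root-only directory.

```python
import os


def check_access(path: str, mode: int = os.X_OK) -> dict:
    """Compare a permission check under real vs effective ids.

    Hypothetical helper for illustration only.
    """
    # Default behaviour mirrors access(2): real uid/gid are used.
    result = {"real": os.access(path, mode)}
    # effective_ids is not available on every platform, so probe first.
    if os.access in os.supports_effective_ids:
        result["effective"] = os.access(path, mode, effective_ids=True)
    return result


if __name__ == "__main__":
    # /sys/kernel/debug is assumed to be the debugfs mount point.
    print(check_access("/sys/kernel/debug"))
```

In an ordinary, non-setuid process both entries agree, because the real and effective ids are the same; the difference only shows up when they diverge, as they do under a setuid binary.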