Bug 27224 - stap 4.4 'cannot attach to module' in debugfs mode as non-root (WITH WORKAROUND)
Status: RESOLVED DUPLICATE of bug 27067
Alias: None
Product: systemtap
Classification: Unclassified
Component: runtime
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Unassigned
Reported: 2021-01-22 04:26 UTC by Craig Ringer
Modified: 2021-01-29 10:43 UTC

Last reconfirmed: 2021-01-22 00:00:00


Attachments
Output from various stap runs (440 bytes, text/plain)
2021-01-22 04:26 UTC, Craig Ringer
stap -vvvv -DSTAP_TRANS_DEBUGFS output (1.58 KB, text/plain)
2021-01-22 04:31 UTC, Craig Ringer

Description Craig Ringer 2021-01-22 04:26:06 UTC
On Fedora 33 with kernel 5.9.16-200.fc33.x86_64 and stap 4.4/0.182, rpm 4.4-2.fc33 I noticed that staprun can't seem to attach to module I/O anymore. It fails with

    ERROR: Cannot attach to module stap_nnn control channel; not running?
    ERROR: 'stap_nnn' is not a zombie systemtap module.

The issue arises with no explicit transport selection or with -DSTAP_TRANS_DEBUGFS explicitly requested.

This has also been reported by a colleague on Debian who is using "Systemtap translator/driver (version 4.4/0.176, Debian version 4.4-1~bpo10+1 (buster-backports))"

Running stap with sudo has no effect.

WORKAROUND
========== 

I have been able to work around the error with:

    sudo mount /sys/kernel/debug -o remount,mode=755

DETAIL
======

I found the above info in the ancient Debian bug report https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=706817 from 2013, which references a 2012 Linux kernel patch: http://lkml.iu.edu/hypermail/linux/kernel/1201.3/00626.html . See also https://github.com/torvalds/linux/commit/82aceae4f0d42f03d9ad7d1e90389e731153898f .

But remounting debugfs solves this issue despite the last relevant changes I can find related to debugfs being 7 or 8 years old.

It might be related to the changes in systemtap intended to work around the kernel lockdown issues and add procfs transport support, per https://bugzilla.redhat.com/show_bug.cgi?id=1873492 in commit 7615cae79 ?

Running with -DSTAP_TRANS_PROCFS instead works, but only when stap itself is run as root, even though staprun is setuid root. It's unclear why; perhaps it is a separate issue.

I don't see the same problem when running on a self-compiled version of git master @ HEAD = d7ea535c6 but I don't yet know if that's due to differences in configuration and installation for packages vs source builds, or whether it's down to a code change since the 4.4 release. I'll compile 4.4 to see.

Sample output for various runs is attached.
Comment 1 Craig Ringer 2021-01-22 04:26:57 UTC
Created attachment 13143
Output from various stap runs

Sample output without verbose logging, showing behaviour of various runs, then the change after remounting debugfs
Comment 2 Craig Ringer 2021-01-22 04:31:12 UTC
Created attachment 13144
stap -vvvv -DSTAP_TRANS_DEBUGFS output

Output from

  $ sudo mount /sys/kernel/debug -o remount,mode=700

  $ stap -vvvv -DSTAP_TRANS_DEBUGFS -e 'probe begin { printf("started\n"); exit(); }'

on stap 4.4 from systemtap-4.4-2.fc33.x86_64 

I filtered it with 

  egrep -v '^(Skipping tapset|Processing)'

to reduce irrelevant spam.
Comment 3 Craig Ringer 2021-01-22 04:42:36 UTC
OK, bit more digging done.

If I build the module with

    $ stap -m test_transport -p4 -DSTAP_TRANS_DEBUGFS \
          -e 'probe begin { printf("started\n"); exit(); }'

then, with debugfs at its default mount mode:

    $ sudo mount /sys/kernel/debug -o remount,mode=700

running staprun as non-root fails:

    $ staprun -R -v  test_transport.ko
    staprun:insert_module:191 Module test_tr_380170 inserted from file /home/craig/projects/2Q/systemtap/test_transport.ko
    ERROR: Cannot attach to module test_tr_380170 control channel; not running?
    ERROR: Cannot attach to module test_tr_380170 control channel; not running?
    ERROR: 'test_tr_380170' is not a zombie systemtap module.

but as root works:

    $ sudo staprun -R -v  test_transport.ko
    staprun:insert_module:191 Module test_tr_380194 inserted from file /home/craig/projects/2Q/systemtap/test_transport.ko
    started
    stapio:cleanup_and_exit:536 detach=0
    stapio:cleanup_and_exit:553 closing control channel
    staprun:remove_module:284 Module test_tr_380194 removed.

And I see the same behaviour from running stap itself:

    $ stap -vp 00005 -DSTAP_TRANS_DEBUGFS -e 'probe begin { printf("started\n"); exit(); }' 
    Pass 1: parsed user script and 496 library scripts using 337684virt/95784res/12788shr/82644data kb, in 140usr/30sys/176real ms.
    Pass 2: analyzed script: 1 probe, 1 function, 0 embeds, 0 globals using 339268virt/97432res/12860shr/84228data kb, in 10usr/0sys/6real ms.
    Pass 3: using cached /home/craig/.systemtap/cache/67/stap_67a78d650b7aca78a67c0155dbf23b64_1010.c
    Pass 4: using cached /home/craig/.systemtap/cache/67/stap_67a78d650b7aca78a67c0155dbf23b64_1010.ko
    Pass 5: starting run.
    ERROR: Cannot attach to module stap_67a78d650b7aca78a67c0155dbf23b_380255 control channel; not running?
    ERROR: Cannot attach to module stap_67a78d650b7aca78a67c0155dbf23b_380255 control channel; not running?
    ERROR: 'stap_67a78d650b7aca78a67c0155dbf23b_380255' is not a zombie systemtap module.
    WARNING: /usr/bin/staprun exited with status: 1
    Pass 5: run completed in 0usr/0sys/3real ms.
    Pass 5: run failed.  [man error::pass5]

    $ sudo stap -vp 00005 -DSTAP_TRANS_DEBUGFS -e 'probe begin { printf("started\n"); exit(); }' 
    Pass 1: parsed user script and 496 library scripts using 330692virt/95728res/12740shr/82480data kb, in 150usr/20sys/172real ms.
    Pass 2: analyzed script: 1 probe, 1 function, 0 embeds, 0 globals using 332276virt/97376res/12812shr/84064data kb, in 10usr/0sys/6real ms.
    Pass 3: translated to C into "/tmp/stapDRx9sx/stap_df533545ae591f1f9eca48caaf372255_1007_src.c" using 332408virt/97376res/12812shr/84196data kb, in 0usr/0sys/0real ms.
    Pass 4: compiled C into "stap_df533545ae591f1f9eca48caaf372255_1007.ko" in 1800usr/620sys/2244real ms.
    Pass 5: starting run.
    started
    Pass 5: run completed in 10usr/30sys/377real ms.

Repeating the same with debugfs mode 755, I see that the non-root runs now work too:

    $ sudo mount /sys/kernel/debug -o remount,mode=755

    $ staprun -R -v  test_transport.ko
    staprun:insert_module:191 Module test_tr_381074 inserted from file /home/craig/projects/2Q/systemtap/test_transport.ko
    started
    stapio:cleanup_and_exit:536 detach=0
    stapio:cleanup_and_exit:553 closing control channel
    staprun:remove_module:284 Module test_tr_381074 removed.

    $ sudo stap -vp 00005 -DSTAP_TRANS_DEBUGFS -e 'probe begin { printf("started\n"); exit(); }'
    Pass 1: parsed user script and 496 library scripts using 330692virt/95384res/12400shr/82480data kb, in 150usr/10sys/172real ms.
    Pass 2: analyzed script: 1 probe, 1 function, 0 embeds, 0 globals using 332276virt/97032res/12472shr/84064data kb, in 10usr/0sys/5real ms.
    Pass 3: using cached /root/.systemtap/cache/df/stap_df533545ae591f1f9eca48caaf372255_1007.c
    Pass 4: using cached /root/.systemtap/cache/df/stap_df533545ae591f1f9eca48caaf372255_1007.ko
    Pass 5: starting run.
    started
    Pass 5: run completed in 0usr/20sys/377real ms.
Comment 4 Frank Ch. Eigler 2021-01-22 22:30:39 UTC
I believe you are looking for commit e3d03db82853049f.
Comment 5 Frank Ch. Eigler 2021-01-22 22:31:09 UTC
bug #27067
Comment 6 Craig Ringer 2021-01-29 10:43:18 UTC
That's

```
commit e3d03db82
Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Sun Dec 13 21:05:23 2020 -0500

    PR23512: fix staprun/stapio operation via less-than-root privileges
    
    Commit 7615cae790c899bc8a82841c75c8ea9c6fa54df3 for PR26665 introduced
    a regression in handling stapusr/stapdev/stapsys gid invocation of
    staprun/stapio.  This patch simplifies the relevant code in
    staprun/ctl.c, init_ctl_channel(), to rely on openat/etc. to populate
    and use the relay_basedir_fd as much as possible.  Also, we now avoid
    unnecessary use of access(), which was checking against the wrong
    (real rather than effective) uid/gid.
```

and sounds about right.
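
To illustrate the access() point in that commit message: access() checks against the real uid/gid, so in a setuid program such as staprun it can give a different answer than a subsequent open, which is checked against the effective ids. The following is a rough C sketch of the difference and of the "just open it relative to a directory fd" approach. It is not the actual staprun code, and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* access() answers the question for the REAL uid/gid of the process. */
static int readable_by_real_ids(const char *path)
{
    assert(path != NULL);
    return access(path, R_OK) == 0;
}

/* faccessat() with AT_EACCESS answers it for the EFFECTIVE uid/gid,
 * which is what open() itself is checked against. */
static int readable_by_effective_ids(const char *path)
{
    assert(path != NULL);
    return faccessat(AT_FDCWD, path, R_OK, AT_EACCESS) == 0;
}

/* Skip the pre-check entirely: open the directory once, then open the
 * file relative to that descriptor and let the kernel decide.
 * Returns a file descriptor, or -1 on failure. */
static int open_relative(const char *dir, const char *name)
{
    int dirfd, fd;

    assert(dir != NULL && name != NULL);
    dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;
    fd = openat(dirfd, name, O_RDONLY);
    close(dirfd);
    return fd;
}
```

In an ordinary (non-setuid) process the two checks agree. They can diverge only when the real and effective ids differ, which is the situation the commit message describes.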

Fine to close this. Workaround is documented by this bug now.

*** This bug has been marked as a duplicate of bug 27067 ***