systemtap's pass 2 (and a bit of pass 3 - the -d option) involves processing elf/dwarf files via elfutils. While these files tend to be trusted & trustworthy, it may be comforting from a security perspective if systemtap attempts to shed privileges while dealing with them. Similarly, invoking the subordinate compilers should be done with a reduced privilege if possible. Some complications: - pass 2 may well involve elfutils-triggered debuginfod downloads, ergo client cache writes - pass 3 may want to read /proc/kallsyms, which the kernel may obfuscate unless read as root - filesystem permissions for the target binaries may require root to open (but not to process) - pass 5 will generally require full privileges, so simply dropping privs early altogether is not possible The smallest practical step that may give some incremental protection could be to add some setreuid(2) calls to temporarily swap between root and an unprivileged uid around certain processing steps.
Created attachment 15393 [details] working proof of concept This is a bare bones proof of concept. The idea is to: - Start running under root. - Presume unprivileged userid 1000 exists. - Create /root/.systemtap and chown it to the unprivileged user. - Create /tmp/stapXXXXXX and chown it to the unprivileged user. - Split pass_5 in two. - Pass_5_1 prepares the staprun comdline and saves it to s->tmpdir + "/staprun_args". Runs unprivileged. - Pass_5_2 spawns staprun under root using saved staprun cmdline. Runs under root privileges Here's how it works for me: > [root@rawh build]# stap -v -e 'probe begin {log("hey!") exit()}' > XXX Parent started waiting for the child ... > XXX parent pid=368908, uid=0, euid=0 > XXX Child started running the unprivileged part ... > XXX Child can't setreuid() back to root, good. > XXX Child pid=368909, uid=1000, euid=1000 > Pass 1: parsed user script and 484 library scripts using 126156virt/95180res/4352shr/90480data kb, in 170usr/50sys/226real ms. > Pass 2: analyzed script: 1 probe, 2 functions, 0 embeds, 0 globals using 127608virt/97484res/5248shr/91932data kb, in 10usr/0sys/8real ms. > Pass 3: translated to C into "/tmp/staps6Eb49/stap_750a6cb0bd1e77fdf34e841f8ddc6811_1358_src.c" using 127608virt/98252res/5888shr/91932data kb, in 0usr/0sys/0real ms. > Pass 4: compiled C into "stap_750a6cb0bd1e77fdf34e841f8ddc6811_1358.ko" in 27920usr/4700sys/18108real ms. > XXX Child finished running the unprivileged part, tmpdir is /tmp/staps6Eb49 > XXX Parent finished waiting for the child. > XXX Parent about to execute staprun, tmpdir is /tmp/staps6Eb49 > Pass 5: starting run. > hey! > Pass 5: run completed in 30usr/40sys/715real ms. > [root@rawh build]#
Created attachment 15438 [details] WIP patch Right now I'm working with the attached patch. It, in particular, provides the --insecure command line switch. It essentially allows for a fallback to the original behavior where "everything" runs under root. The patch shows the expected kallsyms and /proc problems (--insecure works these around). Test results generated on s390x: 1) Baseline (without patch): === systemtap Summary === # of expected passes 8529 # of unexpected failures 1535 # of unexpected successes 5 # of expected failures 353 # of unknown successes 4 # of known failures 112 # of untested testcases 919 # of unsupported tests 34 2) Test results for the patched code: === systemtap Summary === # of expected passes 7943 # of unexpected failures 1417 # of unexpected successes 4 # of expected failures 352 # of unknown successes 5 # of known failures 111 # of unresolved testcases 1 # of untested testcases 926 # of unsupported tests 10 These test results look relatively good already. Most of the regressions shown above can be worked around using the provided --insecure switch. But not all: There are some regressions - looking into those.
Created attachment 15448 [details] WIP patch
Created attachment 15456 [details] possible patch Possible patch including new user/group creation in the specfile.
(In reply to Martin Cermak from comment #4) > Possible patch including new user/group creation in the specfile. Pushed it to a branch: https://sourceware.org/git/?p=systemtap.git;a=shortlog;h=refs/heads/mcermak-0
(In reply to Martin Cermak from comment #5) > (In reply to Martin Cermak from comment #4) > > Possible patch including new user/group creation in the specfile. > > Pushed it to a branch: https://sourceware.org/git/?p=systemtap.git;a=shortlog;h=refs/heads/mcermak-0
Apologies, not sure why the URL breaks. The branch name is mcermak-pr30321 .
Created attachment 15473 [details] test results I've been regression-testing baseline commit 83ea7cbc0 "Update nfsd.stp tapset to work with Linux 6.1 and newer kernels" against PR30321 commit 5c185332a "Move setting the user and group to a new function run_unprivileged()" on aarch64 and x86_64. Testing was executed on rhel-9.5. The test results are as follows: aarch64 baseleine: === systemtap Summary === # of expected passes 9845 # of unexpected failures 1442 # of unexpected successes 10 # of expected failures 706 # of unknown successes 10 # of known failures 299 # of unresolved testcases 1 # of untested testcases 807 # of unsupported tests 37 aarch64 patched: === systemtap Summary === # of expected passes 9248 # of unexpected failures 1400 # of unexpected successes 8 # of expected failures 705 # of unknown successes 7 # of known failures 307 # of unresolved testcases 1 # of untested testcases 811 # of unsupported tests 4 x86_64 baseline: === systemtap Summary === # of expected passes 10205 # of unexpected failures 1517 # of unexpected successes 10 # of expected failures 710 # of unknown successes 9 # of known failures 318 # of untested testcases 805 # of unsupported tests 16 x86_64 patched: === systemtap Summary === # of expected passes 9322 # of unexpected failures 1298 # of unexpected successes 8 # of expected failures 709 # of unknown successes 7 # of known failures 323 # of unresolved testcases 2 # of untested testcases 811 # of unsupported tests 4 The patch test results have slightly less expected passes mainly because the syscall test module failed to build. This should be easy to fix on the testcase side, will look into it. There are also other tests that I'm going to look into such as cache.exp, listing_mode.exp, or unprivileged_embedded_C.exp. But in general the test results look pretty good to me. More tests are underway. I plan provide updates early next week.
commit be7e131bdedb20ad690fdc83a52d74041abd0e54 (HEAD -> master, origin/master, origin/HEAD) Author: Martin Cermak <mcermak@redhat.com> Date: Fri Apr 26 17:09:45 2024 +0200 PR30321 Privilege separation if invoked as root Provide new command line switch 'stap --build-as' that allows for running passes 1-4 under an unprivileged user. In case this switch is specified, systemtap forks and runs passes 1-4 under the specified user. At the RPM install time a new user 'stapunpriv' is created, and can be used with 'stap --build-as=stapunpriv'. If '--build-as' isn't specified, systemtap behaves the traditional way, no forking happens. This commit is a preparatory step. Further work is supposed to happen so that the privilege separation brings a true improvement from the security perspective.