Bug 30321 - apply privilege separation for passes 2/3/4, esp. if invoked as root
Summary: apply privilege separation for passes 2/3/4, esp. if invoked as root
Status: ASSIGNED
Alias: None
Product: systemtap
Classification: Unclassified
Component: translator (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Martin Cermak
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-04-07 01:18 UTC by Frank Ch. Eigler
Modified: 2024-04-26 15:30 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
working proof of concept (3.38 KB, patch)
2024-03-08 14:46 UTC, Martin Cermak
Details | Diff
WIP patch (5.07 KB, patch)
2024-03-27 13:57 UTC, Martin Cermak
Details | Diff
WIP patch (6.90 KB, patch)
2024-04-02 21:08 UTC, Martin Cermak
Details | Diff
possible patch (8.80 KB, patch)
2024-04-10 08:41 UTC, Martin Cermak
Details | Diff
test results (400.79 KB, application/x-xz)
2024-04-19 12:47 UTC, Martin Cermak
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Frank Ch. Eigler 2023-04-07 01:18:34 UTC
systemtap's pass 2 (and a bit of pass 3 - the -d option) involves processing elf/dwarf files via elfutils.  While these files tend to be trusted & trustworthy, it may be comforting from a security perspective if systemtap attempts to shed privileges while dealing with them.  Similarly, invoking the subordinate compilers should be done with a reduced privilege if possible.

Some complications:

- pass 2 may well involve elfutils-triggered debuginfod downloads,
  ergo client cache writes

- pass 3 may want to read /proc/kallsyms, which the kernel may obfuscate
  unless read as root

- filesystem permissions for the target binaries may require root to open
  (but not to process)

- pass 5 will generally require full privileges, so simply dropping privs
  early altogether is not possible


The smallest practical step that may give some incremental protection
could be to add some setreuid(2) calls to temporarily swap between root
and an unprivileged uid around certain processing steps.
Comment 1 Martin Cermak 2024-03-08 14:46:19 UTC
Created attachment 15393 [details]
working proof of concept

This is a bare bones proof of concept. The idea is to:

- Start running under root.
- Presume unprivileged userid 1000 exists.
- Create /root/.systemtap and chown it to the unprivileged user.
- Create /tmp/stapXXXXXX and chown it to the unprivileged user.
- Split pass_5 in two.
  - Pass_5_1 prepares the staprun comdline and saves it to
    s->tmpdir + "/staprun_args". Runs unprivileged.
  - Pass_5_2 spawns staprun under root using saved staprun
    cmdline.  Runs under root privileges

Here's how it works for me:

> [root@rawh build]#  stap -v -e 'probe begin {log("hey!") exit()}' 
> XXX Parent started waiting for the child ...
> XXX parent pid=368908, uid=0, euid=0
> XXX Child started running the unprivileged part ...
> XXX Child can't setreuid() back to root, good.
> XXX Child pid=368909, uid=1000, euid=1000
> Pass 1: parsed user script and 484 library scripts using 126156virt/95180res/4352shr/90480data kb, in 170usr/50sys/226real ms.
> Pass 2: analyzed script: 1 probe, 2 functions, 0 embeds, 0 globals using 127608virt/97484res/5248shr/91932data kb, in 10usr/0sys/8real ms.
> Pass 3: translated to C into "/tmp/staps6Eb49/stap_750a6cb0bd1e77fdf34e841f8ddc6811_1358_src.c" using 127608virt/98252res/5888shr/91932data kb, in 0usr/0sys/0real ms.
> Pass 4: compiled C into "stap_750a6cb0bd1e77fdf34e841f8ddc6811_1358.ko" in 27920usr/4700sys/18108real ms.
> XXX Child finished running the unprivileged part, tmpdir is /tmp/staps6Eb49
> XXX Parent finished waiting for the child.
> XXX Parent about to execute staprun, tmpdir is /tmp/staps6Eb49
> Pass 5: starting run.
> hey!
> Pass 5: run completed in 30usr/40sys/715real ms.
> [root@rawh build]#
Comment 2 Martin Cermak 2024-03-27 13:57:24 UTC
Created attachment 15438 [details]
WIP patch

Right now I'm working with the attached patch.  It, in particular, provides the --insecure command line switch. It essentially allows for a fallback to the original behavior where "everything" runs under root.  The patch shows the expected kallsyms and /proc problems (--insecure works these around). Test results generated on s390x:


1) Baseline (without patch):

          === systemtap Summary ===                                                
                                                                                   
# of expected passes              8529                                             
# of unexpected failures  1535                                                     
# of unexpected successes 5                                                        
# of expected failures            353                                              
# of unknown successes            4                                                
# of known failures               112                                              
# of untested testcases           919                                              
# of unsupported tests            34                                               



2) Test results for the patched code:
                                                                                   
          === systemtap Summary ===                                                
                                                                                   
# of expected passes              7943                                             
# of unexpected failures  1417                                                     
# of unexpected successes 4                                                        
# of expected failures            352                                              
# of unknown successes            5                                                
# of known failures               111                                              
# of unresolved testcases 1                                                        
# of untested testcases           926                                              
# of unsupported tests            10



These test results look relatively good already.  Most of the regressions shown above can be worked around using the provided --insecure switch.  But not all:  There are some regressions - looking into those.
Comment 3 Martin Cermak 2024-04-02 21:08:36 UTC
Created attachment 15448 [details]
WIP patch
Comment 4 Martin Cermak 2024-04-10 08:41:10 UTC
Created attachment 15456 [details]
possible patch

Possible patch including new user/group creation in the specfile.
Comment 5 Martin Cermak 2024-04-10 15:49:17 UTC
(In reply to Martin Cermak from comment #4)
> Possible patch including new user/group creation in the specfile.

Pushed it to a branch:
https://sourceware.org/git/?p=systemtap.git;a=shortlog;h=refs/heads/mcermak-﷒0
Comment 6 Martin Cermak 2024-04-10 15:50:23 UTC
(In reply to Martin Cermak from comment #5)
> (In reply to Martin Cermak from comment #4)
> > Possible patch including new user/group creation in the specfile.
> 
> Pushed it to a branch:
https://sourceware.org/git/?p=systemtap.git;a=shortlog;h=refs/heads/mcermak-﷒0
Comment 7 Martin Cermak 2024-04-10 15:53:50 UTC
Apologies, not sure why the URL breaks.  The branch name is mcermak-pr30321 .
Comment 8 Martin Cermak 2024-04-19 12:47:01 UTC
Created attachment 15473 [details]
test results

I've been regression-testing baseline commit 83ea7cbc0 "Update nfsd.stp tapset to work with Linux 6.1 and newer kernels" against PR30321 commit 5c185332a "Move setting the user and group to a new function run_unprivileged()" on aarch64 and x86_64. Testing was executed on rhel-9.5. The test results are as follows:

aarch64 baseleine:

                === systemtap Summary ===

# of expected passes            9845
# of unexpected failures        1442
# of unexpected successes       10
# of expected failures          706
# of unknown successes          10
# of known failures             299
# of unresolved testcases       1
# of untested testcases         807
# of unsupported tests          37

aarch64 patched:

                === systemtap Summary ===

# of expected passes            9248
# of unexpected failures        1400
# of unexpected successes       8
# of expected failures          705
# of unknown successes          7
# of known failures             307
# of unresolved testcases       1
# of untested testcases         811
# of unsupported tests          4

x86_64 baseline:

                === systemtap Summary ===

# of expected passes            10205
# of unexpected failures        1517
# of unexpected successes       10
# of expected failures          710
# of unknown successes          9
# of known failures             318
# of untested testcases         805
# of unsupported tests          16

x86_64 patched:

                === systemtap Summary ===

# of expected passes            9322
# of unexpected failures        1298
# of unexpected successes       8
# of expected failures          709
# of unknown successes          7
# of known failures             323
# of unresolved testcases       2
# of untested testcases         811
# of unsupported tests          4


The patch test results have slightly less expected passes mainly because the syscall test module failed to build.  This should be easy to fix on the testcase side, will look into it. There are also other tests that I'm going to look into such as cache.exp, listing_mode.exp, or unprivileged_embedded_C.exp. But in general the test results look pretty good to me.  More tests are underway. I plan provide updates early next week.
Comment 9 Martin Cermak 2024-04-26 15:30:31 UTC
commit be7e131bdedb20ad690fdc83a52d74041abd0e54 (HEAD -> master, origin/master, origin/HEAD)
Author: Martin Cermak <mcermak@redhat.com>
Date:   Fri Apr 26 17:09:45 2024 +0200

    PR30321 Privilege separation if invoked as root
    
    Provide new command line switch 'stap --build-as' that allows for
    running passes 1-4 under an unprivileged user.  In case this switch
    is specified, systemtap forks and runs passes 1-4 under the specified
    user.  At the RPM install time a new user 'stapunpriv' is created, and
    can be used with 'stap --build-as=stapunpriv'. If '--build-as' isn't
    specified, systemtap behaves the traditional way, no forking happens.
    
    This commit is a preparatory step.  Further work is supposed to happen
    so that the privilege separation brings a true improvement from the
    security perspective.