This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: glibc in master is incompatible with systemd-nspawn
- From: Rich Felker <dalias at libc dot org>
- To: Florian Weimer <fweimer at redhat dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Fri, 8 Nov 2019 15:20:26 -0500
- Subject: Re: glibc in master is incompatible with systemd-nspawn
- References: <87k18a62xe.fsf@oldenburg2.str.redhat.com>
On Fri, Nov 08, 2019 at 12:08:29PM +0100, Florian Weimer wrote:
> systemd-nspawn (at least the Fedora version) applies a system call
> filter which causes system calls to fail with EPERM instead of ENOSYS.
> This breaks our fallback handling. This problem has existed for a long
> time, but it has become more prevalent with the recent time64 changes.
>
> I think this is a systemd-nspawn bug.
Completely agree that causing the syscalls to fail with EPERM is a
bug. EPERM is somewhat defensible in cases where the system call being
blocked involves performing a privileged operation that a caller might
or might not be allowed to do, and the seccomp filter is further
disallowing it (beyond the normal access controls implied by
euid/egid/caps). But it would be erroneous to treat EPERM as a
situation calling for fallback with different syscalls; doing so would
likely make non-atomic some policy decisions that need to be atomic.
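
To illustrate the distinction, here is a minimal sketch of a fallback wrapper that treats only ENOSYS as a reason to retry with an older syscall and passes EPERM straight back to the caller. This is not code from glibc or musl. The names `sys_fn`, `call_with_fallback`, and the `stub_*` functions are hypothetical, and the stubs merely simulate a kernel or filter response.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical signature for a syscall attempt: returns -1 and sets
   errno on failure, like the syscall(2) wrapper does. */
typedef long (*sys_fn)(void);

/* Try the newer syscall first. Fall back to the older one only when
   the kernel reports that the newer one does not exist (ENOSYS).
   Any other error, including EPERM, is a real result and is returned
   to the caller unchanged. */
static long call_with_fallback(sys_fn newer, sys_fn older)
{
    assert(newer != NULL && older != NULL);
    long ret = newer();
    if (ret == -1 && errno == ENOSYS)
        return older();
    return ret;
}

/* Stubs standing in for real syscalls (assumptions for illustration). */
static long stub_enosys(void) { errno = ENOSYS; return -1; } /* syscall missing */
static long stub_eperm(void)  { errno = EPERM;  return -1; } /* filter says EPERM */
static long stub_legacy(void) { return 42; }                 /* older syscall works */
```

With these stubs, `call_with_fallback(stub_enosys, stub_legacy)` reaches the legacy path, while `call_with_fallback(stub_eperm, stub_legacy)` fails with EPERM and never tries the legacy call, which is the failure mode described in the quoted report.
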
However, I'm not sure that ENOSYS is an entirely defensible behavior in
all cases either. Surely for any syscall added in recent history
(especially after the glibc compiled-in minimum kernel version),
ENOSYS is an acceptable behavior. But there are a lot of syscalls that
should not be able to fail at all, and for which there is no
reasonable way to make forward progress if they fail. I'm not sure
glibc checks for all these; I'm pretty certain we don't in musl. For
example, failure of some of the following could be dangerous:
- SYS_exit
- SYS_rt_sigprocmask
- SYS_set_tid_address
- SYS_set_thread_area (or equivalent)
- SYS_get*id
- SYS_close
- SYS_fcntl (with certain commands)
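
As a rough sketch of what checking such calls could look like (not what glibc or musl actually do), a wrapper can treat any failure of a must-not-fail syscall as fatal instead of continuing with bad state. The helper name `must_succeed` is hypothetical, and the sketch assumes the call goes through syscall(2), which returns -1 and sets errno on failure.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: abort if a syscall that should never fail
   reports failure (for example because a seccomp filter injected
   ENOSYS or EPERM). Returns the syscall result otherwise. */
static long must_succeed(const char *what, long ret)
{
    assert(what != NULL);
    if (ret == -1) {
        fprintf(stderr, "fatal: %s failed: %s\n", what, strerror(errno));
        abort(); /* no reasonable way to make forward progress */
    }
    return ret;
}
```

A caller might write `long pid = must_succeed("getpid", syscall(SYS_getpid));`. Whether aborting is the right response for every syscall in the list is a policy choice that this sketch does not settle.
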
The whole seccomp framework of blocking syscalls rather than blocking
kernel-state-affecting actions at a higher semantic level was rather
poorly thought out and inherently leads to this kind of problem, and
in some sense it's not fixable, but it might make sense for tools like
systemd-nspawn to have a list of syscalls they refuse to block without
a "force" option, or to insist on blocking them by terminating the
process rather than by failure.
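
A sketch of how a tool might encode such a policy follows. It is not systemd-nspawn's code. The function names, the contents of the critical list, and the decision to combine both suggestions (refuse without "force", kill the process with it) are assumptions for illustration. The action values are the standard ones from <linux/seccomp.h>, and SYS_fcntl is assumed to exist on the target architecture.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* Hypothetical list of syscalls that should not be made to fail. */
static const long critical_syscalls[] = {
    SYS_exit, SYS_rt_sigprocmask, SYS_set_tid_address,
    SYS_getpid, SYS_close, SYS_fcntl,
};

static bool is_critical(long nr)
{
    size_t n = sizeof critical_syscalls / sizeof critical_syscalls[0];
    for (size_t i = 0; i < n; i++)
        if (critical_syscalls[i] == nr)
            return true;
    return false;
}

/* Choose the seccomp return action used to block syscall nr.
   Returns 0 and stores the action, or -1 if the request is refused
   because nr is critical and force was not given. */
static int plan_block(long nr, bool force, uint32_t *action)
{
    assert(action != NULL);
    if (!is_critical(nr)) {
        /* Ordinary syscall: fail with ENOSYS so callers can fall back. */
        *action = SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA);
        return 0;
    }
    if (!force)
        return -1; /* refuse to block a critical syscall */
    /* Forced: terminate the process instead of returning an error. */
    *action = SECCOMP_RET_KILL_PROCESS;
    return 0;
}
```

The stored value is what a BPF filter would return for the matching syscall number. Building and installing the filter itself is left out here.
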
Rich