[RFC PATCH] Linux: Add seccomp probing to faccessat2

Florian Weimer fweimer@redhat.com
Tue Nov 24 13:16:45 GMT 2020


Some container runtimes cause faccessat2 to fail unconditionally
with EPERM.  Since it is conceivable that the real faccessat2
implementation can return EPERM (e.g., triggered by a Linux
Security Module), unconditional fallback to the incorrect workaround
on EPERM seems wrong.  Instead, a probing sequence attempts to
figure out whether the error comes from a seccomp filter or the
kernel.
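
For illustration only, the probing idea can be sketched as a standalone
userspace fragment.  This is not the patch itself: the helper names
probe_fails_with_eperm and eperm_looks_filtered are hypothetical, the
code calls syscall directly instead of the glibc-internal
INTERNAL_SYSCALL_CALL macro, and the fallback value 439 for
SYS_faccessat2 assumes an architecture with the unified system call
numbering.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: older headers may lack SYS_faccessat2.  439 is the
   number on architectures with the unified numbering (not alpha).  */
#ifndef SYS_faccessat2
# define SYS_faccessat2 439
#endif

/* Hypothetical helper: issue faccessat2 with arguments that the
   kernel is expected to reject with an error other than EPERM, and
   report whether the call failed with EPERM anyway.  */
static bool
probe_fails_with_eperm (int fd, const char *file)
{
  long ret = syscall (SYS_faccessat2, fd, file, 0, 0);
  return ret == -1 && errno == EPERM;
}

/* Hypothetical helper: return true if both probes fail with EPERM,
   which suggests that a system call filter rejects faccessat2
   unconditionally.  If either probe fails differently, an earlier
   EPERM most likely came from the kernel.  errno is preserved.  */
static bool
eperm_looks_filtered (int fd)
{
  int saved_errno = errno;
  bool filtered = (probe_fails_with_eperm (-1, "")   /* Invalid descriptor, empty path.  */
                   && probe_fails_with_eperm (fd, NULL)); /* Invalid pointer.  */
  errno = saved_errno;
  return filtered;
}
```

A caller that has just seen EPERM from faccessat2 would call
eperm_looks_filtered (fd) and take the emulation path only if it
returns true; otherwise the EPERM is reported to the application
unchanged.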

Related kernel discussion:

<https://lore.kernel.org/linux-api/87lfer2c0b.fsf@oldenburg2.str.redhat.com/>

Fixes commit 3d3ab573a5f3071992cbc4f57d50d1d29d55bde2
("Linux: Use faccessat2 to implement faccessat (bug 18683)").


This is not a real submission; I just want to show how this would look
in glibc.  I haven't actually tested it.  As I said on the kernel
thread, I'd like to see some reluctant support from kernel developers
before we go in this direction.

---
 sysdeps/unix/sysv/linux/faccessat.c | 39 +++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/faccessat.c b/sysdeps/unix/sysv/linux/faccessat.c
index 5d078371b5..e39c046472 100644
--- a/sysdeps/unix/sysv/linux/faccessat.c
+++ b/sysdeps/unix/sysv/linux/faccessat.c
@@ -17,26 +17,53 @@
    <https://www.gnu.org/licenses/>.  */
 
 #include <fcntl.h>
-#include <unistd.h>
-#include <sys/types.h>
+#include <stdbool.h>
 #include <sys/stat.h>
+#include <sys/types.h>
 #include <sysdep.h>
+#include <unistd.h>
 
+#if !__ASSUME_FACCESSAT2
+/* Used to make sure that an EPERM error came from the kernel and not
+   a system call filter.  */
+static bool
+check_for_eperm (int fd, const char *file, int mode, int flag)
+{
+  int ret = INTERNAL_SYSCALL_CALL (faccessat2, fd, file, mode, flag);
+  return (INTERNAL_SYSCALL_ERROR_P (ret)
+	  && INTERNAL_SYSCALL_ERRNO (ret) == EPERM);
+}
+#endif
 
 int
 faccessat (int fd, const char *file, int mode, int flag)
 {
-  int ret = INLINE_SYSCALL_CALL (faccessat2, fd, file, mode, flag);
 #if __ASSUME_FACCESSAT2
-  return ret;
+  return INLINE_SYSCALL_CALL (faccessat2, fd, file, mode, flag);
 #else
-  if (ret == 0 || errno != ENOSYS)
+  /* Prefer the old system call if no flags are specified, to avoid
+     any complex fallback in that case.  */
+  if (flag == 0)
+    return INLINE_SYSCALL (faccessat, 3, fd, file, mode);
+
+  int ret = INLINE_SYSCALL_CALL (faccessat2, fd, file, mode, flag);
+  if (ret == 0 || (errno != ENOSYS && errno != EPERM))
+    return ret;
+
+  /* Workaround for seccomp filters and Linux containers: Check that
+     the EPERM error is real by probing for known error
+     conditions.  If either probe does not fail with EPERM, it
+     suggests that there is no seccomp filter in place, and the
+     initial EPERM error came from the kernel.  */
+  if (errno == EPERM
+      && (!check_for_eperm (-1, "", 0, 0) /* EBADF or ENOENT expected.  */
+	  || !check_for_eperm (fd, NULL, 0, 0))) /* EFAULT expected.  */
     return ret;
 
   if (flag & ~(AT_SYMLINK_NOFOLLOW | AT_EACCESS))
     return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
 
-  if ((flag == 0 || ((flag & ~AT_EACCESS) == 0 && ! __libc_enable_secure)))
+  if ((flag & ~AT_EACCESS) == 0 && ! __libc_enable_secure)
     return INLINE_SYSCALL (faccessat, 3, fd, file, mode);
 
   struct stat64 stats;

-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


