This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[PATCH v15 6/9] namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
- From: Aleksa Sarai <cyphar at cyphar dot com>
- To: Al Viro <viro at zeniv dot linux dot org dot uk>, Jeff Layton <jlayton at kernel dot org>, "J. Bruce Fields" <bfields at fieldses dot org>, Arnd Bergmann <arnd at arndb dot de>, David Howells <dhowells at redhat dot com>, Shuah Khan <shuah at kernel dot org>, Shuah Khan <skhan at linuxfoundation dot org>, Ingo Molnar <mingo at redhat dot com>, Peter Zijlstra <peterz at infradead dot org>
- Cc: Aleksa Sarai <cyphar at cyphar dot com>, Christian Brauner <christian dot brauner at ubuntu dot com>, Jann Horn <jannh at google dot com>, Linus Torvalds <torvalds at linux-foundation dot org>, Eric Biederman <ebiederm at xmission dot com>, Andy Lutomirski <luto at kernel dot org>, Andrew Morton <akpm at linux-foundation dot org>, Alexei Starovoitov <ast at kernel dot org>, Kees Cook <keescook at chromium dot org>, Tycho Andersen <tycho at tycho dot ws>, David Drysdale <drysdale at google dot com>, Chanho Min <chanho dot min at lge dot com>, Oleg Nesterov <oleg at redhat dot com>, Rasmus Villemoes <linux at rasmusvillemoes dot dk>, Alexander Shishkin <alexander dot shishkin at linux dot intel dot com>, Jiri Olsa <jolsa at redhat dot com>, Namhyung Kim <namhyung at kernel dot org>, Christian Brauner <christian at brauner dot io>, Aleksa Sarai <asarai at suse dot de>, containers at lists dot linux-foundation dot org, linux-alpha at vger dot kernel dot org, linux-api at vger dot kernel dot org, libc-alpha at sourceware dot org, linux-arch at vger dot kernel dot org, linux-arm-kernel at lists dot infradead dot org, linux-fsdevel at vger dot kernel dot org, linux-ia64 at vger dot kernel dot org, linux-kernel at vger dot kernel dot org, linux-kselftest at vger dot kernel dot org, linux-m68k at lists dot linux-m68k dot org, linux-mips at vger dot kernel dot org, linux-parisc at vger dot kernel dot org, linuxppc-dev at lists dot ozlabs dot org, linux-s390 at vger dot kernel dot org, linux-sh at vger dot kernel dot org, linux-xtensa at linux-xtensa dot org, sparclinux at vger dot kernel dot org
- Date: Tue, 5 Nov 2019 20:05:50 +1100
- Subject: [PATCH v15 6/9] namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
- References: <20191105090553.6350-1-cyphar@cyphar.com>
Allow LOOKUP_BENEATH and LOOKUP_IN_ROOT to safely permit ".." resolution
(in the case of LOOKUP_BENEATH the resolution will still fail if ".."
resolution would resolve a path outside of the root -- while
LOOKUP_IN_ROOT will chroot(2)-style scope it). Magic-link jumps are
still disallowed entirely[*].
As Jann explains[1,2], the need for this patch (and the original no-".."
restriction) is explained by observing there is a fairly easy-to-exploit
race condition with chroot(2) (and thus by extension LOOKUP_IN_ROOT and
LOOKUP_BENEATH if ".." is allowed) where a rename(2) of a path can be
used to "skip over" nd->root and thus escape to the filesystem above
nd->root.
thread1 [attacker]:
for (;;)
renameat2(AT_FDCWD, "/a/b/c", AT_FDCWD, "/a/d", RENAME_EXCHANGE);
thread2 [victim]:
for (;;)
openat2(dirb, "b/c/../../etc/shadow",
{ .flags = O_PATH, .resolve = RESOLVE_IN_ROOT } );
With fairly significant regularity, thread2 will resolve to
"/etc/shadow" rather than "/a/b/etc/shadow". There is also a similar
(though somewhat more privileged) attack using MS_MOVE.
With this patch, such cases will be detected *during* ".." resolution
and will return -EAGAIN for userspace to decide to either retry or abort
the lookup. It should be noted that ".." is the weak point of chroot(2)
-- walking *into* a subdirectory tautologically cannot result in you
walking *outside* nd->root (except through a bind-mount or magic-link).
There is also no other way for a directory's parent to change (which is
the primary worry with ".." resolution here) other than a rename or
MS_MOVE.
This is a first-pass implementation, where -EAGAIN will be returned if
any rename or mount occurs anywhere on the host (in any namespace). This
will result in spurious errors, but there isn't a satisfactory
alternative (other than denying ".." altogether).
One other possible alternative (which previous versions of this patch
used) would be to check with path_is_under() if there was a racing
rename or mount (after re-taking the relevant seqlocks). While this does
work, it results in possible O(n*m) behaviour if there are many renames
or mounts occuring *anywhere on the system*.
A variant of the above attack is included in the selftests for
openat2(2) later in this patch series. I've run this test on several
machines for several days and no instances of a breakout were detected.
While this is not concrete proof that this is safe, when combined with
the above argument it should lend some trustworthiness to this
construction.
[*] It may be acceptable in the future to do a path_is_under() check (as
with the alternative solution for "..") for magic-links after they
are resolved. However this seems unlikely to be a feature that
people *really* need -- it can be added later if it turns out a lot
of people want it.
[1]: https://lore.kernel.org/lkml/CAG48ez1jzNvxB+bfOBnERFGp=oMM0vHWuLD6EULmne3R6xa53w@mail.gmail.com/
[2]: https://lore.kernel.org/lkml/CAG48ez30WJhbsro2HOc_DR7V91M+hNFzBP5ogRMZaxbAORvqzg@mail.gmail.com/
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Suggested-by: Jann Horn <jannh@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
fs/namei.c | 42 +++++++++++++++++++++++++++++-------------
1 file changed, 29 insertions(+), 13 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index a3d199a60708..174d69cf9084 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -491,7 +491,7 @@ struct nameidata {
struct path root;
struct inode *inode; /* path.dentry.d_inode */
unsigned int flags;
- unsigned seq, m_seq;
+ unsigned seq, m_seq, r_seq;
int last_type;
unsigned depth;
int total_link_count;
@@ -1769,22 +1769,35 @@ static inline int handle_dots(struct nameidata *nd, int type)
if (type == LAST_DOTDOT) {
int error = 0;
- /*
- * Scoped-lookup flags resolving ".." is not currently safe --
- * races can cause our parent to have moved outside of the root
- * and us to skip over it.
- */
- if (unlikely(nd->flags & LOOKUP_IS_SCOPED))
- return -EXDEV;
if (!nd->root.mnt) {
error = set_root(nd);
if (error)
return error;
}
- if (nd->flags & LOOKUP_RCU) {
- return follow_dotdot_rcu(nd);
- } else
- return follow_dotdot(nd);
+ if (nd->flags & LOOKUP_RCU)
+ error = follow_dotdot_rcu(nd);
+ else
+ error = follow_dotdot(nd);
+ if (error)
+ return error;
+
+ if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
+ bool m_retry = read_seqretry(&mount_lock, nd->m_seq);
+ bool r_retry = read_seqretry(&rename_lock, nd->r_seq);
+
+ /*
+ * If there was a racing rename or mount along our
+ * path, then we can't be sure that ".." hasn't jumped
+ * above nd->root (and so userspace should retry or use
+ * some fallback).
+ *
+ * In future we could do a path_is_under() check here
+ * instead, but there are O(n*m) performance
+ * considerations with such a setup.
+ */
+ if (unlikely(m_retry || r_retry))
+ return -EAGAIN;
+ }
}
return 0;
}
@@ -2254,6 +2267,10 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
nd->last_type = LAST_ROOT; /* if there are only slashes... */
nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT;
nd->depth = 0;
+
+ nd->m_seq = read_seqbegin(&mount_lock);
+ nd->r_seq = read_seqbegin(&rename_lock);
+
if (flags & LOOKUP_ROOT) {
struct dentry *root = nd->root.dentry;
struct inode *inode = root->d_inode;
@@ -2275,7 +2292,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
nd->path.mnt = NULL;
nd->path.dentry = NULL;
- nd->m_seq = read_seqbegin(&mount_lock);
/* Absolute pathname -- fetch the root. */
if (flags & LOOKUP_IN_ROOT) {
--
2.23.0