This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Fallout from dlopen() blocking SIGSYS


On Thu, Dec 05, 2019 at 05:03:00PM +0100, Florian Weimer wrote:
> * Gian-Carlo Pascutto:
> 
> > Block signals during the initial part of dlopen
> > (a2e8aa0d9ea648068d8be52dd7b15f1b6a008e23)
> >
> > is going to break every Firefox release of the last few years. We use a
> > seccomp-bpf filter to sandbox various processes. In some of these
> > processes we don't want to do a dlopen() of untrusted code while we're
> > not sandboxed yet, for example in the process we use to isolate Google's
> > Widevine DRM modules from any private data on the system.
> >
> > seccomp-bpf will intercept various filesystem related syscalls and raise
> > SIGSYS, at which moment our code will contact a broker in the parent
> > process that checks if the file we want to read is acceptable
> > to us, and then passes down the file handle.
> 
> I have re-reviewed the referenced patch and posted:
> 
>   <https://sourceware.org/ml/libc-alpha/2019-12/msg00175.html>
>   <https://sourceware.org/ml/libc-alpha/2019-12/msg00176.html>
>   <https://sourceware.org/ml/libc-alpha/2019-12/msg00177.html>
> 
> Lazy binding is buggy and has races, but with the new patches, the
> NODELETE changes should not make matters worse.
> 
> But I think we do need something better for seccomp sandboxing in the
> medium term, so I'm happy to have a larger conversation now.
> 
> Is there actually a signal handler for SIGSYS in the monitored process?
> Based on some discussion I've seen, I think the kernel pushes a signal
> context on the thread stack (otherwise there wouldn't be a signal mask
> to patch), handler or not.  This alone has compatibility implications.
> 
> There are cases where we absolutely have to block all signals for
> correctness purposes.  Some reasons are:
> 
> (a) Implementing async-signal-safe functions on top of something that is
> not async-signal-safe.
> 
> (b) Avoid running user code with the wrong TCB or an uninitialized TCB.
> 
> (c) Prevent the kernel from pushing the signal context onto a stack that
> is too small.
> 
> (d) Avoid running user code on a stack that is too small.
> 
> (e) Enable reuse of the stack pointer register for something else.
> 
> Particularly for (a), I expect to see more cases in the future.  I don't
> know which system calls we would run in such critical sections.  The
> usage in dlopen falls into that category, but it's a very incomplete fix
> and not very useful overall.
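
As a rough illustration of the kind of critical section described above, here is a minimal sketch in C that masks every blockable signal around a callback and then restores the previous mask. The names `run_with_signals_blocked` and `sigusr1_is_blocked` are hypothetical helpers made up for this example; they are not glibc internals and do not reproduce the referenced dlopen patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: run fn(arg) with all blockable signals masked.
   SIGKILL and SIGSTOP cannot be blocked and are silently left alone.
   In a multithreaded program pthread_sigmask is the portable call;
   sigprocmask is used here because on Linux it also acts on the
   calling thread only. */
int run_with_signals_blocked(int (*fn)(void *), void *arg)
{
    sigset_t all, saved;

    sigfillset(&all);
    if (sigprocmask(SIG_BLOCK, &all, &saved) != 0)
        return -1;

    /* Critical section: no signal handler can run on this thread here. */
    int result = fn(arg);

    /* Restore exactly the mask that was in effect before. */
    int rc = sigprocmask(SIG_SETMASK, &saved, NULL);
    assert(rc == 0);
    (void)rc;
    return result;
}

/* Hypothetical probe: returns 1 if SIGUSR1 is currently blocked,
   0 if it is not, and -1 on error. */
int sigusr1_is_blocked(void *unused)
{
    sigset_t current;

    (void)unused;
    /* With a NULL new set, sigprocmask only reports the current mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &current) != 0)
        return -1;
    return sigismember(&current, SIGUSR1);
}
```

Calling `run_with_signals_blocked(sigusr1_is_blocked, NULL)` reports that SIGUSR1 is blocked inside the callback. Any system call made inside such a section that a seccomp filter traps with SIGSYS would hit the blocked mask, which is the interaction this thread is about.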
> 
> Unfortunately, (b) is generally necessary around clone system calls.
> It's essential for correct use of vfork-like clone in posix_spawn.  We
> don't do it for pthread_create today, but this results in a bug we want
> to fix (see bug 25098).
> 
> (c) is relevant to the current use of clone in vfork because we have a
> small stack there.  I think this impacts seccomp monitoring even if
> there is no actual signal handler because of the signal context data
> written by the kernel.  I want to add a clone_samestack system call
> wrapper that avoids this issue, but I haven't done that yet.
> 
> (d) is a more problematic variant of (c).  That's a secondary issue with
> the vfork as well, in addition to (b).  I don't think (d) is something we
> do a lot in glibc, but applications may do it.  Perhaps they use
> sigaltstack instead.
> 
> glibc currently does not do (e) as far as I know, but there are some
> applications which use %esp on i386 as a general-purpose register.  I
> doubt this use case is relevant to Firefox anyway.
> 
> Please do not underestimate the stack usage for the signal context.  If
> I recall correctly, on current x86, it is more than 5 KiB, and on POWER,
> it's more than 10 KiB.  Stack usage grows with newer kernel releases
> which bring support for larger register files.  Some of this overhead
> comes from the red zone (the stack region below the stack pointer that
> signal handlers cannot touch), and that's part of the ABI definition.
> But the variable-sized part of the signal context is not exported from
> the kernel, so it's hard for applications to size stacks appropriately.
> (That's why I'm interested in clone_samestack for glibc's internal use.)
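
To make the stack-sizing concern above more concrete, here is a minimal sketch in C of how an application might install an alternate signal stack with extra headroom via `sigaltstack`. The helpers `install_alt_signal_stack` and `current_alt_stack_size` are hypothetical names for this example, and the amount of headroom is an assumption chosen by the caller, not a value recommended anywhere in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: install an alternate signal stack of
   SIGSTKSZ + extra_bytes for the calling thread.  The headroom is a
   guess, since the kernel may need more room for the signal context
   than older constants allow for.  Returns 0 on success, -1 on error. */
int install_alt_signal_stack(size_t extra_bytes)
{
    stack_t ss;

    ss.ss_size = (size_t)SIGSTKSZ + extra_bytes;
    assert(ss.ss_size >= (size_t)MINSIGSTKSZ);
    ss.ss_sp = malloc(ss.ss_size);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_flags = 0;

    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }
    /* The memory is deliberately kept allocated while it is installed. */
    return 0;
}

/* Hypothetical probe: size of the currently installed alternate stack,
   or 0 if none is installed or the query fails. */
size_t current_alt_stack_size(void)
{
    stack_t cur;

    if (sigaltstack(NULL, &cur) != 0 || (cur.ss_flags & SS_DISABLE))
        return 0;
    return cur.ss_size;
}
```

A caller might use `install_alt_signal_stack(64 * 1024)`, where 64 KiB is an arbitrary margin. The alternate stack is only used by handlers registered with `SA_ONSTACK` in `sigaction`, so this does not help in the cases above where no handler is installed or where the signal must not be delivered at all.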
> 
> For (a), we really need a list of system calls which are safe to perform
> in such critical sections.  Can we call your interposed malloc, or will
> that try to open files in /proc in some cases?
> 
> When we fix bug 25098 and adopt clone3, you might have a bit of a problem
> because of the in-memory flags argument for clone3, and you can't

Fwiw, I have this on my agenda, i.e. making it possible for seccomp to
filter a certain __subset__ of system calls with pointer arguments. I've
started a discussion in August right before Kernel Summit:
https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2019-July/006699.html
and Kees Cook and I gave a session at Kernel Summit in Lisbon:
https://www.youtube.com/watch?v=PnOSPsRzVYM

It's planned; I just need to find time to work on this :/

Christian

