Fallout from dlopen() blocking SIGSYS

Fri Dec 6 10:39:00 GMT 2019

* Gian-Carlo Pascutto:

> On 5/12/2019 17:03, Florian Weimer wrote:
>> I have re-reviewed the referenced patch and posted:
>> 
>>   <https://sourceware.org/ml/libc-alpha/2019-12/msg00175.html>
>>   <https://sourceware.org/ml/libc-alpha/2019-12/msg00176.html>
>>   <https://sourceware.org/ml/libc-alpha/2019-12/msg00177.html>
>> 
>> Lazy binding is buggy and has races, but with the new patches, the
>> NODELETE changes should not make matters worse.
>
> Thanks, it seems like that should put out the immediate fire at least.
>
>> But I think we do need something better for seccomp sandboxing in the
>> medium term, so I'm happy to have a larger conversation now.
>
> We'd probably need to include the Chromium people in this.

Would you be able to do this?  I don't know anyone there.

> The implementation we use is to a large extent based on theirs, and,
> IIRC, it's also that team that did parts of the initial implementation
> of seccomp-bpf in the kernel. The (only) reason this now affected
> Firefox first is that dlopen() was the first to block signals, and we
> have a use case where we need to do that inside the sandbox, and
> (apparently) Chromium doesn't (yet).
>
> But if the signal blocking is going to be required for other libc calls
> to work, I would assume it is going to risk breaking Chromium too, as
> they also use the SECCOMP_RET_TRAP mechanism in several places:

Chromium was definitely affected by the recent changes in the glibc
syscall profile, which moved glibc to the current generation of system
calls recommended by the kernel.  (Only those system calls will receive
Y2038 companions on existing 32-bit architectures, and we'll consolidate
on those system calls.)

>> For (a), we really need a list of system calls which are safe to perform
>> in such critical sections.  Can we call your interposed malloc, or will
>> that try to open files in /proc in some cases?
>
> It should do an anonymous mmap, AFAIK.

Doesn't it also create background threads?

> We TRAP on
>
> https://searchfox.org/mozilla-central/source/security/sandbox/linux/SandboxFilter.cpp
>
> Anything trying to hit the filesystem:
> open, openat, access, accessat, stat, lstat, statat, chmod, link,
> symlink, rename, mkdir, rmdir, unlink, readlink, readlinkat, faccessat,
> statx (in the future)
>
> Anything affecting other processes or leaking too much data about the
> system:
> tkill, prctl (sometimes), getppid, connect, socketpair, socketcall,
> sched, uname, fcntl, sched_getparam, sched_getscheduler,
> sched_setscheduler (sometimes)

And for other system calls, you may have BPF-based filters which inspect
arguments?

>> When we fix bug 25098 and adopt clone3, that might be a bit of a
>> problem because of the in-memory flags argument for clone3, and you
>> can't intercept the system call due to the blocked signals.
>
> Yes, that looks like it would be a serious problem, for both browsers:
>
> https://searchfox.org/mozilla-central/rev/ea63a0888d406fae720cf24f4727d87569a8cab5/security/sandbox/linux/SandboxFilter.cpp#325
>
> https://chromium.googlesource.com/chromium/src/+/refs/heads/master/sandbox/linux/seccomp-bpf-helpers/syscall_parameters_restrictions.cc#140

> Some implementations in glibc try to use a new syscall, and if it
> doesn't exist on the current kernel (ENOSYS), fall back to older
> interfaces. If that's possible for clone3 usage, then we'd simply return
> ENOSYS on that and force the fallback to regular clone, unless the
> kernel is new enough that we can filter. That would decouple glibc's
> ability to use the new syscall from the state of the seccomp filtering
> implementation in the kernel. Could that work here?

For default builds and the foreseeable future, probably yes.
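
To make the pattern under discussion concrete, here is a minimal sketch
in Python.  It only models the control flow and is not glibc code; the
functions filtered_clone3 and legacy_clone are hypothetical stand-ins
for a clone3 call that the sandbox answers with ENOSYS and for the older
clone path.

```python
import errno


def call_with_enosys_fallback(new_call, old_call):
    """Try new_call; if it fails with ENOSYS, use old_call instead."""
    try:
        return new_call()
    except OSError as exc:
        if exc.errno != errno.ENOSYS:
            raise  # a real failure, not "system call unavailable"
        return old_call()


def filtered_clone3():
    # Hypothetical stand-in: the seccomp policy reports ENOSYS for
    # clone3 instead of using SECCOMP_RET_TRAP.
    raise OSError(errno.ENOSYS, "Function not implemented")


def legacy_clone():
    # Hypothetical stand-in for the older clone path.
    return "clone"


result = call_with_enosys_fallback(filtered_clone3, legacy_clone)
```

Only ENOSYS triggers the fallback; any other error is passed through,
so a sandbox that wants this behaviour has to report ENOSYS rather than,
say, EPERM.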

> We currently do this for statx, which just gets an ENOSYS instead of a
> TRAP - glibc (and rust's stdlib!) will happily use their fallback paths
> until we write a broker implementation for it.

Some distributions may opt to build glibc against a more recent minimum
kernel version instead of the default (Linux 3.2).  In that case, the
fallback code may be gone.  Some libraries also choose not to support
old kernels and make direct system calls.

This is also what happens if we switch from historic system calls to
more recent ones.  There won't be any ENOSYS fallback code if the newer
system calls are already available in Linux 3.2.
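
As a rough model of that reasoning, sketched in Python under simplifying
assumptions: a fallback path is only needed when the system call is
newer than the minimum kernel the library was built for.  The helper
name and the table are hypothetical; the release numbers are the kernel
versions in which statx (4.11) and clone3 (5.3) first appeared.

```python
# Kernel release that introduced each system call.
INTRODUCED_IN = {"statx": (4, 11), "clone3": (5, 3)}


def needs_enosys_fallback(syscall, minimum_kernel=(3, 2)):
    """Return True if a build for minimum_kernel must keep a fallback.

    Simplified model: the fallback is required only while some supported
    kernel might lack the system call.
    """
    return INTRODUCED_IN[syscall] > minimum_kernel


default_build = needs_enosys_fallback("clone3")          # minimum 3.2
recent_build = needs_enosys_fallback("clone3", (5, 4))   # minimum 5.4
```

With the default minimum the fallback is needed, so returning ENOSYS
from the sandbox can work; with a minimum of 5.4 the model says the
fallback is unnecessary, which is the case where it may have been
compiled out.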

Thanks,
Florian