This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Alternative libio vtable hardening approach
- From: Florian Weimer <fweimer at redhat dot com>
- To: "Carlos O'Donell" <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, Kees Cook <keescook at chromium dot org>, Yunlian Jiang <yunlian at google dot com>
- Date: Tue, 31 May 2016 17:56:41 +0200
- Subject: Re: Alternative libio vtable hardening approach
- Authentication-results: sourceware.org; auth=none
- References: <b34105f2-adcb-9347-73c0-43079729c418 at redhat dot com> <574DA946 dot 3000803 at redhat dot com>
On 05/31/2016 05:09 PM, Carlos O'Donell wrote:
The question I'm asking myself is: Should we be pointer mangling any of
these pointers to make it harder to scan for the table?
I'm not sure what you are trying to address. The table itself should be
in RELRO memory, and its address should be at a fixed offset relative to
the PC at the site of the check, so all that should be pretty safe.
I'm only worried about the flag that turns of the check.
The attached patch is not completely finished. We need to disable
vtable validation on both sides of a static dlopen boundary because
the vtables are not shared across the boundary. This is what causes
dlfcn/tststatic2 to fail.
How do you plan to disable the hardening at the static dlopen boundary?
Set the flag on both sides once dlopen is called. I see no way to avoid
that because we cannot know if any of the functions in the loaded DSOs
access stdio streams on both sides of the boundary.
We could presumably add a flag that inhibits this for internal DSOs
which we know do not have such behavior. But I'm not sure if this is
worthwhile.
What we really want is a set of internal initializers that are run during
static dlopen / dlmopen that allow the library to copy key state into the
other namespace.
We don't need pass across any actual data, or affect relocation, so it's
easier. malloc in libc.so does the following to detect whether it's
running across a linking boundary:
#ifdef SHARED
/* In case this libc copy is in a non-default namespace, never use brk.
Likewise if dlopened from statically linked program. */
Dl_info di;
struct link_map *l;
if (_dl_open_hook != NULL
|| (_dl_addr (ptmalloc_init, &di, &l, NULL) != 0
&& l->l_ns != LM_ID_BASE))
__morecore = __failing_morecore;
#endif
I just need to stick this code in some place where it runs early enough
(before any ELF constructors and the like). Suggestions welcome. :)
(And we already have some mechanism because it seems that the dlopen
hooks are sent across. But I don't have all the details.)
A future optimization would change the virtual function calls to
compare the function pointer in the vtable against an expected
function pointer. Only if that comparison fails, the vtable pointer
is validated. This will allow us to inline parts of xsputn into
vfprintf, especially if we change the various implementations to
share more code.
You suggest this because a pointer comparison is cheaper than an
address range comparison? Could we make the tables read-only and use
another flag to elide all checking?
The current checking code looks like this:
a27: 48 8b 85 80 fb ff ff mov -0x480(%rbp),%rax
a2e: 4c 8b a0 d8 00 00 00 mov 0xd8(%rax),%r12
a35: 49 81 fc 00 00 00 00 cmp $IO_vtables,%r12
a3c: 0f 82 0e 03 00 00 jb d50
a42: 49 81 fc 00 00 00 00 cmp $IO_vtables+0xbd0,%r12
a49: 0f 83 01 03 00 00 jae d50
a4f: 49 8b 75 18 mov 0x18(%r13),%rsi
a53: 49 8b 55 20 mov 0x20(%r13),%rdx
a57: 48 8b bd 80 fb ff ff mov -0x480(%rbp),%rdi
a5e: 48 29 f2 sub %rsi,%rdx
a61: 41 ff 54 24 38 callq *0x38(%r12)
â
d50: e8 00 00 00 00 callq IO_vtable_check
d55: e9 f5 fc ff ff jmpq a4f
Instead we could do something like this (the call setup sequence is
somewhat different here):
a40: 49 8b 44 24 38 mov 0x38(%r12),%rax
a45: 48 29 f2 sub %rsi,%rdx
a48: 48 3d 00 00 00 00 cmp $_IO_file_xsputn_fast,%rax
a4e: 0f 85 f7 11 00 00 jne 1c4b
a54: 4c 89 ff mov %r15,%rdi
a57: e8 00 00 00 00 callq _IO_file_xsputn_fast
â
1c4b: 49 81 fc 00 00 00 00 cmp $IO_vtables+0xbd0,%r12
1c52: 0f 83 4e 04 00 00 jae 20a6
1c58: 49 81 fc 00 00 00 00 cmp $IO_vtables,%r12
1c5f: 0f 82 41 04 00 00 jb 20a6
1c65: 4c 89 ff mov %r15,%rdi
1c68: ff d0 callq *%rax
1c6a: e9 ed ed ff ff jmpq a5c
â
20a6: 48 89 95 78 fb ff ff mov %rdx,-0x488(%rbp)
20ad: 48 89 b5 88 fb ff ff mov %rsi,-0x478(%rbp)
20b4: e8 00 00 00 00 callq IO_vtable_check
20b9: 49 8b 44 24 38 mov 0x38(%r12),%rax
20be: 48 8b 95 78 fb ff ff mov -0x488(%rbp),%rdx
20c5: 48 8b b5 88 fb ff ff mov -0x478(%rbp),%rsi
20cc: e9 94 fb ff ff jmpq 1c65
So devirtualize the call to _IO_file_xsputn_fast, which can then be
inlined. If the slow path is only needed for compatibility (after we
have carefully rewritten xsputn so that we only need one version, which
seems doable), we could call some slow-path xsputn function, so the
a4c: 48 8b 83 d8 00 00 00 mov 0xd8(%rbx),%rax
a53: 48 29 f2 sub %rsi,%rdx
a56: 48 81 78 38 00 00 00 cmpq
$_IO_file_xsputn_fast,0x38(%rax)
a5e: 0f 85 e6 11 00 00 jne 1c4a
a64: e8 00 00 00 00 callq _IO_file_xsputn_fast
â
1c4a: e8 00 00 00 00 callq _IO_file_xsputn_slow
1c4f: e9 15 ee ff ff jmpq a69
This expansion has much more reasonable size, but only makes sense if
the call site is monomorphic in practice.
Agreed. My only worry is static dlopen.
My worry is that the hardening is insufficient. :)
Florian