Re: Alternative libio vtable hardening approach
- From: Carlos O'Donell <carlos@redhat.com>
- To: Florian Weimer <fweimer@redhat.com>, GNU C Library <libc-alpha@sourceware.org>, Kees Cook <keescook@chromium.org>, Yunlian Jiang <yunlian@google.com>
- Date: Tue, 31 May 2016 20:25:45 -0400
- Subject: Re: Alternative libio vtable hardening approach
- References: <b34105f2-adcb-9347-73c0-43079729c418@redhat.com> <574DA946.3000803@redhat.com> <ed17ec29-a391-f73a-a3e4-67f4b3d34570@redhat.com>
On 05/31/2016 11:56 AM, Florian Weimer wrote:
> On 05/31/2016 05:09 PM, Carlos O'Donell wrote:
>> The question I'm asking myself is: Should we be pointer mangling any of
>> these pointers to make it harder to scan for the table?
> I'm not sure what you are trying to address. The table itself should
> be in RELRO memory, and its address should be at a fixed offset
> relative to the PC at the site of the check, so all that should be
> pretty safe.
That's exactly the question I'm asking.
If I understand your reasoning:
(a) The table is in .rodata/.data.rel.ro (verified: the tables currently
    present are in .rodata).
(b) The table is always PC-relative addressable from ROP gadgets
    within the library, so finding it is easy.
- Hiding the pointers with mangling doesn't make this harder.
(c) Mangling degrades performance.
Seems fine to me; I just wanted to be as explicit as possible here.
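For the archives, the check as I understand it reduces to the following
C (a sketch only; the IO_vtables symbol and the 0xbd0 length are taken
from the disassembly you quote further down, and the real bounds would
be provided by the linker):

#include <stdint.h>

/* Start and length of the array of recognized vtables.  Both are
   link-time constants, so the compiled comparison is PC-relative and
   needs no run-time relocation at the call site.  */
extern const char IO_vtables[];
#define IO_VTABLES_LEN 0xbd0

extern void IO_vtable_check (void);   /* out-of-line slow path */

static inline void
check_vtable (const void *vtable)
{
  uintptr_t p = (uintptr_t) vtable;
  uintptr_t start = (uintptr_t) IO_vtables;
  if (p < start || p >= start + IO_VTABLES_LEN)
    IO_vtable_check ();
}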
> I'm only worried about the flag that turns off the check.
I don't see any way to provide backwards compatibility other than
disabling the check with a global.
We could encrypt the value of the global to make it harder to unset?
Mike suggests this too.
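To be concrete about "encrypt": I mean the XOR-and-rotate scheme
PTR_MANGLE already uses, applied to the flag's value, so that a plain
write of 0 or 1 into the global no longer flips it meaningfully.  A
sketch (the guard variable, rotate count, and flag name here are
illustrative, not proposals for the real names):

#include <stdint.h>

extern uintptr_t __pointer_chk_guard_local;   /* per-process secret */

#define ROL(x, n) (((x) << (n)) | ((x) >> (8 * sizeof (x) - (n))))
#define ROR(x, n) (((x) >> (n)) | ((x) << (8 * sizeof (x) - (n))))

static uintptr_t vtable_check_disabled;       /* stored mangled */

static void
set_vtable_check_disabled (uintptr_t value)
{
  vtable_check_disabled = ROL (value ^ __pointer_chk_guard_local, 17);
}

static uintptr_t
get_vtable_check_disabled (void)
{
  return ROR (vtable_check_disabled, 17) ^ __pointer_chk_guard_local;
}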
>>> The attached patch is not completely finished. We need to disable
>>> vtable validation on both sides of a static dlopen boundary because
>>> the vtables are not shared across the boundary. This is what causes
>>> dlfcn/tststatic2 to fail.
>> How do you plan to disable the hardening at the static dlopen boundary?
> Set the flag on both sides once dlopen is called. I see no way to
> avoid that because we cannot know if any of the functions in the
> loaded DSOs access stdio streams on both sides of the boundary.
You answer this below.
> We could presumably add a flag that inhibits this for internal DSOs
> which we know do not have such behavior. But I'm not sure if this is worth it.
Let us stay focused on the hardening aspect for this first change.
>> What we really want is a set of internal initializers that are run during
>> static dlopen / dlmopen that allow the library to copy key state into the
>> other namespace.
> We don't need to pass across any actual data, or affect relocation, so
> it's easier. malloc in libc.so does the following to detect whether
> it's running across a linking boundary:
> #ifdef SHARED
>   /* In case this libc copy is in a non-default namespace, never use brk.
>      Likewise if dlopened from statically linked program.  */
>   Dl_info di;
>   struct link_map *l;
>
>   if (_dl_open_hook != NULL
>       || (_dl_addr (ptmalloc_init, &di, &l, NULL) != 0
>           && l->l_ns != LM_ID_BASE))
>     __morecore = __failing_morecore;
> #endif
Ah, OK, that answers my question. You look up your own link map and then
determine you're not in the default namespace and reset the variables.
That's a sufficient solution. Thanks.
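Concretely, I would expect the libio side to reuse that same idiom (a
sketch; the flag name is hypothetical, and the internals are the ones
your malloc snippet already uses):

#ifdef SHARED
static _Bool IO_accept_foreign_vtables;   /* hypothetical flag */

/* Must run before any ELF constructors: if this libc copy is not in
   the base namespace, or was dlopened from a statically linked
   program, accept foreign vtables on this side of the boundary.  */
static void
IO_maybe_disable_vtable_check (void)
{
  Dl_info di;
  struct link_map *l;

  if (_dl_open_hook != NULL
      || (_dl_addr (IO_maybe_disable_vtable_check, &di, &l, NULL) != 0
          && l->l_ns != LM_ID_BASE))
    IO_accept_foreign_vtables = 1;
}
#endif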
> I just need to stick this code in some place where it runs early
> enough (before any ELF constructors and the like). Suggestions
> welcome. :)
That sounds good. Thanks.
> (And we already have some mechanism because it seems that the dlopen
> hooks are sent across. But I don't have all the details.)
Again, my thoughts were along the lines of some kind of callback
registration process where the old structure pointers are registered
along with a callback. Then in the new namespace the callback runs
with access to the new structure address (dynamic lookup) and does
some kind of custom copy to reinitialize state in the new namespace.
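As a sketch, the registration I had in mind would look something like
this (entirely hypothetical; nothing like it exists today):

/* Hypothetical cross-namespace reinitialization hook.  A subsystem
   registers the state it needs to re-establish; static dlopen or
   dlmopen would run each callback with the old state and the address
   of the corresponding object as resolved in the new namespace.  */
struct ns_state_hook
{
  const char *symbol;   /* name to resolve in the new namespace */
  void (*copy) (void *old_state, void *new_state);
};

extern void _dl_register_ns_state_hook (const struct ns_state_hook *hook);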
That's a lot more work than what you propose, and I am not suggesting
you follow it. Stick with what works and solve the security problem.
>>> A future optimization would change the virtual function calls to
>>> compare the function pointer in the vtable against an expected
>>> function pointer. Only if that comparison fails, the vtable pointer
>>> is validated. This will allow us to inline parts of xsputn into
>>> vfprintf, especially if we change the various implementations to
>>> share more code.
>> You suggest this because a pointer comparison is cheaper than an
>> address range comparison? Could we make the tables read-only and use
>> another flag to elide all checking?
Note that I asked this before checking whether the tables were in
.rodata or .data.rel.ro. The tables currently present are in .rodata.
> The current checking code looks like this:
> a27: 48 8b 85 80 fb ff ff mov -0x480(%rbp),%rax
> a2e: 4c 8b a0 d8 00 00 00 mov 0xd8(%rax),%r12
> a35: 49 81 fc 00 00 00 00 cmp $IO_vtables,%r12
> a3c: 0f 82 0e 03 00 00 jb d50
> a42: 49 81 fc 00 00 00 00 cmp $IO_vtables+0xbd0,%r12
> a49: 0f 83 01 03 00 00 jae d50
> a4f: 49 8b 75 18 mov 0x18(%r13),%rsi
> a53: 49 8b 55 20 mov 0x20(%r13),%rdx
> a57: 48 8b bd 80 fb ff ff mov -0x480(%rbp),%rdi
> a5e: 48 29 f2 sub %rsi,%rdx
> a61: 41 ff 54 24 38 callq *0x38(%r12)
> d50: e8 00 00 00 00 callq IO_vtable_check
> d55: e9 f5 fc ff ff jmpq a4f
> Instead we could do something like this (the call setup sequence is somewhat different here):
> a40: 49 8b 44 24 38 mov 0x38(%r12),%rax
> a45: 48 29 f2 sub %rsi,%rdx
> a48: 48 3d 00 00 00 00 cmp $_IO_file_xsputn_fast,%rax
> a4e: 0f 85 f7 11 00 00 jne 1c4b
> a54: 4c 89 ff mov %r15,%rdi
> a57: e8 00 00 00 00 callq _IO_file_xsputn_fast
> 1c4b: 49 81 fc 00 00 00 00 cmp $IO_vtables+0xbd0,%r12
> 1c52: 0f 83 4e 04 00 00 jae 20a6
> 1c58: 49 81 fc 00 00 00 00 cmp $IO_vtables,%r12
> 1c5f: 0f 82 41 04 00 00 jb 20a6
> 1c65: 4c 89 ff mov %r15,%rdi
> 1c68: ff d0 callq *%rax
> 1c6a: e9 ed ed ff ff jmpq a5c
> 20a6: 48 89 95 78 fb ff ff mov %rdx,-0x488(%rbp)
> 20ad: 48 89 b5 88 fb ff ff mov %rsi,-0x478(%rbp)
> 20b4: e8 00 00 00 00 callq IO_vtable_check
> 20b9: 49 8b 44 24 38 mov 0x38(%r12),%rax
> 20be: 48 8b 95 78 fb ff ff mov -0x488(%rbp),%rdx
> 20c5: 48 8b b5 88 fb ff ff mov -0x478(%rbp),%rsi
> 20cc: e9 94 fb ff ff jmpq 1c65
> So devirtualize the call to _IO_file_xsputn_fast, which can then be
> inlined. If the slow path is only needed for compatibility (after we
> have carefully rewritten xsputn so that we only need one version,
> which seems doable), we could call some slow-path xsputn function, so that the expansion looks like this:
> a4c: 48 8b 83 d8 00 00 00 mov 0xd8(%rbx),%rax
> a53: 48 29 f2 sub %rsi,%rdx
> a56: 48 81 78 38 00 00 00 cmpq $_IO_file_xsputn_fast,0x38(%rax)
> a5e: 0f 85 e6 11 00 00 jne 1c4a
> a64: e8 00 00 00 00 callq _IO_file_xsputn_fast
> 1c4a: e8 00 00 00 00 callq _IO_file_xsputn_slow
> 1c4f: e9 15 ee ff ff jmpq a69
> This expansion has much more reasonable size, but only makes sense if
> the call site is monomorphic in practice.
I don't follow. Can't the slow path still use the vtables, as required
by legacy binaries, with the call site there remaining polymorphic?
Maybe I've misunderstood the suggestion.
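In C terms, here is how I read the proposed expansion (a sketch; the
function names come from your disassembly, while the FILE layout and
field names are illustrative):

extern size_t _IO_file_xsputn_fast (FILE *fp, const char *data, size_t n);

size_t
do_xsputn (FILE *fp, const char *data, size_t n)
{
  const struct _IO_jump_t *vtable = fp->vtable;   /* field illustrative */
  if (vtable->__xsputn == _IO_file_xsputn_fast)
    /* Monomorphic fast path: known target, can be inlined.  */
    return _IO_file_xsputn_fast (fp, data, n);

  /* Slow path: range-check the vtable, then dispatch indirectly.
     This is where vtables from legacy binaries stay polymorphic.  */
  check_vtable (vtable);
  return vtable->__xsputn (fp, data, n);
}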
>> Agreed. My only worry is static dlopen.
> My worry is that the hardening is insufficient. :)
Would it worry you if I said "the hardening is _always_ insufficient." ;-)