This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Alternative libio vtable hardening approach
- From: Florian Weimer <fweimer at redhat dot com>
- To: "Carlos O'Donell" <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, Kees Cook <keescook at chromium dot org>, Yunlian Jiang <yunlian at google dot com>
- Date: Wed, 1 Jun 2016 11:12:23 +0200
- Subject: Re: Alternative libio vtable hardening approach
- Authentication-results: sourceware.org; auth=none
- References: <b34105f2-adcb-9347-73c0-43079729c418 at redhat dot com> <574DA946 dot 3000803 at redhat dot com> <ed17ec29-a391-f73a-a3e4-67f4b3d34570 at redhat dot com> <574E2B89 dot 60001 at redhat dot com>
On 06/01/2016 02:25 AM, Carlos O'Donell wrote:
On 05/31/2016 11:56 AM, Florian Weimer wrote:
On 05/31/2016 05:09 PM, Carlos O'Donell wrote:
The question I'm asking myself is: Should we be pointer mangling any of
these pointers to make it harder to scan for the table?
I'm not sure what you are trying to address. The table itself should
be in RELRO memory, and its address should be at a fixed offset
relative to the PC at the site of the check, so all that should be
pretty safe.
That's exactly the question I'm asking.
If I understand your reasoning:
(a) The table is in .rodata/.data.rel.ro (verified present tables are in .rodata)
(b) The table is always PC-relative addressable from ROP gadgets
within the library so finding it is easy.
- Hiding the pointers with mangling doesn't make this harder.
I don't get the âscanningâ and âROPâ parts. :)
(c) Mangling degrades performance.
Not just that, it makes compatibility so much harder. (We would have to
unmangle vtables if needed.)
I'm only worried about the flag that turns of the check.
I don't see any way to provide backwards compatibility than to disable
the check with a global.
We could encrypt the value of the global to make it harder to unset?
Mike suggests this too.
Yes, it's a good idea.
What we really want is a set of internal initializers that are run during
static dlopen / dlmopen that allow the library to copy key state into the
other namespace.
We don't need pass across any actual data, or affect relocation, so
it's easier. malloc in libc.so does the following to detect whether
it's running across a linking boundary:
#ifdef SHARED
/* In case this libc copy is in a non-default namespace, never use brk.
Likewise if dlopened from statically linked program. */
Dl_info di;
struct link_map *l;
if (_dl_open_hook != NULL
|| (_dl_addr (ptmalloc_init, &di, &l, NULL) != 0
&& l->l_ns != LM_ID_BASE))
__morecore = __failing_morecore;
#endif
Ah, OK, that answers my question. You lookup your own link map and then
determine you're not in the default namespace and reset the variables.
That's a sufficient solution. Thanks.
I also found this, which seems to cover the same thing:
/* Set nonzero if we have to be prepared for more than one libc being
used in the process. Safe assumption if initializer never runs. */
int __libc_multiple_libcs attribute_hidden = 1;
The malloc code doesn't use it, maybe because it came from out-of-tree
ptmalloc.
Would be csu/init-first.c the right place to perform the multiple-libc
check and set the flag? Does it run for both static and dynamic binaries?
Again, my thoughts were along the lines of some kind of callback
registration process where the old structure pointers are registered
along with a callback.
Based on our experience with the fork handlers, I think explicit calls
across subsystems are the way to go. It's the best way to express
ordering requirements.
a4c: 48 8b 83 d8 00 00 00 mov 0xd8(%rbx),%rax
a53: 48 29 f2 sub %rsi,%rdx
a56: 48 81 78 38 00 00 00 cmpq $_IO_file_xsputn_fast,0x38(%rax)
a5e: 0f 85 e6 11 00 00 jne 1c4a
a64: e8 00 00 00 00 callq _IO_file_xsputn_fast
â
1c4a: e8 00 00 00 00 callq _IO_file_xsputn_slow
1c4f: e9 15 ee ff ff jmpq a69
This expansion has much more reasonable size, but only makes sense if
the call site is monomorphic in practice.
I don't follow. The slow path can still use the vtables as required by
legacy binaries and the call site there can be polymorphic? Maybe I've
misunderstood the suggestion.
It's just a matter of performance. The comparison against the expected
pointer is worthwhile if the in almost all cases, there is a match and
you hit the fast path. If it's just 50/50 or something like that (as it
currently would be because fprintf and sprintf use different xsputn
implementations), it does not improve performance. Calling the slow
path directly would be preferable.
Florian