This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Alternative libio vtable hardening approach

From: Florian Weimer <fweimer at redhat dot com>
To: "Carlos O'Donell" <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, Kees Cook <keescook at chromium dot org>, Yunlian Jiang <yunlian at google dot com>
Date: Tue, 31 May 2016 17:56:41 +0200
Subject: Re: Alternative libio vtable hardening approach
Authentication-results: sourceware.org; auth=none
References: <b34105f2-adcb-9347-73c0-43079729c418 at redhat dot com> <574DA946 dot 3000803 at redhat dot com>

On 05/31/2016 05:09 PM, Carlos O'Donell wrote:

The question I'm asking myself is: Should we be pointer mangling any of
these pointers to make it harder to scan for the table?

I'm not sure what you are trying to address. The table itself should bein RELRO memory, and its address should be at a fixed offset relative tothe PC at the site of the check, so all that should be pretty safe.


I'm only worried about the flag that turns of the check.

The attached patch is not completely finished.  We need to disable
vtable validation on both sides of a static dlopen boundary because
the vtables are not shared across the boundary.  This is what causes
dlfcn/tststatic2 to fail.


How do you plan to disable the hardening at the static dlopen boundary?

Set the flag on both sides once dlopen is called. I see no way to avoidthat because we cannot know if any of the functions in the loaded DSOsaccess stdio streams on both sides of the boundary.

We could presumably add a flag that inhibits this for internal DSOswhich we know do not have such behavior. But I'm not sure if this isworthwhile.

What we really want is a set of internal initializers that are run during
static dlopen / dlmopen that allow the library to copy key state into the
other namespace.

We don't need pass across any actual data, or affect relocation, so it'seasier. malloc in libc.so does the following to detect whether it'srunning across a linking boundary:


#ifdef SHARED
  /* In case this libc copy is in a non-default namespace, never use brk.
     Likewise if dlopened from statically linked program.  */
  Dl_info di;
  struct link_map *l;

  if (_dl_open_hook != NULL
      || (_dl_addr (ptmalloc_init, &di, &l, NULL) != 0
          && l->l_ns != LM_ID_BASE))
    __morecore = __failing_morecore;
#endif

I just need to stick this code in some place where it runs early enough(before any ELF constructors and the like). Suggestions welcome. :)

(And we already have some mechanism because it seems that the dlopenhooks are sent across. But I don't have all the details.)

A future optimization would change the virtual function calls to
compare the function pointer in the vtable against an expected
function pointer.  Only if that comparison fails, the vtable pointer
is validated.  This will allow us to inline parts of xsputn into
vfprintf, especially if we change the various implementations to
share more code.


You suggest this because a pointer comparison is cheaper than an
address range comparison? Could we make the tables read-only and use
another flag to elide all checking?


The current checking code looks like this:

     a27:       48 8b 85 80 fb ff ff    mov    -0x480(%rbp),%rax
     a2e:       4c 8b a0 d8 00 00 00    mov    0xd8(%rax),%r12
     a35:       49 81 fc 00 00 00 00    cmp    $IO_vtables,%r12
     a3c:       0f 82 0e 03 00 00       jb     d50
     a42:       49 81 fc 00 00 00 00    cmp    $IO_vtables+0xbd0,%r12
     a49:       0f 83 01 03 00 00       jae    d50
     a4f:       49 8b 75 18             mov    0x18(%r13),%rsi
     a53:       49 8b 55 20             mov    0x20(%r13),%rdx
     a57:       48 8b bd 80 fb ff ff    mov    -0x480(%rbp),%rdi
     a5e:       48 29 f2                sub    %rsi,%rdx
     a61:       41 ff 54 24 38          callq  *0x38(%r12)
â
     d50:       e8 00 00 00 00          callq  IO_vtable_check
     d55:       e9 f5 fc ff ff          jmpq   a4f

Instead we could do something like this (the call setup sequence issomewhat different here):


     a40:       49 8b 44 24 38          mov    0x38(%r12),%rax
     a45:       48 29 f2                sub    %rsi,%rdx
     a48:       48 3d 00 00 00 00       cmp   $_IO_file_xsputn_fast,%rax
     a4e:       0f 85 f7 11 00 00       jne    1c4b
     a54:       4c 89 ff                mov    %r15,%rdi
     a57:       e8 00 00 00 00          callq  _IO_file_xsputn_fast
â
    1c4b:       49 81 fc 00 00 00 00    cmp    $IO_vtables+0xbd0,%r12
    1c52:       0f 83 4e 04 00 00       jae    20a6
    1c58:       49 81 fc 00 00 00 00    cmp    $IO_vtables,%r12
    1c5f:       0f 82 41 04 00 00       jb     20a6
    1c65:       4c 89 ff                mov    %r15,%rdi
    1c68:       ff d0                   callq  *%rax
    1c6a:       e9 ed ed ff ff          jmpq   a5c
â
    20a6:       48 89 95 78 fb ff ff    mov    %rdx,-0x488(%rbp)
    20ad:       48 89 b5 88 fb ff ff    mov    %rsi,-0x478(%rbp)
    20b4:       e8 00 00 00 00          callq  IO_vtable_check
    20b9:       49 8b 44 24 38          mov    0x38(%r12),%rax
    20be:       48 8b 95 78 fb ff ff    mov    -0x488(%rbp),%rdx
    20c5:       48 8b b5 88 fb ff ff    mov    -0x478(%rbp),%rsi
    20cc:       e9 94 fb ff ff          jmpq   1c65

So devirtualize the call to _IO_file_xsputn_fast, which can then beinlined. If the slow path is only needed for compatibility (after wehave carefully rewritten xsputn so that we only need one version, whichseems doable), we could call some slow-path xsputn function, so the



     a4c:       48 8b 83 d8 00 00 00    mov    0xd8(%rbx),%rax
     a53:       48 29 f2                sub    %rsi,%rdx

a56: 48 81 78 38 00 00 00 cmpq$_IO_file_xsputn_fast,0x38(%rax)

     a5e:       0f 85 e6 11 00 00       jne    1c4a
     a64:       e8 00 00 00 00          callq  _IO_file_xsputn_fast
â
    1c4a:       e8 00 00 00 00          callq  _IO_file_xsputn_slow
    1c4f:       e9 15 ee ff ff          jmpq   a69

This expansion has much more reasonable size, but only makes sense ifthe call site is monomorphic in practice.

Agreed. My only worry is static dlopen.


My worry is that the hardening is insufficient. :)

Florian

References:
- Alternative libio vtable hardening approach
  - From: Florian Weimer
- Re: Alternative libio vtable hardening approach
  - From: Carlos O'Donell

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]