This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Alternative libio vtable hardening approach


On 05/31/2016 05:09 PM, Carlos O'Donell wrote:

> The question I'm asking myself is: Should we be pointer mangling any of
> these pointers to make it harder to scan for the table?

I'm not sure what you are trying to address. The table itself should be in RELRO memory, and its address should be at a fixed offset relative to the PC at the site of the check, so all that should be pretty safe.

I'm only worried about the flag that turns off the check.
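
To make that concrete, here is a rough sketch of the shape of the
check in C.  The section bounds symbols, the flag, and the fatal-error
path are illustrative; the final patch may spell them differently:

#include <stdint.h>
#include <stdlib.h>

struct _IO_jump_t;

/* Linker-provided bounds of the dedicated read-only (RELRO) section
   that holds all recognized vtables.  */
extern const char __start___libc_IO_vtables[];
extern const char __stop___libc_IO_vtables[];

/* The flag in question: once set, foreign vtables are accepted.  */
int _IO_accept_foreign_vtables;

void
_IO_vtable_check (void)
{
  /* Slow path: accept a foreign vtable only if the flag was set
     (e.g. across a static dlopen boundary); otherwise abort.  In
     libc this would be __libc_fatal rather than abort.  */
  if (_IO_accept_foreign_vtables)
    return;
  abort ();
}

static inline const struct _IO_jump_t *
IO_validate_vtable (const struct _IO_jump_t *vtable)
{
  uintptr_t length = __stop___libc_IO_vtables - __start___libc_IO_vtables;
  uintptr_t offset = (uintptr_t) vtable
                     - (uintptr_t) __start___libc_IO_vtables;
  /* A single unsigned comparison covers both bounds.  */
  if (offset >= length)
    _IO_vtable_check ();
  return vtable;
}

Since the section bounds are link-time constants, the fast path is just
the two compares and a branch shown in the disassembly further down.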

>> The attached patch is not completely finished.  We need to disable
>> vtable validation on both sides of a static dlopen boundary because
>> the vtables are not shared across the boundary.  This is what causes
>> dlfcn/tststatic2 to fail.

> How do you plan to disable the hardening at the static dlopen boundary?

Set the flag on both sides once dlopen is called. I see no way to avoid that because we cannot know whether any of the functions in the loaded DSOs access stdio streams on both sides of the boundary.

We could presumably add a flag that inhibits this for internal DSOs which we know do not have such behavior. But I'm not sure if this is worthwhile.
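
Building on the sketch above, the static dlopen path would just flip
the flag in both libc copies; the helper name here is hypothetical:

/* Called when dlopen is used from a statically linked program (and,
   mirrored, in the dlmopen'ed libc.so copy), so that streams can be
   passed across the boundary without tripping the vtable check.  */
void
_IO_enable_foreign_vtables (void)
{
  _IO_accept_foreign_vtables = 1;
}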

> What we really want is a set of internal initializers that are run during
> static dlopen / dlmopen that allow the library to copy key state into the
> other namespace.

We don't need to pass any actual data across, or affect relocation, so it's easier. malloc in libc.so does the following to detect whether it's running across a linking boundary:

#ifdef SHARED
  /* In case this libc copy is in a non-default namespace, never use brk.
     Likewise if dlopened from statically linked program.  */
  Dl_info di;
  struct link_map *l;

  if (_dl_open_hook != NULL
      || (_dl_addr (ptmalloc_init, &di, &l, NULL) != 0
          && l->l_ns != LM_ID_BASE))
    __morecore = __failing_morecore;
#endif

I just need to stick this code in some place where it runs early enough (before any ELF constructors and the like). Suggestions welcome. :)

(And we already have some mechanism because it seems that the dlopen hooks are sent across. But I don't have all the details.)

>> A future optimization would change the virtual function calls to
>> compare the function pointer in the vtable against an expected
>> function pointer.  Only if that comparison fails is the vtable
>> pointer validated.  This will allow us to inline parts of xsputn
>> into vfprintf, especially if we change the various implementations
>> to share more code.

> You suggest this because a pointer comparison is cheaper than an
> address range comparison? Could we make the tables read-only and use
> another flag to elide all checking?

The current checking code looks like this:

     a27:       48 8b 85 80 fb ff ff    mov    -0x480(%rbp),%rax
     a2e:       4c 8b a0 d8 00 00 00    mov    0xd8(%rax),%r12
     a35:       49 81 fc 00 00 00 00    cmp    $IO_vtables,%r12
     a3c:       0f 82 0e 03 00 00       jb     d50
     a42:       49 81 fc 00 00 00 00    cmp    $IO_vtables+0xbd0,%r12
     a49:       0f 83 01 03 00 00       jae    d50
     a4f:       49 8b 75 18             mov    0x18(%r13),%rsi
     a53:       49 8b 55 20             mov    0x20(%r13),%rdx
     a57:       48 8b bd 80 fb ff ff    mov    -0x480(%rbp),%rdi
     a5e:       48 29 f2                sub    %rsi,%rdx
     a61:       41 ff 54 24 38          callq  *0x38(%r12)
...
     d50:       e8 00 00 00 00          callq  IO_vtable_check
     d55:       e9 f5 fc ff ff          jmpq   a4f

Instead we could do something like this (the call setup sequence is somewhat different here):

     a40:       49 8b 44 24 38          mov    0x38(%r12),%rax
     a45:       48 29 f2                sub    %rsi,%rdx
     a48:       48 3d 00 00 00 00       cmp   $_IO_file_xsputn_fast,%rax
     a4e:       0f 85 f7 11 00 00       jne    1c4b
     a54:       4c 89 ff                mov    %r15,%rdi
     a57:       e8 00 00 00 00          callq  _IO_file_xsputn_fast
...
    1c4b:       49 81 fc 00 00 00 00    cmp    $IO_vtables+0xbd0,%r12
    1c52:       0f 83 4e 04 00 00       jae    20a6
    1c58:       49 81 fc 00 00 00 00    cmp    $IO_vtables,%r12
    1c5f:       0f 82 41 04 00 00       jb     20a6
    1c65:       4c 89 ff                mov    %r15,%rdi
    1c68:       ff d0                   callq  *%rax
    1c6a:       e9 ed ed ff ff          jmpq   a5c
...
    20a6:       48 89 95 78 fb ff ff    mov    %rdx,-0x488(%rbp)
    20ad:       48 89 b5 88 fb ff ff    mov    %rsi,-0x478(%rbp)
    20b4:       e8 00 00 00 00          callq  IO_vtable_check
    20b9:       49 8b 44 24 38          mov    0x38(%r12),%rax
    20be:       48 8b 95 78 fb ff ff    mov    -0x488(%rbp),%rdx
    20c5:       48 8b b5 88 fb ff ff    mov    -0x478(%rbp),%rsi
    20cc:       e9 94 fb ff ff          jmpq   1c65

So devirtualize the call to _IO_file_xsputn_fast, which can then be inlined. If the slow path is only needed for compatibility (after we have carefully rewritten xsputn so that we only need one version, which seems doable), we could call some slow-path xsputn function, so the expansion would look like this:

     a4c:       48 8b 83 d8 00 00 00    mov    0xd8(%rbx),%rax
     a53:       48 29 f2                sub    %rsi,%rdx
     a56:       48 81 78 38 00 00 00 00 cmpq   $_IO_file_xsputn_fast,0x38(%rax)
     a5e:       0f 85 e6 11 00 00       jne    1c4a
     a64:       e8 00 00 00 00          callq  _IO_file_xsputn_fast
...
    1c4a:       e8 00 00 00 00          callq  _IO_file_xsputn_slow
    1c4f:       e9 15 ee ff ff          jmpq   a69

This expansion has much more reasonable size, but only makes sense if the call site is monomorphic in practice.
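
In C terms, the devirtualized call would look roughly like this.  The
struct layout is a minimal stand-in, IO_validate_vtable is as in the
sketch further up, and _IO_file_xsputn_fast stands for the hypothetical
single rewritten implementation discussed above:

#include <stddef.h>
#include <stdio.h>

/* Minimal stand-in for the libio jump table; only the slot used
   here is shown.  */
struct _IO_jump_t
{
  size_t (*__xsputn) (FILE *, const void *, size_t);
};

size_t _IO_file_xsputn_fast (FILE *fp, const void *data, size_t n);
const struct _IO_jump_t *
IO_validate_vtable (const struct _IO_jump_t *vtable);

static size_t
sputn (FILE *fp, const struct _IO_jump_t *vtable,
       const void *data, size_t n)
{
  /* Fast path: a pointer equality test devirtualizes the common
     case, so the compiler is free to inline _IO_file_xsputn_fast.  */
  if (vtable->__xsputn == _IO_file_xsputn_fast)
    return _IO_file_xsputn_fast (fp, data, n);

  /* Slow path: range-check the vtable pointer, then make the
     virtual call for non-file or compat streams.  */
  IO_validate_vtable (vtable);
  return vtable->__xsputn (fp, data, n);
}

If the call site is monomorphic in practice, the branch is perfectly
predicted and the range comparison disappears from the hot path.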

Agreed. My only worry is static dlopen.

My worry is that the hardening is insufficient. :)

Florian

