This is the mail archive of the mailing list for the glibc project.


[RFC PATCH 00/11] Library OS support

This patch series adds Library OS (LibOS for short) support to glibc.
This is the first version of the series, and feedback is more than welcome.
I'll give a remote presentation at GNU Cauldron 2019 on 13 September.

Why LibOS support?
Recently, many Library OS projects have appeared, and some of them have already
been deployed in the field. Typically, they use a modified libc so that the
LibOS, instead of the kernel, gets control and processes system calls. Each
project makes such modifications independently, so they are duplicating effort.
This effort aims to upstream those common modifications to glibc.
Also, some projects have adopted other libc implementations for various reasons.
With LibOS support, glibc can gain a larger user base.

What is a LibOS?
A LibOS implements OS functionality as a library that executes in the
application's address space. Its entry points are usually invoked with an
ordinary function call rather than a special instruction such as syscall.
Common use cases are containers (unikernels or sandboxes), compatibility
(e.g. SGX support), and/or performance (e.g. avoiding kernel overhead).
There are many academic papers and industry white papers on LibOSes.

What does a LibOS do?
LibOSes generally share the following behaviors.

Boot:
The kernel boots the LibOS, and then the LibOS takes control.
The LibOS, instead of the kernel, loads the program interpreter and the target
application binary.

Hooks:
A small number of hooks is needed to inject special logic.
For example, a LibOS wants to know when shared libraries are loaded and
unloaded so that it can inject extra logic.

Heap allocation:
A LibOS has its own virtual address layout, which restricts the space available
for the heap, so libc's heap allocator needs to be aware of it.
If the heap allocator requests too large an area, the request fails with ENOMEM.
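To make the ENOMEM failure mode concrete, here is a minimal sketch of a LibOS that hands out address space from a fixed window, together with an allocator that backs off when its preferred heap size does not fit. Everything here is hypothetical (the libos_example_* and example_* names are invented for illustration); the 64 MiB request is only loosely modeled on the large heaps glibc's per-thread arenas reserve by default on 64-bit targets.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a LibOS that owns a fixed address window. */
struct libos_example_vspace {
    uintptr_t next;   /* next free address in the window */
    uintptr_t end;    /* one past the last usable address */
};

/* Reserve LEN bytes from the window, or fail with ENOMEM. */
static void *libos_example_reserve(struct libos_example_vspace *vs, size_t len)
{
    if (len > vs->end - vs->next) {
        errno = ENOMEM;
        return NULL;
    }
    void *p = (void *) vs->next;   /* address only; never dereferenced here */
    vs->next += len;
    return p;
}

/* A limit-aware allocator: try PREFERRED bytes, halve on ENOMEM, and
   give up below MINIMUM (which must be non-zero). Returns the size
   actually reserved, or 0 with *OUT set to NULL. */
static size_t example_heap_reserve(struct libos_example_vspace *vs,
                                   size_t preferred, size_t minimum,
                                   void **out)
{
    for (size_t len = preferred; len >= minimum; len /= 2) {
        void *p = libos_example_reserve(vs, len);
        if (p != NULL) {
            *out = p;
            return len;
        }
    }
    *out = NULL;
    return 0;
}
```

With a 16 MiB window, a 64 MiB request fails, 32 MiB fails, and 16 MiB succeeds; an allocator that only ever asks for the large size would simply see ENOMEM.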

Thread control block:
A LibOS also needs its own thread control block in addition to the pthread
TCB (tcbhead_t).
One way is to modify tcbhead_t at the source code level. Another way is to use
%gs, which glibc leaves unused on x86-64 (glibc's own TCB is addressed through
%fs).
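The %gs approach can be sketched as follows, assuming x86-64 Linux: the LibOS points the thread's %gs base at its own block with arch_prctl(ARCH_SET_GS, ...) and reads it back with a %gs-relative load, leaving glibc's %fs-based tcbhead_t untouched. The struct and function names are hypothetical; on other platforms, or in sandboxes that refuse ARCH_SET_GS, the sketch reports failure instead of installing anything.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) && defined(__linux__)
#include <asm/prctl.h>      /* ARCH_SET_GS */
#include <sys/syscall.h>    /* SYS_arch_prctl */
#include <unistd.h>         /* syscall() */
#define EXAMPLE_HAVE_GS_TCB 1
#else
#define EXAMPLE_HAVE_GS_TCB 0
#endif

/* Hypothetical LibOS-private per-thread block. The first field points
   to the block itself, so reading %gs:0 yields the block's address
   (the same self-pointer trick glibc's tcbhead_t uses with %fs). */
struct libos_example_tcb {
    struct libos_example_tcb *self;
    uint64_t libos_thread_id;
};

/* Point the calling thread's %gs base at TCB.
   Returns 0 on success, -1 if not supported here. */
static int libos_example_tcb_install(struct libos_example_tcb *tcb)
{
#if EXAMPLE_HAVE_GS_TCB
    tcb->self = tcb;
    return syscall(SYS_arch_prctl, ARCH_SET_GS, (unsigned long) tcb) == 0 ? 0 : -1;
#else
    (void) tcb;
    return -1;
#endif
}

/* Fetch the current thread's LibOS TCB through %gs. */
static struct libos_example_tcb *libos_example_tcb_current(void)
{
#if EXAMPLE_HAVE_GS_TCB
    struct libos_example_tcb *self;
    __asm__ volatile ("movq %%gs:0, %0" : "=r" (self));
    return self;
#else
    return NULL;
#endif
}
```

The source-level alternative would instead add a field to tcbhead_t, which changes glibc's layout for every user; using %gs keeps the native case unaffected at the cost of an x86-64-specific mechanism.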

Hooking system calls:
For a LibOS to take control instead of the kernel on a system call, the system
call instruction is replaced with a function call into the LibOS.
This raises two questions: how to identify system call instructions, and how
to replace them with function calls.

The approach varies among LibOSes. There are two major ones:
- Modify libc at the source code level.
  Prepare shared libraries specialized for the LibOS and use them instead of
  those installed on the system.
  This assumes that executables are usually dynamically linked and that shared
  libraries (, etc.) can easily be replaced.
  The downside is that this can't easily be applied to statically linked
  binaries.
- Analyze opcodes and replace system call instructions somehow.
  This can be done at load time, at execution time, or offline.
  This technique applies to both dynamically and statically linked binaries.
  The downside is that such logic is complex and fragile, and the code tends
  to be huge, although existing binary analysis frameworks can be used.
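One reason opcode analysis is fragile: x86-64 instructions are variable-length, so the syscall opcode bytes (0x0f 0x05) can also appear inside another instruction. The sketch below is a deliberately naive byte scanner (not what any particular LibOS does) run over a hand-assembled fragment in which `mov $0x50f, %eax` (encoded b8 0f 05 00 00) contains a false match.

```c
#include <stddef.h>

/* Naive scanner: record every offset where the two-byte x86-64
   syscall opcode 0x0f 0x05 appears. It does not decode instructions,
   so it cannot tell a real syscall from the same bytes embedded in
   another instruction's immediate or displacement. */
static size_t naive_find_syscalls(const unsigned char *code, size_t len,
                                  size_t *offsets, size_t max_offsets)
{
    size_t found = 0;

    for (size_t i = 0; i + 1 < len; i++) {
        if (code[i] == 0x0f && code[i + 1] == 0x05) {
            if (found < max_offsets)
                offsets[found] = i;
            found++;
        }
    }
    return found;
}

/* mov $0x50f, %eax  ->  b8 0f 05 00 00   (no syscall here)
   syscall           ->  0f 05                               */
static const unsigned char example_code[] = {
    0xb8, 0x0f, 0x05, 0x00, 0x00,
    0x0f, 0x05,
};
```

Scanning example_code reports two sites (offsets 1 and 5), but only offset 5 is a real syscall; patching offset 1 would corrupt the mov. Avoiding that requires a full instruction decoder, which is why this approach tends to be large.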

High-level direction
Single version of binary:
For library maintenance, a single version of the binary should serve both the
native case (traditional tool stack) and the LibOS case.
We shouldn't have multiple versions of the binary,
e.g. one for native, one for LibOS X, and so on.
Multiple versions won't scale, and maintaining them won't be sustainable.
This also means the approach should be agnostic to the LibOS.

Minimize maintenance cost in glibc and overhead for the native case:
For maintainability, the changes to glibc should be minimized, and when a
trade-off is needed, the complexity/burden should be put on the LibOSes.
Also, the overhead (CPU cycles and memory space) for the native case should be
minimized because the native case has the largest user base.

Adding initialization hooks:
Add stub functions as weak symbols so that a LibOS can interpose them.
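A minimal sketch of the weak-stub pattern, assuming a hook that is called whenever the loader maps an object. The name libos_example_map_library and its parameters are hypothetical (the series adds a hook called __libos_map_library in dl-open.c, but its signature is not shown here). glibc would ship the empty weak default; when a strong definition of the same name is linked into the same link, the linker prefers it, and across shared objects the usual ELF symbol interposition rules decide which definition is used.

```c
#include <stddef.h>

/* Hypothetical hook: empty weak default, so the native case pays only
   for a call to a function that returns immediately. A LibOS supplies
   a strong definition with the same name to receive notifications. */
__attribute__((weak))
void libos_example_map_library(const char *name, void *base, size_t len)
{
    (void) name;
    (void) base;
    (void) len;
}

/* Counts mapped objects; for demonstration only. */
static int example_loader_counter;

/* Hypothetical loader step that reports each newly mapped object. */
static void example_map_object(const char *name, void *base, size_t len)
{
    /* ... the real mapping work would happen here ... */
    example_loader_counter++;
    libos_example_map_library(name, base, len);
}
```

To see interposition, compile a second file containing a non-weak `void libos_example_map_library(const char *name, void *base, size_t len)` and link both into one program: the loader's calls then reach the LibOS version without any change to the loader code.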

Heap allocator:
Introduce new tunables to specify the heap size so that a LibOS can tell the
heap allocator how large a heap it may use.
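glibc tunables are passed in the GLIBC_TUNABLES environment variable as colon-separated name=value pairs (e.g. `glibc.malloc.arena_max=2`). The sketch below models how a heap-size value could be picked out of such a string and rounded to whole pages. It is an illustration only: glibc parses tunables itself at startup, and the key glibc.malloc.example_heap_size used in the usage note is hypothetical, not the name the series adds to dl-tunables.list.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return the value of KEY in a GLIBC_TUNABLES-style string
   ("name=value:name=value"), or DFLT if KEY is absent or malformed. */
static size_t example_tunable_size(const char *tunables, const char *key,
                                   size_t dflt)
{
    size_t keylen = strlen(key);
    const char *p = tunables;

    while (p != NULL && *p != '\0') {
        const char *colon = strchr(p, ':');
        size_t itemlen = colon ? (size_t) (colon - p) : strlen(p);

        if (itemlen > keylen && strncmp(p, key, keylen) == 0
            && p[keylen] == '=') {
            const char *val = p + keylen + 1;
            char *stop;
            unsigned long long v;

            if (*val == '-')
                return dflt;               /* reject negative sizes */
            v = strtoull(val, &stop, 0);   /* decimal or 0x... */
            /* Accept only if the whole value was consumed. */
            return (stop != val && stop == p + itemlen) ? (size_t) v : dflt;
        }
        p = colon ? colon + 1 : NULL;
    }
    return dflt;
}

/* Round a requested heap size up to a multiple of PAGE (a power of 2). */
static size_t example_round_to_page(size_t size, size_t page)
{
    return (size + page - 1) & ~(page - 1);
}
```

For example, with `GLIBC_TUNABLES=glibc.malloc.arena_max=2:glibc.malloc.example_heap_size=0x100000`, the lookup yields 1 MiB, while an absent key falls back to the allocator's built-in default.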

hp-timing (profiling rtld):
Introduce a weak symbol to disable rtld profiling.

New .note:
Introduce a new note that describes LibOS support. Then a LibOS can easily
check whether a binary supports LibOS or not.
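An ELF note is a header of three 32-bit words (namesz, descsz, type) followed by the name and the descriptor, each padded to a 4-byte boundary. The sketch below emits such a note from C into its own .note.* section and shows the kind of check a LibOS could perform after locating it. The section name, note name "LibOS", type value and version descriptor are all hypothetical; the actual layout chosen by the series is not described in this cover letter.

```c
#include <stdint.h>
#include <string.h>

#define EXAMPLE_NOTE_TYPE_LIBOS 1u   /* hypothetical n_type value */

/* ELF note layout: 12-byte header, then padded name, then desc. */
struct example_libos_note {
    uint32_t namesz;     /* length of name, including the NUL */
    uint32_t descsz;     /* length of desc */
    uint32_t type;       /* note type, interpreted per name */
    char     name[8];    /* "LibOS\0", padded to a 4-byte boundary */
    uint32_t desc;       /* hypothetical: version of the glibc/LibOS contract */
};

/* Emit the note into its own allocated section of the object file. */
static const struct example_libos_note example_libos_note
    __attribute__((used, section(".note.example_libos"), aligned(4))) = {
    .namesz = sizeof "LibOS",
    .descsz = sizeof(uint32_t),
    .type   = EXAMPLE_NOTE_TYPE_LIBOS,
    .name   = "LibOS",
    .desc   = 1,
};

/* What a LibOS might verify once it has found the note in a binary. */
static int example_note_says_libos_ready(const struct example_libos_note *n)
{
    return n->namesz == sizeof "LibOS"
        && n->descsz == sizeof(uint32_t)
        && n->type == EXAMPLE_NOTE_TYPE_LIBOS
        && memcmp(n->name, "LibOS", sizeof "LibOS") == 0
        && n->desc >= 1;
}
```

After building, `readelf -n` on the object should list the note under .note.example_libos; a LibOS would typically locate it by walking the binary's PT_NOTE segments or section headers.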

Hooking system calls:
To let a LibOS use binary editing, create a list of system call instructions
and add nop instructions for binary editing.
Introduce a .libos.instruction.syscall section for the list.
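The list-plus-nops idea can be sketched with inline assembly on x86-64 Linux: each system call site is emitted as `syscall` followed by three nops, and the site's address is appended to a dedicated section with .pushsection, the same technique the Linux kernel uses for its exception tables. This is an illustration, not the series' SYSCALL_INST macro: the section is named example_syscall_sites (a valid C identifier, so GNU ld provides __start_/__stop_ symbols, which the dotted name .libos.instruction.syscall would not get), and each entry holds only an 8-byte address rather than the 16 bytes per entry counted in the table below.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) && defined(__linux__)
#include <sys/syscall.h>
#include <unistd.h>

/* Bounds of the hypothetical site list, provided by the linker. */
extern const uintptr_t __start_example_syscall_sites[];
extern const uintptr_t __stop_example_syscall_sites[];

/* Raw gettid: syscall (0f 05) plus three 1-byte nops (90), i.e. a
   5-byte patchable site. The site address goes into a writable
   section so that PIE relocations do not land in read-only data. */
long example_gettid_patchable(void)
{
    long ret;

    __asm__ volatile (
        "1:\n\t"
        "syscall\n\t"
        "nop\n\tnop\n\tnop\n\t"
        ".pushsection example_syscall_sites, \"aw\"\n\t"
        ".balign 8\n\t"
        ".quad 1b\n\t"
        ".popsection"
        : "=a" (ret)
        : "0" ((long) SYS_gettid)
        : "rcx", "r11", "memory");
    return ret;
}

/* Number of recorded syscall sites in this binary. */
static size_t example_syscall_site_count(void)
{
    return (size_t) (__stop_example_syscall_sites - __start_example_syscall_sites);
}
#endif
```

A LibOS can walk the list from __start_ to __stop_ (or, for an arbitrary binary, read the section from the ELF file) and patch each site without decoding any surrounding instructions.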

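x86 instructions have variable length, and the syscall instruction is 2 bytes
long, whereas a jump/call with a 4-byte displacement requires 5 bytes. This
complicates binary editing.
To make binary editing easier, nop instructions are added around the syscall
instruction: three 1-byte nops (the N * 3 bytes in the table below) plus the
2-byte syscall give exactly the 5 bytes a call needs.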
Alternatives for system call hooking:
- Weak symbol function
  The extreme option is to replace each system call instruction with a call to
  a syscall function and to make that function a weak symbol.
  Then a LibOS needs to hook only that function.
  This may incur function call overhead in the native case.
- SDT marker
  SDT markers are optimized for their own use (tracing), so they aren't
  suitable for hooking system call instructions.
  For example, only a single nop is inserted, and the .note entry is 32+ bytes
  per marker.

Analysis and benchmark
Number of syscall instructions and space overhead:
I counted the number of syscall instructions and the space overhead
in my environment.
Although the actual numbers may vary depending on the environment,
the results shouldn't differ greatly.

Library              |File size |# of   |nop    |List size| space
                     |(stripped)|syscall|(N * 3)|(N * 16) |   sum
                     |          |       |bytes  |bytes    | bytes
---------------------+----------+-------+-------+---------+------
                     |      1.7M|    701|   2013|    11216| 13329
                     |      112K|    208|    624|     3328|  3958
                     |      162K|     36|    108|      576|   684
                     |       31K|     29|     87|      464|   551
                     |       15K|      3|      9|       48|    57

Overhead of function call or nop:
I measured the time of an N-iteration loop of the gettid system call for each
of the variants below.
The difference was smaller than the OS noise, so I couldn't get meaningful
results. Probably the instructions are small enough that they all stay in the
CPU icache. With real applications, the effect might be

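- syscall(SYS_gettid); <function call>
- asm volatile("syscall" : "=a"(ret) : "0"(SYS_gettid) : "rcx", "r11", "memory"); <baseline>
- asm volatile("syscall\n\t.rept 3\n\tnop\n\t.endr" : "=a"(ret) : "0"(SYS_gettid) : "rcx", "r11", "memory");
- asm volatile("syscall\n\t.rept 10\n\tnop\n\t.endr" : "=a"(ret) : "0"(SYS_gettid) : "rcx", "r11", "memory");
- asm volatile("jmp 1f\n\t.rept 1\n\tnop\n\t.endr\n1:\n\tsyscall" : "=a"(ret) : "0"(SYS_gettid) : "rcx", "r11", "memory");
- asm volatile("jmp 1f\n\t.rept 8\n\tnop\n\t.endr\n1:\n\tsyscall" : "=a"(ret) : "0"(SYS_gettid) : "rcx", "r11", "memory");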

Impact on LibOSes
LibOSes may need updates to implement the new logic:
  - inject symbols for the hooking functions
  - hook syscall instructions based on this proposal
    (how and when to do so is up to each LibOS).
If a LibOS wants to stick to its own way without glibc's help, that's okay.

This change to glibc may break the existing heuristics of some LibOSes.
Some LibOSes expect a specific opcode sequence that includes the syscall
instruction for binary editing. This proposal changes such sequences,
so those heuristics may break.
glibc doesn't guarantee anything about such opcode sequences, and this shows
the weak point of such heuristics.
This proposal introduces an explicit contract between glibc and LibOSes.

Full disclosure
I'm working on the Graphene LibOS project[1]. The above discussion is meant to
be generic and applicable to LibOS projects in general; at least, I tried.
But I may be biased, so feedback from other LibOS projects is also more
than welcome.

Isaku Yamahata (11):
  x86-64, elf: make elf_machine_lazy_rel() ignore R_X86_64_NONE
  elf: add macro to define note section for LibOS
  elf: define note section for LibOS
  elf: add stub functions for LibOS support
  elf: add hook, __libos_map_library to dl-open.c
  elf/rtld: introduce runtime option to disable HP_TIMING_INLINE
  malloc: make arena size configurable on startup
  x86-64: replace syscall instruction with SYSCALL_INST macro
  x86-64: add nop instruction after syscall instruction
  x86-64: make the number of nops after syscall configurable
  benchtests: simple benchmark to measure nop effects

 benchtests/bench-nop.c                        | 128 ++++++++++++++++++
                                               |  19 +++
 elf/Makefile                                  |   3 +-
 elf/Versions                                  |   6 +
 elf/dl-load.c                                 |  22 +--
 elf/dl-tunables.list                          |   5 +
 elf/libos.c                                   |  36 +++++
 elf/libos.h                                   |  98 ++++++++++++++
 elf/rtld.c                                    |  20 ++-
 malloc/arena.c                                |  17 +--
 malloc/malloc.c                               |  25 ++++
 malloc/malloc.h                               |   1 +
 .../unix/sysv/linux/x86_64/____longjmp_chk.S  |   2 +-
 .../unix/sysv/linux/x86_64/__start_context.S  |   2 +-
 sysdeps/unix/sysv/linux/x86_64/cancellation.S |   2 +-
 sysdeps/unix/sysv/linux/x86_64/clone.S        |   4 +-
 sysdeps/unix/sysv/linux/x86_64/getcontext.S   |   4 +-
 sysdeps/unix/sysv/linux/x86_64/setcontext.S   |   2 +-
 sysdeps/unix/sysv/linux/x86_64/sigaction.c    |   2 +-
 sysdeps/unix/sysv/linux/x86_64/swapcontext.S  |   4 +-
 sysdeps/unix/sysv/linux/x86_64/syscall.S      |   2 +-
 sysdeps/unix/sysv/linux/x86_64/sysdep.h       |  50 +++++--
 sysdeps/unix/sysv/linux/x86_64/vfork.S        |   2 +-
 sysdeps/unix/sysv/linux/x86_64/x32/times.c    |   2 +-
 sysdeps/x86_64/dl-machine.h                   |   2 +
 sysdeps/x86_64/nptl/tls.h                     |   2 +-
 26 files changed, 418 insertions(+), 44 deletions(-)
 create mode 100644 benchtests/bench-nop.c
 create mode 100644 elf/libos.c
 create mode 100644 elf/libos.h

