This is the mail archive of the libc-alpha at sourceware dot org
mailing list for the glibc project.
[RFC PATCH 00/11] Library OS support
- From: Isaku Yamahata <isaku dot yamahata at gmail dot com>
- To: libc-alpha at sourceware dot org
- Cc: isaku dot yamahata at intel dot com, Isaku Yamahata <isaku dot yamahata at gmail dot com>
- Date: Wed, 11 Sep 2019 14:03:58 -0700
- Subject: [RFC PATCH 00/11] Library OS support
This patch series adds Library OS (LibOS for short) support to glibc.
This is the first version of the patch series.
Feedback is more than welcome.
I'll give a remote presentation at GNU Cauldron 2019 on 13 September.
Why LibOS support?
Recently many Library OS projects have appeared, and some of them are already
deployed in the field. Typically they use a modified libc so that the LibOS,
instead of the kernel, gets control to process system calls. Such modifications
are made independently by each project, which duplicates effort.
This effort aims to upstream those common modifications to glibc.
Also, some projects use other libc implementations for various reasons.
With this LibOS support, glibc can gain a larger user base.
What is LibOS?
A LibOS implements OS functionality as a library that executes in the
application's address space. To invoke its entry point, a function call is
usually used instead of a special instruction such as the syscall instruction.
The common use cases are containers (unikernels or sandboxes),
compatibility (e.g. SGX support), and/or performance (e.g. avoiding
There are a lot of academic papers and industrial white papers.
What does LibOS do?
LibOSes share some common behaviors.
At boot, the LibOS is started by the kernel and then takes control.
The LibOS, instead of the kernel, loads the interpreter (ld.so) and the target
application binary.
A small number of hooks are needed to inject special logic.
For example, a LibOS wants to know when ld.so loads or unloads a shared library
so that it can inject extra logic.
A LibOS has its own virtual address layout, which restricts the heap space,
so the heap allocator of libc needs to be aware of it.
If the heap allocator requests too large an area, the request fails with ENOMEM.
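To illustrate the heap restriction, here is a minimal sketch of an allocator confined to a fixed window of address space. The names toy_arena and toy_arena_alloc are made up for this example and are not glibc interfaces, and alignment is ignored for brevity.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena: a fixed window handed out by a LibOS. */
struct toy_arena {
    uint8_t *base;   /* start of the window */
    size_t   limit;  /* total bytes available in the window */
    size_t   used;   /* bytes already handed out */
};

/* Bump-allocate from the window. A request that does not fit in the
   remaining space fails with ENOMEM instead of growing the heap. */
void *toy_arena_alloc(struct toy_arena *a, size_t size)
{
    if (size > a->limit - a->used) {
        errno = ENOMEM;
        return NULL;
    }
    void *p = a->base + a->used;
    a->used += size;
    return p;
}
```

An allocator that knows the limit up front can size its arenas to fit the window instead of discovering the limit through failed requests.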
Thread Control Block:
LibOS also needs to have its own thread control block in addition to
One way is to modify tcbhead_t at the source code level. Another way is to use %gs.
Hooking system call:
For a LibOS to take control instead of the kernel on a system call, the system
call instruction is replaced with a function call to the LibOS.
There are two questions: how to identify system call instructions and
how to replace them with function calls.
The approach varies among LibOSes. There are two major ones.
- modify libc at the source code level
Prepare shared libraries specialized for the LibOS and use them instead of
those installed on the system.
This assumes that executables are usually dynamically linked and that shared
libraries (ld.so, libc.so, libpthread.so, etc.) can be easily replaced.
The downside is that this can't easily be applied to statically linked binaries.
- analyze opcodes and replace system call instructions somehow
This can be done at load time, at execution time, or offline.
This technique applies to both dynamically and statically linked binaries.
The downside is that such logic is complex and fragile, and the code tends to
be huge. An existing binary analysis framework can be used.
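As a rough illustration of why opcode analysis is fragile, the sketch below scans a buffer for the two-byte syscall opcode (0x0f 0x05) without decoding instructions. The function name is hypothetical. Because x86 instructions have variable length, the same byte pair can also appear inside an immediate operand, and a naive scan reports it as a syscall.

```c
#include <stddef.h>
#include <stdint.h>

/* Count occurrences of the byte pair 0x0f 0x05 (the x86-64 syscall opcode).
   No instruction decoding is done, so a match inside an immediate or in
   data embedded in the code is a false positive. */
size_t count_syscall_byte_pairs(const uint8_t *code, size_t len)
{
    size_t hits = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        if (code[i] == 0x0f && code[i + 1] == 0x05)
            hits++;
    }
    return hits;
}
```

For example, the bytes b8 0f 05 00 00 encode a move of the constant 0x50f into %eax, yet the scan counts one hit. Telling the two cases apart requires a real disassembler or an explicit list of instruction addresses.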
High level direction
Single version of binary:
For library maintenance, a single version of the binary should serve both
the native case (traditional tool stack) and the LibOS case.
We shouldn't have multiple versions of the binary,
e.g. one version for native, one for LibOS X, and so on.
Multiple versions won't scale, and maintaining them
won't be sustainable.
This also means the approach should be agnostic to the LibOS.
Minimize maintenance cost in glibc/overhead for native case:
For maintainability in glibc, the changes to glibc should be minimized, and
the complexity/burden should be put on LibOSes when a trade-off is needed.
Also, the overhead (CPU cycles and memory space) for the native case should be
minimized because the native case has the largest user base.
Adding initialization hooks:
Add stub functions as weak symbols so that a LibOS can interpose those functions.
Introduce new tunables to specify the heap size so that a LibOS can tell the
heap size to the heap allocator.
Introduce a weak symbol to disable rtld profiling.
Introduce a new note which describes LibOS support. Then a LibOS can easily
check whether a binary supports LibOS or not.
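The weak stub idea can be sketched as below. The hook name and signature are hypothetical and only illustrate the pattern; the hooks actually added by the series may look different. The sketch relies on the GCC/Clang weak attribute.

```c
#include <stddef.h>

/* Hypothetical hook with a do-nothing default. Being weak, it is replaced
   at link time if a strong definition with the same name is linked in. */
__attribute__((weak))
int example_libos_map_library(const char *name, void *base, size_t len)
{
    (void)name;
    (void)base;
    (void)len;
    return 0;   /* native case: nothing to do */
}

/* Hypothetical loader path: after mapping a library, notify the hook. */
int example_load_library(const char *name, void *base, size_t len)
{
    /* ... the mapping itself would happen here ... */
    return example_libos_map_library(name, base, len);
}
```

In the native case the default stub costs one call that returns immediately. How a LibOS supplies its own definition at run time is outside the scope of this sketch.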
Hooking system call:
For LibOSes that use binary editing, create a list of system call instructions
and add nop instructions for binary editing.
Introduce the .libos.instruction.syscall section for it.
x86 instructions have variable length, and the syscall instruction is 2 bytes long.
A jump/call with a 4-byte offset requires 5 bytes, which complicates binary editing.
To make binary editing easier, nop instructions are added around syscall
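The size arithmetic can be sketched as follows: a 2-byte syscall followed by 3 bytes of nop gives a 5-byte slot, which is exactly the size of a call with a 32-bit relative offset (opcode 0xe8). The sketch assumes the padding is three single-byte nops (0x90) placed after the syscall; the padding actually emitted by the patches may differ, and patch_syscall_site is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Rewrite "syscall; nop; nop; nop" (5 bytes) into "call rel32" (5 bytes).
   site        points at a writable copy of the bytes,
   site_addr   is the address the bytes will execute at,
   target_addr is the address of the (hypothetical) LibOS entry point.
   Returns 0 on success, -1 if the bytes do not match the expected pattern
   or the target is out of reach for a 32-bit relative call. */
int patch_syscall_site(uint8_t *site, uint64_t site_addr, uint64_t target_addr)
{
    static const uint8_t expected[5] = { 0x0f, 0x05, 0x90, 0x90, 0x90 };
    if (memcmp(site, expected, sizeof expected) != 0)
        return -1;

    /* The offset is relative to the end of the 5-byte call instruction. */
    int64_t rel = (int64_t)(target_addr - (site_addr + 5));
    if (rel < INT32_MIN || rel > INT32_MAX)
        return -1;

    uint32_t u = (uint32_t)(int32_t)rel;
    site[0] = 0xe8;                      /* call rel32 */
    site[1] = (uint8_t)(u & 0xff);       /* offset, little-endian */
    site[2] = (uint8_t)((u >> 8) & 0xff);
    site[3] = (uint8_t)((u >> 16) & 0xff);
    site[4] = (uint8_t)((u >> 24) & 0xff);
    return 0;
}
```

A real LibOS would also have to make the page writable, and its entry point has to cope with the return address that call pushes on the stack. Both are left out here.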
Alternatives for system call hooking:
- weak symbol function
The extreme option is to replace the system call instruction with a call to a
syscall function and make the syscall function a weak symbol.
Then a LibOS needs to hook only the syscall function.
This may incur function call overhead in the native case.
- SDT marker
The SDT marker is optimized for its own usage. It's not suitable for hooking
system call instructions.
For example, only a single nop is inserted and the size of the .note is 32+ bytes.
Analysis and benchmark
The number of syscall instruction and space overhead:
I counted the number of syscall instructions and the resulting space overhead
in my environment.
Although the actual numbers may vary depending on the environment,
the results shouldn't differ greatly.
Library              | File size  | # of    | nop     | List size | Space
                     | (stripped) | syscall | (N * 3) | (N * 16)  | sum
                     |            | (N)     | bytes   | bytes     | bytes
libc.so.6            |       1.7M |     701 |    2103 |     11216 | 13319
libpthread.so.0      |       112K |     208 |     624 |      3328 |  3952
ld-linux-x86-64.so.2 |       162K |      36 |     108 |       576 |   684
librt.so.1           |        31K |      29 |      87 |       464 |   551
libnal.so            |        15K |       3 |       9 |        48 |    57
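The per-library overhead follows directly from the column formulas: 3 bytes of nop plus a 16-byte list entry for each syscall instruction. A small helper (the names are hypothetical) reproduces the figures, for example for ld-linux-x86-64.so.2 with N = 36.

```c
#include <stddef.h>

/* Space overhead for a library with n syscall instructions,
   using the per-site sizes from the table headers. */
struct site_overhead {
    size_t nop_bytes;    /* n * 3  */
    size_t list_bytes;   /* n * 16 */
    size_t total_bytes;  /* sum of the two */
};

struct site_overhead syscall_site_overhead(size_t n)
{
    struct site_overhead o;
    o.nop_bytes = n * 3;
    o.list_bytes = n * 16;
    o.total_bytes = o.nop_bytes + o.list_bytes;
    return o;
}
```

With n = 36 this gives 108 + 576 = 684 bytes, i.e. 19 bytes per syscall instruction.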
overhead of function call or nop:
I measured the time of an N-iteration loop of the gettid system call.
The difference was smaller than the OS noise, so I couldn't get meaningful results.
Probably the instructions are small enough that they all fit in the
CPU icache. With real applications the effect might be
- syscall(SYS_gettid); <function call>
- asm("syscall\n": "=a"(ret): "0"(SYS_gettid)); <base line>
- asm("syscall\n nop * <3>\n": "=a"(ret): "0"(SYS_gettid));
- asm("syscall\n nop * <10>\n": "=a"(ret): "0"(SYS_gettid));
- asm("jmp 1f\n nop *<1>\n 1:\n syscall\n": "=a"(ret): "0"(SYS_gettid));
- asm("jmp 1f\n nop *<8>\n 1:\n syscall\n": "=a"(ret): "0"(SYS_gettid));
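For reference, the function call variant can be timed with a loop like the following. This is only a sketch for Linux and not the bench-nop.c from the series; the function name and the choice of CLOCK_MONOTONIC are assumptions of this sketch.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Average wall-clock nanoseconds per gettid system call, issued through
   the libc syscall() wrapper, over the given number of iterations. */
double ns_per_gettid(unsigned long iterations)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned long i = 0; i < iterations; i++)
        syscall(SYS_gettid);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (double)(end.tv_sec - start.tv_sec) * 1e9
              + (double)(end.tv_nsec - start.tv_nsec);
    return ns / (double)iterations;
}
```

Since the system call itself dominates each iteration, a few extra nop bytes per call are easily hidden by run-to-run noise in a loop like this.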
Impact on LibOSes
LibOSes may need updates to implement new logic to
- inject symbols for hooking functions
- hook syscall instructions based on this proposal
Although it's up to each LibOS how and when to do it, LibOSes may need updates.
If a LibOS wants to stick to its own way without glibc's help, that's okay.
This change to glibc may break the existing heuristics of LibOSes.
Some LibOSes expect a specific opcode sequence that includes the syscall
instruction for binary editing. This proposal changes such sequences,
so such heuristics may break.
glibc doesn't guarantee anything about such opcode sequences, and this shows
the weak point of such heuristics.
This proposal introduces an explicit contract between glibc and LibOSes.
I'm working on the Graphene LibOS project. The above discussion is meant to be
generic and applicable to other LibOS projects. At least I tried,
but I may be biased. Feedback from other LibOS projects is also more
than welcome.
Isaku Yamahata (11):
x86-64, elf: make elf_machine_lazy_rel() ignore R_X86_64_NONE
elf: add macro to define note section for LibOS
elf: define note section for LibOS
elf: add stub functions for LibOS support
elf: add hook, __libos_map_library to dl-open.c
elf/rtld: introduce runtime option to disable HP_TIMING_INLINE
malloc: make arena size configurable on startup
x86-64: replace syscall instruction with SYSCALL_INST macro
x86-64: add nop instruction after syscall instruction
x86-64: make the number of nops after syscall configurable
benchtests: simple benchmark to measure nop effects
benchtests/bench-nop.c | 128 ++++++++++++++++++
configure.ac | 19 +++
elf/Makefile | 3 +-
elf/Versions | 6 +
elf/dl-load.c | 22 +--
elf/dl-tunables.list | 5 +
elf/libos.c | 36 +++++
elf/libos.h | 98 ++++++++++++++
elf/rtld.c | 20 ++-
malloc/arena.c | 17 +--
malloc/malloc.c | 25 ++++
malloc/malloc.h | 1 +
.../unix/sysv/linux/x86_64/____longjmp_chk.S | 2 +-
.../unix/sysv/linux/x86_64/__start_context.S | 2 +-
sysdeps/unix/sysv/linux/x86_64/cancellation.S | 2 +-
sysdeps/unix/sysv/linux/x86_64/clone.S | 4 +-
sysdeps/unix/sysv/linux/x86_64/getcontext.S | 4 +-
sysdeps/unix/sysv/linux/x86_64/setcontext.S | 2 +-
sysdeps/unix/sysv/linux/x86_64/sigaction.c | 2 +-
sysdeps/unix/sysv/linux/x86_64/swapcontext.S | 4 +-
sysdeps/unix/sysv/linux/x86_64/syscall.S | 2 +-
sysdeps/unix/sysv/linux/x86_64/sysdep.h | 50 +++++--
sysdeps/unix/sysv/linux/x86_64/vfork.S | 2 +-
sysdeps/unix/sysv/linux/x86_64/x32/times.c | 2 +-
sysdeps/x86_64/dl-machine.h | 2 +
sysdeps/x86_64/nptl/tls.h | 2 +-
26 files changed, 418 insertions(+), 44 deletions(-)
create mode 100644 benchtests/bench-nop.c
create mode 100644 elf/libos.c
create mode 100644 elf/libos.h