System Call Wrappers

There are three types of OS kernel system call wrappers that are used by glibc: assembly, macro, and bespoke.

First we'll talk about the assembly ones. Then we'll talk about the other two.

Assembly syscalls

Simple kernel system calls in glibc are translated from a list of names into an assembly wrapper that is then compiled.

In a build directory, disassemble the socket object file and you'll see the wrapper generated from syscall-template.S:

[azanella@x86_64-linux-gnu]$ objdump -ldr socket/socket.o

socket/socket.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <__socket>:
__socket():
/home/azanella/Projects/glibc/glibc-git/socket/../sysdeps/unix/syscall-template.S:79
   0:   b8 29 00 00 00          mov    $0x29,%eax
   5:   0f 05                   syscall
   7:   48 3d 01 f0 ff ff       cmp    $0xfffffffffffff001,%rax
   d:   0f 83 00 00 00 00       jae    13 <__socket+0x13>
                        f: R_X86_64_PC32        __syscall_error-0x4
/home/azanella/Projects/glibc/glibc-git/socket/../sysdeps/unix/syscall-template.S:80
  13:   c3                      retq
[azanella@x86_64-linux-gnu]$ cd ..
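
The cmp/jae pair in the disassembly above is an unsigned comparison that treats any raw kernel return value in the range -4095 to -1 as a negated errno code. A minimal C sketch of that check follows; the function name decode_raw_syscall_result is hypothetical and not a glibc symbol, and it only approximates what the branch to __syscall_error achieves on x86_64.

```c
#include <errno.h>

/* Hypothetical helper, not a glibc symbol: mirrors the cmp/jae pair in
   the disassembly.  */
long
decode_raw_syscall_result (long raw)
{
  /* Unsigned comparison: true exactly when raw is in [-4095, -1].  */
  if ((unsigned long) raw >= (unsigned long) -4095L)
    {
      errno = (int) -raw;   /* the kernel returned a negated errno value */
      return -1;
    }
  return raw;               /* success: pass the value through unchanged */
}
```

A raw value of -EBADF therefore becomes a return of -1 with errno set to EBADF, while a non-negative value such as a file descriptor is returned unchanged.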

The list of syscalls that use wrappers is kept in the syscalls.list files:

[azanella@glibc-git]$ find . -name syscalls.list
./sysdeps/unix/sysv/linux/generic/syscalls.list
./sysdeps/unix/sysv/linux/generic/wordsize-32/syscalls.list
./sysdeps/unix/sysv/linux/sparc/sparc32/syscalls.list
./sysdeps/unix/sysv/linux/sparc/sparc64/syscalls.list
./sysdeps/unix/sysv/linux/alpha/syscalls.list
./sysdeps/unix/sysv/linux/s390/s390-32/syscalls.list
./sysdeps/unix/sysv/linux/microblaze/syscalls.list
./sysdeps/unix/sysv/linux/syscalls.list
./sysdeps/unix/sysv/linux/sh/syscalls.list
./sysdeps/unix/sysv/linux/powerpc/powerpc32/syscalls.list
./sysdeps/unix/sysv/linux/ia64/syscalls.list
./sysdeps/unix/sysv/linux/i386/syscalls.list
./sysdeps/unix/sysv/linux/x86_64/syscalls.list
./sysdeps/unix/sysv/linux/x86_64/x32/syscalls.list
./sysdeps/unix/sysv/linux/hppa/syscalls.list
./sysdeps/unix/sysv/linux/wordsize-64/syscalls.list
./sysdeps/unix/sysv/linux/m68k/syscalls.list
./sysdeps/unix/sysv/linux/arm/syscalls.list
./sysdeps/unix/sysv/linux/mips/mips64/n32/syscalls.list
./sysdeps/unix/sysv/linux/mips/mips64/n64/syscalls.list
./sysdeps/unix/sysv/linux/mips/syscalls.list
./sysdeps/unix/sysv/linux/mips/mips32/syscalls.list
./sysdeps/unix/syscalls.list
./sysdeps/unix/bsd/syscalls.list

The sysdep directory ordering helps decide which syscalls apply. So for example on x86_64 the following would apply:

./sysdeps/unix/sysv/linux/x86_64/syscalls.list
./sysdeps/unix/sysv/linux/wordsize-64/syscalls.list
./sysdeps/unix/sysv/linux/syscalls.list
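
One way to picture the ordering is as a first-match search, assuming that an entry in an earlier (more specific) directory takes precedence over an entry with the same name in a later one. The sketch below models that assumption in C; struct list_file, resolve_syscall_dir, and any names passed to them are hypothetical and do not reflect the real file contents or the build system's actual implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one syscalls.list file: a directory plus the
   syscall names it lists.  */
struct list_file
{
  const char *dir;
  const char *const *names;   /* NULL-terminated array of names */
};

/* Return the directory whose list provides NAME, searching DIRS in
   sysdep order so that the first (most specific) match wins.  Returns
   NULL if no list mentions NAME.  */
const char *
resolve_syscall_dir (const struct list_file *dirs, size_t ndirs,
                     const char *name)
{
  for (size_t i = 0; i < ndirs; i++)
    for (const char *const *p = dirs[i].names; *p != NULL; p++)
      if (strcmp (*p, name) == 0)
        return dirs[i].dir;
  return NULL;
}
```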

The makefile rules for processing syscall wrappers are in sysdeps/unix/Makefile, for example:

...
ifndef avoid-generated
$(common-objpfx)sysd-syscalls: $(..)sysdeps/unix/make-syscalls.sh \
                               $(wildcard $(+sysdep_dirs:%=%/syscalls.list)) \
                               $(common-objpfx)libc-modules.stmp
        for dir in $(+sysdep_dirs); do \
          test -f $$dir/syscalls.list && \
          { sysdirs='$(sysdirs)' \
            asm_CPP='$(COMPILE.S) -E -x assembler-with-cpp' \
            $(SHELL) $(dir $<)$(notdir $<) $$dir || exit 1; }; \
          test $$dir = $(..)sysdeps/unix && break; \
        done > $@T
        mv -f $@T $@
endif
...

The syscalls.list files are processed by a script called sysdeps/unix/make-syscalls.sh, whose comments describe the format of a syscalls.list file.
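
As an illustration of the kind of information a list entry carries, the sketch below assumes an argument signature field of the form i:iii (a return-type key letter, a colon, then one key letter per argument). That form is an assumption here; the comments in make-syscalls.sh in your tree are the authoritative description. The helper name count_signature_args is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: count the arguments described by a signature
   such as "i:iii".  Returns -1 if the colon separator is missing.  */
int
count_signature_args (const char *signature)
{
  const char *colon = strchr (signature, ':');
  if (colon == NULL)
    return -1;
  /* Each character after the colon stands for one argument.  */
  return (int) strlen (colon + 1);
}
```

Under that assumption, a three-argument call such as socket would carry three key letters after the colon, and the helper returns 3 for "i:iii".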

The script uses a template called syscall-template.S to generate the assembly file, which in turn uses machine-specific macros to build the wrapper for the syscall. A machine can override syscall-template.S with its own copy, since the template is also selected based on the sysdep directory ordering.

Lastly the macros for each machine are provided by the sysdep.h header files:

./sysdeps/aarch64/sysdep.h
./sysdeps/generic/sysdep.h
./sysdeps/sparc/sysdep.h
./sysdeps/s390/s390-32/sysdep.h
./sysdeps/s390/s390-64/sysdep.h
./sysdeps/microblaze/sysdep.h
./sysdeps/sh/sysdep.h
./sysdeps/powerpc/sysdep.h
./sysdeps/powerpc/powerpc64/sysdep.h
./sysdeps/powerpc/powerpc32/sysdep.h
./sysdeps/ia64/sysdep.h
./sysdeps/i386/sysdep.h
./sysdeps/mach/sysdep.h
./sysdeps/mach/i386/sysdep.h
./sysdeps/tile/sysdep.h
./sysdeps/x86_64/sysdep.h
./sysdeps/x86_64/x32/sysdep.h
./sysdeps/hppa/sysdep.h
./sysdeps/m68k/sysdep.h
./sysdeps/m68k/coldfire/sysdep.h
./sysdeps/m68k/m680x0/sysdep.h
./sysdeps/arm/sysdep.h
./sysdeps/nios2/sysdep.h
./sysdeps/unix/alpha/sysdep.h
./sysdeps/unix/sysdep.h
./sysdeps/unix/sysv/linux/aarch64/sysdep.h
./sysdeps/unix/sysv/linux/generic/sysdep.h
./sysdeps/unix/sysv/linux/sparc/sysdep.h
./sysdeps/unix/sysv/linux/sparc/sparc32/sysdep.h
./sysdeps/unix/sysv/linux/sparc/sparc64/sysdep.h
./sysdeps/unix/sysv/linux/alpha/sysdep.h
./sysdeps/unix/sysv/linux/s390/s390-32/sysdep.h
./sysdeps/unix/sysv/linux/s390/s390-64/sysdep.h
./sysdeps/unix/sysv/linux/sysdep.h
./sysdeps/unix/sysv/linux/microblaze/sysdep.h
./sysdeps/unix/sysv/linux/sh/sh4/sysdep.h
./sysdeps/unix/sysv/linux/sh/sysdep.h
./sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h
./sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep.h
./sysdeps/unix/sysv/linux/ia64/sysdep.h
./sysdeps/unix/sysv/linux/i386/sysdep.h
./sysdeps/unix/sysv/linux/tile/sysdep.h
./sysdeps/unix/sysv/linux/x86_64/sysdep.h
./sysdeps/unix/sysv/linux/x86_64/x32/sysdep.h
./sysdeps/unix/sysv/linux/hppa/sysdep.h
./sysdeps/unix/sysv/linux/m68k/sysdep.h
./sysdeps/unix/sysv/linux/m68k/coldfire/sysdep.h
./sysdeps/unix/sysv/linux/m68k/m680x0/sysdep.h
./sysdeps/unix/sysv/linux/arm/sysdep.h
./sysdeps/unix/sysv/linux/nios2/sysdep.h
./sysdeps/unix/sysv/linux/mips/mips64/n32/sysdep.h
./sysdeps/unix/sysv/linux/mips/mips64/n64/sysdep.h
./sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
./sysdeps/unix/sh/sysdep.h
./sysdeps/unix/powerpc/sysdep.h
./sysdeps/unix/i386/sysdep.h
./sysdeps/unix/x86_64/sysdep.h
./sysdeps/unix/arm/sysdep.h
./sysdeps/unix/mips/mips64/n32/sysdep.h
./sysdeps/unix/mips/mips64/n64/sysdep.h
./sysdeps/unix/mips/sysdep.h
./sysdeps/unix/mips/mips32/sysdep.h

Together all of these pieces produce a wrapper compiled like this:

(echo '#define SYSCALL_NAME socket'; \
 echo '#define SYSCALL_NARGS 3'; \
 echo '#define SYSCALL_SYMBOL __socket'; \
 echo '#define SYSCALL_CANCELLABLE 0'; \
 echo '#define SYSCALL_NOERRNO 0'; \
 echo '#define SYSCALL_ERRVAL 0'; \
 echo '#include <syscall-template.S>'; \
 echo 'weak_alias (__socket, socket)'; \
 echo 'hidden_weak (socket)'; \
) | /opt/cross/x86_64-linux-gnu/bin/x86_64-glibc-linux-gnu-gcc -c     -I../include -I/home/azanella/Projects/glibc/build/x86_64-linux-gnu/socket  -I/home/azanella/Projects/glibc/build/x86_64-linux-gnu  -I../sysdeps/unix/sysv/linux/x86_64/64  -I../sysdeps/unix/sysv/linux/x86_64  -I../sysdeps/unix/sysv/linux/x86  -I../sysdeps/x86/nptl  -I../sysdeps/unix/sysv/linux/wordsize-64  -I../sysdeps/x86_64/nptl  -I../sysdeps/unix/sysv/linux/include -I../sysdeps/unix/sysv/linux  -I../sysdeps/nptl  -I../sysdeps/pthread  -I../sysdeps/gnu  -I../sysdeps/unix/inet  -I../sysdeps/unix/sysv  -I../sysdeps/unix/x86_64  -I../sysdeps/unix  -I../sysdeps/posix  -I../sysdeps/x86_64/64  -I../sysdeps/x86_64/fpu/multiarch  -I../sysdeps/x86_64/fpu  -I../sysdeps/x86/fpu/include -I../sysdeps/x86/fpu  -I../sysdeps/x86_64/multiarch  -I../sysdeps/x86_64  -I../sysdeps/x86  -I../sysdeps/ieee754/float128  -I../sysdeps/ieee754/ldbl-96/include -I../sysdeps/ieee754/ldbl-96  -I../sysdeps/ieee754/dbl-64/wordsize-64  -I../sysdeps/ieee754/dbl-64  -I../sysdeps/ieee754/flt-32  -I../sysdeps/wordsize-64  -I../sysdeps/ieee754  -I../sysdeps/generic  -I.. -I../libio -I.   -D_LIBC_REENTRANT -include /home/azanella/Projects/glibc/build/x86_64-linux-gnu/libc-modules.h -DMODULE_NAME=libc -include ../include/libc-symbols.h  -DPIC -DSHARED     -DTOP_NAMESPACE=glibc -DASSEMBLER  -g -Werror=undef -Wa,--noexecstack   -o /home/azanella/Projects/glibc/build/x86_64-linux-gnu/socket/socket.os -x assembler-with-cpp - -MD -MP -MF /home/azanella/Projects/glibc/build/x86_64-linux-gnu/socket/socket.os.dt -MT /home/azanella/Projects/glibc/build/x86_64-linux-gnu/socket/socket.os

Note the use of -x assembler-with-cpp: the input is assembly that is run through the C preprocessor, so these wrappers should only contain assembly.

NOTE: GLIBC 2.26 and earlier versions defined cancellable syscalls using an auxiliary header, sysdep-cancel.h, which contained macros for the required steps (calling the __{libc,pthread,librt}_{enable,disable}_asynccancel functions in nopic/pic mode). GLIBC 2.27 and later require only the default sysdep.h assembly macros, and all cancellable syscalls are implemented in C files using the SYSCALL_CANCEL macro.

Macro Syscalls

The macro syscalls are handled by *.c files that are much more complicated than simple wrappers.

Some system calls may require shuffling the kernel result into a userspace structure and thus glibc needs a way to make inline system calls in C code.

This is handled by macros defined in the sysdep.h files.

The macros are named INTERNAL_* and INLINE_*, and several variants are provided for use in the source code.
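
The "shuffling" step mentioned above might look like the following sketch: after the raw call returns data in a kernel layout, the C wrapper copies it into the userspace layout and reports an error if a value does not fit. Both structure layouts and the function name are hypothetical, chosen only to illustrate the pattern; real kernel and userspace structures differ per architecture.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical kernel-side layout with wide fields.  */
struct kernel_info_sketch
{
  int64_t size;
  int64_t blocks;
};

/* Hypothetical userspace layout with narrower fields.  */
struct user_info_sketch
{
  int32_t size;
  int32_t blocks;
};

/* Copy a kernel-format result into the narrower userspace format.
   Returns 0 on success, or -1 with errno set to EOVERFLOW when a value
   does not fit.  */
int
convert_info_sketch (const struct kernel_info_sketch *k,
                     struct user_info_sketch *u)
{
  if (k->size < INT32_MIN || k->size > INT32_MAX
      || k->blocks < INT32_MIN || k->blocks > INT32_MAX)
    {
      errno = EOVERFLOW;
      return -1;
    }
  u->size = (int32_t) k->size;
  u->blocks = (int32_t) k->blocks;
  return 0;
}
```

Logic of this kind cannot be expressed in the generated assembly wrappers, which is why such calls are written in C.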

You can see these macros in use in, for example, the wait function implementation (sysdeps/unix/sysv/linux/wait.c):

/* Wait for a child to die.  When one does, put its status in *STAT_LOC
   and return its process ID.  For errors, return (pid_t) -1.  */
pid_t
__libc_wait (int *stat_loc)
{
  pid_t result = SYSCALL_CANCEL (wait4, WAIT_ANY, stat_loc, 0,
                                 (struct rusage *) NULL);
  return result;
}

The function __libc_wait calls the SYSCALL_CANCEL macro which is defined as (sysdeps/unix/sysdep.h):

#define SYSCALL_CANCEL(...) \
  ({                                                                         \
    long int sc_ret;                                                         \
    if (SINGLE_THREAD_P)                                                     \
      sc_ret = INLINE_SYSCALL_CALL (__VA_ARGS__);                            \
    else                                                                     \
      {                                                                      \
        int sc_cancel_oldtype = LIBC_CANCEL_ASYNC ();                        \
        sc_ret = INLINE_SYSCALL_CALL (__VA_ARGS__);                          \
        LIBC_CANCEL_RESET (sc_cancel_oldtype);                               \
      }                                                                      \
    sc_ret;                                                                  \
  })

LIBC_CANCEL_ASYNC calls __{libc,pthread,librt}_enable_asynccancel, which atomically enables asynchronous cancellation mode before the syscall is made. LIBC_CANCEL_RESET, on the other hand, atomically disables asynchronous cancellation mode by calling __{libc,pthread,librt}_disable_asynccancel and acts accordingly if required.
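
The control flow of SYSCALL_CANCEL can be traced with a small standalone sketch. Everything here is a hypothetical stand-in: struct cancel_trace plays the role of SINGLE_THREAD_P plus some counters, the two static helpers stand in for LIBC_CANCEL_ASYNC and LIBC_CANCEL_RESET, and fake_syscall_sketch stands in for INLINE_SYSCALL_CALL. No real cancellation state is touched.

```c
/* Hypothetical stand-ins for glibc internals, used only to trace the
   control flow of SYSCALL_CANCEL.  */
struct cancel_trace
{
  int single_thread;   /* plays the role of SINGLE_THREAD_P */
  int enable_calls;    /* times the LIBC_CANCEL_ASYNC stand-in ran */
  int reset_calls;     /* times the LIBC_CANCEL_RESET stand-in ran */
};

static int
enable_async_sketch (struct cancel_trace *t)
{
  t->enable_calls++;
  return 0;            /* dummy "previous cancellation type" */
}

static void
reset_async_sketch (struct cancel_trace *t, int oldtype)
{
  (void) oldtype;      /* a real implementation would restore this */
  t->reset_calls++;
}

/* Stand-in for the actual system call; always returns 42.  */
long
fake_syscall_sketch (void)
{
  return 42;
}

/* Same shape as the macro: skip the cancellation bookkeeping when the
   process is single-threaded, otherwise bracket the call with it.  */
long
syscall_cancel_sketch (struct cancel_trace *t, long (*do_syscall) (void))
{
  long ret;
  if (t->single_thread)
    ret = do_syscall ();
  else
    {
      int oldtype = enable_async_sketch (t);
      ret = do_syscall ();
      reset_async_sketch (t, oldtype);
    }
  return ret;
}
```

In the single-threaded case neither helper runs; otherwise each runs exactly once around the call.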

Bespoke Syscalls

The British term bespoke means that it was custom made or tailored to the requirements of the buyer. There are some places in glibc where system calls are made that do not use the standard assembly or C code macros.

The best example of this is the fork and vfork implementation, which requires a specific calling convention on Linux depending on the architecture. For instance, for x86_64 (sysdeps/unix/sysv/linux/x86_64/vfork.S):

ENTRY (__vfork)

        /* Pop the return PC value into RDI.  We need a register that
           is preserved by the syscall and that we're allowed to destroy. */
        popq    %rdi
        cfi_adjust_cfa_offset(-8)
        cfi_register(%rip, %rdi)

        /* Stuff the syscall number in RAX and enter into the kernel.  */
        movl    $SYS_ify (vfork), %eax
        syscall

        /* Push back the return PC.  */
        pushq   %rdi
        cfi_adjust_cfa_offset(8)

        cmpl    $-4095, %eax
        jae SYSCALL_ERROR_LABEL         /* Branch forward if it failed.  */

        /* Normal return.  */
        ret

PSEUDO_END (__vfork)

It uses the sysdep.h macros for the function return (SYSCALL_ERROR_LABEL); however, due to some specific ABI and semantic constraints, it requires a dedicated assembly implementation.

The truth of the matter is that most of the bespoke cases should probably be cleaned up to use macros.

None: SyscallWrappers (last edited 2017-08-24 14:50:58 by b1c27c03)