System Call Wrappers

glibc uses three kinds of wrappers around OS kernel system calls: assembly, macro, and bespoke.

We cover the assembly wrappers first, then the other two.

Assembly syscalls

Simple kernel system calls are generated from a list of names: each entry is turned into an assembly wrapper that is then compiled.

In a build directory, disassemble the object file for write and you will see the wrapper generated from syscall-template.S:

[carlos@koi glibc]$ objdump -ldr io/write.o

io/write.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <__libc_write>:
write():
/home/carlos/src/glibc/io/../sysdeps/unix/syscall-template.S:81
   0:   83 3d 00 00 00 00 00    cmpl   $0x0,0x0(%rip)        # 7 <__libc_write+0x7>
                        2: R_X86_64_PC32        __libc_multiple_threads-0x5
   7:   75 14                   jne    1d <__write_nocancel+0x14>

0000000000000009 <__write_nocancel>:
__write_nocancel():
   9:   b8 01 00 00 00          mov    $0x1,%eax
   e:   0f 05                   syscall 
  10:   48 3d 01 f0 ff ff       cmp    $0xfffffffffffff001,%rax
  16:   0f 83 00 00 00 00       jae    1c <__write_nocancel+0x13>
                        18: R_X86_64_PC32       __syscall_error-0x4
  1c:   c3                      retq   
  1d:   48 83 ec 08             sub    $0x8,%rsp
  21:   e8 00 00 00 00          callq  26 <__write_nocancel+0x1d>
                        22: R_X86_64_PC32       __libc_enable_asynccancel-0x4
  26:   48 89 04 24             mov    %rax,(%rsp)
  2a:   b8 01 00 00 00          mov    $0x1,%eax
  2f:   0f 05                   syscall 
  31:   48 8b 3c 24             mov    (%rsp),%rdi
  35:   48 89 c2                mov    %rax,%rdx
  38:   e8 00 00 00 00          callq  3d <__write_nocancel+0x34>
                        39: R_X86_64_PC32       __libc_disable_asynccancel-0x4
  3d:   48 89 d0                mov    %rdx,%rax
  40:   48 83 c4 08             add    $0x8,%rsp
  44:   48 3d 01 f0 ff ff       cmp    $0xfffffffffffff001,%rax
  4a:   0f 83 00 00 00 00       jae    50 <__write_nocancel+0x47>
                        4c: R_X86_64_PC32       __syscall_error-0x4
/home/carlos/src/glibc/io/../sysdeps/unix/syscall-template.S:82
  50:   c3                      retq   
[carlos@koi glibc]$ 
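The tail of that wrapper (cmp $0xfffffffffffff001,%rax followed by jae __syscall_error) encodes the Linux error convention: the kernel returns -errno, so a result in the range [-4095, -1] means failure. A minimal C sketch of the same check, assuming a Linux target; decode_syscall_result is a hypothetical name, and its errno store stands in for roughly what __syscall_error does:

```c
#include <assert.h>
#include <errno.h>

/* On LP64 targets such as x86_64, -4095 is exactly the immediate
   0xfffffffffffff001 compared against in the wrapper.  */
static_assert (sizeof (long) != 8
               || (unsigned long) -4095L == 0xfffffffffffff001UL,
               "error threshold matches the cmp immediate");

/* Hypothetical helper mirroring the wrapper's error check.  The
   comparison is unsigned, like the jae branch, so only the top 4095
   values count as errors.  */
static long
decode_syscall_result (long raw)
{
  if ((unsigned long) raw >= (unsigned long) -4095L)
    {
      /* Roughly what __syscall_error does: store the positive error
         code in errno and return -1.  */
      errno = (int) -raw;
      return -1;
    }
  return raw;  /* success: pass the kernel's value through unchanged */
}
```

Because the comparison is unsigned, a result that is negative when read as signed but lies below -4095 (for example, a high address returned by mmap on a 32-bit target) still counts as success.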

The syscalls that get these generated wrappers are listed in the syscalls.list files:

[carlos@koi glibc]$ find . -name syscalls.list
./ports/sysdeps/unix/sysv/linux/mips/mips32/syscalls.list
./ports/sysdeps/unix/sysv/linux/mips/syscalls.list
./ports/sysdeps/unix/sysv/linux/mips/mips64/n64/syscalls.list
./ports/sysdeps/unix/sysv/linux/mips/mips64/syscalls.list
./ports/sysdeps/unix/sysv/linux/mips/mips64/n32/syscalls.list
./ports/sysdeps/unix/sysv/linux/m68k/m680x0/syscalls.list
./ports/sysdeps/unix/sysv/linux/m68k/syscalls.list
./ports/sysdeps/unix/sysv/linux/ia64/syscalls.list
./ports/sysdeps/unix/sysv/linux/alpha/syscalls.list
./ports/sysdeps/unix/sysv/linux/arm/syscalls.list
./ports/sysdeps/unix/sysv/linux/generic/wordsize-32/syscalls.list
./ports/sysdeps/unix/sysv/linux/generic/syscalls.list
./ports/sysdeps/unix/sysv/linux/hppa/syscalls.list
./sysdeps/unix/sysv/linux/sh/syscalls.list
./sysdeps/unix/sysv/linux/wordsize-64/syscalls.list
./sysdeps/unix/sysv/linux/i386/syscalls.list
./sysdeps/unix/sysv/linux/syscalls.list
./sysdeps/unix/sysv/linux/powerpc/powerpc64/syscalls.list
./sysdeps/unix/sysv/linux/powerpc/syscalls.list
./sysdeps/unix/sysv/linux/powerpc/powerpc32/syscalls.list
./sysdeps/unix/sysv/linux/x86_64/syscalls.list
./sysdeps/unix/sysv/linux/x86_64/x32/syscalls.list
./sysdeps/unix/sysv/linux/s390/s390-64/syscalls.list
./sysdeps/unix/sysv/linux/s390/s390-32/syscalls.list
./sysdeps/unix/sysv/linux/sparc/sparc64/syscalls.list
./sysdeps/unix/sysv/linux/sparc/syscalls.list
./sysdeps/unix/sysv/linux/sparc/sparc32/syscalls.list
./sysdeps/unix/syscalls.list
./sysdeps/unix/bsd/bsd4.4/syscalls.list
./sysdeps/unix/bsd/syscalls.list
[carlos@koi glibc]$ 

The sysdeps directory ordering determines which syscalls.list files apply. For example, on x86_64 the following apply, most specific first:

./sysdeps/unix/sysv/linux/x86_64/syscalls.list
./sysdeps/unix/sysv/linux/wordsize-64/syscalls.list
./sysdeps/unix/sysv/linux/syscalls.list
./sysdeps/unix/syscalls.list
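When several of these files list the same file name, only one rule should produce the wrapper. A rough C model of that precedence, assuming (as we read make-syscalls.sh) that the rule from the first directory in priority order wins; struct list_entry and owning_dir are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One rule from a syscalls.list file: the directory it came from and
   the object file name in its first column.  */
struct list_entry
{
  const char *dir;
  const char *file;
};

/* ENTRIES holds the rules of all applicable syscalls.list files,
   concatenated in sysdeps priority order (most specific directory
   first).  The first rule for FILE wins, so a machine directory
   shadows the generic ones.  Returns the winning directory, or NULL
   if no file lists FILE.  */
static const char *
owning_dir (const struct list_entry *entries, size_t n, const char *file)
{
  assert (file != NULL && (entries != NULL || n == 0));
  for (size_t i = 0; i < n; i++)
    if (strcmp (entries[i].file, file) == 0)
      return entries[i].dir;
  return NULL;
}
```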

The Makefile rules that process the syscall wrappers are in sysdeps/unix/Makefile, for example:

...
ifndef avoid-generated
$(common-objpfx)sysd-syscalls: $(..)sysdeps/unix/make-syscalls.sh \
                               $(wildcard $(+sysdep_dirs:%=%/syscalls.list))
        for dir in $(+sysdep_dirs); do \
          test -f $$dir/syscalls.list && \
          { sysdirs='$(sysdirs)' \
            asm_CPP='$(COMPILE.S) -E -x assembler-with-cpp' \
            $(SHELL) $(dir $<)$(notdir $<) $$dir || exit 1; }; \
          test $$dir = $(..)sysdeps/unix && break; \
        done > $@T
        mv -f $@T $@
endif
...
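The recipe walks the sysdeps directories in order and stops once it has handled sysdeps/unix, so directories after it in the search path (such as sysdeps/posix) are never consulted. A C sketch of that control flow, with the filesystem test replaced by a caller-supplied array and the $(..) prefix dropped; dirs_to_process is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Model of the shell loop in the sysd-syscalls rule.  DIRS is the
   sysdeps search path in priority order and HAS_LIST[i] says whether
   DIRS[i] contains a syscalls.list.  Each directory that would be
   handed to make-syscalls.sh is copied to OUT (room for NDIRS
   entries).  Returns the number of directories copied.  */
static size_t
dirs_to_process (const char *const *dirs, const bool *has_list,
                 size_t ndirs, const char **out)
{
  assert (ndirs == 0 || (dirs != NULL && has_list != NULL && out != NULL));
  size_t count = 0;
  for (size_t i = 0; i < ndirs; i++)
    {
      if (has_list[i])               /* test -f $$dir/syscalls.list */
        out[count++] = dirs[i];
      if (strcmp (dirs[i], "sysdeps/unix") == 0)
        break;                       /* test $$dir = sysdeps/unix && break */
    }
  return count;
}
```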

The syscalls.list files are processed by the script sysdeps/unix/make-syscalls.sh, whose comments describe the syscalls.list format.

For each entry, the script uses the template syscall-template.S to generate an assembly file that builds the wrapper from machine-specific macros. A machine can override syscall-template.S with its own copy, since the template is also selected by the sysdeps directory ordering.

Lastly, the machine-specific macros are provided by the sysdep.h header files:

[carlos@koi glibc]$ find . -name sysdep.h
./ports/sysdeps/m68k/coldfire/sysdep.h
./ports/sysdeps/m68k/m680x0/sysdep.h
./ports/sysdeps/m68k/sysdep.h
./ports/sysdeps/ia64/sysdep.h
./ports/sysdeps/am33/sysdep.h
./ports/sysdeps/arm/sysdep.h
./ports/sysdeps/aarch64/sysdep.h
./ports/sysdeps/unix/mips/mips32/sysdep.h
./ports/sysdeps/unix/mips/mips64/n64/sysdep.h
./ports/sysdeps/unix/mips/mips64/n32/sysdep.h
./ports/sysdeps/unix/mips/sysdep.h
./ports/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
./ports/sysdeps/unix/sysv/linux/mips/mips64/n64/sysdep.h
./ports/sysdeps/unix/sysv/linux/mips/mips64/n32/sysdep.h
./ports/sysdeps/unix/sysv/linux/m68k/coldfire/sysdep.h
./ports/sysdeps/unix/sysv/linux/m68k/m680x0/sysdep.h
./ports/sysdeps/unix/sysv/linux/m68k/sysdep.h
./ports/sysdeps/unix/sysv/linux/ia64/sysdep.h
./ports/sysdeps/unix/sysv/linux/am33/sysdep.h
./ports/sysdeps/unix/sysv/linux/alpha/sysdep.h
./ports/sysdeps/unix/sysv/linux/arm/sysdep.h
./ports/sysdeps/unix/sysv/linux/aarch64/sysdep.h
./ports/sysdeps/unix/sysv/linux/generic/sysdep.h
./ports/sysdeps/unix/sysv/linux/hppa/sysdep.h
./ports/sysdeps/unix/sysv/linux/tile/sysdep.h
./ports/sysdeps/unix/am33/sysdep.h
./ports/sysdeps/unix/alpha/sysdep.h
./ports/sysdeps/unix/arm/sysdep.h
./ports/sysdeps/hppa/sysdep.h
./ports/sysdeps/tile/sysdep.h
./sysdeps/sh/sysdep.h
./sysdeps/i386/sysdep.h
./sysdeps/generic/sysdep.h
./sysdeps/unix/sh/sysdep.h
./sysdeps/unix/i386/sysdep.h
./sysdeps/unix/sysv/linux/sh/sh4/sysdep.h
./sysdeps/unix/sysv/linux/sh/sysdep.h
./sysdeps/unix/sysv/linux/i386/sysdep.h
./sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h
./sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep.h
./sysdeps/unix/sysv/linux/x86_64/x32/sysdep.h
./sysdeps/unix/sysv/linux/x86_64/sysdep.h
./sysdeps/unix/sysv/linux/s390/s390-64/sysdep.h
./sysdeps/unix/sysv/linux/s390/s390-32/sysdep.h
./sysdeps/unix/sysv/linux/sparc/sparc64/sysdep.h
./sysdeps/unix/sysv/linux/sparc/sparc32/sysdep.h
./sysdeps/unix/sysv/linux/sparc/sysdep.h
./sysdeps/unix/powerpc/sysdep.h
./sysdeps/unix/x86_64/sysdep.h
./sysdeps/unix/sysdep.h
./sysdeps/mach/i386/sysdep.h
./sysdeps/mach/sysdep.h
./sysdeps/powerpc/powerpc64/sysdep.h
./sysdeps/powerpc/powerpc32/sysdep.h
./sysdeps/powerpc/sysdep.h
./sysdeps/x86_64/x32/sysdep.h
./sysdeps/x86_64/sysdep.h
./sysdeps/s390/s390-64/sysdep.h
./sysdeps/s390/s390-32/sysdep.h
./sysdeps/sparc/sysdep.h
[carlos@koi glibc]$ 

The sysdep.h macros are augmented by the sysdep-cancel.h header, which is used when the syscall must support thread cancellation:

[carlos@koi glibc]$ find . -name sysdep-cancel.h
./ports/sysdeps/unix/sysv/linux/mips/mips64/nptl/sysdep-cancel.h
./ports/sysdeps/unix/sysv/linux/mips/nptl/sysdep-cancel.h
./ports/sysdeps/unix/sysv/linux/m68k/nptl/sysdep-cancel.h
./ports/sysdeps/unix/sysv/linux/ia64/nptl/sysdep-cancel.h
./ports/sysdeps/unix/sysv/linux/am33/linuxthreads/sysdep-cancel.h
./ports/sysdeps/unix/sysv/linux/alpha/nptl/sysdep-cancel.h
./ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
./ports/sysdeps/unix/sysv/linux/aarch64/nptl/sysdep-cancel.h
./ports/sysdeps/unix/sysv/linux/hppa/nptl/sysdep-cancel.h
./ports/sysdeps/unix/sysv/linux/tile/nptl/sysdep-cancel.h
./nptl/sysdeps/unix/sysv/linux/sh/sysdep-cancel.h
./nptl/sysdeps/unix/sysv/linux/i386/sysdep-cancel.h
./nptl/sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep-cancel.h
./nptl/sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep-cancel.h
./nptl/sysdeps/unix/sysv/linux/x86_64/sysdep-cancel.h
./nptl/sysdeps/unix/sysv/linux/s390/s390-64/sysdep-cancel.h
./nptl/sysdeps/unix/sysv/linux/s390/s390-32/sysdep-cancel.h
./nptl/sysdeps/unix/sysv/linux/sparc/sparc64/sysdep-cancel.h
./nptl/sysdeps/unix/sysv/linux/sparc/sparc32/sysdep-cancel.h
./sysdeps/generic/sysdep-cancel.h
[carlos@koi glibc]$ 

Together all of these pieces produce a wrapper compiled like this:

(echo '#define SYSCALL_NAME write'; \
 echo '#define SYSCALL_NARGS 3'; \
 echo '#define SYSCALL_SYMBOL __libc_write'; \
 echo '#define SYSCALL_CANCELLABLE 1'; \
 echo '#include <syscall-template.S>'; \
 echo 'weak_alias (__libc_write, __write)'; \
 echo 'libc_hidden_weak (__write)'; \
 echo 'weak_alias (__libc_write, write)'; \
 echo 'libc_hidden_weak (write)'; \
) | gcc -c  -g -O2  -I../include -I/home/carlos/build/glibc/io  -I/home/carlos/build/glibc  -I../sysdeps/unix/sysv/linux/x86_64/64/nptl  -I../sysdeps/unix/sysv/linux/x86_64/64  -I../nptl/sysdeps/unix/sysv/linux/x86_64  -I../nptl/sysdeps/unix/sysv/linux/x86  -I../sysdeps/unix/sysv/linux/x86  -I../sysdeps/unix/sysv/linux/x86_64  -I../sysdeps/unix/sysv/linux/wordsize-64  -I../nptl/sysdeps/unix/sysv/linux  -I../nptl/sysdeps/pthread  -I../sysdeps/pthread  -I../ports/sysdeps/unix/sysv/linux  -I../sysdeps/unix/sysv/linux  -I../sysdeps/gnu  -I../sysdeps/unix/inet  -I../nptl/sysdeps/unix/sysv  -I../ports/sysdeps/unix/sysv  -I../sysdeps/unix/sysv  -I../sysdeps/unix/x86_64  -I../nptl/sysdeps/unix  -I../ports/sysdeps/unix  -I../sysdeps/unix  -I../sysdeps/posix  -I../nptl/sysdeps/x86_64/64  -I../sysdeps/x86_64/64  -I../sysdeps/x86_64/fpu/multiarch  -I../sysdeps/x86_64/fpu  -I../sysdeps/x86/fpu  -I../sysdeps/x86_64/multiarch  -I../nptl/sysdeps/x86_64  -I../sysdeps/x86_64  -I../sysdeps/x86  -I../sysdeps/ieee754/ldbl-96  -I../sysdeps/ieee754/dbl-64/wordsize-64  -I../sysdeps/ieee754/dbl-64  -I../sysdeps/ieee754/flt-32  -I../sysdeps/wordsize-64  -I../sysdeps/ieee754  -I../sysdeps/generic  -I../nptl  -I../ports  -I.. -I../libio -I. -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.7.2/include -isystem /home/carlos/install-linux/include  -D_LIBC_REENTRANT -include ../include/libc-symbols.h       -DASSEMBLER  -g -Wa,--noexecstack   -o /home/carlos/build/glibc/io/write.o -x assembler-with-cpp - -MD -MP -MF /home/carlos/build/glibc/io/write.o.dt -MT /home/carlos/build/glibc/io/write.o

Note the use of -x assembler-with-cpp: the input is run through the C preprocessor and then assembled, so the wrapper macros must expand to assembly only, never to C code.
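To see how a syscalls.list line becomes the #define preamble piped into gcc in the write example, here is a minimal C sketch of that translation. It assumes the column layout described in the make-syscalls.sh comments (file name, caller, syscall name, argument signature, strong name, weak names), that a leading C in the signature marks a cancellable call, and that the letters after the colon give one character per argument; emit_preamble is a hypothetical name and the real script handles many more cases.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 16

/* Print to OUT the preamble generated for one syscalls.list line.
   ENTRY must be writable (strtok splits it in place).  Simplified:
   numeric signatures, libc_hidden_weak lines and versioned aliases are
   not handled.  Returns the argument count, or -1 if the entry cannot
   be parsed.  */
static int
emit_preamble (char *entry, FILE *out)
{
  assert (entry != NULL && out != NULL);

  char *field[MAX_FIELDS];
  int nfields = 0;
  for (char *tok = strtok (entry, " \t\n");
       tok != NULL && nfields < MAX_FIELDS; tok = strtok (NULL, " \t\n"))
    field[nfields++] = tok;
  if (nfields < 5)
    return -1;

  const char *args = field[3];
  int cancellable = args[0] == 'C';   /* leading C: cancellable call */
  if (cancellable)
    args++;

  /* Signature is "<return type>:<one letter per argument>".  */
  const char *colon = strchr (args, ':');
  if (colon == NULL)
    return -1;
  int nargs = (int) strlen (colon + 1);

  fprintf (out, "#define SYSCALL_NAME %s\n", field[2]);
  fprintf (out, "#define SYSCALL_NARGS %d\n", nargs);
  fprintf (out, "#define SYSCALL_SYMBOL %s\n", field[4]);
  fprintf (out, "#define SYSCALL_CANCELLABLE %d\n", cancellable);
  fputs ("#include <syscall-template.S>\n", out);
  for (int i = 5; i < nfields; i++)   /* every weak name gets an alias */
    fprintf (out, "weak_alias (%s, %s)\n", field[4], field[i]);
  return nargs;
}
```

Given a writable copy of an entry shaped like `write - write Ci:ibn __libc_write __write write` (our assumption of what the write line looks like), it prints SYSCALL_NARGS 3, SYSCALL_CANCELLABLE 1 and two weak_alias lines, matching the first part of the write command.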

Macro Syscalls

Macro syscalls are implemented in *.c files and are used when a system call needs more than a simple generated wrapper.

Some system calls require shuffling the kernel's result into a userspace structure, so glibc needs a way to make inline system calls from C code.

This is handled by macros defined in the sysdep.h files.

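The macros are named INLINE_* and INTERNAL_*, and come in several variants for use by the source code. INLINE_SYSCALL sets errno and returns -1 on failure, while INTERNAL_SYSCALL returns the raw kernel result and leaves error checking to the caller (through INTERNAL_SYSCALL_ERROR_P and INTERNAL_SYSCALL_ERRNO).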
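Because the INTERNAL_* form hands back the raw kernel value without touching errno, the caller can react to a specific error before anything becomes visible to the application. A hedged sketch of one such pattern, falling back to an older call when a newer one fails with ENOSYS; raw_call, raw_is_error, call_with_fallback and the stub functions are hypothetical names, not glibc internals:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a raw kernel call: returns a value >= 0 on
   success or -errno on failure, like an INTERNAL_SYSCALL result.  */
typedef long (*raw_call) (void);

/* The same unsigned range check as the cmp/jae in the x86_64 wrapper:
   the last 4095 values are errors.  */
static int
raw_is_error (long raw)
{
  return (unsigned long) raw >= (unsigned long) -4095L;
}

/* INTERNAL-style inspection first: an ENOSYS from NEWER is handled
   silently by trying OLDER.  Only the final outcome is reported in the
   INLINE style, as -1 with errno set.  */
static long
call_with_fallback (raw_call newer, raw_call older)
{
  assert (newer != NULL && older != NULL);
  long raw = newer ();
  if (raw_is_error (raw) && -raw == ENOSYS)
    raw = older ();
  if (raw_is_error (raw))
    {
      errno = (int) -raw;
      return -1;
    }
  return raw;
}

/* Demonstration stubs standing in for real system calls.  */
static long missing_call (void) { return -ENOSYS; }
static long working_call (void) { return 3; }
static long failing_call (void) { return -EBADF; }
```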

For example, the wait function implementation uses INLINE_SYSCALL:

/* Wait for a child to die.  When one does, put its status in *STAT_LOC
   and return its process ID.  For errors, return (pid_t) -1.  */
pid_t
__libc_wait (__WAIT_STATUS_DEFN stat_loc)
{
  if (SINGLE_THREAD_P)
    return INLINE_SYSCALL (wait4, 4, WAIT_ANY, stat_loc, 0,
                           (struct rusage *) NULL);

  int oldtype = LIBC_CANCEL_ASYNC ();

  pid_t result = INLINE_SYSCALL (wait4, 4, WAIT_ANY, stat_loc, 0,
                                 (struct rusage *) NULL);

  LIBC_CANCEL_RESET (oldtype);

  return result;
}

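__libc_wait contains two inline syscalls. When the process is single-threaded (SINGLE_THREAD_P), it calls wait4 directly; otherwise it enables asynchronous cancellation with LIBC_CANCEL_ASYNC around the call and restores the previous state with LIBC_CANCEL_RESET.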
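Outside glibc the internal macros are not available, but the public syscall(2) function follows the same return convention as INLINE_SYSCALL: -1 with errno set on failure. A sketch of the single-threaded branch of __libc_wait written that way, assuming a Linux target whose kernel provides wait4 (x86_64 does); wait_via_syscall is a hypothetical name, and the LIBC_CANCEL_ASYNC / LIBC_CANCEL_RESET handling is deliberately left out:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical user-space analogue of wait4 (WAIT_ANY, stat_loc, 0,
   NULL) issued through syscall(2).  On failure it returns -1 and sets
   errno (for example ECHILD when there is nothing to wait for).
   Scalars are passed as long because syscall(2) forwards
   register-sized arguments.  STAT_LOC may be NULL, as with wait.  */
static pid_t
wait_via_syscall (int *stat_loc)
{
  return (pid_t) syscall (SYS_wait4, (long) -1 /* WAIT_ANY */, stat_loc,
                          (long) 0, (struct rusage *) NULL);
}
```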

Bespoke Syscalls

The British term bespoke means custom-made, tailored to the requirements of the buyer. In a few places glibc makes system calls without using either the standard assembly wrappers or the C macros.

The best example is the x86 lowlevellock.h, which is used to implement POSIX thread support.

For example, from nptl/sysdeps/unix/sysv/linux/i386/lowlevellock.h:

#define lll_futex_timed_wait(futex, val, timeout, private) \
  ({                                                                          \
    int __status;                                                             \
    register __typeof (val) _val asm ("edx") = (val);                         \
    __asm __volatile (LLL_EBX_LOAD                                            \
                      LLL_ENTER_KERNEL                                        \
                      LLL_EBX_LOAD                                            \
                      : "=a" (__status)                                       \
                      : "0" (SYS_futex), LLL_EBX_REG (futex), "S" (timeout),  \
                        "c" (__lll_private_flag (FUTEX_WAIT, private)),       \
                        "d" (_val), "i" (offsetof (tcbhead_t, sysinfo))       \
                      : "memory");                                            \
    __status;                                                                 \
  })

The macro invokes the kernel futex syscall directly with inline assembly, without using INLINE_SYSCALL or INTERNAL_SYSCALL.

These bespoke cases should probably all be cleaned up to use the macros.
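Code outside glibc has neither the macros nor LLL_ENTER_KERNEL, but the same futex wait can be written without inline assembly through syscall(2). A minimal sketch, assuming 64-bit Linux (where struct timespec matches what SYS_futex expects) and always using the process-private flag; futex_timed_wait is a hypothetical name:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Sleep while *FUTEX still equals VAL, for at most the relative
   TIMEOUT (NULL means no limit).  Returns 0 after a wake-up, which may
   be spurious, or -1 with errno set: EAGAIN if *FUTEX differed from VAL
   on entry, ETIMEDOUT if the timeout expired, EINTR on a signal.  */
static int
futex_timed_wait (int *futex, int val, const struct timespec *timeout)
{
  assert (futex != NULL);
  return (int) syscall (SYS_futex, futex,
                        (long) (FUTEX_WAIT | FUTEX_PRIVATE_FLAG),
                        (long) val, timeout, NULL, (long) 0);
}
```

Unlike lll_futex_timed_wait, which returns the raw kernel status from %eax, this sketch reports errors through errno.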

SyscallWrappers (last edited 2013-04-02 17:12:22 by CarlosODonell)