Glibc Coding Style and Conventions

This document is not meant to be a reiteration of the GNU coding style document. It clarifies where the glibc policy differs from or expands upon the GNU standard. See NewPorts for additional information on what is expected for architecture-specific code.

This is a work in progress and not yet definitive.


1. Code Formatting

1.1. Symbols and Parentheses

When invoking functions make sure there is a space between the symbol and the parenthesis, e.g.,

retval = foo (bar);

This applies to the defined preprocessor operator as well (although you can also omit the parentheses completely in that case):

#if defined (__foo__) || defined (__FOO__)
# define FOO 1
#endif

This includes any macros that are used as functions too:

size_t align = ALIGN_UP (size, 4096);
hidden_def (some_symbol);

It also applies to keywords such as sizeof and __attribute__.
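
As a small illustration of the last point, the sketch below applies the same spacing to sizeof and __attribute__. The struct and function names are made up for this example and do not come from glibc.

```c
#include <stddef.h>

/* Hypothetical type, used only to illustrate the spacing.  */
struct example_record
{
  int id;
  char name[16];
};

/* A space separates __attribute__ from its double parentheses.  */
static size_t record_size (void) __attribute__ ((__unused__));

static size_t
record_size (void)
{
  /* A space also separates sizeof from the opening parenthesis.  */
  return sizeof (struct example_record);
}
```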

1.1.1. Exceptions

This rule does not apply to preprocessor function definitions, where a space between the symbol and the parenthesis would cause improper expansion. For example, the following is correct:

#define BAT(x) \
({               \
  bat = FOO (x); \
})

Whereas, the following is incorrect:

#define BAT (x) \
({               \
  bat = FOO (x); \
})

When using macros that are not "function-like" (e.g. they are used to expand symbol names), the space should be omitted:

ElfW(Ehdr) ehdr;

if (GLRO(dl_fpu_control) != 0)
  return;

/* NB: Pay particular attention here!  */
GLRO(dl_debug_printf) ("some debug data: %u\n", i);

/* Same rules apply to assembly code.  */
call HIDDEN_JUMPTARGET(__fortify_fail)
b PLTJMP(HIDDEN_JUMPTARGET(__sigsetjmp))

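Pay particular attention to the spacing in the GLRO(dl_debug_printf) example above: there is no space inside the name-expanding macro, but there is a space before the argument list of the function call it produces.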

1.2. Multi-line function-like macros

First, do you need to use a macro at all? See section 6, Macros vs. Static Inlines, below.

Macros that span multiple lines should take one of two forms. The first form is used if the macro should act like a single statement:

#define FUNC(x)       \
  do                  \
    {                 \
      ..............; \
    }                 \
  while (0)

Here the do {} while (0) construct is formatted entirely as the GNU coding standard dictates.

The second form is used if the macro contains multiple statements but should act like an expression with a non-void value. If used in an installed header, it is necessary to use the (__extension__ ({ ... })) form.

#define FUNC(x)     \
  ({                \
    ..............; \
  })

The macro name and arguments are followed by tabs or spaces (as on subsequent lines) and then the line continuation character \. The next line begins with two spaces followed by ({, and subsequent lines indent two more characters and follow normal GNU conventions after that. There are two options for positioning the \ characters: either take the longest line, add a space, and line up every \ at that column using tabs then spaces, or place the \ as the 79th character if the lines are already close to that length.

Both constructs are used extensively in glibc with some variance on exactly how they are written, but we should standardize on the above two forms. There are some cases where they cannot be used, particularly if the function-like macro ends up being used for a constant value e.g. array size, and in those cases you cannot use the above constructs.
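
The two forms can be sketched with concrete bodies as follows. The macro and function names are hypothetical, chosen only for this example, and the second form relies on the GNU C statement-expression extension.

```c
/* First form: the macro acts like a single statement.  */
#define SWAP_INT(a, b) \
  do                   \
    {                  \
      int tmp_ = (a);  \
      (a) = (b);       \
      (b) = tmp_;      \
    }                  \
  while (0)

/* Second form: the macro acts like an expression with a value.  */
#define MAX_INT(a, b)  \
  ({                   \
    int a_ = (a);      \
    int b_ = (b);      \
    a_ > b_ ? a_ : b_; \
  })

/* Hypothetical caller showing both forms in use.  */
static int
largest_after_swap (int x, int y)
{
  if (x < y)
    SWAP_INT (x, y);   /* Works in an unbraced if because of do/while.  */
  else
    x = MAX_INT (x, y);
  return x;
}
```

Because the first form ends in while (0) without a semicolon, the caller supplies the semicolon and the macro behaves like an ordinary statement, even directly before an else.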


1.3. 79-Column Lines

All source files in glibc must use lines of fewer than 80 characters. The only exceptions are when it's syntactically impossible to split a line for some reason.
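
As a rough illustration, a declaration and a condition that would otherwise run past the limit can be wrapped like this. The function is hypothetical; the break before the && operator follows the GNU coding standards' advice to split expressions before an operator.

```c
/* Hypothetical helper, named only for this example.  */
static int
ranges_overlap (unsigned long int first_start, unsigned long int first_len,
                unsigned long int second_start, unsigned long int second_len)
{
  /* The condition is split before the && operator so that no line
     reaches 80 columns.  */
  return (first_start < second_start + second_len
          && second_start < first_start + first_len);
}
```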


1.4. Nested C Preprocessor Directives

Nested preprocessor directives need spaces after the '#'.

Example 1: One level of nesting

#if __FP_FAST_FMA
# define FP_FAST_FMA 1
#endif

#if __FP_FAST_FMAF
# define FP_FAST_FMAF 1
#endif

#if __FP_FAST_FMAL
# define FP_FAST_FMAL 1
#endif

Reference: http://sourceware.org/ml/libc-alpha/2010-10/msg00024.html

Example 2: Several levels of nesting

#ifdef HAVE_ASM_GLOBAL_DOT_NAME
# ifndef C_SYMBOL_DOT_NAME
#  if defined __GNUC__ && defined __GNUC_MINOR__ \
      && (__GNUC__ << 16) + __GNUC_MINOR__ >= (3 << 16) + 1
#   define C_SYMBOL_DOT_NAME(name) .name
#  else
#   define C_SYMBOL_DOT_NAME(name) .##name
#  endif
# endif
#endif

Note that in a header file, the outer #ifndef _FILE_H/#endif pair does not increase the indentation level.

Example 3: Outer #ifndef

#ifndef _FILE_H
#if FOO
# define BAR
#endif
#endif

1.5. Commenting #endif

In order to make it easier to determine which conditional is being ended by the #endif it is common to add a code comment to the preprocessor directive.

Example 1: #endif with no #else.

#ifdef TEST
...
#endif /* TEST */

The next example contains an #else which means the second half of the conditional is used only if the negation of the conditional is true. The code comment on the #else and the closing #endif indicates this by negating the conditional.

Example 2: #endif with #else.

#ifdef TEST
...
#else /* !TEST */
...
#endif /* !TEST */

The use of the conditional or negated conditional in the comment is the most common style in glibc. This style helps the developer immediately determine what case is running in that branch of the condition.

In cases where the conditional is used to guard against inclusion of the entire file you might see two styles. The first and most common style puts the name of the file in the comment, which indicates that it is a file inclusion guard and which file is referenced. The second and less common style is simply to list the guard name used, following the convention above.

Example 3: #endif for file inclusion guard with file name comment

#ifndef _ARGP_FMTSTREAM_H
#define _ARGP_FMTSTREAM_H
...
#endif /* __OPTIMIZE__ */

#endif /* ARGP_FMTSTREAM_USE_LINEWRAP */

#endif /* argp-fmtstream.h */

Example 4: #endif for file inclusion guard with conditional comment

#ifndef dl_machine_h
#define dl_machine_h
...
#endif /* !dl_machine_h */

Either option is acceptable.

1.6. Implicit int

Use unsigned int, long int, etc., instead of just unsigned, long, etc. https://sourceware.org/ml/libc-alpha/2012-05/msg01455.html
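
A minimal sketch of the rule, using a hypothetical function:

```c
/* Hypothetical function, named only for this example.  */
static unsigned long int
sum_bytes (const unsigned char *buf, unsigned int len)
{
  /* Written as "unsigned long int" and "unsigned int", never as the
     bare "unsigned long" or "unsigned".  */
  unsigned long int total = 0;
  for (unsigned int i = 0; i < len; ++i)
    total += buf[i];
  return total;
}
```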

1.7. Files not formatted according to the GNU standard

Some files (e.g. malloc/arena.c) were imported from a different project or source and have kept a different, internally consistent coding style since they first entered glibc. The rule for such files is to stick to the code formatting convention in that file.

Reference: http://sourceware.org/ml/libc-alpha/2012-08/msg00182.html

2. Python usage conventions

Some supporting scripts are implemented in Python. The rule for these files is to use the PEP 8 -- Style Guide for Python with the following additional information specific to the glibc Python sources:


3. Use of GCC Compiler Attributes

3.1. inline

We should eschew the inline keyword entirely when used alone. If it really matters that something be inlined, it needs always_inline. Otherwise we should leave optimization decisions to the compiler unless there is a particularly strong reason in an individual case. Any such cases should have clear comments saying why the explicit inline is desirable.
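
The sketch below shows what this might look like in practice. The functions are hypothetical, the justification in the comment is invented for the example, and the code assumes a 32-bit unsigned int.

```c
/* Hypothetical helper.  The explicit always_inline is assumed to be
   justified because the function sits on a hot path where the call
   overhead was measured to matter; real code should state its own
   reason in a comment like this one.  */
static inline __attribute__ ((__always_inline__)) unsigned int
rotate_left (unsigned int value, unsigned int count)
{
  count &= 31;
  return count == 0 ? value : (value << count) | (value >> (32 - count));
}

/* No inline keyword: the compiler decides whether to inline this.  */
static unsigned int
mix (unsigned int a, unsigned int b)
{
  return rotate_left (a, 5) ^ b;
}
```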

3.2. __unused__

Use __attribute__ ((__unused__)) with static inline
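
The source does not spell out the rationale. One plausible reading, sketched here with a hypothetical helper, is that a static inline function defined in a header is not called by every file that includes it, and the attribute keeps compilers from warning about that.

```c
/* Hypothetical header-style helper.  Some includers never call it, so
   it is marked __unused__ (an assumed motivation, see above).  */
static inline __attribute__ ((__unused__)) int
clamp_to_byte (int value)
{
  if (value < 0)
    return 0;
  if (value > 255)
    return 255;
  return value;
}
```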


4. Creating files

4.1. Proper sysdeps Location

4.1.1. Default ENOTSUP Implementation Location

4.1.2. OS Specific Implementation

sysdeps/unix/sysv/linux/<foo>.[ch]

4.1.3. OS and Platform Specific Implementation

sysdeps/unix/sysv/linux/powerpc/<foo>.[ch]

4.1.4. Wordsize Specific Implementation

sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/<foo>.[ch]

4.1.5. Platform Specific Implementation

sysdeps/powerpc/<foo>.[ch]

4.1.6. Floating-Point Unit Implementation

sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/<foo>.[ch]
sysdeps/powerpc/powerpc32/fpu/<foo>.[ch]


5. Reusing Existing Code

When possible pick up existing code via #include directives rather than copying code. For whole files, this may also be done automatically via Implies files.

We strive to reduce the number of duplicate copies of code, for example by consolidating all copies of an architecture-specific sysdeps/unix/sysv/linux/<arch> header into an architecture-independent one plus a set of small architecture-specific ones for the architecture-specific bits.


6. Macros vs. Static Inlines

Static inline functions are preferred over macros, when possible, because the compiler can more adequately schedule static inlines.


7. Header Files

bits/<foo>.h is not a place for an API, just for OS-specific definitions.


8. Alloca vs. Malloc

Here are some things to consider when deciding whether to use alloca or malloc:

    bool use_alloca = __libc_use_alloca (bufsize);
    struct foo *buf = use_alloca ? alloca (bufsize) : malloc (bufsize);
    if (buf)
      do_work_with (buf, bufsize);
    if (! use_alloca)
      free (buf);

    struct foo buffer[4000 / sizeof (struct foo)];
    struct foo *buf = bufsize <= sizeof buffer ? buffer : malloc (bufsize);
    if (buf)
      do_work_with (buf, bufsize);
    if (buf != buffer)
      free (buf);

    struct foo buffer[10];
    struct foo *buf = buffer;
    size_t bufsize = sizeof buffer;
    void *allocated = NULL;
    size_t needed;
    while (bufsize < (needed = do_work_with (buf, bufsize)))
      {
        if (__libc_use_alloca (needed))
          {
            size_t size = bufsize;
            void *newbuf = extend_alloca (buf, bufsize, needed);
            buf = memmove (newbuf, buf, size);
          }
        else
          {
            void *newbuf = realloc (allocated, needed);
            if (! newbuf)
              {
                needed = 0;
                break;
              }
            if (! allocated)
              memcpy (newbuf, buf, bufsize);
            buf = allocated = newbuf;
            bufsize = needed;
          }
      }
    free (allocated);
    return needed; /* This is zero on allocation failure.  */

At present there is no magic bullet or special procedure for selecting alloca vs. malloc; if there were, we could encode it into this wiki or into a macro.


9. Branch Prediction

glibc has the __glibc_likely and __glibc_unlikely macros that wrap around __builtin_expect. Use those instead of using __builtin_expect for branch prediction since they're nicer to read.
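
A minimal sketch of how the macros read at a call site. The fallback definitions are there only so that the example builds outside the glibc tree and are assumed to mirror the real ones; the function itself is hypothetical.

```c
#include <stddef.h>

/* Fallback definitions for building outside glibc (assumed to match
   the glibc macros).  */
#ifndef __glibc_likely
# define __glibc_likely(cond) __builtin_expect ((cond), 1)
#endif
#ifndef __glibc_unlikely
# define __glibc_unlikely(cond) __builtin_expect ((cond), 0)
#endif

/* Hypothetical function, named only for this example.  */
static size_t
count_nonzero (const int *values, size_t n)
{
  /* An empty or missing input is expected to be rare.  */
  if (__glibc_unlikely (values == NULL || n == 0))
    return 0;

  size_t count = 0;
  for (size_t i = 0; i < n; ++i)
    if (values[i] != 0)
      ++count;
  return count;
}
```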

10. Error Handling

10.1. Bugs in the GNU C library

Bugs in the GNU C library should fail early and catastrophically to alert developers of the problem. That trades off against any runtime cost of detecting the case. If it's cheap to detect, then detect it. If it's not so cheap, then don't pay the cost because we don't expect that we'll have the bug. Using assert is a middle ground for things that have enough cost that we don't just leave them in all the time, but little enough that there's still any question about it.

10.2. Bugs in the user program

If it's user code invoking undefined behavior, then it should fail early and catastrophically so that developers don't get the false impression that their code is OK when it happens not to break the use cases they test adequately. (Said another way, so that we avoid giving developers an excuse to complain when a future implementation change "breaks" their programs that were always broken, but theretofore ignorably so.) That too trades off against any runtime cost of detecting the case. I'd say the allowance for cost of detection is marginally higher than in the case of library bugs, because we expect user bugs to be more common than library bugs. But it's still not much, since correct programs performing better is more important to us than buggy programs being easier to debug.

10.3. Error returns from the OS kernel

The GNU C Library should crash when the OS returns unexpected error return codes rather than pass those errors back to the user. This immediately alerts kernel and glibc developers of a mismatch in their expectations before this ever gets out of experimental distributions.

This has proven itself time and time again. The most recent case is the CPU affinity handling (sched_getaffinity and sched_setaffinity) where application developers used undocumented kernel error returns to invent a usage that was incorrect, but which grew out of an analysis of empirical error returns from the kernel.

10.4. Invalid pointers

The GNU C library considers it a QoI feature not to mask user bugs by detecting invalid pointers and returning EINVAL (unless the API is standardized and says it does that). If passing a bad pointer has undefined behavior, it is far more useful in the long run if it crashes quickly rather than diagnosing an error that is probably ignored by the flaky caller.

10.4.1. NULL pointers

If you're going to check for NULL pointer arguments where you have not entered into a contract to accept and interpret them, do so with an assert, not a conditional error return. This way the bugs in the caller will be immediately detected and can be fixed, and it makes it easy to disable the overhead in production builds. The assert can be valuable as code documentation. However, a segfault from dereferencing the NULL pointer is just as effective for debugging. If you return an error code to a caller which has already proven itself buggy, the most likely result is that the caller will ignore the error, and bad things will happen much later down the line when the original cause of the error has become difficult or impossible to track down. Why is it reasonable to assume the caller will ignore the error you return? Because the caller already ignored the error return of malloc or fopen or some other library-specific allocation function which returned NULL to indicate an error.

In summary:

10.5. Assertions

Assertions are for internal consistency checking only.

External conditions are governed by the API and if user code violates the API then the library behaviour is undefined.

However, in scenarios where user input is recorded into internal structures for later use it is useful to assert in these cases to catch the first occurrence of the error.
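
The sketch below illustrates both uses of assert described in this section: checking an internal invariant, and checking user input at the point where it is recorded for later use. The structure and functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical internal structure, named only for this example.  */
struct buffer_state
{
  char *data;
  size_t used;
  size_t capacity;
};

/* Records caller-provided storage for later use.  The assert catches a
   NULL buffer where it is stored rather than at some later, harder to
   diagnose, dereference.  */
static void
buffer_state_init (struct buffer_state *state, char *data, size_t capacity)
{
  assert (data != NULL);
  state->data = data;
  state->used = 0;
  state->capacity = capacity;
}

/* Appends one byte; returns 0 on success and -1 when the buffer is full.  */
static int
buffer_state_put (struct buffer_state *state, char c)
{
  /* Internal consistency check: this invariant can only break through
     a bug in the code that maintains the structure.  */
  assert (state->used <= state->capacity);
  if (state->used == state->capacity)
    return -1;
  state->data[state->used++] = c;
  return 0;
}
```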

11. Double-underscore names for public API functions

What are the double-underscore names for public API functions and when should I call them?

There are two issues at hand. Firstly there are namespace issues, and secondly there are PLT avoidance issues.

The namespace issues arise when an application calls a function A in a standard, and that function calls another function B in another standard. This is a problem because the application may define its own B since it isn't a part of the standard it is using, and that would cause function A to call the application's B instead of the intended standard function B. In order for this to work correctly the implementation of function A will call the double-underscore variants of these functions to avoid symbol interposition and problems with static linking. If the function called is in another library, the double-underscore name also needs to be exported at GLIBC_PRIVATE so that the call can work in the dynamic linking case, unless there's a reason for it to be exported at a public symbol version. (For example, if a macro definition of a public function in an installed header uses the double-underscore name, or libstdc++ should use it, or redirection for _FILE_OFFSET_BITS=64 should use it, then it may be necessary to export at a public version, in which case you don't need a redundant GLIBC_PRIVATE export.)

The PLT avoidance issue is all about performance. If the library calls dup internally, it should not go through the PLT, it should be a direct function call, the compiler should know about it, and it should be optimized. In general most public API functions like dup and close have alternate local symbol aliases in the form of __dup or __close (see include/libc-symbols.h for the full details) created using hidden_proto and hidden_def. Calling these double-underscore variant symbols from within the same library that defines them avoids indirection through the procedure linkage table (PLT). This avoidance of the procedure linkage table does two things: first it makes the call faster by saving instructions, and secondly it avoids calling any interposed version of the function provided by the user. Avoiding calling the interposed version of the function is important when the library is trying to guarantee internal consistency for the implemented API.

For example: the core C library implements the function perror to print to standard error the value of a message along with strerror (errno). It is expected that perror operate correctly regardless of the interposed symbols provided by the user. The caller can't rely on perror calling dup and close to manipulate standard error, and the library is free to bypass the interposed symbols for dup and close and instead call __dup and __close directly. Thus from the user's perspective perror functions as the standard describes despite whatever interposed implementation for dup and close the user provides.
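
The general idea can be sketched with plain GCC attributes on an ELF target. This is a simplified illustration under those assumptions, not the actual glibc mechanism: real glibc code uses hidden_proto, hidden_def and related macros, and every name below is hypothetical.

```c
/* Internal name: hidden visibility means that calls from inside the
   library bind directly to this definition and cannot be interposed.  */
int __example_add_one (int value) __attribute__ ((visibility ("hidden")));

int
__example_add_one (int value)
{
  return value + 1;
}

/* Public name: a weak alias for the same code, which an application
   could interpose without affecting internal callers.  */
extern __typeof (__example_add_one) example_add_one
  __attribute__ ((weak, alias ("__example_add_one")));

/* Internal caller: uses the double-underscore name on purpose.  */
int
example_add_two (int value)
{
  return __example_add_one (__example_add_one (value));
}
```

Even if an application supplied its own example_add_one, example_add_two would still reach the library's definition through the double-underscore name.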

Deciding to call the normal symbol name for the function (goes through the PLT) or the double-underscore variant is a judgement call that must consider the expected function behaviour, internal consistency requirements, interposition requirements, inclusion in a standard, and standard dictated behaviour. Again in general the only functions that indirect through the PLT are the malloc family of functions because application developers expect to override those.

Lastly, the double-underscore functions will have versions that match their non-double-underscore variants. The functions are in the implementation's namespace, and should not be called by user programs. These functions may be exposed to other parts of the implementation, for example __tls_get_addr is part of the implementation's thread-local storage ABI (called by compiler generated code), and __printf_chk is part of the public compile-time buffer-checking implementation enabled by _FORTIFY_SOURCE.

Symbols marked with GLIBC_PRIVATE form a shared interface between the libraries built from the same glibc source, for example the dynamic loader and the C library may share an interface marked private in this way e.g. __libc_enable_secure for communicating between the dynamic loader and the C library that the application should be treated securely (and all the things that entails). These interfaces never need any real versioning because the implementation is always updated to match any changes, but marking them clearly as GLIBC_PRIVATE helps to organize the internal symbols under a common version.

12. Boolean Coercions

Always eschew implicit Boolean coercions, except for the return value of strcmp/memcmp and the like (where the most common idiomatic uses treat the value as a Boolean even though nonzero values have further meaning) and when checking for specific bits in a value.

If val is a Boolean then you can do the following:

if (val)
  {
    /* Do something.  */
  }

Otherwise you should use:

if (val != 0)
  {
    /* Do something.  */
  }

If you are checking a specific bit in an integer-valued variable foo, you can do this:

if (foo & BIT)
  {
    /* Do something.  */
  }
...
if (!(foo & BIT))
  {
    /* Do something.  */
  }

There is no need to do:

if ((foo & BIT) != 0)
  {
    /* Do something.  */
  }
...
if ((foo & BIT) == 0)
  {
    /* Do something.  */
  }
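
Putting the rule and its two exceptions together in one place, a hypothetical function might look like this:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FLAG_VERBOSE 0x1   /* Hypothetical flag bit.  */

/* Hypothetical function, named only for this example.  */
static bool
wants_verbose (const char *mode, unsigned int flags, size_t count)
{
  /* Pointers and counts are compared explicitly.  */
  if (mode == NULL || count == 0)
    return false;

  /* strcmp is one of the listed exceptions.  */
  if (!strcmp (mode, "quiet"))
    return false;

  /* Testing a specific bit is the other exception.  */
  if (flags & FLAG_VERBOSE)
    return true;

  return false;
}
```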

13. Support for features not yet in the mainstream Linux kernel?

It is often the case that you want all of your new processor, ABI, or feature support checked into the various GNU tools before the final version is committed to the main Linux kernel source tree. However, you do not want users attempting to use this code until that support has landed in the mainstream Linux kernel (Linus' git tree). By convention (precedent set by the MIPS NaN2008 work) you should set arch_minimum_kernel to 10.0.0 until the code has landed in the mainstream source tree. This ensures that the GNU C Library will not build until the value is fixed; it will not be fixed until support lands upstream, and at that point it will be set to the correct upstream kernel version.

None: Style_and_Conventions (last edited 2020-08-21 15:32:53 by macro)