Glibc Coding Style and Conventions

This document is not meant to reiterate the GNU coding standards. It clarifies where glibc policy differs from or expands upon the GNU standard.

This is a work in progress and not yet definitive.


1. Code Formatting

1.1. Symbols and Parentheses

When invoking functions, put a space between the function name and the opening parenthesis, e.g.,

retval = foo (bar);

This rule does not apply to the definition of a function-like macro, where a space between the macro name and the parenthesis would cause improper expansion. For example, the following is correct:

#define BAT(x) do { \
  bat = FOO (x);      \
} while (0)

Whereas, the following is incorrect:

#define BAT (x) do { \
  bat = FOO (x);      \
} while (0)
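
The difference matters because the preprocessor treats a space after the macro name as the end of the name. A minimal sketch, using the hypothetical names SQUARE and twice_square, shows both conventions together:

```c
/* No space before the parameter list: this defines a function-like
   macro.  With a space, SQUARE would be an object-like macro whose
   replacement text starts with "(x)".  */
#define SQUARE(x) ((x) * (x))

/* Hypothetical helper: the macro use follows the GNU convention of
   a space before the parenthesis, just like a function call.  */
static int
twice_square (int v)
{
  return 2 * SQUARE (v);
}
```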


1.2. 79-Column Lines

All source files in glibc must use lines of fewer than 80 characters. The only exception is when it is syntactically impossible to split a line.
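
As a rough illustration of staying under the limit, a long condition can be split before an operator, as the GNU coding standards suggest. The function and parameter names here are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate whose condition would not fit in 79 columns
   on one line; each continuation line starts with the operator.  */
static bool
range_is_valid (size_t offset, size_t length, size_t buffer_size)
{
  return (offset <= buffer_size
          && length <= buffer_size
          && offset <= buffer_size - length);
}
```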


1.3. Nested C Preprocessor Directives

Nested preprocessor directives need spaces after the '#'.

Example 1: One level of nesting

#if __FP_FAST_FMA
# define FP_FAST_FMA 1
#endif

#if __FP_FAST_FMAF
# define FP_FAST_FMAF 1
#endif

#if __FP_FAST_FMAL
# define FP_FAST_FMAL 1
#endif

Reference: http://sourceware.org/ml/libc-alpha/2010-10/msg00024.html

Example 2: Several levels of nesting

#ifdef HAVE_ASM_GLOBAL_DOT_NAME
# ifndef C_SYMBOL_DOT_NAME
#  if defined __GNUC__ && defined __GNUC_MINOR__ \
      && (__GNUC__ << 16) + __GNUC_MINOR__ >= (3 << 16) + 1
#   define C_SYMBOL_DOT_NAME(name) .name
#  else
#   define C_SYMBOL_DOT_NAME(name) .##name
#  endif
# endif
#endif

Note that in a header file, the outer #ifndef _FILE_H/#endif pair does not increase the indentation level.

Example 3: Outer #ifndef

#ifndef _FILE_H
#if FOO
# define BAR
#endif
#endif

1.4. Commenting #endif

In order to make it easier to determine which conditional is being ended by the #endif it is common to add a code comment to the preprocessor directive.

Example 1: #endif with no #else.

#ifdef TEST
...
#endif /* TEST */

The next example contains an #else which means the second half of the conditional is used only if the negation of the conditional is true. The code comment on the closing #endif indicates this by negating the conditional.

Example 2: #endif with #else.

#ifdef TEST
...
#else
...
#endif /* !TEST */

The use of the conditional or negated conditional in the comment is the most common style in glibc. This style helps the developer immediately determine what case is running in that branch of the condition.

In cases where the conditional is used to guard against inclusion of the entire file, you might see two styles. The first and most common style puts the name of the file in the comment; this indicates that it is a file inclusion guard and which file is referenced. The second and less common style simply lists the guard name used, following the convention above.

Example 3: #endif for file inclusion guard with file name comment

#ifndef _ARGP_FMTSTREAM_H
#define _ARGP_FMTSTREAM_H
...
#endif /* __OPTIMIZE__ */

#endif /* ARGP_FMTSTREAM_USE_LINEWRAP */

#endif /* argp-fmtstream.h */

Example 4: #endif for file inclusion guard with conditional comment

#ifndef dl_machine_h
#define dl_machine_h
...
#endif /* !dl_machine_h */

Either option is acceptable.
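
Putting sections 1.3 and 1.4 together, a hypothetical header might look like the sketch below. The file name example.h and the EXAMPLE_ names are made up for illustration. The inclusion guard does not add an indentation level, and each #endif carries a comment:

```c
#ifndef _EXAMPLE_H
#define _EXAMPLE_H 1

#ifdef EXAMPLE_FAST
# define EXAMPLE_MODE 1
#else
# define EXAMPLE_MODE 0
#endif /* !EXAMPLE_FAST */

/* Return the mode selected at preprocessing time.  */
static int
example_mode (void)
{
  return EXAMPLE_MODE;
}

#endif /* example.h */
```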

1.5. Files not formatted according to the GNU standard

Some files (e.g. malloc/arena.c) have had a different but internally consistent coding style ever since they were added to glibc, because they were imported from a different project or source. The rule for such files is to stick to the code formatting convention in that file.

Reference: http://sourceware.org/ml/libc-alpha/2012-08/msg00182.html

1.6. Code formatting in Python sources

Some supporting scripts are implemented in Python; these are not required for the core build process. The rule for these files is to follow PEP 8 -- Style Guide for Python Code, with the following additional rules specific to the glibc Python sources:

  • No tabs for indentation. Each indentation level is strictly 4 spaces
  • All functions should have a PEP 257 style docstring description. The preferred format is as follows:

     """One line description.
    
     Longer details go here on multiple lines.
    
     Args:
       arg: description of it
    
     Returns:
       Describe the return value.
    
     Raises:
       Any random exceptions that might be raised.
     """
  • Require python-2.7, but be compatible with python-3.2+
  • Use from __future__ import print_function

  • Use printf strings rather than concatenation:
       bad: print('blah' + var + 'foo')
       good: print('blah%sfoo' % var)
  • Prefer single quotes over double quotes for strings
  • Use a main() function

  • Global scope code is heavily discouraged
  • Globals are heavily discouraged
  • Validate code using the pylint script: $ ./scripts/pylint <your-script>


2. Use of GCC Compiler Attributes

2.1. inline

We should eschew the inline keyword entirely when used alone. If it really matters that something be inlined, it needs always_inline. Otherwise we should leave optimization decisions to the compiler unless there is a particularly strong reason in an individual case. Any such cases should have clear comments saying why the explicit inline is desirable.
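
A minimal sketch of the two acceptable forms follows. The function names and the stated reason for forcing inlining are hypothetical; the attribute spelling is the standard GCC one:

```c
/* Forced inline.  Assumed reason for this sketch: the helper sits in
   the inner loop of a hot path, so the call overhead matters.  A real
   use should state its actual reason here.  */
static inline __attribute__ ((__always_inline__)) int
clamp_to_byte (int value)
{
  return value < 0 ? 0 : value > 255 ? 255 : value;
}

/* No inline keyword: the compiler decides whether to inline this.  */
static int
sum_clamped (const int *values, int count)
{
  int total = 0;
  for (int i = 0; i < count; i++)
    total += clamp_to_byte (values[i]);
  return total;
}
```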

2.2. __unused__

Use __attribute__ ((__unused__)) with static inline
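
A static inline function defined in a header may go unused in some of the files that include it, which can draw unused-function warnings from some compilers. The attribute documents that this is expected. A sketch with a hypothetical helper:

```c
/* Hypothetical header helper; not every includer needs to call it.  */
static inline __attribute__ ((__unused__)) int
is_power_of_two (unsigned long int n)
{
  return n != 0 && (n & (n - 1)) == 0;
}
```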


3. Creating files

  • Don't create empty files
  • Find new versions of the copyright headers to use as a template.
  • Make sure the top line is descriptive.
  • "Contributed by" statements are no longer used.

3.1. Proper sysdeps Location

3.1.1. Default ENOTSUP Implementation Location

3.1.2. OS Specific Implementation

sysdeps/unix/sysv/linux/<foo>.[ch]

3.1.3. OS and Platform Specific Implementation

sysdeps/unix/sysv/linux/powerpc/<foo>.[ch]

3.1.4. Wordsize Specific Implementation

sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/<foo>.[ch]

3.1.5. Platform Specific Implementation

sysdeps/powerpc/<foo>.[ch]

3.1.6. Floating-Point Unit Implementation

sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/<foo>.[ch]

sysdeps/powerpc/powerpc32/fpu/<foo>.[ch]


4. Reusing Existing Code

When possible pick up existing code via #include directives rather than copying code. For whole files, this may also be done automatically via Implies files.

We strive to reduce the number of duplicate copies of code, for example by consolidating all copies of an architecture-specific sysdeps/unix/sysv/linux/<arch> header into an architecture-independent one plus a set of small architecture-specific ones for the architecture-specific bits.


5. Macros vs. Static Inlines

Static inline functions are preferred over macros, when possible, because the compiler can more adequately schedule static inlines.
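
Static inlines also evaluate each argument exactly once and are type-checked, which a macro cannot guarantee. A small sketch shows the difference; MAX_MACRO, max_inline, and next_value are hypothetical names:

```c
/* Macro version: the larger argument is evaluated twice.  */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Static inline version: each argument is evaluated once.  */
static inline int
max_inline (int a, int b)
{
  return a > b ? a : b;
}

static int call_count;

/* Helper with a side effect, to make double evaluation visible.  */
static int
next_value (void)
{
  return ++call_count;
}
```

Calling MAX_MACRO (next_value (), 0) invokes next_value twice, whereas max_inline (next_value (), 0) invokes it once.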


6. Header Files

bits/<foo>.h is not a place for an API; it is only for OS-specific definitions.


7. Alloca vs. Malloc

Here are some things to consider when deciding whether to use alloca or malloc:

  • Do not use alloca to create an array whose size S is such that ! __libc_use_alloca (S), as large arrays like that may bypass stack-overflow checking.

  • If the storage may need to outlive the current function, then obviously alloca cannot be used.

  • If the API does not allow returning a memory-allocation failure indication such as ENOMEM, then alloca may be preferable, as malloc can fail.

  • If this is a hot path with a small allocation, prefer alloca, as it is typically much faster.

  • When growing a buffer, either on the stack or on the heap, watch out for integer overflow when calculating the new size. Such overflow should be treated as allocation failure rather than letting the integer wrap around.
  • If the size of the buffer is directly or indirectly under user control, consider imposing a maximum to help make denial-of-service attacks more difficult.
  • If this is a hot path and the allocation size is typically small but may be large, and is known in advance, you can use the following pattern:

    bool use_alloca = __libc_use_alloca (bufsize);
    struct foo *buf = use_alloca ? alloca (bufsize) : malloc (bufsize);
    if (buf)
      do_work_with (buf, bufsize);
    if (! use_alloca)
      free (buf);
  • Use of alloca is a memory optimization compared to having a local array on the stack. That is, the above example is close in behavior to the following, except that the alloca version consumes only the stack space needed, rather than always consuming approximately 4000 bytes on the stack.

    struct foo buffer[4000 / sizeof (struct foo)];
    struct foo *buf = bufsize <= sizeof buffer ? buffer : malloc (bufsize);
    if (buf)
      do_work_with (buf, bufsize);
    if (buf != buffer)
      free (buf);
  • If the amount of storage is not known in advance but may grow without bound, you can start with a small buffer on the stack and switch to malloc if it gets to be too large for the stack. While the storage is on the stack, you can grow it by using extend_alloca. For example:

    struct foo buffer[10];
    struct foo *buf = buffer;
    size_t bufsize = sizeof buffer;
    void *allocated = NULL;
    size_t needed;
    while (bufsize < (needed = do_work_with (buf, bufsize)))
      {
        if (__libc_use_alloca (needed))
          {
            size_t size = bufsize;
            void *newbuf = extend_alloca (buf, bufsize, needed);
            buf = memmove (newbuf, buf, size);
          }
        else
          {
            void *newbuf = realloc (allocated, needed);
            if (! newbuf)
              {
                needed = 0;
                break;
              }
            if (! allocated)
              memcpy (newbuf, buf, bufsize);
            buf = allocated = newbuf;
            bufsize = needed;
          }
      }
    free (allocated);
    return needed; /* This is zero on allocation failure.  */
  • To boost performance a bit in the typical case of the above examples, you can use __glibc_likely or __glibc_unlikely, e.g., if (__glibc_likely (use_alloca)) instead of just if (use_alloca).

At present there is no magic bullet or special procedure for selecting alloca vs. malloc; if there were, we could encode it into this wiki or into a macro.
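
The integer-overflow advice in the list above can be sketched as follows. The helper name double_buffer_size is hypothetical, and doubling is only one possible growth policy:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute a doubled buffer size.  Return false,
   leaving *new_size untouched, if doubling would overflow size_t, so
   that the caller can treat it like an allocation failure.  */
static bool
double_buffer_size (size_t current, size_t *new_size)
{
  if (current > SIZE_MAX / 2)
    return false;
  *new_size = current == 0 ? 1 : current * 2;
  return true;
}
```

A caller would check the result before passing the new size to malloc, realloc, or alloca, and could also compare it against a maximum if the size is under user control.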


8. Branch Prediction

glibc has the __glibc_likely and __glibc_unlikely macros that wrap around __builtin_expect. Use those instead of using __builtin_expect for branch prediction since they're nicer to read.
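
For readers unfamiliar with the builtin, the macros are thin wrappers of roughly the following shape. The EXAMPLE_ names and the checked_divide function are stand-ins so that the sketch compiles outside glibc; inside glibc, use __glibc_likely and __glibc_unlikely themselves:

```c
/* Stand-ins for __glibc_likely and __glibc_unlikely.  */
#define EXAMPLE_LIKELY(cond) __builtin_expect ((cond), 1)
#define EXAMPLE_UNLIKELY(cond) __builtin_expect ((cond), 0)

/* Hypothetical function: division by zero is the rare error path, so
   the check is marked unlikely.  Returns 0 on success, -1 on error.  */
static int
checked_divide (unsigned int numerator, unsigned int denominator,
                unsigned int *quotient)
{
  if (EXAMPLE_UNLIKELY (denominator == 0))
    return -1;
  *quotient = numerator / denominator;
  return 0;
}
```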

9. Invalid pointers

The GNU C library considers it a QoI feature not to mask user bugs by detecting invalid pointers and returning EINVAL (unless the API is standardized and says it does that). If passing a bad pointer has undefined behavior, it is far more useful in the long run if it crashes quickly rather than diagnosing an error that is probably ignored by the flaky caller.

9.1. NULL pointers

If you're going to check for NULL pointer arguments where you have not entered into a contract to accept and interpret them, do so with an assert, not a conditional error return. This way the bugs in the caller will be immediately detected and can be fixed, and it makes it easy to disable the overhead in production builds. The assert can be valuable as code documentation. However, a segfault from dereferencing the NULL pointer is just as effective for debugging. If you return an error code to a caller which has already proven itself buggy, the most likely result is that the caller will ignore the error, and bad things will happen much later down the line when the original cause of the error has become difficult or impossible to track down. Why is it reasonable to assume the caller will ignore the error you return? Because the caller already ignored the error return of malloc or fopen or some other library-specific allocation function which returned NULL to indicate an error.

In summary:

  • If you have no contract to accept NULL and you don't immediately dereference the pointer then use an assert to raise an error when NULL is passed as an invalid argument.
  • If you have no contract to accept NULL and immediately dereference the pointer then the segfault is sufficient to indicate the error.
  • If you have a contract to accept NULL then do so.
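
The three cases in the summary can be sketched with a hypothetical logger interface; none of these names exist in glibc:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct logger
{
  const char *prefix;
};

/* No contract to accept NULL.  PREFIX is stored for later rather than
   dereferenced now, so assert.  LOG is dereferenced immediately, so a
   NULL LOG would simply segfault here and needs no check.  */
static void
logger_set_prefix (struct logger *log, const char *prefix)
{
  assert (prefix != NULL);
  log->prefix = prefix;
}

/* Contract: LENGTH_OUT may be NULL if the caller does not want the
   length, so NULL is accepted and interpreted.  */
static const char *
logger_get_prefix (const struct logger *log, size_t *length_out)
{
  if (length_out != NULL)
    *length_out = strlen (log->prefix);
  return log->prefix;
}
```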

10. Assertions

Assertions are for internal consistency checking only.

External conditions are governed by the API and if user code violates the API then the library behaviour is undefined.

However, in scenarios where user input is recorded into internal structures for later use it is useful to assert in these cases to catch the first occurrence of the error.

11. Double-underscore names for public API functions

What are the double-underscore names for public API functions and when should I call them?

There are two issues at hand. Firstly there are namespace issues, and secondly there are PLT avoidance issues.

The namespace issues arise when an application calls a function A in a standard, and that function calls another function B in another standard. This is a problem because the application may define its own B, since B isn't a part of the standard it is using, and that would cause function A to call the application's B instead of the intended standard function B. In order for this to work correctly, the implementation of function A will call the double-underscore variants of these functions to avoid symbol interposition and problems with static linking. If the function called is in another library, the double-underscore name also needs to be exported at GLIBC_PRIVATE so that the call can work in the dynamic linking case, unless there's a reason for it to be exported at a public symbol version. (For example, if a macro definition of a public function in an installed header uses the double-underscore name, or libstdc++ should use it, or redirection for _FILE_OFFSET_BITS=64 should use it, then it may be necessary to export at a public version, in which case you don't need a redundant GLIBC_PRIVATE export.)

The PLT avoidance issue is all about performance. If the library calls dup internally, it should not go through the PLT, it should be a direct function call, the compiler should know about it, and it should be optimized. In general most public API functions like dup and close have alternate local symbol aliases in the form of __dup or __close (see include/libc-symbols.h for the full details) created using hidden_proto and hidden_def. Calling these double-underscore variant symbols from within the same library that defines them avoids indirection through the procedure linkage table (PLT). This avoidance of the procedure linkage table does two things: first it makes the call faster by saving instructions, and secondly it avoids calling any interposed version of the function provided by the user. Avoiding calling the interposed version of the function is important when the library is trying to guarantee internal consistency for the implemented API.
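
The general shape of such an alias can be sketched with plain GCC attributes. This is not glibc's hidden_proto/hidden_def machinery, only an approximation of the idea for an ELF target, and every name in it is hypothetical:

```c
/* Internal implementation name.  With hidden visibility, calls from
   inside the same shared object bind directly, without going through
   the PLT, and cannot be interposed.  */
int __mylib_double (int value) __attribute__ ((visibility ("hidden")));

int
__mylib_double (int value)
{
  return 2 * value;
}

/* Public name: an alias for the same code, exported for applications.  */
extern __typeof (__mylib_double) mylib_double
  __attribute__ ((alias ("__mylib_double"), visibility ("default")));

/* Another function in the library calls the internal name, so a user
   definition of mylib_double cannot change its behaviour.  */
int
mylib_quadruple (int value)
{
  return __mylib_double (__mylib_double (value));
}
```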

For example: the core C library implements the function perror to print to standard error the value of a message along with strerror (errno). It is expected that perror operate correctly regardless of the interposed symbols provided by the user. The caller can't rely on perror calling dup and close to manipulate standard error, and the library is free to bypass the interposed symbols for dup and close and instead call __dup and __close directly. Thus from the user's perspective perror functions as the standard describes despite whatever interposed implementation for dup and close the user provides.

Deciding to call the normal symbol name for the function (goes through the PLT) or the double-underscore variant is a judgement call that must consider the expected function behaviour, internal consistency requirements, interposition requirements, inclusion in a standard, and standard dictated behaviour. Again in general the only functions that indirect through the PLT are the malloc family of functions because application developers expect to override those.

Lastly, the double-underscore functions will have versions that match their non-double-underscore variants. The functions are in the implementation's namespace, and should not be called by user programs. These functions may be exposed to other parts of the implementation; for example, __tls_get_addr is part of the implementation's thread-local storage ABI (called by compiler-generated code), and __printf_chk is part of the public compile-time buffer-checking implementation enabled by _FORTIFY_SOURCE.

Symbols marked with GLIBC_PRIVATE form a shared interface between the libraries built from the same glibc source, for example the dynamic loader and the C library may share an interface marked private in this way e.g. __libc_enable_secure for communicating between the dynamic loader and the C library that the application should be treated securely (and all the things that entails). These interfaces never need any real versioning because the implementation is always updated to match any changes, but marking them clearly as GLIBC_PRIVATE helps to organize the internal symbols under a common version.

None: Style_and_Conventions (last edited 2014-09-12 18:11:49 by CarlosODonell)