Autoconf, Automake, and Libtool: 15.1 C Language Portability

15.1 C Language Portability

The C language makes it easy to write non-portable code. In this section we discuss these portability issues, and how to avoid them.

We concentrate on differences that can arise on systems in common use today. For example, all common systems today define char to be 8 bits, and define a pointer to hold the address of an 8-bit byte. We do not discuss the more exotic possibilities found on historical machines or on certain supercomputers. If your program needs to run in unusual settings, make sure you understand the characteristics of those systems; the system documentation should include a C portability guide describing the problems you are likely to encounter.

15.1.1 ISO C

The ISO C standard first appeared in 1989 (the standard is often called ANSI C). It added several new features to the C language, most notably function prototypes. This led to many years of portability issues when deciding whether to use ISO C features.

We think that programs written today can assume the presence of an ISO C compiler. Therefore, we will not discuss issues related to the differences between ISO C compilers and older compilers—often called K&R compilers, from the first book on C by Kernighan and Ritchie. You may see these differences handled in older programs.

There is a newer C standard called ‘C9X’. Because compilers that support it are not widely available as of this writing, this discussion does not cover it.

15.1.2 C Data Type Sizes

The C language defines data types in terms of a minimum size, rather than an exact size. As of this writing, this mainly matters for the types int and long. A variable of type int must be at least 16 bits, and is often 32 bits. A variable of type long must be at least 32 bits, and is sometimes 64 bits.

The range of a 16-bit number is -32768 to 32767 for a signed number, or 0 to 65535 for an unsigned number. If a variable may hold values outside these ranges, use long rather than int. Never assume that int or long has a specific size, or that it will overflow at a particular point. When appropriate, use variables of system-defined types rather than int or long:

size_t: Use this to hold the size of an object, as returned by sizeof.
ptrdiff_t: Use this to hold the difference between two pointers into the same array.
time_t: Use this to hold a time value as returned by the time function.
off_t: On a Unix system, use this to hold a file position as returned by lseek.
ssize_t: Use this to hold the result of the Unix read or write functions.
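As a brief sketch of this advice, the two hypothetical helpers below keep a string length in a size_t, the type returned by strlen, and a pointer difference in a ptrdiff_t, so neither depends on int or long having a particular size.

```c
#include <stddef.h>  /* size_t, ptrdiff_t */
#include <string.h>  /* strchr, strlen */

/* Return the offset of the first occurrence of ch in s, or -1 if it
   does not occur.  Subtracting two pointers into the same array
   yields a ptrdiff_t. */
ptrdiff_t find_offset (const char *s, int ch)
{
  const char *p = strchr (s, ch);
  return p != NULL ? p - s : -1;
}

/* Return the combined length of two strings.  strlen returns size_t,
   so the sum is kept in a size_t rather than an int. */
size_t total_length (const char *a, const char *b)
{
  return strlen (a) + strlen (b);
}
```

For example, find_offset ("hello", 'l') yields 2, and total_length ("ab", "cde") yields 5.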

Some books on C recommend using typedefs to specify types of particular sizes, and then adjusting those typedefs on specific systems. GNU Autotools supports this using the ‘AC_CHECK_SIZEOF’ macro. However, while we agree with using typedefs for clarity, we do not recommend using them purely for portability. It is safest to rely only on the minimum size assumptions made by the C language, rather than to assume that a type of a specific size will always be available. Also, most C compilers will define int to be the most efficient type for the system, so it is normally best to simply use int when possible.
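To illustrate relying only on minimum sizes, the sketch below uses a typedef purely for clarity. It assumes nothing beyond the guarantee that unsigned long has at least 32 bits, and it masks explicitly to 32 bits instead of depending on overflow at a particular width, so the result is the same whether unsigned long is 32 or 64 bits. The checksum itself (rotate left by 5 bits, then add the next byte) is a made-up example rather than a standard algorithm, and the names are hypothetical.

```c
#include <stddef.h>  /* size_t */

/* A typedef for clarity, not for portability: unsigned long is
   guaranteed to have at least 32 bits, which is all this code needs. */
typedef unsigned long checksum_t;

/* Hypothetical 32-bit checksum: rotate left by 5 bits within 32 bits,
   then add the next byte, reducing modulo 2^32 at every step. */
checksum_t checksum32 (const unsigned char *data, size_t len)
{
  checksum_t sum = 0;
  size_t k;

  for (k = 0; k < len; ++k)
    {
      /* sum always fits in 32 bits here, so this is a 32-bit rotate */
      sum = ((sum << 5) | (sum >> 27)) & 0xffffffffUL;
      sum = (sum + data[k]) & 0xffffffffUL;
    }
  return sum;
}
```

The masks are no-ops where unsigned long is exactly 32 bits and keep the arithmetic at 32 bits where it is wider.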

15.1.3 C Endianness

When a number longer than a single byte is stored in memory, it must be stored in some particular format. Modern systems do this by storing the number byte by byte such that the bytes can simply be concatenated into the final number. However, the order of storage varies: some systems store the least significant byte at the lowest address in memory, while some store the most significant byte there. These are referred to as little-endian and big-endian systems, respectively.(32)
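Inspecting the bytes of an object through unsigned char is well defined in C; what is not portable is assuming which byte comes first. As a sketch, the hypothetical function below reports the host's byte order at run time by copying out the byte stored at the lowest address of an unsigned int holding 1.

```c
#include <string.h>  /* memcpy */

/* Return 1 if the host stores the least significant byte of an
   unsigned int at the lowest address (little-endian), else 0. */
int host_is_little_endian (void)
{
  unsigned int one = 1;
  unsigned char first;

  memcpy (&first, &one, 1);  /* copy the byte at the lowest address */
  return first == 1;
}
```

A program can use such a test for diagnostics, but the byte-by-byte encoding described in this section avoids needing it at all.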

This difference means that portable code may not make any assumptions about the order of storage of a number. For example, code like this will act differently on different systems:

  /* Example of non-portable code; don't do this */
  int i = 4;
  char c = *(char *) &i;  /* 4 on little-endian hosts, 0 on big-endian */

Although that was a contrived example, real problems arise when writing numeric data in a file or across a network connection. If the file or network connection may be read on a different type of system, numeric data must be written in a format which can be unambiguously recovered. It is not portable to simply do something like

  /* Example of non-portable code; don't do this */
  write (fd, &i, sizeof i);

This example is non-portable both because of endianness and because it assumes that the size of the type of i is the same on both systems.

Instead, do something like this:

  int j;
  unsigned char buf[4];
  unsigned long u = (unsigned long) i; /* avoid shifting a negative int */
  for (j = 0; j < 4; ++j)
    buf[j] = (u >> (j * 8)) & 0xff;
  write (fd, buf, 4); /* In real code, check the return value */

This unambiguously writes out the low 32 bits of i as a little-endian 4-byte value. The code works on any system, whatever the size and byte order of int, and the result can be read unambiguously on any system.
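Reading such a value back uses the same idea in reverse. The following sketch pairs a writer with a reader; the names put_le32 and get_le32 are hypothetical, and the only assumption is that unsigned long has at least 32 bits, as the C standard guarantees.

```c
/* Store the low 32 bits of v in buf as a little-endian 4-byte value. */
void put_le32 (unsigned char buf[4], unsigned long v)
{
  int j;

  for (j = 0; j < 4; ++j)
    buf[j] = (unsigned char) ((v >> (j * 8)) & 0xff);
}

/* Reassemble a little-endian 4-byte value.  Each byte is converted to
   unsigned long before shifting, so the result is correct even where
   int is only 16 bits wide. */
unsigned long get_le32 (const unsigned char buf[4])
{
  return (unsigned long) buf[0]
         | ((unsigned long) buf[1] << 8)
         | ((unsigned long) buf[2] << 16)
         | ((unsigned long) buf[3] << 24);
}
```

Encoding 0x12345678 with put_le32 produces the bytes 0x78, 0x56, 0x34, 0x12 on every host, and get_le32 recovers the original value.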

Another approach to handling endianness is to use the htonl, ntohl, htons, and ntohs functions available on most systems. These functions convert between network endianness and host endianness. Network endianness is big-endian; it has that name because the standard TCP/IP network protocols use big-endian ordering.

These functions come in two sizes: htonl and ntohl operate on 4-byte quantities, and htons and ntohs operate on 2-byte quantities. The hton functions convert host endianness to network endianness. The ntoh functions convert network endianness to host endianness. On big-endian systems, these functions simply return their arguments; on little-endian systems, they return their arguments after swapping the bytes.

Although these functions are used in a lot of existing code, they can be difficult to use in highly portable code, because they require knowing the exact size of your data types. If you know that the type int is exactly 4 bytes long, then it is possible to write code like the following:

  int j;
  j = htonl (i);
  write (fd, &j, 4);

However, if int is not exactly 4 bytes long, this example will not work correctly on all systems.
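Where the exact size of int is unknown, the shift-based technique shown earlier for little-endian output works equally well for network (big-endian) order, with no need for htonl. A minimal sketch, with hypothetical function names put_be32 and get_be32, assuming only that unsigned long has at least 32 bits:

```c
/* Store the low 32 bits of v in buf in network (big-endian) order. */
void put_be32 (unsigned char buf[4], unsigned long v)
{
  buf[0] = (unsigned char) ((v >> 24) & 0xff);
  buf[1] = (unsigned char) ((v >> 16) & 0xff);
  buf[2] = (unsigned char) ((v >> 8) & 0xff);
  buf[3] = (unsigned char) (v & 0xff);
}

/* Reassemble a big-endian 4-byte value, shifting in unsigned long. */
unsigned long get_be32 (const unsigned char buf[4])
{
  return ((unsigned long) buf[0] << 24)
         | ((unsigned long) buf[1] << 16)
         | ((unsigned long) buf[2] << 8)
         | (unsigned long) buf[3];
}
```

Because the bytes are produced by arithmetic rather than copied from memory, the output is the same whatever the host's endianness or the width of int.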

15.1.4 C Structure Layout

C compilers on different systems lay out structures differently. In some cases there can even be layout differences between different C compilers on the same system. Compilers add gaps between fields, and these gaps have different sizes and are at different locations. You can normally assume that there are no gaps between fields of type char or array of char. However, you cannot make any assumptions about gaps between fields of any larger type. You also cannot make any assumptions about the layout of bit-field types.
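The offsetof macro from <stddef.h> lets a program observe where the compiler placed a field, which makes these gaps visible. In the sketch below, struct sample has the same fields as the structure in the example that follows (the tag name is hypothetical); the value returned is whatever the current compiler chose and must not be relied on elsewhere.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Same fields as the structure in this section's example. */
struct sample
{
  short i;
  int j;
};

/* Return the number of padding bytes this compiler placed between
   the fields i and j.  It is often 2 where short is 2 bytes and int
   is 4 bytes, but no portable program may assume any particular value. */
size_t gap_before_j (void)
{
  return offsetof (struct sample, j) - sizeof (short);
}
```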

These structure layout issues mean that it is difficult to portably use a C struct to define the format of data which may be read on another type of system, such as data in a file or sent over a network connection. Portable code must read and write such data field by field, rather than trying to read an entire struct at once.

Here is an example of non-portable code when reading data which may have been written to a file or a network connection on another type of system. Don’t do this.

  /* Example of non-portable code; don't do this */
  struct {
    short i;
    int j;
  } s;
  read (fd, &s, sizeof s);

Instead, do something like this (the struct s is assumed to be the same as above):

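  unsigned char buf[6];
  read (fd, buf, sizeof buf); /* Should check return value */
  s.i = buf[0] | (buf[1] << 8);
  /* Shift in unsigned long so that buf[5] << 24 cannot overflow an int */
  s.j = (int) (buf[2] | (buf[3] << 8)
               | ((unsigned long) buf[4] << 16)
               | ((unsigned long) buf[5] << 24));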

Naturally the code to write out the structure should be similar.
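As a sketch of the writing side, the function below encodes the structure into the same 6-byte layout that the reading code above expects: i as 2 little-endian bytes followed by j as 4 little-endian bytes. The structure tag record and the function name encode_record are hypothetical, since the original example uses an unnamed structure.

```c
/* Named version of the example structure (hypothetical tag). */
struct record
{
  short i;
  int j;
};

/* Encode r into buf as 6 little-endian bytes: 2 for i, then 4 for j.
   Converting to unsigned long first makes the shifts well defined for
   negative values, since conversion to an unsigned type is modulo 2^N. */
void encode_record (unsigned char buf[6], const struct record *r)
{
  unsigned long i = (unsigned long) r->i;
  unsigned long j = (unsigned long) r->j;

  buf[0] = (unsigned char) (i & 0xff);
  buf[1] = (unsigned char) ((i >> 8) & 0xff);
  buf[2] = (unsigned char) (j & 0xff);
  buf[3] = (unsigned char) ((j >> 8) & 0xff);
  buf[4] = (unsigned char) ((j >> 16) & 0xff);
  buf[5] = (unsigned char) ((j >> 24) & 0xff);
}
```

A caller would fill a buffer with encode_record (buf, &r) and then pass buf to write, checking the return value.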

15.1.5 C Floating Point

Most modern systems handle floating point following the IEEE 754 standard. However, there are still portability issues.

Most processors use the 64-bit IEEE double format when computing floating point values. However, the widely used Intel x86 series of processors computes temporary values using 80 bits of precision, as do most members of the Motorola 68k series. Some other processors, such as the PowerPC, provide fused multiply-add instructions which perform a multiplication and an addition using high precision for the intermediate value. Optimizing compilers may generate such instructions from sequences of C operations.

For almost all programs, these differences do not matter. However, for programs which do intensive floating point operations, the differences can be significant. It is possible to write floating point loops which terminate on one sort of processor but not on another.
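One way a loop can terminate on one processor but not another is an equality test on an accumulated floating point value. The sketch below, with a hypothetical function name, shows one defensive pattern for that specific case: an integer counter controls the loop and each floating point value is computed from it. It fixes only the iteration count; the computed sum may still differ slightly between processors.

```c
/* Risky pattern, shown only in this comment:

     double x;
     for (x = 0.0; x != 1.0; x += 0.1)
       ...

   0.1 has no exact binary representation, so the accumulated x may
   never compare equal to 1.0, and the rounding of intermediate values
   can differ between processors. */

/* Return the sum of the n + 1 evenly spaced points from a to b
   inclusive.  The integer k controls the loop, so it always runs
   exactly n + 1 times.  Assumes n > 0. */
double sum_grid (double a, double b, int n)
{
  double total = 0.0;
  int k;

  for (k = 0; k <= n; ++k)
    total += a + (b - a) * k / n;
  return total;
}
```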

Unfortunately, there is no rule of thumb that can be used to avoid these problems. Most compilers provide an option to disable the use of extended precision (for GNU cc, the option is ‘-ffloat-store’). However, this merely shifts the portability problem elsewhere, and the extended precision is often beneficial rather than harmful. Although these portability problems cannot be easily avoided, you should at least be aware of them if you write programs which require very precise floating point operations.

The IEEE 754 standard specifies certain flags which the floating point processor should make available (e.g., overflow, underflow, inexact), and specifies that there should be some control over the floating point rounding mode. Most processors make these flags and controls available; however, there is no portable way to access them. A portable program should not assume that it will have this degree of control over floating point operations.

15.1.6 GNU cc Extensions

The GNU cc compiler has several useful extensions, which are documented in the GNU cc manual. A program which must be portable to other C compilers must naturally avoid these extensions; the ‘-pedantic’ option may be used to warn about any accidental use of an extension.

However, the GNU cc compiler is itself highly portable, and it runs on all modern Unix platforms as well as on Windows. Depending upon your portability requirements, you may be able to simply assume that GNU cc is available, in which case your program may use extensions when they are useful. Note that some extensions are inherently non-portable, such as inline assembler code, or using attributes to specify a particular section for a function or a global variable.
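When GNU cc is not guaranteed, a common idiom is to wrap an extension in a macro that expands to nothing on other compilers. The sketch below does this for the const function attribute described in the GNU cc manual, which is only an optimization hint; the macro name ATTRIBUTE_CONST and the function square are hypothetical. Some other compilers also define __GNUC__ to signal compatibility, so the test is really for the dialect rather than for one specific compiler.

```c
/* Expand to the GNU cc const attribute when __GNUC__ is defined, and
   to nothing otherwise.  The attribute only tells the compiler that
   the result depends solely on the arguments, so the program means
   the same thing either way. */
#ifdef __GNUC__
# define ATTRIBUTE_CONST __attribute__ ((__const__))
#else
# define ATTRIBUTE_CONST
#endif

int square (int x) ATTRIBUTE_CONST;

int square (int x)
{
  return x * x;
}
```

Compiling with ‘-pedantic’ under GNU cc still accepts this, because the __attribute__ keyword is used only inside the conditional branch that GNU cc itself selects.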

This document was generated by Ben Elliston on July 10, 2015 using texi2html 1.82.