The C language makes it easy to write non-portable code. In this section we discuss these portability issues, and how to avoid them.
We concentrate on differences that can arise on systems in common use
today. For example, all common systems today define char
to be 8
bits, and define a pointer to hold the address of an 8-bit byte. We do
not discuss the more exotic possibilities found on historical machines
or on certain supercomputers. If your program needs to run in unusual
settings, make sure you understand the characteristics of those systems;
the system documentation should include a C portability guide describing
the problems you are likely to encounter.
The first C standard was published by ANSI in 1989 and adopted by ISO in 1990; it is often called ANSI C. It added several new features to the C language, most notably function prototypes. This led to many years of portability issues over whether it was safe to use ISO C features.
We think that programs written today can assume the presence of an ISO C compiler. Therefore, we will not discuss issues related to the differences between ISO C compilers and older compilers (often called K&R compilers, after the first book on C by Kernighan and Ritchie). You may see these differences handled in older programs.
There is a newer C standard called ‘C9X’. Because compilers that support it are not widely available as of this writing, this discussion does not cover it.
The C language defines data types in terms of a minimum size, rather than an exact size. As of this writing, this mainly matters for the types int and long. A variable of type int must be at least 16 bits, and is often 32 bits. A variable of type long must be at least 32 bits, and is sometimes 64 bits.
The range of a 16-bit number is -32768 to 32767 for a signed number, or 0 to 65535 for an unsigned number. If a variable may hold values that do not fit in 16 bits, use long rather than int. Never assume that int or long has a specific size, or that it will overflow at a particular point. When appropriate, use variables of system-defined types rather than int or long:
size_t
Use this to hold the size of an object, as returned by sizeof.
ptrdiff_t
Use this to hold the difference between two pointers into the same array.
time_t
Use this to hold a time value as returned by the time function.
off_t
On a Unix system, use this to hold a file position as returned by lseek.
ssize_t
Use this to hold the result of the Unix read or write functions.
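As a small illustration of the first two types in the list, here is a sketch of a hypothetical helper, prefix_length, that keeps an object size in a size_t and a pointer difference in a ptrdiff_t rather than in int. The helper name and its behavior are invented for this example.

```c
#include <stddef.h>  /* size_t, ptrdiff_t */

/* Hypothetical helper: return how many characters of buf[0..len)
   precede the first occurrence of c, or len if c does not occur. */
size_t
prefix_length (const char *buf, size_t len, char c)
{
  const char *p = buf;
  const char *end = buf + len;  /* one past the last element */
  ptrdiff_t diff;

  while (p < end && *p != c)
    ++p;
  diff = p - buf;        /* difference of two pointers into buf */
  return (size_t) diff;  /* never negative here, so it fits */
}
```

Because len is a size_t, the function can describe any object that sizeof can measure, even on a system where int is only 16 bits.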
Some books on C recommend using typedefs to specify types of particular sizes, and then adjusting those typedefs on specific systems. GNU Autoconf supports this with the ‘AC_CHECK_SIZEOF’ macro. However, while we agree with using typedefs for clarity, we do not recommend using them purely for portability. It is safest to rely only on the minimum size assumptions made by the C language, rather than to assume that a type of a specific size will always be available. Also, most C compilers define int to be the most efficient type for the system, so it is normally best to simply use int when possible.
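The following sketch relies only on the minimum ranges the C language guarantees (int at least 16 bits, long at least 32 bits). The count_lines helper is hypothetical; it keeps its count in a long because a file may easily hold more than 32767 lines.

```c
#include <limits.h>  /* LONG_MAX */

/* The C standard guarantees this, so the check never fires; it only
   documents the single size assumption the code below makes. */
#if LONG_MAX < 2147483647
# error "long is narrower than the C standard allows"
#endif

/* Hypothetical helper: count the newline characters in text.
   A long is used because the count may exceed 32767, the largest
   value an int is guaranteed to hold. */
long
count_lines (const char *text)
{
  long n = 0;

  for (; *text != '\0'; ++text)
    if (*text == '\n')
      ++n;
  return n;
}
```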
When a number longer than a single byte is stored in memory, it must be stored in some particular format. Modern systems do this by storing the number byte by byte such that the bytes can simply be concatenated into the final number. However, the order of storage varies: some systems store the least significant byte at the lowest address in memory, while some store the most significant byte there. These are referred to as little-endian and big-endian systems, respectively.(32)
This difference means that portable code must not make any assumptions about the order of storage of a number. For example, code like this will act differently on different systems:
    /* Example of non-portable code; don't do this */
    int i = 4;
    char c = *(char *) &i;  /* 4 on little-endian systems, 0 on big-endian */
Although that was a contrived example, real problems arise when writing numeric data in a file or across a network connection. If the file or network connection may be read on a different type of system, numeric data must be written in a format which can be unambiguously recovered. It is not portable to simply do something like
    /* Example of non-portable code; don't do this */
    write (fd, &i, sizeof i);
This example is non-portable both because of endianness and because it assumes that the type of i has the same size on both systems.
Instead, do something like this:
    int j;
    unsigned char buf[4];
    for (j = 0; j < 4; ++j)
      buf[j] = ((unsigned long) i >> (j * 8)) & 0xff;
    write (fd, buf, 4); /* In real code, check the return value */
This unambiguously writes out a little-endian 4-byte value. The code works on any system, provided the value of i fits in 32 bits, and the result can be read back unambiguously on any system.
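For the reading side, a sketch along the same lines might look like this. get_le32 is a hypothetical helper; it assumes the 4 bytes were produced by the loop above and returns an unsigned long because that type is guaranteed to hold at least 32 bits, whatever the size of int.

```c
/* Hypothetical helper: decode a 4-byte little-endian value.
   Each byte is widened to unsigned long before shifting, so no
   shift can overflow even where int is only 16 bits. */
unsigned long
get_le32 (const unsigned char buf[4])
{
  return (unsigned long) buf[0]
         | ((unsigned long) buf[1] << 8)
         | ((unsigned long) buf[2] << 16)
         | ((unsigned long) buf[3] << 24);
}
```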
Another approach to handling endianness is to use the htons and ntohs functions available on most systems. These functions convert between network endianness and host endianness. Network endianness is big-endian; it has that name because the standard TCP/IP network protocols use big-endian ordering.
These functions come in two sizes: htonl and ntohl operate on 4-byte quantities, and htons and ntohs operate on 2-byte quantities. The hton functions convert host endianness to network endianness. The ntoh functions convert network endianness to host endianness. On big-endian systems, these functions simply return their arguments; on little-endian systems, they return their arguments after swapping the bytes.
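On POSIX systems these functions are declared in <arpa/inet.h>. The sketch below assumes a POSIX system and the C99 type uint32_t from <stdint.h> (neither is assumed elsewhere in this section) and uses htonl and ntohl to move a value to and from a byte buffer. Because network order is big-endian, the most significant byte always lands in out[0], whatever the host's byte order. The helper names put_net32 and get_net32 are hypothetical.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <stdint.h>     /* uint32_t (C99) */
#include <string.h>     /* memcpy */

/* Store v into out[0..3] in network (big-endian) byte order. */
void
put_net32 (unsigned char out[4], uint32_t v)
{
  uint32_t n = htonl (v);      /* host order -> network order */
  memcpy (out, &n, sizeof n);  /* copy the bytes as they sit in memory */
}

/* Recover the host-order value from in[0..3]. */
uint32_t
get_net32 (const unsigned char in[4])
{
  uint32_t n;

  memcpy (&n, in, sizeof n);
  return ntohl (n);            /* network order -> host order */
}
```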
Although these functions are used in a lot of existing code, they can be difficult to use in highly portable code, because they require knowing the exact size of your data types. If you know that the type int is exactly 4 bytes long, then it is possible to write code like the following:
    int j;
    j = htonl (i);
    write (fd, &j, 4);
However, if int is not exactly 4 bytes long, this example will not work correctly on all systems.
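One way around the size problem is to do the conversion by hand with shifts, as in the little-endian example earlier, but writing the most significant byte first. The sketch below assumes nothing about the size of int; put_be32 is a hypothetical helper name.

```c
/* Hypothetical helper: store v into buf[0..3] in network
   (big-endian) order.  unsigned long holds at least 32 bits,
   so the shifts are well defined on any system. */
void
put_be32 (unsigned char buf[4], unsigned long v)
{
  buf[0] = (v >> 24) & 0xff;  /* most significant byte first */
  buf[1] = (v >> 16) & 0xff;
  buf[2] = (v >> 8) & 0xff;
  buf[3] = v & 0xff;          /* least significant byte last */
}
```

After calling put_be32 (buf, i), the buffer can be passed to write (fd, buf, 4) and decoded on any system.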
C compilers on different systems lay out structures differently. In some cases there can even be layout differences between different C compilers on the same system. Compilers add gaps between fields, and these gaps have different sizes and are at different locations. You can normally assume that there are no gaps between fields of type char or array of char. However, you cannot make any assumptions about gaps between fields of any larger type. You also cannot make any assumptions about the layout of bitfield types.
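To see what a particular compiler does, the standard offsetof macro from <stddef.h> reports where a field begins. This sketch uses a hypothetical struct pad_demo and helper gap_after_i; the value it returns is implementation-defined, and is useful only for exploring a given compiler, never as an assumption in portable code.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical example structure: a short followed by an int. */
struct pad_demo
{
  short i;
  int j;
};

/* Return the number of padding bytes this compiler placed between
   the two fields.  The answer may differ between systems and even
   between compilers on the same system. */
size_t
gap_after_i (void)
{
  return offsetof (struct pad_demo, j) - sizeof (short);
}
```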
These structure layout issues mean that it is difficult to portably use a C struct to define the format of data which may be read on another type of system, such as data in a file or sent over a network connection. Portable code must read and write such data field by field, rather than trying to read an entire struct at once.
Here is an example of non-portable code when reading data which may have been written to a file or a network connection on another type of system. Don’t do this.
    /* Example of non-portable code; don't do this */
    struct { short i; int j; } s;
    read (fd, &s, sizeof s);
Instead, do something like this (the struct s is assumed to be the same as above):
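    unsigned char buf[6];
    read (fd, buf, sizeof buf); /* Should check return value */
    /* File layout: i in bytes 0-1, j in bytes 2-5, both little-endian */
    s.i = buf[0] | ((unsigned) buf[1] << 8);
    s.j = buf[2] | ((unsigned long) buf[3] << 8)
          | ((unsigned long) buf[4] << 16)
          | ((unsigned long) buf[5] << 24);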
Naturally the code to write out the structure should be similar.
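For example, a writer matching the reading code above could encode each field into a byte buffer before calling write. This is a sketch: struct rec mirrors the anonymous struct s above, encode_rec is a hypothetical name, and the 2-byte plus 4-byte little-endian layout is an assumption that must match whatever the reader expects.

```c
/* Hypothetical record type with the same fields as struct s. */
struct rec
{
  short i;
  int j;
};

/* Encode r into buf[0..5]: i as 2 little-endian bytes, then j as
   4 little-endian bytes.  Converting to unsigned long first keeps
   the shifts well defined, including for negative values. */
void
encode_rec (const struct rec *r, unsigned char buf[6])
{
  unsigned long i = (unsigned long) r->i;
  unsigned long j = (unsigned long) r->j;

  buf[0] = i & 0xff;
  buf[1] = (i >> 8) & 0xff;
  buf[2] = j & 0xff;
  buf[3] = (j >> 8) & 0xff;
  buf[4] = (j >> 16) & 0xff;
  buf[5] = (j >> 24) & 0xff;
}
```

The caller would then write the buffer with write (fd, buf, 6), checking the return value.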
Most modern systems handle floating point following the IEEE 754 standard. However, there are still portability issues.
Most processors compute floating point values using the 64-bit double precision format. However, the widely used Intel x86 series of processors compute temporary values using 80 bits of precision, as do most instances of the Motorola 68k series. Some other processors, such as the PowerPC, provide fused multiply-add instructions which perform a multiplication and an addition using high precision for the intermediate value. Optimizing compilers may generate such instructions from sequences of C operations.
For almost all programs, these differences do not matter. However, for programs which do intensive floating point operations, the differences can be significant. It is possible to write floating point loops which terminate on one sort of processor but not on another.
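One defensive habit for the loop-termination problem is to control iteration with an integer counter and compute each floating point value from it, rather than accumulating a floating point value and comparing it against a limit. The sketch below (fill_samples is a hypothetical helper) always runs exactly n + 1 times; the computed values may still differ slightly between processors, so this addresses termination only, not precision.

```c
/* Hypothetical helper: fill out[0..n] with start, start + step, ...,
   start + n * step.  The loop is controlled by the integer i, so the
   number of iterations does not depend on how the processor rounds
   floating point intermediates.  out must have room for n + 1 values. */
int
fill_samples (double *out, double start, double step, int n)
{
  int i;

  for (i = 0; i <= n; ++i)
    out[i] = start + i * step;  /* computed fresh, not accumulated */
  return i;                     /* number of values written */
}
```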
Unfortunately, there is no rule of thumb that can be used to avoid these problems. Most compilers provide an option to restrict the use of extended precision (for GNU cc, the option is ‘-ffloat-store’). However, on the one hand, this merely shifts the portability problem elsewhere, and, on the other, the extended precision is often good rather than bad. Although these portability problems cannot be easily avoided, you should at least be aware of them if you write programs which require very precise floating point operations.
The IEEE 754 standard specifies certain flags which the floating point processor should make available (e.g., overflow, underflow, inexact), and specifies that there should be some control over the floating point rounding mode. Most processors make these flags and controls available; however, there is no portable way to access them. A portable program should not assume that it will have this degree of control over floating point operations.
The GNU cc compiler has several useful extensions, which are documented in the GNU cc manual. A program which must be portable to other C compilers must naturally avoid these extensions; the ‘-pedantic’ option may be used to warn about many accidental uses of an extension.
However, the GNU cc compiler is itself highly portable, and it runs on all modern Unix platforms as well as on Windows. Depending upon your portability requirements, you may be able to simply assume that GNU cc is available, in which case your program may use extensions when they are useful. Note that some extensions are inherently non-portable, such as inline assembler code, or using attributes to specify a particular section for a function or a global variable.
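When a program may use extensions only if they are available, a common pattern is to test the __GNUC__ macro, which GNU cc (and some compatible compilers) predefine, and supply a fallback for other compilers. In this sketch ATTRIBUTE_UNUSED and double_value are hypothetical names; the unused attribute itself is a documented GNU cc extension.

```c
/* Use the GNU cc extension when available; expand to nothing
   elsewhere, so the code still compiles with other compilers. */
#ifdef __GNUC__
# define ATTRIBUTE_UNUSED __attribute__ ((__unused__))
#else
# define ATTRIBUTE_UNUSED
#endif

/* A callback whose signature is fixed by some caller but which does
   not need its context argument; the attribute tells GNU cc not to
   warn that the parameter is unused. */
int
double_value (int value, void *context ATTRIBUTE_UNUSED)
{
  return value * 2;
}
```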
This document was generated by Ben Elliston on July 10, 2015 using texi2html 1.82.