Host endian independence
Joseph Myers
joseph@codesourcery.com
Thu Aug 29 16:43:00 GMT 2019
On Thu, 29 Aug 2019, Damien Zammit wrote:
> Firstly, endian-helpers.h is not a byte-swapping interface, it is a
> collection of functions that read/write streams in desired endian, which
> is something currently missing from glibc.
That reads like simply a personal preference for arranging things
differently, which isn't a good basis for changing how things are done
away from very well established and widely used Unix style (functions such
as htonl; interfaces such as htobe32 are simply a more modern version of
that, avoiding the historical "network" and "long" concepts implicit in
the older names). Avoiding such a header altogether means we can have a much
smaller patch that genuinely improves consistency (it reduces the number of
explicit tests of endianness in glibc) rather than changing things for one
person's individual taste. It also means we don't need to go through all the
many deviations in that header from GNU coding style.
> I can see that your preference is to reuse existing byte-swap
> interfaces. However, I am suggesting that byte-swapping, in general, is
> unnecessary and a kludge.
glibc isn't application code. It's system-level C, written for a
particular profile of common-usage architectures and ABIs. In
system-level C, expressing things in terms of converting between host byte
order and a particular byte-order for an external interface is entirely
appropriate. And interfaces such as htobe32 and be32toh make clear
exactly what conversions are taking place.
Furthermore, the implementations of those interfaces use __builtin_bswap*,
which are exactly the appropriate idioms for such byte-swapping in GNU C.
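
As a rough illustration of the point above, the sketch below converts a big-endian field from an external buffer to host order with be32toh, and shows one way such a conversion can be written with __builtin_bswap32. The names field_from_be and sketch_be32toh are hypothetical, and sketch_be32toh is an illustrative stand-in, not glibc's actual definition.

```c
#define _DEFAULT_SOURCE   /* asks glibc's <endian.h> to declare be32toh */
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch a 32-bit field that an external interface
   stores in big-endian order and return it in host byte order.  */
static uint32_t
field_from_be (const unsigned char *p)
{
  uint32_t raw;
  memcpy (&raw, p, sizeof raw);   /* bytes exactly as stored externally */
  return be32toh (raw);           /* big-endian -> host order */
}

/* Illustrative stand-in only (an assumption, not glibc's source): on a
   little-endian host the conversion is a byte swap, on a big-endian
   host it is the identity.  */
static inline uint32_t
sketch_be32toh (uint32_t x)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return __builtin_bswap32 (x);
#else
  return x;
#endif
}
```

The caller only states which external byte order is involved; whether a swap actually happens is decided at compile time from the host byte order.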
A plausible alternative would be to use the scalar_storage_order attribute
(added to GCC in 2015, so available in all GCC versions now supported for
building glibc) - subject to checking how well it is optimized compared to
the present code (e.g. if a field gets used multiple times, does the
compiler optimize that to only doing a single endian conversion on it?);
that could eliminate the need for any explicit endian conversions at all
in some places. But that would require careful consideration to gain
consensus before actually introducing any uses into glibc.
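
For reference, a minimal sketch of what the scalar_storage_order approach could look like, assuming GCC (the attribute is GCC-specific and not available in C++). The struct be_header and the function header_count are hypothetical and do not correspond to anything in glibc. It says nothing about how well the generated code is optimized, which is the open question mentioned above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical external header layout.  The attribute tells GCC that
   the scalar fields are stored big-endian, so the compiler inserts any
   needed conversion on each field access.  */
struct __attribute__ ((scalar_storage_order ("big-endian"))) be_header
{
  uint32_t magic;
  uint32_t count;
};

/* Copy the raw bytes into the struct, then read a field with no
   explicit endian conversion in the source.  */
static uint32_t
header_count (const unsigned char *bytes)
{
  struct be_header h;
  memcpy (&h, bytes, sizeof h);
  return h.count;
}
```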
> > By using those interfaces, tzfile.c, for example, could lose some of its
> > existing endian checks (that would be a very small local change to the
> > implementations of the decode and decode64 functions, larger changes are
> > not needed and make the code less clean because the logical information
> > that certain data is stored in the files in big-endian format is best kept
> > local to the implementations of those two functions, rather than
> > hardcoding that information in lots of places with read_be32 and read_be64
> > names).
>
> If a stream coming from a file is stored in big endian, why not be
> explicit in the naming of functions used to decode it so that it is
> clear which endian it is?
General principles of encapsulation of information in one place. There is
one piece of information "tzfile format uses big-endian" that is most
cleanly kept in one place (so avoiding typos from a single place
accidentally using *_le32 instead of *_be32, for example).
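
A minimal sketch of that encapsulation, assuming decode and decode64 are rewritten on top of be32toh and be64toh. This is illustrative only, not the actual tzfile.c code, and the exact signatures and return types here are assumptions.

```c
#define _DEFAULT_SOURCE   /* asks glibc's <endian.h> to declare be32toh/be64toh */
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* These two functions are the only places that know the file format is
   big-endian; callers just call decode/decode64.  */
static int32_t
decode (const void *ptr)
{
  uint32_t raw;
  memcpy (&raw, ptr, sizeof raw);
  return (int32_t) be32toh (raw);
}

static int64_t
decode64 (const void *ptr)
{
  uint64_t raw;
  memcpy (&raw, ptr, sizeof raw);
  return (int64_t) be64toh (raw);
}
```

If the format's byte order ever needed to be stated differently, only these two function bodies would change, and no call site could pick the wrong variant by accident.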
--
Joseph S. Myers
joseph@codesourcery.com