Host endian independence

Joseph Myers joseph@codesourcery.com
Thu Aug 29 16:43:00 GMT 2019


On Thu, 29 Aug 2019, Damien Zammit wrote:

> Firstly, endian-helpers.h is not a byte-swapping interface, it is a 
> collection of functions that read/write streams in desired endian, which 
> is something currently missing from glibc.

That reads like a personal preference for arranging things differently, 
which isn't a good basis for moving away from the very well established 
and widely used Unix style (functions such as htonl; interfaces such as 
htobe32 are simply a more modern version of that, avoiding the historical 
"network" and "long" concepts implicit in the older names).  Avoiding such 
a header altogether means we can have a much smaller patch that genuinely 
improves consistency (by reducing the number of explicit endianness tests 
in glibc), rather than one that changes things to suit one person's taste. 
It also means we don't need to go through the header's many deviations 
from the GNU coding style.

> I can see that your preference is to reuse existing byte-swap 
> interfaces. However, I am suggesting that byte-swapping, in general, is 
> unnecessary and a kludge.

glibc isn't application code.  It's system-level C, written for a 
particular profile of common-usage architectures and ABIs.  In 
system-level C, expressing things in terms of converting between host byte 
order and a particular byte-order for an external interface is entirely 
appropriate.  And interfaces such as htobe32 and be32toh make clear 
exactly what conversions are taking place.

Furthermore, the implementations of those interfaces use __builtin_bswap*, 
which are exactly the appropriate idioms for such byte-swapping in GNU C.

A plausible alternative would be to use the scalar_storage_order attribute 
(added to GCC in 2015, so available in all GCC versions now supported for 
building glibc), subject to checking how well it is optimized compared 
with the present code (e.g. if a field is used multiple times, does the 
compiler perform only a single endian conversion on it?).  That could 
eliminate the need for any explicit endian conversions at all in some 
places.  But it would require careful consideration to gain consensus 
before any uses were actually introduced into glibc.

> > By using those interfaces, tzfile.c, for example, could lose some of its 
> > existing endian checks (that would be a very small local change to the 
> > implementations of the decode and decode64 functions, larger changes are 
> > not needed and make the code less clean because the logical information 
> > that certain data is stored in the files in big-endian format is best kept 
> > local to the implementations of those two functions, rather than 
> > hardcoding that information in lots of places with read_be32 and read_be64 
> > names).
> 
> If a stream coming from a file is stored in big endian, why not be 
> explicit in the naming of functions used to decode it so that it is 
> clear which endian it is?

General principles of encapsulation of information in one place.  There is 
one piece of information "tzfile format uses big-endian" that is most 
cleanly kept in one place (so avoiding typos from a single place 
accidentally using *_le32 instead of *_be32, for example).

-- 
Joseph S. Myers
joseph@codesourcery.com



More information about the Libc-alpha mailing list