ELF octets_per_byte

Cary Coutant ccoutant@gmail.com
Thu Feb 25 00:13:00 GMT 2016


>   7.6 Variable Length Data
>
>   [...]
>   Unsigned LEB128 (ULEB128) numbers are encoded as follows: start at
>   the low order end of an unsigned integer and chop it into 7-bit chunks.
>   Place each chunk into the low order 7 bits of a byte. Typically,
>   several of the high order bytes will be zero; discard them. Emit the
>   remaining bytes in a stream, starting with the low order byte; set
>   the high order bit on each byte except the last emitted byte. The
>   high bit of zero on the last byte indicates to the decoder that it has
>   encountered the last byte.
>
> Note the use of the word "byte" here.  I could not find any reference to octets
> in the document, but I think that the implication is that a byte is the smallest
> addressable storage unit available on the target architecture, and not necessarily
> always an 8-bit quantity.  This does mean however that for targets with 32-bit
> bytes for example, LEB128 encoding is very wasteful of space...
>
> If the specification intends that a "byte" is an 8-bit quantity then it needs to
> specify how these 8-bit values are stored into a target storage unit when the
> storage unit is larger than 8 bits, (ie little endian vs big endian).  Plus it
> should state whether a LEB128 value is padded out to fill a whole number of
> storage units, or if they are packed in as tightly as possible.  Plus the
> algorithm in Appendix C ought to be extended to reference packing octets into
> bytes...

On machines where the smallest addressable unit is 12 bits or larger,
those units are typically called "words", not "bytes". It's standard
practice on such machines to continue to refer to smaller units that
can hold a character as bytes, even when they can't be individually
addressed. The DEC-10, for example, had 36-bit words and
byte-handling instructions flexible enough to handle an arbitrary byte
size, so you could pack four 8-bit bytes, five 7-bit bytes (the most
common), or six 6-bit bytes into a word.
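
To make the arithmetic concrete, here is a rough sketch of packing
fixed-width bytes into a 36-bit word. The names (bytes_per_word,
pack_word) are made up for illustration, and the left-justified layout
is an assumption of the sketch, not a description of any particular
instruction:

```python
WORD_BITS = 36  # word size from the example above


def bytes_per_word(byte_bits):
    """Return (bytes that fit in one word, leftover unused bits)."""
    return WORD_BITS // byte_bits, WORD_BITS % byte_bits


def pack_word(byte_values, byte_bits):
    """Pack fixed-width bytes into one word, first byte in the high bits.

    Left-justification is an assumption made for this sketch.
    """
    capacity, _ = bytes_per_word(byte_bits)
    if len(byte_values) > capacity:
        raise ValueError("too many bytes for one word")
    word = 0
    for index, value in enumerate(byte_values):
        if not 0 <= value < (1 << byte_bits):
            raise ValueError("value does not fit in the byte size")
        # Byte 0 occupies the top byte_bits bits, byte 1 the next, and so on.
        shift = WORD_BITS - byte_bits * (index + 1)
        word |= value << shift
    return word
```

With these definitions, bytes_per_word(8) gives (4, 4), bytes_per_word(7)
gives (5, 1), and bytes_per_word(6) gives (6, 0), matching the three
packings mentioned above.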

DWARF is an on-disk representation, so "byte" most definitely refers
to an 8-bit quantity. I don't think we really need to care how the
target architecture packs them into machine storage units.

The DWARF spec does say, "DWARF can be used with a wide range of
processor architectures, whether byte or word oriented, linear or
segmented, with any word or byte size." But it doesn't *really* mean
"any byte size" :-). As you've pointed out, the definition of LEB128
is wasteful when there are more than 8 bits in a byte, and doesn't
work at all when there are fewer. In practice, the target machine can
define a byte however it wants to, but the DWARF representation on
disk still needs to use 8-bit bytes. We could perhaps make that
clearer.
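
For reference, a minimal sketch of ULEB128 as described in the quoted
section 7.6, assuming 8-bit bytes. The function names are mine, not
from the spec or from binutils:

```python
def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128, one 8-bit byte per chunk."""
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned values only")
    out = bytearray()
    while True:
        chunk = value & 0x7F          # low-order 7 bits
        value >>= 7
        if value:
            out.append(chunk | 0x80)  # more chunks follow: set the high bit
        else:
            out.append(chunk)         # last byte: high bit stays clear
            return bytes(out)


def decode_uleb128(data):
    """Decode one ULEB128 value; return (value, number of bytes consumed)."""
    result = 0
    shift = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:           # high bit clear marks the last byte
            return result, index + 1
        shift += 7
    raise ValueError("truncated ULEB128 value")
```

Each emitted unit carries 7 payload bits plus the continuation flag,
so it needs at least 8 bits. If each unit were stored in a 32-bit
"byte", the remaining 24 bits of every unit would go unused.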

-cary


