ELF octets_per_byte

Dan dgisselq@verizon.net
Thu Feb 25 02:34:00 GMT 2016


Cary,

Please allow me to respond below,

On Wed, 2016-02-24 at 15:51 -0800, Cary Coutant wrote:
> > I am in the process of trying to port binutils to a new architecture,
> > the ZipCPU.  (You can find a description of it here:
> > https://opencores.org/project,zipcpu)  One unique "feature" of this
> > processor is that the size of the minimum addressable unit is 32-bits.
> >
> > While binutils has support for an "octets_per_byte" value other than
> > one, this feature does not appear to be fully supported.  Indeed, the
> > "bfd/elflink.c" file contains several "FIXME" lines regarding the
> > insufficiency of the current support.
> 
> It doesn't seem to me that having a minimum addressable unit of 32-bits
> necessarily makes your byte size 32 bits. That would actually surprise
> me quite a lot. You've simply described a word-addressed machine, and
> it would be quite sensible to continue to have four 8-bit bytes in each
> of those 32-bit words. 

While I would be tempted to agree with you wholeheartedly, the "octet"
versus "byte" distinction is one I reverse engineered from the gas
(assembler) code.  Within that code, there's a lot of support for
OCTETS_PER_BYTE being something other than 1, as well as for
OCTETS_PER_BYTE_POWER, defined such that
(1<<OCTETS_PER_BYTE_POWER)==OCTETS_PER_BYTE.  That said, only two other
CPUs appear to have OCTETS_PER_BYTE set to anything other than 1: the
TI-C4x and the TI-C54x.

Using this terminology, a "byte" is the minimum addressable unit,
whereas an "octet" is 8 bits.
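
To make the relationship concrete, here is a minimal sketch of the
conversion between the two units.  It assumes a ZipCPU-like target with
four octets per addressable byte (OCTETS_PER_BYTE_POWER of 2); the
helper names bytes_to_octets and octets_to_bytes are hypothetical and
are not taken from gas or BFD.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a ZipCPU-like target whose addressable unit is 32 bits,
   i.e. four octets per target "byte".  */
#define OCTETS_PER_BYTE_POWER 2
#define OCTETS_PER_BYTE (1u << OCTETS_PER_BYTE_POWER)

/* Convert a count of target bytes (addressable units) to octets.  */
static uint64_t
bytes_to_octets (uint64_t bytes)
{
  return bytes << OCTETS_PER_BYTE_POWER;
}

/* Convert a count of octets to target bytes.  The count must be a
   whole number of addressable units.  */
static uint64_t
octets_to_bytes (uint64_t octets)
{
  assert ((octets & (OCTETS_PER_BYTE - 1)) == 0);
  return octets >> OCTETS_PER_BYTE_POWER;
}
```

With these definitions, 3 target bytes occupy 12 octets on disk, and a
12-octet region spans 3 target addresses.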

This "OCTETS_PER_BYTE" feature has some support within BFD, and it is
in the hope of updating and correcting that support that I am writing.

> Does your C compiler evaluate sizeof(void *) to 1 or to 4?
> 

On the ZipCPU, ...

sizeof(char) = sizeof(int) = sizeof(void *) = 1 (32-bits)
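
As a reminder of why those sizes come out as 1: in C, sizeof counts in
units of char, so the width in bits is sizeof times CHAR_BIT.  The
sketch below uses a hypothetical helper, object_bits.  On a typical
host CHAR_BIT is 8; I am assuming a ZipCPU compiler would report 32,
which is what makes all three sizes above equal to 1.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* sizeof is measured in chars, so this holds on every conforming
   compiler, whatever the width of a char.  */
static_assert (sizeof (char) == 1, "sizeof counts in units of char");

/* Width in bits of an object whose sizeof value is size_in_chars.  */
static size_t
object_bits (size_t size_in_chars)
{
  return size_in_chars * CHAR_BIT;
}
```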


> Indeed, I'm not at all surprised that there are many FIXME's
> associated with a parameter named "octets_per_byte". The name just
> doesn't make sense, unless it's a float: I'd expect such a parameter
> to be 1 or some fraction close to 1 (e.g., for 6- or 7- or 9-bit bytes
> typical of some legacy architectures from the 60s and 70s). But I
> guess bfd actually does use "byte" to mean "word" for word-addressed
> machines. Sigh. That's not a definition I would have chosen. You can
> mentally translate "byte" to "word" inside bfd, and to "octet" when
> reading the ELF spec.
> 
> > All of these can be easily fixed, and I would like to propose a patch
> > (or series of patches) to do this.  The first part of this process will
> > need to be identifying which ELF variables/values are "bytes" (units of
> > the target's address space), and which are "octets" (8-bit values, units
> > of the more commonly used address space).  Sadly, these units are not
> > consistent with the meaning of "bytes" found within the ELF
> > specification, nor can they be since the ELF specification does not
> > acknowledge the potential difference between these two.
> 
> Is it really the case that we haven't yet seen a word-addressed
> machine that uses ELF?
> 

A quick grep for OCTETS_PER_BYTE in binutils/gas/config/*.h reveals that
only the two TI chips have this feature.

> ELF states right up front that it's designed for 8-bit bytes and
> 32-bit or 64-bit architectures. It's an on-disk file format, so in
> today's world, that means it's byte oriented, where "byte" means the
> same thing as "octet" (a term invented by standards bodies just to
> avoid any bias against machines where bytes weren't clearly 8 bits
> long). Other than that, though, it's defined in terms of C structures,
> so its format is completely defined by the ABI of your target
> architecture.
> 
> ELF also clearly separates the notions of a file offset (Elfxx_Off)
> from that of a program address (Elfxx_Addr). Anything that's declared
> in the ELF spec as Elfxx_Off is a file offset, specified in bytes
> (octets), while anything that's declared as Elfxx_Addr is a machine
> address, whatever that means on your target machine. Sizes, whether on
> disk or in memory, are consistently described as in bytes.
> 
> > For the purpose of beginning a discussion, and based upon a reading of
> > the ELF specification, I propose the following values be in units of
> > "octets":
> >
> > section size
> > section header size
> > section header offset
> >
> > For the most part, these values *must* be in octets, or it will be
> > impossible to read and process an ELF file.
> >
> > I also propose that the following values are in units of target address
> > space "bytes":
> >
> > ELF header "entry" address
> 
> Yes, this is an Elfxx_Addr.
> 
> > section header address
> 
> No, this is an Elfxx_Off. It's not an "address"; it's a "file offset",
> so it must be in bytes.

Looks like I may have gotten one wrong, then.  However, I think the
current functionality stores "bytes" and not "octets" into this field.
I'll have to go back and double check.
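
To illustrate why the units matter, here is a minimal sketch, assuming
one possible reading of the above: addresses are in target bytes, while
sizes and file offsets are in octets.  The structure and function names
are hypothetical, the value of OCTETS_PER_BYTE_POWER is assumed to be 2
as on a ZipCPU-like target, and this is not what BFD currently does.

```c
#include <assert.h>
#include <stdint.h>

#define OCTETS_PER_BYTE_POWER 2  /* assumption: 4 octets per target byte */

/* Hypothetical, trimmed-down view of a section header.  */
struct sketch_section
{
  uint32_t addr;    /* like sh_addr: target address, in target bytes */
  uint32_t offset;  /* like sh_offset: file offset, in octets */
  uint32_t size;    /* like sh_size: size, in octets */
};

/* First target address past the end of the section, in target bytes.
   The size has to be scaled down before it is added to an address.  */
static uint32_t
section_end_addr (const struct sketch_section *s)
{
  return s->addr + (s->size >> OCTETS_PER_BYTE_POWER);
}

/* First file offset past the section's contents, in octets.
   Offset and size share a unit, so no scaling is needed.  */
static uint32_t
section_end_offset (const struct sketch_section *s)
{
  return s->offset + s->size;
}
```

Under these assumptions, a section at target address 0x100 with a size
of 0x20 octets ends at target address 0x108, not 0x120.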

> 
> > symbol value
> 
> Yes, this is an Elfxx_Addr.
> 
> > symbol size
> 
> No, this is a size, in bytes.
> 
> > relocation offset
> 
> Yes, this is an Elfxx_Addr. It's described as an "offset", because it
> may be relative to the start of a section, but it's not a "file
> offset".
> 
> > relocation addend
> 
> This is just a pure number used in relocation processing, so it makes
> sense that it should be in the same units as a symbol value.
> 
> -cary

I'll go dig into the section header address to see if that needs
adjusting for how it is kept.

Dan


