This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Re: ELF octets_per_byte
- From: Cary Coutant <ccoutant at gmail dot com>
- To: dgisselq at ieee dot org
- Cc: Binutils <binutils at sourceware dot org>
- Date: Wed, 24 Feb 2016 15:51:49 -0800
- Subject: Re: ELF octets_per_byte
- References: <1456242622 dot 30661 dot 448 dot camel at jericho>
> I am in the process of trying to port binutils to a new architecture,
> the ZipCPU. (You can find a description of it here:
> https://opencores.org/project,zipcpu) One unique "feature" of this
> processor is that the size of the minimum addressable unit is 32-bits.
>
> While binutils has support for an "octets_per_byte" value other than
> one, this feature does not appear to be fully supported. Indeed, the
> "bfd/elflink.c" file contains several "FIXME" lines regarding the
> insufficiency of the current support.
It doesn't seem to me that having a minimum addressable unit of
32-bits necessarily makes your byte size 32 bits. That would actually
surprise me quite a lot. You've simply described a word-addressed
machine, and it would be quite sensible to continue to have four 8-bit
bytes in each of those 32-bit words. Does your C compiler evaluate
sizeof(void *) to 1 or to 4?
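One way to answer that question for a given toolchain is to ask the compiler directly. A minimal sketch in standard C; the function name pointer_width_bits is made up for illustration:

```c
#include <limits.h>
#include <stddef.h>

/* CHAR_BIT is the number of bits in the C "byte", i.e. the unit that
   sizeof counts in.  Multiplying gives the pointer width in bits. */
static size_t pointer_width_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}
```

Assuming 32-bit pointers, a compiler that keeps 8-bit chars would report CHAR_BIT == 8 and sizeof(void *) == 4, while one that treats the 32-bit word as its char would report CHAR_BIT == 32 and sizeof(void *) == 1. The product is 32 either way.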
Indeed, I'm not at all surprised that there are many FIXMEs
associated with a parameter named "octets_per_byte". The name just
doesn't make sense, unless it's a float: I'd expect such a parameter
to be 1 or some fraction close to 1 (e.g., for 6- or 7- or 9-bit bytes
typical of some legacy architectures from the 60s and 70s). But I
guess bfd actually does use "byte" to mean "word" for word-addressed
machines. Sigh. That's not a definition I would have chosen. You can
mentally translate "byte" to "word" inside bfd, and to "octet" when
reading the ELF spec.
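That mental translation can be made concrete. The sketch below is not bfd code: units_to_octets and octets_to_units are hypothetical helpers, and octets_per_unit stands in for the value bfd calls octets_per_byte (it would be 4 on a machine whose smallest addressable unit is 32 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Convert a count of target addressable units (bfd's "bytes") into
   8-bit octets (the ELF spec's "bytes"). */
static uint64_t units_to_octets(uint64_t units, unsigned octets_per_unit)
{
    return units * octets_per_unit;
}

/* Convert an octet count back into target addressable units.  The
   count is assumed to be a whole number of units. */
static uint64_t octets_to_units(uint64_t octets, unsigned octets_per_unit)
{
    assert(octets % octets_per_unit == 0);
    return octets / octets_per_unit;
}
```

With octets_per_unit set to 1, both helpers are the identity, which is the case on ordinary byte-addressed targets.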
> All of these can be easily fixed, and I would like to propose a patch
> (or series of patches) to do this. The first part of this process will
> need to be identifying which ELF variables/values are "bytes" (units of
> the target's address space), and which are "octets" (8-bit values, units
> of the more commonly used address space). Sadly, these units are not
> consistent with the meaning of "bytes" found within the ELF
> specification, nor can they be since the ELF specification does not
> acknowledge the potential difference between these two.
Is it really the case that we haven't yet seen a word-addressed
machine that uses ELF?
ELF states right up front that it's designed for 8-bit bytes and
32-bit or 64-bit architectures. It's an on-disk file format, so in
today's world, that means it's byte oriented, where "byte" means the
same thing as "octet" (a term invented by standards bodies just to
avoid any bias against machines where bytes weren't clearly 8 bits
long). Other than that, though, it's defined in terms of C structures,
so its format is completely defined by the ABI of your target
architecture.
ELF also clearly separates the notions of a file offset (Elfxx_Off)
from that of a program address (Elfxx_Addr). Anything that's declared
in the ELF spec as Elfxx_Off is a file offset, specified in bytes
(octets), while anything that's declared as Elfxx_Addr is a machine
address, whatever that means on your target machine. Sizes, whether on
disk or in memory, are consistently described in bytes.
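To illustrate the distinction, here is a rough sketch using a cut-down section header. The demo_* types are illustrative stand-ins for the ELF32 typedefs, not the real <elf.h> definitions, and the helper names and the octets_per_unit parameter are assumptions made for the example.

```c
#include <stdint.h>

typedef uint32_t demo_addr;  /* like Elf32_Addr: a machine address       */
typedef uint32_t demo_off;   /* like Elf32_Off: a file offset in octets  */
typedef uint32_t demo_word;  /* like Elf32_Word: a plain 32-bit count    */

/* Subset of a section header, keeping only the fields discussed. */
struct demo_shdr {
    demo_addr sh_addr;    /* where the section lives in memory        */
    demo_off  sh_offset;  /* where its contents start in the file     */
    demo_word sh_size;    /* its size, read here as a count of octets */
};

/* First file offset past the section: octets plus octets, no scaling. */
static demo_off file_end(const struct demo_shdr *s)
{
    return s->sh_offset + s->sh_size;
}

/* First machine address past the section.  Assumes addresses count
   units of octets_per_unit octets, so the size must be rescaled. */
static demo_addr mem_end(const struct demo_shdr *s, unsigned octets_per_unit)
{
    return s->sh_addr + s->sh_size / octets_per_unit;
}
```

On a byte-addressed target the two computations use the same arithmetic; they only diverge when an address unit is wider than one octet.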
> For the purpose of beginning a discussion, and based upon a reading of
> the ELF specification, I propose the following values be in units of
> "octets":
>
> section size
> section header size
> section header offset
>
> For the most part, these values *must* be in octets, or it will be
> impossible to read and process an ELF file.
>
> I also propose that the following values are in units of target address
> space "bytes":
>
> ELF header "entry" address
Yes, this is an Elfxx_Addr.
> section header address
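No, if this means the section header table offset (e_shoff), it is an
Elfxx_Off: a "file offset" rather than an "address", so it must be in
bytes. A section's sh_addr field, by contrast, is an Elfxx_Addr.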
> symbol value
Yes, this is an Elfxx_Addr.
> symbol size
No, this is a size, in bytes.
> relocation offset
Yes, this is an Elfxx_Addr. It's described as an "offset", because it
may be relative to the start of a section, but it's not a "file
offset".
> relocation addend
This is just a pure number used in relocation processing, so it makes
sense that it should be in the same units as a symbol value.
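As a rough illustration of why the addend shares the symbol's units: a simple absolute relocation computes S + A, where S is the symbol value and A is the addend. The sketch below uses a hypothetical function name and is not bfd's relocation code; it assumes both operands are expressed in target address units.

```c
#include <stdint.h>

/* S + A for a simple absolute relocation.  Both operands are taken to
   be in target address units, so no scaling is applied.  The addend is
   signed; unsigned wraparound handles negative values. */
static uint32_t reloc_value(uint32_t sym_value, int32_t addend)
{
    return sym_value + (uint32_t)addend;
}
```

Under that assumption, on a machine with 32-bit addressable units an addend of 4 moves the result four words (16 octets) forward, not four octets.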
-cary