[PATCH 0/9] RISC-V: Implement support for big endian targets

Sun Dec 20 10:00:06 GMT 2020

Nelson Chu <nelson.chu@sifive.com> writes:

> Yeah here should be the right place to develop and submit the patch.  I
> won’t have time in few days, so before that, it would be better to add the
> ChangLogs in your commits, and give more details of what you add? and why
> you choose these way to do?

Sure, I'll add a ChangeLog entry.

As for details of the changes, a lot of it is just boilerplate
creating new configs which are identical to the old little endian ones
except for being big endian.  I'll try to detail the ones that are not
boilerplate below.

* Flags -mbig-endian and -mlittle-endian added and documented

These allow the default endianness to be overridden.  It is standard
for dual endian arches to support these flags in one form or another.

* md_number_to_chars in gas/config/tc-riscv.c is changed to
  acknowledge target_big_endian

When encoding numbers (i.e. data) into the binary, the target
endianness should be taken into account.  Thus instead of always
calling number_to_chars_littleendian, the function now calls either
number_to_chars_littleendian or number_to_chars_bigendian depending
on target endianness.
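
As a rough sketch of that dispatch (not the code from the patch): the
helpers below are self-contained stand-ins carrying a sketch_ prefix,
with simplified signatures, and target_big_endian here is a local
variable that only mimics the gas global of the same name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the gas global; nonzero selects a big endian target.  */
static int target_big_endian = 0;

/* Store the low N bytes of VAL at BUF, least significant byte first.  */
static void
sketch_number_to_chars_littleendian (unsigned char *buf, uint64_t val, size_t n)
{
  assert (n <= sizeof val);
  for (size_t i = 0; i < n; i++)
    {
      buf[i] = (unsigned char) (val & 0xff);
      val >>= 8;
    }
}

/* Store the low N bytes of VAL at BUF, most significant byte first.  */
static void
sketch_number_to_chars_bigendian (unsigned char *buf, uint64_t val, size_t n)
{
  assert (n <= sizeof val);
  for (size_t i = n; i-- > 0; )
    {
      buf[i] = (unsigned char) (val & 0xff);
      val >>= 8;
    }
}

/* Data follows the target byte order, so pick the helper at run time.  */
static void
sketch_md_number_to_chars (unsigned char *buf, uint64_t val, size_t n)
{
  if (target_big_endian)
    sketch_number_to_chars_bigendian (buf, val, n);
  else
    sketch_number_to_chars_littleendian (buf, val, n);
}
```

With target_big_endian set, the 32-bit value 0x11223344 is stored as
the bytes 11 22 33 44; with it cleared, as 44 33 22 11.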

* install_insn in gas/config/tc-riscv.c is changed from
  using md_number_to_chars to using number_to_chars_littleendian

This is a consequence of the previous change.  Unlike data,
instructions are always little endian, and so it is no longer
appropriate to use md_number_to_chars.  Instead,
number_to_chars_littleendian is called directly.
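
To illustrate the difference between the two paths, here is a small
sketch (all sketch_ names are hypothetical, and target_big_endian is a
local stand-in for the gas global).  It stores the same 32-bit value
once as data and once as an instruction; 0x00000013 is the encoding of
the canonical nop (addi x0, x0, 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the gas global; this sketch assumes a big endian target.  */
static int target_big_endian = 1;

/* Store the low N bytes of VAL, least significant byte first.  */
static void
sketch_put_le (unsigned char *buf, uint32_t val, size_t n)
{
  assert (n <= sizeof val);
  for (size_t i = 0; i < n; i++)
    buf[i] = (unsigned char) ((val >> (8 * i)) & 0xff);
}

/* Store the low N bytes of VAL, most significant byte first.  */
static void
sketch_put_be (unsigned char *buf, uint32_t val, size_t n)
{
  assert (n <= sizeof val);
  for (size_t i = 0; i < n; i++)
    buf[n - 1 - i] = (unsigned char) ((val >> (8 * i)) & 0xff);
}

/* Data: byte order depends on the target.  */
static void
sketch_emit_data (unsigned char *buf, uint32_t val, size_t n)
{
  if (target_big_endian)
    sketch_put_be (buf, val, n);
  else
    sketch_put_le (buf, val, n);
}

/* Instructions: always little endian, whatever the target.  */
static void
sketch_emit_insn (unsigned char *buf, uint32_t insn, size_t n)
{
  sketch_put_le (buf, insn, n);
}
```

On a big endian target the data word comes out as 00 00 00 13, while
the instruction still comes out as 13 00 00 00.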

* Changes to function calls for instruction access in
  bfd/elfnn-riscv.c

The relocation processing functions in bfd/elfnn-riscv.c were using a
mix of bfd_get_32, bfd_put_32, bfd_put_16, bfd_get, and bfd_put to
access instructions in binary format.  Since all of these take the
target endianness into account, they are inappropriate for
instructions, which are always little endian.

For bfd_get_32, bfd_put_32, and bfd_put_16, I simply changed these
calls to use bfd_getl32, bfd_putl32, and bfd_putl16 instead, which was
the minimal change required to make them correct.  In the case of
bfd_get and bfd_put, there was no direct equivalent which was always
little endian (number_to_chars_littleendian can't be used since it
lives in gas, not bfd), so I had to create new macros for that, which
I called riscv_get_insn and riscv_put_insn.  The implementation is
identical to that of bfd_get and bfd_put, except that bfd_getl* and
bfd_putl* are used instead of bfd_get_* and bfd_put_*, and that the
8-bit case is not handled (since there are no 8-bit instructions on
RISC-V).  Thus, they can handle 16-, 32-, and 64-bit instructions and
instruction clusters, which is what is in use today, but they could be
extended to also handle 48-bit and 80-bit ones should the need arise.
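
The idea behind riscv_get_insn and riscv_put_insn can be sketched as
follows.  This is not the macro text from the patch: the sketch_
functions are hypothetical, take a plain byte pointer instead of a
bfd, and use local little endian helpers in place of bfd_getl* and
bfd_putl*.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Read BYTES bytes at P as a little endian value.  */
static uint64_t
sketch_getl (const unsigned char *p, unsigned bytes)
{
  uint64_t v = 0;

  assert (bytes <= sizeof v);
  for (unsigned i = bytes; i-- > 0; )
    v = (v << 8) | p[i];
  return v;
}

/* Write the low BYTES bytes of V at P in little endian order.  */
static void
sketch_putl (uint64_t v, unsigned char *p, unsigned bytes)
{
  assert (bytes <= sizeof v);
  for (unsigned i = 0; i < bytes; i++)
    {
      p[i] = (unsigned char) (v & 0xff);
      v >>= 8;
    }
}

/* Fetch an instruction (or cluster) of BITS bits; no 8-bit case.  */
static uint64_t
sketch_riscv_get_insn (unsigned bits, const unsigned char *p)
{
  if (bits != 16 && bits != 32 && bits != 64)
    abort ();
  return sketch_getl (p, bits / 8);
}

/* Store an instruction (or cluster) of BITS bits; no 8-bit case.  */
static void
sketch_riscv_put_insn (unsigned bits, uint64_t val, unsigned char *p)
{
  if (bits != 16 && bits != 32 && bits != 64)
    abort ();
  sketch_putl (val, p, bits / 8);
}
```

Neither function consults the target byte order, which is the whole
point: the result is the same for big and little endian objects.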

* Changes to function calls for instruction access in
  gas/config/tc-riscv.c

Here md_number_to_chars was used to write back modified branch/jump
instructions.  Since this is in gas, number_to_chars_littleendian
could have been used as a replacement, but I opted to use bfd_putl32
instead, since the size was statically known and the call to _read_
the instruction was bfd_getl32.

* perform_relocation in bfd/elfnn-riscv.c applies a heuristic to
  determine if relocation is against data or instructions

Since data and instructions can have different byte order, it is
important to know if the target of a relocation is one or the other
when updating the value in memory.  The function perform_relocation
operates on both instruction and data relocations, and thus needs to
decide based on the reloc_howto_type which it is.  I added a function
riscv_is_insn_reloc for this purpose.  The current implementation uses
the following heuristic:

  o  8-bit relocations must be data since there are no 8-bit
     instructions

  o  Otherwise, if the dst_mask covers the entire bitsize, then it
     must be data (since otherwise it would modify the opcode)

  o  Otherwise, it is an instruction

This is basically the same heuristic which is already in place for
aarch64, another bi-endian architecture which has instructions that
are always little endian.  AFAICT it yields the correct result for all
current relocations.  If needed, special cases could be added for
individual values of howto->type.
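
Spelled out as code, the three rules look roughly like this.  It is a
sketch only: struct sketch_howto is a hypothetical stand-in holding
just the two reloc_howto_type fields the heuristic looks at, and the
function is not the riscv_is_insn_reloc from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down howto: only the fields the heuristic needs.  */
struct sketch_howto
{
  unsigned bitsize;   /* Width of the relocated field in bits.  */
  uint64_t dst_mask;  /* Bits of the destination that get rewritten.  */
};

/* Return true if the relocation is assumed to patch an instruction.  */
static bool
sketch_is_insn_reloc (const struct sketch_howto *h)
{
  uint64_t full;

  assert (h->bitsize <= 64);

  /* Rule 1: there are no 8-bit instructions, so this must be data.  */
  if (h->bitsize <= 8)
    return false;

  /* Mask with the low BITSIZE bits set, avoiding a shift by 64.  */
  full = h->bitsize == 64 ? UINT64_MAX
                          : (UINT64_C (1) << h->bitsize) - 1;

  /* Rule 2: rewriting every bit would clobber an opcode, so data.  */
  if ((h->dst_mask & full) == full)
    return false;

  /* Rule 3: a partial mask leaves opcode bits alone, so instruction.  */
  return true;
}
```

For example, a 32-bit relocation with dst_mask 0xffffffff is classified
as data, while a 32-bit relocation with dst_mask 0xfffff000 (only the
upper 20 bits rewritten, as for a J-type immediate) is classified as
an instruction.  The mask values are illustrative.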

  // Marcus