Problems with relocations for a custom ISA

Tue Aug 8 17:26:42 GMT 2023

Am 2023-08-08 um 17:35 schrieb Michael Matz:
> Hello,
>
> On Tue, 8 Aug 2023, MegaIng wrote:
>
>>>> Most of the basics I already managed to implement, i.e. I can generate
>>>> simple
>>>> workable ELF files. However, I am running into problems with relocations
>>>> for
>>>> "load immediate" instructions. Without extensions, we want to potentially
>>>> emit
>>>> long chains of instruction (3 to 8 instructions is realistic), but with
>>>> proper
>>>> extensions in can get down to only 1 instruction of 3 or 4 bytes. I am
>>>> unsure
>>>> how to best represent such variable length relocations in BFD and ELF.
>>> The normal way would be to not do that.  It seems the assembler will
>>> already see either a long chain of small insns, or a single large insn,
>>> right?
>> Our idea was that the user can use a simple pseudo instruction to
>> represent the entire process of loading a symbol (or any immediate for
>> that matter).
> Pseudo instruction makes sense.  But then it would still be the assembler
> that expands it to either a couple base insns or a single extended insn.
> The linker would see only one or the other, and hence also only the base
> or the extended relocs.
>
> Or did you really want to reserve some specific byte encoding for this
> pseudo instruction to transfer it from assembler via object file to linker
> and let only the linker replace that by one or the other variant?  That
> seems an unnecessarily complicated scheme.  It depends on if the assembler
> does or doesn't know if it can target the extended insns, or only the base
> ones.  I would definitely suggest that the assembler at latest should know
> this.

It wasn't our idea to have a specific bit pattern reserved for that, that
would be quite weird, I agree :-) I think the linker needs knowlegde about
which extensions are available, for that we would use an attributes section
similar to what RISC-V seems to use. (although, maybe we don't need it if we
have many relocation types)

>>> (obviously details will differ, your 16bit insns won't be able to quite
>>> set all 16 bits :) ).
>>> If you really want to optimize these sequences also at link time (but
>>> why?) then all of this becomes more complicated, but remains essentially
>>> the same.  The secret will then be in linking from one of the small relocs
>>> (say, the high16 one) to the other, for the linker to easily recognize the
>>> whole insn pair and appropriately do something about those byte sequences.
>>> In that scheme you need to differ between relocations applied to relaxable
>>> code and relocation applied to random non-relaxable data.  E.g. you
>>> probably need two variants of the RELOC_LOW16 relocation.
>> Not sure if you took a look at our instruction set: The way you would load an
>> arbitrary 16bit word is via a sequence of `slo` (shift left 5 and or)
>> instructions which use a 5bit immediate (the largest we have in base). So
>> breaking it up into two RELOC_LOW_16 or similar wouldn't quite work.
> Sure, as I said above: "obviously details will differ".
>
>> It would have to be 3-4 RELOC_BITS_0_4, RELOC_BITS_5_9 RELOC_BITS_10_15
>> or something like that. And you couldn't exactly remove one of those
>> without changing the others.
> Yes, this is the usual way to express that.  There are many architectures
> which have similar ISA restrictions and they all do it essentially the
> same way: "select X bits from value, put them into Y bits of field", for
> potentially many combinations of (not necessarily consecutive) X and Y.
>
>> But ofcourse, we don't always need all 4
>> instructions, sometimes we can get away with only two or three, for
>> example if it's only an 8bit value, we only need 2 instructions. We
>> would like to optimize these cases somewhere.
> I see.  Yeah, that will ultimately need some linker relaxation as only
> that one will know for sure which values symbols have, and hence if they
> do or do not fit certain constraints.
>
>> After a bit more
>> discussion we came to the idea of having many relocations that
>> potentially cover multiple instructions so that the entire
>> load-immediate sequence can be covered by one relocation,
> As you have only such a short immediate field in the base ISA this seems
> like a sensible idea, as otherwise, as you say, you need 7 relocations
> (and insns) for a full 32bit load.
>
>> but this is quite a large amount of relocations.
> Hmm?  I don't understand this remark.  If you cover a range of
> instructions by one relocation you necessarily need fewer relocs than if
> you use one reloc per insn?

I was considering a large amount of relocation types as a drawback, but 
I now realize
that this can't be avoided no matter which path we chose. We are now 
going to have the
large multi-instruction relocations that can be relaxed one instruction 
at a time instead
of the bit-selection relocations.

>>> I wouldn't go that way if I were you: it seems the assembler/compiler
>>> needs to know if targeting the extended ISA or not anyway, so generating
>>> the right instructions and relocations from the start in the assembler
>>> seems the right choice, and then doesn't need any relax complications at
>>> link time.
>> As long as the range (or even the exact value) of the symbol is known at
>> assembly time, this is ofcourse true, but what about situations where nothing
>> about the range of the value is known?
> The compiler/assembler would always emit the full sequence (e.g. assumes
> that the symbol in question happens to be full 32bit).  If you want to
> optimize this use in case the symbol happens to need fewer bits, then yes,
> you do need linker relaxation.  As said, you then need a way in the linker
> to recognize an insn sequence that "belongs" together, so that you can
> appropriately optimize this, either by referring from one to the next
> reloc in such a chain, or by simply assuming that such sequences are
> always done in a certain order (i.e. a simple pattern match; unrecognized
> patterns would remain unrelaxed/unoptimized).
>
> The basic form of relocations doesn't depend on that, though.  You still
> need to differ between the lowest N bits of the requested value, the next
> N bits, the next N bits, and so on, so you do need roundup(32/N) reloc
> types either way.
>
> By restricting certain insn sequences and flexibility you can get away
> with fewer relocations than this.  E.g. with your idea of covering
> multiple insns with one reloc.  Say, if you require that the low 10 bits
> of a value are always set in this way (and given your ISA that makes
> sense):
>
>     shiftset5 %r1, bit04(sym)
>     shiftset5 %r1, bit59(sym)
>
> and never with another insn in between, and never in a difference order,
> then of course you can get away with a relocation (say) RELOC_SHIFTSET10,
> that takes the low 10 bits of 'sym' and appropriate distributes those 10
> bits into the right 5 bit field of the instruction.  It would implicitely
> cover both instructions, i.e. a 32bit place in the code section.
>
> If you extend this idea to cover seven instructions of the base ISA you
> can get away with a single reloc that is able to set the whole 32bit of a
> value (at the expense of not being able to place unrelated instructions
> between those seven).
My primary interested is to support to load-immediate pseudo opcode, so
I am not going to worry about stuff users could manually write. I don't 
think
there could ever be a benefit to put instruction in the middle of that, so I
am not gonna worry about that.
Although, we might have to split into multiple relocations since bfd 
set's an
upper limit on the amount of bytes a relocation can cover by using a 4-wide
bitfield for that.
>> It seems like other assembler targets truncate the values in those
>> cases? If we went for the minimal representation we would basically
>> limit external symbols to 5bit, which isn't exactly ideal. And from what
>> I can tell, growing a relocation also isn't really something bfd is
>> designed to deal with, right?
> I'm not super fluent in the actual implementation of bfd linker
> relaxation.  But I don't see why it can't also grow sections.  It's true
> that the usual relaxation shrinks sizes, and it's probably better to
> follow that as well, but in principle enlarging is no proble either (if
> you enlarge _and_ shrink in your relaxation you can run into
> endless oscillation between the two, so that needs to be watched for).
>
> But one thing about terminology: relocations themself don't grow or
> shrink.  A relocation in principle applies to a certain address without
> range.  The semantics of a specific relocation type will usually say that
> these-and-those bits in a field will be changed by it, and you can say
> that that's the size of a relocation.  But not all relocations are like
> that, and nothing really prevents you from either changing the relocation
> type when you want something else (in linker relaxation), or even defining
> a funny type that applies to either (say) a byte or a word, as needed.
> You need to implement special functions for such relocs then, and can't
> use the generic simple BFD reloc howto model, but still.
>
> Just to expand on this: in principle one could invent a relocation type
> that says "when the symbol has value '1' change the byte 45 bytes
> from here to 42, when it has another value then encode that one into the
> word 7 bytes from here".  That's obviously a crazy semantics for a
> relocation, but nothing inherently prevents you from that.  (Of course,
> making sure that there actually _is_ something 45 bytes from the relocs
> place is a problem :) )  The "size" of such relocation wouldn't be
> well-defined anymore (or be 46), but what I'm saying is, that this is
> okayish.
>
> What does grow or shrink is the section content, and hence distance
> between labels might change during relaxation, which requires delaying
> resolving jumps until relaxation time as well.  This can get quite slow at
> link time (riscv is plagued by this).  Just to make you aware :)
Yeah, thank you, my word choice was a bit confused. The speed penalty is 
something
we are probably not gonna worry about for the moment, but we will keep 
it in mind.
> One remark: you _really_ should think long and hard about your immediate
> size in the base ISA.  5 bits is terribly small.  Maybe you can snatch
> away some bits here and there in your 16bit insns to make this 8 bits
> (something that divides 32 would be ideal), but even 6 would bring the
> full-32-bit sequence from 7 to 6 instructions.
This is something we had discussed a few times and came to the conclusion
that we prefer the current encoding. We wanted 16bit opcodes and 
byte-aligned
sections and from there the choices do get quite limited. We also wanted
a simple encoding, so we didn't want to have too many complex tricks.
>
> Ciao,
> Michael.
Thank you for taking your time :-)