This is the mail archive of the
binutils@sourceware.cygnus.com
mailing list for the binutils project.
Re: Link-time relaxing for m68k ELF
- To: binutils at sourceware dot cygnus dot com
- Subject: Re: Link-time relaxing for m68k ELF
- From: msokolov at ivan dot Harhan dot ORG (Michael Sokolov)
- Date: Tue, 18 Apr 00 12:08:52 CDT
Ian Lance Taylor <ian@zembu.com> wrote:
> If you want to handle all possible cases, you will need to have a
> quite complex set of opcode translations.
I don't know what you mean by all possible cases; I was talking about what the
assembly-time relaxer in gas does now. It handles:
* Branches, all 4 kinds: unconditional, conditional, coprocessor, and DBcc.
* PC-relative operands. There is the 16-bit one which is available on all CPUs
and is simple (the operand consists only of the displacement word), and on
68020 and higher there is a 32-bit one. The latter is inconvenient and
expensive, as it involves the wacky 68020 addressing modes and an additional
word in the operand besides the two displacement words, so relaxing these is
important.
* PC-relative + index register operands. These involve a complicated addressing
mode on all CPUs. On 68000 they can only be 8-bit, and on 68020 and higher they
can also be 16-bit or 32-bit.
* Relaxing an absolute reference operand, which takes two words, into a 16-bit
PC-relative one, which takes one word, if it fits. Works on all CPUs.
I think this actually does cover all possible cases of PC-relative relocs in
assembly code.
The problem with the current relaxer (and not a hypothetical one, but one that
really hits me in my m68k work) is that the (arguably most important) case of
branch relaxation doesn't adequately handle the different CPU types and PIC
styles. Long branches are available on 68020 and higher, but not on 68000. The
current relaxer doesn't put enough emphasis on this. Also the current relaxer
feels free to replace branches with absolute jumps (primarily on 68000 when it
wants a long branch which is not available, but also in some other cases on all
CPUs). This is a problem for PC-relative position-independent code.
I want a new relaxer that will look at the CPU type, take directions from the
user as to whether conversion to absolute jumps is OK or not, and based on
these two factors plus the distance to the target choose the best possible
allowed instruction or sequence. The current relaxer isn't really suited to this
kind of functionality and would have to be reimplemented. Given the choice, I
would much rather reimplement it in the linker than in the assembler, which
brings me to the present.
> It will be more tractable
> to only handle cases which could reasonably be generated by a
> compiler.
What I want to implement, and what I have the guts for, is for the linker
relaxer to handle the same things the gas relaxer handles. They are all really
ordinary and occur all the time in normal code, compiler-generated or hand-
written.
Actually the approach of generating the longest worst-case instruction or
sequence in the assembler and then having the linker shrink it as much as it
can makes this quite easy to implement. The longest worst-case instruction or
sequence generation step is where we need to consider the CPU type and the
user's PIC requirements. As this is still in the assembler, this is easy. For
branches there will be four choices for the assembler in this order of
preference:
1. If we are on a 68020 or higher, generate a long branch.
2. If we are on a 68000 and absolute jumps are OK, generate one.
3. If we are on a 68000, absolute jumps are not allowed, but the branch is
actually a C function call that follows the C calling convention (we'll need an
assembly-level indication of this), we can generate a sequence like the
following, effecting a 32-bit PC-relative call:
    .word 0x203C                | movel #imm,%d0
    .long <R_68K_PC32 reloc>
    .word 0x4EBB                | jsr disp8(%pc,%d0.l)
    .word 0x08FA                | (index reg is %d0.l, disp8 such that the
                                | R_68K_PC32 reloc above works normally)
4. If none of the above can be used, generate a word-sized branch. If it
overflows, we did our best. But if it's actually very short, it can be relaxed
to a byte-sized branch.
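
The order of preference above can be sketched as a small selection function. This is only an illustration in Python, not gas code: the names `Worst` and `worst_case_branch` and the three boolean parameters are made up for the example, and the "C call" flag stands in for the assembly-level indication mentioned in choice 3.

```python
from enum import Enum


class Worst(Enum):
    """Hypothetical labels for the four worst-case choices."""
    LONG_BRANCH = "long branch (68020 and higher)"
    ABS_JUMP = "absolute jump"
    PC32_CALL_SEQ = "movel/jsr sequence with an R_68K_PC32 reloc"
    WORD_BRANCH = "word-sized branch"


def worst_case_branch(has_long_branch, abs_jumps_ok, is_c_call):
    """Pick the longest form the assembler should emit for a branch.

    has_long_branch: True on 68020 and higher, False on 68000.
    abs_jumps_ok:    the user allows conversion to absolute jumps.
    is_c_call:       the branch is a call following the C calling
                     convention, so overwriting %d0 is acceptable.
    """
    if has_long_branch:
        return Worst.LONG_BRANCH
    if abs_jumps_ok:
        return Worst.ABS_JUMP
    if is_c_call:
        return Worst.PC32_CALL_SEQ
    # Nothing longer is allowed; a word-sized branch may still overflow.
    return Worst.WORD_BRANCH
```

For example, `worst_case_branch(False, False, True)` selects the PC32 call sequence, and `worst_case_branch(False, False, False)` falls back to the word-sized branch.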
For each relaxable instruction or sequence generated, emit a relaxability
marker indicating its type. Then the link relaxer's job is straightforward: for
each type of relaxable instruction or sequence, there is only one type of
instruction it can be relaxed to, which itself may or may not be further
relaxable. This makes the link relaxer a simple state machine. The above
sequences will all relax to simple word-sized branches, which will be further
relaxable to byte-sized ones if they are simple processor branches, but not if
they are coprocessor branches or DBcc's.
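
A minimal sketch of such a state machine, in Python for illustration only. The type names and the `NEXT_SHORTER` table are hypothetical, and the sketch takes the displacement as given, ignoring the fact that shrinking one instruction moves everything after it.

```python
# Each relaxable type has exactly one shorter type it can become, or None.
NEXT_SHORTER = {
    "long_branch": "word_branch",
    "abs_jump": "word_branch",
    "pc32_call_seq": "word_branch",
    "long_cp_branch": "word_cp_branch",
    "word_branch": "byte_branch",   # plain processor branch
    "word_cp_branch": None,         # coprocessor branch: no byte form
    "word_dbcc": None,              # DBcc: no byte form
    "byte_branch": None,
}


def fits(kind, disp):
    """Return True if displacement `disp` is encodable in form `kind`."""
    if kind in ("word_branch", "word_cp_branch", "word_dbcc"):
        return -0x8000 <= disp <= 0x7FFF
    if kind == "byte_branch":
        # An 8-bit displacement of 0x00 means "16-bit displacement follows"
        # and 0xFF means "32-bit displacement follows" on 68020 and higher,
        # so neither value can be used as a real byte displacement.
        return -0x80 <= disp <= 0x7F and disp not in (0, -1)
    return True


def relax(kind, disp):
    """Follow the table until the next shorter form no longer fits."""
    while True:
        shorter = NEXT_SHORTER[kind]
        if shorter is None or not fits(shorter, disp):
            return kind
        kind = shorter
```

With this table, `relax("pc32_call_seq", 100)` ends at `"byte_branch"`, while `relax("long_cp_branch", 100)` stops at `"word_cp_branch"`.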
Regarding the wacky R_68K_PC32 call sequence for 68000 above, I will have two
more in that spirit. In addition to branches, other PC-relative references that
constantly occur in normal code are PC-relative operands. These are currently
relaxed between 16-bit and 32-bit on 68020, but only 16-bit is available on
68000. When it overflows, you are SOL. However, in practical compiler-generated
PC-relative code all actual PC-relative operands are in lea and pea
instructions. The following sequences will effect R_68K_PC32 lea and pea for
68000:
    .word 0x41FA                | lea disp16(%pc),%a0
    .word 0x0004                | (disp16 such that the R_68K_PC32 reloc below
                                | works normally)
    .word 0xD1FC                | addal #imm,%a0
    .long <R_68K_PC32 reloc>

    .word 0x487A                | pea disp16(%pc)
    .word 0x0004                | (disp16 such that the R_68K_PC32 reloc below
                                | works normally)
    .word 0x0697                | addil #imm,(%sp)
    .long <R_68K_PC32 reloc>
The linker will then relax them to normal R_68K_PC16 lea and pea using its
normal relaxing state machine just like ordinary single instructions.
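
The displacement constants in these sequences can be checked with a little arithmetic. The sketch below is an illustration in Python, not linker code; the function names are made up. It assumes the usual definitions: an R_68K_PC32 reloc stores S + A - P (symbol, addend, address of the relocated field), and an m68k PC-relative operand is computed from the address of its extension word.

```python
MASK = 0xFFFFFFFF  # 32-bit address arithmetic


def pc32(symbol, place, addend=0):
    """Value stored by an R_68K_PC32 reloc: S + A - P."""
    return (symbol + addend - place) & MASK


def lea_seq_result(start, symbol):
    """Address left in %a0 by the lea/addal sequence placed at `start`.

    Layout: lea opcode (start), disp16 (start+2), addal opcode (start+4),
    relocated long (start+6). The pea/addil sequence has the same layout.
    """
    ext = start + 2               # PC base is the extension word address
    a0 = (ext + 4) & MASK         # lea 4(%pc),%a0
    place = start + 6             # where the R_68K_PC32 long lives
    return (a0 + pc32(symbol, place)) & MASK


def jsr_seq_target(start, symbol):
    """Address called by the movel/jsr sequence placed at `start`.

    Layout: movel opcode (start), relocated long (start+2),
    jsr opcode (start+6), extension word 0x08FA (start+8).
    """
    place = start + 2
    d0 = pc32(symbol, place)      # movel #imm,%d0
    ext = start + 8
    disp8 = -6                    # low byte 0xFA of the extension word
    return (ext + disp8 + d0) & MASK
```

In both cases the fixed displacement makes the PC-relative base equal to the address of the relocated long, so the sum comes out to the symbol address regardless of where the sequence is placed.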
> [using special relocs to indicate relaxability and relocation slots under
> .rela for control information]
>
> This is weird but I think it's doable.
But that's how all link-time relaxing is, isn't it? :-)
> An alternative would be to only permit relaxing for a direct symbolic
> reference, and to use the r_addend field to hold the offset to the
> start of the instruction. A non-direct reference requires a
> conventional non-relaxable reloc.
No, I don't like this. Given the above wacky sequences and all that, relaxing
is really important to me, and I don't want things to be needlessly declared
non-relaxable.
> Or, similarly, you can arbitrarily declare that r_addend is limited to
> 29 bits, and use 3 bits for the offset to the start of the
> instruction. A reference using a larger addend requires a
> conventional non-relaxable reloc.
>
> Or you could burn relocation space, and allocate 24 relocs, and use
> three bits for the offset to the start of the instruction.
No, this would be ugly and come back to haunt you when you handle them as
normal relocs. Also the offset to the beginning of the instruction is not all I
need. I also want to store an explicit relaxing mode code somewhere, as trying
to recover it from the opcode would require a full m68k disassembler in BFD
(and we've got the wacky sequences too, not just single instructions!). Storing
an explicit code somewhere would allow implementing the link relaxer as a
straightforward state machine.
> If you want to go really crazy, you could relax GOT and PLT
> references, too. Then you would need even more relocations. But it's
> probably not worth it.
No, I won't touch anything having to do with ELF UNIX ABIs and shared
libraries, only normal processor functionality usable by embedded systems and
old-fashioned UNIXophiles like me.
Here's what I'm thinking about. Rather than use any special relocs at all, what
about creating special sections with relaxation control information? I.e.,
have, say, .gnurelax.text going in parallel with .text and .rela.text. Have
SHF_ALLOC off for these sections so that they don't occupy address space and
have the standard link scripts discard them. All relocs will still be normal,
so objects assembled for GNU link relaxing will still comply with the ABI and
be usable by other linkers, even if inefficient. Have the relaxing routine
parse the relaxing control section and run the state machine encoded there.
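
To make the idea concrete, here is one way such a control section could be read. Everything in it is an assumption made for illustration: no record format has been defined, so the 5-byte big-endian record (a 32-bit offset into .text followed by an 8-bit relaxing mode code) and the name `parse_relax_section` are hypothetical.

```python
import struct

# Hypothetical record layout: 32-bit big-endian offset of the relaxable
# instruction or sequence within .text, then an 8-bit relaxing mode code.
RECORD = struct.Struct(">IB")


def parse_relax_section(data):
    """Split raw section contents into (offset, mode) tuples."""
    if len(data) % RECORD.size:
        raise ValueError("truncated relaxation record")
    return [RECORD.unpack_from(data, pos)
            for pos in range(0, len(data), RECORD.size)]
```

The relaxing routine would then walk this list and run the state machine on each entry, leaving the ordinary relocs in .rela.text untouched.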
The more I think about it, the more I like this better than special relocs.
Ian, what do you think?
--
Michael Sokolov Harhan Engineering Laboratory
Public Service Agent International Free Computing Task Force
International Engineering and Science Task Force
615 N GOOD LATIMER EXPY STE #4
DALLAS TX 75204-5852 USA
Phone: +1-214-824-7693 (Harhan Eng Lab office)
E-mail: msokolov@ivan.Harhan.ORG (ARPA TCP/SMTP) (UUCP coming soon)