This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Questions regarding address relaxation on IA-64
- From: "Alexander Monakov" <monoid at ispras dot ru>
- To: binutils at sourceware dot org
- Date: Thu, 22 Mar 2007 20:21:26 +0300
- Subject: Questions regarding address relaxation on IA-64
Hi,
Recently, I ran into a problem with compiler-linker interaction while
analyzing a code generation regression in the new GCC instruction
scheduler that we are developing.
It seems that on IA-64 the addresses of global variables are loaded with
two instructions: "addl rXX = <offset>, r1" and "ld8 rXX = [rXX]", with
the latter later being changed to a "nop" by the linker. This raises the
following questions:
* Is the purpose of the "ld8" instruction to load the correct offset if
it does not fit into "addl" immediate operand?
* Is it possible to use "movl rXX = <offset>" (move long immediate, in an
MLX bundle) + "add rXX = r1, rXX" for the same purpose?
* Is it possible to tell the compiler and the linker that offsets will be
small enough that only "addl rXX = <offset>, r1" will be needed (and if it
is not possible, why)?
I have noticed that with -mno-pic GCC generates "movl rXX = <address>"
(MLX bundle). This raises a couple of questions, too:
* Is it possible to use "mov rXX = <offset-or-address>" (short immediate
form) + "ld8 rXX = [rXX]", with the ld8 being changed to a "nop" by the
linker if necessary?
* Why is addl+ld8 preferred in PIC code, and movl in non-PIC code?
The problem is as follows: the benchmark contains a very frequently called
function that accesses a number of global variables. For loads of those
variables' addresses, GCC generates something like this:
    addl r46 = <offset1>, r1
    addl r47 = <offset2>, r1
    ...
    addl r56 = <offsetN>, r1
    ld8 r46 = [r46]
    ...
    ld8 r56 = [r56]
On Itanium 2, 8-byte loads can issue from memory ports 0 and 1 only, so
our scheduler places a stop bit after each pair of ld8s to avoid stalls
due to resource oversubscription. The previous scheduler was less careful
about this, which turned out to be a big advantage: the linker changed all
of the ld8s to nops, so the code generated by the new scheduler waited
unnecessarily on the extra stop bits.
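
To make the cost concrete, here is a rough back-of-the-envelope sketch in
Python. It is only a model under stated assumptions, not a description of
either scheduler: it treats every stop bit as ending an instruction group,
counts one group per cycle, and ignores everything except how many
instructions are allowed between stops. The function names, the load count
of 10, and the "packed" limit of 4 per group are hypothetical values chosen
for illustration.

```python
def instruction_groups(num_insns: int, max_per_group: int) -> int:
    """Groups needed when a stop bit follows every `max_per_group` instructions."""
    if num_insns < 0 or max_per_group <= 0:
        raise ValueError("need num_insns >= 0 and max_per_group > 0")
    # Ceiling division without floating point.
    return -(-num_insns // max_per_group)


def wasted_groups(num_loads: int, load_ports: int = 2, packed_per_group: int = 4) -> int:
    """Extra groups kept alive by stop bits once the ld8s have become nops.

    load_ports=2 reflects the two load-capable ports mentioned above.
    packed_per_group is a hypothetical limit for code that was scheduled
    without pairing the loads; it is not a measured property of any scheduler.
    """
    paired = instruction_groups(num_loads, load_ports)
    packed = instruction_groups(num_loads, packed_per_group)
    return paired - packed


if __name__ == "__main__":
    n = 10  # illustrative number of global-address loads
    print("groups with a stop after each pair:", instruction_groups(n, 2))
    print("groups with hypothetical packing:  ", instruction_groups(n, 4))
    print("groups that only serve nops:       ", wasted_groups(n))
```

The point of the model is just that the stop bits stay in the code after
relaxation, so the group count chosen for real loads is still paid when the
loads are gone.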
What can you suggest to solve this problem? Maybe the linker should be
taught to delete the stop bit following a bundle if relaxation has left
that bundle consisting of nops only and a stop bit precedes the bundle?
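
The rule I have in mind can be sketched as follows, in Python for brevity.
This is a simplified model and not binutils code: the Bundle type and both
function names are hypothetical, only stop bits at the end of a bundle are
modeled (mid-bundle stops are ignored), and a slot counts as a nop if its
mnemonic starts with "nop".

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Bundle:
    """Hypothetical stand-in for an IA-64 bundle: three mnemonics and a flag."""
    slots: Tuple[str, str, str]
    stop_after: bool  # True if a stop bit follows the last slot


def is_all_nops(bundle: Bundle) -> bool:
    return all(op.startswith("nop") for op in bundle.slots)


def drop_redundant_stops(bundles: List[Bundle]) -> List[Bundle]:
    """Clear the stop after an all-nop bundle when a stop also precedes it.

    The check uses the stop flags of the original input, so the first stop
    in any run of consecutive stops always survives and still separates the
    code before the nops from the code after them.
    """
    result = []
    for index, bundle in enumerate(bundles):
        preceded_by_stop = index > 0 and bundles[index - 1].stop_after
        if bundle.stop_after and preceded_by_stop and is_all_nops(bundle):
            bundle = replace(bundle, stop_after=False)
        result.append(bundle)
    return result


if __name__ == "__main__":
    relaxed = [
        Bundle(("addl", "addl", "nop.i"), True),
        Bundle(("nop.m", "nop.m", "nop.i"), True),   # ld8s already turned into nops
        Bundle(("ld8", "nop.m", "add"), True),
    ]
    for b in drop_redundant_stops(relaxed):
        print(b.slots, ";;" if b.stop_after else "")
```

In this sketch the stop after the middle bundle is removed, while the stops
after the first and last bundles are kept.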
Thanks in advance.
Alexander Monakov