This is the mail archive of the binutils@sources.redhat.com mailing list for the binutils project.
Re: PATCH: Optimize IA64 brl to br
On Thu, Jan 29, 2004 at 10:23:35PM -0800, Jim Wilson wrote:
> On Thu, 2004-01-29 at 19:09, Richard Henderson wrote:
> > Is MBB really the best bundle to use for this? I seem to recall
> > that forcing a split issue. I'd think MIB or MMB would be better.
>
> Good point. I didn't check that.
>
> See section 3.3.2 "Dispersal Rules" of the Itanium2 Processor Reference
> Manual, in particular see table 3-4 on page 16 which shows that MBB
> always splits issue. However, the same section points out that MIB,
> MMB, and MFB always split issue unless the B insn is a nop.b or a brp
> (branch predict). So it does not appear to make any difference.
>
> The table 3-4 has some typos incidentally. I have the June 2002 version
> of the document. MBB is mentioned twice, both in the row and column
> indices. The second one is supposed to be MMB in both cases. Also, the
> MMB and MFB row entries are missing the 1 superscript that is present on
> the MIB row entry, to indicate that dual-issue only occurs if the B is a
> nop.b or brp.
>
> I don't see anything that would suggest any of these is better than the
> other.
This is the revised patch. Is it OK?
H.J.
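
For readers who want to try the bundle rewrite outside of BFD, here is a
minimal standalone sketch of what the patch below does to a bundle. It
assumes the usual IA-64 layout (a 5-bit template in bits 0-4 followed by
three 41-bit slots) held as two 64-bit words. The names ia64_bundle,
bundle_slot, bundle_pack, brl_to_br and in_br21_range are made up for
illustration and are not part of BFD; the template values, the nop.b
encoding and the range limits are taken from the patch itself.

```c
#include <stdint.h>

/* Hypothetical container for one 128-bit bundle: lo = bits 0-63,
   hi = bits 64-127.  Not a BFD type.  */
struct ia64_bundle { uint64_t lo, hi; };

#define SLOT_MASK 0x1ffffffffffULL  /* 41 bits per slot */
#define TMPL_MLX  0x04              /* 0x05 is MLX with a stop bit */
#define TMPL_MBB  0x12              /* 0x13 is MBB with a stop bit */
#define NOP_B     0x4000000000ULL   /* nop.b, as used in the patch */

static unsigned
bundle_template (struct ia64_bundle b)
{
  return b.lo & 0x1f;
}

/* Extract slot N (0, 1 or 2).  Slot 1 straddles the two words.  */
static uint64_t
bundle_slot (struct ia64_bundle b, int n)
{
  switch (n)
    {
    case 0:
      return (b.lo >> 5) & SLOT_MASK;
    case 1:
      return ((b.lo >> 46) | (b.hi << 18)) & SLOT_MASK;
    default:
      return (b.hi >> 23) & SLOT_MASK;
    }
}

/* Assemble a bundle from a template and three 41-bit slots.  */
static struct ia64_bundle
bundle_pack (unsigned tmpl, uint64_t s0, uint64_t s1, uint64_t s2)
{
  struct ia64_bundle b;

  b.lo = (s1 << 46) | (s0 << 5) | (tmpl & 0x1f);
  b.hi = (s2 << 23) | (s1 >> 18);
  return b;
}

/* Rewrite an MLX bundle holding a brl into MBB: keep slot 0, put
   nop.b in slot 1, and clear bit 40 of slot 2 so that brl becomes
   br.  The low template bit (the stop bit) is preserved.  Returns 0
   and leaves the bundle alone if it is not MLX.  */
static int
brl_to_br (struct ia64_bundle *b)
{
  unsigned tmpl = bundle_template (*b);
  uint64_t s0, s2;

  if ((tmpl & ~1u) != TMPL_MLX)
    return 0;

  s0 = bundle_slot (*b, 0);
  s2 = bundle_slot (*b, 2) & (SLOT_MASK >> 1);
  *b = bundle_pack (TMPL_MBB | (tmpl & 1), s0, NOP_B, s2);
  return 1;
}

/* Same range test as the patch: is the displacement reachable by a
   21-bit branch?  */
static int
in_br21_range (int64_t delta)
{
  return delta >= -0x1000000 && delta <= 0x0FFFFF0;
}
```

A caller would check in_br21_range (symaddr - reladdr) first and only
then call brl_to_br on the bundle. The real code also has to switch the
relocation to R_IA64_PCREL21B and move its offset from slot 1 to slot 2,
which this sketch leaves out.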
-----
2004-01-30 H.J. Lu <hongjiu.lu@intel.com>
* elfxx-ia64.c (elfNN_ia64_relax_section): Optimize brl to br
during the relax finalize pass.
--- bfd/elfxx-ia64.c.brl 2004-01-28 14:54:28.000000000 -0800
+++ bfd/elfxx-ia64.c 2004-01-30 11:37:22.000000000 -0800
@@ -775,13 +775,30 @@ elfNN_ia64_relax_section (abfd, sec, lin
case R_IA64_PCREL21BI:
case R_IA64_PCREL21M:
case R_IA64_PCREL21F:
+ /* In the finalize pass, all br relaxations are done. We can
+ skip them. */
if (!link_info->need_relax_finalize)
continue;
is_branch = TRUE;
break;
+ case R_IA64_PCREL60B:
+ /* We can't optimize brl to br before the finalize pass since
+ br relaxations will increase the code size. Defer it to
+ the finalize pass. */
+ if (link_info->need_relax_finalize)
+ {
+ sec->need_finalize_relax = 1;
+ continue;
+ }
+ is_branch = TRUE;
+ break;
+
case R_IA64_LTOFF22X:
case R_IA64_LDXMOV:
+ /* We can't relax ldx/mov before the finalize pass since
+ br relaxations will increase the code size. Defer it to
+ the finalize pass. */
if (link_info->need_relax_finalize)
{
sec->need_finalize_relax = 1;
@@ -895,6 +912,51 @@ elfNN_ia64_relax_section (abfd, sec, lin
/* If the branch is in range, no need to do anything. */
if ((bfd_signed_vma) (symaddr - reladdr) >= -0x1000000
&& (bfd_signed_vma) (symaddr - reladdr) <= 0x0FFFFF0)
+ {
+ /* If the 60-bit branch is in 21-bit range, optimize it. */
+ if (r_type == R_IA64_PCREL60B)
+ {
+ int template;
+ bfd_byte *hit_addr;
+ bfd_vma t0, t1, i0, i1, i2;
+
+ hit_addr = (bfd_byte *) (contents + roff);
+ hit_addr -= (long) hit_addr & 0x3;
+ t0 = bfd_get_64 (abfd, hit_addr);
+ t1 = bfd_get_64 (abfd, hit_addr + 8);
+
+ /* Keep the instruction in slot 0. */
+ i0 = (t0 >> 5) & 0x1ffffffffffLL;
+ /* Use nop.b for slot 1. */
+ i1 = 0x4000000000LL;
+ /* For slot 2, turn brl into br by masking out bit
+ 40. */
+ i2 = (t1 >> 23) & 0x0ffffffffffLL;
+
+ /* Turn a MLX bundle into a MBB bundle with the
+ same stop-bit variety. */
+ template = 0x12;
+ if ((t0 & 0x1fLL) == 5)
+ template += 1;
+ t0 = (i1 << 46) | (i0 << 5) | template;
+ t1 = (i2 << 23) | (i1 >> 18);
+
+ bfd_put_64 (abfd, t0, hit_addr);
+ bfd_put_64 (abfd, t1, hit_addr + 8);
+
+ irel->r_info
+ = ELF64_R_INFO (ELF64_R_SYM (irel->r_info),
+ R_IA64_PCREL21B);
+
+ /* If the original relocation offset points to slot
+ 1, change it to slot 2. */
+ if ((irel->r_offset & 3) == 1)
+ irel->r_offset += 1;
+ }
+
+ continue;
+ }
+ else if (r_type == R_IA64_PCREL60B)
continue;
/* If the branch and target are in the same section, you've