[PATCH] gold: Add linker relaxation of tail calls on sparc.

David Miller davem@davemloft.net
Tue Apr 24 21:45:00 GMT 2012


From: David Miller <davem@davemloft.net>
Date: Mon, 23 Apr 2012 03:08:21 -0400 (EDT)

Ian, could you please review this patch?

Thanks a lot.

> This gets GOLD in line with the relaxations performed by the
> BFD linker on sparc.  Basically we can turn a tail call into
> a branch if certain conditions are satisfied, and in particular
> the final call displacement is small enough.
> 
> The code is heavily and thoroughly commented.
> 
> I've validated that the relaxation triggers during the testsuite on
> both 32-bit and 64-bit sparc, and that the transformations performed
> are correct.
> 
> I looked at the relaxation support that exists for the kind of stuff
> ARM does with stubs where section sizes and offsets end up changing,
> and that didn't seem appropriate nor necessary for what I'm doing
> here.  So I implemented this similarly to how the TLS code
> optimizations are done.
> 
> This is relative to "[PATCH] gold: Maintain sparc ELF header bits
> properly" for which I'm still waiting approval.
> 
> After this I only have 3 items on my agenda for GOLD/Sparc:
> 
> 1) Implement proper R_SPARC_REGISTER support, this has been bugging
>    me for years but last time I tried this it was hard to frob out
>    dummy relocations and/or symbols and have GOLD just leave them
>    completely alone and let them propagate unimpeded to the target
>    relocation function.
> 
>    My frustration matched that which Roland recently endured with
>    those special ARM PLT symbols.  :-) Hopefully this kind of thing is
>    easier now.
> 
> 2) Fix all of the bugs wrt. large PLTs on sparc64.  As I mentioned
>    in my IFUNC patch posting, in order to get this right we'll need
>    to pass PLT indexes around rather than PLT offsets.
> 
>    The core issue is that PLTs are variably sized on 64-bit sparc, the
>    first 32767 PLT slots are one size, and then the format and the
>    size of the PLT entries change starting at index 32768.  So if you
>    have an IFUNC plt slot, you won't know how big it's going to be
>    until you know it's final index.
> 
> 3) Implement full incremental linking support just like x86-64 does.
>    Any known gotchas or implementation issues would be appreciated.
> 
> Thanks.
> 
> gold/
> 
> 	* sparc.cc (Target_sparc::Relocate::relax_call): New function.
> 	(Target_sparc::Relocate::relocate): Call it for R_SPARC_WDISP30
> 	and R_SPARC_WPLT30.
> 
> diff --git a/gold/sparc.cc b/gold/sparc.cc
> index 004a265..a9fe9de 100644
> --- a/gold/sparc.cc
> +++ b/gold/sparc.cc
> @@ -333,6 +333,11 @@ class Target_sparc : public Sized_target<size, big_endian>
>  		 typename elfcpp::Elf_types<size>::Elf_Addr,
>  		 section_size_type);
>  
> +    inline void
> +    relax_call(Target_sparc<size, big_endian>* target,
> +	       unsigned char* view,
> +	       const elfcpp::Rela<size, big_endian>& rela);
> +
>      // Ignore the next relocation which should be R_SPARC_TLS_GD_ADD
>      bool ignore_gd_add_;
>  
> @@ -3304,6 +3309,8 @@ Target_sparc<size, big_endian>::Relocate::relocate(
>      case elfcpp::R_SPARC_WDISP30:
>      case elfcpp::R_SPARC_WPLT30:
>        Reloc::wdisp30(view, object, psymval, addend, address);
> +      if (target->may_relax())
> +	relax_call(target, view, rela);
>        break;
>  
>      case elfcpp::R_SPARC_WDISP22:
> @@ -3954,6 +3961,145 @@ Target_sparc<size, big_endian>::Relocate::relocate_tls(
>      }
>  }
>  
> +// Relax a call instruction.
> +
> +template<int size, bool big_endian>
> +inline void
> +Target_sparc<size, big_endian>::Relocate::relax_call(
> +    Target_sparc<size, big_endian>* target,
> +    unsigned char* view,
> +    const elfcpp::Rela<size, big_endian>& rela)
> +{
> +  typedef typename elfcpp::Swap<32, true>::Valtype Insntype;
> +  Insntype *wv = reinterpret_cast<Insntype*>(view);
> +  Insntype call_insn, delay_insn, set_insn;
> +  uint32_t op3, reg, off;
> +
> +  // This code tries to relax call instructions that meet
> +  // certain criteria.
> +  //
> +  // The first criteria is that the call must be such that the return
> +  // address which the call writes into %o7 is unused.  Two sequences
> +  // meet this criteria, and are used to implement tail calls.
> +  //
> +  // Leaf function tail call:
> +  //
> +  // or %o7, %g0, %ANY_REG
> +  // call FUNC
> +  //  or %ANY_REG, %g0, %o7
> +  //
> +  // Non-leaf function tail call:
> +  //
> +  // call FUNC
> +  //  restore
> +  //
> +  // The second criteria is that the call destination is close.  If
> +  // the displacement can fit in a signed 22-bit immediate field of a
> +  // pre-V9 branch, we can do it.  If we are generating a 64-bit
> +  // object or a 32-bit object with ELF machine type EF_SPARC32PLUS,
> +  // and the displacement fits in a signed 19-bit immediate field,
> +  // then we can use a V9 branch.
> +
> +  call_insn = elfcpp::Swap<32, true>::readval(wv);
> +  delay_insn = elfcpp::Swap<32, true>::readval(wv + 1);
> +
> +  // Make sure it is really a call instruction.
> +  if (((call_insn >> 30) & 0x3) != 1)
> +    return;
> +
> +  if (((delay_insn >> 30) & 0x3) != 2)
> +    return;
> +
> +  // Accept only a restore or an integer arithmetic operation whose
> +  // sole side effect is to write the %o7 register (and perhaps set
> +  // the condition codes, which are considered clobbered across
> +  // function calls).
> +  //
> +  // For example, we don't want to match a tagged addition or
> +  // subtraction.  We also don't want to match something like a
> +  // divide.
> +  //
> +  // Specifically we accept add{,cc}, and{,cc}, or{,cc},
> +  // xor{,cc}, sub{,cc}, andn{,cc}, orn{,cc}, and xnor{,cc}.
> +
> +  op3 = (delay_insn >> 19) & 0x3f;
> +  reg = (delay_insn >> 25) & 0x1f;
> +  if (op3 != 0x3d
> +      && ((op3 & 0x28) != 0 || reg != 15))
> +    return;
> +
> +  // For non-restore instructions, make sure %o7 isn't
> +  // an input.
> +  if (op3 != 0x3d)
> +    {
> +      // First check RS1
> +      reg = (delay_insn >> 14) & 0x15;
> +      if (reg == 15)
> +	return;
> +
> +      // And if non-immediate, check RS2
> +      if (((delay_insn >> 13) & 1) == 0)
> +	{
> +	  reg = (delay_insn & 0x1f);
> +	  if (reg == 15)
> +	    return;
> +	}
> +    }
> +
> +  // Now check the branch distance.  We are called after the
> +  // call has been relocated, so we just have to peek at the
> +  // offset contained in the instruction.
> +  off = call_insn & 0x3fffffff;
> +  if ((off & 0x3fe00000)
> +      && (off & 0x3fe00000) != 0x3fe00000)
> +    return;
> +
> +  if ((size == 64 || target->elf_machine_ == elfcpp::EM_SPARC32PLUS)
> +      && ((off & 0x3c0000) == 0
> +	  || (off & 0x3c0000) == 0x3c0000))
> +    {
> +      // ba,pt %xcc, FUNC
> +      call_insn = 0x10680000 | (off & 0x07ffff);
> +    }
> +  else
> +    {
> +      // ba FUNC
> +      call_insn = 0x10800000 | (off & 0x3fffff);
> +    }
> +  elfcpp::Swap<32, true>::writeval(wv, call_insn);
> +
> +  // See if we can NOP out the delay slot instruction.  We peek
> +  // at the instruction before the call to make sure we're dealing
> +  // with exactly the:
> +  //
> +  // or %o7, %g0, %ANY_REG
> +  // call
> +  //  or %ANY_REG, %g0, %o7
> +  //
> +  // case.  Otherwise this might be a tricky piece of hand written
> +  // assembler calculating %o7 in some non-trivial way, and therefore
> +  // we can't be sure that NOP'ing out the delay slot is safe.
> +  if (op3 == 0x02
> +      && rela.get_r_offset() >= 4)
> +    {
> +      if ((delay_insn & ~(0x1f << 14)) != 0x9e100000)
> +	return;
> +
> +      set_insn = elfcpp::Swap<32, true>::readval(wv - 1);
> +      if ((set_insn & ~(0x1f << 25)) != 0x8013c000)
> +	return;
> +
> +      reg = (set_insn >> 25) & 0x1f;
> +      if (reg == 0 || reg == 15)
> +	return;
> +      if (reg != ((delay_insn >> 14) & 0x1f))
> +	return;
> +
> +      // All tests pass, nop it out.
> +      elfcpp::Swap<32, true>::writeval(wv + 1, sparc_nop);
> +    }
> +}
> +
>  // Relocate section data.
>  
>  template<int size, bool big_endian>



More information about the Binutils mailing list