PowerPC64: why do we need .branch_lt for long branch thunks

Mon Jan 27 10:53:00 GMT 2020

On Sun, Jan 26, 2020 at 04:42:30PM -0800, Fangrui Song wrote:
> 
> I noticed that PowerPC32 uses
> 
>   lis 12, 512       # @ha
>   addi 12, 12, 8200 # @l
>   mtctr 12
>   bctr
> 
> for -no-pie long branch thunks (jumping to a non-preemptible symbol with a
> distance>=0x2000000), and
> 
>   mflr 0
>   bcl 20, 31, .+4
>   mflr 12
>   addis 12, 12, 512 # @ha
>   addi 12, 12, -24  # @l
>   mtlr 0
>   mtctr 12
>   bctr
> 
> for -pie/-shared long branch thunks. On PowerPC64, why do we use the
> "load an address from .branch_lt and jump" approach? Can we do something
> similar to PowerPC32 and avoid .branch_lt?

We use .branch_lt because loading the address from it is faster than
the multiple instruction sequence needed to calculate a 64-bit address,
where each insn has a register dependency on the one before it.  For
PIE the advantage is quite a lot larger.
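
To illustrate the dependency chain, here is a rough Python model of one
common five-insn way to build a 64-bit absolute address (lis, ori, sldi,
oris, ori).  This is only a sketch: the names materialize64 and sext16
are made up for illustration, and the sequence is an assumption, not
necessarily what ld would emit for a thunk.  Each step reads the
register written by the step before it, so none of them can issue in
parallel.

```python
MASK64 = (1 << 64) - 1


def sext16(v):
    """Sign-extend the low 16 bits of v, as a simm16 field would be."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def materialize64(addr):
    """Model lis/ori/sldi/oris/ori building addr in one register.

    Hypothetical helper: every statement depends on the previous value
    of r, mirroring the serial register dependency in the insn sequence.
    """
    highest = (addr >> 48) & 0xFFFF
    higher = (addr >> 32) & 0xFFFF
    hi = (addr >> 16) & 0xFFFF
    lo = addr & 0xFFFF

    r = (sext16(highest) << 16) & MASK64  # lis  r, highest (sign-extended)
    r |= higher                           # ori  r, r, higher
    r = (r << 32) & MASK64                # sldi r, r, 32
    r |= hi << 16                         # oris r, r, hi
    r |= lo                               # ori  r, r, lo
    return r


if __name__ == "__main__":
    print(hex(materialize64(0x123456789ABCDEF0)))
```

A load from .branch_lt replaces most of that chain with a single memory
access relative to the TOC pointer.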

> I think a pair of @ha and @l is sufficient for most use cases. If the
> offset is beyond [-0x80008000,0x7fff8000), we can add a @highera.
> (I don't know when @highesta will be needed but it is straightforward to
> add it.)
> 
> (https://gcc.gnu.org/onlinedocs/gcc/RS_002f6000-and-PowerPC-Options.html
>  does not say how large the text segment can be in the medium code model.)

At a minimum, 2G.  The limit is determined by the lowest-address
object that you might access relative to the TOC pointer.  So what
matters is not function code but strings or suchlike, typically in
.rodata.
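
For reference, the @ha/@l split and the range quoted above can be
checked with a small Python sketch.  The helper names (ha, lo,
fits_ha_l, combine) are invented for this illustration; the arithmetic
assumes the usual definitions, where @l is the sign-extended low 16
bits and @ha is the high 16 bits after adding 0x8000 to compensate for
that sign extension.

```python
def sext16(v):
    """Sign-extend the low 16 bits of v."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def ha(x):
    """High-adjusted half: compensates for the sign extension of @l."""
    return sext16((x + 0x8000) >> 16)


def lo(x):
    """Low half, as addi would sign-extend it."""
    return sext16(x)


def fits_ha_l(x):
    """True if x is reachable with one addis/lis plus one addi."""
    return -0x80008000 <= x < 0x7FFF8000


def combine(h, l):
    """What the addis (or lis) + addi pair computes."""
    return (h << 16) + l


if __name__ == "__main__":
    # Values from the quoted PowerPC32 thunks.
    print(ha(0x2002008), lo(0x2002008))   # 512 8200
    print(ha(0x1FFFFE8), lo(0x1FFFFE8))   # 512 -24
```

Any offset for which fits_ha_l returns False would need the extra
@highera (and possibly @highesta) parts mentioned in the quoted text.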

> I am not sure whether loading an address from .branch_lt can be faster
> than @highesta+@highera+@ha+@l, but a long branch thunk should never be
> in a performance critical code path.

-- 
Alan Modra
Australia Development Lab, IBM