PowerPC64: why do we need .branch_lt for long branch thunks
Alan Modra
amodra@gmail.com
Mon Jan 27 10:53:00 GMT 2020
On Sun, Jan 26, 2020 at 04:42:30PM -0800, Fangrui Song wrote:
>
> I noticed that PowerPC32 uses
>
> lis 12, 512 # @ha
> addi 12, 12, 8200 # @l
> mtctr 12
> bctr
>
> for -no-pie long branch thunks (jumping to a non-preemptible symbol with a
> distance>=0x2000000), and
>
> mflr 0
> bcl 20, 31, .+4
> mflr 12
> addis 12, 12, 512 # @ha
> addi 12, 12, -24 # @l
> mtlr 0
> mtctr 12
> bctr
>
> for -pie/-shared long branch thunks. On PowerPC64, why do we use the
> "load an address from .branch_lt and jump" approach? Can we do something
> similar to PowerPC32 and avoid .branch_lt?
We use .branch_lt because it is faster than the multi-instruction
sequence needed to calculate a 64-bit address, where each insn depends
on the register result of the previous one. For PIE the difference is
considerably larger.
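
For reference, the @ha/@l immediates in the quoted PowerPC32 thunks can be
checked with a little arithmetic. The following is only a sketch: the
helper names (ppc_lo, ppc_ha, fits_ha_lo) are made up for illustration and
are not linker or ABI names. It assumes the usual semantics, where addi
sign-extends its 16-bit immediate and addis/lis shift a sign-extended
16-bit immediate left by 16, so @ha has to compensate for a negative @l.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: low 16 bits of x, sign-extended the way addi
   treats its immediate (the @l half). */
int64_t ppc_lo(int64_t x) {
    return ((x & 0xffff) ^ 0x8000) - 0x8000;
}

/* Hypothetical helper: the adjusted high half (@ha) as a signed value.
   x - ppc_lo(x) is always a multiple of 65536, so the division is exact. */
int64_t ppc_ha(int64_t x) {
    return (x - ppc_lo(x)) / 65536;
}

/* True if one addis+addi (or lis+addi) pair can produce x, that is,
   the @ha part fits in a signed 16-bit immediate. */
int fits_ha_lo(int64_t x) {
    int64_t ha = ppc_ha(x);
    return ha >= -0x8000 && ha <= 0x7fff;
}

/* Self-check against the immediates in the quoted thunks. */
void check_quoted_immediates(void) {
    /* lis 12, 512 ; addi 12, 12, 8200 */
    assert(ppc_ha(0x2002008) == 512 && ppc_lo(0x2002008) == 8200);
    /* addis 12, 12, 512 ; addi 12, 12, -24 */
    assert(ppc_ha(0x1ffffe8) == 512 && ppc_lo(0x1ffffe8) == -24);
}
```

With these definitions, ppc_ha(x) * 65536 + ppc_lo(x) gives back x, and
fits_ha_lo() accepts exactly the values in [-0x80008000, 0x7fff8000).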
> I think a pair of @ha and @l is sufficient for most use cases. If the
> offset is beyond [-0x80008000,0x7fff8000), we can add a @highera.
> (I don't know when @highesta will be needed but it is straightforward to
> add it.)
>
> (https://gcc.gnu.org/onlinedocs/gcc/RS_002f6000-and-PowerPC-Options.html
> does not say how large the text segment can be in the medium code model.)
At a minimum, 2G. The limit is determined by the lowest address
object that you might access relative to the TOC pointer. So it is not
function code that sets the limit, but strings or suchlike, typically
in .rodata.
> I am not sure whether loading an address from .branch_lt can be faster
> than @highesta+@highera+@ha+@l, but a long branch thunk should never be
> in a performance critical code path.
--
Alan Modra
Australia Development Lab, IBM