[PATCH v1 0/4] LoongArch: Add support for TLS Descriptors (TLSDESC)

Alexandre Oliva oliva@adacore.com
Fri Dec 1 16:14:11 GMT 2023


Hello,

On Dec  1, 2023, Lulu Cai <cailulu@loongson.cn> wrote:

> The LoongArch TLS Descriptors implementation contains several points:

I'm excited to see another platform gain TLS Descriptors support.

I'm not deeply acquainted with LoongArch, but I'll dare chime in.

> 1. The instruction sequence is:
>    pcalau12i  $a0,%desc_pc_hi20(var)		#R_LARCH_TLS_DESC_PC_HI20
>    ld.d       $a1,$a0,%desc_ld_pc_lo12(var)	#R_LARCH_TLS_DESC_LD_PC_LO12
>    addi.d     $a0,$a0,%desc_add_pc_lo12(var)	#R_LARCH_TLS_DESC_ADD_PC_LO12
>    jirl       $ra,$a1,%desc_call(var)		#R_LARCH_TLS_DESC_CALL

Are these instructions fixed, and supposed to appear in this sequence,
or can different registers be used, and the instructions intermixed with
other unrelated ones?  The ability to intermix them for better
scheduling and register allocation was one of the guiding design
principles of TLS Descriptors, so the canonical sequence and the design
of relaxations should ideally take flexibility into account, and choose
relaxations with similar scheduling profiles.

Say, would compiler-generated or hand-coded asm still work if one used:

     pcalau12i  $a2,%desc_pc_hi20(var)		#R_LARCH_TLS_DESC_PC_HI20
     ld.d       $a3,$a2,%desc_ld_pc_lo12(var)	#R_LARCH_TLS_DESC_LD_PC_LO12
     addi.d     $a0,$a2,%desc_add_pc_lo12(var)	#R_LARCH_TLS_DESC_ADD_PC_LO12
     jirl       $ra,$a3,%desc_call(var)		#R_LARCH_TLS_DESC_CALL

or even

     pcalau12i  $a2,%desc_pc_hi20(var)		#R_LARCH_TLS_DESC_PC_HI20
     or         $a5,$a2,$r0
     or         $a6,$a2,$r0
     addi.d     $a4,$a5,%desc_add_pc_lo12(var)	#R_LARCH_TLS_DESC_ADD_PC_LO12
     ld.d       $a3,$a6,%desc_ld_pc_lo12(var)	#R_LARCH_TLS_DESC_LD_PC_LO12
     or         $a0,$a4,$r0
     jirl       $ra,$a3,%desc_call(var)		#R_LARCH_TLS_DESC_CALL

?
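To make the question concrete, here is a toy Python model (my own sketch, not LoongArch's actual semantics) that executes the canonical, renamed and interleaved sequences above. It assumes made-up addresses (DESC_ADDR, RESOLVER, TP_OFFSET are hypothetical), models pcalau12i as yielding the descriptor's page directly rather than PC-relatively, and replaces the resolver with a stub. The point is only that each relocation-marked instruction receives the value it needs through the data flow, whatever registers and interleaving are used.

```python
# Toy model of the TLSDESC call sequence. All addresses and values are
# hypothetical; pcalau12i is modeled as yielding the descriptor's page
# directly instead of computing it relative to the PC.
MASK64 = (1 << 64) - 1
DESC_ADDR = 0x12345A10   # assumed GOT address of var's TLS descriptor
RESOLVER = 0x4000        # assumed entry point stored in the descriptor
TP_OFFSET = 0x30         # assumed value the resolver returns in $a0


def page(addr):
    # High part chosen so that a signed 12-bit low part reaches addr.
    return (addr + 0x800) & ~0xFFF


def low(addr):
    return addr - page(addr)  # always within [-2048, 2047]


MEMORY = {DESC_ADDR: RESOLVER}  # first descriptor word: resolver address


def resolver(a0):
    # Stub resolver: expects the descriptor address in $a0.
    assert a0 == DESC_ADDR, "resolver got the wrong descriptor"
    return TP_OFFSET


def run(program):
    regs = {"$r0": 0}
    for op, *args in program:
        if op == "pcalau12i":          # rd, %desc_pc_hi20(var)
            regs[args[0]] = page(DESC_ADDR)
        elif op == "ld.d":             # rd, rj, %desc_ld_pc_lo12(var)
            addr = (regs[args[1]] + low(DESC_ADDR)) & MASK64
            regs[args[0]] = MEMORY[addr]
        elif op == "addi.d":           # rd, rj, %desc_add_pc_lo12(var)
            regs[args[0]] = (regs[args[1]] + low(DESC_ADDR)) & MASK64
        elif op == "or":               # rd, rj, rk
            regs[args[0]] = regs[args[1]] | regs[args[2]]
        elif op == "jirl":             # $ra, rj, %desc_call(var)
            assert regs[args[1]] == RESOLVER, "jirl target is not the resolver"
            regs["$a0"] = resolver(regs["$a0"])
        else:
            raise ValueError(f"unmodeled opcode {op}")
    return regs["$a0"]


CANONICAL = [
    ("pcalau12i", "$a0", "%desc_pc_hi20(var)"),
    ("ld.d", "$a1", "$a0", "%desc_ld_pc_lo12(var)"),
    ("addi.d", "$a0", "$a0", "%desc_add_pc_lo12(var)"),
    ("jirl", "$ra", "$a1", "%desc_call(var)"),
]
RENAMED = [
    ("pcalau12i", "$a2", "%desc_pc_hi20(var)"),
    ("ld.d", "$a3", "$a2", "%desc_ld_pc_lo12(var)"),
    ("addi.d", "$a0", "$a2", "%desc_add_pc_lo12(var)"),
    ("jirl", "$ra", "$a3", "%desc_call(var)"),
]
INTERLEAVED = [
    ("pcalau12i", "$a2", "%desc_pc_hi20(var)"),
    ("or", "$a5", "$a2", "$r0"),
    ("or", "$a6", "$a2", "$r0"),
    ("addi.d", "$a4", "$a5", "%desc_add_pc_lo12(var)"),
    ("ld.d", "$a3", "$a6", "%desc_ld_pc_lo12(var)"),
    ("or", "$a0", "$a4", "$r0"),
    ("jirl", "$ra", "$a3", "%desc_call(var)"),
]

for name, prog in [("canonical", CANONICAL), ("renamed", RENAMED),
                   ("interleaved", INTERLEAVED)]:
    print(name, hex(run(prog)))
```

In this model all three return TP_OFFSET, which is the property I'd hope the real toolchain preserves as well.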

(I realize you don't seem to have planned or implemented relaxations,
aside from the LE one for static linking, but planning for them ahead of
time helps make sure they're doable)

E.g., for IE, I'd suggest turning the latter sequence into (I'm making
up relocation names):

     pcalau12i  $a2,%gotpc_tlsoff_hi20(var)
     or         $a5,$a2,$r0 #not necessary, but not marked, so unchanged
     or         $a6,$a2,$r0
     nop
     ld.d       $a3,$a6,%gotpc_tlsoff_lo12(var)
     or         $a0,$a4,$r0 #not necessary, but not marked, so unchanged
     or         $a0,$a3,$r0

And for LE, I'd suggest:

     pcalau12i  $a2,%tlsoff_hi20(var)
     or         $a5,$a2,$r0
     or         $a6,$a2,$r0 #not necessary, but not marked, so unchanged
     addi.d     $a4,$a5,%tlsoff_lo12(var)
     nop
     or         $a0,$a4,$r0 #not necessary, but not marked, so unchanged
     nop

This addi.d is what I suggest instead of the 'ori' in the LE relaxation.
The main difference in my suggestion is that it takes the same position
as the original addi instruction, thus the very same scheduling profile,
and, more importantly, participates the same way in the data flow, as the
extra moves help show.

I realize that addi rather than ori may require offsetting the base
address to account for the signed rather than unsigned (I suppose)
immediate, so maybe it's not worth it.  I am not sure, however, whether
you can even separate the pcalau12i hi20 instruction from the subsequent
lo12 one (ISTM that it would be challenging to match them if so,
especially if a single hi20 is reused by multiple lo12 loads), so maybe
there is less flexibility to be exploited than I'm making out.
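For what it's worth, the offsetting I have in mind is the usual rounding trick. Assuming ori zero-extends its 12-bit immediate while addi.d sign-extends it, the high part must be rounded up by 0x800 whenever the low 12 bits would read as negative. A small Python sketch (split_unsigned and split_signed are illustrative names, and offsets are assumed non-negative):

```python
def split_unsigned(off):
    """Split for an ori-style low part: low 12 bits are zero-extended."""
    hi = off >> 12
    lo = off & 0xFFF            # 0..4095, fits an unsigned imm12
    return hi, lo


def split_signed(off):
    """Split for an addi.d-style low part: low 12 bits are sign-extended."""
    hi = (off + 0x800) >> 12    # round up to compensate for a negative lo
    lo = off - (hi << 12)       # -2048..2047, fits a signed si12
    return hi, lo


for off in (0x7FF, 0x800, 0x1A10):
    hu, lu = split_unsigned(off)
    hs, ls = split_signed(off)
    # ori combines with OR, addi.d with ADD; both must rebuild off.
    assert (hu << 12) | lu == off
    assert (hs << 12) + ls == off
    print(hex(off), (hu, lu), (hs, ls))
```

E.g. 0x1A10 splits as (1, 0xA10) for ori but (2, -0x5F0) for addi.d, so the hi20 relocation would have to know which kind of lo12 partner it feeds.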

Anyway, I hope this makes sense and that it helps,

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


More information about the Binutils mailing list