[PATCH v1 0/4] LoongArch: Add support for TLS Descriptors (TLSDESC)

mengqinggang mengqinggang@loongson.cn
Sat Dec 2 03:54:26 GMT 2023


Thank you very much for your suggestions.


On 2023/12/2 at 12:14 AM, Alexandre Oliva wrote:
> Hello,
>
> On Dec  1, 2023, Lulu Cai <cailulu@loongson.cn> wrote:
>
>> The LoongArch TLS Descriptors implementation contains several points:
> I'm excited to see another platform gain TLS Descriptors support.
>
> I'm not deeply acquainted with LoongArch, but I'll dare chime in.
>
>> 1. The instruction sequence is:
>>     pcalau12i  $a0,%desc_pc_hi20(var)		#R_LARCH_TLS_DESC_PC_HI20
>>     ld.d       $a1,$a0,%desc_ld_pc_lo12(var)	#R_LARCH_TLS_DESC_LD_PC_LO12
>>     addi.d     $a0,$a0,%desc_add_pc_lo12(var)	#R_LARCH_TLS_DESC_ADD_PC_LO12
>>     jirl       $ra,$a1,%desc_call(var)		#R_LARCH_TLS_DESC_CALL
> Are these instructions fixed, and supposed to appear in this sequence,
> or can different registers be used, and the instructions intermixed with
> other unrelated ones?  The ability to intermix them for better
> scheduling and register allocation was one of the guiding design
> principles of TLS Descriptors, so the canonical sequence and the design
> of relaxations should ideally take flexibility into account, and choose
> relaxations with similar scheduling profiles.
>
> Say, would compiler-generated or hand-coded asm still work if one used:
>
>       pcalau12i  $a2,%desc_pc_hi20(var)		#R_LARCH_TLS_DESC_PC_HI20
>       ld.d       $a3,$a2,%desc_ld_pc_lo12(var)	#R_LARCH_TLS_DESC_LD_PC_LO12
>       addi.d     $a0,$a2,%desc_add_pc_lo12(var)	#R_LARCH_TLS_DESC_ADD_PC_LO12
>       jirl       $ra,$a3,%desc_call(var)		#R_LARCH_TLS_DESC_CALL
>
> or even
>
>       pcalau12i  $a2,%desc_pc_hi20(var)		#R_LARCH_TLS_DESC_PC_HI20
>       or         $a5,$a2,$r0
>       or         $a6,$a2,$r0
>       addi.d     $a4,$a5,%desc_add_pc_lo12(var)	#R_LARCH_TLS_DESC_ADD_PC_LO12
>       ld.d       $a3,$a6,%desc_ld_pc_lo12(var)	#R_LARCH_TLS_DESC_LD_PC_LO12
>       or         $a0,$a4,$r0
>       jirl       $ra,$a3,%desc_call(var)		#R_LARCH_TLS_DESC_CALL
>
> ?


I ran a test, and both of these sequences still work.
However, in this version of the patch, the TLS descriptor instruction
sequence is only produced by expanding la.tls.desc, so fixed registers
and a fixed instruction order are used.


> (I realize you seem to have not planned/implemented relaxations, aside
> from the LE one for static linking, but planning for them ahead of time
> helps make sure they're doable)


We will support relaxation to IE in the future.
Because glibc can only resolve R_XXX_IRELATIVE relocations in static
linking, we relax DESC to LE to avoid generating R_LARCH_TLS_DESC
relocations.


> E.g., for IE, I'd suggest turning the latter sequence into (I'm making
> up relocation names):
>
>       pcalau12i  $a2,%gotpc_tlsoff_hi20(var)
>       or         $a5,$a2,$r0 #not necessary, but not marked, so unchanged
>       or         $a6,$a2,$r0
>       nop
>       ld.d       $a3,$a6,%gotpc_tlsoff_lo12(var)
>       or         $a0,$a4,$r0 #not necessary, but not marked, so unchanged
>       or         $a0,$a3,$r0
>
> and for LE, I'd suggest:
>
>       pcalau12i  $a2,%tlsoff_hi20(var)
>       or         $a5,$a2,$r0
>       or         $a6,$a2,$r0 #not necessary, but not marked, so unchanged
>       addi.d     $a4,$a5,%tlsoff_lo12(var)
>       nop
>       or         $a0,$a4,$r0 #not necessary, but not marked, so unchanged
>       nop
>
> This addi.d is what I suggest instead of the 'ori' in the LE relaxation.
> The main difference in my suggestion is that it takes the same position
> of the original addi instruction, thus the very same scheduling profile,
> and more importantly participating the same way in the data flow, as the
> extra moves help see.

We will add a new relocation for addi.d; the related patch is here:
https://sourceware.org/pipermail/binutils/2023-December/130921.html

>
> I realize that addi rather than ori may require offsetting the base
> address to account for the signed rather than unsigned (I suppose)
> immediate, so maybe it's not worth it.  I am not sure, however, whether
> you can even separate the pcalau12i hi20 instruction from the subsequent
> lo12 one (ISTM that it would be challenging to match them if so,
> especially if a single hi20 is reused by multiple lo12 loads), so maybe
> there is less flexibility to be exploited than I'm making out.
> Anyway, I hope this makes sense and that it helps,
>
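The signed-versus-unsigned point above can be illustrated with a small
Python sketch. It is not taken from the patch: the function names are
hypothetical, and the offset is treated as a plain non-negative value
rather than a real LoongArch relocation computation. It only shows why
the hi20 part has to be adjusted when the lo12 part is consumed by an
instruction with a sign-extended 12-bit immediate (addi.d-style) instead
of a zero-extended one (ori-style).

```python
# Sketch only: hypothetical helpers, not the binutils implementation.

def split_unsigned(off):
    """Split off into (hi20, lo12) when lo12 is zero-extended (ori-style)."""
    return off >> 12, off & 0xFFF


def split_signed(off):
    """Split off into (hi20, lo12) when lo12 is sign-extended (addi-style)."""
    lo = off & 0xFFF
    if lo >= 0x800:
        # The low 12 bits will be read as a negative number ...
        lo -= 0x1000
    # ... so the high part must absorb the difference.
    hi = (off - lo) >> 12
    return hi, lo


def combine(hi, lo):
    """Rebuild the offset from the two parts."""
    return (hi << 12) + lo


off = 0x12845                      # low 12 bits are 0x845, i.e. >= 0x800
print(split_unsigned(off))         # high part stays 0x12
print(split_signed(off))           # high part becomes 0x13, low part is negative
```

In both cases combine() returns the original offset; only the signed
split needs the high part bumped by one when bit 11 of the offset is set.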


