[PATCH 3/3] Loongarch: Add ifunc support for strncmp{aligned, lsx}
dengjianbo
dengjianbo@loongson.cn
Wed Aug 23 07:25:27 GMT 2023
On 2023-08-22 19:23, Xi Ruoyao wrote:
> On Tue, 2023-08-22 at 19:13 +0800, Xi Ruoyao via Libc-alpha wrote:
>> On Tue, 2023-08-22 at 14:37 +0800, dengjianbo wrote:
>>
>>> Putting the data here is due to the performance. When the vld
>>> instruction is executed, the data will be in the cache, it can
>>> speed up the data loading.
>> AFAIK LoongArch CPUs have separate icache and dcache like all modern
>> CPUs, so this is not valid to me.
> And even if it can really improve the performance, this is not on the
> hot path of the algorithm so we should not use bizarre optimizations
> here for marginal improvement.
>
> --
> Xi Ruoyao <xry111@xry111.site>
> School of Aerospace Science and Technology, Xidian University

Thanks for your suggestion. We have changed strcmp and strncmp to put
the data in a mergeable read-only section (.rodata.cst16), and the vld
now loads it through an address formed with pcalau12i and %pc_lo12.
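
For readers less familiar with this addressing pair, here is a minimal C
sketch of the arithmetic, assuming the usual LoongArch convention:
pcalau12i adds a 20-bit page delta to the page of the current PC, and vld
adds a sign-extended 12-bit offset. The helper names (pc_hi20, pc_lo12,
pcalau12i, effective_address) and the addresses are hypothetical and only
model the computation. They are not part of the patch or of any toolchain
API.

```c
#include <assert.h>
#include <stdint.h>

/* Page delta that would be encoded in pcalau12i (models %pc_hi20).
   The +0x800 rounds to the nearest page so that the sign-extended
   low part computed below lands exactly on the symbol. */
static int64_t pc_hi20(uint64_t sym, uint64_t pc)
{
    return (int64_t)((sym + 0x800) >> 12) - (int64_t)(pc >> 12);
}

/* Low 12 bits of the symbol, sign-extended the way a 12-bit signed
   load immediate is interpreted (models %pc_lo12). */
static int64_t pc_lo12(uint64_t sym)
{
    int64_t lo = (int64_t)(sym & 0xfff);
    return lo >= 0x800 ? lo - 0x1000 : lo;
}

/* Model of pcalau12i: clear the low 12 bits of PC, add si20 pages. */
static uint64_t pcalau12i(uint64_t pc, int64_t si20)
{
    assert(si20 >= -(1 << 19) && si20 < (1 << 19)); /* must fit 20 bits */
    return (pc & ~0xfffULL) + (uint64_t)(si20 * 4096);
}

/* Address the load ends up reading: page base plus signed low offset. */
static uint64_t effective_address(uint64_t sym, uint64_t pc)
{
    uint64_t base = pcalau12i(pc, pc_hi20(sym, pc));
    return base + (uint64_t)pc_lo12(sym);
}
```

With a made-up symbol at 0x12345ff0 and a PC of 0x12000010, the page delta
is 0x346 and the low offset is -16, so the load reads 0x12345ff0 as
intended. Unlike the old "pcaddi t0, -5", this does not depend on the data
sitting a fixed number of instructions before the entry point.
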
diff --git a/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S b/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S
index 595472fcda..0b4eee2a98 100644
--- a/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S
@@ -25,15 +25,11 @@
# define STRNCMP __strncmp_lsx
-L(magic_num):
- .align 6
- .dword 0x0706050403020100
- .dword 0x0f0e0d0c0b0a0908
-ENTRY_NO_ALIGN(STRNCMP)
+LEAF(STRNCMP, 6)
beqz a2, L(ret0)
- pcaddi t0, -5
+ pcalau12i t0, %pc_hi20(L(INDEX))
andi a3, a0, 0xf
- vld vr2, t0, 0
+ vld vr2, t0, %pc_lo12(L(INDEX))
andi a4, a1, 0xf
li.d t2, 16
@@ -202,5 +198,11 @@ L(ret0):
jr ra
END(STRNCMP)
+ .section .rodata.cst16,"M",@progbits,16
+ .align 4
+L(INDEX):
+ .dword 0x0706050403020100
+ .dword 0x0f0e0d0c0b0a0908
+
libc_hidden_builtin_def (STRNCMP)
#endif
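
As a side note on the constant itself, the short C sketch below lays the
two .dword values out least-significant byte first, as a little-endian
target does, and shows that the 16 bytes at L(INDEX) come out as 0x00
through 0x0f in order. The helper names (store_le64, build_index) are
hypothetical and used only for illustration. How the routine uses the
vector is not shown in this hunk, so the sketch makes no claim about that.

```c
#include <stdint.h>

/* Store a 64-bit value least-significant byte first, matching how a
   little-endian target lays out a .dword in memory. */
static void store_le64(uint8_t *dst, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(v >> (8 * i));
}

/* Fill out[0..15] with the bytes of the two constants from the patch. */
static void build_index(uint8_t out[16])
{
    store_le64(out, 0x0706050403020100ULL);
    store_le64(out + 8, 0x0f0e0d0c0b0a0908ULL);
}
```

Because the section is declared with the "M" flag and an entry size of 16,
a linker is allowed to merge identical 16-byte constants, so strcmp and
strncmp may end up sharing one copy of this table.
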