[PATCH 2/2] Loongarch: Add ifunc support and add different versions of strlen

Adhemerval Zanella Netto adhemerval.zanella@linaro.org
Wed Aug 2 12:59:42 GMT 2023



On 02/08/23 09:25, dengjianbo wrote:
>>>>
>>>> On 2023-08-02 10:31, Adhemerval Zanella Netto wrote:
>>>> +#if IS_IN (libc)
>>>> +# define STRLEN __strlen_aligned
>>>> +#else
>>>> +# define STRLEN strlen
>>>> +#endif
>>> Is this really an improvement over the generic implementation? It seems to
>>> use a quite similar strategy.
> Comparing with the code generated by compiler, the assembly code does an 16bytes loop
> unrolling, and handles ascii data and non-ascii data separately which could take less
> instructions to calculate the length of  ascii data. besides, the assembly code using
> fewer instructions to start the loop. I think the performance improvement benefits from
> this. Please kindly check bench result also from:
> https://github.com/jiadengx/glibc_test/blob/main/strlen/bench-strlen.out

>From the summarized results [1], it seems that the initial start to mask
off unaligned inputs are slight better.  The __strlen_aligned onl seems
better to sizes larger than 32 (the 16 lenght results seems strange).
Maybe you coult improve shift_find/find_zero_all/index_first on loongarch.

Does it improve by explicit instructing compiler to unroll the loop?

diff --git a/sysdeps/loongarch/Makefile b/sysdeps/loongarch/Makefile
index 43d2f583cd..d807a5e0d2 100644
--- a/sysdeps/loongarch/Makefile
+++ b/sysdeps/loongarch/Makefile
@@ -15,3 +15,7 @@ ASFLAGS-.os += $(pic-ccflag)
 ifeq (yes,$(have-cmodel-medium))
 CFLAGS-.oS += -mcmodel=medium
 endif
+
+ifeq ($(subdir),string)
+CFLAGS-strlen.c += -funroll-all-loops --param max-variable-expansions-in-unroller=2
+endif

[1] https://github.com/jiadengx/glibc_test/blob/main/strlen/strlen_align.png

>>> This implementation fails to assembler with binutils 2.40.0.20230525:
>>>
>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lasx.S:31: Error: no match insn: xvld        $xr0,$r4,0
>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lasx.S:33: Error: no match insn: xvmsknz.b   $xr0,$xr0
>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lasx.S:34: Error: no match insn: xvpickve.w  $xr1,$xr0,4
>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lasx.S:35: Error: no match insn: vilvl.h     $vr0,$vr1,$vr0
>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lasx.S:44: Error: no match insn: xvld        $xr0,$r4,32
>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lasx.S:46: Error: no match insn: xvsetanyeqz.b       $fcc0,$xr0
>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lasx.S:50: Error: no match insn: xvmsknz.b   $xr0,$xr0
>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lasx.S:52: Error: no match insn: xvpickve.w  $xr1,$xr0,4
>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lasx.S:53: Error: no match insn: vilvl.h     $vr0,$vr1,$vr0
>>>
>>> You need to either add a configure option to increase the minimum required
>>> binutils or add a macro to synthetize the instruction on older binutils
>>> (similar to what sysdeps/powerpc/powerpc64/le/power9/strncmp.S does).
> Configuration variable loongarch_vec_asm has been added in patch v2, when doing the configuration,
> it will check if the assembler supports LSX/LASX and decides whether strlen LSX/LASX code get compiled.
> 



> diff --git a/sysdeps/loongarch/configure.ac b/sysdeps/loongarch/configure.ac
> index 39efccfd8f..9fadf7bb9d 100644
> --- a/sysdeps/loongarch/configure.ac
> +++ b/sysdeps/loongarch/configure.ac
> @@ -74,6 +74,8 @@ else
>    libc_cv_loongarch_vec_asm=no
>  fi
>  rm -f conftest*])
> +LIBC_CONFIG_VAR([loongarch_vec_asm], [$libc_cv_loongarch_vec_asm])
> +
>  if test $libc_cv_loongarch_vec_asm = yes; then
>    AC_DEFINE(HAVE_LOONGARCH_VEC_ASM)
>  fi
> diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile
> new file mode 100644
> index 0000000000..73b7f61969
> --- /dev/null
> +++ b/sysdeps/loongarch/lp64/multiarch/Makefile
> @@ -0,0 +1,11 @@
> +ifeq ($(subdir),string)
> +sysdep_routines += strlen-aligned \
> +	# sysdep_routines
> +
> +ifeq ($(loongarch_vec_asm), yes)
> +sysdep_routines += strlen-lsx \
> +	strlen-lasx \
> +	# sysdep_routines
> +endif
> +
> +endif
> 
> Detailed info can be find from:
> https://sourceware.org/pipermail/libc-alpha/2023-August/150566.html
>>> This implementation fails to assembler with binutils 2.40.0.20230525:
>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S: Assembler messages:
>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S:30: Error: no match insn: vld  $vr0,$r4,0
>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S:31: Error: no match insn: vld  $vr1,$r4,16
>>>
> Sorry, it's my mistake for the wrong version of binutils. Could you please try the latest release
> version 2.41?

Although it should work, it is unexpected that depending of the assembler used
some optimized routines are not enabled. 

> 
> Following issues is  also fixed in the patch v2:
> 1. Missing one line comment.
> 2. I think it should be only 2023 here and for other new file as well. (Copyright)


More information about the Libc-alpha mailing list