This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] x86-64: Optimize strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2


Ping. OK for master?


On Mon, 2018-10-08 at 08:59 -0500,
leonardo.sandoval.gonzalez@linux.intel.com wrote:
> From: Leonardo Sandoval <leonardo.sandoval.gonzalez@linux.intel.com>
> 
> Optimize x86-64 strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with
> AVX2.  The new routines use vector comparison as much as possible.  In
> general, the larger the source string, the greater the performance
> gain observed, with speedups reaching 1.6x compared to the SSE2
> unaligned routines.  Select the AVX2 strcat/strncat, strcpy/strncpy and
> stpcpy/stpncpy on AVX2 machines where vzeroupper is preferred and AVX
> unaligned load is fast.
> 
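For readers skimming the thread: "vector comparison" here means testing a whole 32-byte block for the terminating NUL in one step instead of byte by byte. The sketch below is a portable scalar model of that idea in C, not the assembly from the patch. The names VEC_SIZE, zero_byte_mask, lowest_set_bit and model_stpcpy are hypothetical, and the src_capacity parameter is an assumption added so that the model never reads outside the source buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Width of one AVX2 register in bytes.  */
#define VEC_SIZE 32

/* Scalar model of a byte-wise vector compare against zero followed by
   a move-mask: bit i of the result is set when block[i] == 0.  */
static uint32_t
zero_byte_mask (const unsigned char *block)
{
  uint32_t mask = 0;
  for (unsigned int i = 0; i < VEC_SIZE; i++)
    if (block[i] == 0)
      mask |= (uint32_t) 1 << i;
  return mask;
}

/* Index of the lowest set bit; MASK must be nonzero.  Models a
   trailing-zero count.  */
static unsigned int
lowest_set_bit (uint32_t mask)
{
  unsigned int i = 0;
  while ((mask & 1) == 0)
    {
      mask >>= 1;
      i++;
    }
  return i;
}

/* Hypothetical stpcpy-like model: copy SRC to DST one block at a time
   and return a pointer to the terminating NUL written to DST.  Only
   whole VEC_SIZE blocks inside SRC_CAPACITY are examined; NULL is
   returned when none of them contains a NUL.  DST must have room for
   the string plus its terminator.  */
static char *
model_stpcpy (char *dst, const char *src, size_t src_capacity)
{
  for (size_t off = 0; off + VEC_SIZE <= src_capacity; off += VEC_SIZE)
    {
      uint32_t mask = zero_byte_mask ((const unsigned char *) src + off);
      if (mask != 0)
        {
          /* The terminator is in this block: copy up to and
             including it, then stop.  */
          size_t len = off + lowest_set_bit (mask);
          memcpy (dst + off, src + off, len - off + 1);
          return dst + len;
        }
      /* No terminator in this block: copy all of it.  */
      memcpy (dst + off, src + off, VEC_SIZE);
    }
  return NULL;
}
```

In a real AVX2 routine the loop in zero_byte_mask would be a single compare instruction and the block copies would be vector loads and stores, which is presumably why the gain grows with the length of the source string.
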
> 	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
> 	strcat-avx2, strncat-avx2, strcpy-avx2, strncpy-avx2,
> 	stpcpy-avx2 and stpncpy-avx2.
> 	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
> 	(__libc_ifunc_impl_list): Add tests for __strcat_avx2,
> 	__strncat_avx2, __strcpy_avx2, __strncpy_avx2, __stpcpy_avx2
> 	and __stpncpy_avx2.
> 	* sysdeps/x86_64/multiarch/{ifunc-unaligned-ssse3.h =>
> 	ifunc-strcpy.h}: Rename header to a more generic name.
> 	* sysdeps/x86_64/multiarch/ifunc-strcpy.h
> 	(IFUNC_SELECTOR): Return OPTIMIZE (avx2) on AVX2 machines if
> 	AVX unaligned load is fast and vzeroupper is preferred.
> 	* sysdeps/x86_64/multiarch/stpcpy-avx2.S: New file.
> 	* sysdeps/x86_64/multiarch/stpncpy-avx2.S: Likewise.
> 	* sysdeps/x86_64/multiarch/strcat-avx2.S: Likewise.
> 	* sysdeps/x86_64/multiarch/strcpy-avx2.S: Likewise.
> 	* sysdeps/x86_64/multiarch/strncat-avx2.S: Likewise.
> 	* sysdeps/x86_64/multiarch/strncpy-avx2.S: Likewise.
> ---
>  sysdeps/x86_64/multiarch/Makefile             |    3 +
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c    |   12 +
>  ...ifunc-unaligned-ssse3.h => ifunc-strcpy.h} |    6 +
>  sysdeps/x86_64/multiarch/stpcpy-avx2.S        |    3 +
>  sysdeps/x86_64/multiarch/stpcpy.c             |    2 +-
>  sysdeps/x86_64/multiarch/stpncpy-avx2.S       |    4 +
>  sysdeps/x86_64/multiarch/stpncpy.c            |    2 +-
>  sysdeps/x86_64/multiarch/strcat-avx2.S        |  275 +++++
>  sysdeps/x86_64/multiarch/strcat.c             |    2 +-
>  sysdeps/x86_64/multiarch/strcpy-avx2.S        | 1022 +++++++++++++++++
>  sysdeps/x86_64/multiarch/strcpy.c             |    2 +-
>  sysdeps/x86_64/multiarch/strncat-avx2.S       |    3 +
>  sysdeps/x86_64/multiarch/strncat.c            |    2 +-
>  sysdeps/x86_64/multiarch/strncpy-avx2.S       |    3 +
>  sysdeps/x86_64/multiarch/strncpy.c            |    2 +-
>  15 files changed, 1337 insertions(+), 6 deletions(-)
>  rename sysdeps/x86_64/multiarch/{ifunc-unaligned-ssse3.h => ifunc-strcpy.h} (83%)
>  create mode 100644 sysdeps/x86_64/multiarch/stpcpy-avx2.S
>  create mode 100644 sysdeps/x86_64/multiarch/stpncpy-avx2.S
>  create mode 100644 sysdeps/x86_64/multiarch/strcat-avx2.S
>  create mode 100644 sysdeps/x86_64/multiarch/strcpy-avx2.S
>  create mode 100644 sysdeps/x86_64/multiarch/strncat-avx2.S
>  create mode 100644 sysdeps/x86_64/multiarch/strncpy-avx2.S
> 
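The selection rule in the ChangeLog entry for ifunc-strcpy.h can be read roughly as the C sketch below. This is an illustration only, not the contents of the header. The struct cpu_caps, its field names, the enum strcpy_impl and select_strcpy_impl are all hypothetical, and the fallback order after the AVX2 check is an assumption, since the quoted message only spells out the AVX2 condition.

```c
#include <stdbool.h>

/* Hypothetical capability flags.  glibc keeps the real ones in its
   internal CPU feature data, which is not modelled here.  */
struct cpu_caps
{
  bool avx2_usable;              /* AVX2 present and enabled by the OS.  */
  bool avx_fast_unaligned_load;  /* AVX unaligned load is fast.  */
  bool vzeroupper_preferred;     /* vzeroupper is preferred.  */
  bool fast_unaligned_load;      /* Assumed flag for the fallback path.  */
  bool ssse3;                    /* Assumed flag for the fallback path.  */
};

/* Hypothetical labels for the candidate implementations.  */
enum strcpy_impl
{
  IMPL_SSE2,
  IMPL_SSSE3,
  IMPL_SSE2_UNALIGNED,
  IMPL_AVX2
};

/* Pick an implementation.  The first test mirrors the condition in
   the ChangeLog; the remaining order is an assumption.  */
static enum strcpy_impl
select_strcpy_impl (const struct cpu_caps *caps)
{
  if (caps->avx2_usable
      && caps->avx_fast_unaligned_load
      && caps->vzeroupper_preferred)
    return IMPL_AVX2;

  if (caps->fast_unaligned_load)
    return IMPL_SSE2_UNALIGNED;

  if (caps->ssse3)
    return IMPL_SSSE3;

  return IMPL_SSE2;
}
```

All three conditions must hold for the AVX2 variant to be chosen, so a machine with AVX2 where vzeroupper is not preferred falls through to the older routines in this model.
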



