This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] x86-64: Optimize strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2
- From: Leonardo Sandoval <leonardo dot sandoval dot gonzalez at linux dot intel dot com>
- To: libc-alpha at sourceware dot org
- Cc: hjl dot tools at gmail dot com
- Date: Wed, 31 Oct 2018 10:29:46 -0600
- Subject: Re: [PATCH] x86-64: Optimize strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2
- References: <20181008135950.9113-1-leonardo.sandoval.gonzalez@linux.intel.com>
Ping. OK for master?
On Mon, 2018-10-08 at 08:59 -0500,
leonardo.sandoval.gonzalez@linux.intel.com wrote:
> From: Leonardo Sandoval <leonardo.sandoval.gonzalez@linux.intel.com>
>
> Optimize x86-64 strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with
> AVX2. The new routines use vector comparison as much as possible. In
> general, the larger the source string, the greater the performance gain
> observed, with speedups reaching 1.6x compared to the SSE2 unaligned
> routines. Select the AVX2 strcat/strncat, strcpy/strncpy and
> stpcpy/stpncpy on AVX2 machines where vzeroupper is preferred and AVX
> unaligned load is fast.
>
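
For readers unfamiliar with the "vector comparison" approach mentioned above, here is a minimal C sketch of the idea. It is not the code from this patch (the patch is hand-written assembly): the function names nul_index, nul_index_avx2 and nul_index_scalar are hypothetical, and the buffer length is passed explicitly so that the sketch never reads past the end of the buffer. It compares 32 bytes at a time against zero using an unaligned AVX2 load, and falls back to a byte loop for the tail or when AVX2 is not available at run time. It assumes GCC or Clang on x86-64.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Byte-at-a-time reference: index of the first NUL in buf[0..n), or n.  */
static size_t
nul_index_scalar (const char *buf, size_t n)
{
  size_t i = 0;
  while (i < n && buf[i] != '\0')
    i++;
  return i;
}

/* Hypothetical AVX2 variant: test 32 bytes per iteration.  The target
   attribute lets this one function use AVX2 without -mavx2.  */
static __attribute__ ((target ("avx2"))) size_t
nul_index_avx2 (const char *buf, size_t n)
{
  const __m256i zero = _mm256_setzero_si256 ();
  size_t i = 0;

  for (; i + 32 <= n; i += 32)
    {
      /* Unaligned 32-byte load, then a per-byte equality test with 0.  */
      __m256i chunk = _mm256_loadu_si256 ((const __m256i *) (buf + i));
      __m256i eq = _mm256_cmpeq_epi8 (chunk, zero);
      /* One bit per byte; bit k is set when byte k of the chunk is NUL.  */
      unsigned int mask = (unsigned int) _mm256_movemask_epi8 (eq);
      if (mask != 0)
        return i + (size_t) __builtin_ctz (mask);
    }

  /* Fewer than 32 bytes remain: finish with the byte loop.  */
  return i + nul_index_scalar (buf + i, n - i);
}

/* Run-time dispatch between the two variants.  */
size_t
nul_index (const char *buf, size_t n)
{
  assert (buf != NULL);
  if (__builtin_cpu_supports ("avx2"))
    return nul_index_avx2 (buf, n);
  return nul_index_scalar (buf, n);
}
```

Calling nul_index (buf, sizeof buf) on a buffer returns the offset of its first NUL byte, or the buffer size if there is none. The longer the buffer, the more of the work is done by the 32-byte loop, which is consistent with the observation above that gains grow with string length.
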
> * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
> strcat-avx2, strncat-avx2, strcpy-avx2, strncpy-avx2,
> stpcpy-avx2 and stpncpy-avx2.
> * sysdeps/x86_64/multiarch/ifunc-impl-list.c
> (__libc_ifunc_impl_list): Add tests for __strcat_avx2,
> __strncat_avx2, __strcpy_avx2, __strncpy_avx2, __stpcpy_avx2
> and __stpncpy_avx2.
> * sysdeps/x86_64/multiarch/{ifunc-unaligned-ssse3.h =>
> ifunc-strcpy.h}: Rename header to a more generic name.
> * sysdeps/x86_64/multiarch/ifunc-strcpy.h
> (IFUNC_SELECTOR): Return OPTIMIZE (avx2) on AVX2 machines if
> AVX unaligned load is fast and vzeroupper is preferred.
> * sysdeps/x86_64/multiarch/stpcpy-avx2.S: New file.
> * sysdeps/x86_64/multiarch/stpncpy-avx2.S: Likewise.
> * sysdeps/x86_64/multiarch/strcat-avx2.S: Likewise.
> * sysdeps/x86_64/multiarch/strcpy-avx2.S: Likewise.
> * sysdeps/x86_64/multiarch/strncat-avx2.S: Likewise.
> * sysdeps/x86_64/multiarch/strncpy-avx2.S: Likewise.
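
The selection rule described in the IFUNC_SELECTOR entry above can be sketched in plain C as follows. This is only an illustration of the stated condition, not the contents of ifunc-strcpy.h: struct cpu_flags, enum strcpy_variant and select_strcpy_variant are hypothetical names, and the real selector reads glibc's internal CPU feature bits and chooses among more fallbacks than the single baseline shown here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU properties named in the ChangeLog.  */
struct cpu_flags
{
  bool avx2_usable;              /* The machine supports AVX2.  */
  bool avx_fast_unaligned_load;  /* Unaligned AVX loads are fast.  */
  bool vzeroupper_preferred;     /* Using vzeroupper is acceptable.  */
};

/* Hypothetical labels for the implementation that gets picked.  */
enum strcpy_variant
{
  VARIANT_BASELINE,  /* Whatever the selector chose before this patch.  */
  VARIANT_AVX2       /* The new AVX2 routines.  */
};

/* Pick the AVX2 routines only when all three conditions hold.  */
enum strcpy_variant
select_strcpy_variant (const struct cpu_flags *f)
{
  if (f->avx2_usable
      && f->avx_fast_unaligned_load
      && f->vzeroupper_preferred)
    return VARIANT_AVX2;
  return VARIANT_BASELINE;
}
```

If any one of the three conditions is false, the sketch keeps the pre-existing choice, which mirrors the wording "on AVX2 machines if AVX unaligned load is fast and vzeroupper is preferred".
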
> ---
> sysdeps/x86_64/multiarch/Makefile | 3 +
> sysdeps/x86_64/multiarch/ifunc-impl-list.c | 12 +
> ...ifunc-unaligned-ssse3.h => ifunc-strcpy.h} | 6 +
> sysdeps/x86_64/multiarch/stpcpy-avx2.S | 3 +
> sysdeps/x86_64/multiarch/stpcpy.c | 2 +-
> sysdeps/x86_64/multiarch/stpncpy-avx2.S | 4 +
> sysdeps/x86_64/multiarch/stpncpy.c | 2 +-
> sysdeps/x86_64/multiarch/strcat-avx2.S | 275 +++++
> sysdeps/x86_64/multiarch/strcat.c | 2 +-
> sysdeps/x86_64/multiarch/strcpy-avx2.S | 1022 +++++++++++++++++
> sysdeps/x86_64/multiarch/strcpy.c | 2 +-
> sysdeps/x86_64/multiarch/strncat-avx2.S | 3 +
> sysdeps/x86_64/multiarch/strncat.c | 2 +-
> sysdeps/x86_64/multiarch/strncpy-avx2.S | 3 +
> sysdeps/x86_64/multiarch/strncpy.c | 2 +-
> 15 files changed, 1337 insertions(+), 6 deletions(-)
> rename sysdeps/x86_64/multiarch/{ifunc-unaligned-ssse3.h => ifunc-strcpy.h} (83%)
> create mode 100644 sysdeps/x86_64/multiarch/stpcpy-avx2.S
> create mode 100644 sysdeps/x86_64/multiarch/stpncpy-avx2.S
> create mode 100644 sysdeps/x86_64/multiarch/strcat-avx2.S
> create mode 100644 sysdeps/x86_64/multiarch/strcpy-avx2.S
> create mode 100644 sysdeps/x86_64/multiarch/strncat-avx2.S
> create mode 100644 sysdeps/x86_64/multiarch/strncpy-avx2.S
>