This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH v2] x86-64: Optimize strcmp/wcscmp with AVX2
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Leonardo Sandoval <leonardo dot sandoval dot gonzalez at linux dot intel dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Fri, 1 Jun 2018 07:46:41 -0700
- Subject: Re: [PATCH v2] x86-64: Optimize strcmp/wcscmp with AVX2
- References: <20180529185339.11541-1-leonardo.sandoval.gonzalez@linux.intel.com>
On Tue, May 29, 2018 at 11:53 AM,
<leonardo.sandoval.gonzalez@linux.intel.com> wrote:
> From: Leonardo Sandoval <leonardo.sandoval.gonzalez@linux.intel.com>
>
> Optimize x86-64 strcmp/strncmp/wcscmp/wcsncmp with AVX2. It uses vector
> comparison as much as possible. Peak speedups observed on a Skylake
> machine: 9x, 3x, 2.5x and 5.5x for strcmp, strncmp, wcscmp and wcsncmp,
> respectively. The longer the comparison, the greater the benefit of the
> AVX2 functions, except for strcmp, where the peak is observed at a
> length of 32 bytes. Select AVX2 strcmp/wcscmp on AVX2 machines where
> vzeroupper is preferred and AVX unaligned load is fast.
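The vector-comparison idea can be illustrated with a scalar C sketch, assuming a 32-byte step to match one AVX2 register. The names `chunk_mask` and `chunked_strcmp` are hypothetical. The per-byte loop in `chunk_mask` only stands in for what AVX2 code does with vector compares and a byte-mask extraction (for example VPMOVMSKB), and `__builtin_ctz` assumes GCC or Clang. The sketch also assumes both buffers stay readable to the end of the 32-byte chunk holding the deciding byte; a real vector implementation must instead avoid loads that cross into an unmapped page.

```c
#include <stdint.h>

/* Bytes compared per step; 32 matches one AVX2 YMM register. */
#define VEC_SIZE 32

/* Scalar stand-in for one vector step: bit i of the result is set when
   byte i differs between the two chunks or is the NUL terminator.  */
static uint32_t chunk_mask(const unsigned char *a, const unsigned char *b)
{
    uint32_t mask = 0;

    for (int i = 0; i < VEC_SIZE; i++)
        if (a[i] != b[i] || a[i] == 0)
            mask |= UINT32_C(1) << i;
    return mask;
}

/* Hypothetical chunked strcmp.  ASSUMPTION: both buffers are readable up
   to the end of the 32-byte chunk containing the first mismatch or NUL.  */
static int chunked_strcmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    for (;; a += VEC_SIZE, b += VEC_SIZE) {
        uint32_t mask = chunk_mask(a, b);

        if (mask != 0) {
            /* Lowest set bit = first byte that decides the result.  */
            int i = __builtin_ctz(mask);
            return a[i] - b[i];
        }
    }
}
```

Each step reduces 32 byte comparisons to one integer mask, so the loop branches once per chunk rather than once per byte; the first set bit then locates the deciding byte.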
>
> NB: It uses TZCNT instead of BSF since TZCNT produces the same result
> as BSF for non-zero input. TZCNT is faster than BSF and is executed
> as BSF if the machine doesn't support TZCNT.
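The note relies on one property: TZCNT and BSF agree on every non-zero input and differ only at zero, where TZCNT returns the operand width while BSF sets ZF and, per Intel's documentation, leaves its destination undefined. TZCNT is encoded as BSF with a REP prefix, so CPUs without BMI1 ignore the prefix and run BSF, which is safe as long as the bit scan only ever sees a non-zero mask. The small C model below uses hypothetical helpers `tzcnt32` and `bsf32`, written for illustration rather than as intrinsics, to check that agreement in software.

```c
#include <stdint.h>

/* Software model of TZCNT on a 32-bit operand: number of trailing zero
   bits, defined as the operand width (32) when the input is zero.  */
static unsigned tzcnt32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Software model of BSF: scan upward for the lowest set bit.  Returns 1
   and stores the index when one exists (ZF clear); returns 0 for a zero
   input, where the hardware destination is undefined (ZF set).  */
static int bsf32(uint32_t x, unsigned *index)
{
    for (unsigned i = 0; i < 32; i++) {
        if (x & (UINT32_C(1) << i)) {
            *index = i;
            return 1;
        }
    }
    return 0;
}
```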
>
> * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
> strcmp-avx2, strncmp-avx2, wcscmp-avx2, wcscmp-sse2, wcsncmp-avx2 and
> wcsncmp-sse2.
> * sysdeps/x86_64/multiarch/ifunc-impl-list.c
> (__libc_ifunc_impl_list): Add tests for __strcmp_avx2,
> __strncmp_avx2, __wcscmp_avx2, __wcsncmp_avx2, __wcscmp_sse2
> and __wcsncmp_sse2.
> * sysdeps/x86_64/multiarch/strcmp.c (OPTIMIZE (avx2)):
> (IFUNC_SELECTOR): Return OPTIMIZE (avx2) on AVX2 machines if
> AVX unaligned load is fast and vzeroupper is preferred.
> * sysdeps/x86_64/multiarch/strncmp.c: Likewise.
> * sysdeps/x86_64/multiarch/strcmp-avx2.S: New file.
> * sysdeps/x86_64/multiarch/strncmp-avx2.S: Likewise.
> * sysdeps/x86_64/multiarch/wcscmp-avx2.S: Likewise.
> * sysdeps/x86_64/multiarch/wcscmp-sse2.S: Likewise.
> * sysdeps/x86_64/multiarch/wcscmp.c: Likewise.
> * sysdeps/x86_64/multiarch/wcsncmp-avx2.S: Likewise.
> * sysdeps/x86_64/multiarch/wcsncmp-sse2.c: Likewise.
> * sysdeps/x86_64/multiarch/wcsncmp.c: Likewise.
> * sysdeps/x86_64/wcscmp.S (__wcscmp): Add alias only if __wcscmp
> is undefined.
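The selection rule in the strcmp.c entry amounts to a three-way feature check. In the sketch below, the `cpu_flags` struct, its field names and the `strcmp_impl` enum are hypothetical stand-ins for glibc's internal CPU-feature bits. Every non-AVX2 case is collapsed into a single SSE2 fallback, whereas the real selector may weigh further variants, and "vzeroupper is preferred" is modeled as the absence of a prefer-no-vzeroupper flag.

```c
#include <stdbool.h>

/* Hypothetical names for the variants the selector can return.  */
enum strcmp_impl { STRCMP_SSE2, STRCMP_AVX2 };

/* Hypothetical feature flags; glibc keeps its real ones in internal
   CPU-feature data under other names.  */
struct cpu_flags {
    bool avx2_usable;
    bool avx_fast_unaligned_load;
    bool prefer_no_vzeroupper;
};

/* Choose AVX2 only when AVX2 is usable, unaligned AVX loads are fast,
   and vzeroupper is not being avoided; otherwise fall back to SSE2.  */
static enum strcmp_impl select_strcmp_impl(const struct cpu_flags *f)
{
    if (f->avx2_usable && f->avx_fast_unaligned_load
        && !f->prefer_no_vzeroupper)
        return STRCMP_AVX2;
    return STRCMP_SSE2;
}
```

Keeping the decision a pure function of the flags makes every combination testable without the matching hardware.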
Please mention strncmp and wcsncmp in the commit subject. OK with this
change.
Thanks.
--
H.J.