This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v2] x86-64: Optimize strcmp/wcscmp with AVX2


On Tue, May 29, 2018 at 11:53 AM,
<leonardo.sandoval.gonzalez@linux.intel.com> wrote:
> From: Leonardo Sandoval <leonardo.sandoval.gonzalez@linux.intel.com>
>
> Optimize x86-64 strcmp/strncmp/wcscmp/wcsncmp with AVX2. It uses vector
> comparison as much as possible. Peak performance observed on a Skylake
> machine: 9x, 3x, 2.5x and 5.5x for strcmp, strncmp, wcscmp and wcsncmp,
> respectively. The longer the comparison, the greater the benefit of the
> AVX2 functions, except for strcmp, where the peak is observed at length
> == 32 bytes. Select AVX2 strcmp/wcscmp on AVX2 machines where vzeroupper
> is preferred and AVX unaligned load is fast.
>
> NB: It uses TZCNT instead of BSF since TZCNT produces the same result
> as BSF for non-zero input.  TZCNT is faster than BSF and is executed
> as BSF if the machine doesn't support TZCNT.
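
For readers following the TZCNT note, here is a minimal sketch in portable C of the general "compare a chunk, build a bitmask, count trailing zeros" pattern. It is not the code from the patch: the names stop_mask, first_stop and compare_chunk are hypothetical, the loop is a scalar stand-in for a vector byte compare followed by a move-mask (such as VPCMPEQB and VPMOVMSKB), and __builtin_ctz is the GCC/Clang builtin, which is undefined for a zero argument. That is why the mask is checked for zero before the bit index is taken.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK 32  /* Bytes in one 256-bit AVX2 register.  */

/* Scalar stand-in for a vector compare plus move-mask: bit i of the
   result is set when byte i differs between the two chunks or when
   a[i] is the terminating null byte.  Both buffers must hold at least
   CHUNK readable bytes.  */
static uint32_t
stop_mask (const unsigned char *a, const unsigned char *b)
{
  uint32_t mask = 0;
  for (size_t i = 0; i < CHUNK; i++)
    if (a[i] != b[i] || a[i] == 0)
      mask |= (uint32_t) 1 << i;
  return mask;
}

/* Index of the lowest set bit.  The caller must pass a non-zero mask;
   for non-zero input BSF and TZCNT give the same index.  */
static unsigned int
first_stop (uint32_t mask)
{
  assert (mask != 0);
  return (unsigned int) __builtin_ctz (mask);
}

/* Compare one chunk.  Sets *done to 1 and returns the byte difference
   if a mismatch or a null byte was found; otherwise sets *done to 0
   and returns 0 so the caller can move on to the next chunk.  */
static int
compare_chunk (const unsigned char *a, const unsigned char *b, int *done)
{
  uint32_t mask = stop_mask (a, b);
  *done = mask != 0;
  if (mask == 0)
    return 0;
  unsigned int i = first_stop (mask);
  return (int) a[i] - (int) b[i];
}
```

With two 32-byte buffers holding "hello" and "help!", the lowest set bit of the mask is bit 3, so compare_chunk reports a negative result after a single chunk.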
>
>         * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
>         strcmp-avx2, strncmp-avx2, wcscmp-avx2, wcscmp-sse2, wcsncmp-avx2 and
>         wcsncmp-sse2.
>         * sysdeps/x86_64/multiarch/ifunc-impl-list.c
>         (__libc_ifunc_impl_list): Add tests for __strcmp_avx2,
>         __strncmp_avx2, __wcscmp_avx2, __wcsncmp_avx2, __wcscmp_sse2
>         and __wcsncmp_sse2.
>         * sysdeps/x86_64/multiarch/strcmp.c (OPTIMIZE (avx2)):
>         (IFUNC_SELECTOR): Return OPTIMIZE (avx2) on AVX2 machines if
>         AVX unaligned load is fast and vzeroupper is preferred.
>         * sysdeps/x86_64/multiarch/strncmp.c: Likewise.
>         * sysdeps/x86_64/multiarch/strcmp-avx2.S: New file.
>         * sysdeps/x86_64/multiarch/strncmp-avx2.S: Likewise.
>         * sysdeps/x86_64/multiarch/wcscmp-avx2.S: Likewise.
>         * sysdeps/x86_64/multiarch/wcscmp-sse2.S: Likewise.
>         * sysdeps/x86_64/multiarch/wcscmp.c: Likewise.
>         * sysdeps/x86_64/multiarch/wcsncmp-avx2.S: Likewise.
>         * sysdeps/x86_64/multiarch/wcsncmp-sse2.c: Likewise.
>         * sysdeps/x86_64/multiarch/wcsncmp.c: Likewise.
>         * sysdeps/x86_64/wcscmp.S (__wcscmp): Add alias only if __wcscmp
>         is undefined.
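
The selection rule in the ChangeLog entry for strcmp.c can be sketched in plain C as below. This is only an illustration under stated assumptions, not glibc's selector: struct cpu_flags, its field names, select_strcmp and the two stub functions are all hypothetical, and both stubs simply forward to the standard strcmp so the sketch stays self-contained.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical feature flags, named after the conditions in the
   commit message.  glibc tracks the real ones internally.  */
struct cpu_flags
{
  bool avx2_usable;
  bool avx_fast_unaligned_load;
  bool vzeroupper_preferred;
};

typedef int (*strcmp_fn) (const char *, const char *);

/* Stand-ins for the AVX2 and baseline variants; both forward to the
   standard strcmp in this sketch.  */
static int
strcmp_avx2_stub (const char *a, const char *b)
{
  return strcmp (a, b);
}

static int
strcmp_baseline_stub (const char *a, const char *b)
{
  return strcmp (a, b);
}

/* Pick the AVX2 variant only when all three conditions hold;
   otherwise fall back to the baseline variant.  */
static strcmp_fn
select_strcmp (const struct cpu_flags *f)
{
  if (f->avx2_usable
      && f->avx_fast_unaligned_load
      && f->vzeroupper_preferred)
    return strcmp_avx2_stub;
  return strcmp_baseline_stub;
}
```

A caller would run select_strcmp once with the detected flags and then call through the returned pointer; with an ifunc the dynamic linker does the equivalent when the symbol is resolved.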

Please mention strncmp and wcsncmp in the commit subject.  OK with this
change.

Thanks.

-- 
H.J.

