This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] x86-64: Optimize strchr/strchrnul/wcschr with AVX2
On Fri, Jun 2, 2017 at 12:49 PM, H.J. Lu <firstname.lastname@example.org> wrote:
> On Thu, Jun 1, 2017 at 11:13 AM, H.J. Lu <email@example.com> wrote:
>> Optimize strchr/strchrnul/wcschr with AVX2 to search 32 bytes with vector
>> instructions. It is as fast as SSE2 versions for size <= 16 bytes and up
>> to 1X faster for size > 16 bytes on Haswell. Select AVX2 version on
>> AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast.
>> NB: It uses TZCNT instead of BSF since TZCNT produces the same result
>> as BSF for non-zero input. TZCNT is faster than BSF and is executed
>> as BSF if the machine doesn't support TZCNT.
>> Any comments?
>> * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
>> strchr-avx2, strchrnul-avx2 and wcschr-avx2.
>> * sysdeps/x86_64/multiarch/ifunc-impl-list.c
>> (__libc_ifunc_impl_list): Add tests for __strchr_avx2,
>> __strchrnul_avx2, __strchrnul_sse2, __wcschr_avx2 and
>> * sysdeps/x86_64/multiarch/strchr-avx2.S: New file.
>> * sysdeps/x86_64/multiarch/strchrnul-avx2.S: Likewise.
>> * sysdeps/x86_64/multiarch/strchrnul.S: Likewise.
>> * sysdeps/x86_64/multiarch/wcschr-avx2.S: Likewise.
>> * sysdeps/x86_64/multiarch/wcschr.S: Likewise.
>> * sysdeps/x86_64/multiarch/strchr.S (strchr): Add support for
> Updated patch with IFUNC selector in C.
I will check it in.
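
For readers following the thread without the patch at hand, here is a
minimal scalar sketch in C of the idea described in the quoted text: examine
the string 32 bytes at a time, build a bitmask of the positions that hold
either the target character or the terminating NUL, and take the index of the
lowest set bit. This is not the code from the patch, which is written in
assembly. The names sketch_strchr, sketch_strchrnul, match_mask and
first_set_index are hypothetical, and __builtin_ctz is a GCC/Clang builtin
standing in for the TZCNT/BSF instruction. The mask is tested for zero before
the bit scan, which is the non-zero-input condition the NB relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK 32  /* bytes examined per step, mirroring one 256-bit vector */

/* Index of the lowest set bit. Callers must pass a non-zero mask:
   zero is the one input where BSF and TZCNT behave differently. */
static unsigned first_set_index(uint32_t mask)
{
    assert(mask != 0);
    return (unsigned)__builtin_ctz(mask);
}

/* Scalar stand-in for a vector compare plus byte-mask extraction:
   bit i is set when p[i] equals c or is the terminating NUL. */
static uint32_t match_mask(const unsigned char *p, unsigned char c)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < CHUNK; i++) {
        if (p[i] == c || p[i] == '\0')
            mask |= (uint32_t)1 << i;
        if (p[i] == '\0')
            break;  /* plain C must not read past the terminator */
    }
    return mask;
}

/* Returns a pointer to the first c in s, or to the terminating NUL. */
char *sketch_strchrnul(const char *s, int c)
{
    const unsigned char *p = (const unsigned char *)s;
    for (;;) {
        uint32_t mask = match_mask(p, (unsigned char)c);
        if (mask != 0)
            return (char *)(p + first_set_index(mask));
        p += CHUNK;  /* no match and no NUL in this chunk: move on */
    }
}

/* Returns a pointer to the first c in s, or NULL if c does not occur. */
char *sketch_strchr(const char *s, int c)
{
    char *r = sketch_strchrnul(s, c);
    return (*r == (char)c) ? r : NULL;
}
```

The only difference between the two entry points is what happens when the
scan stops at the NUL: sketch_strchrnul returns that position, while
sketch_strchr returns NULL unless the NUL itself was the character requested.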