Re: [PATCH] x86-64: Optimize strrchr/wcsrchr with AVX2

On Fri, Jun 2, 2017 at 12:52 PM, H.J. Lu wrote:
> On Thu, Jun 1, 2017 at 11:42 AM, Adhemerval Zanella wrote:
>> On 01/06/2017 15:14, H.J. Lu wrote:
>>> Optimize strrchr/wcsrchr with AVX2 to check 32 bytes with vector
>>> instructions.  It is as fast as SSE2 version for small data sizes
>>> and up to 1X faster for large data sizes on Haswell.  Select AVX2
>>> version on AVX2 machines where vzeroupper is preferred and AVX
>>> unaligned load is fast.
>>> Any comments?
>>> H.J.
>>> --
>>>       * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
>>>       strrchr-avx2 and wcsrchr-avx2.
>>>       * sysdeps/x86_64/multiarch/ifunc-impl-list.c
>>>       (__libc_ifunc_impl_list): Add tests for __strrchr_avx2,
>>>       __strrchr_sse2, __wcsrchr_avx2 and __wcsrchr_sse2.
>>>       * sysdeps/x86_64/multiarch/strrchr-avx2.S: New file.
>>>       * sysdeps/x86_64/multiarch/strrchr.S: Likewise.
>> I think this could be an opportunity to avoid adding another ifunc
>> written in assembly and use a C implementation instead.
> Good idea.  Here is the updated patch.
> Thanks.

I will check it in.
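
For readers following the thread, here is a minimal sketch in C of the
idea from the patch summary: compare 32 bytes at a time against both
the search character and NUL, remember the last match, and pick the
vector version only when the CPU reports AVX2.  It is not the code from
the patch, which is hand-written assembly in strrchr-avx2.S.  The names
sketch_strrchr_avx2 and sketch_strrchr are hypothetical, and the
selection below uses the GCC/Clang builtin __builtin_cpu_supports
rather than glibc's ifunc machinery, so it does not model the
vzeroupper or fast-unaligned-load checks mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_AVX2_SKETCH 1

/* Hypothetical AVX2 strrchr sketch, not the assembly from the patch. */
__attribute__((target("avx2")))
static char *sketch_strrchr_avx2(const char *s, int c)
{
    const __m256i needle = _mm256_set1_epi8((char)c);
    const __m256i zero = _mm256_setzero_si256();
    const char *last = NULL;

    /* Scalar prologue: walk byte by byte until s is 32-byte aligned. */
    while (((uintptr_t)s & 31) != 0) {
        if (*s == (char)c)
            last = s;
        if (*s == '\0')
            return (char *)last;
        s++;
    }

    /* Aligned 32-byte loads cannot cross a page boundary, so reading
       past the terminator does not fault.  This mirrors what assembly
       string routines do; it is outside what ISO C guarantees and
       memory checkers may flag it. */
    for (;;) {
        __m256i chunk = _mm256_load_si256((const __m256i *)s);
        uint32_t match = (uint32_t)_mm256_movemask_epi8(
            _mm256_cmpeq_epi8(chunk, needle));
        uint32_t nul = (uint32_t)_mm256_movemask_epi8(
            _mm256_cmpeq_epi8(chunk, zero));

        if (nul != 0) {
            /* Keep only matches at or before the first NUL byte. */
            uint32_t first_nul = nul & (0u - nul);
            match &= first_nul | (first_nul - 1);
            if (match != 0)
                last = s + (31 - __builtin_clz(match));
            return (char *)last;
        }
        /* Highest set bit is the last match in this 32-byte chunk. */
        if (match != 0)
            last = s + (31 - __builtin_clz(match));
        s += 32;
    }
}
#endif

/* Hypothetical C-level selector: use the AVX2 sketch when available,
   otherwise fall back to the C library's strrchr. */
static char *sketch_strrchr(const char *s, int c)
{
#ifdef HAVE_AVX2_SKETCH
    if (__builtin_cpu_supports("avx2"))
        return sketch_strrchr_avx2(s, c);
#endif
    return (char *)strrchr(s, c);
}
```

Calling sketch_strrchr(path, '/') should behave like strrchr(path, '/')
on any input.  On machines without AVX2, or on other compilers and
architectures, only the fallback is compiled or taken.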


