This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] x86-64: Optimize strrchr/wcsrchr with AVX2
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Fri, 9 Jun 2017 04:35:08 -0700
- Subject: Re: [PATCH] x86-64: Optimize strrchr/wcsrchr with AVX2
- References: <20170601181401.GD28627@lucon.org> <34399498-351f-2d58-b7b2-8119dcd0f142@linaro.org> <CAMe9rOp41cWpTS__bHrTFdWbnpfim-4G7qGM5U7SCkMZ=O4CYQ@mail.gmail.com>
On Fri, Jun 2, 2017 at 12:52 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Jun 1, 2017 at 11:42 AM, Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>> On 01/06/2017 15:14, H.J. Lu wrote:
>>> Optimize strrchr/wcsrchr with AVX2 to check 32 bytes with vector
>>> instructions. It is as fast as SSE2 version for small data sizes
>>> and up to 1X faster for large data sizes on Haswell. Select AVX2
>>> version on AVX2 machines where vzeroupper is preferred and AVX
>>> unaligned load is fast.
>>>
>>> Any comments?
>>>
>>> H.J.
>>> --
>>> * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
>>> strrchr-avx2 and wcsrchr-avx2.
>>> * sysdeps/x86_64/multiarch/ifunc-impl-list.c
>>> (__libc_ifunc_impl_list): Add tests for __strrchr_avx2,
>>> __strrchr_sse2, __wcsrchr_avx2 and __wcsrchr_sse2.
>>> * sysdeps/x86_64/multiarch/strrchr-avx2.S: New file.
>>> * sysdeps/x86_64/multiarch/strrchr.S: Likewise.
>>
>> I think this could be an opportunity to avoid adding more ifunc code
>> written in assembly and use a C implementation instead.
>
> Good idea. Here is the updated patch.
>
> Thanks.
>
I will check it in.
--
H.J.
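
For readers following the thread, the idea described in the patch summary
(compare 32 bytes per step with AVX2, and pick the AVX2 path only on
machines that support it) can be sketched roughly as below. This is an
illustrative sketch, not the code from the patch: the names
strrchr_sketch, strrchr_avx2_sketch and strrchr_scalar are hypothetical,
the run-time check uses the GCC/Clang builtin __builtin_cpu_supports
rather than glibc's ifunc machinery, and it only tests for AVX2 (it does
not model the vzeroupper or fast-unaligned-load conditions mentioned in
the patch). To stay within defined behaviour it first calls strlen to
bound the scan, which an optimized implementation would not need to do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_AVX2_SKETCH 1
#endif

/* Portable byte-at-a-time fallback (hypothetical helper). */
static char *strrchr_scalar(const char *s, int c)
{
    const char *last = NULL;
    do {
        if (*s == (char)c)
            last = s;              /* remember the most recent match */
    } while (*s++ != '\0');        /* the terminator itself is also tested */
    return (char *)last;
}

#ifdef HAVE_AVX2_SKETCH
/* Hypothetical AVX2 variant: examines 32 bytes per iteration. */
__attribute__((target("avx2")))
static char *strrchr_avx2_sketch(const char *s, int c)
{
    /* Assumption for safety: bound the scan with strlen so that no load
       reads past the terminator.  The +1 lets c == '\0' match.  */
    size_t len = strlen(s) + 1;
    const __m256i needle = _mm256_set1_epi8((char)c);
    const char *last = NULL;
    size_t i = 0;

    for (; i + 32 <= len; i += 32) {
        __m256i block = _mm256_loadu_si256((const __m256i *)(s + i));
        /* Bit k of mask is set when byte k of the block equals c. */
        unsigned mask =
            (unsigned)_mm256_movemask_epi8(_mm256_cmpeq_epi8(block, needle));
        if (mask != 0)
            last = s + i + (31 - __builtin_clz(mask)); /* highest set bit */
    }
    for (; i < len; i++)           /* scalar tail, fewer than 32 bytes */
        if (s[i] == (char)c)
            last = s + i;
    return (char *)last;
}
#endif

/* Run-time selection written in C (a plain branch, not a real ifunc). */
char *strrchr_sketch(const char *s, int c)
{
    assert(s != NULL);
#ifdef HAVE_AVX2_SKETCH
    if (__builtin_cpu_supports("avx2"))
        return strrchr_avx2_sketch(s, c);
#endif
    return strrchr_scalar(s, c);
}
```

Calling strrchr_sketch(path, '/') should return the same pointer as
strrchr(path, '/') on any input; a real ifunc would resolve the choice
once at load time instead of branching on every call.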