This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] x86-64: Add wmemset optimized with SSE2/AVX2/AVX512


On Fri, Jun 2, 2017 at 12:45 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, May 31, 2017 at 4:29 AM, Rodriguez Bahena, Victor
> <victor.rodriguez.bahena@intel.com> wrote:
>> +1
>>
>> -----Original Message-----
>> From: <libc-alpha-owner@sourceware.org> on behalf of "H.J. Lu"
>> <hjl.tools@gmail.com>
>> Date: Tuesday, May 30, 2017 at 6:41 PM
>> To: GNU C Library <libc-alpha@sourceware.org>
>> Subject: Re: [PATCH] x86-64: Add wmemset optimized with SSE2/AVX2/AVX512
>>
>>>On Sun, May 21, 2017 at 1:34 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> The difference between memset and wmemset is byte vs int.  Add stubs
>>>> to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size:
>>>>
>>>> SSE2 wmemset:
>>>>         shl    $0x2,%rdx
>>>>         movd   %esi,%xmm0
>>>>         mov    %rdi,%rax
>>>>         pshufd $0x0,%xmm0,%xmm0
>>>>         jmp     entry_from_wmemset
>>>>
>>>> SSE2 memset:
>>>>         movd   %esi,%xmm0
>>>>         mov    %rdi,%rax
>>>>         punpcklbw %xmm0,%xmm0
>>>>         punpcklwd %xmm0,%xmm0
>>>>         pshufd $0x0,%xmm0,%xmm0
>>>> entry_from_wmemset:
>>>>
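>>>> In plain C the required semantics are roughly as below (wchar_t is
>>>> a 4-byte int on x86-64, which is why the element count is scaled
>>>> to bytes with "shl $0x2,%rdx" and a 32-bit value is broadcast
>>>> instead of a splatted byte; wmemset_ref is just an illustrative
>>>> name, not part of the patch):
>>>>
>>>> #include <stddef.h>
>>>> #include <wchar.h>
>>>>
>>>> /* Reference implementation: fill N wchar_t elements with C.  */
>>>> wchar_t *
>>>> wmemset_ref (wchar_t *s, wchar_t c, size_t n)
>>>> {
>>>>   wchar_t *p = s;
>>>>   while (n--)
>>>>     *p++ = c;
>>>>   return s;
>>>> }
>>>>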
>>>> Since the ERMS versions of wmemset require "rep stosl" instead of
>>>> "rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset
>>>> are added.  The SSE2 wmemset is about 3X faster and the AVX2 wmemset
>>>> is about 6X faster on Haswell.
>>>>
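>>>> As an illustration only (not part of the patch), the 4-byte
>>>> ERMS-style store that wmemset would need looks like the
>>>> hypothetical helper below; memset's ERMS path uses the byte-wise
>>>> "rep stosb", which is why it cannot simply be reused:
>>>>
>>>> #include <stddef.h>
>>>>
>>>> /* Hypothetical helper, not glibc code: store NWORDS copies of the
>>>>    4-byte VAL at DST with "rep stosl" (RDI = destination, RCX =
>>>>    count, EAX = value).  */
>>>> static inline void
>>>> fill_stosl (void *dst, unsigned int val, size_t nwords)
>>>> {
>>>>   __asm__ __volatile__ ("rep stosl"
>>>>                         : "+D" (dst), "+c" (nwords)
>>>>                         : "a" (val)
>>>>                         : "memory");
>>>> }
>>>>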
>>>> OK for master?
>>>
>>>Any objections?
>>>
>>>> H.J.
>>>> ---
>>>>         * include/wchar.h (__wmemset_chk): New.
>>>>         * sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed
>>>>         to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN.
>>>>         (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
>>>>         (WMEMSET_CHK_SYMBOL): Likewise.
>>>>         (WMEMSET_SYMBOL): Likewise.
>>>>         (__wmemset): Add hidden definition.
>>>>         (wmemset): Add weak hidden definition.
>>>>         * sysdeps/x86_64/multiarch/ifunc-impl-list.c
>>>>         (__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned,
>>>>         __wmemset_avx2_unaligned, __wmemset_avx512_unaligned,
>>>>         __wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned
>>>>         and __wmemset_chk_avx512_unaligned.
>>>>         * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
>>>>         (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
>>>>         (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
>>>>         (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
>>>>         (WMEMSET_SYMBOL): Likewise.
>>>>         * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
>>>>         (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
>>>>         (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
>>>>         (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
>>>>         (WMEMSET_SYMBOL): Likewise.
>>>>         * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated.
>>>>         (WMEMSET_CHK_SYMBOL): New.
>>>>         (WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise.
>>>>         (WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise.
>>>>         * sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New.
>>>>         (libc_hidden_builtin_def): Also define __GI_wmemset and
>>>>         __GI___wmemset.
>>>>         (weak_alias): New.
>>>>         * sysdeps/x86_64/multiarch/wmemset.S: New file.
>>>>         * sysdeps/x86_64/multiarch/wmemset_chk.S: Likewise.
>>>>         * sysdeps/x86_64/wmemset.S: Likewise.
>>>>         * sysdeps/x86_64/wmemset_chk.S: Likewise.
>
> Here is the updated patch to implement IFUNC wmemset in C.
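>
> For the flavor of it, an IFUNC selector in C looks roughly like the
> sketch below.  This is illustrative only: the resolver name and the
> use of GCC's ifunc attribute and __builtin_cpu_supports are
> assumptions, not the actual glibc machinery, though the implementation
> symbols match the ones listed in the ChangeLog above.
>
> #include <wchar.h>
>
> /* Declare the per-CPU implementations; the dynamic linker calls the
>    resolver once, at relocation time, to pick one of them.  */
> extern __typeof (wmemset) __wmemset_sse2_unaligned;
> extern __typeof (wmemset) __wmemset_avx2_unaligned;
>
> static __typeof (wmemset) *
> wmemset_resolver (void)
> {
>   return (__builtin_cpu_supports ("avx2")
>           ? __wmemset_avx2_unaligned
>           : __wmemset_sse2_unaligned);
> }
>
> wchar_t *wmemset (wchar_t *s, wchar_t c, size_t n)
>      __attribute__ ((ifunc ("wmemset_resolver")));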
>
>

I will check it in today.


-- 
H.J.

