This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] x86_64: memcpy/memmove family optimized with AVX512


On 01/12/2016 09:13 AM, Andrew Senkevich wrote:
> Hi,
> 
> here are AVX512 implementations of memcpy, mempcpy, memmove,
> memcpy_chk, mempcpy_chk, memmove_chk.
> They show an average improvement of more than 30% over the AVX versions
> on KNL hardware; performance results are attached.
> Ok for trunk?
> 
> 2016-01-12  Andrew Senkevich  <andrew.senkevich@intel.com>
> 
>         * sysdeps/x86_64/multiarch/Makefile: Added new files.
>         * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests.
>         * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: New file.
>         * sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S: Likewise.
>         * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: Likewise.
>         * sysdeps/x86_64/multiarch/memcpy.S: Added new IFUNC branch.
>         * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
>         * sysdeps/x86_64/multiarch/memmove.c: Likewise.
>         * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
>         * sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
>         * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.

Looks good to me.

Thanks for the results. Yes, it looks like a consistent ~30% improvement
over AVX. My only question: how does this scale as you add more threads
to the process that all try to use those functional units? Have you done
any scalability testing on these implementations?
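
One way to probe this would be a small harness along the following lines.
This is only a minimal sketch under stated assumptions, not part of the
patch and not a glibc benchmark: the names copy_job, copy_worker,
copy_scaling_run and COPY_SCALING_MAIN are hypothetical, the buffer size and
iteration count are arbitrary, and which memcpy variant actually runs
depends on what the IFUNC resolver selects on the test machine.

```c
/* Sketch: aggregate memcpy throughput as the thread count grows.
   All names are hypothetical; nothing here comes from the patch. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

struct copy_job {
    size_t size;        /* bytes per memcpy call */
    size_t iterations;  /* memcpy calls made by this thread */
    size_t copied;      /* bytes this thread copied (0 on failure) */
};

static void *copy_worker(void *arg)
{
    struct copy_job *job = arg;
    char *src = malloc(job->size);
    char *dst = malloc(job->size);

    job->copied = 0;
    if (src != NULL && dst != NULL) {
        memset(src, 0x5a, job->size);
        for (size_t i = 0; i < job->iterations; i++) {
            memcpy(dst, src, job->size);
            /* Compiler barrier (GCC/Clang) so the copy is not elided. */
            __asm__ volatile("" : : "r"(dst) : "memory");
            job->copied += job->size;
        }
    }
    free(src);
    free(dst);
    return NULL;
}

/* Runs nthreads concurrent copy loops. Returns the total bytes copied and
   stores the elapsed wall-clock time in *seconds; returns 0 on error. */
static size_t copy_scaling_run(int nthreads, size_t size, size_t iterations,
                               double *seconds)
{
    struct timespec t0, t1;
    size_t total = 0;
    int started = 0;

    *seconds = 0.0;
    if (nthreads <= 0)
        return 0;
    pthread_t *tids = calloc((size_t)nthreads, sizeof *tids);
    struct copy_job *jobs = calloc((size_t)nthreads, sizeof *jobs);
    if (tids == NULL || jobs == NULL) {
        free(tids);
        free(jobs);
        return 0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++) {
        jobs[i].size = size;
        jobs[i].iterations = iterations;
        if (pthread_create(&tids[i], NULL, copy_worker, &jobs[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += jobs[i].copied;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *seconds = (double)(t1.tv_sec - t0.tv_sec)
               + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(tids);
    free(jobs);
    return total;
}

#ifdef COPY_SCALING_MAIN
int main(void)
{
    const size_t size = (size_t)1 << 20, iterations = 2000;

    for (int n = 1; n <= 16; n *= 2) {
        double secs;
        size_t total = copy_scaling_run(n, size, iterations, &secs);
        assert(total == (size_t)n * size * iterations);
        printf("%2d threads: %.2f GB/s aggregate\n", n, total / secs / 1e9);
    }
    return 0;
}
#endif
```

Built with something like "gcc -O2 -pthread -DCOPY_SCALING_MAIN", it prints
aggregate throughput for 1 to 16 threads. If the aggregate figure flattens
well before the core count is reached, the threads are likely contending for
shared resources; the sketch cannot tell memory bandwidth limits apart from
contention on the vector units, so smaller, cache-resident sizes would be
worth trying as well.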

Cheers,
Carlos.

