This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] x86_64: memcpy/memmove family optimized with AVX512
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: Andrew Senkevich <andrew dot n dot senkevich at gmail dot com>, libc-alpha <libc-alpha at sourceware dot org>
- Date: Wed, 13 Jan 2016 13:20:10 -0500
- Subject: Re: [PATCH] x86_64: memcpy/memmove family optimized with AVX512
- References: <CAMXFM3uGLiFE+pKPzFgWP6Sx4C3w2Ktd4w3+35O0Bj=B1s0naA at mail dot gmail dot com>
On 01/12/2016 09:13 AM, Andrew Senkevich wrote:
> Hi,
>
> here are AVX512 implementations of memcpy, mempcpy, memmove,
> memcpy_chk, mempcpy_chk and memmove_chk.
> They show an average improvement of more than 30% over the AVX versions on KNL
> hardware; performance results are attached.
> Ok for trunk?
>
> 2016-01-12 Andrew Senkevich <andrew.senkevich@intel.com>
>
> * sysdeps/x86_64/multiarch/Makefile: Added new files.
> * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests.
> * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: New file.
> * sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S: Likewise.
> * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: Likewise.
> * sysdeps/x86_64/multiarch/memcpy.S: Added new IFUNC branch.
> * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
> * sysdeps/x86_64/multiarch/memmove.c: Likewise.
> * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
> * sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
> * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
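For readers unfamiliar with the "new IFUNC branch" entries: glibc picks one memcpy variant per process at load time through a GNU indirect function (IFUNC) resolver, so adding an AVX512 variant means adding one more branch to that resolver. The following is not glibc's code (glibc's selectors live in the memcpy.S/memmove.c files listed above and use glibc's internal CPU feature checks). It is a minimal sketch of the same mechanism, assuming GCC or Clang on an ELF/glibc target, using the compiler's `ifunc` attribute and `__builtin_cpu_supports("avx512f")`. The names `my_memcpy`, `my_memcpy_generic` and `my_memcpy_avx512` are hypothetical, and a plain byte loop stands in for the real vector code.

```c
#include <stddef.h>
#include <string.h>

/* Signature shared by every variant and returned by the resolver. */
typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Baseline variant (hypothetical stand-in for the existing SSE2/AVX versions). */
static void *my_memcpy_generic(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical AVX512 variant; a real one would use 64-byte zmm loads/stores.
   Here it simply forwards to the baseline so the sketch runs anywhere. */
static void *my_memcpy_avx512(void *dst, const void *src, size_t n) {
    return my_memcpy_generic(dst, src, n);
}

/* Resolver: called once, when the dynamic linker binds my_memcpy,
   which can happen before constructors have run. */
static memcpy_fn resolve_my_memcpy(void) {
    __builtin_cpu_init();   /* needed because constructors may not have run yet */
    if (__builtin_cpu_supports("avx512f"))
        return my_memcpy_avx512;
    return my_memcpy_generic;
}

/* Callers see a single symbol; it is bound to whichever variant the resolver chose. */
void *my_memcpy(void *dst, const void *src, size_t n)
    __attribute__((ifunc("resolve_my_memcpy")));
```

Callers just use `my_memcpy(dst, src, n)`; the CPU check is paid once at binding time rather than on every call, which is why glibc can add a KNL-specific variant without slowing down other machines.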
Looks good to me.
Thanks for the results. Yes, it looks like a consistent ~30% improvement over AVX.
My only question is: how does this scale as you add more threads to
the process that try to use those functional units? Have you done any
scalability testing on these implementations?
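One way to answer the scaling question is to run the same copy loop in 1, 2, 4, ... threads and compare aggregate throughput. The following is a minimal sketch, assuming POSIX threads and `clock_gettime(CLOCK_MONOTONIC)`; `copy_throughput_gbps` and `copy_worker` are hypothetical helpers, not part of glibc's benchtests. The `memcpy` call resolves through IFUNC to whatever variant glibc selects for the running CPU. For simplicity the timed region also includes each thread's buffer allocation and initialization, so it is only meaningful when `iters` is large.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Work given to every thread (hypothetical benchmark parameters). */
struct copy_job {
    size_t bytes;   /* size of each copy */
    int iters;      /* copies performed per thread */
};

/* Each thread copies its own private buffer `iters` times, then checks it. */
static void *copy_worker(void *arg) {
    const struct copy_job *job = arg;
    char *src = malloc(job->bytes);
    char *dst = malloc(job->bytes);
    if (!src || !dst) {
        free(src);
        free(dst);
        return (void *)1;
    }
    memset(src, 0xA5, job->bytes);
    for (int i = 0; i < job->iters; i++) {
        memcpy(dst, src, job->bytes);
        /* GCC/Clang compiler barrier so repeated copies are not elided. */
        __asm__ volatile("" ::: "memory");
    }
    int ok = memcmp(dst, src, job->bytes) == 0;
    free(src);
    free(dst);
    return ok ? NULL : (void *)1;
}

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Aggregate copy throughput in GB/s across nthreads, or -1.0 on failure. */
double copy_throughput_gbps(int nthreads, size_t bytes, int iters) {
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    struct copy_job job = { bytes, iters };
    int failed = 0, started = 0;
    if (!tids)
        return -1.0;
    double t0 = now_sec();
    for (; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, copy_worker, &job) != 0) {
            failed = 1;
            break;
        }
    }
    for (int i = 0; i < started; i++) {
        void *ret;
        pthread_join(tids[i], &ret);
        if (ret != NULL)
            failed = 1;
    }
    double elapsed = now_sec() - t0;
    free(tids);
    if (failed || elapsed <= 0.0)
        return -1.0;
    return (double)bytes * iters * nthreads / elapsed / 1e9;
}
```

If throughput per thread stays roughly flat as `nthreads` grows, the variant scales; if the aggregate figure saturates early, the threads are contending for shared vector units or memory bandwidth, which is the concern raised above.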
Cheers,
Carlos.