This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] x86_64: memcpy/memmove family optimized with AVX512

From: "Carlos O'Donell" <carlos at redhat dot com>
To: Andrew Senkevich <andrew dot n dot senkevich at gmail dot com>, libc-alpha <libc-alpha at sourceware dot org>
Date: Wed, 13 Jan 2016 13:20:10 -0500
Subject: Re: [PATCH] x86_64: memcpy/memmove family optimized with AVX512
Authentication-results: sourceware.org; auth=none
References: <CAMXFM3uGLiFE+pKPzFgWP6Sx4C3w2Ktd4w3+35O0Bj=B1s0naA at mail dot gmail dot com>

On 01/12/2016 09:13 AM, Andrew Senkevich wrote:
> Hi,
> 
> here are AVX512 implementations of memcpy, mempcpy, memmove,
> memcpy_chk, mempcpy_chk, and memmove_chk.
> They show an average improvement of more than 30% over the AVX versions
> on KNL hardware; performance results are attached.
> OK for trunk?
> 
> 2016-01-12  Andrew Senkevich  <andrew.senkevich@intel.com>
> 
>         * sysdeps/x86_64/multiarch/Makefile: Added new files.
>         * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests.
>         * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: New file.
>         * sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S: Likewise.
>         * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: Likewise.
>         * sysdeps/x86_64/multiarch/memcpy.S: Added new IFUNC branch.
>         * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
>         * sysdeps/x86_64/multiarch/memmove.c: Likewise.
>         * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
>         * sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
>         * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.

Looks good to me.

Thanks for the results. Yes, it looks like a consistent ~30% improvement
over AVX. My only question is: how does this scale as you add more threads
to the process that all try to use those functional units? Have you done
any scalability testing on these implementations?
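
To make the question concrete, here is a minimal sketch of the kind of
scalability check I have in mind. It is only an illustration, not part of
the patch or of glibc's benchtests: the names worker_arg, worker and
copy_throughput are hypothetical, the buffer size and iteration count are
up to the caller, and the measured time includes thread creation and join,
so small runs will understate throughput.

```c
/* Sketch: aggregate memcpy throughput with N concurrent threads.
   All names here are hypothetical; build with something like
   gcc -O2 -pthread.  */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

struct worker_arg {
    size_t size;   /* bytes copied per memcpy call */
    int iters;     /* memcpy calls made by this thread */
    int ok;        /* set to 1 once the worker has finished its copies */
};

/* Each thread copies between its own private buffers, so threads only
   contend for execution resources and memory bandwidth, not for data.  */
static void *worker(void *p)
{
    struct worker_arg *a = p;
    char *src = malloc(a->size);
    char *dst = malloc(a->size);

    if (src != NULL && dst != NULL) {
        memset(src, 0x5a, a->size);
        for (int i = 0; i < a->iters; i++) {
            memcpy(dst, src, a->size);
            /* Compiler barrier (GCC syntax) so the copies are not
               optimized away.  */
            __asm__ volatile("" : : "r"(dst) : "memory");
        }
        a->ok = 1;
    }
    free(src);
    free(dst);
    return NULL;
}

/* Returns aggregate throughput in GB/s, or -1.0 on any failure.  */
double copy_throughput(int nthreads, size_t size, int iters)
{
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    struct worker_arg *args = calloc((size_t)nthreads, sizeof *args);
    struct timespec t0, t1;
    int started = 0, all_ok = 1;

    if (tids == NULL || args == NULL) {
        free(tids);
        free(args);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++) {
        args[i].size = size;
        args[i].iters = iters;
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0) {
            all_ok = 0;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    for (int i = 0; i < started; i++)
        if (!args[i].ok)
            all_ok = 0;
    free(tids);
    free(args);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                  + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (!all_ok || secs <= 0.0)
        return -1.0;
    return (double)nthreads * (double)size * (double)iters / secs / 1e9;
}
```

Calling copy_throughput with 1, 2, 4, ... threads at a fixed size and
comparing the results for the AVX and AVX512 variants would show whether
the per-thread gain holds up as the core count in use grows.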

Cheers,
Carlos.

References:
- [PATCH] x86_64: memcpy/memmove family optimized with AVX512
  - From: Andrew Senkevich