This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug string/19928] memmove-vec-unaligned-erms.S is slow with large data size


https://sourceware.org/bugzilla/show_bug.cgi?id=19928

--- Comment #6 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  a057f5f8cd1becc5ae8b51220283095bc808d72a (commit)
      from  b39d84adff832bddc3e2fc4a1878a7fba6bbb2a1 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a057f5f8cd1becc5ae8b51220283095bc808d72a

commit a057f5f8cd1becc5ae8b51220283095bc808d72a
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Apr 12 08:10:31 2016 -0700

    X86-64: Use non-temporal store in memcpy on large data

    The large memcpy micro benchmark in glibc shows a regression with
    large data on Haswell machines.  Using non-temporal stores in memcpy
    on large data can improve performance significantly.  This patch adds
    a threshold for using non-temporal stores, set to 6 times the shared
    cache size.  When the size is above the threshold, non-temporal stores
    are used, except when destination and source overlap, since the
    destination may be in cache when the source is loaded.
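
The decision described above can be sketched in C.  This is only an
illustration under stated assumptions, not the assembly from the patch:
the helper names non_temporal_threshold, regions_overlap and
use_non_temporal_store are hypothetical, and the shared cache size is
passed in as a parameter instead of being read from
__x86_shared_cache_size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __x86_shared_non_temporal_threshold, which
   the commit sets to 6 times the shared cache size.  */
static size_t non_temporal_threshold(size_t shared_cache_size)
{
    return shared_cache_size * 6;
}

/* True if [dst, dst + n) and [src, src + n) share at least one byte.  */
static bool regions_overlap(const void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    return d < s ? (s - d < n) : (d - s < n);
}

/* Use non-temporal stores only for copies above the threshold whose
   source and destination do not overlap.  */
static bool use_non_temporal_store(const void *dst, const void *src,
                                   size_t n, size_t shared_cache_size)
{
    return n > non_temporal_threshold(shared_cache_size)
           && !regions_overlap(dst, src, n);
}
```

With a hypothetical 8 MiB shared cache, this sketch would switch to
non-temporal stores only for non-overlapping copies larger than 48 MiB.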

    For sizes below 8 vector register widths, we load all data into
    registers and store them together.  Only forward and backward loops,
    which move 4 vector registers at a time, are used to support
    overlapping addresses.  For the forward loop, we load the last 4 vector
    register widths of data and the first vector register width of data
    into vector registers before the loop and store them after the loop.
    For the backward loop, we load the first 4 vector register widths of
    data and the last vector register width of data into vector registers
    before the loop and store them after the loop.
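
The "load everything, then store" idea for small sizes can be sketched in
portable C.  This is a simplified illustration, not the code from the
patch: it covers a single assumed size class (between one and two vector
widths), uses a hypothetical function name, assumes a 16-byte vector, and
uses byte arrays as stand-ins for vector registers.

```c
#include <stddef.h>
#include <string.h>

enum { VEC_SIZE = 16 };  /* assumed vector width in bytes (SSE2) */

/* Copy n bytes, VEC_SIZE <= n <= 2 * VEC_SIZE, correctly even when the
   source and destination regions overlap.  */
static void move_vec_to_2x_vec(unsigned char *dst, const unsigned char *src,
                               size_t n)
{
    /* Stand-ins for two vector registers.  */
    unsigned char head[VEC_SIZE];
    unsigned char tail[VEC_SIZE];

    /* Load the first and last vectors before any store, so that a store
       cannot clobber source bytes that have not been read yet.  */
    memcpy(head, src, VEC_SIZE);
    memcpy(tail, src + n - VEC_SIZE, VEC_SIZE);

    /* The two stores may overlap in the middle when n < 2 * VEC_SIZE;
       they write the same bytes there, so the result is still correct.  */
    memcpy(dst, head, VEC_SIZE);
    memcpy(dst + n - VEC_SIZE, tail, VEC_SIZE);
}
```

Because every load completes before the first store, no direction check
is needed for this size class; only the 4-vector loops have to choose
between copying forward and copying backward.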

        [BZ #19928]
        * sysdeps/x86_64/cacheinfo.c (__x86_shared_non_temporal_threshold):
        New.
        (init_cacheinfo): Set __x86_shared_non_temporal_threshold to 6
        times the shared cache size.
        * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S
        (VMOVNT): New.
        * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S
        (VMOVNT): Likewise.
        * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S
        (VMOVNT): Likewise.
        (VMOVU): Changed to movups for smaller code sizes.
        (VMOVA): Changed to movaps for smaller code sizes.
        * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Update
        comments.
        (PREFETCH): New.
        (PREFETCH_SIZE): Likewise.
        (PREFETCHED_LOAD_SIZE): Likewise.
        (PREFETCH_ONE_SET): Likewise.
        Rewrite to use forward and backward loops, which move 4 vector
        registers at a time, to support overlapping addresses and use
        non-temporal stores if the size is above the threshold and there
        is no overlap between destination and source.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                                          |   26 ++
 sysdeps/x86_64/cacheinfo.c                         |    8 +
 .../x86_64/multiarch/memmove-avx-unaligned-erms.S  |    1 +
 .../multiarch/memmove-avx512-unaligned-erms.S      |    1 +
 .../x86_64/multiarch/memmove-sse2-unaligned-erms.S |    6 +-
 .../x86_64/multiarch/memmove-vec-unaligned-erms.S  |  389 +++++++++++---------
 6 files changed, 260 insertions(+), 171 deletions(-)

