On amd64, memcpy is actually calling __memcpy_avx_unaligned, and on i686 it's calling __memcpy_ssse3_rep; with a Sandy Bridge CPU, the AVX implementation is slower than the SSSE3 one, despite being newer.

I tested by disabling the AVX implementation and got nearly the same speed on amd64 as with the i686 version of the same program.

Other functions (like memmove or strncmp) may also be affected, but I haven't checked them. Other CPUs may be affected as well: I've heard that Ivy Bridge also has a slower AVX implementation, and maybe some AMD ones too.
On Mon, Jan 05, 2015 at 10:23:18PM +0000, bugs at linkmauve dot fr wrote:
> On amd64 memcpy is actually calling __memcpy_avx_unaligned, and on i686 it's
> calling __memcpy_ssse3_rep, and with a Sandy Bridge CPU, AVX is slower than
> SSSE3, despite being newer.
>
> I tested by disabling the AVX implementation and got nearly the same speed on
> amd64 as with the i686 version of the same program.
>
> Other functions (like memmove or strncmp) may also be affected, but I haven't
> checked them.
>
> Other CPUs as well, I've heard that Ivy Bridge also has a slower AVX
> implementation, maybe some AMD ones too.
>

No, that is a typo: that implementation was aimed at AVX2 only, especially Haswell, where it is fast.
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/pr17711 has been updated
       via  56d25c11b64a97255a115901d136d753c86de24e (commit)
      from  a29c4064115e59bcf8c001c0b3dedfa8d49d3653 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=56d25c11b64a97255a115901d136d753c86de24e

commit 56d25c11b64a97255a115901d136d753c86de24e
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Jan 30 06:50:20 2015 -0800

    Use AVX unaligned memcpy only if AVX2 is available

    memcpy with unaligned 256-bit AVX register loads/stores is slow on
    older processors like Sandy Bridge.  This patch adds
    bit_AVX_Fast_Unaligned_Load and sets it only when AVX2 is available.

    [BZ #17801]
    * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
    Set the bit_AVX_Fast_Unaligned_Load bit for AVX2.
    * sysdeps/x86_64/multiarch/init-arch.h
    (bit_AVX_Fast_Unaligned_Load): New.
    (index_AVX_Fast_Unaligned_Load): Likewise.
    (HAS_AVX_FAST_UNALIGNED_LOAD): Likewise.
    * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the
    bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit.
    * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise.
    * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
    * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise.
    * sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace
    HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD.
    * sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                              | 18 ++++++++++++++++++
 sysdeps/x86_64/multiarch/init-arch.c   |  9 +++++++--
 sysdeps/x86_64/multiarch/init-arch.h   |  4 ++++
 sysdeps/x86_64/multiarch/memcpy.S      |  2 +-
 sysdeps/x86_64/multiarch/memcpy_chk.S  |  2 +-
 sysdeps/x86_64/multiarch/memmove.c     |  2 +-
 sysdeps/x86_64/multiarch/memmove_chk.c |  2 +-
 sysdeps/x86_64/multiarch/mempcpy.S     |  2 +-
 sysdeps/x86_64/multiarch/mempcpy_chk.S |  2 +-
 9 files changed, 35 insertions(+), 8 deletions(-)
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  5f3d0b78e011d2a72f9e88b0e9ef5bc081d18f97 (commit)
      from  b658fdd82b4524cf6a39881d092caa23f63d93ac (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5f3d0b78e011d2a72f9e88b0e9ef5bc081d18f97

commit 5f3d0b78e011d2a72f9e88b0e9ef5bc081d18f97
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Jan 30 06:50:20 2015 -0800

    Use AVX unaligned memcpy only if AVX2 is available

    memcpy with unaligned 256-bit AVX register loads/stores is slow on
    older processors like Sandy Bridge.  This patch adds
    bit_AVX_Fast_Unaligned_Load and sets it only when AVX2 is available.

    [BZ #17801]
    * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
    Set the bit_AVX_Fast_Unaligned_Load bit for AVX2.
    * sysdeps/x86_64/multiarch/init-arch.h
    (bit_AVX_Fast_Unaligned_Load): New.
    (index_AVX_Fast_Unaligned_Load): Likewise.
    (HAS_AVX_FAST_UNALIGNED_LOAD): Likewise.
    * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the
    bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit.
    * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise.
    * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
    * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise.
    * sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace
    HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD.
    * sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                              | 18 ++++++++++++++++++
 NEWS                                   |  4 ++--
 sysdeps/x86_64/multiarch/init-arch.c   |  9 +++++++--
 sysdeps/x86_64/multiarch/init-arch.h   |  4 ++++
 sysdeps/x86_64/multiarch/memcpy.S      |  2 +-
 sysdeps/x86_64/multiarch/memcpy_chk.S  |  2 +-
 sysdeps/x86_64/multiarch/memmove.c     |  2 +-
 sysdeps/x86_64/multiarch/memmove_chk.c |  2 +-
 sysdeps/x86_64/multiarch/mempcpy.S     |  2 +-
 sysdeps/x86_64/multiarch/mempcpy_chk.S |  2 +-
 10 files changed, 37 insertions(+), 10 deletions(-)
Fixed for 2.21.
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, release/2.20/master has been updated
       via  4d54424420c6300efbf57a7b9aa8635a8b8c1942 (commit)
       via  1bf9d48aec087062e2a14b77cb5ee1fa81be334c (commit)
       via  f9e0f439b72e0b2fb035be1bc60aaceeed7f6ed0 (commit)
       via  b0694b9e98ee64cb25490de0921ce307f3872749 (commit)
      from  f80af76648ed97a76745fad6caa3315a79cb1c7c (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4d54424420c6300efbf57a7b9aa8635a8b8c1942

commit 4d54424420c6300efbf57a7b9aa8635a8b8c1942
Author: Paul Pluzhnikov <ppluzhnikov@google.com>
Date:   Fri Feb 6 00:30:42 2015 -0500

    CVE-2015-1472: wscanf allocates too little memory

    BZ #16618

    Under certain conditions wscanf can allocate too little memory for the
    to-be-scanned arguments and overflow the allocated buffer.  The
    implementation now correctly computes the required buffer size when
    using malloc.

    A regression test was added to tst-sscanf.

    (cherry picked from commit 5bd80bfe9ca0d955bfbbc002781bc7b01b6bcb06)

    Conflicts:
        ChangeLog
        NEWS

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1bf9d48aec087062e2a14b77cb5ee1fa81be334c

commit 1bf9d48aec087062e2a14b77cb5ee1fa81be334c
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Jan 30 06:50:20 2015 -0800

    Use AVX unaligned memcpy only if AVX2 is available

    memcpy with unaligned 256-bit AVX register loads/stores is slow on
    older processors like Sandy Bridge.  This patch adds
    bit_AVX_Fast_Unaligned_Load and sets it only when AVX2 is available.

    [BZ #17801]
    * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
    Set the bit_AVX_Fast_Unaligned_Load bit for AVX2.
    * sysdeps/x86_64/multiarch/init-arch.h
    (bit_AVX_Fast_Unaligned_Load): New.
    (index_AVX_Fast_Unaligned_Load): Likewise.
    (HAS_AVX_FAST_UNALIGNED_LOAD): Likewise.
    * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the
    bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit.
    * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise.
    * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
    * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise.
    * sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace
    HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD.
    * sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise.

    (cherry picked from commit 5f3d0b78e011d2a72f9e88b0e9ef5bc081d18f97)

    Conflicts:
        ChangeLog
        NEWS

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f9e0f439b72e0b2fb035be1bc60aaceeed7f6ed0

commit f9e0f439b72e0b2fb035be1bc60aaceeed7f6ed0
Author: Leonhard Holz <leonhard.holz@web.de>
Date:   Tue Jan 13 11:33:56 2015 +0530

    Fix memory handling in strxfrm_l [BZ #16009]

    [Modified from the original email by Siddhesh Poyarekar]

    This patch solves bug #16009 by implementing an additional path in
    strxfrm that does not depend on caching the weight and rule indices.

    In detail the following changed:

    * The old main loop was factored out of strxfrm_l into the function
      do_xfrm_cached, to be able to alternatively use the non-caching
      version do_xfrm.

    * strxfrm_l allocates a fixed-size array on the stack.  If this is not
      sufficient to store the weight and rule indices, the non-caching path
      is taken.  As the cache size is not dependent on the input, there can
      be no problems with integer overflows or stack allocations greater
      than __MAX_ALLOCA_CUTOFF.  Note that malloc-ing is not possible
      because the definition of strxfrm does not allow OOM error handling.

    * The uncached path determines the weight and rule index for every
      char and for every pass again.

    * Passing all the locale data array by array resulted in very long
      parameter lists, so I introduced a structure that holds them.

    * Checking for a zero src string has been moved a bit upwards; it is
      before the locale data initialization now.

    * To verify that the non-caching path works correctly, I added a test
      run to localedata/sort-test.sh & localedata/xfrm-test.c where all
      strings are padded with spaces so that they are too large for the
      caching path.

    (cherry picked from commit 0f9e585480edcdf1e30dc3d79e24b84aeee516fa)

    Conflicts:
        ChangeLog
        NEWS

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b0694b9e98ee64cb25490de0921ce307f3872749

commit b0694b9e98ee64cb25490de0921ce307f3872749
Author: Roland McGrath <roland@hack.frob.com>
Date:   Thu Sep 11 16:02:17 2014 -0700

    Move findidx nested functions to top-level.

    Needed in order to backport strxfrm_l security fix cleanly.

    (cherry picked from commit 8c0ab919f63dc03a420751172602a52d2bea59a8)

    Conflicts:
        ChangeLog

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                              |  77 +++++
 NEWS                                   |   8 +-
 locale/weight.h                        |  13 +-
 locale/weightwc.h                      |  13 +-
 localedata/sort-test.sh                |   7 +
 localedata/xfrm-test.c                 |  52 +++-
 posix/fnmatch.c                        |   8 +
 posix/fnmatch_loop.c                   |  17 +-
 posix/regcomp.c                        |  10 +-
 posix/regex_internal.h                 |   7 +-
 posix/regexec.c                        |   8 +-
 stdio-common/tst-sscanf.c              |  33 +++
 stdio-common/vfscanf.c                 |  12 +-
 string/strcoll_l.c                     |   9 +-
 string/strxfrm_l.c                     | 491 +++++++++++++++++++++++++------
 sysdeps/x86_64/multiarch/init-arch.c   |   9 +-
 sysdeps/x86_64/multiarch/init-arch.h   |   4 +
 sysdeps/x86_64/multiarch/memcpy.S      |   2 +-
 sysdeps/x86_64/multiarch/memcpy_chk.S  |   2 +-
 sysdeps/x86_64/multiarch/memmove.c     |   2 +-
 sysdeps/x86_64/multiarch/memmove_chk.c |   2 +-
 sysdeps/x86_64/multiarch/mempcpy.S     |   2 +-
 sysdeps/x86_64/multiarch/mempcpy_chk.S |   2 +-
 23 files changed, 642 insertions(+), 148 deletions(-)
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/release/2.20/master has been created
        at  328fc20e5e334a642f0152d9662474789381a897 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=328fc20e5e334a642f0152d9662474789381a897

commit 328fc20e5e334a642f0152d9662474789381a897
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Jan 30 06:50:20 2015 -0800

    Use AVX unaligned memcpy only if AVX2 is available

    memcpy with unaligned 256-bit AVX register loads/stores is slow on
    older processors like Sandy Bridge.  This patch adds
    bit_AVX_Fast_Unaligned_Load and sets it only when AVX2 is available.

    [BZ #17801]
    * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
    Set the bit_AVX_Fast_Unaligned_Load bit for AVX2.
    * sysdeps/x86_64/multiarch/init-arch.h
    (bit_AVX_Fast_Unaligned_Load): New.
    (index_AVX_Fast_Unaligned_Load): Likewise.
    (HAS_AVX_FAST_UNALIGNED_LOAD): Likewise.
    * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the
    bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit.
    * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise.
    * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
    * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise.
    * sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace
    HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD.
    * sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise.

    [cherry picked from commit 56d25c11b64a97255a115901d136d753c86de24e]

-----------------------------------------------------------------------
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/memcpy/dpdk/master has been created
        at  1bc1103620e8f6c7e01cb54a8ed04ee1c3eb5a1a (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1bc1103620e8f6c7e01cb54a8ed04ee1c3eb5a1a

commit 1bc1103620e8f6c7e01cb54a8ed04ee1c3eb5a1a
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Jan 30 11:07:13 2015 -0800

    Add memcpy-rte-ssse3.c

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f63a6815da4c72626b14b456a6902cc8d3671729

commit f63a6815da4c72626b14b456a6902cc8d3671729
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Jan 30 08:44:30 2015 -0800

    Add memcpy-rte-avx.c

    Don't inline rte_memcpy.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d2ca99bf141c78bd8d9c1f314ce8a1f12c439d4b

commit d2ca99bf141c78bd8d9c1f314ce8a1f12c439d4b
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Jan 30 08:51:45 2015 -0800

    Import rte_memcpy.h

    rte_memcpy.h is a memcpy implementation from DPDK:

    http://dpdk.org/

    optimized for Sandy Bridge and Haswell.  See

    http://dpdk.org/ml/archives/dev/2014-November/008158.html

    The original code is at

    https://gist.github.com/lukego/efc82a15bde5ec83cb1b

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=56d25c11b64a97255a115901d136d753c86de24e

commit 56d25c11b64a97255a115901d136d753c86de24e
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Jan 30 06:50:20 2015 -0800

    Use AVX unaligned memcpy only if AVX2 is available

    memcpy with unaligned 256-bit AVX register loads/stores is slow on
    older processors like Sandy Bridge.  This patch adds
    bit_AVX_Fast_Unaligned_Load and sets it only when AVX2 is available.

    [BZ #17801]
    * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
    Set the bit_AVX_Fast_Unaligned_Load bit for AVX2.
    * sysdeps/x86_64/multiarch/init-arch.h
    (bit_AVX_Fast_Unaligned_Load): New.
    (index_AVX_Fast_Unaligned_Load): Likewise.
    (HAS_AVX_FAST_UNALIGNED_LOAD): Likewise.
    * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the
    bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit.
    * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise.
    * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
    * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise.
    * sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace
    HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD.
    * sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a29c4064115e59bcf8c001c0b3dedfa8d49d3653

commit a29c4064115e59bcf8c001c0b3dedfa8d49d3653
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Jan 14 06:29:04 2015 -0800

    Support compilers defaulting to PIE

    If PIE is the default, we need to build programs as PIE.

    * Makeconfig (+link): Set to $(+link-pie) if default to PIE.
    (+link-tests): Set to $(+link-pie-tests) if default to PIE.
    * config.make.in (build-pie-default): New.
    * configure.ac (libc_cv_pie_default): New.  Set to yes if -fPIE
    is default.  AC_SUBST.
    * configure: Regenerated.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f0b03bc24b54927677af56778309b6d58aac5eb4

commit f0b03bc24b54927677af56778309b6d58aac5eb4
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Jan 13 06:19:44 2015 -0800

    Compile gcrt1.o with -fPIC

    We compile gcrt1.o with -fPIC to support both "gcc -pg" and
    "gcc -pie -pg".

    [BZ #17836]
    * csu/Makefile (extra-objs): Add gmon-start.o if not building
    shared library.  Add gmon-start.os otherwise.
    ($(objpfx)g$(start-installed-name)): Use $(objpfx)S%
    $(objpfx)gmon-start.os if building shared library.
    ($(objpfx)g$(static-start-installed-name)): Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ccf880ba92fe1ef7f29f17062ba6aa2aa7b52f50

commit ccf880ba92fe1ef7f29f17062ba6aa2aa7b52f50
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Dec 19 06:30:31 2014 -0800

    Compile vismain with -fPIC and link with -pie

-----------------------------------------------------------------------