When running make check on x86_64, we get the following error: This is probably due to commit 6aa3e97e2530f9917f504eb4146af119a3f27229 renaming bit_AVX512F to bit_cpu_AVX512F. I'm attaching a patch that fixes this. I have a company-wide copyright assignment for glibc.
Created attachment 9120 [details] Bug-fix
Sorry, had a copy-paste problem and didn't actually write the error message in the bug description. Here it is:

../sysdeps/x86_64/tst-audit10.c: In function ‘avx512_enabled’:
../sysdeps/x86_64/tst-audit10.c:34:15: error: ‘bit_AVX512F’ undeclared (first use in this function)

Please send the patch to <libc-alpha@sourceware.org>.
The patch is wrong, bit_AVX512F is supposed to come from <cpuid.h>, but is only available if gcc is new enough.
Patch is here: https://sourceware.org/ml/libc-alpha/2016-03/msg00365.html I need to test it with GCC 4.7 and commit it.
(In reply to Martin Galvan from comment #2)
> Sorry, had a copy-paste problem and didn't actually write the error message
> in the bug description. Here it is:
>
> ../sysdeps/x86_64/tst-audit10.c: In function ‘avx512_enabled’:
> ../sysdeps/x86_64/tst-audit10.c:34:15: error: ‘bit_AVX512F’ undeclared
> (first use in this function)

What's the error message for tst-auditmod10b?
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources".

The branch, master has been updated
       via  f327f5b47be57bc05a4077344b381016c1bb2c11 (commit)
      from  c898991d8bcfacc825097ba389ffccc5367c2b2d (commit)

Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f327f5b47be57bc05a4077344b381016c1bb2c11

commit f327f5b47be57bc05a4077344b381016c1bb2c11
Author: Florian Weimer <fweimer@redhat.com>
Date:   Fri Mar 25 11:11:42 2016 +0100

tst-audit10: Fix compilation on compilers without bit_AVX512F [BZ #19860]

[BZ #19860]
* sysdeps/x86_64/tst-audit10.c (avx512_enabled): Always return zero if the compiler does not provide the AVX512F bit.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                    |    6 ++++++
 sysdeps/x86_64/tst-audit10.c |    5 ++++-
 2 files changed, 10 insertions(+), 1 deletions(-)
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources".

The branch, hjl/erms/2.23 has been created
        at  4e339b9dc65217fb9b9be6cdc0e991f4ae64ccfe (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4e339b9dc65217fb9b9be6cdc0e991f4ae64ccfe

commit 4e339b9dc65217fb9b9be6cdc0e991f4ae64ccfe
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Apr 1 14:01:24 2016 -0700

X86-64: Add dummy memcopy.h and wordcopy.c

Since x86-64 doesn't use memory copy functions, add dummy memcopy.h and wordcopy.c to reduce code size. It reduces the size of libc.so by about 1 KB.

* sysdeps/x86_64/memcopy.h: New file.
* sysdeps/x86_64/wordcopy.c: Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=997e6c0db2c351f4a7b688c3134c1f77a0aa49de

commit 997e6c0db2c351f4a7b688c3134c1f77a0aa49de
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 31 12:46:57 2016 -0700

X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove

Since the new SSE2/AVX2 memcpy/memmove are faster than the previous ones, we can remove the previous SSE2/AVX2 memcpy/memmove and replace them with the new ones.

No change in IFUNC selection if SSE2 and AVX2 memcpy/memmove weren't used before. If SSE2 or AVX2 memcpy/memmove were used, the new SSE2 or AVX2 memcpy/memmove optimized with Enhanced REP MOVSB will be used for processors with ERMS. The new AVX512 memcpy/memmove will be used for processors with AVX512 which prefer vzeroupper.

Since the new SSE2 memcpy/memmove are faster than the previous default memcpy/memmove used in libc.a and ld.so, we also remove the previous default memcpy/memmove and make them the default memcpy/memmove.

Together, it reduces the size of libc.so by about 6 KB and the size of ld.so by about 2 KB. It also fixes the placement of __mempcpy_erms and __memmove_erms.
[BZ #19776]
* sysdeps/x86_64/memcpy.S: Make it dummy.
* sysdeps/x86_64/mempcpy.S: Likewise.
* sysdeps/x86_64/memmove.S: New file.
* sysdeps/x86_64/memmove_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memmove.S: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.S: Likewise.
* sysdeps/x86_64/memmove.c: Removed.
* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-avx-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise.
* sysdeps/x86_64/multiarch/memmove.c: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memcpy-sse2-unaligned, memmove-avx-unaligned, memcpy-avx-unaligned and memmove-sse2-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Replace __memmove_chk_avx512_unaligned_2 with __memmove_chk_avx512_unaligned. Remove __memmove_chk_avx_unaligned_2. Replace __memmove_chk_sse2_unaligned_2 with __memmove_chk_sse2_unaligned. Remove __memmove_chk_sse2 and __memmove_avx_unaligned_2. Replace __memmove_avx512_unaligned_2 with __memmove_avx512_unaligned. Replace __memmove_sse2_unaligned_2 with __memmove_sse2_unaligned. Remove __memmove_sse2. Replace __memcpy_chk_avx512_unaligned_2 with __memcpy_chk_avx512_unaligned. Remove __memcpy_chk_avx_unaligned_2. Replace __memcpy_chk_sse2_unaligned_2 with __memcpy_chk_sse2_unaligned. Remove __memcpy_chk_sse2. Remove __memcpy_avx_unaligned_2. Replace __memcpy_avx512_unaligned_2 with __memcpy_avx512_unaligned. Remove __memcpy_sse2_unaligned_2 and __memcpy_sse2. Replace __mempcpy_chk_avx512_unaligned_2 with __mempcpy_chk_avx512_unaligned. Remove __mempcpy_chk_avx_unaligned_2. Replace __mempcpy_chk_sse2_unaligned_2 with __mempcpy_chk_sse2_unaligned. Remove __mempcpy_chk_sse2. Replace __mempcpy_avx512_unaligned_2 with __mempcpy_avx512_unaligned. Remove __mempcpy_avx_unaligned_2.
Replace __mempcpy_sse2_unaligned_2 with __mempcpy_sse2_unaligned. Remove __mempcpy_sse2.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Support __memcpy_avx512_unaligned_erms and __memcpy_avx512_unaligned. Use __memcpy_avx_unaligned_erms and __memcpy_sse2_unaligned_erms if processor has ERMS. Default to __memcpy_sse2_unaligned.
(ENTRY): Removed.
(END): Likewise.
(ENTRY_CHK): Likewise.
(libc_hidden_builtin_def): Likewise.
Don't include ../memcpy.S.
* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Support __memcpy_chk_avx512_unaligned_erms and __memcpy_chk_avx512_unaligned. Use __memcpy_chk_avx_unaligned_erms and __memcpy_chk_sse2_unaligned_erms if processor has ERMS. Default to __memcpy_chk_sse2_unaligned.
* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip if not in libc.
* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (MEMCPY_SYMBOL): New.
(MEMPCPY_SYMBOL): Likewise.
(MEMMOVE_CHK_SYMBOL): Likewise.
(__mempcpy_erms, __memmove_erms): Moved before __mempcpy_chk with unaligned_erms.
Replace MEMMOVE_SYMBOL with MEMMOVE_CHK_SYMBOL on __mempcpy_chk symbols. Replace MEMMOVE_SYMBOL with MEMPCPY_SYMBOL on __mempcpy symbols. Change function suffix from unaligned_2 to unaligned. Provide alias for __memcpy_chk in libc.a. Provide alias for memcpy in libc.a and ld.so.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Support __mempcpy_avx512_unaligned_erms and __mempcpy_avx512_unaligned. Use __mempcpy_avx_unaligned_erms and __mempcpy_sse2_unaligned_erms if processor has ERMS. Default to __mempcpy_sse2_unaligned.
(ENTRY): Removed.
(END): Likewise.
(ENTRY_CHK): Likewise.
(libc_hidden_builtin_def): Likewise.
Don't include ../mempcpy.S.
(mempcpy): New. Add a weak alias.
* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Support __mempcpy_chk_avx512_unaligned_erms and __mempcpy_chk_avx512_unaligned. Use __mempcpy_chk_avx_unaligned_erms and __mempcpy_chk_sse2_unaligned_erms if processor has ERMS. Default to __mempcpy_chk_sse2_unaligned.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0ff8c6a7b53c5bb28ac3d3e0ae8da8099491b16c

commit 0ff8c6a7b53c5bb28ac3d3e0ae8da8099491b16c
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 31 10:42:30 2016 -0700

X86-64: Remove the previous SSE2/AVX2 memsets

Since the new SSE2/AVX2 memsets are faster than the previous ones, we can remove the previous SSE2/AVX2 memsets and replace them with the new ones. This reduces the size of libc.so by about 900 bytes.

No change in IFUNC selection if SSE2 and AVX2 memsets weren't used before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset optimized with Enhanced REP STOSB will be used for processors with ERMS. The new AVX512 memset will be used for processors with AVX512 which prefer vzeroupper.

[BZ #19881]
* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded into ...
* sysdeps/x86_64/memset.S: This.
(__bzero): Removed.
(__memset_tail): Likewise.
(__memset_chk): Likewise.
(memset): Likewise.
(MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't defined.
(MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined.
* sysdeps/x86_64/multiarch/memset-avx2.S: Removed.
(__memset_zero_constant_len_parameter): Check SHARED instead of PIC.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memset-avx2 and memset-sse2-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Remove __memset_chk_sse2, __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned.
* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: Skip if not in libc.
* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (MEMSET_CHK_SYMBOL): New. Define if not defined.
(__bzero): Check VEC_SIZE == 16 instead of USE_MULTIARCH.
Replace MEMSET_SYMBOL with MEMSET_CHK_SYMBOL on __memset_chk symbols. Properly check USE_MULTIARCH on __memset symbols.
* sysdeps/x86_64/multiarch/memset.S (memset): Replace __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms or __memset_avx2_unaligned_erms if processor has ERMS. Support __memset_avx512_unaligned_erms and __memset_avx512_unaligned.
(memset): Removed.
(__memset_chk): Likewise.
(MEMSET_SYMBOL): New.
(libc_hidden_builtin_def): Replace __memset_sse2 with __memset_sse2_unaligned.
* sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace __memset_chk_sse2 and __memset_chk_avx2 with __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms. Use __memset_chk_sse2_unaligned_erms or __memset_chk_avx2_unaligned_erms if processor has ERMS. Support __memset_chk_avx512_unaligned_erms and __memset_chk_avx512_unaligned.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=cfb059c79729b26284863334c9aa04f0a3b967b9

commit cfb059c79729b26284863334c9aa04f0a3b967b9
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Apr 1 15:08:48 2016 -0700

Remove Fast_Copy_Backward from Intel Core processors

Intel Core i3, i5 and i7 processors have fast unaligned copy and copy backward is ignored. Remove Fast_Copy_Backward from Intel Core processors to avoid confusion.

* sysdeps/x86/cpu-features.c (init_cpu_features): Don't set bit_arch_Fast_Copy_Backward for Intel Core processors.

(cherry picked from commit 27d3ce1467990f89126e228559dec8f84b96c60e)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=30c389be1af67c4d0716d207b6780c6169d1355f

commit 30c389be1af67c4d0716d207b6780c6169d1355f
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 31 10:05:51 2016 -0700

Add x86-64 memset with unaligned store and rep stosb

Implement x86-64 memset with unaligned store and rep stosb. Support 16-byte, 32-byte and 64-byte vector register sizes.
A single file provides 2 implementations of memset, one with rep stosb and the other without rep stosb. They share the same code when size is between 2 times the vector register size and REP_STOSB_THRESHOLD, which defaults to 2KB.

Key features:

1. Use overlapping store to avoid branch.
2. For size <= 4 times the vector register size, fully unroll the loop.
3. For size > 4 times the vector register size, store 4 times the vector register size at a time.

[BZ #19881]
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and memset-avx512-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned, __memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned, __memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned, __memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned, __memset_sse2_unaligned_erms, __memset_erms, __memset_avx2_unaligned, __memset_avx2_unaligned_erms, __memset_avx512_unaligned_erms and __memset_avx512_unaligned.
* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New file.
* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise.
* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise.

(cherry picked from commit 830566307f038387ca0af3fd327706a8d1a2f595)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=980d639b4ae58209843f09a29d86b0a8303b6650

commit 980d639b4ae58209843f09a29d86b0a8303b6650
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 31 10:04:26 2016 -0700

Add x86-64 memmove with unaligned load/store and rep movsb

Implement x86-64 memmove with unaligned load/store and rep movsb. Support 16-byte, 32-byte and 64-byte vector register sizes. When size <= 8 times the vector register size, there is no check for address overlap between source and destination.
Since overhead for the overlap check is small when size > 8 times the vector register size, memcpy is an alias of memmove.

A single file provides 2 implementations of memmove, one with rep movsb and the other without rep movsb. They share the same code when size is between 2 times the vector register size and REP_MOVSB_THRESHOLD, which is 2KB for 16-byte vector register size and scaled up by larger vector register sizes.

Key features:

1. Use overlapping load and store to avoid branch.
2. For size <= 8 times the vector register size, load all sources into registers and store them together.
3. If there is no address overlap between source and destination, copy from both ends with 4 times the vector register size at a time.
4. If address of destination > address of source, backward copy 8 times the vector register size at a time.
5. Otherwise, forward copy 8 times the vector register size at a time.
6. Use rep movsb only for forward copy. Avoid slow backward rep movsb by falling back to backward copy 8 times the vector register size at a time.
7. Skip when address of destination == address of source.

[BZ #19776]
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and memmove-avx512-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memmove_chk_avx512_unaligned_2, __memmove_chk_avx512_unaligned_erms, __memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms, __memmove_chk_sse2_unaligned_2, __memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2, __memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2, __memmove_avx512_unaligned_erms, __memmove_erms, __memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms, __memcpy_chk_avx512_unaligned_2, __memcpy_chk_avx512_unaligned_erms, __memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms, __memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms, __memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms, __memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms, __memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms, __memcpy_erms, __mempcpy_chk_avx512_unaligned_2, __mempcpy_chk_avx512_unaligned_erms, __mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms, __mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms, __mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms, __mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms, __mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and __mempcpy_erms.
* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New file.
* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Likewise.

(cherry picked from commit 88b57b8ed41d5ecf2e1bdfc19556f9246a665ebb)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bf2bc5e5c9d7aa8af28b299ec26b8a37352730cc

commit bf2bc5e5c9d7aa8af28b299ec26b8a37352730cc
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Mar 28 19:22:59 2016 -0700

Initial Enhanced REP MOVSB/STOSB (ERMS) support

The newer Intel processors support Enhanced REP MOVSB/STOSB (ERMS), which has a feature bit in CPUID.
This patch adds the Enhanced REP MOVSB/STOSB (ERMS) bit to x86 cpu-features.

* sysdeps/x86/cpu-features.h (bit_cpu_ERMS): New.
(index_cpu_ERMS): Likewise.
(reg_ERMS): Likewise.

(cherry picked from commit 0791f91dff9a77263fa8173b143d854cad902c6d)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7c244283ff12329b3bca9878b8edac3b3fe5c7bc

commit 7c244283ff12329b3bca9878b8edac3b3fe5c7bc
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Mar 28 13:15:59 2016 -0700

Make __memcpy_avx512_no_vzeroupper an alias

Since x86-64 memcpy-avx512-no-vzeroupper.S implements memmove, make __memcpy_avx512_no_vzeroupper an alias of __memmove_avx512_no_vzeroupper to reduce code size of libc.so.

* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memcpy-avx512-no-vzeroupper.
* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Renamed to ...
* sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: This.
(MEMCPY): Don't define.
(MEMCPY_CHK): Likewise.
(MEMPCPY): Likewise.
(MEMPCPY_CHK): Likewise.
(MEMPCPY_CHK): Renamed to ...
(__mempcpy_chk_avx512_no_vzeroupper): This.
(MEMPCPY_CHK): Renamed to ...
(__mempcpy_chk_avx512_no_vzeroupper): This.
(MEMCPY_CHK): Renamed to ...
(__memmove_chk_avx512_no_vzeroupper): This.
(MEMCPY): Renamed to ...
(__memmove_avx512_no_vzeroupper): This.
(__memcpy_avx512_no_vzeroupper): New alias.
(__memcpy_chk_avx512_no_vzeroupper): Likewise.

(cherry picked from commit 064f01b10b57ff09cda7025f484b848c38ddd57a)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a9a14991fb2d3e69f80d25e9bbf2f6b0bcf11c3d

commit a9a14991fb2d3e69f80d25e9bbf2f6b0bcf11c3d
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Mar 28 13:13:36 2016 -0700

Implement x86-64 multiarch mempcpy in memcpy

Implement x86-64 multiarch mempcpy in memcpy to share most of the code. It reduces code size of libc.so.

[BZ #18858]
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove mempcpy-ssse3, mempcpy-ssse3-back, mempcpy-avx-unaligned and mempcpy-avx512-no-vzeroupper.
* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMPCPY_CHK): New.
(MEMPCPY): Likewise.
* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S (MEMPCPY_CHK): New.
(MEMPCPY): Likewise.
* sysdeps/x86_64/multiarch/memcpy-ssse3-back.S (MEMPCPY_CHK): New.
(MEMPCPY): Likewise.
* sysdeps/x86_64/multiarch/memcpy-ssse3.S (MEMPCPY_CHK): New.
(MEMPCPY): Likewise.
* sysdeps/x86_64/multiarch/mempcpy-avx-unaligned.S: Removed.
* sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy-ssse3-back.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy-ssse3.S: Likewise.

(cherry picked from commit c365e615f7429aee302f8af7bf07ae262278febb)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4fc09dabecee1b7cafdbca26ee7c63f68e53c229

commit 4fc09dabecee1b7cafdbca26ee7c63f68e53c229
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Mar 28 04:39:48 2016 -0700

[x86] Add a feature bit: Fast_Unaligned_Copy

On AMD processors, memcpy optimized with unaligned SSE load is slower than memcpy optimized with aligned SSSE3, while other string functions are faster with unaligned SSE load. A feature bit, Fast_Unaligned_Copy, is added to select memcpy optimized with unaligned SSE load.

[BZ #19583]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel processors. Set Fast_Copy_Backward for AMD Excavator processors.
* sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy): New.
(index_arch_Fast_Unaligned_Copy): Likewise.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check Fast_Unaligned_Copy instead of Fast_Unaligned_Load.
(cherry picked from commit e41b395523040fcb58c7d378475720c2836d280c)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=75f2d47e459a6bf5656a938e5c63f8b581eb3ee6

commit 75f2d47e459a6bf5656a938e5c63f8b581eb3ee6
Author: Florian Weimer <fweimer@redhat.com>
Date:   Fri Mar 25 11:11:42 2016 +0100

tst-audit10: Fix compilation on compilers without bit_AVX512F [BZ #19860]

[BZ #19860]
* sysdeps/x86_64/tst-audit10.c (avx512_enabled): Always return zero if the compiler does not provide the AVX512F bit.

(cherry picked from commit f327f5b47be57bc05a4077344b381016c1bb2c11)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=96c7375cb8b6f1875d9865f2ae92ecacf5f5e6fa

commit 96c7375cb8b6f1875d9865f2ae92ecacf5f5e6fa
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Mar 22 08:36:16 2016 -0700

Don't set %rcx twice before "rep movsb"

* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMCPY): Don't set %rcx twice before "rep movsb".

(cherry picked from commit 3c9a4cd16cbc7b79094fec68add2df66061ab5d7)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c273f613b0cc779ee33cc33d20941d271316e483

commit c273f613b0cc779ee33cc33d20941d271316e483
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Mar 22 07:46:56 2016 -0700

Set index_arch_AVX_Fast_Unaligned_Load only for Intel processors

Since only Intel processors with AVX2 have fast unaligned load, we should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors. Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces and call get_common_indeces for other processors. Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to avoid loading GLRO(dl_x86_cpu_features) in cpu-features.c.

[BZ #19583]
* sysdeps/x86/cpu-features.c (get_common_indeces): Remove inline. Check family before setting family, model and extended_model. Set AVX, AVX2, AVX512, FMA and FMA4 usable bits here.
(init_cpu_features): Replace HAS_CPU_FEATURE and HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P.
Set index_arch_AVX_Fast_Unaligned_Load for Intel processors with usable AVX2. Call get_common_indeces for other processors with family == NULL.
* sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro.
(CPU_FEATURES_ARCH_P): Likewise.
(HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P.
(HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P.

(cherry picked from commit f781a9e96138d8839663af5e88649ab1fbed74f8)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c858d10a4e7fd682f2e7083836e4feacc2d580f4

commit c858d10a4e7fd682f2e7083836e4feacc2d580f4
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 10 05:26:46 2016 -0800

Add _arch_/_cpu_ to index_*/bit_* in x86 cpu-features.h

index_* and bit_* macros are used to access cpuid and feature arrays of struct cpu_features. It is very easy to use bits and indices of the cpuid array on the feature array, especially in assembly code. For example, sysdeps/i386/i686/multiarch/bcopy.S has

HAS_CPU_FEATURE (Fast_Rep_String)

which should be

HAS_ARCH_FEATURE (Fast_Rep_String)

We change index_* and bit_* to index_cpu_*/index_arch_* and bit_cpu_*/bit_arch_* so that we can catch such errors at build time.

[BZ #19762]
* sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h (EXTRA_LD_ENVVARS): Add _arch_ to index_*/bit_*.
* sysdeps/x86/cpu-features.c (init_cpu_features): Likewise.
* sysdeps/x86/cpu-features.h (bit_*): Renamed to ...
(bit_arch_*): This for feature array.
(bit_*): Renamed to ...
(bit_cpu_*): This for cpu array.
(index_*): Renamed to ...
(index_arch_*): This for feature array.
(index_*): Renamed to ...
(index_cpu_*): This for cpu array.
[__ASSEMBLER__] (HAS_FEATURE): Add and use field.
[__ASSEMBLER__] (HAS_CPU_FEATURE): Pass cpu to HAS_FEATURE.
[__ASSEMBLER__] (HAS_ARCH_FEATURE): Pass arch to HAS_FEATURE.
[!__ASSEMBLER__] (HAS_CPU_FEATURE): Replace index_##name and bit_##name with index_cpu_##name and bit_cpu_##name.
[!__ASSEMBLER__] (HAS_ARCH_FEATURE): Replace index_##name and bit_##name with index_arch_##name and bit_arch_##name.
(cherry picked from commit 6aa3e97e2530f9917f504eb4146af119a3f27229)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7a90b56b0c3f8e55df44957cf6de7d3c9c04cbb9

commit 7a90b56b0c3f8e55df44957cf6de7d3c9c04cbb9
Author: Roland McGrath <roland@hack.frob.com>
Date:   Tue Mar 8 12:31:13 2016 -0800

Fix tst-audit10 build when -mavx512f is not supported.

(cherry picked from commit 3bd80c0de2f8e7ca8020d37739339636d169957e)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ba80f6ceea3a6b6f711038646f419125fe3ad39c

commit ba80f6ceea3a6b6f711038646f419125fe3ad39c
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Mar 7 16:00:25 2016 +0100

tst-audit4, tst-audit10: Compile AVX/AVX-512 code separately [BZ #19269]

This ensures that GCC will not use unsupported instructions before the run-time check to ensure support.

(cherry picked from commit 3c0f7407eedb524c9114bb675cd55b903c71daaa)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b8fe596e7f750d4ee2fca14d6a3999364c02662e

commit b8fe596e7f750d4ee2fca14d6a3999364c02662e
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Mar 6 16:48:11 2016 -0800

Group AVX512 functions in .text.avx512 section

* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Replace .text with .text.avx512.
* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Likewise.

(cherry picked from commit fee9eb6200f0e44a4b684903bc47fde36d46f1a5)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e455d17680cfaebb12692547422f95ba1ed30e29

commit e455d17680cfaebb12692547422f95ba1ed30e29
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 4 08:37:40 2016 -0800

x86-64: Fix memcpy IFUNC selection

Check Fast_Unaligned_Load, instead of Slow_BSF, and also check for Fast_Copy_Backward to enable __memcpy_ssse3_back. The existing selection order is updated to the following selection order:

1. __memcpy_avx_unaligned if AVX_Fast_Unaligned_Load bit is set.
2. __memcpy_sse2_unaligned if Fast_Unaligned_Load bit is set.
3. __memcpy_sse2 if SSSE3 isn't available.
4. __memcpy_ssse3_back if Fast_Copy_Backward bit is set.
5. __memcpy_ssse3

[BZ #18880]
* sysdeps/x86_64/multiarch/memcpy.S: Check Fast_Unaligned_Load, instead of Slow_BSF, and also check for Fast_Copy_Backward to enable __memcpy_ssse3_back.

(cherry picked from commit 14a1d7cc4c4fd5ee8e4e66b777221dd32a84efe8)

-----------------------------------------------------------------------
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources".

The branch, hjl/erms/2.23 has been created
        at  9910c54c2e97b6c36f8593097e53d5e09f837a69 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9910c54c2e97b6c36f8593097e53d5e09f837a69

commit 9910c54c2e97b6c36f8593097e53d5e09f837a69
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Apr 1 14:01:24 2016 -0700

X86-64: Add dummy memcopy.h and wordcopy.c

Since x86-64 doesn't use memory copy functions, add dummy memcopy.h and wordcopy.c to reduce code size. It reduces the size of libc.so by about 1 KB.

* sysdeps/x86_64/memcopy.h: New file.
* sysdeps/x86_64/wordcopy.c: Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3429d9dd330a5c140cb37e77e7c388a71fdb44f1

commit 3429d9dd330a5c140cb37e77e7c388a71fdb44f1
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 31 12:46:57 2016 -0700

X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove

Since the new SSE2/AVX2 memcpy/memmove are faster than the previous ones, we can remove the previous SSE2/AVX2 memcpy/memmove and replace them with the new ones.

No change in IFUNC selection if SSE2 and AVX2 memcpy/memmove weren't used before. If SSE2 or AVX2 memcpy/memmove were used, the new SSE2 or AVX2 memcpy/memmove optimized with Enhanced REP MOVSB will be used for processors with ERMS. The new AVX512 memcpy/memmove will be used for processors with AVX512 which prefer vzeroupper.

Since the new SSE2 memcpy/memmove are faster than the previous default memcpy/memmove used in libc.a and ld.so, we also remove the previous default memcpy/memmove and make them the default memcpy/memmove.

Together, it reduces the size of libc.so by about 6 KB and the size of ld.so by about 2 KB.

[BZ #19776]
* sysdeps/x86_64/memcpy.S: Make it dummy.
* sysdeps/x86_64/mempcpy.S: Likewise.
* sysdeps/x86_64/memmove.S: New file.
* sysdeps/x86_64/memmove_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memmove.S: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.S: Likewise.
* sysdeps/x86_64/memmove.c: Removed.
* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-avx-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise.
* sysdeps/x86_64/multiarch/memmove.c: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memcpy-sse2-unaligned, memmove-avx-unaligned, memcpy-avx-unaligned and memmove-sse2-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Replace __memmove_chk_avx512_unaligned_2 with __memmove_chk_avx512_unaligned. Remove __memmove_chk_avx_unaligned_2. Replace __memmove_chk_sse2_unaligned_2 with __memmove_chk_sse2_unaligned. Remove __memmove_chk_sse2 and __memmove_avx_unaligned_2. Replace __memmove_avx512_unaligned_2 with __memmove_avx512_unaligned. Replace __memmove_sse2_unaligned_2 with __memmove_sse2_unaligned. Remove __memmove_sse2. Replace __memcpy_chk_avx512_unaligned_2 with __memcpy_chk_avx512_unaligned. Remove __memcpy_chk_avx_unaligned_2. Replace __memcpy_chk_sse2_unaligned_2 with __memcpy_chk_sse2_unaligned. Remove __memcpy_chk_sse2. Remove __memcpy_avx_unaligned_2. Replace __memcpy_avx512_unaligned_2 with __memcpy_avx512_unaligned. Remove __memcpy_sse2_unaligned_2 and __memcpy_sse2. Replace __mempcpy_chk_avx512_unaligned_2 with __mempcpy_chk_avx512_unaligned. Remove __mempcpy_chk_avx_unaligned_2. Replace __mempcpy_chk_sse2_unaligned_2 with __mempcpy_chk_sse2_unaligned. Remove __mempcpy_chk_sse2. Replace __mempcpy_avx512_unaligned_2 with __mempcpy_avx512_unaligned. Remove __mempcpy_avx_unaligned_2. Replace __mempcpy_sse2_unaligned_2 with __mempcpy_sse2_unaligned. Remove __mempcpy_sse2.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Support __memcpy_avx512_unaligned_erms and __memcpy_avx512_unaligned. Use __memcpy_avx_unaligned_erms and __memcpy_sse2_unaligned_erms if processor has ERMS. Default to __memcpy_sse2_unaligned.
(ENTRY): Removed.
(END): Likewise.
(ENTRY_CHK): Likewise.
(libc_hidden_builtin_def): Likewise.
Don't include ../memcpy.S.
* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Support __memcpy_chk_avx512_unaligned_erms and __memcpy_chk_avx512_unaligned. Use __memcpy_chk_avx_unaligned_erms and __memcpy_chk_sse2_unaligned_erms if processor has ERMS. Default to __memcpy_chk_sse2_unaligned.
* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip if not in libc.
* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (MEMCPY_SYMBOL): New.
(MEMPCPY_SYMBOL): Likewise.
(MEMMOVE_CHK_SYMBOL): Likewise.
Replace MEMMOVE_SYMBOL with MEMMOVE_CHK_SYMBOL on __mempcpy_chk symbols. Replace MEMMOVE_SYMBOL with MEMPCPY_SYMBOL on __mempcpy symbols. Change function suffix from unaligned_2 to unaligned. Provide alias for __memcpy_chk in libc.a. Provide alias for memcpy in libc.a and ld.so.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Support __mempcpy_avx512_unaligned_erms and __mempcpy_avx512_unaligned. Use __mempcpy_avx_unaligned_erms and __mempcpy_sse2_unaligned_erms if processor has ERMS. Default to __mempcpy_sse2_unaligned.
(ENTRY): Removed.
(END): Likewise.
(ENTRY_CHK): Likewise.
(libc_hidden_builtin_def): Likewise.
Don't include ../mempcpy.S.
(mempcpy): New. Add a weak alias.
* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Support __mempcpy_chk_avx512_unaligned_erms and __mempcpy_chk_avx512_unaligned. Use __mempcpy_chk_avx_unaligned_erms and __mempcpy_chk_sse2_unaligned_erms if processor has ERMS. Default to __mempcpy_chk_sse2_unaligned.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7c36cac64f6855f1f4ff007beaca3cb766e694ec commit 7c36cac64f6855f1f4ff007beaca3cb766e694ec Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 31 10:42:30 2016 -0700 X86-64: Remove the previous SSE2/AVX2 memsets Since the new SSE2/AVX2 memsets are faster than the previous ones, we can remove the previous SSE2/AVX2 memsets and replace them with the new ones. This reduces the size of libc.so by about 900 bytes. No change in IFUNC selection if SSE2 and AVX2 memsets weren't used before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset optimized with Enhanced REP STOSB will be used for processors with ERMS. The new AVX512 memset will be used for processors with AVX512 which prefer vzeroupper. [BZ #19881] * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded into ... * sysdeps/x86_64/memset.S: This. (__bzero): Removed. (__memset_tail): Likewise. (__memset_chk): Likewise. (memset): Likewise. (MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't defined. (MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined. * sysdeps/x86_64/multiarch/memset-avx2.S: Removed. (__memset_zero_constant_len_parameter): Check SHARED instead of PIC. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memset-avx2 and memset-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Remove __memset_chk_sse2, __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: Skip if not in libc. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (MEMSET_CHK_SYMBOL): New. Define if not defined. (__bzero): Check VEC_SIZE == 16 instead of USE_MULTIARCH. Replace MEMSET_SYMBOL with MEMSET_CHK_SYMBOL on __memset_chk symbols. Properly check USE_MULTIARCH on __memset symbols. 
* sysdeps/x86_64/multiarch/memset.S (memset): Replace __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms or __memset_avx2_unaligned_erms if processor has ERMS. Support __memset_avx512_unaligned_erms and __memset_avx512_unaligned. (memset): Removed. (__memset_chk): Likewise. (MEMSET_SYMBOL): New. (libc_hidden_builtin_def): Replace __memset_sse2 with __memset_sse2_unaligned. * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace __memset_chk_sse2 and __memset_chk_avx2 with __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms. Use __memset_chk_sse2_unaligned_erms or __memset_chk_avx2_unaligned_erms if processor has ERMS. Support __memset_chk_avx512_unaligned_erms and __memset_chk_avx512_unaligned. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=69b122e1149e158c382c2b0bdd4591a4a19cb505 commit 69b122e1149e158c382c2b0bdd4591a4a19cb505 Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Apr 3 17:21:45 2016 -0700 X86-64: Use non-temporal store in memmove on large data memcpy/memmove benchmarks with large data show that there is a regression with large data on a Haswell machine. Non-temporal store in memmove on large data can improve performance significantly. This patch adds a threshold to use non temporal store which is 4 times of shared cache size. When size is above the threshold, non temporal store will be used. For size below 8 vector register width, we load all data into registers and store them together. Only forward and backward loops, which move 4 vector registers at a time, are used to support overlapping addresses. For forward loop, we load the last 4 vector register width of data and the first vector register width of data into vector registers before the loop and store them after the loop. For backward loop, we load the first 4 vector register width of data and the last vector register width of data into vector registers before the loop and store them after the loop.
* sysdeps/x86_64/cacheinfo.c (__x86_shared_non_temporal_threshold): New. (init_cacheinfo): Set __x86_shared_non_temporal_threshold to 4 times of shared cache size. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S (PREFETCHNT): New. (VMOVNT): Likewise. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S (PREFETCHNT): Likewise. (VMOVNT): Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S (PREFETCHNT): Likewise. (VMOVNT): Likewise. (VMOVU): Changed to movups for smaller code sizes. (VMOVA): Changed to movaps for smaller code sizes. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Update comments. Rewrite to use forward and backward loops, which move 4 vector registers at a time, to support overlapping addresses and use non temporal store if size is above the threshold. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9a93bdbaff81edf67c5486c84f8098055e355abb commit 9a93bdbaff81edf67c5486c84f8098055e355abb Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Apr 5 05:21:07 2016 -0700 Force 32-bit displacement in memset-vec-unaligned-erms.S * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Force 32-bit displacement to avoid long nop between instructions. (cherry picked from commit ec0cac9a1f4094bd0db6f77c1b329e7a40eecc10) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5118e532600549ad0f56cb9b1a179b8eab70c483 commit 5118e532600549ad0f56cb9b1a179b8eab70c483 Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Apr 5 05:19:05 2016 -0700 Add a comment in memset-sse2-unaligned-erms.S * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Add a comment on VMOVU and VMOVA. (cherry picked from commit 696ac774847b80cf994438739478b0c3003b5958) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=06c6d4ae6ee7e5b83fd5868bef494def01f59292 commit 06c6d4ae6ee7e5b83fd5868bef494def01f59292 Author: H.J. 
Lu <hjl.tools@gmail.com> Date: Sun Apr 3 14:32:20 2016 -0700 Don't put SSE2/AVX/AVX512 memmove/memset in ld.so Since memmove and memset in ld.so don't use IFUNC, don't put SSE2, AVX and AVX512 memmove and memset in ld.so. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip if not in libc. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. (cherry picked from commit 5cd7af016d8587ff53b20ba259746f97edbddbf7) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a96379797a7eecc1b709cad7b68981eb698783dc commit a96379797a7eecc1b709cad7b68981eb698783dc Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Apr 3 12:38:25 2016 -0700 Fix memmove-vec-unaligned-erms.S __mempcpy_erms and __memmove_erms can't be placed between __memmove_chk and __memmove; it breaks __memmove_chk. Don't check source == destination first since it is less common. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: (__mempcpy_erms, __memmove_erms): Moved before __mempcpy_chk with unaligned_erms. (__memmove_erms): Skip if source == destination. (__memmove_unaligned_erms): Don't check source == destination first. (cherry picked from commit ea2785e96fa503f3a2b5dd9f3a6ca65622b3c5f2) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=cfb059c79729b26284863334c9aa04f0a3b967b9 commit cfb059c79729b26284863334c9aa04f0a3b967b9 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Apr 1 15:08:48 2016 -0700 Remove Fast_Copy_Backward from Intel Core processors Intel Core i3, i5 and i7 processors have fast unaligned copy and copy backward is ignored. Remove Fast_Copy_Backward from Intel Core processors to avoid confusion. * sysdeps/x86/cpu-features.c (init_cpu_features): Don't set bit_arch_Fast_Copy_Backward for Intel Core processors.
(cherry picked from commit 27d3ce1467990f89126e228559dec8f84b96c60e) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=30c389be1af67c4d0716d207b6780c6169d1355f commit 30c389be1af67c4d0716d207b6780c6169d1355f Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 31 10:05:51 2016 -0700 Add x86-64 memset with unaligned store and rep stosb Implement x86-64 memset with unaligned store and rep stosb. Support 16-byte, 32-byte and 64-byte vector register sizes. A single file provides 2 implementations of memset, one with rep stosb and the other without rep stosb. They share the same code when size is between 2 times of vector register size and REP_STOSB_THRESHOLD which defaults to 2KB. Key features: 1. Use overlapping store to avoid branch. 2. For size <= 4 times of vector register size, fully unroll the loop. 3. For size > 4 times of vector register size, store 4 times of vector register size at a time. [BZ #19881] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and memset-avx512-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned, __memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned, __memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned, __memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned, __memset_sse2_unaligned_erms, __memset_erms, __memset_avx2_unaligned, __memset_avx2_unaligned_erms, __memset_avx512_unaligned_erms and __memset_avx512_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New file. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise.
(cherry picked from commit 830566307f038387ca0af3fd327706a8d1a2f595) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=980d639b4ae58209843f09a29d86b0a8303b6650 commit 980d639b4ae58209843f09a29d86b0a8303b6650 Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 31 10:04:26 2016 -0700 Add x86-64 memmove with unaligned load/store and rep movsb Implement x86-64 memmove with unaligned load/store and rep movsb. Support 16-byte, 32-byte and 64-byte vector register sizes. When size <= 8 times of vector register size, there is no check for address overlap between source and destination. Since overhead for overlap check is small when size > 8 times of vector register size, memcpy is an alias of memmove. A single file provides 2 implementations of memmove, one with rep movsb and the other without rep movsb. They share the same code when size is between 2 times of vector register size and REP_MOVSB_THRESHOLD which is 2KB for 16-byte vector register size and scaled up by large vector register size. Key features: 1. Use overlapping load and store to avoid branch. 2. For size <= 8 times of vector register size, load all sources into registers and store them together. 3. If there is no address overlap between source and destination, copy from both ends with 4 times of vector register size at a time. 4. If address of destination > address of source, backward copy 8 times of vector register size at a time. 5. Otherwise, forward copy 8 times of vector register size at a time. 6. Use rep movsb only for forward copy. Avoid slow backward rep movsb by falling back to backward copy 8 times of vector register size at a time. 7. Skip when address of destination == address of source. [BZ #19776] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and memmove-avx512-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memmove_chk_avx512_unaligned_2, __memmove_chk_avx512_unaligned_erms, __memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms, __memmove_chk_sse2_unaligned_2, __memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2, __memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2, __memmove_avx512_unaligned_erms, __memmove_erms, __memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms, __memcpy_chk_avx512_unaligned_2, __memcpy_chk_avx512_unaligned_erms, __memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms, __memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms, __memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms, __memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms, __memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms, __memcpy_erms, __mempcpy_chk_avx512_unaligned_2, __mempcpy_chk_avx512_unaligned_erms, __mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms, __mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms, __mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms, __mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms, __mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and __mempcpy_erms. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New file. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Likewise. (cherry picked from commit 88b57b8ed41d5ecf2e1bdfc19556f9246a665ebb) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bf2bc5e5c9d7aa8af28b299ec26b8a37352730cc commit bf2bc5e5c9d7aa8af28b299ec26b8a37352730cc Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 19:22:59 2016 -0700 Initial Enhanced REP MOVSB/STOSB (ERMS) support The newer Intel processors support Enhanced REP MOVSB/STOSB (ERMS) which has a feature bit in CPUID.
This patch adds the Enhanced REP MOVSB/STOSB (ERMS) bit to x86 cpu-features. * sysdeps/x86/cpu-features.h (bit_cpu_ERMS): New. (index_cpu_ERMS): Likewise. (reg_ERMS): Likewise. (cherry picked from commit 0791f91dff9a77263fa8173b143d854cad902c6d) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7c244283ff12329b3bca9878b8edac3b3fe5c7bc commit 7c244283ff12329b3bca9878b8edac3b3fe5c7bc Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 13:15:59 2016 -0700 Make __memcpy_avx512_no_vzeroupper an alias Since x86-64 memcpy-avx512-no-vzeroupper.S implements memmove, make __memcpy_avx512_no_vzeroupper an alias of __memmove_avx512_no_vzeroupper to reduce code size of libc.so. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memcpy-avx512-no-vzeroupper. * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Renamed to ... * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: This. (MEMCPY): Don't define. (MEMCPY_CHK): Likewise. (MEMPCPY): Likewise. (MEMPCPY_CHK): Likewise. (MEMPCPY_CHK): Renamed to ... (__mempcpy_chk_avx512_no_vzeroupper): This. (MEMPCPY_CHK): Renamed to ... (__mempcpy_chk_avx512_no_vzeroupper): This. (MEMCPY_CHK): Renamed to ... (__memmove_chk_avx512_no_vzeroupper): This. (MEMCPY): Renamed to ... (__memmove_avx512_no_vzeroupper): This. (__memcpy_avx512_no_vzeroupper): New alias. (__memcpy_chk_avx512_no_vzeroupper): Likewise. (cherry picked from commit 064f01b10b57ff09cda7025f484b848c38ddd57a) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a9a14991fb2d3e69f80d25e9bbf2f6b0bcf11c3d commit a9a14991fb2d3e69f80d25e9bbf2f6b0bcf11c3d Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 13:13:36 2016 -0700 Implement x86-64 multiarch mempcpy in memcpy Implement x86-64 multiarch mempcpy in memcpy to share most of code. It reduces code size of libc.so. [BZ #18858] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove mempcpy-ssse3, mempcpy-ssse3-back, mempcpy-avx-unaligned and mempcpy-avx512-no-vzeroupper. 
* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-ssse3-back.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-ssse3.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/mempcpy-avx-unaligned.S: Removed. * sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy-ssse3-back.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy-ssse3.S: Likewise. (cherry picked from commit c365e615f7429aee302f8af7bf07ae262278febb) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4fc09dabecee1b7cafdbca26ee7c63f68e53c229 commit 4fc09dabecee1b7cafdbca26ee7c63f68e53c229 Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 04:39:48 2016 -0700 [x86] Add a feature bit: Fast_Unaligned_Copy On AMD processors, memcpy optimized with unaligned SSE load is slower than memcpy optimized with aligned SSSE3 while other string functions are faster with unaligned SSE load. A feature bit, Fast_Unaligned_Copy, is added to select memcpy optimized with unaligned SSE load. [BZ #19583] * sysdeps/x86/cpu-features.c (init_cpu_features): Set Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel processors. Set Fast_Copy_Backward for AMD Excavator processors. * sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy): New. (index_arch_Fast_Unaligned_Copy): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check Fast_Unaligned_Copy instead of Fast_Unaligned_Load.
(cherry picked from commit e41b395523040fcb58c7d378475720c2836d280c) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=75f2d47e459a6bf5656a938e5c63f8b581eb3ee6 commit 75f2d47e459a6bf5656a938e5c63f8b581eb3ee6 Author: Florian Weimer <fweimer@redhat.com> Date: Fri Mar 25 11:11:42 2016 +0100 tst-audit10: Fix compilation on compilers without bit_AVX512F [BZ #19860] [BZ# 19860] * sysdeps/x86_64/tst-audit10.c (avx512_enabled): Always return zero if the compiler does not provide the AVX512F bit. (cherry picked from commit f327f5b47be57bc05a4077344b381016c1bb2c11) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=96c7375cb8b6f1875d9865f2ae92ecacf5f5e6fa commit 96c7375cb8b6f1875d9865f2ae92ecacf5f5e6fa Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Mar 22 08:36:16 2016 -0700 Don't set %rcx twice before "rep movsb" * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMCPY): Don't set %rcx twice before "rep movsb". (cherry picked from commit 3c9a4cd16cbc7b79094fec68add2df66061ab5d7) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c273f613b0cc779ee33cc33d20941d271316e483 commit c273f613b0cc779ee33cc33d20941d271316e483 Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Mar 22 07:46:56 2016 -0700 Set index_arch_AVX_Fast_Unaligned_Load only for Intel processors Since only Intel processors with AVX2 have fast unaligned load, we should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors. Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces and call get_common_indeces for other processors. Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to avoid loading GLRO(dl_x86_cpu_features) in cpu-features.c. [BZ #19583] * sysdeps/x86/cpu-features.c (get_common_indeces): Remove inline. Check family before setting family, model and extended_model. Set AVX, AVX2, AVX512, FMA and FMA4 usable bits here. (init_cpu_features): Replace HAS_CPU_FEATURE and HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P.
Set index_arch_AVX_Fast_Unaligned_Load for Intel processors with usable AVX2. Call get_common_indeces for other processors with family == NULL. * sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro. (CPU_FEATURES_ARCH_P): Likewise. (HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P. (HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P. (cherry picked from commit f781a9e96138d8839663af5e88649ab1fbed74f8) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c858d10a4e7fd682f2e7083836e4feacc2d580f4 commit c858d10a4e7fd682f2e7083836e4feacc2d580f4 Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 10 05:26:46 2016 -0800 Add _arch_/_cpu_ to index_*/bit_* in x86 cpu-features.h index_* and bit_* macros are used to access cpuid and feature arrays of struct cpu_features. It is very easy to use bits and indices of the cpuid array on the feature array, especially in assembly codes. For example, sysdeps/i386/i686/multiarch/bcopy.S has HAS_CPU_FEATURE (Fast_Rep_String), which should be HAS_ARCH_FEATURE (Fast_Rep_String). We change index_* and bit_* to index_cpu_*/index_arch_* and bit_cpu_*/bit_arch_* so that we can catch such errors at build time. [BZ #19762] * sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h (EXTRA_LD_ENVVARS): Add _arch_ to index_*/bit_*. * sysdeps/x86/cpu-features.c (init_cpu_features): Likewise. * sysdeps/x86/cpu-features.h (bit_*): Renamed to ... (bit_arch_*): This for feature array. (bit_*): Renamed to ... (bit_cpu_*): This for cpu array. (index_*): Renamed to ... (index_arch_*): This for feature array. (index_*): Renamed to ... (index_cpu_*): This for cpu array. [__ASSEMBLER__] (HAS_FEATURE): Add and use field. [__ASSEMBLER__] (HAS_CPU_FEATURE): Pass cpu to HAS_FEATURE. [__ASSEMBLER__] (HAS_ARCH_FEATURE): Pass arch to HAS_FEATURE. [!__ASSEMBLER__] (HAS_CPU_FEATURE): Replace index_##name and bit_##name with index_cpu_##name and bit_cpu_##name. [!__ASSEMBLER__] (HAS_ARCH_FEATURE): Replace index_##name and bit_##name with index_arch_##name and bit_arch_##name.
(cherry picked from commit 6aa3e97e2530f9917f504eb4146af119a3f27229) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7a90b56b0c3f8e55df44957cf6de7d3c9c04cbb9 commit 7a90b56b0c3f8e55df44957cf6de7d3c9c04cbb9 Author: Roland McGrath <roland@hack.frob.com> Date: Tue Mar 8 12:31:13 2016 -0800 Fix tst-audit10 build when -mavx512f is not supported. (cherry picked from commit 3bd80c0de2f8e7ca8020d37739339636d169957e) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ba80f6ceea3a6b6f711038646f419125fe3ad39c commit ba80f6ceea3a6b6f711038646f419125fe3ad39c Author: Florian Weimer <fweimer@redhat.com> Date: Mon Mar 7 16:00:25 2016 +0100 tst-audit4, tst-audit10: Compile AVX/AVX-512 code separately [BZ #19269] This ensures that GCC will not use unsupported instructions before the run-time check to ensure support. (cherry picked from commit 3c0f7407eedb524c9114bb675cd55b903c71daaa) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b8fe596e7f750d4ee2fca14d6a3999364c02662e commit b8fe596e7f750d4ee2fca14d6a3999364c02662e Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Mar 6 16:48:11 2016 -0800 Group AVX512 functions in .text.avx512 section * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Replace .text with .text.avx512. * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Likewise. (cherry picked from commit fee9eb6200f0e44a4b684903bc47fde36d46f1a5) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e455d17680cfaebb12692547422f95ba1ed30e29 commit e455d17680cfaebb12692547422f95ba1ed30e29 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Mar 4 08:37:40 2016 -0800 x86-64: Fix memcpy IFUNC selection Check Fast_Unaligned_Load, instead of Slow_BSF, and also check for Fast_Copy_Backward to enable __memcpy_ssse3_back. The existing selection order is updated with the following selection order: 1. __memcpy_avx_unaligned if AVX_Fast_Unaligned_Load bit is set. 2. __memcpy_sse2_unaligned if Fast_Unaligned_Load bit is set. 3. __memcpy_sse2 if SSSE3 isn't available. 4.
__memcpy_ssse3_back if Fast_Copy_Backward bit is set. 5. __memcpy_ssse3. [BZ #18880] * sysdeps/x86_64/multiarch/memcpy.S: Check Fast_Unaligned_Load, instead of Slow_BSF, and also check for Fast_Copy_Backward to enable __memcpy_ssse3_back. (cherry picked from commit 14a1d7cc4c4fd5ee8e4e66b777221dd32a84efe8) -----------------------------------------------------------------------
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, hjl/erms/2.23 has been created at c51eab61e17e7575265f1e36bd0293e224500f52 (commit) - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c51eab61e17e7575265f1e36bd0293e224500f52 commit c51eab61e17e7575265f1e36bd0293e224500f52 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Apr 1 14:01:24 2016 -0700 X86-64: Add dummy memcopy.h and wordcopy.c Since x86-64 doesn't use memory copy functions, add dummy memcopy.h and wordcopy.c to reduce code size. It reduces the size of libc.so by about 1 KB. * sysdeps/x86_64/memcopy.h: New file. * sysdeps/x86_64/wordcopy.c: Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7f9478e6530ab0ede00f705e456445aeff283560 commit 7f9478e6530ab0ede00f705e456445aeff283560 Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 31 12:46:57 2016 -0700 X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove Since the new SSE2/AVX2 memcpy/memmove are faster than the previous ones, we can remove the previous SSE2/AVX2 memcpy/memmove and replace them with the new ones. No change in IFUNC selection if SSE2 and AVX2 memcpy/memmove weren't used before. If SSE2 or AVX2 memcpy/memmove were used, the new SSE2 or AVX2 memcpy/memmove optimized with Enhanced REP MOVSB will be used for processors with ERMS. The new AVX512 memcpy/memmove will be used for processors with AVX512 which prefer vzeroupper. Since the new SSE2 memcpy/memmove are faster than the previous default memcpy/memmove used in libc.a and ld.so, we also remove the previous default memcpy/memmove and make them the default memcpy/memmove. Together, it reduces the size of libc.so by about 6 KB and the size of ld.so by about 2 KB. [BZ #19776] * sysdeps/x86_64/memcpy.S: Make it dummy. * sysdeps/x86_64/mempcpy.S: Likewise. 
* sysdeps/x86_64/memmove.S: New file. * sysdeps/x86_64/memmove_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove.S: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.S: Likewise. * sysdeps/x86_64/memmove.c: Removed. * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/memmove-avx-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memcpy-sse2-unaligned, memmove-avx-unaligned, memcpy-avx-unaligned and memmove-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Replace __memmove_chk_avx512_unaligned_2 with __memmove_chk_avx512_unaligned. Remove __memmove_chk_avx_unaligned_2. Replace __memmove_chk_sse2_unaligned_2 with __memmove_chk_sse2_unaligned. Remove __memmove_chk_sse2 and __memmove_avx_unaligned_2. Replace __memmove_avx512_unaligned_2 with __memmove_avx512_unaligned. Replace __memmove_sse2_unaligned_2 with __memmove_sse2_unaligned. Remove __memmove_sse2. Replace __memcpy_chk_avx512_unaligned_2 with __memcpy_chk_avx512_unaligned. Remove __memcpy_chk_avx_unaligned_2. Replace __memcpy_chk_sse2_unaligned_2 with __memcpy_chk_sse2_unaligned. Remove __memcpy_chk_sse2. Remove __memcpy_avx_unaligned_2. Replace __memcpy_avx512_unaligned_2 with __memcpy_avx512_unaligned. Remove __memcpy_sse2_unaligned_2 and __memcpy_sse2. Replace __mempcpy_chk_avx512_unaligned_2 with __mempcpy_chk_avx512_unaligned. Remove __mempcpy_chk_avx_unaligned_2. Replace __mempcpy_chk_sse2_unaligned_2 with __mempcpy_chk_sse2_unaligned. Remove __mempcpy_chk_sse2. Replace __mempcpy_avx512_unaligned_2 with __mempcpy_avx512_unaligned. Remove __mempcpy_avx_unaligned_2. Replace __mempcpy_sse2_unaligned_2 with __mempcpy_sse2_unaligned. Remove __mempcpy_sse2. 
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Support __memcpy_avx512_unaligned_erms and __memcpy_avx512_unaligned. Use __memcpy_avx_unaligned_erms and __memcpy_sse2_unaligned_erms if processor has ERMS. Default to __memcpy_sse2_unaligned. (ENTRY): Removed. (END): Likewise. (ENTRY_CHK): Likewise. (libc_hidden_builtin_def): Likewise. Don't include ../memcpy.S. * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Support __memcpy_chk_avx512_unaligned_erms and __memcpy_chk_avx512_unaligned. Use __memcpy_chk_avx_unaligned_erms and __memcpy_chk_sse2_unaligned_erms if processor has ERMS. Default to __memcpy_chk_sse2_unaligned. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip if not in libc. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Change function suffix from unaligned_2 to unaligned. * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Support __mempcpy_avx512_unaligned_erms and __mempcpy_avx512_unaligned. Use __mempcpy_avx_unaligned_erms and __mempcpy_sse2_unaligned_erms if processor has ERMS. Default to __mempcpy_sse2_unaligned. (ENTRY): Removed. (END): Likewise. (ENTRY_CHK): Likewise. (libc_hidden_builtin_def): Likewise. Don't include ../mempcpy.S. (mempcpy): New. Add a weak alias. * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Support __mempcpy_chk_avx512_unaligned_erms and __mempcpy_chk_avx512_unaligned. Use __mempcpy_chk_avx_unaligned_erms and __mempcpy_chk_sse2_unaligned_erms if processor has ERMS. Default to __mempcpy_chk_sse2_unaligned. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=68248ecc51b4725e794236c495effde76d4be61c commit 68248ecc51b4725e794236c495effde76d4be61c Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 31 10:42:30 2016 -0700 X86-64: Remove the previous SSE2/AVX2 memsets Since the new SSE2/AVX2 memsets are faster than the previous ones, we can remove the previous SSE2/AVX2 memsets and replace them with the new ones.
This reduces the size of libc.so by about 900 bytes. No change in IFUNC selection if SSE2 and AVX2 memsets weren't used before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset optimized with Enhanced REP STOSB will be used for processors with ERMS. The new AVX512 memset will be used for processors with AVX512 which prefer vzeroupper. [BZ #19881] * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded into ... * sysdeps/x86_64/memset.S: This. (__bzero): Removed. (__memset_tail): Likewise. (__memset_chk): Likewise. (memset): Likewise. (MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't defined. (MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined. * sysdeps/x86_64/multiarch/memset-avx2.S: Removed. (__memset_zero_constant_len_parameter): Check SHARED instead of PIC. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memset-avx2 and memset-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Remove __memset_chk_sse2, __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: Skip if not in libc. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__bzero): Enabled. * sysdeps/x86_64/multiarch/memset.S (memset): Replace __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms or __memset_avx2_unaligned_erms if processor has ERMS. Support __memset_avx512_unaligned_erms and __memset_avx512_unaligned. (memset): Removed. (__memset_chk): Likewise. (MEMSET_SYMBOL): New. (libc_hidden_builtin_def): Replace __memset_sse2 with __memset_sse2_unaligned. * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace __memset_chk_sse2 and __memset_chk_avx2 with __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms. 
Use __memset_chk_sse2_unaligned_erms or __memset_chk_avx2_unaligned_erms if processor has ERMS. Support __memset_chk_avx512_unaligned_erms and __memset_chk_avx512_unaligned. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=095d851c67b7ea5edb536ead965c73fce34b2edd commit 095d851c67b7ea5edb536ead965c73fce34b2edd Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Apr 3 17:21:45 2016 -0700 X86-64: Use non-temporal store in memmove on large data memcpy/memmove benchmarks with large data show that there is a regression with large data on a Haswell machine. Non-temporal store in memmove on large data can improve performance significantly. This patch adds a threshold to use non temporal store which is 4 times of shared cache size. When size is above the threshold, non temporal store will be used. For size below 8 vector register width, we load all data into registers and store them together. Only forward and backward loops, which move 4 vector registers at a time, are used to support overlapping addresses. For forward loop, we load the last 4 vector register width of data and the first vector register width of data into vector registers before the loop and store them after the loop. For backward loop, we load the first 4 vector register width of data and the last vector register width of data into vector registers before the loop and store them after the loop. * sysdeps/x86_64/cacheinfo.c (__x86_shared_non_temporal_threshold): New. (init_cacheinfo): Set __x86_shared_non_temporal_threshold to 4 times of shared cache size. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S (PREFETCHNT): New. (VMOVNT): Likewise. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S (PREFETCHNT): Likewise. (VMOVNT): Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S (PREFETCHNT): Likewise. (VMOVNT): Likewise. (VMOVU): Changed to movups for smaller code sizes. (VMOVA): Changed to movaps for smaller code sizes.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Update comments. (PREFETCH_SIZE): New. (PREFETCHED_LOAD_SIZE): Likewise. (PREFETCH_ONE_SET): Likewise. Rewrite to use forward and backward loops, which move 4 vector registers at a time, to support overlapping addresses and use non-temporal stores if size is above the threshold. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0932dd8b56db46dd421a4855fb5dee9de092538d commit 0932dd8b56db46dd421a4855fb5dee9de092538d Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Apr 6 10:19:16 2016 -0700 X86-64: Prepare memmove-vec-unaligned-erms.S Prepare memmove-vec-unaligned-erms.S to make the SSE2 version the default memcpy, mempcpy and memmove. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (MEMCPY_SYMBOL): New. (MEMPCPY_SYMBOL): Likewise. (MEMMOVE_CHK_SYMBOL): Likewise. Replace MEMMOVE_SYMBOL with MEMMOVE_CHK_SYMBOL on __mempcpy_chk symbols. Replace MEMMOVE_SYMBOL with MEMPCPY_SYMBOL on __mempcpy symbols. Provide alias for __memcpy_chk in libc.a. Provide alias for memcpy in libc.a and ld.so. (cherry picked from commit a7d1c51482d15ab6c07e2ee0ae5e007067b18bfb) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=da2da79262814ba4ead3ee487549949096d8ad2d commit da2da79262814ba4ead3ee487549949096d8ad2d Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Apr 6 09:10:18 2016 -0700 X86-64: Prepare memset-vec-unaligned-erms.S Prepare memset-vec-unaligned-erms.S to make the SSE2 version the default memset. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (MEMSET_CHK_SYMBOL): New. Define if not defined. (__bzero): Check VEC_SIZE == 16 instead of USE_MULTIARCH. Disabled for now. Replace MEMSET_SYMBOL with MEMSET_CHK_SYMBOL on __memset_chk symbols. Properly check USE_MULTIARCH on __memset symbols. 
(cherry picked from commit 4af1bb06c59d24f35bf8dc55897838d926c05892) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9a93bdbaff81edf67c5486c84f8098055e355abb commit 9a93bdbaff81edf67c5486c84f8098055e355abb Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Apr 5 05:21:07 2016 -0700 Force 32-bit displacement in memset-vec-unaligned-erms.S * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Force 32-bit displacement to avoid long nop between instructions. (cherry picked from commit ec0cac9a1f4094bd0db6f77c1b329e7a40eecc10) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5118e532600549ad0f56cb9b1a179b8eab70c483 commit 5118e532600549ad0f56cb9b1a179b8eab70c483 Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Apr 5 05:19:05 2016 -0700 Add a comment in memset-sse2-unaligned-erms.S * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Add a comment on VMOVU and VMOVA. (cherry picked from commit 696ac774847b80cf994438739478b0c3003b5958) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=06c6d4ae6ee7e5b83fd5868bef494def01f59292 commit 06c6d4ae6ee7e5b83fd5868bef494def01f59292 Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Apr 3 14:32:20 2016 -0700 Don't put SSE2/AVX/AVX512 memmove/memset in ld.so Since memmove and memset in ld.so don't use IFUNC, don't put SSE2, AVX and AVX512 memmove and memset in ld.so. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip if not in libc. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. (cherry picked from commit 5cd7af016d8587ff53b20ba259746f97edbddbf7) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a96379797a7eecc1b709cad7b68981eb698783dc commit a96379797a7eecc1b709cad7b68981eb698783dc Author: H.J. 
Lu <hjl.tools@gmail.com> Date: Sun Apr 3 12:38:25 2016 -0700 Fix memmove-vec-unaligned-erms.S __mempcpy_erms and __memmove_erms can't be placed between __memmove_chk and __memmove; it breaks __memmove_chk. Don't check source == destination first since it is less common. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: (__mempcpy_erms, __memmove_erms): Moved before __mempcpy_chk with unaligned_erms. (__memmove_erms): Skip if source == destination. (__memmove_unaligned_erms): Don't check source == destination first. (cherry picked from commit ea2785e96fa503f3a2b5dd9f3a6ca65622b3c5f2) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=cfb059c79729b26284863334c9aa04f0a3b967b9 commit cfb059c79729b26284863334c9aa04f0a3b967b9 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Apr 1 15:08:48 2016 -0700 Remove Fast_Copy_Backward from Intel Core processors Intel Core i3, i5 and i7 processors have fast unaligned copy and copy backward is ignored. Remove Fast_Copy_Backward from Intel Core processors to avoid confusion. * sysdeps/x86/cpu-features.c (init_cpu_features): Don't set bit_arch_Fast_Copy_Backward for Intel Core processors. (cherry picked from commit 27d3ce1467990f89126e228559dec8f84b96c60e) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=30c389be1af67c4d0716d207b6780c6169d1355f commit 30c389be1af67c4d0716d207b6780c6169d1355f Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 31 10:05:51 2016 -0700 Add x86-64 memset with unaligned store and rep stosb Implement x86-64 memset with unaligned store and rep stosb. Support 16-byte, 32-byte and 64-byte vector register sizes. A single file provides 2 implementations of memset, one with rep stosb and the other without rep stosb. They share the same code when size is between 2 times of vector register size and REP_STOSB_THRESHOLD which defaults to 2KB. Key features: 1. Use overlapping store to avoid branch. 2. For size <= 4 times of vector register size, fully unroll the loop. 3. 
For size > 4 times of vector register size, store 4 times of vector register size at a time. [BZ #19881] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and memset-avx512-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned, __memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned, __memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned, __memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned, __memset_sse2_unaligned_erms, __memset_erms, __memset_avx2_unaligned, __memset_avx2_unaligned_erms, __memset_avx512_unaligned_erms and __memset_avx512_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New file. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise. (cherry picked from commit 830566307f038387ca0af3fd327706a8d1a2f595) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=980d639b4ae58209843f09a29d86b0a8303b6650 commit 980d639b4ae58209843f09a29d86b0a8303b6650 Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 31 10:04:26 2016 -0700 Add x86-64 memmove with unaligned load/store and rep movsb Implement x86-64 memmove with unaligned load/store and rep movsb. Support 16-byte, 32-byte and 64-byte vector register sizes. When size <= 8 times of vector register size, there is no check for address overlap between source and destination. Since overhead for overlap check is small when size > 8 times of vector register size, memcpy is an alias of memmove. A single file provides 2 implementations of memmove, one with rep movsb and the other without rep movsb. They share the same code when size is between 2 times of vector register size and REP_MOVSB_THRESHOLD which is 2KB for 16-byte vector register size and scaled up by large vector register size. Key features: 1. 
Use overlapping load and store to avoid branch. 2. For size <= 8 times of vector register size, load all sources into registers and store them together. 3. If there is no address overlap between source and destination, copy from both ends with 4 times of vector register size at a time. 4. If address of destination > address of source, backward copy 8 times of vector register size at a time. 5. Otherwise, forward copy 8 times of vector register size at a time. 6. Use rep movsb only for forward copy. Avoid slow backward rep movsb by falling back to backward copy 8 times of vector register size at a time. 7. Skip when address of destination == address of source. [BZ #19776] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and memmove-avx512-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memmove_chk_avx512_unaligned_2, __memmove_chk_avx512_unaligned_erms, __memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms, __memmove_chk_sse2_unaligned_2, __memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2, __memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2, __memmove_avx512_unaligned_erms, __memmove_erms, __memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms, __memcpy_chk_avx512_unaligned_2, __memcpy_chk_avx512_unaligned_erms, __memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms, __memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms, __memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms, __memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms, __memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms, __memcpy_erms, __mempcpy_chk_avx512_unaligned_2, __mempcpy_chk_avx512_unaligned_erms, __mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms, __mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms, __mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms, __mempcpy_avx_unaligned_2, 
__mempcpy_avx_unaligned_erms, __mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and __mempcpy_erms. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New file. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Likewise. (cherry picked from commit 88b57b8ed41d5ecf2e1bdfc19556f9246a665ebb) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bf2bc5e5c9d7aa8af28b299ec26b8a37352730cc commit bf2bc5e5c9d7aa8af28b299ec26b8a37352730cc Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 19:22:59 2016 -0700 Initial Enhanced REP MOVSB/STOSB (ERMS) support The newer Intel processors support Enhanced REP MOVSB/STOSB (ERMS) which has a feature bit in CPUID. This patch adds the Enhanced REP MOVSB/STOSB (ERMS) bit to x86 cpu-features. * sysdeps/x86/cpu-features.h (bit_cpu_ERMS): New. (index_cpu_ERMS): Likewise. (reg_ERMS): Likewise. (cherry picked from commit 0791f91dff9a77263fa8173b143d854cad902c6d) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7c244283ff12329b3bca9878b8edac3b3fe5c7bc commit 7c244283ff12329b3bca9878b8edac3b3fe5c7bc Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 13:15:59 2016 -0700 Make __memcpy_avx512_no_vzeroupper an alias Since x86-64 memcpy-avx512-no-vzeroupper.S implements memmove, make __memcpy_avx512_no_vzeroupper an alias of __memmove_avx512_no_vzeroupper to reduce code size of libc.so. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memcpy-avx512-no-vzeroupper. * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Renamed to ... * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: This. (MEMCPY): Don't define. (MEMCPY_CHK): Likewise. (MEMPCPY): Likewise. (MEMPCPY_CHK): Likewise. (MEMPCPY_CHK): Renamed to ... (__mempcpy_chk_avx512_no_vzeroupper): This. (MEMPCPY_CHK): Renamed to ... (__mempcpy_chk_avx512_no_vzeroupper): This. (MEMCPY_CHK): Renamed to ... 
(__memmove_chk_avx512_no_vzeroupper): This. (MEMCPY): Renamed to ... (__memmove_avx512_no_vzeroupper): This. (__memcpy_avx512_no_vzeroupper): New alias. (__memcpy_chk_avx512_no_vzeroupper): Likewise. (cherry picked from commit 064f01b10b57ff09cda7025f484b848c38ddd57a) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a9a14991fb2d3e69f80d25e9bbf2f6b0bcf11c3d commit a9a14991fb2d3e69f80d25e9bbf2f6b0bcf11c3d Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 13:13:36 2016 -0700 Implement x86-64 multiarch mempcpy in memcpy Implement x86-64 multiarch mempcpy in memcpy to share most of code. It reduces code size of libc.so. [BZ #18858] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove mempcpy-ssse3, mempcpy-ssse3-back, mempcpy-avx-unaligned and mempcpy-avx512-no-vzeroupper. * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-ssse3-back.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-ssse3.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/mempcpy-avx-unaligned.S: Removed. * sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy-ssse3-back.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy-ssse3.S: Likewise. (cherry picked from commit c365e615f7429aee302f8af7bf07ae262278febb) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4fc09dabecee1b7cafdbca26ee7c63f68e53c229 commit 4fc09dabecee1b7cafdbca26ee7c63f68e53c229 Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 04:39:48 2016 -0700 [x86] Add a feature bit: Fast_Unaligned_Copy On AMD processors, memcpy optimized with unaligned SSE load is slower than memcpy optimized with aligned SSSE3 while other string functions are faster with unaligned SSE load. 
A feature bit, Fast_Unaligned_Copy, is added to select memcpy optimized with unaligned SSE load. [BZ #19583] * sysdeps/x86/cpu-features.c (init_cpu_features): Set Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel processors. Set Fast_Copy_Backward for AMD Excavator processors. * sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy): New. (index_arch_Fast_Unaligned_Copy): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check Fast_Unaligned_Copy instead of Fast_Unaligned_Load. (cherry picked from commit e41b395523040fcb58c7d378475720c2836d280c) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=75f2d47e459a6bf5656a938e5c63f8b581eb3ee6 commit 75f2d47e459a6bf5656a938e5c63f8b581eb3ee6 Author: Florian Weimer <fweimer@redhat.com> Date: Fri Mar 25 11:11:42 2016 +0100 tst-audit10: Fix compilation on compilers without bit_AVX512F [BZ #19860] [BZ #19860] * sysdeps/x86_64/tst-audit10.c (avx512_enabled): Always return zero if the compiler does not provide the AVX512F bit. (cherry picked from commit f327f5b47be57bc05a4077344b381016c1bb2c11) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=96c7375cb8b6f1875d9865f2ae92ecacf5f5e6fa commit 96c7375cb8b6f1875d9865f2ae92ecacf5f5e6fa Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Mar 22 08:36:16 2016 -0700 Don't set %rcx twice before "rep movsb" * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMCPY): Don't set %rcx twice before "rep movsb". (cherry picked from commit 3c9a4cd16cbc7b79094fec68add2df66061ab5d7) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c273f613b0cc779ee33cc33d20941d271316e483 commit c273f613b0cc779ee33cc33d20941d271316e483 Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Mar 22 07:46:56 2016 -0700 Set index_arch_AVX_Fast_Unaligned_Load only for Intel processors Since only Intel processors with AVX2 have fast unaligned load, we should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors. 
Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces and call get_common_indeces for other processors. Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to avoid loading GLRO(dl_x86_cpu_features) in cpu-features.c. [BZ #19583] * sysdeps/x86/cpu-features.c (get_common_indeces): Remove inline. Check family before setting family, model and extended_model. Set AVX, AVX2, AVX512, FMA and FMA4 usable bits here. (init_cpu_features): Replace HAS_CPU_FEATURE and HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P. Set index_arch_AVX_Fast_Unaligned_Load for Intel processors with usable AVX2. Call get_common_indeces for other processors with family == NULL. * sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro. (CPU_FEATURES_ARCH_P): Likewise. (HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P. (HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P. (cherry picked from commit f781a9e96138d8839663af5e88649ab1fbed74f8) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c858d10a4e7fd682f2e7083836e4feacc2d580f4 commit c858d10a4e7fd682f2e7083836e4feacc2d580f4 Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 10 05:26:46 2016 -0800 Add _arch_/_cpu_ to index_*/bit_* in x86 cpu-features.h index_* and bit_* macros are used to access cpuid and feature arrays of struct cpu_features. It is very easy to use bits and indices of the cpuid array on the feature array, especially in assembly code. For example, sysdeps/i386/i686/multiarch/bcopy.S has HAS_CPU_FEATURE (Fast_Rep_String) which should be HAS_ARCH_FEATURE (Fast_Rep_String). We change index_* and bit_* to index_cpu_*/index_arch_* and bit_cpu_*/bit_arch_* so that we can catch such errors at build time. [BZ #19762] * sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h (EXTRA_LD_ENVVARS): Add _arch_ to index_*/bit_*. * sysdeps/x86/cpu-features.c (init_cpu_features): Likewise. * sysdeps/x86/cpu-features.h (bit_*): Renamed to ... (bit_arch_*): This for feature array. (bit_*): Renamed to ... (bit_cpu_*): This for cpu array. 
(index_*): Renamed to ... (index_arch_*): This for feature array. (index_*): Renamed to ... (index_cpu_*): This for cpu array. [__ASSEMBLER__] (HAS_FEATURE): Add and use field. [__ASSEMBLER__] (HAS_CPU_FEATURE): Pass cpu to HAS_FEATURE. [__ASSEMBLER__] (HAS_ARCH_FEATURE): Pass arch to HAS_FEATURE. [!__ASSEMBLER__] (HAS_CPU_FEATURE): Replace index_##name and bit_##name with index_cpu_##name and bit_cpu_##name. [!__ASSEMBLER__] (HAS_ARCH_FEATURE): Replace index_##name and bit_##name with index_arch_##name and bit_arch_##name. (cherry picked from commit 6aa3e97e2530f9917f504eb4146af119a3f27229) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7a90b56b0c3f8e55df44957cf6de7d3c9c04cbb9 commit 7a90b56b0c3f8e55df44957cf6de7d3c9c04cbb9 Author: Roland McGrath <roland@hack.frob.com> Date: Tue Mar 8 12:31:13 2016 -0800 Fix tst-audit10 build when -mavx512f is not supported. (cherry picked from commit 3bd80c0de2f8e7ca8020d37739339636d169957e) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ba80f6ceea3a6b6f711038646f419125fe3ad39c commit ba80f6ceea3a6b6f711038646f419125fe3ad39c Author: Florian Weimer <fweimer@redhat.com> Date: Mon Mar 7 16:00:25 2016 +0100 tst-audit4, tst-audit10: Compile AVX/AVX-512 code separately [BZ #19269] This ensures that GCC will not use unsupported instructions before the run-time check to ensure support. (cherry picked from commit 3c0f7407eedb524c9114bb675cd55b903c71daaa) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b8fe596e7f750d4ee2fca14d6a3999364c02662e commit b8fe596e7f750d4ee2fca14d6a3999364c02662e Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Mar 6 16:48:11 2016 -0800 Group AVX512 functions in .text.avx512 section * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Replace .text with .text.avx512. * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Likewise. 
(cherry picked from commit fee9eb6200f0e44a4b684903bc47fde36d46f1a5) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e455d17680cfaebb12692547422f95ba1ed30e29 commit e455d17680cfaebb12692547422f95ba1ed30e29 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Mar 4 08:37:40 2016 -0800 x86-64: Fix memcpy IFUNC selection Check Fast_Unaligned_Load, instead of Slow_BSF, and also check for Fast_Copy_Backward to enable __memcpy_ssse3_back. Existing selection order is updated with following selection order: 1. __memcpy_avx_unaligned if AVX_Fast_Unaligned_Load bit is set. 2. __memcpy_sse2_unaligned if Fast_Unaligned_Load bit is set. 3. __memcpy_sse2 if SSSE3 isn't available. 4. __memcpy_ssse3_back if Fast_Copy_Backward bit is set. 5. __memcpy_ssse3 [BZ #18880] * sysdeps/x86_64/multiarch/memcpy.S: Check Fast_Unaligned_Load, instead of Slow_BSF, and also check for Fast_Copy_Backward to enable __memcpy_ssse3_back. (cherry picked from commit 14a1d7cc4c4fd5ee8e4e66b777221dd32a84efe8) -----------------------------------------------------------------------
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, hjl/erms/2.23 has been created at 2a1cca399be415d6c5a556af2018e5fb726d9a37 (commit) - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=2a1cca399be415d6c5a556af2018e5fb726d9a37 commit 2a1cca399be415d6c5a556af2018e5fb726d9a37 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Apr 1 14:01:24 2016 -0700 X86-64: Add dummy memcopy.h and wordcopy.c Since x86-64 doesn't use memory copy functions, add dummy memcopy.h and wordcopy.c to reduce code size. It reduces the size of libc.so by about 1 KB. * sysdeps/x86_64/memcopy.h: New file. * sysdeps/x86_64/wordcopy.c: Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b361e72f264a06e856d97cbbf1cedbf2f7dd73bf commit b361e72f264a06e856d97cbbf1cedbf2f7dd73bf Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 31 12:46:57 2016 -0700 X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove Since the new SSE2/AVX2 memcpy/memmove are faster than the previous ones, we can remove the previous SSE2/AVX2 memcpy/memmove and replace them with the new ones. No change in IFUNC selection if SSE2 and AVX2 memcpy/memmove weren't used before. If SSE2 or AVX2 memcpy/memmove were used, the new SSE2 or AVX2 memcpy/memmove optimized with Enhanced REP MOVSB will be used for processors with ERMS. The new AVX512 memcpy/memmove will be used for processors with AVX512 which prefer vzeroupper. Since the new SSE2 memcpy/memmove are faster than the previous default memcpy/memmove used in libc.a and ld.so, we also remove the previous default memcpy/memmove and make them the default memcpy/memmove. Together, it reduces the size of libc.so by about 6 KB and the size of ld.so by about 2 KB. [BZ #19776] * sysdeps/x86_64/memcpy.S: Make it dummy. * sysdeps/x86_64/mempcpy.S: Likewise. 
* sysdeps/x86_64/memmove.S: New file. * sysdeps/x86_64/memmove_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove.S: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.S: Likewise. * sysdeps/x86_64/memmove.c: Removed. * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/memmove-avx-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memcpy-sse2-unaligned, memmove-avx-unaligned, memcpy-avx-unaligned and memmove-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Replace __memmove_chk_avx512_unaligned_2 with __memmove_chk_avx512_unaligned. Remove __memmove_chk_avx_unaligned_2. Replace __memmove_chk_sse2_unaligned_2 with __memmove_chk_sse2_unaligned. Remove __memmove_chk_sse2 and __memmove_avx_unaligned_2. Replace __memmove_avx512_unaligned_2 with __memmove_avx512_unaligned. Replace __memmove_sse2_unaligned_2 with __memmove_sse2_unaligned. Remove __memmove_sse2. Replace __memcpy_chk_avx512_unaligned_2 with __memcpy_chk_avx512_unaligned. Remove __memcpy_chk_avx_unaligned_2. Replace __memcpy_chk_sse2_unaligned_2 with __memcpy_chk_sse2_unaligned. Remove __memcpy_chk_sse2. Remove __memcpy_avx_unaligned_2. Replace __memcpy_avx512_unaligned_2 with __memcpy_avx512_unaligned. Remove __memcpy_sse2_unaligned_2 and __memcpy_sse2. Replace __mempcpy_chk_avx512_unaligned_2 with __mempcpy_chk_avx512_unaligned. Remove __mempcpy_chk_avx_unaligned_2. Replace __mempcpy_chk_sse2_unaligned_2 with __mempcpy_chk_sse2_unaligned. Remove __mempcpy_chk_sse2. Replace __mempcpy_avx512_unaligned_2 with __mempcpy_avx512_unaligned. Remove __mempcpy_avx_unaligned_2. Replace __mempcpy_sse2_unaligned_2 with __mempcpy_sse2_unaligned. Remove __mempcpy_sse2. 
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Support __memcpy_avx512_unaligned_erms and __memcpy_avx512_unaligned. Use __memcpy_avx_unaligned_erms and __memcpy_sse2_unaligned_erms if processor has ERMS. Default to __memcpy_sse2_unaligned. (ENTRY): Removed. (END): Likewise. (ENTRY_CHK): Likewise. (libc_hidden_builtin_def): Likewise. Don't include ../memcpy.S. * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Support __memcpy_chk_avx512_unaligned_erms and __memcpy_chk_avx512_unaligned. Use __memcpy_chk_avx_unaligned_erms and __memcpy_chk_sse2_unaligned_erms if processor has ERMS. Default to __memcpy_chk_sse2_unaligned. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip if not in libc. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Change function suffix from unaligned_2 to unaligned. * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Support __mempcpy_avx512_unaligned_erms and __mempcpy_avx512_unaligned. Use __mempcpy_avx_unaligned_erms and __mempcpy_sse2_unaligned_erms if processor has ERMS. Default to __mempcpy_sse2_unaligned. (ENTRY): Removed. (END): Likewise. (ENTRY_CHK): Likewise. (libc_hidden_builtin_def): Likewise. Don't include ../mempcpy.S. (mempcpy): New. Add a weak alias. * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Support __mempcpy_chk_avx512_unaligned_erms and __mempcpy_chk_avx512_unaligned. Use __mempcpy_chk_avx_unaligned_erms and __mempcpy_chk_sse2_unaligned_erms if processor has ERMS. Default to __mempcpy_chk_sse2_unaligned. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c97c370612496379176be8e33c19dc4f80b7f01c commit c97c370612496379176be8e33c19dc4f80b7f01c Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 31 10:42:30 2016 -0700 X86-64: Remove the previous SSE2/AVX2 memsets Since the new SSE2/AVX2 memsets are faster than the previous ones, we can remove the previous SSE2/AVX2 memsets and replace them with the new ones. 
This reduces the size of libc.so by about 900 bytes. No change in IFUNC selection if SSE2 and AVX2 memsets weren't used before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset optimized with Enhanced REP STOSB will be used for processors with ERMS. The new AVX512 memset will be used for processors with AVX512 which prefer vzeroupper. [BZ #19881] * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded into ... * sysdeps/x86_64/memset.S: This. (__bzero): Removed. (__memset_tail): Likewise. (__memset_chk): Likewise. (memset): Likewise. (MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't defined. (MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined. * sysdeps/x86_64/multiarch/memset-avx2.S: Removed. (__memset_zero_constant_len_parameter): Check SHARED instead of PIC. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memset-avx2 and memset-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Remove __memset_chk_sse2, __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: Skip if not in libc. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__bzero): Enabled. * sysdeps/x86_64/multiarch/memset.S (memset): Replace __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms or __memset_avx2_unaligned_erms if processor has ERMS. Support __memset_avx512_unaligned_erms and __memset_avx512_unaligned. (memset): Removed. (__memset_chk): Likewise. (MEMSET_SYMBOL): New. (libc_hidden_builtin_def): Replace __memset_sse2 with __memset_sse2_unaligned. * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace __memset_chk_sse2 and __memset_chk_avx2 with __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms. 
Use __memset_chk_sse2_unaligned_erms or __memset_chk_avx2_unaligned_erms if processor has ERMS. Support __memset_chk_avx512_unaligned_erms and __memset_chk_avx512_unaligned. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=121270b79236d7c5802e8d9af2d27952cb9efae9 commit 121270b79236d7c5802e8d9af2d27952cb9efae9 Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Apr 3 17:21:45 2016 -0700 X86-64: Use non-temporal store in memcpy on large data The large memcpy micro benchmark in glibc shows that there is a regression with large data on Haswell machines. A non-temporal store in memcpy on large data can improve performance significantly. This patch adds a threshold for using non-temporal stores, which is 6 times the shared cache size. When size is above the threshold, non-temporal stores will be used. For size below 8 vector register width, we load all data into registers and store them together. Only forward and backward loops, which move 4 vector registers at a time, are used to support overlapping addresses. For forward loop, we load the last 4 vector register width of data and the first vector register width of data into vector registers before the loop and store them after the loop. For backward loop, we load the first 4 vector register width of data and the last vector register width of data into vector registers before the loop and store them after the loop. * sysdeps/x86_64/cacheinfo.c (__x86_shared_non_temporal_threshold): New. (init_cacheinfo): Set __x86_shared_non_temporal_threshold to 6 times of shared cache size. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S (VMOVNT): New. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S (VMOVNT): Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S (VMOVNT): Likewise. (VMOVU): Changed to movups for smaller code sizes. (VMOVA): Changed to movaps for smaller code sizes. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Update comments. (PREFETCH): New. (PREFETCH_SIZE): Likewise. 
(PREFETCHED_LOAD_SIZE): Likewise. (PREFETCH_ONE_SET): Likewise. Rewrite to use forward and backward loops, which move 4 vector registers at a time, to support overlapping addresses and use non-temporal stores if size is above the threshold. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0932dd8b56db46dd421a4855fb5dee9de092538d commit 0932dd8b56db46dd421a4855fb5dee9de092538d Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Apr 6 10:19:16 2016 -0700 X86-64: Prepare memmove-vec-unaligned-erms.S Prepare memmove-vec-unaligned-erms.S to make the SSE2 version the default memcpy, mempcpy and memmove. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (MEMCPY_SYMBOL): New. (MEMPCPY_SYMBOL): Likewise. (MEMMOVE_CHK_SYMBOL): Likewise. Replace MEMMOVE_SYMBOL with MEMMOVE_CHK_SYMBOL on __mempcpy_chk symbols. Replace MEMMOVE_SYMBOL with MEMPCPY_SYMBOL on __mempcpy symbols. Provide alias for __memcpy_chk in libc.a. Provide alias for memcpy in libc.a and ld.so. (cherry picked from commit a7d1c51482d15ab6c07e2ee0ae5e007067b18bfb) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=da2da79262814ba4ead3ee487549949096d8ad2d commit da2da79262814ba4ead3ee487549949096d8ad2d Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Apr 6 09:10:18 2016 -0700 X86-64: Prepare memset-vec-unaligned-erms.S Prepare memset-vec-unaligned-erms.S to make the SSE2 version the default memset. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (MEMSET_CHK_SYMBOL): New. Define if not defined. (__bzero): Check VEC_SIZE == 16 instead of USE_MULTIARCH. Disabled for now. Replace MEMSET_SYMBOL with MEMSET_CHK_SYMBOL on __memset_chk symbols. Properly check USE_MULTIARCH on __memset symbols. (cherry picked from commit 4af1bb06c59d24f35bf8dc55897838d926c05892) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9a93bdbaff81edf67c5486c84f8098055e355abb commit 9a93bdbaff81edf67c5486c84f8098055e355abb Author: H.J. 
Lu <hjl.tools@gmail.com> Date: Tue Apr 5 05:21:07 2016 -0700 Force 32-bit displacement in memset-vec-unaligned-erms.S * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Force 32-bit displacement to avoid long nop between instructions. (cherry picked from commit ec0cac9a1f4094bd0db6f77c1b329e7a40eecc10) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5118e532600549ad0f56cb9b1a179b8eab70c483 commit 5118e532600549ad0f56cb9b1a179b8eab70c483 Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Apr 5 05:19:05 2016 -0700 Add a comment in memset-sse2-unaligned-erms.S * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Add a comment on VMOVU and VMOVA. (cherry picked from commit 696ac774847b80cf994438739478b0c3003b5958) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=06c6d4ae6ee7e5b83fd5868bef494def01f59292 commit 06c6d4ae6ee7e5b83fd5868bef494def01f59292 Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Apr 3 14:32:20 2016 -0700 Don't put SSE2/AVX/AVX512 memmove/memset in ld.so Since memmove and memset in ld.so don't use IFUNC, don't put SSE2, AVX and AVX512 memmove and memset in ld.so. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip if not in libc. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. (cherry picked from commit 5cd7af016d8587ff53b20ba259746f97edbddbf7) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a96379797a7eecc1b709cad7b68981eb698783dc commit a96379797a7eecc1b709cad7b68981eb698783dc Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Apr 3 12:38:25 2016 -0700 Fix memmove-vec-unaligned-erms.S __mempcpy_erms and __memmove_erms can't be placed between __memmove_chk and __memmove; doing so breaks __memmove_chk. Don't check source == destination first since it is less common. 
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: (__mempcpy_erms, __memmove_erms): Moved before __mempcpy_chk with unaligned_erms. (__memmove_erms): Skip if source == destination. (__memmove_unaligned_erms): Don't check source == destination first. (cherry picked from commit ea2785e96fa503f3a2b5dd9f3a6ca65622b3c5f2) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=cfb059c79729b26284863334c9aa04f0a3b967b9 commit cfb059c79729b26284863334c9aa04f0a3b967b9 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Apr 1 15:08:48 2016 -0700 Remove Fast_Copy_Backward from Intel Core processors Intel Core i3, i5 and i7 processors have fast unaligned copy and copy backward is ignored. Remove Fast_Copy_Backward from Intel Core processors to avoid confusion. * sysdeps/x86/cpu-features.c (init_cpu_features): Don't set bit_arch_Fast_Copy_Backward for Intel Core processors. (cherry picked from commit 27d3ce1467990f89126e228559dec8f84b96c60e) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=30c389be1af67c4d0716d207b6780c6169d1355f commit 30c389be1af67c4d0716d207b6780c6169d1355f Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 31 10:05:51 2016 -0700 Add x86-64 memset with unaligned store and rep stosb Implement x86-64 memset with unaligned store and rep stosb. Support 16-byte, 32-byte and 64-byte vector register sizes. A single file provides 2 implementations of memset, one with rep stosb and the other without rep stosb. They share the same code when size is between 2 times the vector register size and REP_STOSB_THRESHOLD, which defaults to 2KB. Key features: 1. Use overlapping store to avoid branch. 2. For size <= 4 times the vector register size, fully unroll the loop. 3. For size > 4 times the vector register size, store 4 times the vector register size at a time. [BZ #19881] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and memset-avx512-unaligned-erms. 
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned, __memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned, __memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned, __memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned, __memset_sse2_unaligned_erms, __memset_erms, __memset_avx2_unaligned, __memset_avx2_unaligned_erms, __memset_avx512_unaligned_erms and __memset_avx512_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New file. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise. (cherry picked from commit 830566307f038387ca0af3fd327706a8d1a2f595) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=980d639b4ae58209843f09a29d86b0a8303b6650 commit 980d639b4ae58209843f09a29d86b0a8303b6650 Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 31 10:04:26 2016 -0700 Add x86-64 memmove with unaligned load/store and rep movsb Implement x86-64 memmove with unaligned load/store and rep movsb. Support 16-byte, 32-byte and 64-byte vector register sizes. When size <= 8 times the vector register size, there is no check for address overlap between source and destination. Since the overhead of the overlap check is small when size > 8 times the vector register size, memcpy is an alias of memmove. A single file provides 2 implementations of memmove, one with rep movsb and the other without rep movsb. They share the same code when size is between 2 times the vector register size and REP_MOVSB_THRESHOLD, which is 2KB for 16-byte vector register size and scaled up for larger vector register sizes. Key features: 1. Use overlapping load and store to avoid branch. 2. For size <= 8 times the vector register size, load all sources into registers and store them together. 3. 
If there is no address overlap between source and destination, copy from both ends with 4 times the vector register size at a time. 4. If address of destination > address of source, backward copy 8 times the vector register size at a time. 5. Otherwise, forward copy 8 times the vector register size at a time. 6. Use rep movsb only for forward copy. Avoid slow backward rep movsb by falling back to backward copy 8 times the vector register size at a time. 7. Skip when address of destination == address of source. [BZ #19776] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and memmove-avx512-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memmove_chk_avx512_unaligned_2, __memmove_chk_avx512_unaligned_erms, __memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms, __memmove_chk_sse2_unaligned_2, __memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2, __memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2, __memmove_avx512_unaligned_erms, __memmove_erms, __memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms, __memcpy_chk_avx512_unaligned_2, __memcpy_chk_avx512_unaligned_erms, __memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms, __memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms, __memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms, __memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms, __memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms, __memcpy_erms, __mempcpy_chk_avx512_unaligned_2, __mempcpy_chk_avx512_unaligned_erms, __mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms, __mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms, __mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms, __mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms, __mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and __mempcpy_erms. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New file. 
* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Likewise. (cherry picked from commit 88b57b8ed41d5ecf2e1bdfc19556f9246a665ebb) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bf2bc5e5c9d7aa8af28b299ec26b8a37352730cc commit bf2bc5e5c9d7aa8af28b299ec26b8a37352730cc Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 19:22:59 2016 -0700 Initial Enhanced REP MOVSB/STOSB (ERMS) support The newer Intel processors support Enhanced REP MOVSB/STOSB (ERMS), which has a feature bit in CPUID. This patch adds the Enhanced REP MOVSB/STOSB (ERMS) bit to x86 cpu-features. * sysdeps/x86/cpu-features.h (bit_cpu_ERMS): New. (index_cpu_ERMS): Likewise. (reg_ERMS): Likewise. (cherry picked from commit 0791f91dff9a77263fa8173b143d854cad902c6d) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7c244283ff12329b3bca9878b8edac3b3fe5c7bc commit 7c244283ff12329b3bca9878b8edac3b3fe5c7bc Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 13:15:59 2016 -0700 Make __memcpy_avx512_no_vzeroupper an alias Since x86-64 memcpy-avx512-no-vzeroupper.S implements memmove, make __memcpy_avx512_no_vzeroupper an alias of __memmove_avx512_no_vzeroupper to reduce code size of libc.so. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memcpy-avx512-no-vzeroupper. * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Renamed to ... * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: This. (MEMCPY): Don't define. (MEMCPY_CHK): Likewise. (MEMPCPY): Likewise. (MEMPCPY_CHK): Likewise. (MEMPCPY_CHK): Renamed to ... (__mempcpy_chk_avx512_no_vzeroupper): This. (MEMPCPY_CHK): Renamed to ... (__mempcpy_chk_avx512_no_vzeroupper): This. (MEMCPY_CHK): Renamed to ... (__memmove_chk_avx512_no_vzeroupper): This. (MEMCPY): Renamed to ... (__memmove_avx512_no_vzeroupper): This. (__memcpy_avx512_no_vzeroupper): New alias. 
(__memcpy_chk_avx512_no_vzeroupper): Likewise. (cherry picked from commit 064f01b10b57ff09cda7025f484b848c38ddd57a) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a9a14991fb2d3e69f80d25e9bbf2f6b0bcf11c3d commit a9a14991fb2d3e69f80d25e9bbf2f6b0bcf11c3d Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 13:13:36 2016 -0700 Implement x86-64 multiarch mempcpy in memcpy Implement x86-64 multiarch mempcpy in memcpy to share most of the code. It reduces code size of libc.so. [BZ #18858] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove mempcpy-ssse3, mempcpy-ssse3-back, mempcpy-avx-unaligned and mempcpy-avx512-no-vzeroupper. * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-ssse3-back.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-ssse3.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/mempcpy-avx-unaligned.S: Removed. * sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy-ssse3-back.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy-ssse3.S: Likewise. (cherry picked from commit c365e615f7429aee302f8af7bf07ae262278febb) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4fc09dabecee1b7cafdbca26ee7c63f68e53c229 commit 4fc09dabecee1b7cafdbca26ee7c63f68e53c229 Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 04:39:48 2016 -0700 [x86] Add a feature bit: Fast_Unaligned_Copy On AMD processors, memcpy optimized with unaligned SSE load is slower than memcpy optimized with aligned SSSE3, while other string functions are faster with unaligned SSE load. A feature bit, Fast_Unaligned_Copy, is added to select memcpy optimized with unaligned SSE load. [BZ #19583] * sysdeps/x86/cpu-features.c (init_cpu_features): Set Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel processors. 
Set Fast_Copy_Backward for AMD Excavator processors. * sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy): New. (index_arch_Fast_Unaligned_Copy): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check Fast_Unaligned_Copy instead of Fast_Unaligned_Load. (cherry picked from commit e41b395523040fcb58c7d378475720c2836d280c) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=75f2d47e459a6bf5656a938e5c63f8b581eb3ee6 commit 75f2d47e459a6bf5656a938e5c63f8b581eb3ee6 Author: Florian Weimer <fweimer@redhat.com> Date: Fri Mar 25 11:11:42 2016 +0100 tst-audit10: Fix compilation on compilers without bit_AVX512F [BZ #19860] [BZ# 19860] * sysdeps/x86_64/tst-audit10.c (avx512_enabled): Always return zero if the compiler does not provide the AVX512F bit. (cherry picked from commit f327f5b47be57bc05a4077344b381016c1bb2c11) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=96c7375cb8b6f1875d9865f2ae92ecacf5f5e6fa commit 96c7375cb8b6f1875d9865f2ae92ecacf5f5e6fa Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Mar 22 08:36:16 2016 -0700 Don't set %rcx twice before "rep movsb" * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMCPY): Don't set %rcx twice before "rep movsb". (cherry picked from commit 3c9a4cd16cbc7b79094fec68add2df66061ab5d7) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c273f613b0cc779ee33cc33d20941d271316e483 commit c273f613b0cc779ee33cc33d20941d271316e483 Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Mar 22 07:46:56 2016 -0700 Set index_arch_AVX_Fast_Unaligned_Load only for Intel processors Since only Intel processors with AVX2 have fast unaligned load, we should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors. Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces and call get_common_indeces for other processors. Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to avoid loading GLRO(dl_x86_cpu_features) in cpu-features.c. [BZ #19583] * sysdeps/x86/cpu-features.c (get_common_indeces): Remove inline. 
Check family before setting family, model and extended_model. Set AVX, AVX2, AVX512, FMA and FMA4 usable bits here. (init_cpu_features): Replace HAS_CPU_FEATURE and HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P. Set index_arch_AVX_Fast_Unaligned_Load for Intel processors with usable AVX2. Call get_common_indeces for other processors with family == NULL. * sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro. (CPU_FEATURES_ARCH_P): Likewise. (HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P. (HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P. (cherry picked from commit f781a9e96138d8839663af5e88649ab1fbed74f8) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c858d10a4e7fd682f2e7083836e4feacc2d580f4 commit c858d10a4e7fd682f2e7083836e4feacc2d580f4 Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 10 05:26:46 2016 -0800 Add _arch_/_cpu_ to index_*/bit_* in x86 cpu-features.h index_* and bit_* macros are used to access the cpuid and feature arrays of struct cpu_features. It is very easy to use bits and indices of the cpuid array on the feature array, especially in assembly code. For example, sysdeps/i386/i686/multiarch/bcopy.S has HAS_CPU_FEATURE (Fast_Rep_String), which should be HAS_ARCH_FEATURE (Fast_Rep_String). We change index_* and bit_* to index_cpu_*/index_arch_* and bit_cpu_*/bit_arch_* so that we can catch such errors at build time. [BZ #19762] * sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h (EXTRA_LD_ENVVARS): Add _arch_ to index_*/bit_*. * sysdeps/x86/cpu-features.c (init_cpu_features): Likewise. * sysdeps/x86/cpu-features.h (bit_*): Renamed to ... (bit_arch_*): This for feature array. (bit_*): Renamed to ... (bit_cpu_*): This for cpu array. (index_*): Renamed to ... (index_arch_*): This for feature array. (index_*): Renamed to ... (index_cpu_*): This for cpu array. [__ASSEMBLER__] (HAS_FEATURE): Add and use field. [__ASSEMBLER__] (HAS_CPU_FEATURE): Pass cpu to HAS_FEATURE. [__ASSEMBLER__] (HAS_ARCH_FEATURE): Pass arch to HAS_FEATURE. 
[!__ASSEMBLER__] (HAS_CPU_FEATURE): Replace index_##name and bit_##name with index_cpu_##name and bit_cpu_##name. [!__ASSEMBLER__] (HAS_ARCH_FEATURE): Replace index_##name and bit_##name with index_arch_##name and bit_arch_##name. (cherry picked from commit 6aa3e97e2530f9917f504eb4146af119a3f27229) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7a90b56b0c3f8e55df44957cf6de7d3c9c04cbb9 commit 7a90b56b0c3f8e55df44957cf6de7d3c9c04cbb9 Author: Roland McGrath <roland@hack.frob.com> Date: Tue Mar 8 12:31:13 2016 -0800 Fix tst-audit10 build when -mavx512f is not supported. (cherry picked from commit 3bd80c0de2f8e7ca8020d37739339636d169957e) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ba80f6ceea3a6b6f711038646f419125fe3ad39c commit ba80f6ceea3a6b6f711038646f419125fe3ad39c Author: Florian Weimer <fweimer@redhat.com> Date: Mon Mar 7 16:00:25 2016 +0100 tst-audit4, tst-audit10: Compile AVX/AVX-512 code separately [BZ #19269] This ensures that GCC will not use unsupported instructions before the run-time check to ensure support. (cherry picked from commit 3c0f7407eedb524c9114bb675cd55b903c71daaa) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b8fe596e7f750d4ee2fca14d6a3999364c02662e commit b8fe596e7f750d4ee2fca14d6a3999364c02662e Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Mar 6 16:48:11 2016 -0800 Group AVX512 functions in .text.avx512 section * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Replace .text with .text.avx512. * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Likewise. (cherry picked from commit fee9eb6200f0e44a4b684903bc47fde36d46f1a5) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e455d17680cfaebb12692547422f95ba1ed30e29 commit e455d17680cfaebb12692547422f95ba1ed30e29 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Mar 4 08:37:40 2016 -0800 x86-64: Fix memcpy IFUNC selection Check Fast_Unaligned_Load instead of Slow_BSF, and also check for Fast_Copy_Backward to enable __memcpy_ssse3_back. 
The existing selection order is updated with the following selection order: 1. __memcpy_avx_unaligned if AVX_Fast_Unaligned_Load bit is set. 2. __memcpy_sse2_unaligned if Fast_Unaligned_Load bit is set. 3. __memcpy_sse2 if SSSE3 isn't available. 4. __memcpy_ssse3_back if Fast_Copy_Backward bit is set. 5. __memcpy_ssse3. [BZ #18880] * sysdeps/x86_64/multiarch/memcpy.S: Check Fast_Unaligned_Load instead of Slow_BSF, and also check for Fast_Copy_Backward to enable __memcpy_ssse3_back. (cherry picked from commit 14a1d7cc4c4fd5ee8e4e66b777221dd32a84efe8) -----------------------------------------------------------------------
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, hjl/erms/2.23 has been created at 9e1ddc1180ca0619d12b620b227726233a48b9bc (commit) - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9e1ddc1180ca0619d12b620b227726233a48b9bc commit 9e1ddc1180ca0619d12b620b227726233a48b9bc Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Apr 1 14:01:24 2016 -0700 X86-64: Add dummy memcopy.h and wordcopy.c Since x86-64 doesn't use memory copy functions, add dummy memcopy.h and wordcopy.c to reduce code size. It reduces the size of libc.so by about 1 KB. * sysdeps/x86_64/memcopy.h: New file. * sysdeps/x86_64/wordcopy.c: Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3443d7810db1092ac70a0fde7b85732a2e00cdc3 commit 3443d7810db1092ac70a0fde7b85732a2e00cdc3 Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 31 12:46:57 2016 -0700 X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove Since the new SSE2/AVX2 memcpy/memmove are faster than the previous ones, we can remove the previous SSE2/AVX2 memcpy/memmove and replace them with the new ones. No change in IFUNC selection if SSE2 and AVX2 memcpy/memmove weren't used before. If SSE2 or AVX2 memcpy/memmove were used, the new SSE2 or AVX2 memcpy/memmove optimized with Enhanced REP MOVSB will be used for processors with ERMS. The new AVX512 memcpy/memmove will be used for processors with AVX512 which prefer vzeroupper. Since the new SSE2 memcpy/memmove are faster than the previous default memcpy/memmove used in libc.a and ld.so, we also remove the previous default memcpy/memmove and make them the default memcpy/memmove, except that non-temporal store isn't used in ld.so. Together, it reduces the size of libc.so by about 6 KB and the size of ld.so by about 2 KB. 
[BZ #19776] * sysdeps/x86_64/memcpy.S: Make it dummy. * sysdeps/x86_64/mempcpy.S: Likewise. * sysdeps/x86_64/memmove.S: New file. * sysdeps/x86_64/memmove_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove.S: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.S: Likewise. * sysdeps/x86_64/memmove.c: Removed. * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/memmove-avx-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memcpy-sse2-unaligned, memmove-avx-unaligned, memcpy-avx-unaligned and memmove-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Replace __memmove_chk_avx512_unaligned_2 with __memmove_chk_avx512_unaligned. Remove __memmove_chk_avx_unaligned_2. Replace __memmove_chk_sse2_unaligned_2 with __memmove_chk_sse2_unaligned. Remove __memmove_chk_sse2 and __memmove_avx_unaligned_2. Replace __memmove_avx512_unaligned_2 with __memmove_avx512_unaligned. Replace __memmove_sse2_unaligned_2 with __memmove_sse2_unaligned. Remove __memmove_sse2. Replace __memcpy_chk_avx512_unaligned_2 with __memcpy_chk_avx512_unaligned. Remove __memcpy_chk_avx_unaligned_2. Replace __memcpy_chk_sse2_unaligned_2 with __memcpy_chk_sse2_unaligned. Remove __memcpy_chk_sse2. Remove __memcpy_avx_unaligned_2. Replace __memcpy_avx512_unaligned_2 with __memcpy_avx512_unaligned. Remove __memcpy_sse2_unaligned_2 and __memcpy_sse2. Replace __mempcpy_chk_avx512_unaligned_2 with __mempcpy_chk_avx512_unaligned. Remove __mempcpy_chk_avx_unaligned_2. Replace __mempcpy_chk_sse2_unaligned_2 with __mempcpy_chk_sse2_unaligned. Remove __mempcpy_chk_sse2. Replace __mempcpy_avx512_unaligned_2 with __mempcpy_avx512_unaligned. Remove __mempcpy_avx_unaligned_2. 
Replace __mempcpy_sse2_unaligned_2 with __mempcpy_sse2_unaligned. Remove __mempcpy_sse2. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Support __memcpy_avx512_unaligned_erms and __memcpy_avx512_unaligned. Use __memcpy_avx_unaligned_erms and __memcpy_sse2_unaligned_erms if processor has ERMS. Default to __memcpy_sse2_unaligned. (ENTRY): Removed. (END): Likewise. (ENTRY_CHK): Likewise. (libc_hidden_builtin_def): Likewise. Don't include ../memcpy.S. * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Support __memcpy_chk_avx512_unaligned_erms and __memcpy_chk_avx512_unaligned. Use __memcpy_chk_avx_unaligned_erms and __memcpy_chk_sse2_unaligned_erms if processor has ERMS. Default to __memcpy_chk_sse2_unaligned. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Change function suffix from unaligned_2 to unaligned. * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Support __mempcpy_avx512_unaligned_erms and __mempcpy_avx512_unaligned. Use __mempcpy_avx_unaligned_erms and __mempcpy_sse2_unaligned_erms if processor has ERMS. Default to __mempcpy_sse2_unaligned. (ENTRY): Removed. (END): Likewise. (ENTRY_CHK): Likewise. (libc_hidden_builtin_def): Likewise. Don't include ../mempcpy.S. (mempcpy): New. Add a weak alias. * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Support __mempcpy_chk_avx512_unaligned_erms and __mempcpy_chk_avx512_unaligned. Use __mempcpy_chk_avx_unaligned_erms and __mempcpy_chk_sse2_unaligned_erms if processor has ERMS. Default to __mempcpy_chk_sse2_unaligned. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1d2a372d44dc05201242d0fd5551df9c3174806c commit 1d2a372d44dc05201242d0fd5551df9c3174806c Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 31 10:42:30 2016 -0700 X86-64: Remove the previous SSE2/AVX2 memsets Since the new SSE2/AVX2 memsets are faster than the previous ones, we can remove the previous SSE2/AVX2 memsets and replace them with the new ones. This reduces the size of libc.so by about 900 bytes. 
No change in IFUNC selection if SSE2 and AVX2 memsets weren't used before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset optimized with Enhanced REP STOSB will be used for processors with ERMS. The new AVX512 memset will be used for processors with AVX512 which prefer vzeroupper. [BZ #19881] * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded into ... * sysdeps/x86_64/memset.S: This. (__bzero): Removed. (__memset_tail): Likewise. (__memset_chk): Likewise. (memset): Likewise. (MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't defined. (MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined. * sysdeps/x86_64/multiarch/memset-avx2.S: Removed. (__memset_zero_constant_len_parameter): Check SHARED instead of PIC. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memset-avx2 and memset-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Remove __memset_chk_sse2, __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__bzero): Enabled. * sysdeps/x86_64/multiarch/memset.S (memset): Replace __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms or __memset_avx2_unaligned_erms if processor has ERMS. Support __memset_avx512_unaligned_erms and __memset_avx512_unaligned. (memset): Removed. (__memset_chk): Likewise. (MEMSET_SYMBOL): New. (libc_hidden_builtin_def): Replace __memset_sse2 with __memset_sse2_unaligned. * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace __memset_chk_sse2 and __memset_chk_avx2 with __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms. Use __memset_chk_sse2_unaligned_erms or __memset_chk_avx2_unaligned_erms if processor has ERMS. Support __memset_chk_avx512_unaligned_erms and __memset_chk_avx512_unaligned. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9fa066d5f5ff996990869bbbad08435f02d18bb3 commit 9fa066d5f5ff996990869bbbad08435f02d18bb3 Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Apr 3 17:21:45 2016 -0700 X86-64: Use non-temporal store in memcpy on large data The large memcpy micro benchmark in glibc shows that there is a regression with large data on a Haswell machine. Using a non-temporal store in memcpy on large data can improve performance significantly. This patch adds a threshold for using non-temporal stores, which is 6 times the shared cache size. When size is above the threshold, a non-temporal store will be used, but it is avoided if there is overlap between destination and source, since the destination may be in cache when the source is loaded. For sizes below 8 vector register widths, we load all data into registers and store them together. Only forward and backward loops, which move 4 vector registers at a time, are used to support overlapping addresses. For the forward loop, we load the last 4 vector register widths of data and the first vector register width of data into vector registers before the loop and store them after the loop. For the backward loop, we load the first 4 vector register widths of data and the last vector register width of data into vector registers before the loop and store them after the loop. [BZ #19928] * sysdeps/x86_64/cacheinfo.c (__x86_shared_non_temporal_threshold): New. (init_cacheinfo): Set __x86_shared_non_temporal_threshold to 6 times the shared cache size. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S (VMOVNT): New. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S (VMOVNT): Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S (VMOVNT): Likewise. (VMOVU): Changed to movups for smaller code sizes. (VMOVA): Changed to movaps for smaller code sizes. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Update comments. (PREFETCH): New. (PREFETCH_SIZE): Likewise. (PREFETCHED_LOAD_SIZE): Likewise. 
(PREFETCH_ONE_SET): Likewise. Rewrite to use forward and backward loops, which move 4 vector registers at a time, to support overlapping addresses and use non temporal store if size is above the threshold and there is no overlap between destination and source. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0932dd8b56db46dd421a4855fb5dee9de092538d commit 0932dd8b56db46dd421a4855fb5dee9de092538d Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Apr 6 10:19:16 2016 -0700 X86-64: Prepare memmove-vec-unaligned-erms.S Prepare memmove-vec-unaligned-erms.S to make the SSE2 version as the default memcpy, mempcpy and memmove. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (MEMCPY_SYMBOL): New. (MEMPCPY_SYMBOL): Likewise. (MEMMOVE_CHK_SYMBOL): Likewise. Replace MEMMOVE_SYMBOL with MEMMOVE_CHK_SYMBOL on __mempcpy_chk symbols. Replace MEMMOVE_SYMBOL with MEMPCPY_SYMBOL on __mempcpy symbols. Provide alias for __memcpy_chk in libc.a. Provide alias for memcpy in libc.a and ld.so. (cherry picked from commit a7d1c51482d15ab6c07e2ee0ae5e007067b18bfb) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=da2da79262814ba4ead3ee487549949096d8ad2d commit da2da79262814ba4ead3ee487549949096d8ad2d Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Apr 6 09:10:18 2016 -0700 X86-64: Prepare memset-vec-unaligned-erms.S Prepare memset-vec-unaligned-erms.S to make the SSE2 version as the default memset. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (MEMSET_CHK_SYMBOL): New. Define if not defined. (__bzero): Check VEC_SIZE == 16 instead of USE_MULTIARCH. Disabled fro now. Replace MEMSET_SYMBOL with MEMSET_CHK_SYMBOL on __memset_chk symbols. Properly check USE_MULTIARCH on __memset symbols. (cherry picked from commit 4af1bb06c59d24f35bf8dc55897838d926c05892) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9a93bdbaff81edf67c5486c84f8098055e355abb commit 9a93bdbaff81edf67c5486c84f8098055e355abb Author: H.J. 
Lu <hjl.tools@gmail.com> Date: Tue Apr 5 05:21:07 2016 -0700 Force 32-bit displacement in memset-vec-unaligned-erms.S * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Force 32-bit displacement to avoid long nop between instructions. (cherry picked from commit ec0cac9a1f4094bd0db6f77c1b329e7a40eecc10) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5118e532600549ad0f56cb9b1a179b8eab70c483 commit 5118e532600549ad0f56cb9b1a179b8eab70c483 Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Apr 5 05:19:05 2016 -0700 Add a comment in memset-sse2-unaligned-erms.S * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Add a comment on VMOVU and VMOVA. (cherry picked from commit 696ac774847b80cf994438739478b0c3003b5958) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=06c6d4ae6ee7e5b83fd5868bef494def01f59292 commit 06c6d4ae6ee7e5b83fd5868bef494def01f59292 Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Apr 3 14:32:20 2016 -0700 Don't put SSE2/AVX/AVX512 memmove/memset in ld.so Since memmove and memset in ld.so don't use IFUNC, don't put SSE2, AVX and AVX512 memmove and memset in ld.so. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip if not in libc. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. (cherry picked from commit 5cd7af016d8587ff53b20ba259746f97edbddbf7) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a96379797a7eecc1b709cad7b68981eb698783dc commit a96379797a7eecc1b709cad7b68981eb698783dc Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Apr 3 12:38:25 2016 -0700 Fix memmove-vec-unaligned-erms.S __mempcpy_erms and __memmove_erms can't be placed between __memmove_chk and __memmove it breaks __memmove_chk. Don't check source == destination first since it is less common. 
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: (__mempcpy_erms, __memmove_erms): Moved before __mempcpy_chk with unaligned_erms. (__memmove_erms): Skip if source == destination. (__memmove_unaligned_erms): Don't check source == destination first. (cherry picked from commit ea2785e96fa503f3a2b5dd9f3a6ca65622b3c5f2) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=cfb059c79729b26284863334c9aa04f0a3b967b9 commit cfb059c79729b26284863334c9aa04f0a3b967b9 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Apr 1 15:08:48 2016 -0700 Remove Fast_Copy_Backward from Intel Core processors Intel Core i3, i5 and i7 processors have fast unaligned copy and copy backward is ignored. Remove Fast_Copy_Backward from Intel Core processors to avoid confusion. * sysdeps/x86/cpu-features.c (init_cpu_features): Don't set bit_arch_Fast_Copy_Backward for Intel Core proessors. (cherry picked from commit 27d3ce1467990f89126e228559dec8f84b96c60e) https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=30c389be1af67c4d0716d207b6780c6169d1355f commit 30c389be1af67c4d0716d207b6780c6169d1355f Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 31 10:05:51 2016 -0700 Add x86-64 memset with unaligned store and rep stosb Implement x86-64 memset with unaligned store and rep movsb. Support 16-byte, 32-byte and 64-byte vector register sizes. A single file provides 2 implementations of memset, one with rep stosb and the other without rep stosb. They share the same codes when size is between 2 times of vector register size and REP_STOSB_THRESHOLD which defaults to 2KB. Key features: 1. Use overlapping store to avoid branch. 2. For size <= 4 times of vector register size, fully unroll the loop. 3. For size > 4 times of vector register size, store 4 times of vector register size at a time. [BZ #19881] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and memset-avx512-unaligned-erms. 
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned, __memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned, __memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned, __memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned, __memset_sse2_unaligned_erms, __memset_erms, __memset_avx2_unaligned, __memset_avx2_unaligned_erms, __memset_avx512_unaligned_erms and __memset_avx512_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New file. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise. (cherry picked from commit 830566307f038387ca0af3fd327706a8d1a2f595)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=980d639b4ae58209843f09a29d86b0a8303b6650 commit 980d639b4ae58209843f09a29d86b0a8303b6650 Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 31 10:04:26 2016 -0700 Add x86-64 memmove with unaligned load/store and rep movsb Implement x86-64 memmove with unaligned load/store and rep movsb. Support 16-byte, 32-byte and 64-byte vector register sizes. When size <= 8 times of vector register size, there is no check for address overlap between source and destination. Since overhead for overlap check is small when size > 8 times of vector register size, memcpy is an alias of memmove. A single file provides 2 implementations of memmove, one with rep movsb and the other without rep movsb. They share the same code when size is between 2 times of vector register size and REP_MOVSB_THRESHOLD which is 2KB for 16-byte vector register size and scaled up by large vector register size. Key features: 1. Use overlapping load and store to avoid branch. 2. For size <= 8 times of vector register size, load all sources into registers and store them together. 3.
If there is no address overlap between source and destination, copy from both ends with 4 times of vector register size at a time. 4. If address of destination > address of source, backward copy 8 times of vector register size at a time. 5. Otherwise, forward copy 8 times of vector register size at a time. 6. Use rep movsb only for forward copy. Avoid slow backward rep movsb by falling back to backward copy 8 times of vector register size at a time. 7. Skip when address of destination == address of source. [BZ #19776] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and memmove-avx512-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memmove_chk_avx512_unaligned_2, __memmove_chk_avx512_unaligned_erms, __memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms, __memmove_chk_sse2_unaligned_2, __memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2, __memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2, __memmove_avx512_unaligned_erms, __memmove_erms, __memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms, __memcpy_chk_avx512_unaligned_2, __memcpy_chk_avx512_unaligned_erms, __memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms, __memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms, __memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms, __memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms, __memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms, __memcpy_erms, __mempcpy_chk_avx512_unaligned_2, __mempcpy_chk_avx512_unaligned_erms, __mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms, __mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms, __mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms, __mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms, __mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and __mempcpy_erms. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New file.
* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Likewise. (cherry picked from commit 88b57b8ed41d5ecf2e1bdfc19556f9246a665ebb)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bf2bc5e5c9d7aa8af28b299ec26b8a37352730cc commit bf2bc5e5c9d7aa8af28b299ec26b8a37352730cc Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 19:22:59 2016 -0700 Initial Enhanced REP MOVSB/STOSB (ERMS) support The newer Intel processors support Enhanced REP MOVSB/STOSB (ERMS) which has a feature bit in CPUID. This patch adds the Enhanced REP MOVSB/STOSB (ERMS) bit to x86 cpu-features. * sysdeps/x86/cpu-features.h (bit_cpu_ERMS): New. (index_cpu_ERMS): Likewise. (reg_ERMS): Likewise. (cherry picked from commit 0791f91dff9a77263fa8173b143d854cad902c6d)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7c244283ff12329b3bca9878b8edac3b3fe5c7bc commit 7c244283ff12329b3bca9878b8edac3b3fe5c7bc Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 13:15:59 2016 -0700 Make __memcpy_avx512_no_vzeroupper an alias Since x86-64 memcpy-avx512-no-vzeroupper.S implements memmove, make __memcpy_avx512_no_vzeroupper an alias of __memmove_avx512_no_vzeroupper to reduce code size of libc.so. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memcpy-avx512-no-vzeroupper. * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Renamed to ... * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: This. (MEMCPY): Don't define. (MEMCPY_CHK): Likewise. (MEMPCPY): Likewise. (MEMPCPY_CHK): Likewise. (MEMPCPY_CHK): Renamed to ... (__mempcpy_chk_avx512_no_vzeroupper): This. (MEMPCPY_CHK): Renamed to ... (__mempcpy_chk_avx512_no_vzeroupper): This. (MEMCPY_CHK): Renamed to ... (__memmove_chk_avx512_no_vzeroupper): This. (MEMCPY): Renamed to ... (__memmove_avx512_no_vzeroupper): This. (__memcpy_avx512_no_vzeroupper): New alias.
(__memcpy_chk_avx512_no_vzeroupper): Likewise. (cherry picked from commit 064f01b10b57ff09cda7025f484b848c38ddd57a)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a9a14991fb2d3e69f80d25e9bbf2f6b0bcf11c3d commit a9a14991fb2d3e69f80d25e9bbf2f6b0bcf11c3d Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 13:13:36 2016 -0700 Implement x86-64 multiarch mempcpy in memcpy Implement x86-64 multiarch mempcpy in memcpy to share most of code. It reduces code size of libc.so. [BZ #18858] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove mempcpy-ssse3, mempcpy-ssse3-back, mempcpy-avx-unaligned and mempcpy-avx512-no-vzeroupper. * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-ssse3-back.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-ssse3.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/mempcpy-avx-unaligned.S: Removed. * sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy-ssse3-back.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy-ssse3.S: Likewise. (cherry picked from commit c365e615f7429aee302f8af7bf07ae262278febb)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4fc09dabecee1b7cafdbca26ee7c63f68e53c229 commit 4fc09dabecee1b7cafdbca26ee7c63f68e53c229 Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 28 04:39:48 2016 -0700 [x86] Add a feature bit: Fast_Unaligned_Copy On AMD processors, memcpy optimized with unaligned SSE load is slower than memcpy optimized with aligned SSSE3 while other string functions are faster with unaligned SSE load. A feature bit, Fast_Unaligned_Copy, is added to select memcpy optimized with unaligned SSE load. [BZ #19583] * sysdeps/x86/cpu-features.c (init_cpu_features): Set Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel processors.
Set Fast_Copy_Backward for AMD Excavator processors. * sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy): New. (index_arch_Fast_Unaligned_Copy): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check Fast_Unaligned_Copy instead of Fast_Unaligned_Load. (cherry picked from commit e41b395523040fcb58c7d378475720c2836d280c)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=75f2d47e459a6bf5656a938e5c63f8b581eb3ee6 commit 75f2d47e459a6bf5656a938e5c63f8b581eb3ee6 Author: Florian Weimer <fweimer@redhat.com> Date: Fri Mar 25 11:11:42 2016 +0100 tst-audit10: Fix compilation on compilers without bit_AVX512F [BZ #19860] [BZ# 19860] * sysdeps/x86_64/tst-audit10.c (avx512_enabled): Always return zero if the compiler does not provide the AVX512F bit. (cherry picked from commit f327f5b47be57bc05a4077344b381016c1bb2c11)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=96c7375cb8b6f1875d9865f2ae92ecacf5f5e6fa commit 96c7375cb8b6f1875d9865f2ae92ecacf5f5e6fa Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Mar 22 08:36:16 2016 -0700 Don't set %rcx twice before "rep movsb" * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMCPY): Don't set %rcx twice before "rep movsb". (cherry picked from commit 3c9a4cd16cbc7b79094fec68add2df66061ab5d7)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c273f613b0cc779ee33cc33d20941d271316e483 commit c273f613b0cc779ee33cc33d20941d271316e483 Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Mar 22 07:46:56 2016 -0700 Set index_arch_AVX_Fast_Unaligned_Load only for Intel processors Since only Intel processors with AVX2 have fast unaligned load, we should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors. Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces and call get_common_indeces for other processors. Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to avoid loading GLRO(dl_x86_cpu_features) in cpu-features.c. [BZ #19583] * sysdeps/x86/cpu-features.c (get_common_indeces): Remove inline.
Check family before setting family, model and extended_model. Set AVX, AVX2, AVX512, FMA and FMA4 usable bits here. (init_cpu_features): Replace HAS_CPU_FEATURE and HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P. Set index_arch_AVX_Fast_Unaligned_Load for Intel processors with usable AVX2. Call get_common_indeces for other processors with family == NULL. * sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro. (CPU_FEATURES_ARCH_P): Likewise. (HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P. (HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P. (cherry picked from commit f781a9e96138d8839663af5e88649ab1fbed74f8)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c858d10a4e7fd682f2e7083836e4feacc2d580f4 commit c858d10a4e7fd682f2e7083836e4feacc2d580f4 Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Mar 10 05:26:46 2016 -0800 Add _arch_/_cpu_ to index_*/bit_* in x86 cpu-features.h index_* and bit_* macros are used to access cpuid and feature arrays of struct cpu_features. It is very easy to use bits and indices of cpuid array on feature array, especially in assembly codes. For example, sysdeps/i386/i686/multiarch/bcopy.S has HAS_CPU_FEATURE (Fast_Rep_String) which should be HAS_ARCH_FEATURE (Fast_Rep_String) We change index_* and bit_* to index_cpu_*/index_arch_* and bit_cpu_*/bit_arch_* so that we can catch such errors at build time. [BZ #19762] * sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h (EXTRA_LD_ENVVARS): Add _arch_ to index_*/bit_*. * sysdeps/x86/cpu-features.c (init_cpu_features): Likewise. * sysdeps/x86/cpu-features.h (bit_*): Renamed to ... (bit_arch_*): This for feature array. (bit_*): Renamed to ... (bit_cpu_*): This for cpu array. (index_*): Renamed to ... (index_arch_*): This for feature array. (index_*): Renamed to ... (index_cpu_*): This for cpu array. [__ASSEMBLER__] (HAS_FEATURE): Add and use field. [__ASSEMBLER__] (HAS_CPU_FEATURE): Pass cpu to HAS_FEATURE. [__ASSEMBLER__] (HAS_ARCH_FEATURE): Pass arch to HAS_FEATURE.
[!__ASSEMBLER__] (HAS_CPU_FEATURE): Replace index_##name and bit_##name with index_cpu_##name and bit_cpu_##name. [!__ASSEMBLER__] (HAS_ARCH_FEATURE): Replace index_##name and bit_##name with index_arch_##name and bit_arch_##name. (cherry picked from commit 6aa3e97e2530f9917f504eb4146af119a3f27229)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7a90b56b0c3f8e55df44957cf6de7d3c9c04cbb9 commit 7a90b56b0c3f8e55df44957cf6de7d3c9c04cbb9 Author: Roland McGrath <roland@hack.frob.com> Date: Tue Mar 8 12:31:13 2016 -0800 Fix tst-audit10 build when -mavx512f is not supported. (cherry picked from commit 3bd80c0de2f8e7ca8020d37739339636d169957e)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ba80f6ceea3a6b6f711038646f419125fe3ad39c commit ba80f6ceea3a6b6f711038646f419125fe3ad39c Author: Florian Weimer <fweimer@redhat.com> Date: Mon Mar 7 16:00:25 2016 +0100 tst-audit4, tst-audit10: Compile AVX/AVX-512 code separately [BZ #19269] This ensures that GCC will not use unsupported instructions before the run-time check to ensure support. (cherry picked from commit 3c0f7407eedb524c9114bb675cd55b903c71daaa)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b8fe596e7f750d4ee2fca14d6a3999364c02662e commit b8fe596e7f750d4ee2fca14d6a3999364c02662e Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Mar 6 16:48:11 2016 -0800 Group AVX512 functions in .text.avx512 section * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Replace .text with .text.avx512. * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Likewise. (cherry picked from commit fee9eb6200f0e44a4b684903bc47fde36d46f1a5)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e455d17680cfaebb12692547422f95ba1ed30e29 commit e455d17680cfaebb12692547422f95ba1ed30e29 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Mar 4 08:37:40 2016 -0800 x86-64: Fix memcpy IFUNC selection Check Fast_Unaligned_Load, instead of Slow_BSF, and also check for Fast_Copy_Backward to enable __memcpy_ssse3_back.
Existing selection order is updated with the following selection order: 1. __memcpy_avx_unaligned if AVX_Fast_Unaligned_Load bit is set. 2. __memcpy_sse2_unaligned if Fast_Unaligned_Load bit is set. 3. __memcpy_sse2 if SSSE3 isn't available. 4. __memcpy_ssse3_back if Fast_Copy_Backward bit is set. 5. __memcpy_ssse3. [BZ #18880] * sysdeps/x86_64/multiarch/memcpy.S: Check Fast_Unaligned_Load, instead of Slow_BSF, and also check for Fast_Copy_Backward to enable __memcpy_ssse3_back. (cherry picked from commit 14a1d7cc4c4fd5ee8e4e66b777221dd32a84efe8)

-----------------------------------------------------------------------
No answer to question in comment 6, so I assume this is fixed in 2.24.
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, release/2.23/master has been updated via 4cf055a2a331b7361622dc9ac8993b59c6f0ef59 (commit) via d603d94994a1d326ebc9e93c8be892acc834a114 (commit) via 7fa9775594b1592dfcdad5bc32ea449882ca9d9a (commit) from 075b2665b159491fdd17f5aee90d47fa7388ed6f (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below.

- Log -----------------------------------------------------------------

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4cf055a2a331b7361622dc9ac8993b59c6f0ef59 commit 4cf055a2a331b7361622dc9ac8993b59c6f0ef59 Author: Florian Weimer <fweimer@redhat.com> Date: Fri Mar 25 11:11:42 2016 +0100 tst-audit10: Fix compilation on compilers without bit_AVX512F [BZ #19860] [BZ# 19860] * sysdeps/x86_64/tst-audit10.c (avx512_enabled): Always return zero if the compiler does not provide the AVX512F bit. (cherry picked from commit f327f5b47be57bc05a4077344b381016c1bb2c11)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d603d94994a1d326ebc9e93c8be892acc834a114 commit d603d94994a1d326ebc9e93c8be892acc834a114 Author: Roland McGrath <roland@hack.frob.com> Date: Tue Mar 8 12:31:13 2016 -0800 Fix tst-audit10 build when -mavx512f is not supported. (cherry picked from commit 3bd80c0de2f8e7ca8020d37739339636d169957e)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7fa9775594b1592dfcdad5bc32ea449882ca9d9a commit 7fa9775594b1592dfcdad5bc32ea449882ca9d9a Author: Florian Weimer <fweimer@redhat.com> Date: Mon Mar 7 16:00:25 2016 +0100 tst-audit4, tst-audit10: Compile AVX/AVX-512 code separately [BZ #19269] This ensures that GCC will not use unsupported instructions before the run-time check to ensure support.
(cherry picked from commit 3c0f7407eedb524c9114bb675cd55b903c71daaa)

-----------------------------------------------------------------------

Summary of changes:
ChangeLog | 27 +++++++++++
sysdeps/x86_64/Makefile | 8 ++--
.../x86_64/tst-audit10-aux.c | 46 ++++++++-----------
sysdeps/x86_64/tst-audit10.c | 38 +++++----------
.../{generic/dl-irel.h => x86_64/tst-audit4-aux.c} | 28 ++++++++----
sysdeps/x86_64/tst-audit4.c | 45 +++++++++----------
6 files changed, 101 insertions(+), 91 deletions(-)
copy malloc/tst-malloc-usable.c => sysdeps/x86_64/tst-audit10-aux.c (58%)
copy sysdeps/{generic/dl-irel.h => x86_64/tst-audit4-aux.c} (58%)
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, master has been updated via 5fa268239b46e127f941c3510ad200ce5ef8df45 (commit) from 0cdaef4dac5a885af9848e158e77cc347ee781bb (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below.

- Log -----------------------------------------------------------------

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5fa268239b46e127f941c3510ad200ce5ef8df45 commit 5fa268239b46e127f941c3510ad200ce5ef8df45 Author: Stefan Liebler <stli@linux.vnet.ibm.com> Date: Tue Jun 28 12:23:35 2016 +0200 S390: Fix relocation of _nl_current_LC_CATEGORY_used in static build. [BZ #19860] With shared libc, all locale categories are always loaded. For static libc they aren't, but there exists a weak _nl_current_LC_CATEGORY_used symbol for each category. If the category is used, the locale/lc-CATEGORY.o is linked in, where _NL_CURRENT_DEFINE (LC_CATEGORY) defines and sets the _nl_current_LC_CATEGORY_used symbol to one. As reported by Marcin in "Bug 18960 - s390: _nl_locale_subfreeres uses larl opcode on misaligned symbol" (https://sourceware.org/bugzilla/show_bug.cgi?id=18960). In function _nl_locale_subfreeres (locale/setlocale.c) for each category a check - &_nl_current_LC_CATEGORY_used != 0 - decides whether the category is used or not. There is also a second usage with the same mechanism in function __uselocale (locale/uselocale.c). On s390 a larl instruction with R_390_PC32DBL relocation is used to get the address of _nl_current_LC_CATEGORY_used symbols. As larl loads the address relative in halfwords and the code is always 2-byte aligned, larl can only load even addresses. At the end, the relocated address is always zero and never one.
Marcin's patch (see bugzilla) uses the following declaration in locale/setlocale.c: extern char _nl_current_##category##_used __attribute__((__aligned__(1))); In function _nl_locale_subfreeres all categories are checked and therefore gcc is now building an array of addresses in rodata section with an R_390_64 relocation for every address. This array is loaded with the larl instruction and each address is accessed by index. This fixes only the usage in _nl_locale_subfreeres. Each user has to add the alignment attribute. This patch sets the _nl_current_LC_CATEGORY_used symbols to two instead of one. This way gcc can use the larl instruction and the check against zero works on every usage.

ChangeLog: [BZ #19860] * locale/localeinfo.h (_NL_CURRENT_DEFINE): Set _nl_current_LC_CATEGORY_used to two instead of one.

-----------------------------------------------------------------------

Summary of changes:
ChangeLog | 6 ++++++
locale/localeinfo.h | 7 +++++-
2 files changed, 11 insertions(+), 2 deletions(-)
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The annotated tag, glibc-2.24 has been created at beb0f59498c3e0337df298f9d7a3f8f77eb39842 (tag) tagging fdfc9260b61d3d72541f18104d24c7bcb0ce5ca2 (commit) replaces glibc-2.23 tagged by Carlos O'Donell on Mon Aug 1 22:46:26 2016 -0400 - Log ----------------------------------------------------------------- The GNU C Library ================= The GNU C Library version 2.24 is now available. The GNU C Library is used as *the* C library in the GNU system and in GNU/Linux systems, as well as many other systems that use Linux as the kernel. The GNU C Library is primarily designed to be a portable and high performance C library. It follows all relevant standards including ISO C11 and POSIX.1-2008. It is also internationalized and has one of the most complete internationalization interfaces known. The GNU C Library webpage is at http://www.gnu.org/software/libc/ Packages for the 2.24 release may be downloaded from: http://ftpmirror.gnu.org/libc/ http://ftp.gnu.org/gnu/libc/ The mirror list is at http://www.gnu.org/order/ftp.html NEWS for version 2.24 ===================== * The minimum Linux kernel version that this version of the GNU C Library can be used with is 3.2, except on i[4567]86 and x86_64, where Linux kernel version 2.6.32 or later suffices (on architectures that already required kernel versions more recent than 3.2, those requirements remain unchanged). Linux 3.2 or later kernel headers are required on all architectures. * The pap_AN locale has been deleted. This has been deprecated for a long time. It has been replaced by pap_AW & pap_CW, both of which have long been included in previous releases. * The readdir_r and readdir64_r functions have been deprecated. It is recommended to use readdir and readdir64 instead. * The type “union wait” has been removed. 
It was deprecated in the early 1990s and never part of POSIX. Application code should use the int type instead of “union wait”. * A new NSS action is added to facilitate large distributed system administration. The action, MERGE, allows remote user stores like LDAP to be merged into local user stores like /etc/groups in order to provide easy to use, updated, and managed sets of merged credentials. The new action can be used by configuring it in /etc/nsswitch.conf: group: files [SUCCESS=merge] nis Implemented by Stephen Gallagher (Red Hat). * The deprecated __malloc_initialize_hook variable has been removed from the API. * The long unused localedef --old-style option has been removed. It hasn't done anything in over 16 years. Scripts using this option can safely drop it. * nextupl, nextup, nextupf, nextdownl, nextdown and nextdownf are added to libm. They are defined by TS 18661 and IEEE754-2008. The nextup functions return the next representable value in the direction of positive infinity and the nextdown functions return the next representable value in the direction of negative infinity. These are currently enabled as GNU extensions. Security related changes: * An unnecessary stack copy in _nss_dns_getnetbyname_r was removed. It could result in a stack overflow when getnetbyname was called with an overly long name. (CVE-2016-3075) * Previously, getaddrinfo copied large amounts of address data to the stack, even after the fix for CVE-2013-4458 has been applied, potentially resulting in a stack overflow. getaddrinfo now uses a heap allocation instead. Reported by Michael Petlan. (CVE-2016-3706) * The glob function suffered from a stack-based buffer overflow when it was called with the GLOB_ALTDIRFUNC flag and encountered a long file name. Reported by Alexander Cherepanov. (CVE-2016-1234) * The Sun RPC UDP client could exhaust all available stack space when flooded with crafted ICMP and UDP messages. Reported by Aldy Hernandez' alloca plugin for GCC. 
(CVE-2016-4429) * The IPv6 name server management code in libresolv could result in a memory leak for each thread which is created, performs a failing naming lookup, and exits. Over time, this could result in a denial of service due to memory exhaustion. Reported by Matthias Schiffer. (CVE-2016-5417) The following bugs are resolved with this release: [1170] localedata: ne_NP: update Nepali locale definition file [3629] manual: stpcpy description in string.texi refers to MS-DOG instead of MS-DOS. [6527] malloc: [powerpc] Malloc alignment insufficient for PowerPC [6796] math: fdim() does not set errno on overflow [10354] libc: posix_spawn should use vfork() in more cases than presently [11213] localedata: localedata: add copyright disclaimer to locale files [12143] localedata: chr_US: new Cherokee locale [12450] localedata: sgs_LT: new locale [12676] localedata: ln_CD: new locale [13237] localedata: LC_ADDRESS.country_name: update all locales w/latest CLDR data [13304] math: fma, fmaf, fmal produce wrong results [14259] build: --localedir arg to configure is ignored [14499] nptl: Does posix_spawn invoke atfork handlers / use vfork? 
[14750] libc: Race condition in posix_spawn vfork usage vs signal handlers [14934] localedata: es_CL: wrong first weekday chilean locale [15262] localedata: LC_MESSAGES.yesexpr/noexpr: inconsistent use of romanisation [15263] localedata: LC_MESSAGES.yesexpr/noexpr: inconsistent use of 1/0 and +/- [15264] localedata: LC_MESSAGES.yesstr/nostr: lacking in many locales [15368] nptl: raise() is not async-signal-safe [15479] math: ceil, floor, round and trunc raise inexact exception [15578] localedata: kk_KZ: various updates [16003] localedata: pap_AN: punt old locale [16137] localedata: iw_IL: punt old locale [16190] localedata: eo: new esperanto locale [16374] localedata: lv_LV: change currency symbol in LC_MONETARY to euro [16742] malloc: race condition: pthread_atfork() called before first malloc() results in unexpected locking behaviour/deadlocks [16975] localedata: LC_MESSAGES.yesexpr/noexpr: revisit capitalization in all locales [16983] localedata: postal_fmt does not allow %l and %n modifiers [17565] localedata: pt_PT: wrong (work-)week start [17899] math: [powerpc] floorl returns negative zero with FE_DOWNWARD [17950] build: Build fails with -msse [18205] localedata: be_BY*: wrong first_weekday and first_workday [18433] libc: posix_spawn does not return correctly upon failure to execute [18453] localedata: charmaps/IBM875: incorrect codes [18712] string: bits/string2.h incompatible with -O2 -Werror=packed -Wsystem-headers [18896] localedata: he_IL: improvements for currency [18911] localedata: ro_RO: Correcting week day name for "Tuesday" in Romanian locale data [18960] locale: s390: _nl_locale_subfreeres uses larl opcode on misaligned symbol [19056] libc: Deprecate readdir_r [19133] localedata: pt_*: days & months should be lowercase in Portuguese language [19198] localedata: nl_NL: small improvements for Dutch locales [19257] network: Per-thread memory leak in __res_vinit with IPv6 nameservers (CVE-2016-5417) [19269] build: tst-audit4 and tst-audit10 failures 
with gcc-6 on non avx machine [19400] locale: Language missing in "iso-639.def", trivial fix in description [19431] malloc: Deadlock between fflush, getdelim, and fork [19505] libc: Incorrect file descriptor validity checks in posix_spawn_file_actions_add{open,close,dup2} [19509] dynamic-link: dlsym, dlvsym do not report errors through dlerror when using RTLD_NEXT [19512] locale: Stale `#ifndef HAVE_BUILTIN_EXPECT' in `intl/{gettextP,loadinfo}.h' [19534] libc: execle, execlp may use malloc [19568] localedata: *_CH: Swiss locales have inconsistent start of week [19573] network: res_nclose and __res_maybe_init disagree about name server initialization, breaking Hesiod [19575] localedata: Status of GB18030 tables [19581] localedata: sr_* date_fmt string contains additional newline [19583] string: SSSE3_Fast_Copy_Backward flag needs to be enabled for AMD Excavator core [19592] math: [ldbl-128ibm] ceill incorrect in non-default rounding modes [19593] math: [ldbl-128ibm] truncl incorrect in non-default rounding modes [19594] math: [ldbl-128ibm] roundl incorrect in non-default rounding modes [19595] math: [ldbl-128ibm] fmodl incorrect for results in subnormal double range [19602] math: [ldbl-128ibm] fmodl handling of equal arguments with low part zero incorrect [19603] math: [ldbl-128ibm] remainderl, remquol incorrect sign handling in equality tests [19610] dynamic-link: ldconfig -X removes stale symbolic links [19613] libc: s390x (64 bit) macro expansion WCOREDUMP and others [19633] locale: strfmon_l applies global locale to number formatting [19642] network: Memory leak in getnameinfo [19648] libc: test-skeleton.c: Do not set RLIMIT_DATA [19653] libc: Potential for NULL pointer dereference (CWE-476) in glibc-2.22 [19654] math: [x86_64] Need testcase for BZ #19590 fix [19671] localedata: Missing Sanity Check for malloc() in 'tst-fmon.c' & 'tst-numeric.c' [19674] math: [ldbl-128ibm] powl incorrect overflow handling [19677] math: [ldbl-128ibm] remainderl equality test 
incorrect for zero low part
  [19678] math: [ldbl-128ibm] nextafterl, nexttowardl incorrect sign of zero result
  [19679] dynamic-link: gcc-4.9.3 C++ exception handling broken due to unaligned stack
  [19726] locale: Converting UCS4LE to INTERNAL with iconv() does not update pointers and lengths in error-case.
  [19727] locale: Converting from/to UTF-xx with iconv() does not always report errors on UTF-16 surrogates values.
  [19755] nscd: nscd assertion failure in gc
  [19758] dynamic-link: Typo in EXTRA_LD_ENVVARS for x86-64
  [19759] libc: mempcpy shouldn't be inlined
  [19762] dynamic-link: HAS_CPU_FEATURE/HAS_ARCH_FEATURE are easy to misuse
  [19765] libc: s390 needs an optimized mempcpy
  [19779] glob: glob: buffer overflow with GLOB_ALTDIRFUNC due to incorrect NAME_MAX limit assumption (CVE-2016-1234)
  [19783] build: benchtests don't support --enable-hardcoded-path-in-tests
  [19787] network: Missing and incorrect truncation checks in getnameinfo
  [19790] math: [ldbl-128ibm] nearbyintl incorrect in non-default rounding modes
  [19791] network: Assertion failure in res_query.c with un-connectable name server addresses
  [19792] libc: MIPS: backtrace yields infinite backtrace with makecontext
  [19822] math: libm.so install clobbers old version
  [19825] network: resolv: send_vc can return uninitialized data in second response to getaddrinfo
  [19830] network: nss_dns: should check RDATA length against buffer length
  [19831] network: nss_dns: getaddrinfo returns uninitialized data when confronted with A/AAAA records of invalid size
  [19837] nss: nss_db: No retries for some long lines with a larger buffer
  [19848] math: powl(10,n) for n=-4,-5,-6,-7 is off by more than 1 ULP
  [19853] stdio: Printing IBM long double in decimal with high precision is sometimes incorrect
  [19860] build: x86_64: compile errors for tst-audit10 and tst-auditmod10b
  [19861] nptl: libpthread IFUNC resolver for fork can lead to crash
  [19862] network: resolv, nss_dns: Remove remaining logging of unexpected record types
  [19865] network: Assertion failure or memory leak in _nss_dns_getcanonname_r
  [19868] network: nss_dns: netent code does not skip over non-PTR records
  [19879] network: nss_dns: Stack overflow in getnetbyname implementation (CVE-2016-3075)
  [19881] string: Improve x86-64 memset
  [19907] string: Incorrect memcpy tests
  [19916] dynamic-link: S390: fprs/vrs are not saved/restored while resolving symbols
  [19925] libc: termios.h XCASE namespace
  [19928] string: memmove-vec-unaligned-erms.S is slow with large data size
  [19929] libc: limits.h NL_NMAX namespace
  [19931] stdio: Memory leak in vfprintf
  [19957] libc: clone(CLONE_VM) access invalid parent memory
  [19963] localedata: en_IL: New locale
  [19989] stdio: stdio.h cuserid namespace
  [19994] network: getaddrinfo does not restore RES_USE_INET6 flag in gethosts
  [19996] locale: langinfo.h nl_langinfo_l namespace
  [20005] stdio: fflush on a file opened with fmemopen resets position to 0
  [20010] network: getaddrinfo: Stack overflow in hostent translation (CVE-2016-3706)
  [20012] stdio: libio: fmemopen append mode failure
  [20014] stdio: stdio.h namespace for pre-threads POSIX
  [20017] network: resolv: Use gmtime_r instead of gmtime in p_secstodate
  [20023] libc: fcntl.h timespec namespace
  [20024] math: [x86_64] vectorized sincos trashes the stack
  [20031] network: nss_hesiod: Heap overflow in get_txt_records
  [20041] time: sys/time.h timespec namespace
  [20043] libc: unistd.h missing cuserid for UNIX98 and before
  [20044] libc: unistd.h missing pthread_atfork for UNIX98
  [20051] libc: ttyslot in wrong header under wrong conditions
  [20054] libc: gethostname not declared for XPG4
  [20055] libc: termios.h missing tcgetsid for XPG4
  [20072] dynamic-link: x86 init_cpu_features is called twice in static executable
  [20073] libc: sys/stat.h fchmod namespace
  [20074] libc: stdlib.h rand_r namespace
  [20076] libc: sys/stat.h missing S_IFSOCK, S_ISSOCK for XPG4
  [20094] libc: stdlib.h should not declare grantpt, ptsname, unlockpt for XPG3
  [20111] libc: struct sockaddr_storage cannot be aggregate-copied
  [20112] network: sunrpc: stack (frame) overflow in Sun RPC clntudp_call (CVE-2016-4429)
  [20115] string: Extra alignment in memset-vec-unaligned-erms.S
  [20119] libc: Wrong mask for processors level type from CPUID
  [20139] dynamic-link: Upper part of zmm is zeroed if Glibc is built with AS not supporting AVX512
  [20151] math: [ldbl-128/ldbl-128ibm] j0l, j1l, y0l, y1l return sNaN for sNaN argument
  [20153] math: [ldbl-128ibm] sqrtl (sNaN) returns sNaN
  [20156] math: [ldbl-128ibm] ceill, rintl etc. return sNaN for sNaN argument
  [20157] math: [powerpc] fabsl (sNaN) wrongly raises "invalid"
  [20160] math: [powerpc] ceil, rint etc. return sNaN for sNaN input
  [20178] libc: posix_spawn{p} should not call exit
  [20191] stdio: libio: vtables hardening
  [20195] string: FMA4 detection requires CPUID execution with register eax=0x80000001
  [20198] libc: quick_exit incorrectly destroys C++11 thread objects.
  [20205] math: [i386/x86_64] nextafterl incorrect incrementing negative subnormals
  [20212] math: acos (sNaN) returns sNaN
  [20213] math: asin (sNaN) returns sNaN
  [20214] network: Linux header sync with linux/in6.h and ipv6.h again.
  [20218] math: [i386] asinhl (sNaN) returns sNaN
  [20219] math: [i386] atanhl (sNaN) returns sNaN
  [20222] stdio: fopencookie: Mangle function pointers
  [20224] math: [i386] cbrtl (sNaN) returns sNaN
  [20225] math: ldexp, scalbn, scalbln return sNaN for sNaN input
  [20226] math: [i386/x86_64] expl, exp10l, expm1l return sNaN for sNaN input
  [20227] math: [i386/x86_64] logl (sNaN) returns sNaN
  [20228] math: [i386/x86_64] log10l (sNaN) returns sNaN
  [20229] math: [i386/x86_64] log1pl (sNaN) returns sNaN
  [20232] math: [ldbl-128] expm1l (sNaN) returns sNaN
  [20233] math: [ldbl-128ibm] expm1l (sNaN) returns sNaN
  [20234] math: [ldbl-128ibm] log1pl (sNaN) returns sNaN
  [20235] math: [i386/x86_64] log2l (sNaN) returns sNaN
  [20237] nss: nss_db: get*ent segfaults without preceding set*ent
  [20240] math: modf (sNaN) returns sNaN
  [20248] libc: debug/tst-longjump_chk2 calls printf from a signal handler
  [20250] math: frexp (sNaN) returns sNaN
  [20252] math: atan2 (sNaN, qNaN) fails to raise "invalid"
  [20255] math: [i386] fdim, fdimf return with excess range and precision / double rounding
  [20256] math: [i386/x86_64] fdiml returns sNaN for sNaN input
  [20260] string: ../sysdeps/x86/bits/string.h:1092:3: error: array subscript is below array bounds [-Werror=array-bounds]
  [20262] nis: _nss_nis_initgroups_dyn always returns NSS_STATUS_NOTFOUND
  [20263] nptl: robust mutex deadlocks if other thread requests timedlock (Only arm/linux)
  [20277] libc: $dp is not initialized correctly in sysdeps/hppa/start.S
  [20284] malloc: malloc: Corrupt arena avoidance causes unnecessary mmap fallbacks
  [20296] math: [i386/x86_64] scalbl returns sNaN for sNaN input, missing "invalid" exceptions
  [20314] nptl: make[4]: *** [/usr/include/stdlib.h] Error 1
  [20316] localedata: id_ID: Februari instead of Pebruari
  [20327] string: POWER8 strcasecmp returns incorrect result
  [20347] math: Failure: Test: j0_downward (0xap+0)
  [20348] libc: FAIL: misc/tst-preadvwritev64
  [20349] libc: 64-bit value is passed differently in p{readv,writev}{64}
  [20350] libc: There is no test for p{read,write}64
  [20357] math: Incorrect cos result for 1.5174239687223976
  [20384] build: Don't run libmvec-sincos-avx* tests on non avx machines

Contributors
============

This release was made possible by the contributions of many people.  The
maintainers are grateful to everyone who has contributed changes or bug
reports.  These include:

Adhemerval Zanella
Andreas Schwab
Andrew Senkevich
Anton Blanchard
Arnas Udovičius
Aurelien Jarno
Carlos Eduardo Seo
Carlos O'Donell
Chris Metcalf
Chung-Lin Tang
Claude Paroz
Dimitris Pappas
Dmitry V. Levin
Dylan Alex Simon
Eduardo Trápani
Florian Weimer
Gabriel F. T. Gomes
Gunnar Hjalmarsson
Gustavo Romero
Guy Rutenberg
H.J. Lu
Hongjiu Zhang
Jiyoung Yun
John David Anglin
Joseph Myers
Khem Raj
Maciej W. Rozycki
Mark Wielaard
Marko Myllynen
Martin Galvan
Matthew Fortune
Matthias Wallnoefer
Mike FABIAN
Mike Frysinger
Neskie Manuel
Nick Alcock
Paras pradhan
Paul E. Murphy
Paul Pluzhnikov
Rajalakshmi Srinivasaraghavan
Rical Jasan
Richard Henderson
Robin van der Vliet
Roland McGrath
Samuel Thibault
Siddhesh Poyarekar
Simion Onea
Stefan Liebler
Stephen Gallagher
Szabolcs Nagy
Timur Birsh
Torvald Riegel
Tulio Magno Quites Machado Filho
Wilco Dijkstra
Will Newton
Yvan Roux
Zack Weinberg

Adhemerval Zanella (40):
      Open development for 2.24.
      Updated translations for 2.23.
      Regenerate libc.pot for 2.23.
      Regenerated configure scripts.
      Update NEWS with 2.24 template
      posix: Remove dynamic memory allocation from execl{e,p}
      posix: execvpe cleanup
      posix: New Linux posix_spawn{p} implementation
      posix: Fix tst-execvpe5 for --enable-hardcoded-path-in-tests
      posix: Fix posix_spawn invalid memory access
      posix: Fix posix_spawn implicit check style
      Fix tst-dlsym-error build
      Improve generic strspn performance
      Improve generic strpbrk performance
      Remove powerpc64 strspn, strcspn, and strpbrk implementation
      Use PTR_ALIGN_DOWN on strcspn and strspn
      Define __ASSUME_ALIGNED_REGISTER_PAIRS for missing ports
      Consolidate off_t/off64_t syscall argument passing
      Consolidate pread/pread64 implementations
      Consolidate pwrite/pwrite64 implementations
      Fix pread consolidation on ports that require argument alignment
      libio: Update internal fmemopen position after write (BZ #20005)
      Fix clone (CLONE_VM) pid/tid reset (BZ #19957)
      libio: Fix fmemopen append mode failure (BZ #20012)
      powerpc: Fix clone CLONE_VM compare
      Adjust kernel-features.h defaults for recvmsg and sendmsg
      network: recvmsg and sendmsg standard compliance (BZ #16919)
      network: recvmmsg and sendmmsg standard compliance (BZ #16919)
      network: Fix missing bits from {recv,send}{m}msg standard compliance
      posix: Call _exit in failure case for posix_spawn{p} (BZ #20178)
      Consolidate preadv/preadv64 implementation
      Consolidate pwritev/pwritev64 implementations
      Revert {send,sendm,recv,recvm}msg conformance changes
      Remove __ASSUME_FUTEX_LOCK_PI
      nptl: Add sendmmsg and recvmmsg cancellation tests
      Fix p{readv,writev}{64} consolidation implementation
      nptl: Add more coverage in tst-cancel4
      Remove __ASSUME_OFF_DIFF_OFF64 definition
      Fix LO_HI_LONG definition
      Refactor Linux raise implementation (BZ #15368)

Andreas Schwab (13):
      Don't use long double math functions if NO_LONG_DOUBLE
      Fix min/max needed for ascii to INTERNAL conversion
      Fix compilation of test-signgam-* tests
      Fix resource leak in resolver (bug 19257)
      Register extra test objects
      m68k: avoid local labels in symbol table
      m68k: use large PIC model for gcrt1.o
      Use __typeof instead of typeof
      Fix nscd assertion failure in gc (bug 19755)
      Avoid array-bounds warning for strncat on i586 (bug 20260)
      Return proper status from _nss_nis_initgroups_dyn (bug 20262)
      m68k: suppress -Wframe-address warning
      Add test case for bug 20263

Andrew Senkevich (2):
      Added tests to ensure linkage through libmvec *_finite aliases which are
      Fixed wrong vector sincos/sincosf ABI to have it compatible with

Anton Blanchard (1):
      powerpc: Add a POWER8-optimized version of sinf()

Arnas Udovičius (1):
      localedata: sgs_LT: new locale [BZ #12450]

Aurelien Jarno (17):
      Add placeholder libnsl.abilist and libutil.abilist files
      Add sys/auxv.h wrapper to include/sys/
      mips: terminate the FDE before the return trampoline in makecontext
      Assume __NR_openat is always defined
      Assume __NR_utimensat is always defined
      Synchronize <sys/personality.h> with kernel headers
      MIPS, SPARC: fix wrong vfork aliases in libpthread.so
      MIPS, SPARC: more fixes to the vfork aliases in libpthread.so
      MIPS: run tst-mode-switch-{1,2,3}.c using test-skeleton.c
      i686/multiarch: Regenerate ulps
      SPARC64: update localplt.data
      SPARC: fix nearbyint on sNaN input
      New locale de_LI
      localedata: fix de_LI locale
      ppc: Fix modf (sNaN) for pre-POWER5+ CPU (bug 20240).
      Define __USE_KERNEL_IPV6_DEFS macro for non-Linux kernels
      sparc: remove ceil, floor, trunc sparc specific implementations

Carlos Eduardo Seo (2):
      powerpc: Fix dl-procinfo HWCAP
      powerpc: Optimization for strlen for POWER8.

Carlos O'Donell (16):
      nptl: support thread stacks that grow up
      GB 18030-2005: Document non-roundtrip and PUA mappings (bug 19575).
      Enable --localedir to set message catalog directory (Bug 14259)
      NEWS (2.23): Fix typo in bug 19048 text.
      Removed unused timezone/checktab.awk.
      Remove mention of checktab.awk in timezone/README.
      Fix building glibc master with NDEBUG and --with-cpu.
      localedata: an_ES: fix case of lang_ab
      Fix macro API for __USE_KERNEL_IPV6_DEFS.
      Fix include/wchar.h for C++
      Bug 20198: quick_exit should not call destructors.
      Bug 20214: Fix linux/in6.h and netinet/in.h sync.
      Bug 20215: Always undefine __always_inline before defining it.
      Expand comments in Linux times() implementation.
      Update libc.pot and NEWS.
      Update for glibc 2.24 release.

Chris Metcalf (2):
      Bump up tst-malloc-thread-fail timeout from 20 to 30s
      tile: only define __ASSUME_ALIGNED_REGISTER_PAIRS for 32-bit

Chung-Lin Tang (2):
      Fix stdlib/tst-makecontext regression for Nios II
      Nios II localplt.data update: remove __eqsf2

Claude Paroz (1):
      localedata: ln_CD: new locale [BZ #12676]

Dimitris Pappas (1):
      charmaps: IBM875: fix mapping of iota/upsilon variants [BZ #18453]

Dmitry V. Levin (1):
      intl: reintroduce unintentionally disabled optimization

Dylan Alex Simon (1):
      math: don't clobber old libm.so on install [BZ #19822]

Eduardo Trápani (1):
      localedata: eo: new Esperanto locale [BZ #16190]

Florian Weimer (91):
      tst-malloc-thread-exit: Use fewer system resources
      Remove trailing newline from date_fmt in Serbian locales [BZ #19581]
      Improve file descriptor checks for posix_spawn actions [BZ #19505]
      res_ninit: Update comment
      malloc: Remove arena_mem variable
      malloc: Remove max_total_mem member from struct malloc_par
      malloc: Remove NO_THREADS
      Deprecate readdir_r, readdir64_r [BZ #19056]
      test-skeleton.c: Do not set RLIMIT_DATA [BZ #19648]
      tst-audit4, tst-audit10: Compile AVX/AVX-512 code separately [BZ #19269]
      libio: Clean up _IO_file_doallocate and _IO_wfile_doallocate
      ldconfig: Do not remove stale symbolic links with -X [BZ #19610]
      sunrpc: In key_call_keyenvoy, use int status instead of union wait
      tst-audit10: Fix compilation on compilers without bit_AVX512F [BZ #19860]
      resolv: Always set *resplen2 out parameter in send_dg [BZ #19791]
      nss_db: Propagate ERANGE error if parse_line fails [BZ #19837]
      CVE-2016-3075: Stack overflow in _nss_dns_getnetbyname_r [BZ #19879]
      Report dlsym, dlvsym lookup errors using dlerror [BZ #19509]
      strfmon_l: Use specified locale for number formatting [BZ #19633]
      scratch_buffer_set_array_size: Include <limits.h>
      hsearch_r: Include <limits.h>
      Add missing bug number to ChangeLog
      nss_dns: Fix assertion failure in _nss_dns_getcanonname_r [BZ #19865]
      Remove union wait [BZ #19613]
      malloc: Run fork handler as late as possible [BZ #19431]
      malloc: Remove unused definitions of thread_atfork, thread_atfork_static
      malloc: Remove malloc hooks from fork handler
      malloc: Add missing internal_function attributes on function definitions
      vfprintf: Fix memory leak with large width and precision [BZ #19931]
      resolv: Always set *resplen2 out parameter in send_vc [BZ #19825]
      nss_dns: Validate RDATA length against packet length [BZ #19830]
      resolv, nss_dns: Remove remaining syslog logging [BZ #19862]
      nss_dns: Check address length before creating addrinfo result [BZ #19831]
      nss_dns: Remove custom offsetof macro definition
      nss_dns: Skip over non-PTR records in the netent code [BZ #19868]
      Fix ChangeLog date to reflect commit date
      resolv: Remove SCCS and RCS keywords
      resolv: Remove _LIBC conditionals
      inet: Remove SCCS keywords
      resolv: Remove BIND_UPDATE preprocessor conditionals
      resolv: Remove RESOLVSORT preprocessor conditionals
      resolv: Remove RFC1535 conditionals
      resolv: Remove traces of ULTRIX support
      resolv: Remove __BIND_NOSTATIC conditionals
      resolv: Remove BSD compatibility conditionals and header
      resolv: Remove SUNSECURITY preprocessor conditionals
      resolv: Assorted preprocessor cleanups
      resolv: Reindent preprocessor conditionals following cleanups
      getnameinfo: Do not preserve errno
      glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
      CVE-2016-3706: getaddrinfo: stack overflow in hostent conversion [BZ #20010]
      NEWS entry for CVE-2016-3075
      getnameinfo: Refactor and fix memory leak [BZ #19642]
      hesiod: Remove RCS keywords
      hesiod: Remove DEF_RHS
      hesiod: Always use thread-local resolver state [BZ #19573]
      hesiod: Avoid heap overflow in get_txt_records [BZ #20031]
      CVE-2016-1234: glob: Do not copy d_name field of struct dirent [BZ #19779]
      getnameinfo: Reduce line length and add missing comments
      getnameinfo: Avoid calling strnlen on uninitialized buffer
      getnameinfo: Return EAI_OVERFLOW in more cases [BZ #19787]
      malloc: Adjust header file guard in malloc-internal.h
      getaddrinfo: Restore RES_USE_INET6 flag on error path [BZ #19994]
      resolv: Call gmtime_r instead of gmtime in p_secstodate [BZ #20017]
      localedef: Do not compile with mcheck
      getaddrinfo: Convert from extend_alloca to struct scratch_buffer
      Increase fork signal safety for single-threaded processes [BZ #19703]
      malloc: Rewrite dumped heap for compatibility in __malloc_set_state
      tst-mallocfork2: Fix race condition, use fewer resources
      Make padding in struct sockaddr_storage explicit [BZ #20111]
      CVE-2016-4429: sunrpc: Do not use alloca in clntudp_call [BZ #20112]
      malloc: Correct malloc alignment on 32-bit architectures [BZ #6527]
      fork in libpthread cannot use IFUNC resolver [BZ #19861]
      libio: Use wmemset instead of __wmemset to avoid linknamespace issue
      tst-rec-dlopen: Use interposed malloc instead of hooks
      malloc: Correct size computation in realloc for dumped fake mmapped chunks
      quick_exit tests: Do not use C++ headers
      malloc: Remove __malloc_initialize_hook from the API [BZ #19564]
      fopencookie: Mangle function pointers stored on the heap [BZ #20222]
      malloc_usable_size: Use correct size for dumped fake mapped chunks
      nss_db: Fix initialization of iteration position [BZ #20237]
      debug/tst-longjmp_chk2: Make signal handler more conservative [BZ #20248]
      Revert __malloc_initialize_hook symbol poisoning
      elf: Consolidate machine-agnostic DTV definitions in <dl-dtv.h>
      malloc: Avoid premature fallback to mmap [BZ #20284]
      test-skeleton.c: Add write_message function
      test-skeleton.c: xmalloc, xcalloc, xrealloc are potentially unused
      test-skeleton.c (xrealloc): Support realloc-as-free
      libio: Implement vtable verification [BZ #20191]
      Correct bug number in ChangeLog [BZ #18960]
      CVE-2016-5417 was assigned to bug 19257

Gabriel F. T. Gomes (3):
      powerpc: Remove uses of operand modifier (%s) in inline asm
      powerpc: Zero pad using memset in strncpy/stpncpy
      powerpc: Fix operand prefixes

Gunnar Hjalmarsson (1):
      localedata: id_ID: Februari instead of Pebruari [BZ #20316]

Gustavo Romero (1):
      powerpc: Fix missing verb and typo in comment about AT_HWCAP entry

Guy Rutenberg (1):
      localedata: en_IL: new English locale [BZ #19963]

H.J. Lu (68):
      [x86_64] Set DL_RUNTIME_UNALIGNED_VEC_SIZE to 8
      Call x86-64 __setcontext directly
      Call x86-64 __mcount_internal/__sigjmp_save directly
      Copy x86_64 _mcount.op from _mcount.o
      Or bit_Prefer_MAP_32BIT_EXEC in EXTRA_LD_ENVVARS
      x86-64: Fix memcpy IFUNC selection
      Add a comment in sysdeps/x86_64/Makefile
      Replace @PLT with @GOTPCREL(%rip) in call
      Replace PREINIT_FUNCTION@PLT with *%rax in call
      Use HAS_ARCH_FEATURE with Fast_Rep_String
      Group AVX512 functions in .text.avx512 section
      Support --enable-hardcoded-path-in-tests in benchtests
      Define _HAVE_STRING_ARCH_mempcpy to 1 for x86
      Add _arch_/_cpu_ to index_*/bit_* in x86 cpu-features.h
      Use JUMPTARGET in x86-64 mathvec
      Use JUMPTARGET in x86-64 pthread
      Set index_arch_AVX_Fast_Unaligned_Load only for Intel processors
      Don't set %rcx twice before "rep movsb"
      [x86] Add a feature bit: Fast_Unaligned_Copy
      Implement x86-64 multiarch mempcpy in memcpy
      Make __memcpy_avx512_no_vzeroupper an alias
      Initial Enhanced REP MOVSB/STOSB (ERMS) support
      Add x86-64 memmove with unaligned load/store and rep movsb
      Add x86-64 memset with unaligned store and rep stosb
      Test 64-byte alignment in memcpy benchtest
      Test 64-byte alignment in memmove benchtest
      Test 64-byte alignment in memset benchtest
      Remove Fast_Copy_Backward from Intel Core processors
      Fix memmove-vec-unaligned-erms.S
      Don't put SSE2/AVX/AVX512 memmove/memset in ld.so
      Add a comment in memset-sse2-unaligned-erms.S
      Force 32-bit displacement in memset-vec-unaligned-erms.S
      Add memcpy/memmove/memset benchmarks with large data
      X86-64: Prepare memset-vec-unaligned-erms.S
      X86-64: Prepare memmove-vec-unaligned-erms.S
      X86-64: Use non-temporal store in memcpy on large data
      Detect Intel Goldmont and Airmont processors
      Reduce number of mmap calls from __libc_memalign in ld.so
      Move sysdeps/x86_64/cacheinfo.c to sysdeps/x86
      Remove x86 ifunc-defines.sym and rtld-global-offsets.sym
      Support non-inclusive caches on Intel processors
      Call init_cpu_features only if SHARED is defined
      Clear destination buffer updated by the previous run
      Don't call internal __pthread_unwind via PLT
      Don't call internal _Unwind_Resume via PLT
      Remove alignments on jump targets in memset
      Check the HTT bit before counting logical threads
      Correct Intel processor level type mask from CPUID
      Remove special L2 cache case for Knights Landing
      Avoid an extra branch to PLT for -z now
      Count number of logical processors sharing L2 cache
      Fix a typo in comments in memmove-vec-unaligned-erms.S
      Check FMA after COMMON_CPUID_INDEX_80000001
      X86-64: Remove the previous SSE2/AVX2 memsets
      X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove
      X86-64: Add dummy memcopy.h and wordcopy.c
      Always indirect branch to __libc_start_main via GOT
      Compile tst-cleanupx4 test with -fexceptions
      Check Prefer_ERMS in memmove/memcpy/mempcpy/memset
      Require binutils 2.24 to build x86-64 glibc [BZ #20139]
      Make copies of cstdlib/cmath and use them [BZ #20314]
      X86-64: Define LO_HI_LONG to skip pos_h [BZ #20349]
      x86-64: Properly align stack in _dl_tlsdesc_dynamic [BZ #20309]
      Test p{read,write}64 with offset > 4GB
      x86-64: Add p{read,write}[v]64 to syscalls.list [BZ #20348]
      Regenerate i686 libm-test-ulps with GCC 6.1 at -O3 [BZ #20347]
      i386: Compile rtld-*.os with -mno-sse -mno-mmx -mfpmath=387
      Don't compile do_test with -mavx/-mavx2/-mavx512

Hongjiu Zhang (1):
      sln: use stat64

Jiyoung Yun (1):
      Fix robust mutex deadlock [BZ #20263]

John David Anglin (2):
      hppa: fix loading of global pointer in _start [BZ #20277]
      hppa: Update libm-test-ulps.

Joseph Myers (107):
      Fix ldbl-128ibm floorl for non-default rounding modes (bug 17899).
      Fix ldbl-128ibm ceill for non-default rounding modes (bug 19592).
      Fix ldbl-128ibm truncl for non-default rounding modes (bug 19593).
      Fix ldbl-128ibm roundl for non-default rounding modes (bug 19594).
      Fix ldbl-128ibm fmodl handling of subnormal results (bug 19595).
      Fix ldbl-128ibm fmodl handling of equal arguments with low part zero (bug 19602).
      Fix ldbl-128ibm remainderl, remquol equality tests (bug 19603).
      Fix ldbl-128ibm powl overflow handling (bug 19674).
      Fix ldbl-128ibm nextafterl, nexttowardl sign of zero result (bug 19678).
      Require Linux 3.2 except on x86 / x86_64, 3.2 headers everywhere.
      Remove linux/fanotify.h configure test.
      Remove kernel-features.h conditionals on pre-3.2 kernels.
      Fix ldbl-128ibm remainderl equality test for zero low part (bug 19677).
      Fix ldbl-128ibm nearbyintl in non-default rounding modes (bug 19790).
      Allow spurious underflow / inexact for ldbl-128ibm.
      Update glibc headers for Linux 4.5.
      Adjust kernel-features.h defaults for socket syscalls.
      Remove __ASSUME_PPOLL.
      Remove __ASSUME_FALLOCATE.
      Remove __ASSUME_EVENTFD2, move eventfd to syscalls.list.
      Remove __ASSUME_SIGNALFD4.
      Remove __ASSUME_GETDENTS64_SYSCALL.
      Fix x86_64 / x86 powl inaccuracy for integer exponents (bug 19848).
      [microblaze] Remove __ASSUME_FUTIMESAT.
      Fix termios.h XCASE namespace (bug 19925).
      Fix limits.h NL_NMAX namespace (bug 19929).
      Fix stdio.h cuserid namespace (bug 19989).
      Define off_t in stdio.h for XOPEN2K.
      conformtest: Correct XOPEN2K stdarg.h expectations.
      Fix langinfo.h nl_langinfo_l namespace (bug 19996).
      conformtest: Correct some signal.h expectations for XOPEN2K.
      conformtest: Correct some stdio.h expectations for UNIX98.
      conformtest: Correct stdio.h expectations for fdopen.
      Also define off_t in stdio.h for UNIX98.
      conformtest: Add langinfo.h expectations for YESSTR, NOSTR.
      Fix stdio.h namespace for pre-threads POSIX (bug 20014).
      Fix fcntl.h timespec namespace (bug 20023).
      Fix sys/time.h timespec namespace (bug 20041).
      conformtest: Remove some bogus sys/types.h expectations for XPG3 and XPG4.
      Declare cuserid in unistd.h for UNIX98 and before (bug 20043).
      Declare pthread_atfork in unistd.h for UNIX98 (bug 20044).
      conformtest: Fix st_blksize, st_blocks expectations for XPG3, XPG4.
      conformtest: Correct some sys/stat.h expectations for XPG3.
      Fix sys/stat.h fchmod namespace (bug 20073).
      Declare tcgetsid for XPG4 (bug 20055).
      conformtest: Do not expect S_IF* in fcntl.h.
      Declare gethostname for XPG4 (bug 20054).
      conformtest: Correct some unistd.h expectations for XPG3, XPG4.
      conformtest: Correct time.h XPG3 expectations.
      conformtest: Do not expect strdup in string.h for XPG3.
      conformtest: Correct some stdlib.h expectations for XPG3.
      Correct ttyslot header declaration conditions (bug 20051).
      Fix stdlib.h rand_r namespace (bug 20074).
      Make sys/stat.h define S_IFSOCK, S_ISSOCK for XPG4 (bug 20076).
      Do not declare grantpt, ptsname, unlockpt in stdlib.h for XPG3 (bug 20094).
      Add Q_GETNEXTQUOTA from Linux 4.6 to sys/quota.h.
      Add CLONE_NEWCGROUP from Linux 4.6 to bits/sched.h.
      Update libm-test.inc comment about NaN signs.
      conformtest: Correct search.h expectations for XPG3.
      conformtest: Correct pwd.h expectations for XPG3.
      Implement proper fmal for ldbl-128ibm (bug 13304).
      conformtest: Correct ftw.h expectations for XPG3, XPG4.
      Update sysdeps/unix/sysv/linux/bits/socket.h for Linux 4.6.
      conformtest: Correct some limits.h expectations for XPG3, XPG4.
      Do not raise "inexact" from generic ceil (bug 15479).
      Do not raise "inexact" from generic floor (bug 15479).
      Do not raise "inexact" from generic round (bug 15479).
      Do not raise "inexact" from x86_64 SSE4.1 ceil, floor (bug 15479).
      Do not raise "inexact" from powerpc32 ceil, floor, trunc (bug 15479).
      Do not raise "inexact" from powerpc64 ceil, floor, trunc (bug 15479).
      Support sNaN testing in libm-test.inc.
      Add more sNaN tests to libm-test.inc.
      Fix ldbl-128 j0l, j1l, y0l, y1l for sNaN argument (bug 20151).
      Fix ldbl-128ibm sqrtl (sNaN) (bug 20153).
      Fix ldbl-128ibm ceill, rintl etc. for sNaN arguments (bug 20156).
      Remove unused macros from libm-test.inc.
      Avoid "invalid" exceptions from powerpc fabsl (sNaN) (bug 20157).
      Fix powerpc32 ceil, rint etc. on sNaN input (bug 20160).
      Fix powerpc64 ceil, rint etc. on sNaN input (bug 20160).
      Fix x86/x86_64 nextafterl incrementing negative subnormals (bug 20205).
      Fix dbl-64 acos (sNaN) (bug 20212).
      Fix dbl-64 asin (sNaN) (bug 20213).
      Fix i386 asinhl (sNaN) (bug 20218).
      Fix i386 atanhl (sNaN) (bug 20219).
      Fix i386 cbrtl (sNaN) (bug 20224).
      Fix ldexp, scalbn, scalbln for sNaN input (bug 20225).
      Fix i386/x86_64 expl, exp10l, expm1l for sNaN input (bug 20226).
      Fix i386/x86_64 logl (sNaN) (bug 20227).
      Fix i386/x86_64 log10l (sNaN) (bug 20228).
      Fix i386/x86_64 log1pl (sNaN) (bug 20229).
      Fix ldbl-128 expm1l (sNaN) (bug 20232).
      Fix ldbl-128ibm expm1l (sNaN) (bug 20233).
      Fix ldbl-128ibm log1pl (sNaN) (bug 20234).
      Fix i386/x86_64 log2l (sNaN) (bug 20235).
      Fix modf (sNaN) (bug 20240).
      Fix frexp (sNaN) (bug 20250).
      Add more sNaN tests (cimag, conj, copysign, creal, fma, fmod).
      Fix dbl-64 atan2 (sNaN, qNaN) (bug 20252).
      Simplify generic fdim implementations.
      Use generic fdim on more architectures (bug 6796, bug 20255, bug 20256).
      Fix i386 fdim double rounding (bug 20255).
      Simplify x86 nearbyint functions.
      Add more sNaN tests (most remaining real functions).
      Fix i386/x86_64 scalbl with sNaN input (bug 20296).
      Avoid "inexact" exceptions in i386/x86_64 ceil functions (bug 15479).
      Avoid "inexact" exceptions in i386/x86_64 floor functions (bug 15479).
      Avoid "inexact" exceptions in i386/x86_64 trunc functions (bug 15479).
Khem Raj (2):
      When disabling SSE, make sure -fpmath is not set to use SSE either
      elf: Define missing Meta architecture specific relocations

Maciej W. Rozycki (1):
      Treat STV_HIDDEN and STV_INTERNAL symbols as STB_LOCAL

Mark Wielaard (2):
      elf/elf.h: Add new 386 and X86_64 relocations from binutils.
      elf.h: Add NT_ARM_SYSTEM_CALL constant.

Marko Myllynen (1):
      localedef: drop unused --old-style

Martin Galvan (1):
      Add pretty printers for the NPTL lock types

Matthew Fortune (1):
      VDSO support for MIPS

Matthias Wallnoefer (2):
      localedata: de_{AT,CH}: copy data from de_DE
      localedata: de_IT: new locale

Mike FABIAN (1):
      localedata: i18n: fix typos in tel_int_fmt

Mike Frysinger (44):
      localedata: trim trailing blank lines/comments
      localedata: dz_BT/ps_AF: reformat data
      localedata: CLDRv28: update LC_TELEPHONE.int_prefix
      locales: pap_AN: delete old/deprecated locale [BZ #16003]
      test-skeleton: increase default TIMEOUT to 20 seconds
      localedata: an_ES: fix lang_ab value
      localedata: es_PR: change LC_MEASUREMENT to metric
      localedata: clear LC_IDENTIFICATION tel/fax fields
      link sln fix to bugzilla [BZ #15333]
      localedata: use same comment_char/escape_char in these files
      add ChangeLog entry
      localedata: standardize first few lines
      localedata: standardize copyright/license information [BZ #11213]
      localedata: iw_IL: delete old/deprecated locale [BZ #16137]
      configure: fix `test ==` usage
      localedata: CLDRv28: update LC_PAPER values
      localedata: LC_TIME.date_fmt: delete entries same as the default value
      localedata: CLDRv29: update LC_IDENTIFICATION language/territory fields
      localedata: LC_MEASUREMENT: use copy directives everywhere
      localedata: LC_PAPER: use copy directives everywhere
      localedata: CLDRv29: update LC_ADDRESS.country_num values
      localedata: fix LC_ADDRESS.country_car entries
      localedata: CLDRv29: update LC_ADDRESS.country_name translations
      localedata: LC_IDENTIFICATION.category: set to ISO 30112 2014 standard
      localedef: check LC_IDENTIFICATION.category values
      localedata: CLDRv29: update LC_MONETARY int_curr_symbol & currency_symbol
      localedata: LC_IDENTIFICATION: delete uncommon fields
      locale: ld-telephone: update to ISO-30112 2014
      localedef: allow %l/%n in postal_fmt [BZ #16983]
      localedata: fix LC_TELEPHONE in a few locales
      localedata: CLDRv29: update LC_TIME week/first_week,workday fields
      localedef: change week_1stweek default to 7
      localedata: standardize LC_MESSAGES string regexes a bit
      localedata: LC_MESSAGES.{yes,no}expr: add +1/-0 to all regexes [BZ #15263]
      localedata: LC_MESSAGES.{yes,no}expr: standardize yY/nN [BZ #15262]
      localedata: CLDRv29: update LC_MESSAGES yes/no strings [BZ #15264] [BZ #16975]
      tst-langinfo: update yesexpr/noexpr baselines
      tst-fmon/tst-numeric: switch malloc to static stack space [BZ #19671]
      localedata: add more translit entries
      localedata: pt_BR/pt_PT: make days/months lowercase [BZ #19133]
      unicode-gen: include standard comment file header
      NEWS: clarify localedef --old-style update
      manual: fix spelling typos
      microblaze: fix variable name collision with syscall macros

Neskie Manuel (1):
      localedata: chr_US: new Cherokee locale [BZ #12143]

Nick Alcock (2):
      x86, pthread_cond_*wait: Do not depend on %eax not being clobbered
      Allow overriding of CFLAGS as well as CPPFLAGS for rtld.

Paras pradhan (1):
      localedata: ne_NP: misc updates [BZ #1170]

Paul E. Murphy (22):
      Increase internal precision of ldbl-128ibm decimal printf [BZ #19853]
      powerpc: Add optimized P8 strspn
      powerpc: Add optimized strcspn for P8
      powerpc: Add missing insn in swapcontext [BZ #20004]
      Refactor bug-strtod.c to better test new types.
      Refactor bug-strtod2.c to be type generic
      Refactor tst-strtod6.c
      Refactor tst-strtod-round.c
      Fixup usage of MANT_DIG in libm-test.inc
      Fixup usage of MIN_EXP in libm-test.inc
      Refactor tst-strtod-round.c for type-generic-ness
      Begin refactor of libm-test.inc
      Refactor type specific macros using regexes
      Refactor M_ macros defined in libm-test.inc
      Replace M_PI2l with lit_pi_2_d in libm-test.inc
      Replace M_PIl with lit_pi in libm-test.inc
      Replace M_PI_4l with lit_pi_4_d in libm-test.inc
      Replace M_El with lit_e in libm-test.inc
      Apply LIT(x) to floating point literals in libm-test.c
      Remove CHOOSE() macro from libm-tests.inc
      Remove type specific information from auto-libm-test-in
      Generate new format names in auto-libm-test-out

Paul Pluzhnikov (7):
      2016-03-03 Paul Pluzhnikov <ppluzhnikov@google.com>
      2016-05-30 Paul Pluzhnikov <ppluzhnikov@google.com>
      Merge branch 'master' of ssh://sourceware.org/git/glibc
      2016-06-05 Paul Pluzhnikov <ppluzhnikov@google.com>
      2016-06-09 Paul Pluzhnikov <ppluzhnikov@gmail.com>
      2016-06-11 Paul Pluzhnikov <ppluzhnikov@google.com>
      Fix rt/tst-aio64.c as well, and mention login/tst-utmp.c in ChangeLog

Rajalakshmi Srinivasaraghavan (4):
      powerpc: Rearrange cfi_offset calls
      powerpc: strcasestr optimization for power8
      Add nextup and nextdown math functions
      powerpc: Fix return code of strcasecmp for unaligned inputs

Rical Jasan (9):
      manual: fix typos in the memory chapter
      manual: fix typos in the character handling chapter
      manual: fix typos in the string chapters
      manual: fix typos in character set handling
      manual: fix typos in the locale chapter
      manual: fix typos in the locale chapter
      manual: fix typos in the message chapter
      manual: fix typos in the search chapter
      manual: fix typos in the pattern chapter

Richard Henderson (2):
      elf.h: Sync with the gabi webpage
      elf.h: Add declarations for BPF

Robin van der Vliet (1):
      locale: iso-639: add Talossan language [BZ #19400]

Roland McGrath (9):
      Add fts64_* to sysdeps/arm/nacl/libc.abilist
      Typo fixes.
      Gratuitous change to poke buildbot.
      Fix c++-types-check conditionalization.
      Omit test-math-isinff when no C++ compiler.
      Conditionalize c++-types-check.out addition to tests-special.
      Fix edito in last change.
      Fix tst-audit10 build when -mavx512f is not supported.
      stpcpy is part of POSIX.1-2008 [BZ #3629]

Samuel Thibault (23):
      Fix flag test in waitid compatibility layer
      Fix hurd build
      hurd: Break errnos.d / libc-modules.h dependency loop
      Fix mach-syscalls.mk build
      hurd: Do not hide rtld symbols which need to be preempted
      hurd: Allow inlining IO locks
      hurd: Add c++-types expected result
      Fix malloc threaded tests link on non-Linux
      Fix crash on getauxval call without HAVE_AUX_VECTOR
      Fix build with HAVE_AUX_VECTOR
      hurd: fix profiling short-living processes
      Fix gprof timing
      non-linux: Apply RFC3542 obsoletion of RFC2292 macros
      non-linux: Apply RFC3542 obsoletion of RFC2292 macros
      aio: fix newp->running data race
      Revert "aio: fix newp->running data race"
      hurd: fix _hurd_self_sigstate reference from ____longjmp_chk
      Add more hurd exception to local headers list
      hurd: disable ifunc for now
      mach: Add mach_print syscall declaration
      hurd: Fix PTR_{,DE}MANGLE calls
      Add missing changelog part
      Fix TABDLY value

Siddhesh Poyarekar (10):
      New make target to only build benchmark binaries
      Fix up ChangeLog formatting
      benchtests: Update README to include instructions for bench-build target
      Fix up ChangeLog
      benchtests: Clean up extra-objs
      benchtests: Support for cross-building benchmarks
      Avoid attempt for runtime checks if all environments are defined
      Fix up ChangeLog
      Revert "Add pretty printers for the NPTL lock types"
      Fix cos computation for multiple precision fallback (bz #20357)

Simion Onea (1):
      localedata: ro_RO: update Tuesday translation [BZ #18911]

Stefan Liebler (31):
      Add missing inclusion of libc-internal.h.
      S390: Save and restore fprs/vrs while resolving symbols.
      S390: Extend structs La_s390_regs / La_s390_retval with vector-registers.
      S390: Use ahi instead of aghi in 32bit _dl_runtime_resolve.
      Mention Bug in ChangeLog for S390: Save and restore fprs/vrs while resolving symbols.
      Fix strfmon_l: Use specified locale for number formatting [BZ #19633]
      Add missing iucv related defines.
      S390: Add support for vdso getcpu symbol.
      S390: Use fPIC to avoid R_390_GOT12 relocation in gcrt1.o.
      Fix tst-cancel17/tst-cancelx17, which sometimes segfaults while exiting.
      S390: Use mvcle for copies > 1MB on 32bit with default memcpy variant.
      S390: Use 64bit instruction to check for copies of > 1MB with mvcle.
      S390: Do not call memcpy, memcmp, memset within libc.so via ifunc-plt.
      S390: Implement mempcpy with help of memcpy. [BZ #19765]
      S390: Get rid of make warning: overriding recipe for target gconv-modules.
      S390: Configure check for vector support in gcc.
      S390: Optimize 8bit-generic iconv modules.
      S390: Optimize builtin iconv-modules.
      S390: Optimize iso-8859-1 to ibm037 iconv-module.
      S390: Optimize utf8-utf32 module.
      S390: Optimize utf8-utf16 module.
      S390: Optimize utf16-utf32 module.
      S390: Use s390-64 specific iconv-modules on s390-32, too.
      S390: Fix utf32 to utf8 handling of low surrogates (disable cu41).
      S390: Fix utf32 to utf16 handling of low surrogates (disable cu42).
      Fix ucs4le_internal_loop in error case. [BZ #19726]
      Fix UTF-16 surrogate handling. [BZ #19727]
      tst-rec-dlopen: Fix build fail due to missing inclusion of string.h
      S390: Fix relocation of _nl_current_LC_CATEGORY_used in static build. [BZ #19860]
      S390: Use DT_JUMPREL in prelink undo code.
      S390: Do not clobber r13 with memcpy on 31bit with copies >1MB.

Stephen Gallagher (1):
      NSS: Implement group merging support.

Szabolcs Nagy (4):
      [AArch64] Fix libc internal asm profiling code
      [AArch64] Add bits/hwcap.h for aarch64 linux
      [AArch64] Regenerate libm-test-ulps
      [AArch64] Update libm-test-ulps

Timur Birsh (1):
      localedata: kk_KZ: various updates [BZ #15578]

Torvald Riegel (1):
      Remove atomic_compare_and_exchange_bool_rel.

Tulio Magno Quites Machado Filho (3):
      Fix type of parameter passed by malloc_consolidate
      powerpc: Fix --disable-multi-arch build on POWER8
      powerpc: Add a POWER8-optimized version of expf()

Wilco Dijkstra (7):
      Improve generic strcspn performance
      Remove pre GCC3.2 optimizations from string/bits/string2.h.
      Move mempcpy, strcpy and stpcpy inlines to string/string-inlines.c as compatibility
      This is an optimized memset for AArch64. Memset is split into 4 main cases:
      This is an optimized memcpy/memmove for AArch64. Copies are split into 3 main
      Add a simple rawmemchr implementation. Use strlen for rawmemchr(s, '\0') as it
      This patch further tunes memcpy - avoid one branch for sizes 1-3,

Will Newton (1):
      elf/elf.h: Add missing Meta relocations

Yvan Roux (1):
      Suppress GCC 6 warning about ambiguous 'else' with -Wparentheses

Zack Weinberg (3):
      Move sysdeps/generic/bits/hwcap.h to top-level bits/
      Move sysdeps/generic/bits/hwcap.h to top-level bits/
      Don't install the internal header grp-merge.h

raji (1):
      powerpc: strcasecmp/strncasecmp optimization for power8

ricaljasan@pacific.net (2):
      manual: fix typo in the introduction
      manual: fix typos in error reporting

-----------------------------------------------------------------------
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, gentoo/2.23 has been updated
       via  6a04ea1a9a8586c737a71b1a1b55e15c51a25c1f (commit)
      from  2af88e803f2084c17e55b835ef881b243a393fa9 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=6a04ea1a9a8586c737a71b1a1b55e15c51a25c1f

commit 6a04ea1a9a8586c737a71b1a1b55e15c51a25c1f
Author: Florian Weimer <fweimer@redhat.com>
Date:   Fri Mar 25 11:11:42 2016 +0100

    tst-audit10: Fix compilation on compilers without bit_AVX512F [BZ #19860]

    	[BZ# 19860]
    	* sysdeps/x86_64/tst-audit10.c (avx512_enabled): Always return
    	zero if the compiler does not provide the AVX512F bit.

    (cherry picked from commit f327f5b47be57bc05a4077344b381016c1bb2c11)
    (cherry picked from commit 4cf055a2a331b7361622dc9ac8993b59c6f0ef59)

-----------------------------------------------------------------------

Summary of changes:
 sysdeps/x86_64/tst-audit10.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)