There is a transition penalty when SSE instructions are mixed with 256-bit AVX or 512-bit AVX512 instructions. Since _dl_runtime_resolve_avx and _dl_runtime_profile_avx512 save and restore 256-bit YMM/512-bit ZMM registers, SSE code that calls through lazy binding incurs this penalty.
I also expect that the present state of affairs makes all context switches slower because the kernel has to save and restore the AVX-512F state.
(In reply to Florian Weimer from comment #1)
> I also expect that the present state of affairs makes all context switches
> slower because the kernel has to save and restore the AVX-512F state.

Context switches may not be impacted, since XSAVEC and XSAVEOPT track the upper bits of vector registers. But the SSE transition state tracks only YMM/ZMM load instructions, not the bits in the vector registers.
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources".

The branch, hjl/x86/xgetbv has been created
        at  d963b835c1e0fe430a88168a81c4c69dcd9ad00c (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d963b835c1e0fe430a88168a81c4c69dcd9ad00c

commit d963b835c1e0fe430a88168a81c4c69dcd9ad00c
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Aug 23 09:09:32 2016 -0700

    X86-64: Add _dl_runtime_resolve_avx[512]_opt [BZ #20508]

    There is transition penalty when SSE instructions are mixed with 256-bit AVX or 512-bit AVX512 load instructions. Since _dl_runtime_resolve_avx and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM registers, there is transition penalty when SSE instructions are used with lazy binding on AVX and AVX512 processors.

    For AVX and AVX512 processors which support XGETBV with ECX == 1, we can use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers or the upper 256 bits of ZMM registers are zero. We can restore only the non-zero portion of vector registers with AVX/AVX512 load instructions which will zero-extend upper bits of vector registers.

    This patch adds _dl_runtime_resolve_sse_vex which saves and restores XMM registers with 128-bit AVX store/load instructions. It is used to preserve YMM/ZMM registers when only the lower 128 bits are non-zero.

    _dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so that we store and load only the non-zero portion of vector registers. This avoids SSE transition penalty caused by _dl_runtime_resolve_avx and _dl_runtime_profile_avx512 when only the lower 128 bits of vector registers are used.

        [BZ #20508]
        * sysdeps/x86/cpu-features.c (init_cpu_features): Set Use_dl_runtime_resolve_opt if XGETBV supports ECX == 1.
        * sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt): New.
        (index_arch_Use_dl_runtime_resolve_opt): Likewise.
        * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use _dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt if Use_dl_runtime_resolve_opt is set.
        * sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
        (_dl_runtime_resolve_opt): New. Defined for AVX and AVX512.
        (_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
        * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_opt): New.
        (_dl_runtime_profile): Define only if _dl_runtime_profile is defined.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d4e9985c033d90c310d7798f2d1f0634a64cedff

commit d4e9985c033d90c310d7798f2d1f0634a64cedff
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Aug 18 14:52:42 2016 -0700

    X86-64: Correct CFA in _dl_runtime_resolve

    When stack is re-aligned in _dl_runtime_resolve, there is no need to adjust CFA when allocating register save area on stack.

        * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve): Don't adjust CFA when allocating register save area on re-aligned stack.

-----------------------------------------------------------------------
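As a rough illustration of the XGETBV check described in the commit message above, the following C sketch decides how many bytes of each vector register would need to be preserved, given the bitmap that XGETBV with ECX == 1 returns in EAX. The macro names, the enum, and pick_save_width are hypothetical and are not glibc identifiers; the bit positions are taken from the XCR0 state-component layout (bit 2 for the upper 128 bits of the YMM registers, bit 6 for the upper 256 bits of ZMM0-ZMM15). The actual trampoline does this in assembly.

```c
#include <stdint.h>

/* Hypothetical names; bit positions follow the XCR0 state-component layout. */
#define XSTATE_YMM_HI128  (UINT32_C (1) << 2)  /* upper 128 bits of YMM0-YMM15 */
#define XSTATE_ZMM_HI256  (UINT32_C (1) << 6)  /* upper 256 bits of ZMM0-ZMM15 */

/* Number of bytes of each vector register that must be saved and restored. */
enum save_width { SAVE_XMM = 16, SAVE_YMM = 32, SAVE_ZMM = 64 };

/* xinuse is assumed to be the EAX value returned by XGETBV with ECX == 1.
   A clear bit means the corresponding state component is in its initial
   (all-zero) configuration, so it does not need to be preserved.  */
static enum save_width
pick_save_width (uint32_t xinuse)
{
  if (xinuse & XSTATE_ZMM_HI256)
    return SAVE_ZMM;            /* full 512-bit save/restore */
  if (xinuse & XSTATE_YMM_HI128)
    return SAVE_YMM;            /* 256-bit save/restore is enough */
  return SAVE_XMM;              /* only the lower 128 bits are live */
}
```

With this decoding, a process that has only ever touched XMM registers would take the 128-bit path, which is the case the patch is meant to make cheap.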
*** Bug 20495 has been marked as a duplicate of this bug. ***
There is also a transition penalty on AVX machines.
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources".

The branch, hjl/x86/xgetbv has been deleted
       was  d963b835c1e0fe430a88168a81c4c69dcd9ad00c

- Log -----------------------------------------------------------------
d963b835c1e0fe430a88168a81c4c69dcd9ad00c X86-64: Add _dl_runtime_resolve_avx[512]_opt [BZ #20508]
-----------------------------------------------------------------------
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources".

The branch, hjl/x86/xgetbv has been created
        at  99143e37c1186c765da5b6e892ddaff0b3719f9f (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=99143e37c1186c765da5b6e892ddaff0b3719f9f

commit 99143e37c1186c765da5b6e892ddaff0b3719f9f
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Aug 23 09:09:32 2016 -0700

    X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]

    There is transition penalty when SSE instructions are mixed with 256-bit AVX or 512-bit AVX512 load instructions. Since _dl_runtime_resolve_avx and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM registers, there is transition penalty when SSE instructions are used with lazy binding on AVX and AVX512 processors.

    To avoid SSE transition penalty, if only the lower 128 bits of the first 8 vector registers are non-zero, we can preserve %xmm0 - %xmm7 registers with the zero upper bits.

    For AVX and AVX512 processors which support XGETBV with ECX == 1, we can use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers or the upper 256 bits of ZMM registers are zero. We can restore only the non-zero portion of vector registers with AVX/AVX512 load instructions which will zero-extend upper bits of vector registers.

    This patch adds _dl_runtime_resolve_sse_vex which saves and restores XMM registers with 128-bit AVX store/load instructions. It is used to preserve YMM/ZMM registers when only the lower 128 bits are non-zero.

    _dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so that we store and load only the non-zero portion of vector registers. This avoids SSE transition penalty caused by _dl_runtime_resolve_avx and _dl_runtime_profile_avx512 when only the lower 128 bits of vector registers are used.

    _dl_runtime_resolve_avx_slow is added and used for AVX processors which don't support XGETBV with ECX == 1. Since there is no SSE transition penalty on AVX512 processors which don't support XGETBV with ECX == 1, _dl_runtime_resolve_avx512_slow isn't provided.

        [BZ #20495]
        [BZ #20508]
        * sysdeps/x86/cpu-features.c (init_cpu_features): For Intel processors, set Use_dl_runtime_resolve_slow and set Use_dl_runtime_resolve_opt if XGETBV supports ECX == 1.
        * sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt): New.
        (bit_arch_Use_dl_runtime_resolve_slow): Likewise.
        (index_arch_Use_dl_runtime_resolve_opt): Likewise.
        (index_arch_Use_dl_runtime_resolve_slow): Likewise.
        * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use _dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt if Use_dl_runtime_resolve_opt is set. Use _dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set.
        * sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
        (_dl_runtime_resolve_opt): New. Defined for AVX and AVX512.
        (_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
        * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow): New.
        (_dl_runtime_resolve_opt): Likewise.
        (_dl_runtime_profile): Define only if _dl_runtime_profile is defined.

-----------------------------------------------------------------------
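The selection logic that the ChangeLog entry for elf_machine_runtime_setup describes can be sketched roughly as follows. This is only an illustration under stated assumptions: struct resolve_features and select_resolver are hypothetical, the function returns the trampoline name as a string instead of storing a function address in the GOT, and the fallback names _dl_runtime_resolve_avx512 and _dl_runtime_resolve_sse are assumed rather than quoted from the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the cpu_features bits named in the ChangeLog. */
struct resolve_features
{
  bool avx512f_usable;
  bool avx_usable;
  bool use_resolve_opt;   /* models Use_dl_runtime_resolve_opt */
  bool use_resolve_slow;  /* models Use_dl_runtime_resolve_slow */
};

/* Return the name of the lazy-binding trampoline this sketch would pick. */
static const char *
select_resolver (const struct resolve_features *f)
{
  if (f->avx512f_usable)
    {
      /* No _avx512_slow variant exists, per the commit message. */
      if (f->use_resolve_opt)
        return "_dl_runtime_resolve_avx512_opt";
      return "_dl_runtime_resolve_avx512";
    }
  if (f->avx_usable)
    {
      if (f->use_resolve_opt)
        return "_dl_runtime_resolve_avx_opt";
      if (f->use_resolve_slow)
        return "_dl_runtime_resolve_avx_slow";
      return "_dl_runtime_resolve_avx";
    }
  return "_dl_runtime_resolve_sse";
}
```

Checking the _opt flag before the _slow flag mirrors the commit message, where the slow variant is only meant for AVX processors that lack XGETBV with ECX == 1.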
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources".

The branch, hjl/x86/xgetbv has been deleted
       was  99143e37c1186c765da5b6e892ddaff0b3719f9f

- Log -----------------------------------------------------------------
99143e37c1186c765da5b6e892ddaff0b3719f9f X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]
-----------------------------------------------------------------------
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources".

The branch, hjl/x86/xgetbv has been created
        at  fdb9777e1d770446972f46a80ebfa59d522a93f1 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=fdb9777e1d770446972f46a80ebfa59d522a93f1

commit fdb9777e1d770446972f46a80ebfa59d522a93f1
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Aug 23 09:09:32 2016 -0700

    X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]

-----------------------------------------------------------------------
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources".

The branch, master has been updated
       via  fb0f7a6755c1bfaec38f490fbfcaa39a66ee3604 (commit)
      from  a0d47f487fe250c63cc21e9608b85bc02dc2a006 (commit)

Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=fb0f7a6755c1bfaec38f490fbfcaa39a66ee3604

commit fb0f7a6755c1bfaec38f490fbfcaa39a66ee3604
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Sep 6 08:50:55 2016 -0700

    X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]

    There is transition penalty when SSE instructions are mixed with 256-bit AVX or 512-bit AVX512 load instructions. Since _dl_runtime_resolve_avx and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM registers, there is transition penalty when SSE instructions are used with lazy binding on AVX and AVX512 processors.

    To avoid SSE transition penalty, if only the lower 128 bits of the first 8 vector registers are non-zero, we can preserve %xmm0 - %xmm7 registers with the zero upper bits.

    For AVX and AVX512 processors which support XGETBV with ECX == 1, we can use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers or the upper 256 bits of ZMM registers are zero. We can restore only the non-zero portion of vector registers with AVX/AVX512 load instructions which will zero-extend upper bits of vector registers.

    This patch adds _dl_runtime_resolve_sse_vex which saves and restores XMM registers with 128-bit AVX store/load instructions. It is used to preserve YMM/ZMM registers when only the lower 128 bits are non-zero.

    _dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so that we store and load only the non-zero portion of vector registers. This avoids SSE transition penalty caused by _dl_runtime_resolve_avx and _dl_runtime_profile_avx512 when only the lower 128 bits of vector registers are used.

    _dl_runtime_resolve_avx_slow is added and used for AVX processors which don't support XGETBV with ECX == 1. Since there is no SSE transition penalty on AVX512 processors which don't support XGETBV with ECX == 1, _dl_runtime_resolve_avx512_slow isn't provided.

        [BZ #20495]
        [BZ #20508]
        * sysdeps/x86/cpu-features.c (init_cpu_features): For Intel processors, set Use_dl_runtime_resolve_slow and set Use_dl_runtime_resolve_opt if XGETBV supports ECX == 1.
        * sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt): New.
        (bit_arch_Use_dl_runtime_resolve_slow): Likewise.
        (index_arch_Use_dl_runtime_resolve_opt): Likewise.
        (index_arch_Use_dl_runtime_resolve_slow): Likewise.
        * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use _dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt if Use_dl_runtime_resolve_opt is set. Use _dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set.
        * sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
        (_dl_runtime_resolve_opt): New. Defined for AVX and AVX512.
        (_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
        * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow): New.
        (_dl_runtime_resolve_opt): Likewise.
        (_dl_runtime_profile): Define only if _dl_runtime_profile is defined.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                      |   25 ++++++++++
 sysdeps/x86/cpu-features.c     |   14 +++++
 sysdeps/x86/cpu-features.h     |    6 ++
 sysdeps/x86_64/dl-machine.h    |   24 ++++++++-
 sysdeps/x86_64/dl-trampoline.S |   20 ++++++++
 sysdeps/x86_64/dl-trampoline.h |  104 +++++++++++++++++++++++++++++++++++++++-
 6 files changed, 190 insertions(+), 3 deletions(-)
Fixed for 2.25.
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources".

The branch, release/2.24/master has been updated
       via  4b8790c81c1a7b870a43810ec95e08a2e501123d (commit)
      from  2d16e81babd1d7b66d10cec0bc6d6d86a7e0c95e (commit)

Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4b8790c81c1a7b870a43810ec95e08a2e501123d

commit 4b8790c81c1a7b870a43810ec95e08a2e501123d
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Sep 6 08:50:55 2016 -0700

    X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]

    (cherry picked from commit fb0f7a6755c1bfaec38f490fbfcaa39a66ee3604)

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                      |   25 ++++++++++
 sysdeps/x86/cpu-features.c     |   14 +++++
 sysdeps/x86/cpu-features.h     |    6 ++
 sysdeps/x86_64/dl-machine.h    |   24 ++++++++-
 sysdeps/x86_64/dl-trampoline.S |   20 ++++++++
 sysdeps/x86_64/dl-trampoline.h |  104 +++++++++++++++++++++++++++++++++++++++-
 6 files changed, 190 insertions(+), 3 deletions(-)
May I ask for high priority on getting this fixed in the stable versions? With the Dolphin emulator, we've seen a slowdown of up to 80% with seemingly random compilation settings and usage patterns because of this issue: https://forums.dolphin-emu.org/Thread-dolphin-uses-ffmpeg-to-play-show-the-videos-of-the-games

We use AVX128 and SSE in a mixed way in our just-in-time compiler, so we are hit hard by this penalty, especially as this thread runs almost no C++ code, so there was a good chance that VZEROUPPER was never called at all. Debugging this performance issue was a big pain.
(In reply to markus from comment #13)
> May I ask for a high priority to get this fixed in the stable versions?

It has been backported to the 2.24 branch.
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources".

The branch, gentoo/2.24 has been updated
       via  b73ec923c79ab493a9265930a45800391329571a (commit)
       via  04c5f782796052de9d06975061eb3376ccbcbdb1 (commit)
       via  9b34c1494d8e61bb3d718e2ea83b856030476737 (commit)
       via  2afb8a945ddc104c5ef9aa61f32427c19b681232 (commit)
       via  df13b9c22a0fb690a0ab9dd4af163ae3c459d975 (commit)
       via  b4391b0c7def246a4503db1af683122681c12a56 (commit)
       via  0d5f4a32a34f048b35360a110a0e6d1c87e3eced (commit)
       via  0ab02a62e42e63b058e7a4e160dbe51762ef2c46 (commit)
       via  901db98f36690e4743feefd985c6ba2d7fd19813 (commit)
      from  caafe2b2612be88046d7bad4da42dbc2b07fbcd7 (commit)

Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b73ec923c79ab493a9265930a45800391329571a

commit b73ec923c79ab493a9265930a45800391329571a
Author: Aurelien Jarno <aurelien@aurel32.net>
Date:   Tue Aug 2 09:18:59 2016 +0200

    alpha: fix trunc for big input values

    The alpha specific version of trunc and truncf always add and subtract 0x1.0p23 or 0x1.0p52 even for big values. This causes this kind of errors in the testsuite:

      Failure: Test: trunc_towardzero (0x1p107)
      Result:
       is:          1.6225927682921334e+32   0x1.fffffffffffffp+106
       should be:   1.6225927682921336e+32   0x1.0000000000000p+107
       difference:  1.8014398509481984e+16   0x1.0000000000000p+54
       ulp       :  0.5000
       max.ulp   :  0.0000

    Change this by returning the input value when its absolute value is greater than 0x1.0p23 or 0x1.0p52. NaN have to go through the add and subtract operations to get possibly silenced.

    Finally remove the code to handle inexact exception, trunc should never generate such an exception.

    Changelog:
        * sysdeps/alpha/fpu/s_trunc.c (__trunc): Return the input value when its absolute value is greater than 0x1.0p52.
        [_IEEE_FP_INEXACT] Remove.
        * sysdeps/alpha/fpu/s_truncf.c (__truncf): Return the input value when its absolute value is greater than 0x1.0p23.
        [_IEEE_FP_INEXACT] Remove.

    (cherry picked from commit b74d259fe793499134eb743222cd8dd7c74a31ce)
    (cherry picked from commit e6eab16cc302e6c42f79e1af02ce98ebb9a783bc)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=04c5f782796052de9d06975061eb3376ccbcbdb1

commit 04c5f782796052de9d06975061eb3376ccbcbdb1
Author: Aurelien Jarno <aurelien@aurel32.net>
Date:   Tue Aug 2 09:18:59 2016 +0200

    alpha: fix rint on sNaN input

    The alpha version of rint wrongly return sNaN for sNaN input. Fix that by checking for NaN and by returning the input value added with itself in that case.

    Changelog:
        * sysdeps/alpha/fpu/s_rint.c (__rint): Add argument with itself when it is a NaN.
        * sysdeps/alpha/fpu/s_rintf.c (__rintf): Likewise.

    (cherry picked from commit cb7f9d63b921ea1a1cbb4ab377a8484fd5da9a2b)
    (cherry picked from commit 8eb9a92e0522f2d4f2d4167df919d066c85d3408)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9b34c1494d8e61bb3d718e2ea83b856030476737

commit 9b34c1494d8e61bb3d718e2ea83b856030476737
Author: Aurelien Jarno <aurelien@aurel32.net>
Date:   Tue Aug 2 09:18:59 2016 +0200

    alpha: fix floor on sNaN input

    The alpha version of floor wrongly return sNaN for sNaN input. Fix that by checking for NaN and by returning the input value added with itself in that case.

    Finally remove the code to handle inexact exception, floor should never generate such an exception.

    Changelog:
        * sysdeps/alpha/fpu/s_floor.c (__floor): Add argument with itself when it is a NaN.
        [_IEEE_FP_INEXACT] Remove.
        * sysdeps/alpha/fpu/s_floorf.c (__floorf): Likewise.
    (cherry picked from commit 65cc568cf57156e5230db9a061645e54ff028a41)
    (cherry picked from commit 1912cc082df4739c2388c375f8d486afdaa7d49b)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=2afb8a945ddc104c5ef9aa61f32427c19b681232

commit 2afb8a945ddc104c5ef9aa61f32427c19b681232
Author: Aurelien Jarno <aurelien@aurel32.net>
Date:   Tue Aug 2 09:18:59 2016 +0200

    alpha: fix ceil on sNaN input

    The alpha version of ceil wrongly return sNaN for sNaN input. Fix that by checking for NaN and by returning the input value added with itself in that case.

    Finally remove the code to handle inexact exception, ceil should never generate such an exception.

    Changelog:
        * sysdeps/alpha/fpu/s_ceil.c (__ceil): Add argument with itself when it is a NaN.
        [_IEEE_FP_INEXACT] Remove.
        * sysdeps/alpha/fpu/s_ceilf.c (__ceilf): Likewise.

    (cherry picked from commit 062e53c195b4a87754632c7d51254867247698b4)
    (cherry picked from commit 3eff6f84311d2679a58a637e3be78b4ced275762)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=df13b9c22a0fb690a0ab9dd4af163ae3c459d975

commit df13b9c22a0fb690a0ab9dd4af163ae3c459d975
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Sep 6 08:50:55 2016 -0700

    X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]

    (cherry picked from commit fb0f7a6755c1bfaec38f490fbfcaa39a66ee3604)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b4391b0c7def246a4503db1af683122681c12a56

commit b4391b0c7def246a4503db1af683122681c12a56
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Sep 6 08:50:55 2016 -0700

    X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]

    (cherry picked from commit fb0f7a6755c1bfaec38f490fbfcaa39a66ee3604)
    (cherry picked from commit 4b8790c81c1a7b870a43810ec95e08a2e501123d)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0d5f4a32a34f048b35360a110a0e6d1c87e3eced

commit 0d5f4a32a34f048b35360a110a0e6d1c87e3eced
Author: Aurelien Jarno <aurelien@aurel32.net>
Date:   Thu Nov 24 12:10:13 2016 +0100

    x86_64: fix static build of __memcpy_chk for compilers defaulting to PIC/PIE

    When glibc is compiled with gcc 6.2 that has been configured to default to PIC/PIE, the static version of __memcpy_chk is not built, as the test is done on PIC instead of SHARED. Fix the test to check for SHARED, like it is done for similar functions like memmove_chk.

    Changelog:
        * sysdeps/x86_64/memcpy_chk.S (__memcpy_chk): Check for SHARED instead of PIC.
(cherry picked from commit 380ec16d62f459d5a28cfc25b7b20990c45e1cc9)
(cherry picked from commit 2d16e81babd1d7b66d10cec0bc6d6d86a7e0c95e)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0ab02a62e42e63b058e7a4e160dbe51762ef2c46

commit 0ab02a62e42e63b058e7a4e160dbe51762ef2c46
Author: Maciej W. Rozycki <macro@imgtec.com>
Date: Thu Nov 17 19:15:51 2016 +0000

MIPS: Add `.insn' to ensure a text label is defined as code not data

Avoid a build error with microMIPS compilation and recent versions of GAS which complain if a branch targets a label which is marked as data rather than microMIPS code:

../sysdeps/mips/mips32/crti.S: Assembler messages:
../sysdeps/mips/mips32/crti.S:72: Error: branch to a symbol in another ISA mode
make[2]: *** [.../csu/crti.o] Error 1

as commit 9d862524f6ae ("MIPS: Verify the ISA mode and alignment of branch and jump targets") closed a hole in branch processing, making relocation calculation respect the ISA mode of the symbol referred. This allowed diagnosing the situation where an attempt is made to pass control from code assembled for one ISA mode to code assembled for a different ISA mode and either relaxing the branch to a cross-mode jump or if that is not possible, then reporting this as an error rather than letting such code build and then fail unpredictably at the run time.

This however requires the correct annotation of branch targets as code, because the ISA mode is not relevant for data symbols and is therefore not recorded for them. The `.insn' pseudo-op is used for this purpose and has been supported by GAS since:

Wed Feb 12 14:36:29 1997 Ian Lance Taylor <ian@cygnus.com>

* config/tc-mips.c (mips_pseudo_table): Add "insn".
(s_insn): New static function.
* doc/c-mips.texi: Document .insn.

so there has been no reason to avoid it where required.
More recently this pseudo-op has been documented, by the microMIPS architecture specification[1][2], as required for the correct interpretation of any code label which is not followed by an actual instruction in an assembly source.

Use it in our crti.S files then, to mark that the trailing label there with no instructions following is indeed not a code bug and the branch is legitimate.

References:

[1] "MIPS Architecture for Programmers, Volume II-B: The microMIPS32 Instruction Set", MIPS Technologies, Inc., Document Number: MD00582, Revision 5.04, January 15, 2014, Section 7.1 "Assembly-Level Compatibility", p. 533

[2] "MIPS Architecture for Programmers, Volume II-B: The microMIPS64 Instruction Set", MIPS Technologies, Inc., Document Number: MD00594, Revision 5.04, January 15, 2014, Section 8.1 "Assembly-Level Compatibility", p. 623

2016-11-23 Matthew Fortune <Matthew.Fortune@imgtec.com>
Maciej W. Rozycki <macro@imgtec.com>

* sysdeps/mips/mips32/crti.S (_init): Add `.insn' pseudo-op at `.Lno_weak_fn' label.
* sysdeps/mips/mips64/n32/crti.S (_init): Likewise.
* sysdeps/mips/mips64/n64/crti.S (_init): Likewise.

(cherry picked from commit cfaf1949ff1f8336b54c43796d0e2531bc8a40a2)
(cherry picked from commit 65a2b63756a4d622b938910d582d8b807c471c9a)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=901db98f36690e4743feefd985c6ba2d7fd19813

commit 901db98f36690e4743feefd985c6ba2d7fd19813
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Mon Nov 21 11:06:15 2016 -0200

Fix writes past the allocated array bounds in execvpe (BZ#20847)

This patch fixes an invalid write out of a stack allocated buffer in 2 places in the execvpe implementation:

1. In the 'maybe_script_execute' function, which allocates the new argument list but does not account for the fact that a minimum of argc plus 3 elements (default shell path, script name, arguments, and ending null pointer) should be considered. The straightforward fix is just to take account of the correct list size on argument copy.

2.
On '__execvpe' where the executable file name length may not account for the ending '\0', and thus subsequent path creation may write past the array bounds because it needs to add the terminating null. The fix is to change how the executable name size is calculated so that it includes the final '\0', and to adjust the rest of the code accordingly.

As described in GCC bug report 78433 [1], these issues were masked off by GCC because it allocated several bytes more than necessary so that many off-by-one bugs went unnoticed.

Checked on x86_64 with a latest GCC (7.0.0 20161121) with -O3 on CFLAGS.

[BZ #20847]
* posix/execvpe.c (maybe_script_execute): Remove write past allocated array bounds.
(__execvpe): Likewise.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78433

(cherry picked from commit d174436712e3cabce70d6cd771f177b6fe0e097b)

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                      |  25 ++++++++++
 posix/execvpe.c                |  15 ++++--
 sysdeps/alpha/fpu/s_ceil.c     |   7 +--
 sysdeps/alpha/fpu/s_ceilf.c    |   7 +--
 sysdeps/alpha/fpu/s_floor.c    |   7 +--
 sysdeps/alpha/fpu/s_floorf.c   |   7 +--
 sysdeps/alpha/fpu/s_rint.c     |   3 +
 sysdeps/alpha/fpu/s_rintf.c    |   3 +
 sysdeps/alpha/fpu/s_trunc.c    |   7 +--
 sysdeps/alpha/fpu/s_truncf.c   |   7 +--
 sysdeps/mips/mips32/crti.S     |   1 +
 sysdeps/mips/mips64/n32/crti.S |   1 +
 sysdeps/mips/mips64/n64/crti.S |   1 +
 sysdeps/x86/cpu-features.c     |  14 +++++
 sysdeps/x86/cpu-features.h     |   6 ++
 sysdeps/x86_64/dl-machine.h    |  24 ++++++++-
 sysdeps/x86_64/dl-trampoline.S |  20 ++++++++
 sysdeps/x86_64/dl-trampoline.h | 104 +++++++++++++++++++++++++++++++++++++++-
 sysdeps/x86_64/memcpy_chk.S    |   2 +-
 19 files changed, 228 insertions(+), 33 deletions(-)
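The two size calculations from the execvpe fix above can be sketched in standalone C. This is not the code from posix/execvpe.c: the helper names build_shell_argv and join_path are hypothetical, and heap allocation is used here instead of the stack buffers the patch deals with, purely to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the argument list "shell script args... NULL"
   used when a script has to be handed to the shell. */
char **build_shell_argv(const char *shell, const char *script,
                        char *const argv[])
{
    size_t argc = 0;
    while (argv[argc] != NULL)
        argc++;

    /* Shell path + script name + at most argc arguments + ending NULL,
       hence argc + 3 slots. */
    char **out = calloc(argc + 3, sizeof *out);
    if (out == NULL)
        return NULL;

    size_t n = 0;
    out[n++] = (char *) shell;
    out[n++] = (char *) script;
    for (size_t i = 1; i < argc; i++)  /* argv[0] is replaced by the script */
        out[n++] = argv[i];
    out[n] = NULL;
    assert(n < argc + 3);              /* the terminator fits in the array */
    return out;
}

/* Hypothetical helper: join one PATH component and a file name. */
char *join_path(const char *dir, size_t dir_len, const char *file)
{
    /* Count the terminating '\0' as part of the file name size. */
    size_t file_size = strlen(file) + 1;

    /* Directory + '/' + file name including its '\0'. */
    char *buf = malloc(dir_len + 1 + file_size);
    if (buf == NULL)
        return NULL;

    memcpy(buf, dir, dir_len);
    buf[dir_len] = '/';
    memcpy(buf + dir_len + 1, file, file_size);
    return buf;
}
```

Folding the terminator into file_size once, rather than adding 1 at each use, is the same idea as the second part of the fix: every later length computation then accounts for the '\0' automatically.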
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The annotated tag, glibc-2.25 has been created at be176490b818b65b5162c332eb6b581690b16e5c (tag) tagging db0242e3023436757bbc7c488a779e6e3343db04 (commit) replaces glibc-2.24 tagged by Siddhesh Poyarekar on Sun Feb 5 21:19:00 2017 +0530 - Log ----------------------------------------------------------------- The GNU C Library ================= The GNU C Library version 2.25 is now available. The GNU C Library is used as *the* C library in the GNU system and in GNU/Linux systems, as well as many other systems that use Linux as the kernel. The GNU C Library is primarily designed to be a portable and high performance C library. It follows all relevant standards including ISO C11 and POSIX.1-2008. It is also internationalized and has one of the most complete internationalization interfaces known. The GNU C Library webpage is at http://www.gnu.org/software/libc/ Packages for the 2.25 release may be downloaded from: http://ftpmirror.gnu.org/libc/ http://ftp.gnu.org/gnu/libc/ The mirror list is at http://www.gnu.org/order/ftp.html NEWS for version 2.25 ===================== * The feature test macro __STDC_WANT_LIB_EXT2__, from ISO/IEC TR 24731-2:2010, is supported to enable declarations of functions from that TR. Note that not all functions from that TR are supported by the GNU C Library. * The feature test macro __STDC_WANT_IEC_60559_BFP_EXT__, from ISO/IEC TS 18661-1:2014, is supported to enable declarations of functions and macros from that TS. Note that not all features from that TS are supported by the GNU C Library. * The feature test macro __STDC_WANT_IEC_60559_FUNCS_EXT__, from ISO/IEC TS 18661-4:2015, is supported to enable declarations of functions and macros from that TS. Note that most features from that TS are not supported by the GNU C Library. 
* The nonstandard feature selection macros _REENTRANT and _THREAD_SAFE are now treated as compatibility synonyms for _POSIX_C_SOURCE=199506L. Since the GNU C Library defaults to a much newer revision of POSIX, this will only affect programs that specifically request an old conformance mode. For instance, a program compiled with -std=c89 -D_REENTRANT will see a change in the visible declarations, but a program compiled with just -D_REENTRANT, or -std=c99 -D_POSIX_C_SOURCE=200809L -D_REENTRANT, will not. Some C libraries once required _REENTRANT and/or _THREAD_SAFE to be defined by all multithreaded code, but glibc has not required this for many years. * The inclusion of <sys/sysmacros.h> by <sys/types.h> is deprecated. This means that in a future release, the macros “major”, “minor”, and “makedev” will only be available from <sys/sysmacros.h>. These macros are not part of POSIX nor XSI, and their names frequently collide with user code; see for instance glibc bug 19239 and Red Hat bug 130601. <stdlib.h> includes <sys/types.h> under _GNU_SOURCE, and C++ code presently cannot avoid being compiled under _GNU_SOURCE, exacerbating the problem. * New <fenv.h> features from TS 18661-1:2014 are added to libm: the fesetexcept, fetestexceptflag, fegetmode and fesetmode functions, the femode_t type and the FE_DFL_MODE and FE_SNANS_ALWAYS_SIGNAL macros. 
* Integer width macros from TS 18661-1:2014 are added to <limits.h>: CHAR_WIDTH, SCHAR_WIDTH, UCHAR_WIDTH, SHRT_WIDTH, USHRT_WIDTH, INT_WIDTH, UINT_WIDTH, LONG_WIDTH, ULONG_WIDTH, LLONG_WIDTH, ULLONG_WIDTH; and to <stdint.h>: INT8_WIDTH, UINT8_WIDTH, INT16_WIDTH, UINT16_WIDTH, INT32_WIDTH, UINT32_WIDTH, INT64_WIDTH, UINT64_WIDTH, INT_LEAST8_WIDTH, UINT_LEAST8_WIDTH, INT_LEAST16_WIDTH, UINT_LEAST16_WIDTH, INT_LEAST32_WIDTH, UINT_LEAST32_WIDTH, INT_LEAST64_WIDTH, UINT_LEAST64_WIDTH, INT_FAST8_WIDTH, UINT_FAST8_WIDTH, INT_FAST16_WIDTH, UINT_FAST16_WIDTH, INT_FAST32_WIDTH, UINT_FAST32_WIDTH, INT_FAST64_WIDTH, UINT_FAST64_WIDTH, INTPTR_WIDTH, UINTPTR_WIDTH, INTMAX_WIDTH, UINTMAX_WIDTH, PTRDIFF_WIDTH, SIG_ATOMIC_WIDTH, SIZE_WIDTH, WCHAR_WIDTH, WINT_WIDTH. * New <math.h> features are added from TS 18661-1:2014: - Signaling NaN macros: SNANF, SNAN, SNANL. - Nearest integer functions: roundeven, roundevenf, roundevenl, fromfp, fromfpf, fromfpl, ufromfp, ufromfpf, ufromfpl, fromfpx, fromfpxf, fromfpxl, ufromfpx, ufromfpxf, ufromfpxl. - llogb functions: the llogb, llogbf and llogbl functions, and the FP_LLOGB0 and FP_LLOGBNAN macros. - Max-min magnitude functions: fmaxmag, fmaxmagf, fmaxmagl, fminmag, fminmagf, fminmagl. - Comparison macros: iseqsig. - Classification macros: iscanonical, issubnormal, iszero. - Total order functions: totalorder, totalorderf, totalorderl, totalordermag, totalordermagf, totalordermagl. - Canonicalize functions: canonicalize, canonicalizef, canonicalizel. - NaN functions: getpayload, getpayloadf, getpayloadl, setpayload, setpayloadf, setpayloadl, setpayloadsig, setpayloadsigf, setpayloadsigl. * The functions strfromd, strfromf, and strfroml, from ISO/IEC TS 18661-1:2014, are added to libc. They convert a floating-point number into string. * Most of glibc can now be built with the stack smashing protector enabled. It is recommended to build glibc with --enable-stack-protector=strong. Implemented by Nick Alcock (Oracle). 
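As a small illustration of the TS 18661-1 additions listed above, the following sketch uses strfromd and one of the new integer width macros. It assumes a glibc of version 2.25 or later and requests the declarations through the __STDC_WANT_IEC_60559_BFP_EXT__ feature test macro mentioned in these notes; the helper names format_double and uint_width_matches are hypothetical.

```c
/* Request the TS 18661-1 declarations before including any header. */
#define __STDC_WANT_IEC_60559_BFP_EXT__ 1
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: format VALUE with three fractional digits.
   Returns 0 on success, -1 if the result did not fit in BUF. */
int format_double(char *buf, size_t size, double value)
{
    assert(buf != NULL && size > 0);
    /* strfromd returns the length the full result needs, like snprintf. */
    int needed = strfromd(buf, size, "%.3f", value);
    return (needed >= 0 && (size_t) needed < size) ? 0 : -1;
}

/* Hypothetical helper: check UINT_WIDTH against the storage size.
   The two agree on targets where unsigned int has no padding bits. */
int uint_width_matches(void)
{
    return UINT_WIDTH == (int) (sizeof(unsigned int) * CHAR_BIT);
}
```

Unlike snprintf, strfromd accepts only a single conversion with an optional precision and does not depend on variadic argument promotion, which is why the format string here is limited to "%.3f".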
* The function explicit_bzero, from OpenBSD, has been added to libc. It is intended to be used instead of memset() to erase sensitive data after use; the compiler will not optimize out calls to explicit_bzero even if they are "unnecessary" (in the sense that no _correct_ program can observe the effects of the memory clear). * On ColdFire, MicroBlaze, Nios II and SH3, the float_t type is now defined to float instead of double. This does not affect the ABI of any libraries that are part of the GNU C Library, but may affect the ABI of other libraries that use this type in their interfaces. * On x86_64, when compiling with -mfpmath=387 or -mfpmath=sse+387, the float_t and double_t types are now defined to long double instead of float and double. These options are not the default, and this does not affect the ABI of any libraries that are part of the GNU C Library, but it may affect the ABI of other libraries that use this type in their interfaces, if they are compiled or used with those options. * The getentropy and getrandom functions, and the <sys/random.h> header file have been added. * The buffer size for byte-oriented stdio streams is now limited to 8192 bytes by default. Previously, on Linux, the default buffer size on most file systems was 4096 bytes (and thus remains unchanged), except on network file systems, where the buffer size was unpredictable and could be as large as several megabytes. * The <sys/quota.h> header now includes the <linux/quota.h> header. Support for the Linux quota interface which predates kernel version 2.4.22 has been removed. * The malloc_get_state and malloc_set_state functions have been removed. Already-existing binaries that dynamically link to these functions will get a hidden implementation in which malloc_get_state is a stub. As far as we know, these functions are used only by GNU Emacs and this change will not adversely affect already-built Emacs executables. 
Any undumped Emacs executables, which normally exist only during an Emacs build, should be rebuilt by re-running “./configure; make” in the Emacs build tree. * The “ip6-dotint” and “no-ip6-dotint” resolver options, and the corresponding RES_NOIP6DOTINT flag from <resolv.h> have been removed. “no-ip6-dotint” had already been the default, and support for the “ip6-dotint” option was removed from the Internet in 2006. * The "ip6-bytestring" resolver option and the corresponding RES_USEBSTRING flag from <resolv.h> have been removed. The option relied on a backwards-incompatible DNS extension which was never deployed on the Internet. * The flags RES_AAONLY, RES_PRIMARY, RES_NOCHECKNAME, RES_KEEPTSIG, RES_BLAST defined in the <resolv.h> header file have been deprecated. They were already unimplemented. * The "inet6" option in /etc/resolv.conf and the RES_USE_INET6 flag for _res.flags are deprecated. The flag was standardized in RFC 2133, but removed again from the IETF name lookup interface specification in RFC 2553. Applications should use getaddrinfo instead. * DNSSEC-related declarations and definitions have been removed from the <arpa/nameser.h> header file, and libresolv will no longer attempt to decode the data part of DNSSEC record types. Previous versions of glibc only implemented minimal support for the previous version of DNSSEC, which is incompatible with the currently deployed version. * The resource record type classification macros ns_t_qt_p, ns_t_mrr_p, ns_t_rr_p, ns_t_udp_p, ns_t_xfr_p have been removed from the <arpa/nameser.h> header file because the distinction between RR types and meta-RR types is not officially standardized, subject to revision, and thus not suitable for encoding in a macro. * The types res_sendhookact, res_send_qhook, re_send_rhook, and the qhook and rhook members of the res_state type in <resolv.h> have been removed. The glibc stub resolver did not support these hooks, but the header file did not reflect that. 
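The getrandom and explicit_bzero additions announced earlier in these notes are typically used together when handling key material. The following is a minimal sketch, assuming glibc 2.25 or later on Linux; the helper names fill_key and wipe_key are hypothetical and not part of the library.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>      /* explicit_bzero */
#include <sys/random.h>  /* getrandom */
#include <sys/types.h>   /* ssize_t */

/* Hypothetical helper: fill KEY with LEN random bytes.
   Returns 0 on success, -1 on error (errno is set by getrandom). */
int fill_key(unsigned char *key, size_t len)
{
    assert(key != NULL);
    size_t done = 0;
    while (done < len) {
        /* getrandom may return fewer bytes than requested. */
        ssize_t n = getrandom(key + done, len - done, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, retry */
            return -1;
        }
        done += (size_t) n;
    }
    return 0;
}

/* Hypothetical helper: erase the key after use.  Unlike memset, the
   call to explicit_bzero is not optimized away by the compiler. */
void wipe_key(unsigned char *key, size_t len)
{
    explicit_bzero(key, len);
}
```

The retry loop covers both short reads and EINTR; for small requests neither is expected in practice, but handling them keeps the helper correct for any length.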
* For multi-arch support it is recommended to use a GCC which has been built with support for GNU indirect functions. This ensures that correct debugging information is generated for functions selected by IFUNC resolvers. This support can either be enabled by configuring GCC with '--enable-gnu-indirect-function', or by enabling it by default by setting 'default_gnu_indirect_function' variable for a particular architecture in the GCC source file 'gcc/config.gcc'. * GDB pretty printers have been added for mutex and condition variable structures in POSIX Threads. When installed and loaded in gdb these pretty printers show various pthread variables in human-readable form when read using the 'print' or 'display' commands in gdb. * Tunables feature added to allow tweaking of the runtime for an application program. This feature can be enabled with the '--enable-tunables' configure flag. The GNU C Library manual has details on usage and README.tunables has instructions on adding new tunables to the library. * A new version of condition variables functions have been implemented in the NPTL implementation of POSIX Threads to provide stronger ordering guarantees. * A new version of pthread_rwlock functions have been implemented to use a more scalable algorithm primarily through not using a critical section anymore to make state changes. Security related changes: * On ARM EABI (32-bit), generating a backtrace for execution contexts which have been created with makecontext could fail to terminate due to a missing .cantunwind annotation. This has been observed to lead to a hang (denial of service) in some Go applications compiled with gccgo. Reported by Andreas Schwab. (CVE-2016-6323) * The DNS stub resolver functions would crash due to a NULL pointer dereference when processing a query with a valid DNS question type which was used internally in the implementation. The stub resolver now uses a question type which is outside the range of valid question type values. 
(CVE-2015-5180) Contributors ============ This release was made possible by the contributions of many people. The maintainers are grateful to everyone who has contributed changes or bug reports. These include: Adhemerval Zanella Alan Modra Alexandre Oliva Andreas Schwab Andrew Senkevich Aurelien Jarno Brent W. Baccala Carlos O'Donell Chris Metcalf Chung-Lin Tang DJ Delorie David S. Miller Denis Kaganovich Dmitry V. Levin Ernestas Kulik Florian Weimer Gabriel F T Gomes Gabriel F. T. Gomes H.J. Lu Jakub Jelinek James Clarke James Greenhalgh Jim Meyering John David Anglin Joseph Myers Maciej W. Rozycki Mark Wielaard Martin Galvan Martin Pitt Mike Frysinger Märt Põder Nick Alcock Paul E. Murphy Paul Murphy Rajalakshmi Srinivasaraghavan Rasmus Villemoes Rical Jasan Richard Henderson Roland McGrath Samuel Thibault Siddhesh Poyarekar Stefan Liebler Steve Ellcey Svante Signell Szabolcs Nagy Tom Tromey Torvald Riegel Tulio Magno Quites Machado Filho Wilco Dijkstra Yury Norov Zack Weinberg -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJYl0mTAAoJEHnEPfvxzyGHXTgH/jsS205Wdz9EniZrJ6+NXCm1 F/eeOMotGNv82BYaLRnw9XrF7p6+ND8E+7rSvFZT5O309OrdLjg4QG6M63COMRCh 6KKtQUM/00I1u4AYkOOgrUkor3m58GgeQUziOxXNvQNoU8zLguPk4kzVsvxq6lJR /IROH2Mfl1AggOGq9Y1R/0uQCpj4jJSLETxJupg4calGPZQW3isogucSmogdccAB Bqso7L40Xo4LJnEoD7JurlMrP5x043TttmTyvnFTtxRZTAHVjyQpFMKHaSkMgtIG +fe26Ua3oMqbE9A9G3qiMIrPEqu+0tWKbvci0FeaE30vfI6YtVcd8I0RlBW9gok= =3NM3 -----END PGP SIGNATURE----- Adhemerval Zanella (69): Fix test-skeleton C99 designed initialization nptl: Consolidate sem_open implementations nptl: Set sem_open as a non cancellation point (BZ #15765) nptl: Remove sparc sem_wait nptl: Fix sem_wait and sem_timedwait cancellation (BZ#18243) rt: Set shm_open as a non cancellation point (BZ #18243) nptl: Consolidate sem_init implementations posix: Correctly enable/disable cancellation on Linux posix_spawn posix: Correctly block/unblock all signals on Linux posix_spawn Add INTERNAL_SYSCALL_CALL posix: Fix open file action for 
posix_spawn on Linux Remove C++ style comments from string3.h libio: Multiple fixes for open_{w}memstram (BZ#18241 and BZ#20181) Fix tst-memstream3 build failure Consolidate fallocate{64} implementations Consolidate posix_fallocate{64} implementations Consolidate posix_fadvise implementations Fix iseqsig for ports that do not support FE_INVALID Consolidate Linux sync_file_range implementations Fix posix_fadvise64 build on mips64n64 Fix Linux fallocate tests for EOPNOTSUPP Fix Linux sh4 pread/pwrite argument passing Fix sparc build due missing __WORDSIZE_TIME64_COMPAT32 definition Consolidate lseek/lseek64/llseek implementations Consolidate Linux ftruncate implementations Consolidate Linux truncate implementations Consolidate Linux access implementation Fix sh4 build with __ASSUME_ST_INO_64_BIT redefinition New internal function __access_noerrno Consolidate Linux setrlimit and getrlimit implementation Fix hurd __access_noerrno implementation. Fix writes past the allocated array bounds in execvpe (BZ#20847) Remove cached PID/TID in clone powerpc: Remove stpcpy internal clash with IFUNC powerpc: Remove stpcpy internal clash with IFUNC Fix writes past the allocated array bounds in execvpe (BZ#20847) Consolidate rename Linux implementation Consolidate renameat Linux implementation Fix powerpc64/power7 memchr for large input sizes Fix typos and missing closing bracket in test-memchr.c Adjust benchtests to new support library. 
benchtests: Add fmax/fmin benchmarks benchtests: Add fmaxf/fminf benchmarks Fix x86_64 memchr for large input sizes powerpc: Remove f{max,min}{f} assembly implementations Add __ASSUME_DIRECT_SYSVIPC_SYSCALL for Linux Refactor Linux ipc_priv header Consolidate Linux msgctl implementation Consolidate Linux msgrcv implementation Use msgsnd syscall for Linux implementation Use msgget syscall for Linux implementation Add SYSV message queue test Consolidate Linux semctl implementation Use semget syscall for Linux implementation Use semop syscall for Linux implementation Consolidate Linux semtimedop implementation Add SYSV semaphore test Use shmat syscall for Linux implementation Consolidate Linux shmctl implementation Use shmdt syscall for linux implementation Use shmget syscall for linux implementation Add SYSV shared memory test Fix i686 memchr for large input sizes Fix test-sysvsem on some platforms Fix x86 strncat optimized implementation for large sizes Remove duplicate strcat implementations Use fortify macros for b{zero,copy} along decl from strings.h Move fortified explicit_bzero back to string3 Add missing bugzilla reference in previous ChangeLog entry Alan Modra (1): powerpc32: make PLT call in _mcount compatible with -msecure-plt (bug 20554) Alexandre Oliva (2): [PR19826] fix non-LE TLS in static programs Bug 20915: Do not initialize DTV of other threads. Andreas Schwab (11): arm: mark __startcontext as .cantunwind (bug 20435) Properly initialize glob structure with GLOB_BRACE|GLOB_DOOFFS (bug 20707) Fix multiple definitions of mk[o]stemp[s]64 Get rid of __elision_available Fix testsuite timeout handling powerpc: remove _dl_platform_string and _dl_powerpc_platforms Fix assertion failure on test timeout Fix ChangeLog typo Revert "Fix ChangeLog typo" m68k: fix 64bit atomic ops Fix missing test dependency Andrew Senkevich (4): x86_64: Call finite scalar versions in vectorized log, pow, exp (bz #20033). Install libm.a as linker script (bug 20539). 
Better design of libm.a installation rule. Disable TSX on some Haswell processors. Aurelien Jarno (14): alpha: fix ceil on sNaN input alpha: fix floor on sNaN input alpha: fix rint on sNaN input alpha: fix trunc for big input values powerpc: fix ifunc-sel.h with GCC 6 powerpc: fix ifunc-sel.h fix asm constraints and clobber list sparc64: add a VIS3 version of ceil, floor and trunc sparc: build with -mvis on sparc32/sparcv9 and sparc64 sparc: remove fdim sparc specific implementations sparc32/sparcv9: add a VIS3 version of fdim Set NODELETE flag after checking for NULL pointer conform tests: call perl with '-I.' gconv.h: fix build with GCC 7 x86_64: fix static build of __memcpy_chk for compilers defaulting to PIC/PIE Brent W. Baccala (1): hurd: Fix spurious port deallocation Carlos O'Donell (17): Open development for 2.25. Update PO files. Bug 20292 - Simplify and test _dl_addr_inside_object Bug 20689: Fix FMA and AVX2 detection on Intel Fix atomic_fetch_xor_release. Add missing include for stdlib.h. Fix building tst-linkall-static. Add include/crypt.h. Bug 20729: Fix building with -Os. Bug 20729: Include libc-internal.h where required. Bug 20729: Fix build failures on ppc64 and other arches. Remove out of date PROJECTS file. Bug 20918 - Building with --enable-nss-crypt fails tst-linkall-static Bug 11941: ld.so: Improper assert map->l_init_called in dlclose Add deferred cancellation regression test for getpwuid_r. Fix failing pretty printer tests when CPPFLAGS has optimizations. Bug 20116: Fix use after free in pthread_create() Chris Metcalf (6): Make sure tilepro uses kernel atomics fo atomic_store Make tile's set_dataplane API compatibility-only tile: create new math-tests.h header build-many-glibcs: Revert -fno-isolate-erroneous-paths options for tilepro tile: pass __IPC_64 as zero for SysV IPC calls tile: Check for pointer add overflow in memchr Chung-Lin Tang (1): Add ipc_priv.h header for Nios II to set __IPC_64 to zero. 
DJ Delorie (1): * elf/dl-tunables.c (tunable_set_val_if_valid_range): Split into ... David S. Miller (4): Fix wide-char testsuite SIGBUS on platforms such as Sparc. Fix sNaN handling in nearbyint on 32-bit sparc. Fix a sparc header conformtest failure. sparc: Remove optimized math routines which cause testsuite failures. Denis Kaganovich (1): configure: accept __stack_chk_fail_local for ssp support too [BZ #20662] Dmitry V. Levin (1): Fix typos in the spelling of "implementation" Ernestas Kulik (1): localedata: lt_LT: use hyphens in d_fmt [BZ #20497] Florian Weimer (100): malloc: Preserve arena free list/thread count invariant [BZ #20370] malloc: Run tests without calling mallopt [BZ #19469] Add support for referencing specific symbol versions elf: dl-minimal malloc needs to respect fundamental alignment elf: Avoid using memalign for TLS allocations [BZ #17730] elf: Do not use memalign for TCB/TLS blocks allocation [BZ #17730] x86: Use sysdep.o from libc.a in static libraries Add missing reference to bug 20452 nptl/tst-tls3-malloc: Force freeing of thread stacks Add NEWS entry for CVE-2016-6323 Add CVE-2016-6323 missing from NEWS entry Do not override objects in libc.a in other static libraries [BZ #20452] nptl/tst-once5: Reduce time to expected failure argp: Do not override GCC keywords with macros [BZ #16907] string: More tests for strcmp, strcasecmp, strncmp, strncasecmp nptl: Avoid expected SIGALRM in most tests [BZ #20432] Correct incorrect bug number in changelog malloc: Simplify static malloc interposition [BZ #20432] Base <sys/quota.h> on Linux kernel headers [BZ #20525] vfprintf: Avoid creating a VLA which complicates stack management vfscanf: Avoid multiple reads of multi-byte character width malloc: Automated part of conversion to __libc_lock resolv: Remove _LIBC_REENTRANT Remove the ptw-% patterns inet: Add __inet6_scopeid_pton function [BZ #20611] sysd-rules: Cut down the number of rtld-% pattern rules Remove remnants of .og patterns sln: Preprocessor 
cleanups Generate .op pattern rules for profiling builds only Avoid running $(CXX) during build to obtain header file paths Add test case for O_TMPFILE handling in open, openat manual: Clarify the documentation of strverscmp [BZ #20524] Remove obsolete DNSSEC support [BZ #20591] resolv: Remove the BIND_4_COMPAT macro <arpa/nameser.h>, <arpa/nameser_compat.h>: Remove versions <arpa/nameser.h>: Remove RR type classification macros [BZ #20592] malloc: Manual part of conversion to __libc_lock resolv: Remove unsupported hook functions from the API [BZ #20016] test-skeleton.c: Remove unintended #include <stdarg.h>. tst-open-tmpfile: Add checks for open64, openat64, linkat manual: Clarify NSS error reporting resolv: Deprecate unimplemented flags resolv: Remove RES_NOIP6DOTINT and its implementation resolv: Remove RES_USEBSTRING and its implementation [BZ #20629] resolv: Compile without -Wno-write-strings math: Define iszero as a function template for C++ [BZ #20715] math.h: Wrap C++ bits in extern "C++" iconv: Avoid writable data and relocations in IBM charsets iconv: Avoid writable data and relocations in ISO646 malloc: Remove malloc_get_state, malloc_set_state [BZ #19473] malloc: Use accessors for chunk metadata access sysmalloc: Initialize previous size field of mmaped chunks Add test for linking against most static libraries i386: Support CFLAGS which imply -fno-omit-frame-pointer [BZ #20729] crypt: Use internal names for the SHA-2 block functions malloc: Update comments about chunk layout nptl: Document the reason why __kind in pthread_mutex_t is part of the ABI s390x: Add hidden definition for __sigsetjmp elf: Assume TLS is initialized in _dl_map_object_from_fd powerpc: Remove unintended __longjmp symbol from ABI powerpc: Add hidden definition for __sigsetjmp gconv: Adjust GBK to support the Euro sign libio: Limit buffer size to 8192 bytes [BZ #4099] Implement _dl_catch_error, _dl_signal_error in libc.so [BZ #16628] ld.so: Remove __libc_memalign aarch64: Use 
explicit offsets in _dl_tlsdesc_dynamic elf/tst-tls-manydynamic: New test support: Introduce new subdirectory for test infrastructure inet: Make IN6_IS_ADDR_UNSPECIFIED etc. usable with POSIX [BZ #16421] debug: Additional compiler barriers for backtrace tests [BZ #20956] Add getentropy, getrandom, <sys/random.h> [BZ #17252] Expose linking against libsupport as make dependency nptl/tst-cancel7: Add missing case label Add missing bug number to ChangeLog Do not require memset elimination in explicit_bzero test Remove unused function _dl_tls_setup scripts/test_printers_common.py: Log GDB error message rpcinfo: Remove traces of unbuilt helper program sunrpc: Always obtain AF_INET addresses from NSS [BZ #20964] resolv: Remove processing of unimplemented "spoof" host.conf options Declare getentropy in <unistd.h> [BZ #17252] support: Add support for delayed test failure reporting Add file missing from ChangeLog in previous commit Fix various typos in the ChangeLog resolv: Turn historic name lookup functions into compat symbols getentropy: Declare it in <unistd.h> for __USE_MISC [BZ #17252] support: Helper functions for entering namespaces support: Use support_record_failure consistently support: Implement --verbose option for test programs resolv: Add beginnings of a libresolv test suite resolv: Deprecate the "inet6" option and RES_USE_INET6 [BZ #19582] resolv: Deprecate RES_BLAST tunables: Use correct unused attribute CVE-2015-5180: resolv: Fix crash with internal QTYPE [BZ #18784] Update DNS RR type definitions [BZ #20593] malloc: Run tunables tests only if tunables are enabled support: Use %td for pointer difference in xwrite support: struct netent portability fix for support_format_netent string/tst-strcoll-overflow: Do not accept timeout as test result nptl: Add tst-robust-fork Gabriel F T Gomes (1): Fix warning caused by unused-result in bug-atexit3-lib.cc Gabriel F. T. 
Gomes (10):
      Add strfromd, strfromf, and strfroml functions
      Use read_int in vfscanf
      Use write_message instead of write
      Write messages to stdout and use write_message instead of write
      Make w_log1p type-generic
      Fix arg used as litteral suffix in tst-strfrom.h
      Make w_scalbln type-generic
      Replace use of snprintf with strfrom in libm tests
      Fix typo in manual for iseqsig
      Move wrappers to libm-compat-calls-auto

H.J. Lu (8):
      X86: Change bit_YMM_state to (1 << 2)
      X86-64: Correct CFA in _dl_runtime_resolve
      X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]
      X86: Don't assert on older Intel CPUs [BZ #20647]
      Check IFUNC definition in unrelocated shared library [BZ #20019]
      X86_64: Don't use PLT nor GOT in static archives [BZ #20750]
      Add VZEROUPPER to memset-vec-unaligned-erms.S [BZ #21081]
      Allow IFUNC relocation against unrelocated shared library

Jakub Jelinek (1):
      * soft-fp/op-common.h (_FP_MUL, _FP_FMA, _FP_DIV): Add

James Clarke (1):
      Bug 21053: sh: Reduce namespace pollution from sys/ucontext.h

James Greenhalgh (1):
      [soft-fp] Add support for various half-precision conversion routines.

Jim Meyering (1):
      assert.h: allow gcc to detect assert(a = 1) errors

John David Anglin (1):
      hppa: Optimize atomic_compare_and_exchange_val_acq

Joseph Myers (181):
      Support __STDC_WANT_LIB_EXT2__ feature test macro.
      Define PF_QIPCRTR, AF_QIPCRTR from Linux 4.7 in bits/socket.h.
      Define UDP_ENCAP_* from Linux 4.7 in netinet/udp.h.
      Support __STDC_WANT_IEC_60559_BFP_EXT__ feature test macro.
      Fix typo in last arith.texi change.
      Support __STDC_WANT_IEC_60559_FUNCS_EXT__ feature test macro.
      Also handle __STDC_WANT_IEC_60559_BFP_EXT__ in <tgmath.h>.
      Do not call __nan in scalb functions.
      Fix math.h comment about bits/mathdef.h.
      Add tests for fegetexceptflag, fesetexceptflag.
      Fix powerpc fesetexceptflag clearing FE_INVALID (bug 20455).
      Fix test-fexcept when "inexact" implicitly raised.
      Add comment from sysdeps/powerpc/fpu/fraiseexcpt.c to fsetexcptflg.c.
      Add fesetexcept.
      Add fesetexcept: aarch64.
      Add fesetexcept: alpha.
      Add fesetexcept: arm.
      Add fesetexcept: hppa.
      Add fesetexcept: ia64.
      Add fesetexcept: m68k.
      Add fesetexcept: mips.
      Add fesetexcept: powerpc.
      Add fesetexcept: s390.
      Add fesetexcept: sh.
      Add fesetexcept: sparc.
      Fix soft-fp extended.h unpacking (GCC bug 77265).
      Add fetestexceptflag.
      Add femode_t functions.
      Add femode_t functions: aarch64.
      Add femode_t functions: alpha.
      Add femode_t functions: arm.
      Add femode_t functions: hppa.
      Add femode_t functions: ia64.
      Add femode_t functions: m68k.
      Add femode_t functions: mips.
      Add femode_t functions: powerpc.
      Add femode_t functions: s390.
      Add femode_t functions: sh.
      Add femode_t functions: sparc.
      Add e500 version of fetestexceptflag.
      Add <limits.h> integer width macros.
      Add <stdint.h> integer width macros.
      Add issubnormal.
      Add iszero.
      Fix iszero for excess precision.
      Add iscanonical.
      Fix ldbl-128ibm iscanonical for -mlong-double-64.
      Use __builtin_fma more in dbl-64 code.
      Add TCP_REPAIR_WINDOW from Linux 4.8.
      Fix LONG_WIDTH, ULONG_WIDTH include ordering issue.
      Add iseqsig.
      Make iseqsig handle excess precision.
      Avoid M_NAN + M_NAN in complex functions.
      Add totalorder, totalorderf, totalorderl.
      Add more totalorder tests.
      Clean up some complex functions raising FE_INVALID.
      Add totalordermag, totalordermagf, totalordermagl.
      Define HIGH_ORDER_BIT_IS_SET_FOR_SNAN to 0 or 1.
      Add getpayload, getpayloadf, getpayloadl.
      Stop powerpc copysignl raising "invalid" for sNaN argument (bug 20718).
      Use VSQRT instruction for ARM sqrt (bug 20660).
      Use -fno-builtin for sqrt benchmark.
      Fix cmpli usage in power6 memset.
      Add getpayloadl to libnldbl.
      Add canonicalize, canonicalizef, canonicalizel.
      Make strtod raise "inexact" exceptions (bug 19380).
      Add SNAN, SNANF, SNANL macros.
      Correct clog10 documentation (bug 19673).
      Fix linknamespace parallel test failures.
      Handle tilegx* machine names.
      Add localplt.data for MIPS.
      XFAIL check-execstack for MIPS.
      Make MIPS <sys/user.h> self-contained.
      Do not hardcode platform names in manual/libm-err-tab.pl (bug 14139).
      Fix alpha sqrt fegetenv namespace (bug 20768).
      Handle tests-unsupported if run-built-tests = no.
      Do not generate UNRESOLVED results for run-built-tests = no.
      Make check-installed-headers.sh ignore sys/sysctl.h for x32.
      Update nios2 localplt.data.
      Update alpha localplt.data.
      Add localplt.data for hppa.
      Add localplt.data for sh.
      Fix rpcgen buffer overrun (bug 20790).
      Refactor some libm type-generic macros.
      Make SH <sys/user.h> self-contained.
      Ignore -Wmaybe-uninitialized in stdlib/bug-getcontext.c.
      Add script to build many glibc configurations.
      Make tilegx32 install libraries in lib32 directories.
      Fix build-many-glibcs.py style issues.
      Make SH ucontext always match current kernels.
      Fix SH4 register-dump.h for soft-float.
      Fix crypt snprintf namespace (bug 20829).
      Enable linknamespace testing for libdl and libcrypt.
      Make Alpha <sys/user.h> self-contained.
      Actually use newly built host libraries in build-many-glibcs.py.
      Quote shell commands in logs from build-many-glibcs.py.
      Add setpayload, setpayloadf, setpayloadl.
      Make build-many-glibcs.py use -fno-isolate-erroneous-paths options for tilepro.
      Fix default float_t definition (bug 20855).
      Fix x86_64 -mfpmath=387 float_t, double_t (bug 20787).
      Fix SH4 FP_ILOGB0 (bug 20859).
      More NEWS entries / fixes for float_t / double_t changes.
      Refactor float_t, double_t information into bits/flt-eval-method.h.
      Make build-many-glibcs.py track component versions requested and used.
      Add setpayloadsig, setpayloadsigf, setpayloadsigl.
      Make build-many-glibcs.py re-exec itself if changed by checkout.
      Make build-many-glibcs.py store more information about builds.
      Do not include asm/cachectl.h in nios2 sys/cachectl.h.
      Fix sysdeps/ia64/fpu/libm-symbols.h for inclusion in testcases.
      Work around IA64 tst-setcontext2.c compile failure.
      Make ilogb wrappers type-generic.
      Refactor FP_FAST_* into bits/fp-fast.h.
      Add build-many-glibcs.py bot-cycle action.
      Make build-many-glibcs.py support running as a bot.
      Refactor FP_ILOGB* out of bits/mathdef.h.
      Add missing hidden_def (__sigsetjmp).
      Make ldbl-128 getpayload, setpayload functions use _Float128.
      Add llogb, llogbf, llogbl.
      Fix pow (qNaN, 0) result with -lieee (bug 20919), remove dead parts of wrappers.
      Fix sysdeps/ieee754 pow handling of sNaN arguments (bug 20916).
      Fix x86_64/x86 powl handling of sNaN arguments (bug 20916).
      Fix hypot sNaN handling (bug 20940).
      Fix typo in last ChangeLog message.
      Add build-many-glibcs.py option to strip installed shared libraries.
      Fix tests-printers handling for cross compiling.
      Use Linux 4.9 (headers) in build-many-glibcs.py.
      Add [BZ #19398] marker to ChangeLog entry.
      Include <linux/falloc.h> in bits/fcntl-linux.h.
      Refactor long double information into bits/long-double.h.
      Fix generic fmax, fmin sNaN handling (bug 20947).
      Fix powerpc fmax, fmin sNaN handling (bug 20947).
      Fix x86, x86_64 fmax, fmin sNaN handling, add tests (bug 20947).
      Make build-many-glibcs.py flush stdout before execv.
      Define FE_SNANS_ALWAYS_SIGNAL.
      Document sNaN argument error handling.
      Add fmaxmag, fminmag functions.
      Add preprocessor indentation for llogb macro in tgmath.h.
      Add roundeven, roundevenf, roundevenl.
      Update miscellaneous files from upstream sources.
      Fix nss_nisplus build with mainline GCC (bug 20978).
      Update NEWS feature test macro description of TS 18661-1 support.
      Fix tst-support_record_failure-2 for run-built-tests = no.
      Define __intmax_t, __uintmax_t in bits/types.h.
      Add fromfp functions.
      Update copyright dates with scripts/update-copyrights.
      Update copyright dates not handled by scripts/update-copyrights.
      Update config.guess and config.sub to current versions.
      Make build-many-glibcs.py use binutils 2.28 branch by default.
      Correct MIPS math-tests.h condition for sNaN payload preservation.
      Fix math/test-nearbyint-except for no-exceptions configurations.
      Add build-many-glibcs.py powerpc-linux-gnu-power4 build.
      Fix MIPS n32 lseek, lseek64 (bug 21019).
      Fix elf/tst-ldconfig-X for cross testing.
      Fix math/test-fenvinline for no-exceptions configurations.
      Update i386 libm-test-ulps.
      Fix MicroBlaze __backtrace get_frame_size namespace (bug 21022).
      Make MIPS soft-fp preserve NaN payloads for NAN2008.
      Fix MicroBlaze bits/setjmp.h for C++.
      Update libm-test XFAILs for ibm128 format.
      Fix malloc/ tests for GCC 7 -Walloc-size-larger-than=.
      Fix string/tester.c for GCC 7 -Wstringop-overflow=.
      Fix MIPS n64 readahead (bug 21026).
      Increase some test timeouts.
      Make fallback fesetexceptflag always succeed (bug 21028).
      Update MicroBlaze localplt.data.
      Fix math/test-fenv for no-exceptions / no-rounding-modes configurations.
      Improve libm-test XFAILing for ibm128-libgcc.
      XFAIL libm-test.inc tests as needed for ibm128.
      Fix elf/sotruss-lib format-truncation error.
      Fix ld-address format-truncation error.
      Fix testsuite build for GCC 7 -Wformat-truncation.
      Make endian-conversion macros always return correct types (bug 16458).
      Make fallback fegetexceptflag work with generic fetestexceptflag.
      Fix MIPS o32 posix_fadvise.
      Make soft-float powerpc swapcontext restore the signal mask (bug 21045).
      Update install.texi latest GCC version known to work.
      Avoid parallel GCC install in build-many-glibcs.py.
      Fix ARM fpu_control.h for assemblers requiring VFP insn names (bug 21047).
      Restore clock_* librt exports for MicroBlaze (bug 21061).
      Update README.libm-test.
      Remove very old libm-test-ulps entries.

Maciej W. Rozycki (2):
      MIPS: Add `.insn' to ensure a text label is defined as code not data
      MIPS: Use R_MICROMIPS_JALR rather than R_MIPS_JALR in microMIPS code

Mark Wielaard (1):
      Reduce memory size of tsearch red-black tree.

Martin Galvan (3):
      Add pretty printers for the NPTL lock types
      Add -B to python invocation to avoid generating pyc files
      Fix up tabs/spaces mismatches

Martin Pitt (1):
      locales: en_CA: update d_fmt [BZ #9842]

Mike Frysinger (5):
      localedata: change M$ to Microsoft
      ChangeLog: change Winblowz to Windows
      ChangeLog: fix date
      localedata: GBK: add mapping for 0x80->Euro sign [BZ #20864]
      localedata: bs_BA: fix yesexpr/noexpr [BZ #20974]

Märt Põder (1):
      locales: et_EE: locale has wrong {p,n}_cs_precedes value [BZ #20459]

Nick Alcock (14):
      Move all tests out of the csu subdirectory
      x86_64: tst-quad1pie, tst-quad2pie: compile with -fPIE [BZ #7065]
      Configure support for --enable-stack-protector [BZ #7065]
      Initialize the stack guard earlier when linking statically [BZ #7065]
      Do not stack-protect ifunc resolvers [BZ #7065]
      Disable stack protector in early static initialization [BZ #7065]
      Compile the dynamic linker without stack protection [BZ #7065]
      Ignore __stack_chk_fail* in the rtld mapfile computation [BZ #7065]
      Work even with compilers which enable -fstack-protector by default [BZ #7065]
      PLT avoidance for __stack_chk_fail [BZ #7065]
      Link a non-libc-using test with -fno-stack-protector [BZ #7065]
      Drop explicit stack-protection of pieces of the system [BZ #7065]
      Do not stack-protect sigreturn stubs [BZ #7065]
      Enable -fstack-protector=* when requested by configure [BZ #7065]

Paul E. Murphy (28):
      Remove tacit double usage in ldbl-128
      Refactor part of math Makefile
      Unify drift between _Complex function type variants
      Improve gen-libm-test.pl LIT() application
      Support for type-generic libm function implementations
      libm ldbl-128: Remove unused sqrtl declaration in e_asinl.c
      Add tst-wcstod-round
      Prepare to convert _Complex cosine functions
      Convert _Complex cosine functions to generated code
      Merge common usage of mul_split function
      Prepare to convert _Complex sine functions
      Convert _Complex sine functions to generated code
      Prepare to convert _Complex tangent functions
      Convert _Complex tangent functions to generated code
      sparcv9: Restore fdiml@GLIBC_2.1
      Prepare to convert remaining _Complex functions
      Convert remaining complex function to generated files
      ldbl-128: Rename 'long double' to '_Float128'
      ldbl-128: Cleanup e_gammal_r.c after _Float128 rename
      Make common fdim implementation generic.
      Make common nextdown implementation generic.
      Make common fmax implementation generic.
      Make common fmin implementation generic.
      Remove unneeded stubs for k_rem_pio2l.
      ldbl-128: Use L(x) macro for long double constants
      Make ldexpF generic.
      Remove __nan{f,,l} macros
      Build s_nan* objects from a generic template

Paul Murphy (1):
      powerpc: Cleanup fenv_private.h

Rajalakshmi Srinivasaraghavan (5):
      Refactor strtod tests
      Add tests for strfrom functions
      powerpc: strcmp optimization for power9
      powerpc: strncmp optimization for power9
      powerpc64: strchr/strchrnul optimization for power8

Rasmus Villemoes (1):
      linux: spawni.c: simplify error reporting to parent

Rical Jasan (28):
      Manual typos: Input/Output on Streams
      Manual typos: Low-Level Input/Output
      Manual typos: File System Interface
      Manual typos: Sockets
      Manual typos: Low-Level Terminal Interface
      Manual typos: Syslog
      Manual typos: Mathematics
      Manual typos: Arithmetic Functions
      Manual typos: Date and Time
      Manual typos: Resource Usage and Limitation
      Manual typos: Non-Local Exits
      Manual typos: Signal Handling
      Manual typos: The Basic Program/System Interface
      Manual typos: Processes
      Manual typos: Job Control
      Manual typos: Users and Groups
      Manual typos: System Management
      Manual typos: System Configuration Parameters
      Manual typos: DES Encryption and Password Handling
      Manual typos: Debugging support
      Manual typos: POSIX Threads
      Manual typos: Internal probes
      Manual typos: C Language Facilities in the Library
      Manual typos: Installing
      Manual typos: Library Maintenance
      Manual typos: Contributors to
      manual: Remove non-existent mount options S_IMMUTABLE and S_APPEND [BZ #11235]
      manual: Convert @tables of variables to @vtables.

Richard Henderson (1):
      alpha: Use saturating arithmetic in memchr

Roland McGrath (3):
      NaCl: Fix compile error in clock function.
      Fix generic wait3 after union wait_status removal.
      NaCl: Fix compile error for __dup after libc_hidden_proto addition.

Samuel Thibault (12):
      Fix recvmsg returning SIGLOST on PF_LOCAL sockets
      mach: Add more allowed external headers
      hurd: fix pathconf visibility
      hurd: fix fcntl visibility
      Fix exc2signal.c template
      mach: Fix old-style function definition.
      Fix old-style function definition
      hurdmalloc: Run fork handler as late as possible [BZ #19431]
      hurd: Fix stack pointer corruption in syscall
      hurd: Fix unused variable warning
      hurd: fix using hurd/signal.h in C++ programs
      hurd: fix using hurd.h in C++ programs

Siddhesh Poyarekar (47):
      Consolidate reduce_and_compute code
      Add fall through comments
      Use fabs(x) instead of branching on signedness of input to sin and cos
      Consolidate input partitioning into do_cos and do_sin
      Use do_sin for sin(x) where 0.25 < |x| < 0.855469
      Inline all support functions for sin and cos
      Remove __libc_csu_irel declaration
      Add tests-static to tests in malloc/Makefile
      consolidate sign checks for slow2
      Use copysign instead of ternary conditions for positive constants
      Use copysign instead of ternary for some sin/cos input ranges
      Make the quadrant shift K a bool in do_sincos_* functions
      Check n instead of k1 to decide on sign of sin/cos result
      Manual typos: System Databases and Name Service Switch
      Make quadrant shift a boolean in reduce_and_compute in s_sin.c
      Adjust calls to do_sincos_1 and do_sincos_2 in s_sincos.c
      Update comments for some functions in s_sin.c
      Add note on MALLOC_MMAP_* environment variables
      Document the M_ARENA_* mallopt parameters
      Remove references to sbrk to grow/shrink arenas
      Remove redundant definitions of M_ARENA_* macros
      Static inline functions for mallopt helpers
      Regenerate ULPs for aarch64
      Add ChangeLog for previous commit
      Link benchset tests against libsupport
      Add configure check for python program
      Fix pretty printer tests for run-built-tests == no
      Add framework for tunables
      Initialize tunable list with the GLIBC_TUNABLES environment variable
      Enhance --enable-tunables to select tunables frontend at build time
      User manual documentation for tunables
      Add NEWS item for tunables
      tunables: Avoid getenv calls and disable glibc.malloc.check by default
      Regenerate libc.pot
      Update translations from the Translation Project
      Merge translations from the Translation Project
      Fix typo in NEWS
      Merge translations from the Translation Project
      Fix environment traversal when an envvar value is empty
      Add target to incorporate translations from translations.org
      tunables: Fix environment variable processing for setuid binaries (bz #21073)
      Drop GLIBC_TUNABLES for setxid programs when tunables is disabled (bz #21073)
      tunables: Fail tests correctly when setgid does not work
      Add missing NEWS items
      Add list of bugs fixed in 2.25
      Add more contributors to contrib.texi
      Update for 2.25 release

Stefan Liebler (22):
      Get rid of array-bounds warning in __kernel_rem_pio2[f] with gcc 6.1 -O3.
      S390: Do not set FE_INEXACT with feraiseexcept (FE_OWERFLOW|FE_UNDERFLOW).
      S390: Support PLT and GOT references in check-localplt.
      S390: Regenerate ULPs
      Add configure check to test if gcc supports attribute ifunc.
      Use gcc attribute ifunc in libc_ifunc macro instead of inline assembly due to false debuginfo.
      s390: Refactor ifunc resolvers due to false debuginfo.
      i386, x86: Use libc_ifunc macro for time, gettimeofday.
      ppc: Use libc_ifunc macro for time, gettimeofday.
      Use libc_ifunc macro for clock_* symbols in librt.
      Use libc_ifunc macro for system in libpthread.
      Use libc_ifunc macro for vfork in libpthread.
      Use libc_ifunc macro for siglongjmp, longjmp in libpthread.
      S390: Fix fp comparison not raising FE_INVALID.
      Fix new testcase elf/tst-latepthread on s390x.
      S390: Regenerate ULPs.
      S390: Use C11-like atomics instead of plain memory accesses in lock elision code.
      S390: Use own tbegin macro instead of __builtin_tbegin.
      S390: Use new __libc_tbegin_retry macro in elision-lock.c.
      S390: Optimize lock-elision by decrementing adapt_count at unlock.
      S390: Fix FAIL in test string/tst-xbzero-opt [BZ #21006]
      S390: Adjust lock elision code after review.

Steve Ellcey (14):
      Fix -Wformat-length warning in tst-setgetname.c
      Fix warning from latest GCC in tst-printf.c
      Fix -Wformat-length warning in time/tst-strptime2.c
      Define wordsize.h macros everywhere
      Speed up math/test-tgmath2.c
      Document do_test in test-skeleton.c
      Define __ASSUME_ST_INO_64_BIT on all platforms.
      Add definitions to sysdeps/tile/tilepro/bits/wordsize.h.
      Always define XSTAT_IS_XSTAT64
      Allow [f]statfs64 to alias [f]statfs
      Fix for [f]statfs64/[f]statfs aliasing patch
      Partial ILP32 support for aarch64.
      Use XSTAT_IS_XSTAT64 in generic xstat functions
      Add comments to check-c++-types.sh.

Svante Signell (1):
      hurd: Fix adjtime call with OLDDELTA == NULL

Szabolcs Nagy (1):
      Make build-many-glibcs.py work on python3.2

Tom Tromey (1):
      Update and install proc_service.h [BZ #20311]

Torvald Riegel (12):
      Add atomic_exchange_relaxed.
      Add atomic operations required by the new condition variable.
      Fix incorrect double-checked locking related to _res_hconf.initialized.
      Use C11-like atomics instead of plain memory accesses in x86 lock elision.
      Robust mutexes: Fix lost wake-up.
      New condvar implementation that provides stronger ordering guarantees.
      Fix pthread_cond_t on sparc for new condvar.
      New pthread rwlock that is more scalable.
      robust mutexes: Fix broken x86 assembly by removing it
      Clear list of acquired robust mutexes in the child process after forking.
      Add compiler barriers around modifications of the robust mutex list.
      Fix mutex pretty printer test and pretty printer output.

Tulio Magno Quites Machado Filho (9):
      powerpc: Fix POWER9 implies
      powerpc: Installed-header hygiene
      powerpc: Regenerate ULPs
      powerpc: Fix TOC stub on powerpc64 clone()
      Document a behavior of an elided pthread_rwlock_unlock
      powerpc: Fix powerpc32/power7 memchr for large input sizes
      powerpc: Fix write-after-destroy in lock elision [BZ #20822]
      powerpc: Regenerate ULPs
      powerpc: Fix adapt_count update in __lll_unlock_elision

Wilco Dijkstra (4):
      An optimized memchr was missing for AArch64. This version is similar to
      Improve generic rawmemchr for targets that don't have an
      Improve strtok and strtok_r performance. Instead of calling strpbrk which
      This patch cleans up the strsep implementation and improves performance.

Yury Norov (1):
      * sysdeps/unix/sysv/linux/fxstat.c: Remove useless cast.

Zack Weinberg (20):
      Add utility macros for clang detection, and deprecation with messages.
      Minimize sysdeps code involved in defining major/minor/makedev.
      Deprecate inclusion of <sys/sysmacros.h> by <sys/types.h>
      Add tests for fortification of bcopy and bzero.
      Installed-header hygiene (BZ#20366): Simple self-contained fixes.
      Installed-header hygiene (BZ#20366): obsolete BSD u_* types.
      Installed-header hygiene (BZ#20366): conditionally defined structures.
      Installed-header hygiene (BZ#20366): time.h types.
      Installed-header hygiene (BZ#20366): stack_t.
      Installed header hygiene (BZ#20366): Test of installed headers.
      Minor correction to the "installed header hygiene" patches.
      Minor corrections to scripts/check-installed-headers.sh.
      [BZ #19239] Issue deprecation warnings on macro expansion.
      Fix typo in string/bits/string2.h.
      Fix build-and-build-again bug in sunrpc tests.
      Forgot to add the ChangeLog to the previous commit, doh.
      Correct comments in string.h re strcoll_l, strxfrm_l.
      Minor problems exposed by compiling C++ tests under _ISOMAC.
      Make _REENTRANT and _THREAD_SAFE aliases for _POSIX_C_SOURCE=199506L.
      New string function explicit_bzero (from OpenBSD).

steve ellcey-CA Eng-Software (1):
      Fix warnings from latest GCC.

-----------------------------------------------------------------------
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/x86/xgetbv has been deleted
       was  fdb9777e1d770446972f46a80ebfa59d522a93f1

- Log -----------------------------------------------------------------
fdb9777e1d770446972f46a80ebfa59d522a93f1 X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]
-----------------------------------------------------------------------
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/pr21258/2.23 has been created
        at  883cadc5543ffd3a4537498b44c782ded8a4a4e8 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=883cadc5543ffd3a4537498b44c782ded8a4a4e8

commit 883cadc5543ffd3a4537498b44c782ded8a4a4e8
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Mar 21 10:59:31 2017 -0700

    x86-64: Improve branch predication in _dl_runtime_resolve_avx512_opt [BZ #21258]

    On Skylake server, _dl_runtime_resolve_avx512_opt is used to preserve
    the first 8 vector registers.  The code layout is

      if only %xmm0 - %xmm7 registers are used
        preserve %xmm0 - %xmm7 registers
      if only %ymm0 - %ymm7 registers are used
        preserve %ymm0 - %ymm7 registers
      preserve %zmm0 - %zmm7 registers

    Branch predication always executes the fallthrough code path to preserve
    %zmm0 - %zmm7 registers speculatively, even though only %xmm0 - %xmm7
    registers are used.  This leads to lower CPU frequency on Skylake
    server.  This patch changes the fallthrough code path to preserve
    %xmm0 - %xmm7 registers instead:

      if whole %zmm0 - %zmm7 registers are used
        preserve %zmm0 - %zmm7 registers
      if only %ymm0 - %ymm7 registers are used
        preserve %ymm0 - %ymm7 registers
      preserve %xmm0 - %xmm7 registers

    Tested on Skylake server.

    [BZ #21258]
    * sysdeps/x86_64/dl-trampoline.S (_dl_runtime_resolve_opt): Define
    only if _dl_runtime_resolve is defined to
    _dl_runtime_resolve_sse_vex.
    * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_opt):
    Fallthrough to _dl_runtime_resolve_sse_vex.

    (cherry picked from commit c15f8eb50cea7ad1a4ccece6e0982bf426d52c00)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=83037ea1d9e84b1b44ed307f01cbb5eeac24e22d

commit 83037ea1d9e84b1b44ed307f01cbb5eeac24e22d
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Aug 23 09:09:32 2016 -0700

    X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]

    There is transition penalty when SSE instructions are mixed with 256-bit
    AVX or 512-bit AVX512 load instructions.  Since _dl_runtime_resolve_avx
    and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM
    registers, there is transition penalty when SSE instructions are used
    with lazy binding on AVX and AVX512 processors.

    To avoid SSE transition penalty, if only the lower 128 bits of the first
    8 vector registers are non-zero, we can preserve %xmm0 - %xmm7 registers
    with the zero upper bits.

    For AVX and AVX512 processors which support XGETBV with ECX == 1, we can
    use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers
    or the upper 256 bits of ZMM registers are zero.  We can restore only the
    non-zero portion of vector registers with AVX/AVX512 load instructions
    which will zero-extend upper bits of vector registers.

    This patch adds _dl_runtime_resolve_sse_vex which saves and restores
    XMM registers with 128-bit AVX store/load instructions.  It is used to
    preserve YMM/ZMM registers when only the lower 128 bits are non-zero.
    _dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added
    and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so
    that we store and load only the non-zero portion of vector registers.
    This avoids SSE transition penalty caused by _dl_runtime_resolve_avx and
    _dl_runtime_profile_avx512 when only the lower 128 bits of vector
    registers are used.

    _dl_runtime_resolve_avx_slow is added and used for AVX processors which
    don't support XGETBV with ECX == 1.  Since there is no SSE transition
    penalty on AVX512 processors which don't support XGETBV with ECX == 1,
    _dl_runtime_resolve_avx512_slow isn't provided.

    [BZ #20495]
    [BZ #20508]
    * sysdeps/x86/cpu-features.c (init_cpu_features): For Intel
    processors, set Use_dl_runtime_resolve_slow and set
    Use_dl_runtime_resolve_opt if XGETBV suports ECX == 1.
    * sysdeps/x86/cpu-features.h (bit_Use_dl_runtime_resolve_opt):
    New.
    (bit_Use_dl_runtime_resolve_slow): Likewise.
    (index_Use_dl_runtime_resolve_opt): Likewise.
    (index_Use_dl_runtime_resolve_slow): Likewise.
    * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use
    _dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt
    if Use_dl_runtime_resolve_opt is set.  Use
    _dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set.
    * sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
    (_dl_runtime_resolve_opt): New.  Defined for AVX and AVX512.
    (_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
    * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow):
    New.
    (_dl_runtime_resolve_opt): Likewise.
    (_dl_runtime_profile): Define only if _dl_runtime_profile is
    defined.

    (cherry picked from commit fb0f7a6755c1bfaec38f490fbfcaa39a66ee3604)

-----------------------------------------------------------------------
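
For readers following the two commits above, here is a minimal C sketch of the two decisions they describe. It is not the glibc code, which is written in assembly in dl-trampoline.h. The sketch assumes the caller already holds the value that XGETBV with ECX == 1 returns in EDX:EAX. All identifiers (STATE_AVX, STATE_ZMM_HI256, width_to_preserve, struct cpu_caps, pick_resolver, and the enum names) are hypothetical and chosen only for illustration. The bit positions follow the XCR0 layout in the Intel SDM, where bit 2 is the upper half of the YMM registers (matching "Change bit_YMM_state to (1 << 2)" in the shortlog) and bit 6 is the upper 256 bits of ZMM0-15.

```c
#include <stdint.h>

/* State-component bits as laid out in XCR0 (Intel SDM).  The macro
   names are illustrative assumptions, not glibc identifiers.  */
#define STATE_AVX        (UINT64_C(1) << 2)  /* upper 128 bits of YMM0-15 */
#define STATE_ZMM_HI256  (UINT64_C(1) << 6)  /* upper 256 bits of ZMM0-15 */

enum save_width { SAVE_XMM, SAVE_YMM, SAVE_ZMM };

/* Decide how much of each vector register must be preserved, given
   the in-use mask from XGETBV with ECX == 1.  The checks run widest
   first so that the final, unconditional case is XMM, mirroring the
   ordering described for BZ #21258.  A C compiler is free to lay out
   branches differently; only the decision itself is modelled here.  */
enum save_width
width_to_preserve (uint64_t xinuse)
{
  if (xinuse & STATE_ZMM_HI256)
    return SAVE_ZMM;
  if (xinuse & STATE_AVX)
    return SAVE_YMM;
  return SAVE_XMM;
}

/* Hypothetical summary of the CPU feature bits named in the ChangeLog.  */
struct cpu_caps
{
  int avx512f_usable;   /* AVX512F available and enabled by the OS */
  int avx_usable;       /* AVX available and enabled by the OS */
  int resolve_opt;      /* Use_dl_runtime_resolve_opt: XGETBV ECX == 1 works */
  int resolve_slow;     /* Use_dl_runtime_resolve_slow */
};

enum resolver
{
  RESOLVE_SSE, RESOLVE_AVX, RESOLVE_AVX_SLOW, RESOLVE_AVX_OPT,
  RESOLVE_AVX512, RESOLVE_AVX512_OPT
};

/* Pick a trampoline variant as the commit message describes: the _opt
   variants need XGETBV with ECX == 1, and there is no AVX512 _slow
   variant.  */
enum resolver
pick_resolver (const struct cpu_caps *c)
{
  if (c->avx512f_usable)
    return c->resolve_opt ? RESOLVE_AVX512_OPT : RESOLVE_AVX512;
  if (c->avx_usable)
    {
      if (c->resolve_opt)
        return RESOLVE_AVX_OPT;
      return c->resolve_slow ? RESOLVE_AVX_SLOW : RESOLVE_AVX;
    }
  return RESOLVE_SSE;
}
```

With this sketch, a mask in which only the x87 and SSE components are in use (0x3) maps to SAVE_XMM, which is the case where the wider saves and their SSE transition penalty can be skipped.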