This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug dynamic-link/20508] _dl_runtime_resolve_avx/_dl_runtime_profile_avx512 cause transition penalty
- From: "cvs-commit at gcc dot gnu.org" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Tue, 23 Aug 2016 18:02:34 +0000
- Subject: [Bug dynamic-link/20508] _dl_runtime_resolve_avx/_dl_runtime_profile_avx512 cause transition penalty
- Auto-submitted: auto-generated
- References: <bug-20508-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=20508
--- Comment #3 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, hjl/x86/xgetbv has been created
at d963b835c1e0fe430a88168a81c4c69dcd9ad00c (commit)
- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d963b835c1e0fe430a88168a81c4c69dcd9ad00c
commit d963b835c1e0fe430a88168a81c4c69dcd9ad00c
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Tue Aug 23 09:09:32 2016 -0700
X86-64: Add _dl_runtime_resolve_avx[512]_opt [BZ #20508]
There is a transition penalty when SSE instructions are mixed with 256-bit
AVX or 512-bit AVX512 load instructions. Since _dl_runtime_resolve_avx
and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM
registers, there is a transition penalty when SSE instructions are used
with lazy binding on AVX and AVX512 processors.
For AVX and AVX512 processors which support XGETBV with ECX == 1, we can
use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers
or the upper 256 bits of ZMM registers are zero. We can then restore only
the non-zero portion of vector registers with AVX/AVX512 load instructions,
which zero-extend the upper bits of vector registers.
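The check described above can be sketched in C. This is an illustration only, not the code from the patch (the real trampolines are written in assembly in sysdeps/x86_64/dl-trampoline.S). The names `read_xinuse` and `save_width_bytes` and the `XSTATE_*` macros are hypothetical; the bit positions follow the XCR0 state-component layout in the Intel SDM (bit 2 for the upper halves of the YMM registers, bit 6 for the upper halves of ZMM0-ZMM15).

```c
#include <stdint.h>

/* State-component bits as laid out in XCR0 (names are hypothetical). */
#define XSTATE_AVX       (1u << 2)  /* upper 128 bits of YMM0-YMM15 */
#define XSTATE_ZMM_HI256 (1u << 6)  /* upper 256 bits of ZMM0-ZMM15 */

/* Given the low 32 bits returned by XGETBV with ECX == 1 (XCR0 masked by
   the in-use bitmap), pick the smallest vector width, in bytes, that
   still preserves every non-zero part of the argument registers.  */
unsigned int
save_width_bytes (uint32_t xinuse)
{
  if (xinuse & XSTATE_ZMM_HI256)
    return 64;                      /* full ZMM save/restore needed */
  if (xinuse & XSTATE_AVX)
    return 32;                      /* YMM save/restore is enough */
  return 16;                        /* only XMM parts are non-zero */
}

#if defined (__GNUC__) && defined (__x86_64__)
/* Read the in-use bitmap.  Assumption: the caller has already verified
   that the OS enabled XSAVE and that the CPU supports XGETBV with
   ECX == 1; otherwise this instruction faults.  */
uint32_t
read_xinuse (void)
{
  uint32_t eax, edx;
  __asm__ volatile ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (1));
  (void) edx;                       /* high half is not needed here */
  return eax;
}
#endif
```

For example, a bitmap with only the x87 and SSE bits set (0x3) yields 16, so 128-bit VEX-encoded stores and loads suffice and the upper bits are zero-extended on restore.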
This patch adds _dl_runtime_resolve_sse_vex which saves and restores
XMM registers with 128-bit AVX store/load instructions. It is used to
preserve YMM/ZMM registers when only the lower 128 bits are non-zero.
_dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added
and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so
that we store and load only the non-zero portion of vector registers.
This avoids the SSE transition penalty caused by _dl_runtime_resolve_avx
and _dl_runtime_profile_avx512 when only the lower 128 bits of vector
registers are used.
[BZ #20508]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
Use_dl_runtime_resolve_opt if XGETBV supports ECX == 1.
* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
New.
(index_arch_Use_dl_runtime_resolve_opt): Likewise.
* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use
_dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt
if Use_dl_runtime_resolve_opt is set.
* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
(_dl_runtime_resolve_opt): New. Defined for AVX and AVX512.
(_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_opt): New.
(_dl_runtime_profile): Define only if _dl_runtime_profile is
defined.
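As a rough illustration of the feature test named in the cpu-features.c entry above, the sketch below checks whether the CPU reports support for XGETBV with ECX == 1. It is not the glibc implementation: the function names and the macro are hypothetical, and the conditions glibc actually applies before setting Use_dl_runtime_resolve_opt may include more than this single bit. The bit position (CPUID leaf 0xD, sub-leaf 1, EAX bit 2) is taken from the Intel SDM.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=0DH, ECX=1):EAX bit 2 reports XGETBV with ECX == 1.
   The macro name is hypothetical.  */
#define CPUID_D_1_EAX_XGETBV_ECX_1 (1u << 2)

/* Pure helper: decode the EAX value returned by CPUID leaf 0xD,
   sub-leaf 1.  */
bool
xgetbv_ecx_1_supported (uint32_t cpuid_d_1_eax)
{
  return (cpuid_d_1_eax & CPUID_D_1_EAX_XGETBV_ECX_1) != 0;
}

#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
# include <cpuid.h>

/* Query the running CPU.  A complete check would also confirm that the
   OS has enabled XSAVE (OSXSAVE) before XGETBV is ever executed.  */
bool
cpu_supports_xgetbv_ecx_1 (void)
{
  unsigned int eax, ebx, ecx, edx;

  if (__get_cpuid_max (0, 0) < 0xd)
    return false;                   /* leaf 0xD is not available */
  __cpuid_count (0xd, 1, eax, ebx, ecx, edx);
  return xgetbv_ecx_1_supported (eax);
}
#endif
```

Splitting the bit test from the CPUID query keeps the decoding logic checkable on any machine, while the hardware query is compiled only on x86 targets.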
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d4e9985c033d90c310d7798f2d1f0634a64cedff
commit d4e9985c033d90c310d7798f2d1f0634a64cedff
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Aug 18 14:52:42 2016 -0700
X86-64: Correct CFA in _dl_runtime_resolve
When the stack is re-aligned in _dl_runtime_resolve, there is no need to
adjust the CFA when allocating the register save area on the stack.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve): Don't
adjust CFA when allocating register save area on re-aligned
stack.
-----------------------------------------------------------------------