This is the mail archive of the mailing list for the glibc project.

[Bug dynamic-link/20508] _dl_runtime_resolve_avx/_dl_runtime_profile_avx512 cause transition penalty

--- Comment #3 from cvs-commit at gcc dot <cvs-commit at gcc dot> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/x86/xgetbv has been created
        at  d963b835c1e0fe430a88168a81c4c69dcd9ad00c (commit)

- Log -----------------------------------------------------------------

commit d963b835c1e0fe430a88168a81c4c69dcd9ad00c
Author: H.J. Lu <>
Date:   Tue Aug 23 09:09:32 2016 -0700

    X86-64: Add _dl_runtime_resolve_avx[512]_opt [BZ #20508]

    There is a transition penalty when SSE instructions are mixed with 256-bit
    AVX or 512-bit AVX512 load instructions.  Since _dl_runtime_resolve_avx
    and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM
    registers, there is a transition penalty when SSE instructions are used
    with lazy binding on AVX and AVX512 processors.

    On AVX and AVX512 processors that support XGETBV with ECX == 1, we can
    use it to check whether the upper 128 bits of the YMM registers or the
    upper 256 bits of the ZMM registers are zero.  We can then restore only
    the non-zero portion of the vector registers with AVX/AVX512 load
    instructions, which zero-extend the upper bits of the vector registers.
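A minimal C sketch of that check, assuming GCC or Clang on x86-64 and the `<cpuid.h>` helpers those compilers ship. Support for XGETBV with ECX == 1 is reported by CPUID leaf 0xD, sub-leaf 1, EAX bit 2 (XGETBV itself also needs the OSXSAVE bit). With ECX == 1, XGETBV returns XCR0 ANDed with the XINUSE state-component bitmap, in which bit 2 covers the upper 128 bits of YMM0-15 and bit 6 the upper 256 bits of ZMM0-15. The macro and function names are hypothetical, not glibc's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <cpuid.h>
#endif

/* XCR0/XINUSE state-component bits (hypothetical macro names). */
#define XSTATE_YMM_HI128 (1u << 2)  /* upper 128 bits of YMM0-15 */
#define XSTATE_ZMM_HI256 (1u << 6)  /* upper 256 bits of ZMM0-15 */

/* True if XGETBV may be executed with ECX == 1 on this CPU and OS. */
static bool xgetbv_ecx1_supported(void)
{
#if defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid_max(0, 0) < 0xd)
        return false;
    __cpuid(1, eax, ebx, ecx, edx);
    if ((ecx & (1u << 27)) == 0)      /* OSXSAVE: XGETBV is enabled */
        return false;
    __cpuid_count(0xd, 1, eax, ebx, ecx, edx);
    return (eax & (1u << 2)) != 0;    /* XGETBV with ECX == 1 supported */
#else
    return false;
#endif
}

/* ECX == 0 returns XCR0; ECX == 1 returns XCR0 & XINUSE.
   Call only after xgetbv_ecx1_supported() has returned true.  */
static uint64_t xgetbv(uint32_t index)
{
    assert(index <= 1);  /* the only two indices this sketch uses */
#if defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ volatile ("xgetbv" : "=a"(lo), "=d"(hi) : "c"(index));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;
#endif
}

/* A clear bit means the corresponding upper bits are in their initial (zero) state. */
static bool ymm_upper_may_be_nonzero(uint64_t xinuse)
{
    return (xinuse & XSTATE_YMM_HI128) != 0;
}

static bool zmm_upper_may_be_nonzero(uint64_t xinuse)
{
    return (xinuse & XSTATE_ZMM_HI256) != 0;
}
```

The check is one-sided: a clear XINUSE bit guarantees the upper bits are zero, while a set bit only means they may be non-zero. That is why falling back to a narrower save/restore is safe only when the bit is clear.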

    This patch adds _dl_runtime_resolve_sse_vex, which saves and restores
    XMM registers with 128-bit AVX store/load instructions.  It is used to
    preserve YMM/ZMM registers when only their lower 128 bits are non-zero.
    _dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added
    and used on AVX/AVX512 processors that support XGETBV with ECX == 1, so
    that we store and load only the non-zero portion of the vector registers.
    This avoids the SSE transition penalty caused by _dl_runtime_resolve_avx
    and _dl_runtime_profile_avx512 when only the lower 128 bits of the vector
    registers are used.
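The per-call choice described in this paragraph can be modeled in C as follows. This is a hypothetical sketch, not glibc's assembly: the enum and function names are invented, `vec_size` stands for the width of the trampoline being entered (32 for _dl_runtime_resolve_avx_opt, 64 for _dl_runtime_resolve_avx512_opt), and `xinuse` is assumed to be the value XGETBV returned with ECX == 1. The save-area helper additionally assumes that %xmm0-%xmm7, the SysV x86-64 vector argument registers, are the ones preserved.

```c
#include <assert.h>
#include <stdint.h>

#define XSTATE_YMM_HI128 (1u << 2)  /* upper 128 bits of YMM0-15 may be non-zero */
#define XSTATE_ZMM_HI256 (1u << 6)  /* upper 256 bits of ZMM0-15 may be non-zero */

/* Hypothetical names for the three save/restore widths, in bytes per register. */
enum save_width {
    SAVE_XMM_VEX = 16,  /* _dl_runtime_resolve_sse_vex: 128-bit VEX loads zero the rest */
    SAVE_YMM     = 32,  /* full 256-bit YMM save/restore */
    SAVE_ZMM     = 64   /* full 512-bit ZMM save/restore */
};

/* Choose the narrowest width that still preserves every possibly non-zero bit. */
static enum save_width choose_save_width(unsigned vec_size, uint64_t xinuse)
{
    assert(vec_size == 32 || vec_size == 64);
    if (vec_size == 64 && (xinuse & XSTATE_ZMM_HI256))
        return SAVE_ZMM;
    if (xinuse & XSTATE_YMM_HI128)
        return SAVE_YMM;
    return SAVE_XMM_VEX;
}

/* Illustrative save-area size for eight vector argument registers. */
static unsigned save_area_bytes(enum save_width w)
{
    return 8u * (unsigned)w;
}
```

The ZMM test comes first because bits 256-511 can be live even when bits 128-255 are zero. In that case only a full 512-bit save preserves them.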

        [BZ #20508]
        * sysdeps/x86/cpu-features.c (init_cpu_features): Set
        Use_dl_runtime_resolve_opt if XGETBV supports ECX == 1.
        * sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
        New.
        (index_arch_Use_dl_runtime_resolve_opt): Likewise.
        * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use
        _dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt
        if Use_dl_runtime_resolve_opt is set.
        * sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
        (_dl_runtime_resolve_opt): New.  Defined for AVX and AVX512.
        (_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
        * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_opt): New.
        (_dl_runtime_profile): Define only if _dl_runtime_profile is
        defined.
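This ChangeLog records a setup path: a feature bit is set from CPUID and then consulted when the lazy-binding trampoline is installed. That path might be sketched as follows. The struct, enum, and function names are hypothetical. `cpuid_0d_1_eax` is assumed to hold EAX from CPUID leaf 0xD, sub-leaf 1, whose bit 2 reports XGETBV with ECX == 1. The non-_opt fallback for AVX512 and the fallback for CPUs without AVX are assumptions, since this commit does not name them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the cpu_features bits used here. */
struct cpu_bits {
    bool avx512f_usable;
    bool avx_usable;
    bool use_resolve_opt;  /* models Use_dl_runtime_resolve_opt */
};

/* Hypothetical labels for the trampoline installed for lazy binding. */
enum trampoline {
    TRAMP_AVX512_OPT,  /* _dl_runtime_resolve_avx512_opt */
    TRAMP_AVX512,      /* assumed full-ZMM fallback */
    TRAMP_AVX_OPT,     /* _dl_runtime_resolve_avx_opt */
    TRAMP_AVX,         /* _dl_runtime_resolve_avx */
    TRAMP_SSE          /* assumed fallback without AVX */
};

/* init_cpu_features step: set the flag only if XGETBV supports ECX == 1. */
static void set_resolve_opt(struct cpu_bits *c, uint32_t cpuid_0d_1_eax)
{
    assert(c != 0);
    c->use_resolve_opt = (cpuid_0d_1_eax & (1u << 2)) != 0;
}

/* elf_machine_runtime_setup step: prefer the _opt variant when flagged. */
static enum trampoline pick_trampoline(const struct cpu_bits *c)
{
    if (c->avx512f_usable)
        return c->use_resolve_opt ? TRAMP_AVX512_OPT : TRAMP_AVX512;
    if (c->avx_usable)
        return c->use_resolve_opt ? TRAMP_AVX_OPT : TRAMP_AVX;
    return TRAMP_SSE;
}
```

Keeping the decision at setup time means the CPUID probe runs once. The per-call XGETBV test then only has to choose a save width inside the selected _opt trampoline.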

commit d4e9985c033d90c310d7798f2d1f0634a64cedff
Author: H.J. Lu <>
Date:   Thu Aug 18 14:52:42 2016 -0700

    X86-64: Correct CFA in _dl_runtime_resolve

    When the stack is re-aligned in _dl_runtime_resolve, there is no need to
    adjust the CFA when allocating the register save area on the stack.

        * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve): Don't
        adjust CFA when allocating register save area on re-aligned
        stack.
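A small C model of the CFA rule shows why the adjustment is unnecessary. It is a hypothetical sketch, not glibc code, and it assumes a prologue that saves %rbx, copies %rsp into it, and makes %rbx the CFA register (cfi_def_cfa_register). After that switch, later changes to %rsp (the `and` that re-aligns it and the `sub` that allocates the save area) no longer move the CFA. A cfi_adjust_cfa_offset after the `sub` would therefore make the unwinder compute a wrong CFA. The starting offset of 24 used in the check assumes the return address plus the two words pushed by the PLT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: CFA = base register + offset, where the base is
   %rsp until cfi_def_cfa_register(%rbx) switches it to %rbx.  */
struct frame {
    uint64_t rsp, rbx;
    bool     cfa_on_rbx;
    int64_t  off;
};

static uint64_t cfa(const struct frame *f)
{
    return (f->cfa_on_rbx ? f->rbx : f->rsp) + (uint64_t)f->off;
}

/* "sub $n, %rsp", optionally annotated with cfi_adjust_cfa_offset(n). */
static void alloc(struct frame *f, uint64_t n, bool adjust_cfa)
{
    f->rsp -= n;
    if (adjust_cfa)
        f->off += (int64_t)n;
}

/* Assumed prologue:
     pushq %rbx                 cfi_adjust_cfa_offset(8)
     mov   %rsp, %rbx           cfi_def_cfa_register(%rbx)
     and   $-align, %rsp
     sub   $save_area, %rsp     cfi_adjust_cfa_offset(save_area) if requested
   Returns the CFA the unwinder would compute afterwards.  */
static uint64_t realign_prologue(struct frame *f, uint64_t align,
                                 uint64_t save_area, bool adjust_after_sub)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    alloc(f, 8, true);        /* pushq %rbx: CFA is still %rsp-based */
    f->rbx = f->rsp;          /* mov %rsp, %rbx */
    f->cfa_on_rbx = true;     /* cfi_def_cfa_register(%rbx) */
    f->rsp &= ~(align - 1);   /* and $-align, %rsp */
    alloc(f, save_area, adjust_after_sub);
    return cfa(f);
}
```

With the extra adjustment, the computed CFA is off by exactly the size of the register save area. Without it, the CFA stays equal to its value on entry, which is the behavior the commit restores.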

