Bug 21265 - _dl_runtime_resolve isn't compatible with Intel C++ __regcall calling convention
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link
Version: 2.26
Importance: P2 normal
Target Milestone: 2.27
Assignee: Not yet assigned to anyone
Duplicates: 21236
Reported: 2017-03-17 15:37 UTC by H.J. Lu
Modified: 2018-04-07 13:03 UTC (History)
Target: x86-64
fweimer: security-
Description H.J. Lu 2017-03-17 15:37:53 UTC
Intel C++ __regcall calling convention for x86-64:

https://software.intel.com/en-us/node/522787

passes function parameters in %xmm0-%xmm15.  Since _dl_runtime_resolve only
preserves %xmm0-%xmm7, %xmm8-%xmm15 may be clobbered with lazy binding.
Comment 1 Markus Trippelsdorf 2017-03-17 16:06:25 UTC
Given that almost nobody uses the Intel C++ __regcall calling convention,
please don't slow down the dynamic linker for everyone else.
Comment 2 Andreas Schwab 2017-03-17 20:13:11 UTC
Like the regparm attribute on 32-bit x86 it can only be safely used by internal function calls.
Comment 3 H.J. Lu 2017-03-17 23:54:06 UTC
(In reply to Andreas Schwab from comment #2)
> Like the regparm attribute on 32-bit x86 it can only be safely used by
> internal function calls.

i386 _dl_runtime_resolve preserves EAX, ECX and EDX for regparm:

Dump of assembler code for function _dl_runtime_resolve:
   0x000174c0 <+0>:	push   %eax
   0x000174c1 <+1>:	push   %ecx
   0x000174c2 <+2>:	push   %edx
   0x000174c3 <+3>:	mov    0x10(%esp),%edx
   0x000174c7 <+7>:	mov    0xc(%esp),%eax
   0x000174cb <+11>:	call   0xff60 <_dl_fixup>
   0x000174d0 <+16>:	pop    %edx
   0x000174d1 <+17>:	mov    (%esp),%ecx
   0x000174d4 <+20>:	mov    %eax,(%esp)
   0x000174d7 <+23>:	mov    0x4(%esp),%eax
   0x000174db <+27>:	ret    $0xc
Comment 4 Andreas Schwab 2017-03-18 08:05:38 UTC
The gcc manual explicitly warns against it.
Comment 5 Florian Weimer 2017-03-20 09:49:42 UTC
(In reply to H.J. Lu from comment #3)
> (In reply to Andreas Schwab from comment #2)
> > Like the regparm attribute on 32-bit x86 it can only be safely used by
> > internal function calls.
> 
> i386 _dl_runtime_resolve preserves EAX, ECX and EDX for regparm:

I think this is mainly for internal glibc use (see the internal_function macro).  I don't think you should read into this that glibc supports a different ABI than what's specified in the psABI supplement.

What makes this bug different from bug 21236?
Comment 6 Florian Weimer 2017-03-20 14:05:54 UTC
Note that there is a parallel mailing list thread reviewing this ABI change proposal:

  https://sourceware.org/ml/libc-alpha/2017-03/msg00343.html

I think we still need some ABI documentation even if more registers are preserved because arbitrary calling conventions still will not work.  Using noplt calls as a workaround in the Intel compiler seems a reasonable fix (no ABI changes required, but this still needs documentation in the psABI supplement IMHO).
Comment 7 Carlos O'Donell 2017-03-20 14:07:44 UTC
(In reply to Florian Weimer from comment #6)
> Note that there is a parallel mailing list thread reviewing this ABI change
> proposal:
> 
>   https://sourceware.org/ml/libc-alpha/2017-03/msg00343.html
> 
> I think we still need some ABI documentation even if more registers are
> preserved because arbitrary calling conventions still will not work.  Using
> noplt calls as a workaround in the Intel compiler seems a reasonable fix (no
> ABI changes required, but this still needs documentation in the psABI
> supplement IMHO).

Agreed.

(In reply to Florian Weimer from comment #5)
> (In reply to H.J. Lu from comment #3)
> > (In reply to Andreas Schwab from comment #2)
> > > Like the regparm attribute on 32-bit x86 it can only be safely used by
> > > internal function calls.
> > 
> > i386 _dl_runtime_resolve preserves EAX, ECX and EDX for regparm:
> 
> I think this is mainly for internal glibc use (see the internal_function
> macro).  I don't think you should read into this that glibc supports a
> different ABI than what's specified in the psABI supplement.
> 
> What makes this bug different from bug 21236?

My high-level opinion:

(a) "Optimize for the local special case": Use of regparm and __regcall is restricted to internal functions. The goal is to optimize internal calls as much as possible with the additional registers. The i386 support and the internal_function macro are a good example of optimizing internal function calls.

(b) "Optimize for the global average case": Use of regparm and __regcall should not be extended to support interposable global symbols. We should optimize for the general case which doesn't use regparm/__regcall and which supports developers interposing code using the standard x86_64 ABI.

If you want (a) to cover more of your program then you need to look at -fno-plt (as Florian Weimer suggests) and LTO to make more of your program local and enable the optimizations that this allows.
Comment 8 jsm-csl@polyomino.org.uk 2017-03-20 16:49:21 UTC
On Mon, 20 Mar 2017, fweimer at redhat dot com wrote:

> > i386 _dl_runtime_resolve preserves EAX, ECX and EDX for regparm:
> 
> I think this is mainly for internal glibc use (see the internal_function
> macro).  I don't think you should read into this that glibc supports a
> different ABI than what's specified in the psABI supplement.

As far as I know, anything marked with internal_function should never be 
called through the PLT.  Are there any such functions that are not also 
marked with attribute_hidden?
Comment 9 Florian Weimer 2017-03-20 16:55:05 UTC
(In reply to joseph@codesourcery.com from comment #8)
> On Mon, 20 Mar 2017, fweimer at redhat dot com wrote:
> 
> > > i386 _dl_runtime_resolve preserves EAX, ECX and EDX for regparm:
> > 
> > I think this is mainly for internal glibc use (see the internal_function
> > macro).  I don't think you should read into this that glibc supports a
> > different ABI than what's specified in the psABI supplement.
> 
> As far as I know, anything marked with internal_function should never be 
> called through the PLT.  Are there any such functions that are not also 
> marked with attribute_hidden?

Apart from the functions I added because I did not know about this rule, there is __libc_pthread_init, which is defined in libc.so.6 and called from libpthread.so.0.
Comment 10 Carlos O'Donell 2017-08-25 19:35:08 UTC
I stated a strong position here:

https://sourceware.org/ml/libc-alpha/2017-03/msg00430.html

I'm closing this as RESOLVED / WONTFIX for now.

Further discussions can happen on list or we can reopen this if we think there is a technical solution we can pursue.
Comment 11 Sourceware Commits 2017-10-20 18:01:23 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  b52b0d793dcb226ecb0ecca1e672ca265973233c (commit)
      from  822f523b293bb94a52044f4acea73839f3b3d2bd (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b52b0d793dcb226ecb0ecca1e672ca265973233c

commit b52b0d793dcb226ecb0ecca1e672ca265973233c
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Oct 20 11:00:08 2017 -0700

    x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
    
    In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
    mask and bound registers.  It simplifies _dl_runtime_resolve and supports
    different calling conventions.  ld.so code size is reduced by more than
    1 KB.  However, using fxsave/xsave/xsavec takes slightly more cycles
    than saving and restoring vector and bound registers individually.
    
    Latency for _dl_runtime_resolve to lookup the function, foo, from one
    shared library plus libc.so:
    
                                 Before    After     Change
    
    Westmere (SSE)/fxsave         345      866       151%
    IvyBridge (AVX)/xsave         420      643       53%
    Haswell (AVX)/xsave           713      1252      75%
    Skylake (AVX+MPX)/xsavec      559      719       28%
    Skylake (AVX512+MPX)/xsavec   145      272       87%
    Ryzen (AVX)/xsavec            280      553       97%
    
    This is the worst case, where the portion of time spent saving and
    restoring registers is bigger than in the majority of cases.  With smaller
    _dl_runtime_resolve code size, overall performance impact is negligible.
    
    On IvyBridge, differences in build and test time of binutils with lazy
    binding GCC and binutils are noise.  On Westmere, differences in
    bootstrap and "make check" time of GCC 7 with lazy binding GCC and
    binutils are also noise.
    
    	[BZ #21265]
    	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
    	New.
    	* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
    	(get_common_indeces): Set xsave_state_size, xsave_state_full_size
    	and bit_arch_XSAVEC_Usable if needed.
    	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
    	and bit_arch_Use_dl_runtime_resolve_opt.
    	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
    	Removed.
    	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(bit_arch_Prefer_No_AVX512): Updated.
    	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
    	(bit_arch_XSAVEC_Usable): New.
    	(STATE_SAVE_OFFSET): Likewise.
    	(STATE_SAVE_MASK): Likewise.
    	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
    	(cpu_features): Add xsave_state_size and xsave_state_full_size.
    	(index_arch_Use_dl_runtime_resolve_opt): Removed.
    	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(index_arch_XSAVEC_Usable): New.
    	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
    	Support XSAVEC_Usable.  Remove Use_dl_runtime_resolve_slow.
    	* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
    	is enabled.
    	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
    	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
    	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
    	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
    	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
    	_dl_runtime_resolve_xsavec.
    	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
    	Removed.
    	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
    	instead of VEC_SIZE.
    	(REGISTER_SAVE_BND0): Removed.
    	(REGISTER_SAVE_BND1): Likewise.
    	(REGISTER_SAVE_BND3): Likewise.
    	(REGISTER_SAVE_RAX): Always defined to 0.
    	(VMOV): Removed.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_slow): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_avx512): Likewise.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(USE_FXSAVE): New.
    	(_dl_runtime_resolve_fxsave): Likewise.
    	(USE_XSAVE): Likewise.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(USE_XSAVEC): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
    	Removed.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(_dl_runtime_resolve_fxsave): New.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                            |   66 +++++++++
 sysdeps/x86/cpu-features-offsets.sym |    1 +
 sysdeps/x86/cpu-features.c           |   88 +++++++++--
 sysdeps/x86/cpu-features.h           |   34 ++++-
 sysdeps/x86/cpu-tunables.c           |   17 ++-
 sysdeps/x86_64/Makefile              |    4 +
 sysdeps/x86_64/dl-machine.h          |   38 ++----
 sysdeps/x86_64/dl-trampoline.S       |   87 ++++--------
 sysdeps/x86_64/dl-trampoline.h       |  267 ++++++++++------------------------
 9 files changed, 296 insertions(+), 306 deletions(-)
Comment 12 H.J. Lu 2017-10-21 00:43:52 UTC
Fixed for 2.27.
Comment 13 Sourceware Commits 2017-10-21 18:49:53 UTC
The branch, hjl/pr21265/2.26 has been created
        at  5d9b05d1ad4faa68f82e80dee014df7d5f9872c3 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5d9b05d1ad4faa68f82e80dee014df7d5f9872c3

commit 5d9b05d1ad4faa68f82e80dee014df7d5f9872c3
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 23 08:21:52 2017 -0700

    x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
    
    	[BZ #21265]
    	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
    	New.
    	* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
    	(get_common_indeces): Set xsave_state_size, xsave_state_full_size
    	and bit_arch_XSAVEC_Usable if needed.
    	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
    	and bit_arch_Use_dl_runtime_resolve_opt.
    	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
    	Removed.
    	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(bit_arch_Prefer_No_AVX512): Updated.
    	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
    	(bit_arch_XSAVEC_Usable): New.
    	(STATE_SAVE_OFFSET): Likewise.
    	(STATE_SAVE_MASK): Likewise.
    	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
    	(cpu_features): Add xsave_state_size and xsave_state_full_size.
    	(index_arch_Use_dl_runtime_resolve_opt): Removed.
    	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(index_arch_XSAVEC_Usable): New.
    	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
    	Support XSAVEC_Usable.  Remove Use_dl_runtime_resolve_slow.
    	* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
    	is enabled.
    	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
    	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
    	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
    	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
    	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
    	_dl_runtime_resolve_xsavec.
    	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
    	Removed.
    	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
    	instead of VEC_SIZE.
    	(REGISTER_SAVE_BND0): Removed.
    	(REGISTER_SAVE_BND1): Likewise.
    	(REGISTER_SAVE_BND3): Likewise.
    	(REGISTER_SAVE_RAX): Always defined to 0.
    	(VMOV): Removed.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_slow): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_avx512): Likewise.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(USE_FXSAVE): New.
    	(_dl_runtime_resolve_fxsave): Likewise.
    	(USE_XSAVE): Likewise.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(USE_XSAVEC): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
    	Removed.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(_dl_runtime_resolve_fxsave): New.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    
    (cherry picked from commit b52b0d793dcb226ecb0ecca1e672ca265973233c)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=aca8619adb0e5d96f6e7d821d39adc6cca6d6c55

commit aca8619adb0e5d96f6e7d821d39adc6cca6d6c55
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Sep 11 08:18:11 2017 -0700

    x86: Add x86_64 to x86-64 HWCAP [BZ #22093]
    
    Before glibc 2.26, ld.so set dl_platform to "x86_64" and searched the
    "x86_64" subdirectory when loading a shared library.  ld.so in glibc
    2.26 was changed to set dl_platform to "haswell" or "xeon_phi", based
    on supported ISAs.  This led to shared library loading failure for
    shared libraries placed under the "x86_64" subdirectory.
    
    This patch adds "x86_64" to x86-64 dl_hwcap so that ld.so will always
    search the "x86_64" subdirectory when loading a shared library.
    
    NB: We can't set x86-64 dl_platform to "x86-64" since ld.so will skip
    the "haswell" and "xeon_phi" subdirectories on "haswell" and "xeon_phi"
    machines.
    
    Tested on i686 and x86-64.
    
    	[BZ #22093]
    	* sysdeps/x86/cpu-features.c (init_cpu_features): Initialize
    	GLRO(dl_hwcap) to HWCAP_X86_64 for x86-64.
    	* sysdeps/x86/dl-hwcap.h (HWCAP_COUNT): Updated.
    	(HWCAP_IMPORTANT): Likewise.
    	(HWCAP_X86_64): New enum.
    	(HWCAP_X86_AVX512_1): Updated.
    	* sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): Add "x86_64".
    	* sysdeps/x86_64/Makefile (tests): Add tst-x86_64-1.
    	(modules-names): Add x86_64/tst-x86_64mod-1.
    	(LDFLAGS-tst-x86_64mod-1.so): New.
    	($(objpfx)tst-x86_64-1): Likewise.
    	($(objpfx)x86_64/tst-x86_64mod-1.os): Likewise.
    	(tst-x86_64-1-clean): Likewise.
    	* sysdeps/x86_64/tst-x86_64-1.c: New file.
    	* sysdeps/x86_64/tst-x86_64mod-1.c: Likewise.
    
    (cherry picked from commit 45ff34638f034877b6a490c217d6a0632ce263f4)

-----------------------------------------------------------------------
Comment 14 Sourceware Commits 2017-10-21 18:50:11 UTC
The branch, hjl/pr21265/2.25 has been created
        at  33122280d2bab96022dd768d14a69a99768499fc (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=33122280d2bab96022dd768d14a69a99768499fc

commit 33122280d2bab96022dd768d14a69a99768499fc
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 23 08:21:52 2017 -0700

    x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
    
    	[BZ #21265]
    	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
    	New.
    	* sysdeps/x86/cpu-features.c: Include <libc-internal.h>.
    	(get_common_indeces): Set xsave_state_size and
    	bit_arch_XSAVEC_Usable if needed.
    	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
    	and bit_arch_Use_dl_runtime_resolve_opt.
    	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
    	Removed.
    	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(bit_arch_Prefer_No_AVX512): Updated.
    	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
    	(bit_arch_XSAVEC_Usable): New.
    	(STATE_SAVE_OFFSET): Likewise.
    	(STATE_SAVE_MASK): Likewise.
    	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
    	(cpu_features): Add xsave_state_size.
    	(index_arch_Use_dl_runtime_resolve_opt): Removed.
    	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(index_arch_XSAVEC_Usable): New.
    	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
    	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
    	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
    	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
    	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
    	_dl_runtime_resolve_xsavec.
    	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
    	Removed.
    	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
    	instead of VEC_SIZE.
    	(REGISTER_SAVE_BND0): Removed.
    	(REGISTER_SAVE_BND1): Likewise.
    	(REGISTER_SAVE_BND3): Likewise.
    	(REGISTER_SAVE_RAX): Always defined to 0.
    	(VMOV): Removed.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_slow): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_avx512): Likewise.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(USE_FXSAVE): New.
    	(_dl_runtime_resolve_fxsave): Likewise.
    	(USE_XSAVE): Likewise.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(USE_XSAVEC): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
    	Removed.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(_dl_runtime_resolve_fxsave): New.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    
    (cherry picked from commit b52b0d793dcb226ecb0ecca1e672ca265973233c)

-----------------------------------------------------------------------
Comment 15 Sourceware Commits 2017-10-21 18:50:22 UTC
The branch, hjl/pr21265/2.24 has been created
        at  609ccf8ca804e0c65afad74fe5c6d867c3552dbb (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=609ccf8ca804e0c65afad74fe5c6d867c3552dbb

commit 609ccf8ca804e0c65afad74fe5c6d867c3552dbb
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 23 08:21:52 2017 -0700

    x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
    
    	[BZ #21265]
    	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
    	New.
    	* sysdeps/x86/cpu-features.c: Include <libc-internal.h>.
    	(get_common_indeces): Set xsave_state_size and
    	bit_arch_XSAVEC_Usable if needed.
    	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
    	and bit_arch_Use_dl_runtime_resolve_opt.
    	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
    	Removed.
    	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(bit_arch_Prefer_No_AVX512): Updated.
    	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
    	(bit_arch_XSAVEC_Usable): New.
    	(STATE_SAVE_OFFSET): Likewise.
    	(STATE_SAVE_MASK): Likewise.
    	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
    	(cpu_features): Add xsave_state_size.
    	(index_arch_Use_dl_runtime_resolve_opt): Removed.
    	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(index_arch_XSAVEC_Usable): New.
    	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
    	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
    	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
    	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
    	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
    	_dl_runtime_resolve_xsavec.
    	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
    	Removed.
    	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
    	instead of VEC_SIZE.
    	(REGISTER_SAVE_BND0): Removed.
    	(REGISTER_SAVE_BND1): Likewise.
    	(REGISTER_SAVE_BND3): Likewise.
    	(REGISTER_SAVE_RAX): Always defined to 0.
    	(VMOV): Removed.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_slow): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_avx512): Likewise.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(USE_FXSAVE): New.
    	(_dl_runtime_resolve_fxsave): Likewise.
    	(USE_XSAVE): Likewise.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(USE_XSAVEC): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
    	Removed.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(_dl_runtime_resolve_fxsave): New.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    
    (cherry picked from commit b52b0d793dcb226ecb0ecca1e672ca265973233c)

-----------------------------------------------------------------------
Comment 16 Sourceware Commits 2017-10-21 18:50:34 UTC
The branch, hjl/pr21265/2.23 has been created
        at  915e61c5d780ee252bd93cdcf1502af0a7180cd5 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=915e61c5d780ee252bd93cdcf1502af0a7180cd5

commit 915e61c5d780ee252bd93cdcf1502af0a7180cd5
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 23 08:21:52 2017 -0700

    x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
    
    	[BZ #21265]
    	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
    	New.
    	* sysdeps/x86/cpu-features.c: Include <libc-internal.h>.
    	(init_cpu_features): Set xsave_state_size and bit_XSAVEC_Usable
    	if needed.
    	* sysdeps/x86/cpu-features.h (bit_XSAVEC_Usable): New.
    	(STATE_SAVE_OFFSET): Likewise.
    	(STATE_SAVE_MASK): Likewise.
    	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
    	(cpu_features): Add xsave_state_size.
    	(index_XSAVEC_Usable): New.
    	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
    	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx and
    	_dl_runtime_resolve_avx512 with _dl_runtime_resolve_fxsave,
    	_dl_runtime_resolve_xsave and _dl_runtime_resolve_xsavec.
    	* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
    	(DL_RUNTIME_UNALIGNED_VEC_SIZE): Removed.
    	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
    	instead of VEC_SIZE.
    	(REGISTER_SAVE_BND0): Removed.
    	(REGISTER_SAVE_BND1): Likewise.
    	(REGISTER_SAVE_BND3): Likewise.
    	(REGISTER_SAVE_RAX): Always defined to 0.
    	(VMOV): Removed.
    	(_dl_runtime_resolve_avx512): Likewise.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(USE_FXSAVE): New.
    	(_dl_runtime_resolve_fxsave): Likewise.
    	(USE_XSAVE): Likewise.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(USE_XSAVEC): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
    	Removed.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_fxsave): New.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	(_dl_runtime_profile): Defined only if _dl_runtime_profile is
    	defined.
    
    (cherry picked from commit b52b0d793dcb226ecb0ecca1e672ca265973233c)

-----------------------------------------------------------------------
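For readers following the ChangeLog, the choice between the three new trampolines can be pictured roughly as below. This is only a sketch, assuming the selection depends on the xsave_state_size field and the XSAVEC_Usable bit that the ChangeLog says are set during CPU feature initialization. The struct, the enum and the function names are hypothetical stand-ins, not glibc's actual definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields this commit adds to cpu_features.  */
struct cpu_features_sketch
{
  size_t xsave_state_size;   /* Assumed to be 0 when xsave cannot be used.  */
  bool xsavec_usable;        /* Mirrors the XSAVEC_Usable bit.  */
};

enum resolver_kind
{
  RESOLVER_FXSAVE,
  RESOLVER_XSAVE,
  RESOLVER_XSAVEC
};

/* Pick a trampoline variant: prefer xsavec, then xsave, else fxsave.  */
enum resolver_kind
pick_resolver (const struct cpu_features_sketch *cpu)
{
  if (cpu->xsave_state_size != 0)
    return cpu->xsavec_usable ? RESOLVER_XSAVEC : RESOLVER_XSAVE;
  return RESOLVER_FXSAVE;
}

/* Name of the trampoline symbol the ChangeLog lists for each variant.  */
const char *
resolver_name (enum resolver_kind kind)
{
  switch (kind)
    {
    case RESOLVER_XSAVEC:
      return "_dl_runtime_resolve_xsavec";
    case RESOLVER_XSAVE:
      return "_dl_runtime_resolve_xsave";
    default:
      return "_dl_runtime_resolve_fxsave";
    }
}
```

Under these assumptions, a CPU without usable xsave support falls back to the fxsave variant, which covers the %xmm registers but not the wider vector state.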
Comment 17 Sourceware Commits 2017-10-21 20:24:30 UTC
This is an automated email from the git hooks/post-receive script. It was

generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/pr21265/2.23 has been deleted
       was  915e61c5d780ee252bd93cdcf1502af0a7180cd5

- Log -----------------------------------------------------------------
915e61c5d780ee252bd93cdcf1502af0a7180cd5 x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
-----------------------------------------------------------------------
Comment 18 Sourceware Commits 2017-10-21 20:24:47 UTC
The branch, hjl/pr21265/2.23 has been created
        at  c9af02d35a622cf453021be801a53424ad0f7135 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c9af02d35a622cf453021be801a53424ad0f7135

commit c9af02d35a622cf453021be801a53424ad0f7135
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 23 08:21:52 2017 -0700

    x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
    
    In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
    mask and bound registers.  It simplifies _dl_runtime_resolve and supports
    different calling conventions.  ld.so code size is reduced by more than
    1 KB.  However, using fxsave/xsave/xsavec takes slightly more cycles
    than saving and restoring vector and bound registers individually.
    
    Latency for _dl_runtime_resolve to look up the function foo from one
    shared library plus libc.so:
    
                                 Before    After     Change
    
    Westmere (SSE)/fxsave         345      866       151%
    IvyBridge (AVX)/xsave         420      643       53%
    Haswell (AVX)/xsave           713      1252      75%
    Skylake (AVX+MPX)/xsavec      559      719       28%
    Skylake (AVX512+MPX)/xsavec   145      272       87%
    Ryzen (AVX)/xsavec            280      553       97%
    
    This is the worst case, where the portion of time spent saving and
    restoring registers is larger than in the majority of cases.  With the
    smaller _dl_runtime_resolve code size, the overall performance impact
    is negligible.
    
    On IvyBridge, differences in build and test time of binutils with lazy
    binding GCC and binutils are within noise.  On Westmere, differences in
    bootstrap and "make check" time of GCC 7 with lazy binding GCC and
    binutils are also within noise.
    
    	[BZ #21265]
    	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
    	New.
    	* sysdeps/x86/cpu-features.c: Include <libc-internal.h>.
    	(init_cpu_features): Set xsave_state_size and bit_XSAVEC_Usable
    	if needed.
    	* sysdeps/x86/cpu-features.h (bit_XSAVEC_Usable): New.
    	(STATE_SAVE_OFFSET): Likewise.
    	(STATE_SAVE_MASK): Likewise.
    	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
    	(cpu_features): Add xsave_state_size.
    	(index_XSAVEC_Usable): New.
    	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
    	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx and
    	_dl_runtime_resolve_avx512 with _dl_runtime_resolve_fxsave,
    	_dl_runtime_resolve_xsave and _dl_runtime_resolve_xsavec.
    	* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
    	(DL_RUNTIME_UNALIGNED_VEC_SIZE): Removed.
    	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
    	instead of VEC_SIZE.
    	(REGISTER_SAVE_BND0): Removed.
    	(REGISTER_SAVE_BND1): Likewise.
    	(REGISTER_SAVE_BND3): Likewise.
    	(REGISTER_SAVE_RAX): Always defined to 0.
    	(VMOV): Removed.
    	(_dl_runtime_resolve_avx512): Likewise.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(USE_FXSAVE): New.
    	(_dl_runtime_resolve_fxsave): Likewise.
    	(USE_XSAVE): Likewise.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(USE_XSAVEC): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
    	Removed.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_fxsave): New.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	(_dl_runtime_profile): Defined only if _dl_runtime_profile is
    	defined.
    
    (cherry picked from commit b52b0d793dcb226ecb0ecca1e672ca265973233c)

-----------------------------------------------------------------------
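The Change column in the latency table appears to be the relative increase, (After - Before) / Before, truncated to a whole percent. A minimal sketch in C that reproduces the figures follows. The function name is hypothetical, the row values are copied from the table, and it assumes After is never smaller than Before.

```c
#include <stdio.h>

/* Percentage increase from BEFORE to AFTER, truncated toward zero.
   Assumes after >= before and before != 0.  */
unsigned int
percent_increase (unsigned int before, unsigned int after)
{
  return (after - before) * 100u / before;
}

struct latency_row
{
  const char *config;
  unsigned int before;
  unsigned int after;
};

/* Rows copied from the commit message.  */
static const struct latency_row latency_rows[] =
{
  { "Westmere (SSE)/fxsave",       345,  866 },
  { "IvyBridge (AVX)/xsave",       420,  643 },
  { "Haswell (AVX)/xsave",         713, 1252 },
  { "Skylake (AVX+MPX)/xsavec",    559,  719 },
  { "Skylake (AVX512+MPX)/xsavec", 145,  272 },
  { "Ryzen (AVX)/xsavec",          280,  553 },
};

/* Print each row with its recomputed Change value.  */
void
print_latency_changes (FILE *out)
{
  size_t n = sizeof latency_rows / sizeof latency_rows[0];
  for (size_t i = 0; i < n; i++)
    fprintf (out, "%-30s %u%%\n", latency_rows[i].config,
             percent_increase (latency_rows[i].before,
                               latency_rows[i].after));
}
```

Integer division is used on purpose: rounding to nearest would give 76% for Haswell and 98% for Ryzen, whereas the table lists 75% and 97%.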
Comment 19 Sourceware Commits 2017-10-22 11:40:02 UTC
The branch, hjl/pr21265/2.26 has been deleted
       was  5d9b05d1ad4faa68f82e80dee014df7d5f9872c3

- Log -----------------------------------------------------------------
5d9b05d1ad4faa68f82e80dee014df7d5f9872c3 x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
-----------------------------------------------------------------------
Comment 20 Sourceware Commits 2017-10-22 11:40:12 UTC
The branch, hjl/pr21265/2.26 has been created
        at  4a30c87b87e0c2bb6110230cd53f0fffa3022f77 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4a30c87b87e0c2bb6110230cd53f0fffa3022f77

commit 4a30c87b87e0c2bb6110230cd53f0fffa3022f77
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 23 08:21:52 2017 -0700

    x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
    
    In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
    mask and bound registers.  It simplifies _dl_runtime_resolve and supports
    different calling conventions.  ld.so code size is reduced by more than
    1 KB.  However, using fxsave/xsave/xsavec takes slightly more cycles
    than saving and restoring vector and bound registers individually.
    
    Latency for _dl_runtime_resolve to look up the function foo from one
    shared library plus libc.so:
    
                                 Before    After     Change
    
    Westmere (SSE)/fxsave         345      866       151%
    IvyBridge (AVX)/xsave         420      643       53%
    Haswell (AVX)/xsave           713      1252      75%
    Skylake (AVX+MPX)/xsavec      559      719       28%
    Skylake (AVX512+MPX)/xsavec   145      272       87%
    Ryzen (AVX)/xsavec            280      553       97%
    
    This is the worst case, where the portion of time spent saving and
    restoring registers is larger than in the majority of cases.  With the
    smaller _dl_runtime_resolve code size, the overall performance impact
    is negligible.
    
    On IvyBridge, differences in build and test time of binutils with lazy
    binding GCC and binutils are within noise.  On Westmere, differences in
    bootstrap and "make check" time of GCC 7 with lazy binding GCC and
    binutils are also within noise.
    
    	[BZ #21265]
    	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
    	New.
    	* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
    	(get_common_indeces): Set xsave_state_size, xsave_state_full_size
    	and bit_arch_XSAVEC_Usable if needed.
    	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
    	and bit_arch_Use_dl_runtime_resolve_opt.
    	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
    	Removed.
    	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(bit_arch_Prefer_No_AVX512): Updated.
    	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
    	(bit_arch_XSAVEC_Usable): New.
    	(STATE_SAVE_OFFSET): Likewise.
    	(STATE_SAVE_MASK): Likewise.
    	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
    	(cpu_features): Add xsave_state_size and xsave_state_full_size.
    	(index_arch_Use_dl_runtime_resolve_opt): Removed.
    	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(index_arch_XSAVEC_Usable): New.
    	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
    	Support XSAVEC_Usable.  Remove Use_dl_runtime_resolve_slow.
    	* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
    	is enabled.
    	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
    	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
    	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
    	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
    	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
    	_dl_runtime_resolve_xsavec.
    	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
    	Removed.
    	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
    	instead of VEC_SIZE.
    	(REGISTER_SAVE_BND0): Removed.
    	(REGISTER_SAVE_BND1): Likewise.
    	(REGISTER_SAVE_BND3): Likewise.
    	(REGISTER_SAVE_RAX): Always defined to 0.
    	(VMOV): Removed.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_slow): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_avx512): Likewise.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(USE_FXSAVE): New.
    	(_dl_runtime_resolve_fxsave): Likewise.
    	(USE_XSAVE): Likewise.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(USE_XSAVEC): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
    	Removed.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(_dl_runtime_resolve_fxsave): New.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    
    (cherry picked from commit b52b0d793dcb226ecb0ecca1e672ca265973233c)

-----------------------------------------------------------------------
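The ChangeLog mentions STATE_SAVE_ALIGNMENT and an xsave_state_size field. The xsave, xsavec and xrstor instructions require a 64-byte aligned memory operand (fxsave needs 16 bytes), so the size reserved on the stack has to be rounded up accordingly. The sketch below shows that rounding in C, assuming the raw size comes from CPUID leaf 0xD. The helper names and the reserved_bytes parameter are hypothetical, and the real value of STATE_SAVE_OFFSET is not shown in this thread.

```c
#include <stddef.h>

/* Alignment the xsave family requires for its memory operand.  */
#define XSAVE_AREA_ALIGNMENT 64

/* Round SIZE up to the next multiple of ALIGN.
   ALIGN must be a power of two.  */
size_t
align_up (size_t size, size_t align)
{
  return (size + align - 1) & ~(align - 1);
}

/* Hypothetical helper: total bytes to reserve for RESERVED_BYTES of
   saved integer registers followed by an xsave area of
   CPUID_XSAVE_BYTES, rounded up so the total stays 64-byte aligned.  */
size_t
state_save_size (size_t cpuid_xsave_bytes, size_t reserved_bytes)
{
  return align_up (cpuid_xsave_bytes + reserved_bytes,
                   XSAVE_AREA_ALIGNMENT);
}
```

For instance, with these hypothetical inputs, state_save_size (1000, 64) yields 1088, the next multiple of 64 above 1064.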
Comment 21 Sourceware Commits 2017-10-22 11:49:06 UTC
The branch, hjl/pr21265/2.23 has been deleted
       was  c9af02d35a622cf453021be801a53424ad0f7135

- Log -----------------------------------------------------------------
c9af02d35a622cf453021be801a53424ad0f7135 x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
-----------------------------------------------------------------------
Comment 22 Sourceware Commits 2017-10-22 11:49:12 UTC
The branch, hjl/pr21265/2.23 has been created
        at  19d009625f022623cf1d98caa9f493aa01c7f7fb (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=19d009625f022623cf1d98caa9f493aa01c7f7fb

commit 19d009625f022623cf1d98caa9f493aa01c7f7fb
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 23 08:21:52 2017 -0700

    x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
    
    In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
    mask and bound registers.  It simplifies _dl_runtime_resolve and supports
    different calling conventions.  ld.so code size is reduced by more than
    1 KB.  However, using fxsave/xsave/xsavec takes slightly more cycles
    than saving and restoring vector and bound registers individually.
    
    Latency for _dl_runtime_resolve to look up the function foo from one
    shared library plus libc.so:
    
                                 Before    After     Change
    
    Westmere (SSE)/fxsave         345      866       151%
    IvyBridge (AVX)/xsave         420      643       53%
    Haswell (AVX)/xsave           713      1252      75%
    Skylake (AVX+MPX)/xsavec      559      719       28%
    Skylake (AVX512+MPX)/xsavec   145      272       87%
    Ryzen (AVX)/xsavec            280      553       97%
    
    This is the worst case, where the portion of time spent saving and
    restoring registers is larger than in the majority of cases.  With the
    smaller _dl_runtime_resolve code size, the overall performance impact
    is negligible.
    
    On IvyBridge, differences in build and test time of binutils with lazy
    binding GCC and binutils are within noise.  On Westmere, differences in
    bootstrap and "make check" time of GCC 7 with lazy binding GCC and
    binutils are also within noise.
    
    	[BZ #21265]
    	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
    	New.
    	* sysdeps/x86/cpu-features.c: Include <libc-internal.h>.
    	(init_cpu_features): Set xsave_state_size and bit_XSAVEC_Usable
    	if needed.
    	* sysdeps/x86/cpu-features.h (bit_XSAVEC_Usable): New.
    	(STATE_SAVE_OFFSET): Likewise.
    	(STATE_SAVE_MASK): Likewise.
    	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
    	(cpu_features): Add xsave_state_size.
    	(index_XSAVEC_Usable): New.
    	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
    	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx and
    	_dl_runtime_resolve_avx512 with _dl_runtime_resolve_fxsave,
    	_dl_runtime_resolve_xsave and _dl_runtime_resolve_xsavec.
    	* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
    	(DL_RUNTIME_UNALIGNED_VEC_SIZE): Removed.
    	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
    	instead of VEC_SIZE.
    	(REGISTER_SAVE_BND0): Removed.
    	(REGISTER_SAVE_BND1): Likewise.
    	(REGISTER_SAVE_BND3): Likewise.
    	(REGISTER_SAVE_RAX): Always defined to 0.
    	(VMOV): Removed.
    	(_dl_runtime_resolve_avx512): Likewise.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(USE_FXSAVE): New.
    	(_dl_runtime_resolve_fxsave): Likewise.
    	(USE_XSAVE): Likewise.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(USE_XSAVEC): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
    	Removed.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_fxsave): New.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	(_dl_runtime_profile): Defined only if _dl_runtime_profile is
    	defined.
    
    (cherry picked from commit b52b0d793dcb226ecb0ecca1e672ca265973233c)

-----------------------------------------------------------------------
Comment 23 Sourceware Commits 2017-10-22 15:15:00 UTC
The branch, release/2.26/master has been updated
       via  f82a6fc223cbd890b9de9007cfce63e6cae8f799 (commit)
      from  b2c78ae69eb5845c94db94e87a2addd695f978c0 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f82a6fc223cbd890b9de9007cfce63e6cae8f799

commit f82a6fc223cbd890b9de9007cfce63e6cae8f799
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Oct 22 07:40:39 2017 -0700

    x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
    
    In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
    mask and bound registers.  It simplifies _dl_runtime_resolve and supports
    different calling conventions.  ld.so code size is reduced by more than
    1 KB.  However, using fxsave/xsave/xsavec takes slightly more cycles
    than saving and restoring vector and bound registers individually.
    
    Latency for _dl_runtime_resolve to look up the function foo from one
    shared library plus libc.so:
    
                                 Before    After     Change
    
    Westmere (SSE)/fxsave         345      866       151%
    IvyBridge (AVX)/xsave         420      643       53%
    Haswell (AVX)/xsave           713      1252      75%
    Skylake (AVX+MPX)/xsavec      559      719       28%
    Skylake (AVX512+MPX)/xsavec   145      272       87%
    Ryzen (AVX)/xsavec            280      553       97%
    
    This is the worst case, where the portion of time spent saving and
    restoring registers is larger than in the majority of cases.  With the
    smaller _dl_runtime_resolve code size, the overall performance impact
    is negligible.
    
    On IvyBridge, differences in build and test time of binutils with lazy
    binding GCC and binutils are within noise.  On Westmere, differences in
    bootstrap and "make check" time of GCC 7 with lazy binding GCC and
    binutils are also within noise.
    
    	[BZ #21265]
    	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
    	New.
    	* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
    	(get_common_indeces): Set xsave_state_size, xsave_state_full_size
    	and bit_arch_XSAVEC_Usable if needed.
    	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
    	and bit_arch_Use_dl_runtime_resolve_opt.
    	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
    	Removed.
    	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(bit_arch_Prefer_No_AVX512): Updated.
    	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
    	(bit_arch_XSAVEC_Usable): New.
    	(STATE_SAVE_OFFSET): Likewise.
    	(STATE_SAVE_MASK): Likewise.
    	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
    	(cpu_features): Add xsave_state_size and xsave_state_full_size.
    	(index_arch_Use_dl_runtime_resolve_opt): Removed.
    	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(index_arch_XSAVEC_Usable): New.
    	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
    	Support XSAVEC_Usable.  Remove Use_dl_runtime_resolve_slow.
    	* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
    	is enabled.
    	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
    	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
    	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
    	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
    	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
    	_dl_runtime_resolve_xsavec.
    	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
    	Removed.
    	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
    	instead of VEC_SIZE.
    	(REGISTER_SAVE_BND0): Removed.
    	(REGISTER_SAVE_BND1): Likewise.
    	(REGISTER_SAVE_BND3): Likewise.
    	(REGISTER_SAVE_RAX): Always defined to 0.
    	(VMOV): Removed.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_slow): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_avx512): Likewise.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(USE_FXSAVE): New.
    	(_dl_runtime_resolve_fxsave): Likewise.
    	(USE_XSAVE): Likewise.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(USE_XSAVEC): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
    	Removed.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(_dl_runtime_resolve_fxsave): New.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    
    (cherry picked from commit b52b0d793dcb226ecb0ecca1e672ca265973233c)

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                            |   66 +++++++++
 NEWS                                 |    1 +
 sysdeps/x86/cpu-features-offsets.sym |    1 +
 sysdeps/x86/cpu-features.c           |   88 +++++++++--
 sysdeps/x86/cpu-features.h           |   31 +++-
 sysdeps/x86/cpu-tunables.c           |   17 ++-
 sysdeps/x86_64/Makefile              |    4 +
 sysdeps/x86_64/dl-machine.h          |   38 ++----
 sysdeps/x86_64/dl-trampoline.S       |   87 ++++--------
 sysdeps/x86_64/dl-trampoline.h       |  267 ++++++++++------------------------
 10 files changed, 294 insertions(+), 306 deletions(-)
Comment 24 Sourceware Commits 2017-10-22 15:42:55 UTC
The branch, release/2.25/master has been updated
       via  61bebc863e65afd994b08353f574c5df3fe8accc (commit)
      from  74e1eb907850f3a132e0b0a94a07989202b1cbf3 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=61bebc863e65afd994b08353f574c5df3fe8accc

commit 61bebc863e65afd994b08353f574c5df3fe8accc
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Oct 22 08:20:38 2017 -0700

    x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
    
    In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
    mask and bound registers.  It simplifies _dl_runtime_resolve and supports
    different calling conventions.  ld.so code size is reduced by more than
    1 KB.  However, using fxsave/xsave/xsavec takes slightly more cycles
    than saving and restoring vector and bound registers individually.
    
    Latency for _dl_runtime_resolve to look up the function foo from one
    shared library plus libc.so:
    
                                 Before    After     Change
    
    Westmere (SSE)/fxsave         345      866       151%
    IvyBridge (AVX)/xsave         420      643       53%
    Haswell (AVX)/xsave           713      1252      75%
    Skylake (AVX+MPX)/xsavec      559      719       28%
    Skylake (AVX512+MPX)/xsavec   145      272       87%
    Ryzen (AVX)/xsavec            280      553       97%
    
    This is the worst case, where the portion of time spent saving and
    restoring registers is larger than in the majority of cases.  With the
    smaller _dl_runtime_resolve code size, the overall performance impact
    is negligible.
    
    On IvyBridge, differences in build and test time of binutils with lazy
    binding GCC and binutils are within noise.  On Westmere, differences in
    bootstrap and "make check" time of GCC 7 with lazy binding GCC and
    binutils are also within noise.
    
    	[BZ #21265]
    	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
    	New.
    	* sysdeps/x86/cpu-features.c: Include <libc-internal.h>.
    	(get_common_indeces): Set xsave_state_size and
    	bit_arch_XSAVEC_Usable if needed.
    	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
    	and bit_arch_Use_dl_runtime_resolve_opt.
    	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
    	Removed.
    	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(bit_arch_Prefer_No_AVX512): Updated.
    	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
    	(bit_arch_XSAVEC_Usable): New.
    	(STATE_SAVE_OFFSET): Likewise.
    	(STATE_SAVE_MASK): Likewise.
    	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
    	(cpu_features): Add xsave_state_size.
    	(index_arch_Use_dl_runtime_resolve_opt): Removed.
    	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(index_arch_XSAVEC_Usable): New.
    	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
    	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
    	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
    	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
    	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
    	_dl_runtime_resolve_xsavec.
    	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
    	Removed.
    	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
    	instead of VEC_SIZE.
    	(REGISTER_SAVE_BND0): Removed.
    	(REGISTER_SAVE_BND1): Likewise.
    	(REGISTER_SAVE_BND3): Likewise.
    	(REGISTER_SAVE_RAX): Always defined to 0.
    	(VMOV): Removed.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_slow): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_avx512): Likewise.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(USE_FXSAVE): New.
    	(_dl_runtime_resolve_fxsave): Likewise.
    	(USE_XSAVE): Likewise.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(USE_XSAVEC): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
    	Removed.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(_dl_runtime_resolve_fxsave): New.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    
    (cherry picked from commit b52b0d793dcb226ecb0ecca1e672ca265973233c)

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                            |   62 ++++++++
 NEWS                                 |    1 +
 sysdeps/x86/cpu-features-offsets.sym |    1 +
 sysdeps/x86/cpu-features.c           |   83 +++++++++--
 sysdeps/x86/cpu-features.h           |   23 +++-
 sysdeps/x86_64/dl-machine.h          |   38 ++----
 sysdeps/x86_64/dl-trampoline.S       |   87 ++++--------
 sysdeps/x86_64/dl-trampoline.h       |  267 ++++++++++------------------------
 8 files changed, 265 insertions(+), 297 deletions(-)
Comment 25 Sourceware Commits 2017-10-22 15:53:31 UTC
The branch, release/2.24/master has been updated
       via  bea3f92405f705684275bffee954cafe84ffb09d (commit)
      from  5084717ffa05d15e98bc98a2c8b710ee57c4d133 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bea3f92405f705684275bffee954cafe84ffb09d

commit bea3f92405f705684275bffee954cafe84ffb09d
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Oct 22 08:24:00 2017 -0700

    x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
    
    In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
    mask and bound registers.  It simplifies _dl_runtime_resolve and supports
    different calling conventions.  ld.so code size is reduced by more than
    1 KB.  However, using fxsave/xsave/xsavec takes slightly more cycles
    than saving and restoring vector and bound registers individually.
    
    Latency for _dl_runtime_resolve to look up the function foo from one
    shared library plus libc.so:
    
                                 Before    After     Change
    
    Westmere (SSE)/fxsave         345      866       151%
    IvyBridge (AVX)/xsave         420      643       53%
    Haswell (AVX)/xsave           713      1252      75%
    Skylake (AVX+MPX)/xsavec      559      719       28%
    Skylake (AVX512+MPX)/xsavec   145      272       87%
    Ryzen (AVX)/xsavec            280      553       97%
    
    This is the worst case, where the portion of time spent saving and
    restoring registers is larger than in the majority of cases.  With the
    smaller _dl_runtime_resolve code size, the overall performance impact
    is negligible.
    
    On IvyBridge, differences in build and test time of binutils with lazy
    binding GCC and binutils are within noise.  On Westmere, differences in
    bootstrap and "make check" time of GCC 7 with lazy binding GCC and
    binutils are also within noise.
    
    	[BZ #21265]
    	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
    	New.
    	* sysdeps/x86/cpu-features.c: Include <libc-internal.h>.
    	(get_common_indeces): Set xsave_state_size and
    	bit_arch_XSAVEC_Usable if needed.
    	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
    	and bit_arch_Use_dl_runtime_resolve_opt.
    	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
    	Removed.
    	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(bit_arch_Prefer_No_AVX512): Updated.
    	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
    	(bit_arch_XSAVEC_Usable): New.
    	(STATE_SAVE_OFFSET): Likewise.
    	(STATE_SAVE_MASK): Likewise.
    	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
    	(cpu_features): Add xsave_state_size.
    	(index_arch_Use_dl_runtime_resolve_opt): Removed.
    	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
    	(index_arch_XSAVEC_Usable): New.
    	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
    	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
    	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
    	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
    	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
    	_dl_runtime_resolve_xsavec.
    	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
    	Removed.
    	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
    	instead of VEC_SIZE.
    	(REGISTER_SAVE_BND0): Removed.
    	(REGISTER_SAVE_BND1): Likewise.
    	(REGISTER_SAVE_BND3): Likewise.
    	(REGISTER_SAVE_RAX): Always defined to 0.
    	(VMOV): Removed.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_slow): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_avx512): Likewise.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(USE_FXSAVE): New.
    	(_dl_runtime_resolve_fxsave): Likewise.
    	(USE_XSAVE): Likewise.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(USE_XSAVEC): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
    	Removed.
    	(_dl_runtime_resolve_avx512_opt): Likewise.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_avx_opt): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_sse_vex): Likewise.
    	(_dl_runtime_resolve_fxsave): New.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    
    (cherry picked from commit b52b0d793dcb226ecb0ecca1e672ca265973233c)

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                            |   62 ++++++++
 NEWS                                 |    1 +
 sysdeps/x86/cpu-features-offsets.sym |    1 +
 sysdeps/x86/cpu-features.c           |   83 +++++++++--
 sysdeps/x86/cpu-features.h           |   23 +++-
 sysdeps/x86_64/dl-machine.h          |   38 ++----
 sysdeps/x86_64/dl-trampoline.S       |   87 ++++--------
 sysdeps/x86_64/dl-trampoline.h       |  267 ++++++++++------------------------
 8 files changed, 265 insertions(+), 297 deletions(-)
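The ChangeLog above has cpu-features.c record an xsave_state_size and an XSAVEC_Usable bit. As a rough illustration of where such values can come from, the sketch below queries CPUID leaf 0xD using the compiler's <cpuid.h> and rounds the reported size up to the 64-byte alignment that xsave requires. It is not the glibc code: the helper names align_up, xsave_area_size and has_xsavec are hypothetical, and the real implementation may account for extra stack space and select only a subset of state components.

```c
#include <assert.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Round VALUE up to a multiple of ALIGN; ALIGN must be a power of two.  */
static unsigned int
align_up (unsigned int value, unsigned int align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

/* Hypothetical helper: size in bytes of the XSAVE area for the state
   components currently enabled in XCR0 (CPUID leaf 0xD, sub-leaf 0,
   EBX).  Returns 0 if the leaf is unavailable or the target is not
   x86.  */
static unsigned int
xsave_area_size (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid_count (0xd, 0, &eax, &ebx, &ecx, &edx))
    return ebx;
#endif
  return 0;
}

/* Hypothetical helper: 1 if CPUID reports the xsavec instruction
   (leaf 0xD, sub-leaf 1, EAX bit 1), otherwise 0.  */
static int
has_xsavec (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid_count (0xd, 1, &eax, &ebx, &ecx, &edx))
    return (eax >> 1) & 1;
#endif
  return 0;
}
```

A caller could reserve align_up (xsave_area_size (), 64) bytes of 64-byte-aligned scratch space before issuing xsave or xsavec, and fall back to fxsave, which uses a fixed 512-byte area with 16-byte alignment, when xsave_area_size () returns 0.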
Comment 26 Sourceware Commits 2017-10-22 16:14:20 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, release/2.23/master has been updated
       via  26d289bb92b6d1125536644f607c73617463477d (commit)
      from  9d521f59de10968f874a5e22e9ce5f9b2a51fc2f (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=26d289bb92b6d1125536644f607c73617463477d

commit 26d289bb92b6d1125536644f607c73617463477d
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Oct 22 08:47:03 2017 -0700

    x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
    
    In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
    mask and bound registers.  This simplifies _dl_runtime_resolve and
    supports different calling conventions.  ld.so code size is reduced by
    more than 1 KB.  However, using fxsave/xsave/xsavec takes a few more
    cycles than saving and restoring vector and bound registers individually.
    
    Latency for _dl_runtime_resolve to look up the function, foo, from one
    shared library plus libc.so:
    
                                 Before    After     Change
    
    Westmere (SSE)/fxsave         345      866       151%
    IvyBridge (AVX)/xsave         420      643       53%
    Haswell (AVX)/xsave           713      1252      75%
    Skylake (AVX+MPX)/xsavec      559      719       28%
    Skylake (AVX512+MPX)/xsavec   145      272       87%
    Ryzen (AVX)/xsavec            280      553       97%
    
    This is the worst case, where the portion of time spent saving and
    restoring registers is larger than in the majority of cases.  With the
    smaller _dl_runtime_resolve code size, the overall performance impact is
    negligible.
    
    On IvyBridge, differences in build and test time of binutils with lazy
    binding GCC and binutils are noise.  On Westmere, differences in
    bootstrap and "make check" time of GCC 7 with lazy binding GCC and
    binutils are also noise.
    
    	[BZ #21265]
    	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
    	New.
    	* sysdeps/x86/cpu-features.c: Include <libc-internal.h>.
    	(init_cpu_features): Set xsave_state_size and bit_XSAVEC_Usable
    	if needed.
    	* sysdeps/x86/cpu-features.h (bit_XSAVEC_Usable): New.
    	(STATE_SAVE_OFFSET): Likewise.
    	(STATE_SAVE_MASK): Likewise.
    	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
    	(cpu_features): Add xsave_state_size.
    	(index_XSAVEC_Usable): New.
    	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
    	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx and
    	_dl_runtime_resolve_avx512 with _dl_runtime_resolve_fxsave,
    	_dl_runtime_resolve_xsave and _dl_runtime_resolve_xsavec.
    	* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
    	(DL_RUNTIME_UNALIGNED_VEC_SIZE): Removed.
    	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
    	instead of VEC_SIZE.
    	(REGISTER_SAVE_BND0): Removed.
    	(REGISTER_SAVE_BND1): Likewise.
    	(REGISTER_SAVE_BND3): Likewise.
    	(REGISTER_SAVE_RAX): Always defined to 0.
    	(VMOV): Removed.
    	(_dl_runtime_resolve_avx512): Likewise.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(USE_FXSAVE): New.
    	(_dl_runtime_resolve_fxsave): Likewise.
    	(USE_XSAVE): Likewise.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(USE_XSAVEC): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
    	Removed.
    	(_dl_runtime_resolve_avx): Likewise.
    	(_dl_runtime_resolve_sse): Likewise.
    	(_dl_runtime_resolve_fxsave): New.
    	(_dl_runtime_resolve_xsave): Likewise.
    	(_dl_runtime_resolve_xsavec): Likewise.
    	(_dl_runtime_profile): Defined only if _dl_runtime_profile is
    	defined.
    
    (cherry picked from commit b52b0d793dcb226ecb0ecca1e672ca265973233c)

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                            |   46 +++++++++
 NEWS                                 |    1 +
 sysdeps/x86/cpu-features-offsets.sym |    2 +
 sysdeps/x86/cpu-features.c           |   66 +++++++++++++
 sysdeps/x86/cpu-features.h           |   18 ++++
 sysdeps/x86_64/dl-machine.h          |   18 ++--
 sysdeps/x86_64/dl-trampoline.S       |  103 +++++++++------------
 sysdeps/x86_64/dl-trampoline.h       |  174 +++++++++++++++++-----------------
 8 files changed, 272 insertions(+), 156 deletions(-)
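For readers checking the latency table in the commit message above, the "Change" column appears to be the relative increase from "Before" to "After", truncated to a whole percent. A minimal sketch of that arithmetic follows; the function name percent_change is hypothetical, and the truncation rule is inferred from the published figures rather than stated in the commit.

```c
#include <assert.h>

/* Relative increase from BEFORE to AFTER as a whole percentage,
   truncated toward zero.  BEFORE must be positive.  */
static int
percent_change (int before, int after)
{
  assert (before > 0);
  return (after - before) * 100 / before;
}
```

For example, percent_change (713, 1252) yields 75 for the Haswell row, where rounding to nearest would give 76.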
Comment 27 Florian Weimer 2017-11-03 08:24:19 UTC
*** Bug 21236 has been marked as a duplicate of this bug. ***
Comment 28 Sourceware Commits 2018-04-07 13:02:49 UTC
The branch, hjl/pr21265/2.23 has been deleted
       was  19d009625f022623cf1d98caa9f493aa01c7f7fb

- Log -----------------------------------------------------------------
19d009625f022623cf1d98caa9f493aa01c7f7fb x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
-----------------------------------------------------------------------
Comment 29 Sourceware Commits 2018-04-07 13:02:59 UTC
The branch, hjl/pr21265/2.24 has been deleted
       was  609ccf8ca804e0c65afad74fe5c6d867c3552dbb

- Log -----------------------------------------------------------------
609ccf8ca804e0c65afad74fe5c6d867c3552dbb x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
-----------------------------------------------------------------------
Comment 30 Sourceware Commits 2018-04-07 13:03:06 UTC
The branch, hjl/pr21265/2.25 has been deleted
       was  33122280d2bab96022dd768d14a69a99768499fc

- Log -----------------------------------------------------------------
33122280d2bab96022dd768d14a69a99768499fc x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
-----------------------------------------------------------------------
Comment 31 Sourceware Commits 2018-04-07 13:03:12 UTC
The branch, hjl/pr21265/2.26 has been deleted
       was  4a30c87b87e0c2bb6110230cd53f0fffa3022f77

- Log -----------------------------------------------------------------
4a30c87b87e0c2bb6110230cd53f0fffa3022f77 x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
-----------------------------------------------------------------------