This is the mail archive of the glibc-cvs@sourceware.org mailing list for the glibc project.



GNU C Library master sources branch master updated. glibc-2.22-60-gb376899


This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  b376899d27e5ac892f0339cf1bbb3d2158347db8 (commit)
       via  1dfa4a94aea3249f9c6e577d795df420688cd8e3 (commit)
       via  1aee37a22e3977de7a89e734e0a1e112f52045f2 (commit)
       via  0b5395f052ee09cd7e3d219af4e805c38058afb5 (commit)
       via  e2e4f56056adddc3c1efe676b40a4b4f2453103b (commit)
      from  63e952d9be87db68f0e4164d4a5760b32e77ebff (commit)

The revisions listed above that are new to this repository have not
appeared in any other notification email, so we list them in full below.

- Log -----------------------------------------------------------------
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=b376899d27e5ac892f0339cf1bbb3d2158347db8

commit b376899d27e5ac892f0339cf1bbb3d2158347db8
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Aug 13 03:40:40 2015 -0700

    Update x86 elision-conf.c for <cpu-features.h>
    
    This patch updates x86 elision-conf.c to use the newly defined
    HAS_CPU_FEATURE from <cpu-features.h>.
    
    	* sysdeps/unix/sysv/linux/x86/elision-conf.c (elision_init):
    	Replace HAS_RTM with HAS_CPU_FEATURE (RTM).
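
For context, the new <cpu-features.h> interface folds the old per-feature
macros into a single parameterized check.  A minimal sketch of the new call
pattern (elision_supported is a hypothetical helper, not part of the commit;
the actual change is in the diff below):

    #include <cpu-features.h>

    /* Hypothetical helper showing the new-style query: the feature name
       is passed to HAS_CPU_FEATURE instead of using a dedicated HAS_RTM
       macro.  */
    static int
    elision_supported (void)
    {
      return HAS_CPU_FEATURE (RTM);
    }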

diff --git a/ChangeLog b/ChangeLog
index c0fe4cd..5d1cd0f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,10 @@
 2015-08-13  H.J. Lu  <hongjiu.lu@intel.com>
 
+	* sysdeps/unix/sysv/linux/x86/elision-conf.c (elision_init):
+	Replace HAS_RTM with HAS_CPU_FEATURE (RTM).
+
+2015-08-13  H.J. Lu  <hongjiu.lu@intel.com>
+
 	* math/Makefile ($(addprefix $(objpfx), $(libm-vec-tests))):
 	Remove $(objpfx)init-arch.o.
 	* sysdeps/x86_64/fpu/Makefile (libmvec-support): Remove
diff --git a/sysdeps/unix/sysv/linux/x86/elision-conf.c b/sysdeps/unix/sysv/linux/x86/elision-conf.c
index 84902ac..4a73382 100644
--- a/sysdeps/unix/sysv/linux/x86/elision-conf.c
+++ b/sysdeps/unix/sysv/linux/x86/elision-conf.c
@@ -62,11 +62,11 @@ elision_init (int argc __attribute__ ((unused)),
 	      char **argv  __attribute__ ((unused)),
 	      char **environ)
 {
-  __elision_available = HAS_RTM;
+  __elision_available = HAS_CPU_FEATURE (RTM);
 #ifdef ENABLE_LOCK_ELISION
   __pthread_force_elision = __libc_enable_secure ? 0 : __elision_available;
 #endif
-  if (!HAS_RTM)
+  if (!HAS_CPU_FEATURE (RTM))
     __elision_aconf.retry_try_xbegin = 0; /* Disable elision on rwlocks */
 }
 

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=1dfa4a94aea3249f9c6e577d795df420688cd8e3

commit 1dfa4a94aea3249f9c6e577d795df420688cd8e3
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Aug 13 03:40:00 2015 -0700

    Update libmvec multiarch functions for <cpu-features.h>
    
    This patch updates libmvec multiarch functions to use the newly defined
    HAS_CPU_FEATURE, HAS_ARCH_FEATURE and LOAD_RTLD_GLOBAL_RO_RDX from
    <cpu-features.h>.
    
    	* math/Makefile ($(addprefix $(objpfx), $(libm-vec-tests))):
    	Remove $(objpfx)init-arch.o.
    	* sysdeps/x86_64/fpu/Makefile (libmvec-support): Remove
    	init-arch.
    	* sysdeps/x86_64/fpu/math-tests-arch.h (avx_usable): Removed.
    	(INIT_ARCH_EXT): Defined as empty.
    	(CHECK_ARCH_EXT): Replace HAS_XXX with HAS_ARCH_FEATURE (XXX).
    	* sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.S: Remove
    	__init_cpu_features call.  Replace HAS_XXX with
    	HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
    	* sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_logf4_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_logf8_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S: Likewise.
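
The math-tests-arch.h hunk in this commit drops the static avx*_usable flags
and the __init_cpu_features call, leaving only a runtime feature query.  A
rough sketch of how the two macros read after the change, assuming REQUIRE_AVX
is defined (check_arch_ext is a hypothetical caller; the real macro bodies are
in the diff below):

    /* Hypothetical caller.  INIT_ARCH_EXT is now empty and CHECK_ARCH_EXT
       returns early unless the AVX_Usable bit is set in cpu_features.  */
    static void
    check_arch_ext (void)
    {
      INIT_ARCH_EXT;
      CHECK_ARCH_EXT;   /* if (!HAS_ARCH_FEATURE (AVX_Usable)) return;  */
      /* Vector-math test code runs only past this point.  */
    }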

diff --git a/ChangeLog b/ChangeLog
index 60e3e8f..c0fe4cd 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,53 @@
 2015-08-13  H.J. Lu  <hongjiu.lu@intel.com>
 
+	* math/Makefile ($(addprefix $(objpfx), $(libm-vec-tests))):
+	Remove $(objpfx)init-arch.o.
+	* sysdeps/x86_64/fpu/Makefile (libmvec-support): Remove
+	init-arch.
+	* sysdeps/x86_64/fpu/math-tests-arch.h (avx_usable): Removed.
+	(INIT_ARCH_EXT): Defined as empty.
+	(CHECK_ARCH_EXT): Replace HAS_XXX with HAS_ARCH_FEATURE (XXX).
+	* sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.S: Remove
+	__init_cpu_features call.  Replace HAS_XXX with
+	HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
+	* sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_logf4_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_logf8_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S: Likewise.
+
+2015-08-13  H.J. Lu  <hongjiu.lu@intel.com>
+
 	* sysdeps/i386/i686/fpu/multiarch/e_expf.c: Replace HAS_XXX
 	with HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
 	* sysdeps/i386/i686/fpu/multiarch/s_cosf.c: Likewise.
diff --git a/math/Makefile b/math/Makefile
index 6388bae..d3b483d 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -263,7 +263,7 @@ $(objpfx)libieee.a: $(objpfx)ieee-math.o
 $(addprefix $(objpfx),$(filter-out $(tests-static) $(libm-vec-tests),$(tests))): $(libm)
 $(addprefix $(objpfx),$(tests-static)): $(objpfx)libm.a
 $(addprefix $(objpfx), $(libm-vec-tests)): $(objpfx)%: $(libm) $(libmvec) \
-					   $(objpfx)init-arch.o $(objpfx)%-wrappers.o
+					   $(objpfx)%-wrappers.o
 
 gmp-objs = $(patsubst %,$(common-objpfx)stdlib/%.o,\
 		      add_n sub_n cmp addmul_1 mul_1 mul_n divmod_1 \
diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
index 1ebe511..f98f6cf 100644
--- a/sysdeps/x86_64/fpu/Makefile
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -20,7 +20,7 @@ libmvec-support += svml_d_cos2_core svml_d_cos4_core_avx \
 		   svml_d_pow_data svml_s_powf4_core svml_s_powf8_core_avx \
 		   svml_s_powf8_core svml_s_powf16_core svml_s_powf_data \
 		   svml_s_sincosf4_core svml_s_sincosf8_core_avx \
-		   svml_s_sincosf8_core svml_s_sincosf16_core init-arch
+		   svml_s_sincosf8_core svml_s_sincosf16_core
 endif
 
 # Variables for libmvec tests.
diff --git a/sysdeps/x86_64/fpu/math-tests-arch.h b/sysdeps/x86_64/fpu/math-tests-arch.h
index e8833bf..fb8251b 100644
--- a/sysdeps/x86_64/fpu/math-tests-arch.h
+++ b/sysdeps/x86_64/fpu/math-tests-arch.h
@@ -19,66 +19,36 @@
 #if defined REQUIRE_AVX
 # include <init-arch.h>
 
-/* Set to 1 if AVX supported.  */
-static int avx_usable;
-
-# define INIT_ARCH_EXT                                         \
-  do                                                           \
-    {                                                          \
-      __init_cpu_features ();                                  \
-      avx_usable = __cpu_features.feature[index_AVX_Usable]    \
-                   & bit_AVX_Usable;                           \
-    }                                                          \
-  while (0)
+# define INIT_ARCH_EXT
 
 # define CHECK_ARCH_EXT                                        \
   do                                                           \
     {                                                          \
-      if (!avx_usable) return;                                 \
+      if (!HAS_ARCH_FEATURE (AVX_Usable)) return;              \
     }                                                          \
   while (0)
 
 #elif defined REQUIRE_AVX2
 # include <init-arch.h>
 
-  /* Set to 1 if AVX2 supported.  */
-  static int avx2_usable;
-
-# define INIT_ARCH_EXT                                         \
-  do                                                           \
-    {                                                          \
-      __init_cpu_features ();                                  \
-      avx2_usable = __cpu_features.feature[index_AVX2_Usable]  \
-                  & bit_AVX2_Usable;                           \
-    }                                                          \
-  while (0)
+# define INIT_ARCH_EXT
 
 # define CHECK_ARCH_EXT                                        \
   do                                                           \
     {                                                          \
-      if (!avx2_usable) return;                                \
+      if (!HAS_ARCH_FEATURE (AVX2_Usable)) return;             \
     }                                                          \
   while (0)
 
 #elif defined REQUIRE_AVX512F
 # include <init-arch.h>
 
-  /* Set to 1 if supported.  */
-  static int avx512f_usable;
-
-# define INIT_ARCH_EXT                                                \
-  do                                                                  \
-    {                                                                 \
-      __init_cpu_features ();                                         \
-      avx512f_usable = __cpu_features.feature[index_AVX512F_Usable]   \
-		       & bit_AVX512F_Usable;                          \
-    }                                                                 \
-  while (0)
+# define INIT_ARCH_EXT
 
 # define CHECK_ARCH_EXT                                        \
   do                                                           \
     {                                                          \
-      if (!avx512f_usable) return;                             \
+      if (!HAS_ARCH_FEATURE (AVX512F_Usable)) return;          \
     }                                                          \
   while (0)
 
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.S
index 5f67d83..c64485e 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVbN2v_cos)
         .type   _ZGVbN2v_cos, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVbN2v_cos_sse4(%rip), %rax
-        testl   $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVbN2v_cos_sse4(%rip), %rax
+	HAS_CPU_FEATURE (SSE4_1)
         jz      2f
         ret
 2:      leaq    _ZGVbN2v_cos_sse2(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.S
index 5babb83..6460690 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVdN4v_cos)
         .type   _ZGVdN4v_cos, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVdN4v_cos_avx2(%rip), %rax
-        testl   $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVdN4v_cos_avx2(%rip), %rax
+	HAS_ARCH_FEATURE (AVX2_Usable)
         jz      2f
         ret
 2:      leaq    _ZGVdN4v_cos_sse_wrapper(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S
index d0f4f27..add99a1 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S
@@ -22,14 +22,12 @@
 	.text
 ENTRY (_ZGVeN8v_cos)
         .type   _ZGVeN8v_cos, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
+	LOAD_RTLD_GLOBAL_RO_RDX
 1:      leaq    _ZGVeN8v_cos_skx(%rip), %rax
-        testl   $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX512DQ_Usable)
         jnz     2f
         leaq    _ZGVeN8v_cos_knl(%rip), %rax
-        testl   $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX512F_Usable)
         jnz     2f
         leaq    _ZGVeN8v_cos_avx2_wrapper(%rip), %rax
 2:      ret
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.S
index ef3dc49..538e991 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVbN2v_exp)
         .type   _ZGVbN2v_exp, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVbN2v_exp_sse4(%rip), %rax
-        testl   $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVbN2v_exp_sse4(%rip), %rax
+	HAS_CPU_FEATURE (SSE4_1)
         jz      2f
         ret
 2:      leaq    _ZGVbN2v_exp_sse2(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.S
index 7f2ebde..c68ca93 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVdN4v_exp)
         .type   _ZGVdN4v_exp, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVdN4v_exp_avx2(%rip), %rax
-        testl   $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVdN4v_exp_avx2(%rip), %rax
+	HAS_ARCH_FEATURE (AVX2_Usable)
         jz      2f
         ret
 2:      leaq    _ZGVdN4v_exp_sse_wrapper(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S
index 7b7c07d..d3985dc 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S
@@ -22,14 +22,12 @@
 	.text
 ENTRY (_ZGVeN8v_exp)
         .type   _ZGVeN8v_exp, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVeN8v_exp_skx(%rip), %rax
-        testl   $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVeN8v_exp_skx(%rip), %rax
+	HAS_ARCH_FEATURE (AVX512DQ_Usable)
         jnz     2f
         leaq    _ZGVeN8v_exp_knl(%rip), %rax
-        testl   $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX512F_Usable)
         jnz     2f
         leaq    _ZGVeN8v_exp_avx2_wrapper(%rip), %rax
 2:      ret
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.S
index 38d369f..adcb34e 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.S
@@ -22,11 +22,9 @@
         .text
 ENTRY (_ZGVbN2v_log)
         .type   _ZGVbN2v_log, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVbN2v_log_sse4(%rip), %rax
-        testl   $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVbN2v_log_sse4(%rip), %rax
+	HAS_CPU_FEATURE (SSE4_1)
         jz      2f
         ret
 2:      leaq    _ZGVbN2v_log_sse2(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.S
index ddb6105..9c9f84a 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVdN4v_log)
         .type   _ZGVdN4v_log, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVdN4v_log_avx2(%rip), %rax
-        testl   $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVdN4v_log_avx2(%rip), %rax
+	HAS_ARCH_FEATURE (AVX2_Usable)
         jz      2f
         ret
 2:      leaq    _ZGVdN4v_log_sse_wrapper(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S
index 76375fd..0ceb9eb 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S
@@ -22,14 +22,12 @@
 	.text
 ENTRY (_ZGVeN8v_log)
         .type   _ZGVeN8v_log, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVeN8v_log_skx(%rip), %rax
-        testl   $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVeN8v_log_skx(%rip), %rax
+	HAS_ARCH_FEATURE (AVX512DQ_Usable)
         jnz     2f
         leaq    _ZGVeN8v_log_knl(%rip), %rax
-        testl   $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX512F_Usable)
         jnz     2f
         leaq    _ZGVeN8v_log_avx2_wrapper(%rip), %rax
 2:      ret
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.S
index f111388..0fbdb43 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVbN2vv_pow)
         .type   _ZGVbN2vv_pow, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVbN2vv_pow_sse4(%rip), %rax
-        testl   $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVbN2vv_pow_sse4(%rip), %rax
+	HAS_CPU_FEATURE (SSE4_1)
         jz      2f
         ret
 2:      leaq    _ZGVbN2vv_pow_sse2(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.S
index 21e3070..0cf5c9b 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVdN4vv_pow)
         .type   _ZGVdN4vv_pow, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVdN4vv_pow_avx2(%rip), %rax
-        testl   $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVdN4vv_pow_avx2(%rip), %rax
+	HAS_ARCH_FEATURE (AVX2_Usable)
         jz      2f
         ret
 2:      leaq    _ZGVdN4vv_pow_sse_wrapper(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S
index c1e5e76..9afdf67 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S
@@ -22,14 +22,12 @@
 	.text
 ENTRY (_ZGVeN8vv_pow)
         .type   _ZGVeN8vv_pow, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVeN8vv_pow_skx(%rip), %rax
-        testl   $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVeN8vv_pow_skx(%rip), %rax
+	HAS_ARCH_FEATURE (AVX512DQ_Usable)
         jnz     2f
         leaq    _ZGVeN8vv_pow_knl(%rip), %rax
-        testl   $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX512F_Usable)
         jnz     2f
         leaq    _ZGVeN8vv_pow_avx2_wrapper(%rip), %rax
 2:      ret
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S
index 29bd0a7..eec486b 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVbN2v_sin)
         .type   _ZGVbN2v_sin, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVbN2v_sin_sse4(%rip), %rax
-        testl   $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVbN2v_sin_sse4(%rip), %rax
+	HAS_CPU_FEATURE (SSE4_1)
         jz      2f
         ret
 2:      leaq    _ZGVbN2v_sin_sse2(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S
index c3a453a..17cb5c1 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVdN4v_sin)
         .type   _ZGVdN4v_sin, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVdN4v_sin_avx2(%rip), %rax
-        testl   $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVdN4v_sin_avx2(%rip), %rax
+	HAS_ARCH_FEATURE (AVX2_Usable)
         jz      2f
         ret
 2:      leaq    _ZGVdN4v_sin_sse_wrapper(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S
index 131f2f4..61ee0c0 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S
@@ -22,14 +22,12 @@
 	.text
 ENTRY (_ZGVeN8v_sin)
         .type   _ZGVeN8v_sin, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVeN8v_sin_skx(%rip), %rax
-        testl   $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVeN8v_sin_skx(%rip), %rax
+	HAS_ARCH_FEATURE (AVX512DQ_Usable)
         jnz     2f
         leaq    _ZGVeN8v_sin_knl(%rip), %rax
-        testl   $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX512F_Usable)
         jnz     2f
         leaq    _ZGVeN8v_sin_avx2_wrapper(%rip), %rax
 2:      ret
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.S
index e8e5771..3d03c53 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVbN2vvv_sincos)
         .type   _ZGVbN2vvv_sincos, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVbN2vvv_sincos_sse4(%rip), %rax
-        testl   $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVbN2vvv_sincos_sse4(%rip), %rax
+	HAS_CPU_FEATURE (SSE4_1)
         jz      2f
         ret
 2:      leaq    _ZGVbN2vvv_sincos_sse2(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.S
index 64744ff..1cc2b69 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVdN4vvv_sincos)
         .type   _ZGVdN4vvv_sincos, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVdN4vvv_sincos_avx2(%rip), %rax
-        testl   $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVdN4vvv_sincos_avx2(%rip), %rax
+	HAS_ARCH_FEATURE (AVX2_Usable)
         jz      2f
         ret
 2:      leaq    _ZGVdN4vvv_sincos_sse_wrapper(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S
index e331090..850f221 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S
@@ -22,14 +22,12 @@
 	.text
 ENTRY (_ZGVeN8vvv_sincos)
         .type   _ZGVeN8vvv_sincos, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVeN8vvv_sincos_skx(%rip), %rax
-        testl   $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVeN8vvv_sincos_skx(%rip), %rax
+	HAS_ARCH_FEATURE (AVX512DQ_Usable)
         jnz     2f
         leaq    _ZGVeN8vvv_sincos_knl(%rip), %rax
-        testl   $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX512F_Usable)
         jnz     2f
         leaq    _ZGVeN8vvv_sincos_avx2_wrapper(%rip), %rax
 2:      ret
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core.S
index 0654d3c..227f46e 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core.S
@@ -22,14 +22,12 @@
 	.text
 ENTRY (_ZGVeN16v_cosf)
         .type   _ZGVeN16v_cosf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVeN16v_cosf_skx(%rip), %rax
-        testl   $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVeN16v_cosf_skx(%rip), %rax
+	HAS_ARCH_FEATURE (AVX512DQ_Usable)
         jnz     2f
         leaq    _ZGVeN16v_cosf_knl(%rip), %rax
-        testl   $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX512F_Usable)
         jnz     2f
         leaq    _ZGVeN16v_cosf_avx2_wrapper(%rip), %rax
 2:      ret
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core.S
index fa2363b..2e98938 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVbN4v_cosf)
         .type   _ZGVbN4v_cosf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVbN4v_cosf_sse4(%rip), %rax
-        testl   $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVbN4v_cosf_sse4(%rip), %rax
+	HAS_CPU_FEATURE (SSE4_1)
         jz      2f
         ret
 2:      leaq    _ZGVbN4v_cosf_sse2(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core.S
index e14bba4..830b10f 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVdN8v_cosf)
         .type   _ZGVdN8v_cosf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVdN8v_cosf_avx2(%rip), %rax
-        testl   $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVdN8v_cosf_avx2(%rip), %rax
+	HAS_ARCH_FEATURE (AVX2_Usable)
         jz      2f
         ret
 2:      leaq    _ZGVdN8v_cosf_sse_wrapper(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S
index 62858eb..79ac304 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S
@@ -22,14 +22,12 @@
 	.text
 ENTRY (_ZGVeN16v_expf)
         .type   _ZGVeN16v_expf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVeN16v_expf_skx(%rip), %rax
-        testl   $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVeN16v_expf_skx(%rip), %rax
+	HAS_ARCH_FEATURE (AVX512DQ_Usable)
         jnz     2f
         leaq    _ZGVeN16v_expf_knl(%rip), %rax
-        testl   $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX512F_Usable)
         jnz     2f
         leaq    _ZGVeN16v_expf_avx2_wrapper(%rip), %rax
 2:      ret
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S
index 37d38bc..e9781f3 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVbN4v_expf)
         .type   _ZGVbN4v_expf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVbN4v_expf_sse4(%rip), %rax
-        testl   $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVbN4v_expf_sse4(%rip), %rax
+	HAS_CPU_FEATURE (SSE4_1)
         jz      2f
         ret
 2:      leaq    _ZGVbN4v_expf_sse2(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S
index e3dc1b1..41e59ef 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVdN8v_expf)
         .type   _ZGVdN8v_expf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVdN8v_expf_avx2(%rip), %rax
-        testl   $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVdN8v_expf_avx2(%rip), %rax
+	HAS_ARCH_FEATURE (AVX2_Usable)
         jz      2f
         ret
 2:      leaq    _ZGVdN8v_expf_sse_wrapper(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core.S
index 68c57e4..fa01161 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core.S
@@ -22,14 +22,12 @@
 	.text
 ENTRY (_ZGVeN16v_logf)
         .type   _ZGVeN16v_logf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVeN16v_logf_skx(%rip), %rax
-        testl   $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVeN16v_logf_skx(%rip), %rax
+	HAS_ARCH_FEATURE (AVX512DQ_Usable)
         jnz     2f
         leaq    _ZGVeN16v_logf_knl(%rip), %rax
-        testl   $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX512F_Usable)
         jnz     2f
         leaq    _ZGVeN16v_logf_avx2_wrapper(%rip), %rax
 2:      ret
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_logf4_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_logf4_core.S
index 153ed8e..0f1ca73 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_logf4_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_logf4_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVbN4v_logf)
         .type   _ZGVbN4v_logf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVbN4v_logf_sse4(%rip), %rax
-        testl   $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVbN4v_logf_sse4(%rip), %rax
+	HAS_CPU_FEATURE (SSE4_1)
         jz      2f
         ret
 2:      leaq    _ZGVbN4v_logf_sse2(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_logf8_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_logf8_core.S
index 6f50bf6..65d1f7f 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_logf8_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_logf8_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVdN8v_logf)
         .type   _ZGVdN8v_logf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVdN8v_logf_avx2(%rip), %rax
-        testl   $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVdN8v_logf_avx2(%rip), %rax
+	HAS_ARCH_FEATURE (AVX2_Usable)
         jz      2f
         ret
 2:      leaq    _ZGVdN8v_logf_sse_wrapper(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core.S
index 3aa9f95..e33e83e 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core.S
@@ -22,14 +22,12 @@
 	.text
 ENTRY (_ZGVeN16vv_powf)
         .type   _ZGVeN16vv_powf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVeN16vv_powf_skx(%rip), %rax
-        testl   $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVeN16vv_powf_skx(%rip), %rax
+	HAS_ARCH_FEATURE (AVX512DQ_Usable)
         jnz     2f
         leaq    _ZGVeN16vv_powf_knl(%rip), %rax
-        testl   $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX512F_Usable)
         jnz     2f
         leaq    _ZGVeN16vv_powf_avx2_wrapper(%rip), %rax
 2:      ret
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core.S
index f88b9ca..28abeec 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVbN4vv_powf)
         .type   _ZGVbN4vv_powf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVbN4vv_powf_sse4(%rip), %rax
-        testl   $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVbN4vv_powf_sse4(%rip), %rax
+	HAS_CPU_FEATURE (SSE4_1)
         jz      2f
         ret
 2:      leaq    _ZGVbN4vv_powf_sse2(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core.S
index 4552e57..0cbbe5d 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVdN8vv_powf)
         .type   _ZGVdN8vv_powf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVdN8vv_powf_avx2(%rip), %rax
-        testl   $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVdN8vv_powf_avx2(%rip), %rax
+	HAS_ARCH_FEATURE (AVX2_Usable)
         jz      2f
         ret
 2:      leaq    _ZGVdN8vv_powf_sse_wrapper(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S
index bdcabab..a32b66e 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S
@@ -22,14 +22,12 @@
 	.text
 ENTRY (_ZGVeN16vvv_sincosf)
         .type   _ZGVeN16vvv_sincosf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVeN16vvv_sincosf_skx(%rip), %rax
-        testl   $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVeN16vvv_sincosf_skx(%rip), %rax
+	HAS_ARCH_FEATURE (AVX512DQ_Usable)
         jnz     2f
         leaq    _ZGVeN16vvv_sincosf_knl(%rip), %rax
-        testl   $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX512F_Usable)
         jnz     2f
         leaq    _ZGVeN16vvv_sincosf_avx2_wrapper(%rip), %rax
 2:      ret
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S
index 610046b..e64fbfb 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVbN4vvv_sincosf)
         .type   _ZGVbN4vvv_sincosf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVbN4vvv_sincosf_sse4(%rip), %rax
-        testl   $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVbN4vvv_sincosf_sse4(%rip), %rax
+	HAS_CPU_FEATURE (SSE4_1)
         jz      2f
         ret
 2:      leaq    _ZGVbN4vvv_sincosf_sse2(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S
index 9e5be67..b3f31c6 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVdN8vvv_sincosf)
         .type   _ZGVdN8vvv_sincosf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVdN8vvv_sincosf_avx2(%rip), %rax
-        testl   $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVdN8vvv_sincosf_avx2(%rip), %rax
+	HAS_ARCH_FEATURE (AVX2_Usable)
         jz      2f
         ret
 2:      leaq    _ZGVdN8vvv_sincosf_sse_wrapper(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S
index 3ec78a0..c7a0adb 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S
@@ -22,14 +22,12 @@
 	.text
 ENTRY (_ZGVeN16v_sinf)
         .type   _ZGVeN16v_sinf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVeN16v_sinf_skx(%rip), %rax
-        testl   $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVeN16v_sinf_skx(%rip), %rax
+	HAS_ARCH_FEATURE (AVX512DQ_Usable)
         jnz     2f
         leaq    _ZGVeN16v_sinf_knl(%rip), %rax
-        testl   $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX512F_Usable)
         jnz     2f
         leaq    _ZGVeN16v_sinf_avx2_wrapper(%rip), %rax
 2:      ret
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S
index cf1e4df..58bd177 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVbN4v_sinf)
         .type   _ZGVbN4v_sinf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
-1:      leaq    _ZGVbN4v_sinf_sse4(%rip), %rax
-        testl   $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+        leaq    _ZGVbN4v_sinf_sse4(%rip), %rax
+	HAS_CPU_FEATURE (SSE4_1)
         jz      2f
         ret
 2:      leaq    _ZGVbN4v_sinf_sse2(%rip), %rax
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S
index b28bf3c..debec59 100644
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S
@@ -22,11 +22,9 @@
 	.text
 ENTRY (_ZGVdN8v_sinf)
         .type   _ZGVdN8v_sinf, @gnu_indirect_function
-        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
-        jne     1f
-        call    __init_cpu_features
+	LOAD_RTLD_GLOBAL_RO_RDX
 1:      leaq    _ZGVdN8v_sinf_avx2(%rip), %rax
-        testl   $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX2_Usable)
         jz      2f
         ret
 2:      leaq    _ZGVdN8v_sinf_sse_wrapper(%rip), %rax

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=1aee37a22e3977de7a89e734e0a1e112f52045f2

commit 1aee37a22e3977de7a89e734e0a1e112f52045f2
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Aug 13 03:39:22 2015 -0700

    Update i686 multiarch functions for <cpu-features.h>
    
    This patch updates i686 multiarch functions to use the newly defined
    HAS_CPU_FEATURE, HAS_ARCH_FEATURE, LOAD_GOT_AND_RTLD_GLOBAL_RO and
    LOAD_FUNC_GOT_EAX from <cpu-features.h>.
    
    	* sysdeps/i386/i686/fpu/multiarch/e_expf.c: Replace HAS_XXX
    	with HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
    	* sysdeps/i386/i686/fpu/multiarch/s_cosf.c: Likewise.
    	* sysdeps/i386/i686/fpu/multiarch/s_cosf.c: Likewise.
    	* sysdeps/i386/i686/fpu/multiarch/s_sincosf.c: Likewise.
    	* sysdeps/i386/i686/fpu/multiarch/s_sinf.c: Likewise.
    	* sysdeps/i386/i686/multiarch/ifunc-impl-list.c: Likewise.
    	* sysdeps/i386/i686/multiarch/s_fma.c: Likewise.
    	* sysdeps/i386/i686/multiarch/s_fmaf.c: Likewise.
    	* sysdeps/i386/i686/multiarch/bcopy.S: Remove __init_cpu_features
    	call.  Merge SHARED and !SHARED.  Add LOAD_GOT_AND_RTLD_GLOBAL_RO.
    	Use LOAD_FUNC_GOT_EAX to load function address.  Replace HAS_XXX
    	with HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
    	* sysdeps/i386/i686/multiarch/bzero.S: Likewise.
    	* sysdeps/i386/i686/multiarch/memchr.S: Likewise.
    	* sysdeps/i386/i686/multiarch/memcmp.S: Likewise.
    	* sysdeps/i386/i686/multiarch/memcpy.S: Likewise.
    	* sysdeps/i386/i686/multiarch/memcpy_chk.S: Likewise.
    	* sysdeps/i386/i686/multiarch/memmove.S: Likewise.
    	* sysdeps/i386/i686/multiarch/memmove_chk.S: Likewise.
    	* sysdeps/i386/i686/multiarch/mempcpy.S: Likewise.
    	* sysdeps/i386/i686/multiarch/mempcpy_chk.S: Likewise.
    	* sysdeps/i386/i686/multiarch/memrchr.S: Likewise.
    	* sysdeps/i386/i686/multiarch/memset.S: Likewise.
    	* sysdeps/i386/i686/multiarch/memset_chk.S: Likewise.
    	* sysdeps/i386/i686/multiarch/rawmemchr.S: Likewise.
    	* sysdeps/i386/i686/multiarch/strcasecmp.S: Likewise.
    	* sysdeps/i386/i686/multiarch/strcat.S: Likewise.
    	* sysdeps/i386/i686/multiarch/strchr.S: Likewise.
    	* sysdeps/i386/i686/multiarch/strcmp.S: Likewise.
    	* sysdeps/i386/i686/multiarch/strcpy.S: Likewise.
    	* sysdeps/i386/i686/multiarch/strcspn.S: Likewise.
    	* sysdeps/i386/i686/multiarch/strlen.S: Likewise.
    	* sysdeps/i386/i686/multiarch/strncase.S: Likewise.
    	* sysdeps/i386/i686/multiarch/strnlen.S: Likewise.
    	* sysdeps/i386/i686/multiarch/strrchr.S: Likewise.
    	* sysdeps/i386/i686/multiarch/strspn.S: Likewise.
    	* sysdeps/i386/i686/multiarch/wcschr.S: Likewise.
    	* sysdeps/i386/i686/multiarch/wcscmp.S: Likewise.
    	* sysdeps/i386/i686/multiarch/wcscpy.S: Likewise.
    	* sysdeps/i386/i686/multiarch/wcslen.S: Likewise.
    	* sysdeps/i386/i686/multiarch/wcsrchr.S: Likewise.
    	* sysdeps/i386/i686/multiarch/wmemcmp.S: Likewise.
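
Every C-level selector in this commit follows the same pattern: the ternary
inside libm_ifunc swaps the old HAS_SSE2 macro for HAS_CPU_FEATURE (SSE2).
A minimal sketch of that pattern (my_func and its variants are placeholder
names; the real instances, such as __ieee754_expf, are in the diff below;
libm_ifunc and HAS_CPU_FEATURE come from glibc-internal headers not shown
here):

    extern double my_func_sse2 (double);  /* placeholder optimized variant  */
    extern double my_func_ia32 (double);  /* placeholder baseline variant   */

    double my_func (double);
    libm_ifunc (my_func,
                HAS_CPU_FEATURE (SSE2) ? my_func_sse2 : my_func_ia32);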

diff --git a/ChangeLog b/ChangeLog
index 5ea2847..60e3e8f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,51 @@
 2015-08-13  H.J. Lu  <hongjiu.lu@intel.com>
 
+	* sysdeps/i386/i686/fpu/multiarch/e_expf.c: Replace HAS_XXX
+	with HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
+	* sysdeps/i386/i686/fpu/multiarch/s_cosf.c: Likewise.
+	* sysdeps/i386/i686/fpu/multiarch/s_cosf.c: Likewise.
+	* sysdeps/i386/i686/fpu/multiarch/s_sincosf.c: Likewise.
+	* sysdeps/i386/i686/fpu/multiarch/s_sinf.c: Likewise.
+	* sysdeps/i386/i686/multiarch/ifunc-impl-list.c: Likewise.
+	* sysdeps/i386/i686/multiarch/s_fma.c: Likewise.
+	* sysdeps/i386/i686/multiarch/s_fmaf.c: Likewise.
+	* sysdeps/i386/i686/multiarch/bcopy.S: Remove __init_cpu_features
+	call.  Merge SHARED and !SHARED.  Add LOAD_GOT_AND_RTLD_GLOBAL_RO.
+	Use LOAD_FUNC_GOT_EAX to load function address.  Replace HAS_XXX
+	with HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
+	* sysdeps/i386/i686/multiarch/bzero.S: Likewise.
+	* sysdeps/i386/i686/multiarch/memchr.S: Likewise.
+	* sysdeps/i386/i686/multiarch/memcmp.S: Likewise.
+	* sysdeps/i386/i686/multiarch/memcpy.S: Likewise.
+	* sysdeps/i386/i686/multiarch/memcpy_chk.S: Likewise.
+	* sysdeps/i386/i686/multiarch/memmove.S: Likewise.
+	* sysdeps/i386/i686/multiarch/memmove_chk.S: Likewise.
+	* sysdeps/i386/i686/multiarch/mempcpy.S: Likewise.
+	* sysdeps/i386/i686/multiarch/mempcpy_chk.S: Likewise.
+	* sysdeps/i386/i686/multiarch/memrchr.S: Likewise.
+	* sysdeps/i386/i686/multiarch/memset.S: Likewise.
+	* sysdeps/i386/i686/multiarch/memset_chk.S: Likewise.
+	* sysdeps/i386/i686/multiarch/rawmemchr.S: Likewise.
+	* sysdeps/i386/i686/multiarch/strcasecmp.S: Likewise.
+	* sysdeps/i386/i686/multiarch/strcat.S: Likewise.
+	* sysdeps/i386/i686/multiarch/strchr.S: Likewise.
+	* sysdeps/i386/i686/multiarch/strcmp.S: Likewise.
+	* sysdeps/i386/i686/multiarch/strcpy.S: Likewise.
+	* sysdeps/i386/i686/multiarch/strcspn.S: Likewise.
+	* sysdeps/i386/i686/multiarch/strlen.S: Likewise.
+	* sysdeps/i386/i686/multiarch/strncase.S: Likewise.
+	* sysdeps/i386/i686/multiarch/strnlen.S: Likewise.
+	* sysdeps/i386/i686/multiarch/strrchr.S: Likewise.
+	* sysdeps/i386/i686/multiarch/strspn.S: Likewise.
+	* sysdeps/i386/i686/multiarch/wcschr.S: Likewise.
+	* sysdeps/i386/i686/multiarch/wcscmp.S: Likewise.
+	* sysdeps/i386/i686/multiarch/wcscpy.S: Likewise.
+	* sysdeps/i386/i686/multiarch/wcslen.S: Likewise.
+	* sysdeps/i386/i686/multiarch/wcsrchr.S: Likewise.
+	* sysdeps/i386/i686/multiarch/wmemcmp.S: Likewise.
+
+2015-08-13  H.J. Lu  <hongjiu.lu@intel.com>
+
 	* sysdeps/x86_64/fpu/multiarch/e_asin.c: Replace HAS_XXX with
 	HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
 	* sysdeps/x86_64/fpu/multiarch/e_atan2.c: Likewise.
diff --git a/sysdeps/i386/i686/fpu/multiarch/e_expf.c b/sysdeps/i386/i686/fpu/multiarch/e_expf.c
index 5102dae..6978883 100644
--- a/sysdeps/i386/i686/fpu/multiarch/e_expf.c
+++ b/sysdeps/i386/i686/fpu/multiarch/e_expf.c
@@ -23,11 +23,15 @@ extern double __ieee754_expf_ia32 (double);
 
 double __ieee754_expf (double);
 libm_ifunc (__ieee754_expf,
-	    HAS_SSE2 ? __ieee754_expf_sse2 : __ieee754_expf_ia32);
+	    HAS_CPU_FEATURE (SSE2)
+	    ? __ieee754_expf_sse2
+	    : __ieee754_expf_ia32);
 
 extern double __expf_finite_sse2 (double);
 extern double __expf_finite_ia32 (double);
 
 double __expf_finite (double);
 libm_ifunc (__expf_finite,
-	    HAS_SSE2 ? __expf_finite_sse2 : __expf_finite_ia32);
+	    HAS_CPU_FEATURE (SSE2)
+	    ? __expf_finite_sse2
+	    : __expf_finite_ia32);
diff --git a/sysdeps/i386/i686/fpu/multiarch/s_cosf.c b/sysdeps/i386/i686/fpu/multiarch/s_cosf.c
index 0799dca..e32b2f4 100644
--- a/sysdeps/i386/i686/fpu/multiarch/s_cosf.c
+++ b/sysdeps/i386/i686/fpu/multiarch/s_cosf.c
@@ -22,7 +22,7 @@ extern float __cosf_sse2 (float);
 extern float __cosf_ia32 (float);
 float __cosf (float);
 
-libm_ifunc (__cosf, HAS_SSE2 ? __cosf_sse2 : __cosf_ia32);
+libm_ifunc (__cosf, HAS_CPU_FEATURE (SSE2) ? __cosf_sse2 : __cosf_ia32);
 weak_alias (__cosf, cosf);
 
 #define COSF __cosf_ia32
diff --git a/sysdeps/i386/i686/fpu/multiarch/s_sincosf.c b/sysdeps/i386/i686/fpu/multiarch/s_sincosf.c
index 384d844..0d827e0 100644
--- a/sysdeps/i386/i686/fpu/multiarch/s_sincosf.c
+++ b/sysdeps/i386/i686/fpu/multiarch/s_sincosf.c
@@ -22,7 +22,8 @@ extern void __sincosf_sse2 (float, float *, float *);
 extern void __sincosf_ia32 (float, float *, float *);
 void __sincosf (float, float *, float *);
 
-libm_ifunc (__sincosf, HAS_SSE2 ? __sincosf_sse2 : __sincosf_ia32);
+libm_ifunc (__sincosf,
+	    HAS_CPU_FEATURE (SSE2) ? __sincosf_sse2 : __sincosf_ia32);
 weak_alias (__sincosf, sincosf);
 
 #define SINCOSF __sincosf_ia32
diff --git a/sysdeps/i386/i686/fpu/multiarch/s_sinf.c b/sysdeps/i386/i686/fpu/multiarch/s_sinf.c
index 6b62772..18afaa2 100644
--- a/sysdeps/i386/i686/fpu/multiarch/s_sinf.c
+++ b/sysdeps/i386/i686/fpu/multiarch/s_sinf.c
@@ -22,7 +22,7 @@ extern float __sinf_sse2 (float);
 extern float __sinf_ia32 (float);
 float __sinf (float);
 
-libm_ifunc (__sinf, HAS_SSE2 ? __sinf_sse2 : __sinf_ia32);
+libm_ifunc (__sinf, HAS_CPU_FEATURE (SSE2) ? __sinf_sse2 : __sinf_ia32);
 weak_alias (__sinf, sinf);
 #define SINF __sinf_ia32
 #include <sysdeps/ieee754/flt-32/s_sinf.c>
diff --git a/sysdeps/i386/i686/multiarch/bcopy.S b/sysdeps/i386/i686/multiarch/bcopy.S
index e767d97..3fc95dc 100644
--- a/sysdeps/i386/i686/multiarch/bcopy.S
+++ b/sysdeps/i386/i686/multiarch/bcopy.S
@@ -23,51 +23,24 @@
 
 /* Define multiple versions only for the definition in lib.  */
 #if IS_IN (libc)
-# ifdef SHARED
 	.text
 ENTRY(bcopy)
 	.type	bcopy, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__bcopy_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__bcopy_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__bcopy_sse2_unaligned@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Unaligned_Load, FEATURE_OFFSET+index_Fast_Unaligned_Load+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__bcopy_sse2_unaligned)
+	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
 	jnz	2f
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
-	leal	__bcopy_ssse3@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__bcopy_ssse3)
+	HAS_CPU_FEATURE (Fast_Rep_String)
 	jz	2f
-	leal	__bcopy_ssse3_rep@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
-END(bcopy)
-# else
-	.text
-ENTRY(bcopy)
-	.type	bcopy, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__bcopy_ia32, %eax
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features
-	jz	2f
-	leal	__bcopy_ssse3, %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features
-	jz	2f
-	leal	__bcopy_ssse3_rep, %eax
+	LOAD_FUNC_GOT_EAX (__bcopy_ssse3_rep)
 2:	ret
 END(bcopy)
-# endif
 
 # undef ENTRY
 # define ENTRY(name) \
diff --git a/sysdeps/i386/i686/multiarch/bzero.S b/sysdeps/i386/i686/multiarch/bzero.S
index e8dc85f..95c96a8 100644
--- a/sysdeps/i386/i686/multiarch/bzero.S
+++ b/sysdeps/i386/i686/multiarch/bzero.S
@@ -23,46 +23,19 @@
 
 /* Define multiple versions only for the definition in lib.  */
 #if IS_IN (libc)
-# ifdef SHARED
-	.text
-ENTRY(__bzero)
-	.type	__bzero, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__bzero_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
-	jz	2f
-	leal	__bzero_sse2@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features@GOTOFF(%ebx)
-	jz	2f
-	leal	__bzero_sse2_rep@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
-END(__bzero)
-# else
 	.text
 ENTRY(__bzero)
 	.type	__bzero, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__bzero_ia32, %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__bzero_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__bzero_sse2, %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features
+	LOAD_FUNC_GOT_EAX ( __bzero_sse2)
+	HAS_CPU_FEATURE (Fast_Rep_String)
 	jz	2f
-	leal	__bzero_sse2_rep, %eax
+	LOAD_FUNC_GOT_EAX (__bzero_sse2_rep)
 2:	ret
 END(__bzero)
-# endif
 
 # undef ENTRY
 # define ENTRY(name) \
diff --git a/sysdeps/i386/i686/multiarch/ifunc-impl-list.c b/sysdeps/i386/i686/multiarch/ifunc-impl-list.c
index 92366a7..a6735a8 100644
--- a/sysdeps/i386/i686/multiarch/ifunc-impl-list.c
+++ b/sysdeps/i386/i686/multiarch/ifunc-impl-list.c
@@ -38,152 +38,179 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
   /* Support sysdeps/i386/i686/multiarch/bcopy.S.  */
   IFUNC_IMPL (i, name, bcopy,
-	      IFUNC_IMPL_ADD (array, i, bcopy, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, bcopy, HAS_CPU_FEATURE (SSSE3),
 			      __bcopy_ssse3_rep)
-	      IFUNC_IMPL_ADD (array, i, bcopy, HAS_SSSE3, __bcopy_ssse3)
-	      IFUNC_IMPL_ADD (array, i, bcopy, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, bcopy, HAS_CPU_FEATURE (SSSE3),
+			      __bcopy_ssse3)
+	      IFUNC_IMPL_ADD (array, i, bcopy, HAS_CPU_FEATURE (SSE2),
 			      __bcopy_sse2_unaligned)
 	      IFUNC_IMPL_ADD (array, i, bcopy, 1, __bcopy_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/bzero.S.  */
   IFUNC_IMPL (i, name, bzero,
-	      IFUNC_IMPL_ADD (array, i, bzero, HAS_SSE2, __bzero_sse2_rep)
-	      IFUNC_IMPL_ADD (array, i, bzero, HAS_SSE2, __bzero_sse2)
+	      IFUNC_IMPL_ADD (array, i, bzero, HAS_CPU_FEATURE (SSE2),
+			      __bzero_sse2_rep)
+	      IFUNC_IMPL_ADD (array, i, bzero, HAS_CPU_FEATURE (SSE2),
+			      __bzero_sse2)
 	      IFUNC_IMPL_ADD (array, i, bzero, 1, __bzero_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/memchr.S.  */
   IFUNC_IMPL (i, name, memchr,
-	      IFUNC_IMPL_ADD (array, i, memchr, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, memchr, HAS_CPU_FEATURE (SSE2),
 			      __memchr_sse2_bsf)
-	      IFUNC_IMPL_ADD (array, i, memchr, HAS_SSE2, __memchr_sse2)
+	      IFUNC_IMPL_ADD (array, i, memchr, HAS_CPU_FEATURE (SSE2),
+			      __memchr_sse2)
 	      IFUNC_IMPL_ADD (array, i, memchr, 1, __memchr_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/memcmp.S.  */
   IFUNC_IMPL (i, name, memcmp,
-	      IFUNC_IMPL_ADD (array, i, memcmp, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, memcmp, HAS_CPU_FEATURE (SSE4_2),
 			      __memcmp_sse4_2)
-	      IFUNC_IMPL_ADD (array, i, memcmp, HAS_SSSE3, __memcmp_ssse3)
+	      IFUNC_IMPL_ADD (array, i, memcmp, HAS_CPU_FEATURE (SSSE3),
+			      __memcmp_ssse3)
 	      IFUNC_IMPL_ADD (array, i, memcmp, 1, __memcmp_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/memmove_chk.S.  */
   IFUNC_IMPL (i, name, __memmove_chk,
-	      IFUNC_IMPL_ADD (array, i, __memmove_chk, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, __memmove_chk,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __memmove_chk_ssse3_rep)
-	      IFUNC_IMPL_ADD (array, i, __memmove_chk, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, __memmove_chk,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __memmove_chk_ssse3)
-	      IFUNC_IMPL_ADD (array, i, __memmove_chk, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, __memmove_chk,
+			      HAS_CPU_FEATURE (SSE2),
 			      __memmove_chk_sse2_unaligned)
 	      IFUNC_IMPL_ADD (array, i, __memmove_chk, 1,
 			      __memmove_chk_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/memmove.S.  */
   IFUNC_IMPL (i, name, memmove,
-	      IFUNC_IMPL_ADD (array, i, memmove, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, memmove, HAS_CPU_FEATURE (SSSE3),
 			      __memmove_ssse3_rep)
-	      IFUNC_IMPL_ADD (array, i, memmove, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, memmove, HAS_CPU_FEATURE (SSSE3),
 			      __memmove_ssse3)
-	      IFUNC_IMPL_ADD (array, i, memmove, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, memmove, HAS_CPU_FEATURE (SSE2),
 			      __memmove_sse2_unaligned)
 	      IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/memrchr.S.  */
   IFUNC_IMPL (i, name, memrchr,
-	      IFUNC_IMPL_ADD (array, i, memrchr, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, memrchr, HAS_CPU_FEATURE (SSE2),
 			      __memrchr_sse2_bsf)
-	      IFUNC_IMPL_ADD (array, i, memrchr, HAS_SSE2, __memrchr_sse2)
+	      IFUNC_IMPL_ADD (array, i, memrchr, HAS_CPU_FEATURE (SSE2),
+			      __memrchr_sse2)
 	      IFUNC_IMPL_ADD (array, i, memrchr, 1, __memrchr_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/memset_chk.S.  */
   IFUNC_IMPL (i, name, __memset_chk,
-	      IFUNC_IMPL_ADD (array, i, __memset_chk, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, __memset_chk,
+			      HAS_CPU_FEATURE (SSE2),
 			      __memset_chk_sse2_rep)
-	      IFUNC_IMPL_ADD (array, i, __memset_chk, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, __memset_chk,
+			      HAS_CPU_FEATURE (SSE2),
 			      __memset_chk_sse2)
 	      IFUNC_IMPL_ADD (array, i, __memset_chk, 1,
 			      __memset_chk_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/memset.S.  */
   IFUNC_IMPL (i, name, memset,
-	      IFUNC_IMPL_ADD (array, i, memset, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, memset, HAS_CPU_FEATURE (SSE2),
 			      __memset_sse2_rep)
-	      IFUNC_IMPL_ADD (array, i, memset, HAS_SSE2, __memset_sse2)
+	      IFUNC_IMPL_ADD (array, i, memset, HAS_CPU_FEATURE (SSE2),
+			      __memset_sse2)
 	      IFUNC_IMPL_ADD (array, i, memset, 1, __memset_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/rawmemchr.S.  */
   IFUNC_IMPL (i, name, rawmemchr,
-	      IFUNC_IMPL_ADD (array, i, rawmemchr, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, rawmemchr, HAS_CPU_FEATURE (SSE2),
 			      __rawmemchr_sse2_bsf)
-	      IFUNC_IMPL_ADD (array, i, rawmemchr, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, rawmemchr, HAS_CPU_FEATURE (SSE2),
 			      __rawmemchr_sse2)
 	      IFUNC_IMPL_ADD (array, i, rawmemchr, 1, __rawmemchr_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/stpncpy.S.  */
   IFUNC_IMPL (i, name, stpncpy,
-	      IFUNC_IMPL_ADD (array, i, stpncpy, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, stpncpy, HAS_CPU_FEATURE (SSSE3),
 			      __stpncpy_ssse3)
-	      IFUNC_IMPL_ADD (array, i, stpncpy, HAS_SSE2, __stpncpy_sse2)
+	      IFUNC_IMPL_ADD (array, i, stpncpy, HAS_CPU_FEATURE (SSE2),
+			      __stpncpy_sse2)
 	      IFUNC_IMPL_ADD (array, i, stpncpy, 1, __stpncpy_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/stpcpy.S.  */
   IFUNC_IMPL (i, name, stpcpy,
-	      IFUNC_IMPL_ADD (array, i, stpcpy, HAS_SSSE3, __stpcpy_ssse3)
-	      IFUNC_IMPL_ADD (array, i, stpcpy, HAS_SSE2, __stpcpy_sse2)
+	      IFUNC_IMPL_ADD (array, i, stpcpy, HAS_CPU_FEATURE (SSSE3),
+			      __stpcpy_ssse3)
+	      IFUNC_IMPL_ADD (array, i, stpcpy, HAS_CPU_FEATURE (SSE2),
+			      __stpcpy_sse2)
 	      IFUNC_IMPL_ADD (array, i, stpcpy, 1, __stpcpy_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strcasecmp.S.  */
   IFUNC_IMPL (i, name, strcasecmp,
-	      IFUNC_IMPL_ADD (array, i, strcasecmp, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, strcasecmp,
+			      HAS_CPU_FEATURE (SSE4_2),
 			      __strcasecmp_sse4_2)
-	      IFUNC_IMPL_ADD (array, i, strcasecmp, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, strcasecmp,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __strcasecmp_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strcasecmp, 1, __strcasecmp_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strcasecmp_l.S.  */
   IFUNC_IMPL (i, name, strcasecmp_l,
-	      IFUNC_IMPL_ADD (array, i, strcasecmp_l, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, strcasecmp_l,
+			      HAS_CPU_FEATURE (SSE4_2),
 			      __strcasecmp_l_sse4_2)
-	      IFUNC_IMPL_ADD (array, i, strcasecmp_l, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, strcasecmp_l,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __strcasecmp_l_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strcasecmp_l, 1,
 			      __strcasecmp_l_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strcat.S.  */
   IFUNC_IMPL (i, name, strcat,
-	      IFUNC_IMPL_ADD (array, i, strcat, HAS_SSSE3, __strcat_ssse3)
-	      IFUNC_IMPL_ADD (array, i, strcat, HAS_SSE2, __strcat_sse2)
+	      IFUNC_IMPL_ADD (array, i, strcat, HAS_CPU_FEATURE (SSSE3),
+			      __strcat_ssse3)
+	      IFUNC_IMPL_ADD (array, i, strcat, HAS_CPU_FEATURE (SSE2),
+			      __strcat_sse2)
 	      IFUNC_IMPL_ADD (array, i, strcat, 1, __strcat_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strchr.S.  */
   IFUNC_IMPL (i, name, strchr,
-	      IFUNC_IMPL_ADD (array, i, strchr, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, strchr, HAS_CPU_FEATURE (SSE2),
 			      __strchr_sse2_bsf)
-	      IFUNC_IMPL_ADD (array, i, strchr, HAS_SSE2, __strchr_sse2)
+	      IFUNC_IMPL_ADD (array, i, strchr, HAS_CPU_FEATURE (SSE2),
+			      __strchr_sse2)
 	      IFUNC_IMPL_ADD (array, i, strchr, 1, __strchr_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strcmp.S.  */
   IFUNC_IMPL (i, name, strcmp,
-	      IFUNC_IMPL_ADD (array, i, strcmp, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, strcmp, HAS_CPU_FEATURE (SSE4_2),
 			      __strcmp_sse4_2)
-	      IFUNC_IMPL_ADD (array, i, strcmp, HAS_SSSE3, __strcmp_ssse3)
+	      IFUNC_IMPL_ADD (array, i, strcmp, HAS_CPU_FEATURE (SSSE3),
+			      __strcmp_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strcmp, 1, __strcmp_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strcpy.S.  */
   IFUNC_IMPL (i, name, strcpy,
-	      IFUNC_IMPL_ADD (array, i, strcpy, HAS_SSSE3, __strcpy_ssse3)
-	      IFUNC_IMPL_ADD (array, i, strcpy, HAS_SSE2, __strcpy_sse2)
+	      IFUNC_IMPL_ADD (array, i, strcpy, HAS_CPU_FEATURE (SSSE3),
+			      __strcpy_ssse3)
+	      IFUNC_IMPL_ADD (array, i, strcpy, HAS_CPU_FEATURE (SSE2),
+			      __strcpy_sse2)
 	      IFUNC_IMPL_ADD (array, i, strcpy, 1, __strcpy_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strcspn.S.  */
   IFUNC_IMPL (i, name, strcspn,
-	      IFUNC_IMPL_ADD (array, i, strcspn, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, strcspn, HAS_CPU_FEATURE (SSE4_2),
 			      __strcspn_sse42)
 	      IFUNC_IMPL_ADD (array, i, strcspn, 1, __strcspn_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strncase.S.  */
   IFUNC_IMPL (i, name, strncasecmp,
-	      IFUNC_IMPL_ADD (array, i, strncasecmp, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, strncasecmp,
+			      HAS_CPU_FEATURE (SSE4_2),
 			      __strncasecmp_sse4_2)
-	      IFUNC_IMPL_ADD (array, i, strncasecmp, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, strncasecmp,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __strncasecmp_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strncasecmp, 1,
 			      __strncasecmp_ia32))
@@ -191,136 +218,156 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   /* Support sysdeps/i386/i686/multiarch/strncase_l.S.  */
   IFUNC_IMPL (i, name, strncasecmp_l,
 	      IFUNC_IMPL_ADD (array, i, strncasecmp_l,
-			      HAS_SSE4_2, __strncasecmp_l_sse4_2)
+			      HAS_CPU_FEATURE (SSE4_2),
+			      __strncasecmp_l_sse4_2)
 	      IFUNC_IMPL_ADD (array, i, strncasecmp_l,
-			      HAS_SSSE3, __strncasecmp_l_ssse3)
+			      HAS_CPU_FEATURE (SSSE3),
+			      __strncasecmp_l_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strncasecmp_l, 1,
 			      __strncasecmp_l_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strncat.S.  */
   IFUNC_IMPL (i, name, strncat,
-	      IFUNC_IMPL_ADD (array, i, strncat, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, strncat, HAS_CPU_FEATURE (SSSE3),
 			      __strncat_ssse3)
-	      IFUNC_IMPL_ADD (array, i, strncat, HAS_SSE2, __strncat_sse2)
+	      IFUNC_IMPL_ADD (array, i, strncat, HAS_CPU_FEATURE (SSE2),
+			      __strncat_sse2)
 	      IFUNC_IMPL_ADD (array, i, strncat, 1, __strncat_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strncpy.S.  */
   IFUNC_IMPL (i, name, strncpy,
-	      IFUNC_IMPL_ADD (array, i, strncpy, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, strncpy, HAS_CPU_FEATURE (SSSE3),
 			      __strncpy_ssse3)
-	      IFUNC_IMPL_ADD (array, i, strncpy, HAS_SSE2, __strncpy_sse2)
+	      IFUNC_IMPL_ADD (array, i, strncpy, HAS_CPU_FEATURE (SSE2),
+			      __strncpy_sse2)
 	      IFUNC_IMPL_ADD (array, i, strncpy, 1, __strncpy_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strnlen.S.  */
   IFUNC_IMPL (i, name, strnlen,
-	      IFUNC_IMPL_ADD (array, i, strnlen, HAS_SSE2, __strnlen_sse2)
+	      IFUNC_IMPL_ADD (array, i, strnlen, HAS_CPU_FEATURE (SSE2),
+			      __strnlen_sse2)
 	      IFUNC_IMPL_ADD (array, i, strnlen, 1, __strnlen_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strpbrk.S.  */
   IFUNC_IMPL (i, name, strpbrk,
-	      IFUNC_IMPL_ADD (array, i, strpbrk, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, strpbrk, HAS_CPU_FEATURE (SSE4_2),
 			      __strpbrk_sse42)
 	      IFUNC_IMPL_ADD (array, i, strpbrk, 1, __strpbrk_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strrchr.S.  */
   IFUNC_IMPL (i, name, strrchr,
-	      IFUNC_IMPL_ADD (array, i, strrchr, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, strrchr, HAS_CPU_FEATURE (SSE2),
 			      __strrchr_sse2_bsf)
-	      IFUNC_IMPL_ADD (array, i, strrchr, HAS_SSE2, __strrchr_sse2)
+	      IFUNC_IMPL_ADD (array, i, strrchr, HAS_CPU_FEATURE (SSE2),
+			      __strrchr_sse2)
 	      IFUNC_IMPL_ADD (array, i, strrchr, 1, __strrchr_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strspn.S.  */
   IFUNC_IMPL (i, name, strspn,
-	      IFUNC_IMPL_ADD (array, i, strspn, HAS_SSE4_2, __strspn_sse42)
+	      IFUNC_IMPL_ADD (array, i, strspn, HAS_CPU_FEATURE (SSE4_2),
+			      __strspn_sse42)
 	      IFUNC_IMPL_ADD (array, i, strspn, 1, __strspn_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/wcschr.S.  */
   IFUNC_IMPL (i, name, wcschr,
-	      IFUNC_IMPL_ADD (array, i, wcschr, HAS_SSE2, __wcschr_sse2)
+	      IFUNC_IMPL_ADD (array, i, wcschr, HAS_CPU_FEATURE (SSE2),
+			      __wcschr_sse2)
 	      IFUNC_IMPL_ADD (array, i, wcschr, 1, __wcschr_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/wcscmp.S.  */
   IFUNC_IMPL (i, name, wcscmp,
-	      IFUNC_IMPL_ADD (array, i, wcscmp, HAS_SSE2, __wcscmp_sse2)
+	      IFUNC_IMPL_ADD (array, i, wcscmp, HAS_CPU_FEATURE (SSE2),
+			      __wcscmp_sse2)
 	      IFUNC_IMPL_ADD (array, i, wcscmp, 1, __wcscmp_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/wcscpy.S.  */
   IFUNC_IMPL (i, name, wcscpy,
-	      IFUNC_IMPL_ADD (array, i, wcscpy, HAS_SSSE3, __wcscpy_ssse3)
+	      IFUNC_IMPL_ADD (array, i, wcscpy, HAS_CPU_FEATURE (SSSE3),
+			      __wcscpy_ssse3)
 	      IFUNC_IMPL_ADD (array, i, wcscpy, 1, __wcscpy_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/wcslen.S.  */
   IFUNC_IMPL (i, name, wcslen,
-	      IFUNC_IMPL_ADD (array, i, wcslen, HAS_SSE2, __wcslen_sse2)
+	      IFUNC_IMPL_ADD (array, i, wcslen, HAS_CPU_FEATURE (SSE2),
+			      __wcslen_sse2)
 	      IFUNC_IMPL_ADD (array, i, wcslen, 1, __wcslen_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/wcsrchr.S.  */
   IFUNC_IMPL (i, name, wcsrchr,
-	      IFUNC_IMPL_ADD (array, i, wcsrchr, HAS_SSE2, __wcsrchr_sse2)
+	      IFUNC_IMPL_ADD (array, i, wcsrchr, HAS_CPU_FEATURE (SSE2),
+			      __wcsrchr_sse2)
 	      IFUNC_IMPL_ADD (array, i, wcsrchr, 1, __wcsrchr_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/wmemcmp.S.  */
   IFUNC_IMPL (i, name, wmemcmp,
-	      IFUNC_IMPL_ADD (array, i, wmemcmp, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, wmemcmp, HAS_CPU_FEATURE (SSE4_2),
 			      __wmemcmp_sse4_2)
-	      IFUNC_IMPL_ADD (array, i, wmemcmp, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, wmemcmp, HAS_CPU_FEATURE (SSSE3),
 			      __wmemcmp_ssse3)
 	      IFUNC_IMPL_ADD (array, i, wmemcmp, 1, __wmemcmp_ia32))
 
 #ifdef SHARED
   /* Support sysdeps/i386/i686/multiarch/memcpy_chk.S.  */
   IFUNC_IMPL (i, name, __memcpy_chk,
-	      IFUNC_IMPL_ADD (array, i, __memcpy_chk, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, __memcpy_chk,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __memcpy_chk_ssse3_rep)
-	      IFUNC_IMPL_ADD (array, i, __memcpy_chk, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, __memcpy_chk,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __memcpy_chk_ssse3)
-	      IFUNC_IMPL_ADD (array, i, __memcpy_chk, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, __memcpy_chk,
+			      HAS_CPU_FEATURE (SSE2),
 			      __memcpy_chk_sse2_unaligned)
 	      IFUNC_IMPL_ADD (array, i, __memcpy_chk, 1,
 			      __memcpy_chk_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/memcpy.S.  */
   IFUNC_IMPL (i, name, memcpy,
-	      IFUNC_IMPL_ADD (array, i, memcpy, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, memcpy, HAS_CPU_FEATURE (SSSE3),
 			      __memcpy_ssse3_rep)
-	      IFUNC_IMPL_ADD (array, i, memcpy, HAS_SSSE3, __memcpy_ssse3)
-	      IFUNC_IMPL_ADD (array, i, memcpy, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, memcpy, HAS_CPU_FEATURE (SSSE3),
+			      __memcpy_ssse3)
+	      IFUNC_IMPL_ADD (array, i, memcpy, HAS_CPU_FEATURE (SSE2),
 			      __memcpy_sse2_unaligned)
 	      IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/mempcpy_chk.S.  */
   IFUNC_IMPL (i, name, __mempcpy_chk,
-	      IFUNC_IMPL_ADD (array, i, __mempcpy_chk, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, __mempcpy_chk,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __mempcpy_chk_ssse3_rep)
-	      IFUNC_IMPL_ADD (array, i, __mempcpy_chk, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, __mempcpy_chk,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __mempcpy_chk_ssse3)
-	      IFUNC_IMPL_ADD (array, i, __mempcpy_chk, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, __mempcpy_chk,
+			      HAS_CPU_FEATURE (SSE2),
 			      __mempcpy_chk_sse2_unaligned)
 	      IFUNC_IMPL_ADD (array, i, __mempcpy_chk, 1,
 			      __mempcpy_chk_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/mempcpy.S.  */
   IFUNC_IMPL (i, name, mempcpy,
-	      IFUNC_IMPL_ADD (array, i, mempcpy, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, mempcpy, HAS_CPU_FEATURE (SSSE3),
 			      __mempcpy_ssse3_rep)
-	      IFUNC_IMPL_ADD (array, i, mempcpy, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, mempcpy, HAS_CPU_FEATURE (SSSE3),
 			      __mempcpy_ssse3)
-	      IFUNC_IMPL_ADD (array, i, mempcpy, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, mempcpy, HAS_CPU_FEATURE (SSE2),
 			      __mempcpy_sse2_unaligned)
 	      IFUNC_IMPL_ADD (array, i, mempcpy, 1, __mempcpy_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strlen.S.  */
   IFUNC_IMPL (i, name, strlen,
-	      IFUNC_IMPL_ADD (array, i, strlen, HAS_SSE2,
+	      IFUNC_IMPL_ADD (array, i, strlen, HAS_CPU_FEATURE (SSE2),
 			      __strlen_sse2_bsf)
-	      IFUNC_IMPL_ADD (array, i, strlen, HAS_SSE2, __strlen_sse2)
+	      IFUNC_IMPL_ADD (array, i, strlen, HAS_CPU_FEATURE (SSE2),
+			      __strlen_sse2)
 	      IFUNC_IMPL_ADD (array, i, strlen, 1, __strlen_ia32))
 
   /* Support sysdeps/i386/i686/multiarch/strncmp.S.  */
   IFUNC_IMPL (i, name, strncmp,
-	      IFUNC_IMPL_ADD (array, i, strncmp, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, strncmp, HAS_CPU_FEATURE (SSE4_2),
 			      __strncmp_sse4_2)
-	      IFUNC_IMPL_ADD (array, i, strncmp, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, strncmp, HAS_CPU_FEATURE (SSSE3),
 			      __strncmp_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strncmp, 1, __strncmp_ia32))
 #endif
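
The ifunc-impl-list.c table above only enumerates the candidate implementations so the test harness can exercise each one; the run-time choice is made by the per-function indirect-function resolvers in the assembly files that follow. As a rough, self-contained illustration of that preference order (SSE4.2 over SSSE3 over the ia32 baseline, as recorded for memcmp), here is a C sketch written against GCC's public ifunc attribute and __builtin_cpu_supports rather than glibc's internal HAS_CPU_FEATURE macros; the my_memcmp_* names are hypothetical stand-ins, not glibc symbols.

#include <stddef.h>
#include <string.h>

static int
my_memcmp_ia32 (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n);	/* Baseline stand-in.  */
}

static int
my_memcmp_ssse3 (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n);	/* Stand-in for an SSSE3 version.  */
}

static int
my_memcmp_sse4_2 (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n);	/* Stand-in for an SSE4.2 version.  */
}

/* The resolver runs once, when the symbol is bound, and returns the
   chosen implementation -- the C analogue of the assembly resolvers
   returning an address in %eax.  */
static __typeof__ (my_memcmp_ia32) *
my_memcmp_resolver (void)
{
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("sse4.2"))
    return my_memcmp_sse4_2;
  if (__builtin_cpu_supports ("ssse3"))
    return my_memcmp_ssse3;
  return my_memcmp_ia32;
}

int my_memcmp (const void *a, const void *b, size_t n)
     __attribute__ ((ifunc ("my_memcmp_resolver")));

Because the resolver runs only once per process, the feature checks cost nothing on the actual calls, which is why the assembly resolvers below can afford several tests in a row.
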
diff --git a/sysdeps/i386/i686/multiarch/memchr.S b/sysdeps/i386/i686/multiarch/memchr.S
index 02994d0..65e6b96 100644
--- a/sysdeps/i386/i686/multiarch/memchr.S
+++ b/sysdeps/i386/i686/multiarch/memchr.S
@@ -22,46 +22,22 @@
 #include <init-arch.h>
 
 #if IS_IN (libc)
-# define CFI_POP(REG) \
-	cfi_adjust_cfa_offset (-4); \
-	cfi_restore (REG)
-
-# define CFI_PUSH(REG) \
-	cfi_adjust_cfa_offset (4); \
-	cfi_rel_offset (REG, 0)
-
 	.text
 ENTRY(__memchr)
 	.type	__memchr, @gnu_indirect_function
-	pushl	%ebx
-	CFI_PUSH (%ebx)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-
-1:	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	testl	$bit_Slow_BSF, FEATURE_OFFSET+index_Slow_BSF+__cpu_features@GOTOFF(%ebx)
+	HAS_ARCH_FEATURE (Slow_BSF)
 	jz	3f
 
-	leal	__memchr_sse2@GOTOFF(%ebx), %eax
-	popl	%ebx
-	CFI_POP	(%ebx)
+	LOAD_FUNC_GOT_EAX (__memchr_sse2)
 	ret
 
-	CFI_PUSH (%ebx)
-
-2:	leal	__memchr_ia32@GOTOFF(%ebx), %eax
-	popl	%ebx
-	CFI_POP	(%ebx)
+2:	LOAD_FUNC_GOT_EAX (__memchr_ia32)
 	ret
 
-	CFI_PUSH (%ebx)
-
-3:	leal	__memchr_sse2_bsf@GOTOFF(%ebx), %eax
-	popl	%ebx
-	CFI_POP	(%ebx)
+3:	LOAD_FUNC_GOT_EAX (__memchr_sse2_bsf)
 	ret
 END(__memchr)
 
diff --git a/sysdeps/i386/i686/multiarch/memcmp.S b/sysdeps/i386/i686/multiarch/memcmp.S
index 6b607eb..d4d7d2e 100644
--- a/sysdeps/i386/i686/multiarch/memcmp.S
+++ b/sysdeps/i386/i686/multiarch/memcmp.S
@@ -23,46 +23,19 @@
 
 /* Define multiple versions only for the definition in libc. */
 #if IS_IN (libc)
-# ifdef SHARED
-	.text
-ENTRY(memcmp)
-	.type	memcmp, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__memcmp_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
-	jz	2f
-	leal	__memcmp_ssse3@GOTOFF(%ebx), %eax
-	testl	$bit_SSE4_2, CPUID_OFFSET+index_SSE4_2+__cpu_features@GOTOFF(%ebx)
-	jz	2f
-	leal	__memcmp_sse4_2@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
-END(memcmp)
-# else
 	.text
 ENTRY(memcmp)
 	.type	memcmp, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__memcmp_ia32, %eax
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__memcmp_ia32)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
-	leal	__memcmp_ssse3, %eax
-	testl	$bit_SSE4_2, CPUID_OFFSET+index_SSE4_2+__cpu_features
+	LOAD_FUNC_GOT_EAX (__memcmp_ssse3)
+	HAS_CPU_FEATURE (SSE4_2)
 	jz	2f
-	leal	__memcmp_sse4_2, %eax
+	LOAD_FUNC_GOT_EAX (__memcmp_sse4_2)
 2:	ret
 END(memcmp)
-# endif
 
 # undef ENTRY
 # define ENTRY(name) \
diff --git a/sysdeps/i386/i686/multiarch/memcpy.S b/sysdeps/i386/i686/multiarch/memcpy.S
index c6d20bd..9a4d183 100644
--- a/sysdeps/i386/i686/multiarch/memcpy.S
+++ b/sysdeps/i386/i686/multiarch/memcpy.S
@@ -28,29 +28,20 @@
 	.text
 ENTRY(memcpy)
 	.type	memcpy, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__memcpy_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__memcpy_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__memcpy_sse2_unaligned@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Unaligned_Load, FEATURE_OFFSET+index_Fast_Unaligned_Load+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__memcpy_sse2_unaligned)
+	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
 	jnz	2f
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
-	leal	__memcpy_ssse3@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__memcpy_ssse3)
+	HAS_CPU_FEATURE (Fast_Rep_String)
 	jz	2f
-	leal	__memcpy_ssse3_rep@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__memcpy_ssse3_rep)
+2:	ret
 END(memcpy)
 
 # undef ENTRY
diff --git a/sysdeps/i386/i686/multiarch/memcpy_chk.S b/sysdeps/i386/i686/multiarch/memcpy_chk.S
index 9399587..3bbd921 100644
--- a/sysdeps/i386/i686/multiarch/memcpy_chk.S
+++ b/sysdeps/i386/i686/multiarch/memcpy_chk.S
@@ -29,29 +29,20 @@
 	.text
 ENTRY(__memcpy_chk)
 	.type	__memcpy_chk, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__memcpy_chk_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__memcpy_chk_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__memcpy_chk_sse2_unaligned@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Unaligned_Load, FEATURE_OFFSET+index_Fast_Unaligned_Load+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__memcpy_chk_sse2_unaligned)
+	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
 	jnz	2f
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
-	leal	__memcpy_chk_ssse3@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__memcpy_chk_ssse3)
+	HAS_CPU_FEATURE (Fast_Rep_String)
 	jz	2f
-	leal	__memcpy_chk_ssse3_rep@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__memcpy_chk_ssse3_rep)
+2:	ret
 END(__memcpy_chk)
 # else
 #  include "../memcpy_chk.S"
diff --git a/sysdeps/i386/i686/multiarch/memmove.S b/sysdeps/i386/i686/multiarch/memmove.S
index 7033463..2bf427f 100644
--- a/sysdeps/i386/i686/multiarch/memmove.S
+++ b/sysdeps/i386/i686/multiarch/memmove.S
@@ -23,37 +23,28 @@
 
 /* Define multiple versions only for the definition in lib.  */
 #if IS_IN (libc)
-# ifdef SHARED
 	.text
 ENTRY(memmove)
 	.type	memmove, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__memmove_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__memmove_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__memmove_sse2_unaligned@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Unaligned_Load, FEATURE_OFFSET+index_Fast_Unaligned_Load+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__memmove_sse2_unaligned)
+	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
 	jnz	2f
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
-	leal	__memmove_ssse3@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__memmove_ssse3)
+	HAS_ARCH_FEATURE (Fast_Rep_String)
 	jz	2f
-	leal	__memmove_ssse3_rep@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__memmove_ssse3_rep)
+2:	ret
 END(memmove)
 
-# undef ENTRY
-# define ENTRY(name) \
+# ifdef SHARED
+#  undef ENTRY
+#  define ENTRY(name) \
 	.type __memmove_ia32, @function; \
 	.p2align 4; \
 	.globl __memmove_ia32; \
@@ -61,29 +52,8 @@ END(memmove)
 	__memmove_ia32: cfi_startproc; \
 	CALL_MCOUNT
 # else
-	.text
-ENTRY(memmove)
-	.type	memmove, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__memmove_ia32, %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features
-	jz	2f
-	leal	__memmove_sse2_unaligned, %eax
-	testl	$bit_Fast_Unaligned_Load, FEATURE_OFFSET+index_Fast_Unaligned_Load+__cpu_features
-	jnz	2f
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features
-	jz	2f
-	leal	__memmove_ssse3, %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features
-	jz	2f
-	leal	__memmove_ssse3_rep, %eax
-2:	ret
-END(memmove)
-
-# undef ENTRY
-# define ENTRY(name) \
+#  undef ENTRY
+#  define ENTRY(name) \
 	.type __memmove_ia32, @function; \
 	.globl __memmove_ia32; \
 	.p2align 4; \
diff --git a/sysdeps/i386/i686/multiarch/memmove_chk.S b/sysdeps/i386/i686/multiarch/memmove_chk.S
index 2b576d4..b17f6ed 100644
--- a/sysdeps/i386/i686/multiarch/memmove_chk.S
+++ b/sysdeps/i386/i686/multiarch/memmove_chk.S
@@ -23,56 +23,26 @@
 
 /* Define multiple versions only for the definition in lib.  */
 #if IS_IN (libc)
-# ifdef SHARED
 	.text
 ENTRY(__memmove_chk)
 	.type	__memmove_chk, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__memmove_chk_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__memmove_chk_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__memmove_chk_sse2_unaligned@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Unaligned_Load, FEATURE_OFFSET+index_Fast_Unaligned_Load+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__memmove_chk_sse2_unaligned)
+	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
 	jnz	2f
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
-	leal	__memmove_chk_ssse3@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__memmove_chk_ssse3)
+	HAS_CPU_FEATURE (Fast_Rep_String)
 	jz	2f
-	leal	__memmove_chk_ssse3_rep@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
-END(__memmove_chk)
-# else
-	.text
-ENTRY(__memmove_chk)
-	.type	__memmove_chk, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__memmove_chk_ia32, %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features
-	jz	2f
-	leal	__memmove_chk_sse2_unaligned, %eax
-	testl	$bit_Fast_Unaligned_Load, FEATURE_OFFSET+index_Fast_Unaligned_Load+__cpu_features
-	jnz	2f
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features
-	jz	2f
-	leal	__memmove_chk_ssse3, %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features
-	jz	2f
-	leal	__memmove_chk_ssse3_rep, %eax
+	LOAD_FUNC_GOT_EAX (__memmove_chk_ssse3_rep)
 2:	ret
 END(__memmove_chk)
 
+# ifndef SHARED
 	.type __memmove_chk_sse2_unaligned, @function
 	.p2align 4;
 __memmove_chk_sse2_unaligned:
diff --git a/sysdeps/i386/i686/multiarch/mempcpy.S b/sysdeps/i386/i686/multiarch/mempcpy.S
index 39c934e..021558a 100644
--- a/sysdeps/i386/i686/multiarch/mempcpy.S
+++ b/sysdeps/i386/i686/multiarch/mempcpy.S
@@ -28,29 +28,20 @@
 	.text
 ENTRY(__mempcpy)
 	.type	__mempcpy, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__mempcpy_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__mempcpy_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__mempcpy_sse2_unaligned@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Unaligned_Load, FEATURE_OFFSET+index_Fast_Unaligned_Load+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__mempcpy_sse2_unaligned)
+	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
 	jnz	2f
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
-	leal	__mempcpy_ssse3@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__mempcpy_ssse3)
+	HAS_CPU_FEATURE (Fast_Rep_String)
 	jz	2f
-	leal	__mempcpy_ssse3_rep@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__mempcpy_ssse3_rep)
+2:	ret
 END(__mempcpy)
 
 # undef ENTRY
diff --git a/sysdeps/i386/i686/multiarch/mempcpy_chk.S b/sysdeps/i386/i686/multiarch/mempcpy_chk.S
index b6fa202..1bea6ea 100644
--- a/sysdeps/i386/i686/multiarch/mempcpy_chk.S
+++ b/sysdeps/i386/i686/multiarch/mempcpy_chk.S
@@ -29,29 +29,20 @@
 	.text
 ENTRY(__mempcpy_chk)
 	.type	__mempcpy_chk, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__mempcpy_chk_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__mempcpy_chk_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__mempcpy_chk_sse2_unaligned@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Unaligned_Load, FEATURE_OFFSET+index_Fast_Unaligned_Load+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__mempcpy_chk_sse2_unaligned)
+	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
 	jnz	2f
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
-	leal	__mempcpy_chk_ssse3@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__mempcpy_chk_ssse3)
+	HAS_CPU_FEATURE (Fast_Rep_String)
 	jz	2f
-	leal	__mempcpy_chk_ssse3_rep@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__mempcpy_chk_ssse3_rep)
+2:	ret
 END(__mempcpy_chk)
 # else
 #  include "../mempcpy_chk.S"
diff --git a/sysdeps/i386/i686/multiarch/memrchr.S b/sysdeps/i386/i686/multiarch/memrchr.S
index 321e0b7..32fb1a6 100644
--- a/sysdeps/i386/i686/multiarch/memrchr.S
+++ b/sysdeps/i386/i686/multiarch/memrchr.S
@@ -22,46 +22,22 @@
 #include <init-arch.h>
 
 #if IS_IN (libc)
-# define CFI_POP(REG) \
-	cfi_adjust_cfa_offset (-4); \
-	cfi_restore (REG)
-
-# define CFI_PUSH(REG) \
-	cfi_adjust_cfa_offset (4); \
-	cfi_rel_offset (REG, 0)
-
 	.text
 ENTRY(__memrchr)
 	.type	__memrchr, @gnu_indirect_function
-	pushl	%ebx
-	CFI_PUSH (%ebx)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-
-1:	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	testl	$bit_Slow_BSF, FEATURE_OFFSET+index_Slow_BSF+__cpu_features@GOTOFF(%ebx)
+	HAS_ARCH_FEATURE (Slow_BSF)
 	jz	3f
 
-	leal	__memrchr_sse2@GOTOFF(%ebx), %eax
-	popl	%ebx
-	CFI_POP	(%ebx)
+	LOAD_FUNC_GOT_EAX (__memrchr_sse2)
 	ret
 
-	CFI_PUSH (%ebx)
-
-2:	leal	__memrchr_ia32@GOTOFF(%ebx), %eax
-	popl	%ebx
-	CFI_POP	(%ebx)
+2:	LOAD_FUNC_GOT_EAX (__memrchr_ia32)
 	ret
 
-	CFI_PUSH (%ebx)
-
-3:	leal	__memrchr_sse2_bsf@GOTOFF(%ebx), %eax
-	popl	%ebx
-	CFI_POP	(%ebx)
+3:	LOAD_FUNC_GOT_EAX (__memrchr_sse2_bsf)
 	ret
 END(__memrchr)
 
diff --git a/sysdeps/i386/i686/multiarch/memset.S b/sysdeps/i386/i686/multiarch/memset.S
index 6d7d919..8015d57 100644
--- a/sysdeps/i386/i686/multiarch/memset.S
+++ b/sysdeps/i386/i686/multiarch/memset.S
@@ -23,46 +23,19 @@
 
 /* Define multiple versions only for the definition in lib.  */
 #if IS_IN (libc)
-# ifdef SHARED
-	.text
-ENTRY(memset)
-	.type	memset, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__memset_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
-	jz	2f
-	leal	__memset_sse2@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features@GOTOFF(%ebx)
-	jz	2f
-	leal	__memset_sse2_rep@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
-END(memset)
-# else
 	.text
 ENTRY(memset)
 	.type	memset, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__memset_ia32, %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__memset_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__memset_sse2, %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features
+	LOAD_FUNC_GOT_EAX (__memset_sse2)
+	HAS_CPU_FEATURE (Fast_Rep_String)
 	jz	2f
-	leal	__memset_sse2_rep, %eax
+	LOAD_FUNC_GOT_EAX (__memset_sse2_rep)
 2:	ret
 END(memset)
-# endif
 
 # undef ENTRY
 # define ENTRY(name) \
diff --git a/sysdeps/i386/i686/multiarch/memset_chk.S b/sysdeps/i386/i686/multiarch/memset_chk.S
index a770c0d..7be45e7 100644
--- a/sysdeps/i386/i686/multiarch/memset_chk.S
+++ b/sysdeps/i386/i686/multiarch/memset_chk.S
@@ -23,50 +23,26 @@
 
 /* Define multiple versions only for the definition in lib.  */
 #if IS_IN (libc)
-# ifdef SHARED
 	.text
 ENTRY(__memset_chk)
 	.type	__memset_chk, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__memset_chk_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__memset_chk_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__memset_chk_sse2@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__memset_chk_sse2)
+	HAS_CPU_FEATURE (Fast_Rep_String)
 	jz	2f
-	leal	__memset_chk_sse2_rep@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__memset_chk_sse2_rep)
+2:	ret
 END(__memset_chk)
 
+# ifdef SHARED
 strong_alias (__memset_chk, __memset_zero_constant_len_parameter)
 	.section .gnu.warning.__memset_zero_constant_len_parameter
 	.string "memset used with constant zero length parameter; this could be due to transposed parameters"
 # else
 	.text
-ENTRY(__memset_chk)
-	.type	__memset_chk, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__memset_chk_ia32, %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features
-	jz	2f
-	leal	__memset_chk_sse2, %eax
-	testl	$bit_Fast_Rep_String, FEATURE_OFFSET+index_Fast_Rep_String+__cpu_features
-	jz	2f
-	leal	__memset_chk_sse2_rep, %eax
-2:	ret
-END(__memset_chk)
-
 	.type __memset_chk_sse2, @function
 	.p2align 4;
 __memset_chk_sse2:
diff --git a/sysdeps/i386/i686/multiarch/rawmemchr.S b/sysdeps/i386/i686/multiarch/rawmemchr.S
index c2b7ee6..2cfbe1b 100644
--- a/sysdeps/i386/i686/multiarch/rawmemchr.S
+++ b/sysdeps/i386/i686/multiarch/rawmemchr.S
@@ -22,46 +22,22 @@
 #include <init-arch.h>
 
 #if IS_IN (libc)
-# define CFI_POP(REG) \
-	cfi_adjust_cfa_offset (-4); \
-	cfi_restore (REG)
-
-# define CFI_PUSH(REG) \
-	cfi_adjust_cfa_offset (4); \
-	cfi_rel_offset (REG, 0)
-
 	.text
 ENTRY(__rawmemchr)
 	.type	__rawmemchr, @gnu_indirect_function
-	pushl	%ebx
-	CFI_PUSH (%ebx)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-
-1:	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	testl	$bit_Slow_BSF, FEATURE_OFFSET+index_Slow_BSF+__cpu_features@GOTOFF(%ebx)
+	HAS_ARCH_FEATURE (Slow_BSF)
 	jz	3f
 
-	leal	__rawmemchr_sse2@GOTOFF(%ebx), %eax
-	popl	%ebx
-	CFI_POP	(%ebx)
+	LOAD_FUNC_GOT_EAX (__rawmemchr_sse2)
 	ret
 
-	CFI_PUSH (%ebx)
-
-2:	leal	__rawmemchr_ia32@GOTOFF(%ebx), %eax
-	popl	%ebx
-	CFI_POP	(%ebx)
+2:	LOAD_FUNC_GOT_EAX (__rawmemchr_ia32)
 	ret
 
-	CFI_PUSH (%ebx)
-
-3:	leal	__rawmemchr_sse2_bsf@GOTOFF(%ebx), %eax
-	popl	%ebx
-	CFI_POP	(%ebx)
+3:	LOAD_FUNC_GOT_EAX (__rawmemchr_sse2_bsf)
 	ret
 END(__rawmemchr)
 
diff --git a/sysdeps/i386/i686/multiarch/s_fma.c b/sysdeps/i386/i686/multiarch/s_fma.c
index dd70f78..cf2ede5 100644
--- a/sysdeps/i386/i686/multiarch/s_fma.c
+++ b/sysdeps/i386/i686/multiarch/s_fma.c
@@ -26,7 +26,8 @@
 extern double __fma_ia32 (double x, double y, double z) attribute_hidden;
 extern double __fma_fma (double x, double y, double z) attribute_hidden;
 
-libm_ifunc (__fma, HAS_FMA ? __fma_fma : __fma_ia32);
+libm_ifunc (__fma,
+	    HAS_ARCH_FEATURE (FMA_Usable) ? __fma_fma : __fma_ia32);
 weak_alias (__fma, fma)
 
 # define __fma __fma_ia32
diff --git a/sysdeps/i386/i686/multiarch/s_fmaf.c b/sysdeps/i386/i686/multiarch/s_fmaf.c
index 9ffa4f1..526cdf1 100644
--- a/sysdeps/i386/i686/multiarch/s_fmaf.c
+++ b/sysdeps/i386/i686/multiarch/s_fmaf.c
@@ -26,7 +26,8 @@
 extern float __fmaf_ia32 (float x, float y, float z) attribute_hidden;
 extern float __fmaf_fma (float x, float y, float z) attribute_hidden;
 
-libm_ifunc (__fmaf, HAS_FMA ? __fmaf_fma : __fmaf_ia32);
+libm_ifunc (__fmaf,
+	    HAS_ARCH_FEATURE (FMA_Usable) ? __fmaf_fma : __fmaf_ia32);
 weak_alias (__fmaf, fmaf)
 
 # define __fmaf __fmaf_ia32
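
The s_fma.c and s_fmaf.c changes keep the existing libm_ifunc dispatch and only replace HAS_FMA with HAS_ARCH_FEATURE (FMA_Usable). A minimal sketch of the same ternary selection, using GCC's public builtins instead of glibc's internals (my_fma and the my_fma_* helpers are hypothetical; note that FMA_Usable also requires the OS to save AVX state, which __builtin_cpu_supports ("fma") alone does not check):

#include <math.h>

static double
my_fma_fallback (double x, double y, double z)
{
  /* Stand-in fallback only; a real non-FMA fma has to take care
     about double rounding.  */
  return x * y + z;
}

static double
my_fma_hw (double x, double y, double z)
{
  return fma (x, y, z);		/* Stand-in for the FMA3 implementation.  */
}

static __typeof__ (my_fma_fallback) *
my_fma_resolver (void)
{
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("fma") ? my_fma_hw : my_fma_fallback;
}

double my_fma (double x, double y, double z)
     __attribute__ ((ifunc ("my_fma_resolver")));
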
diff --git a/sysdeps/i386/i686/multiarch/strcasecmp.S b/sysdeps/i386/i686/multiarch/strcasecmp.S
index c30ac3a..e4b3cf5 100644
--- a/sysdeps/i386/i686/multiarch/strcasecmp.S
+++ b/sysdeps/i386/i686/multiarch/strcasecmp.S
@@ -20,49 +20,20 @@
 #include <sysdep.h>
 #include <init-arch.h>
 
-#ifdef SHARED
 	.text
 ENTRY(__strcasecmp)
 	.type	__strcasecmp, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__strcasecmp_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__strcasecmp_ia32)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
-	leal	__strcasecmp_ssse3@GOTOFF(%ebx), %eax
-	testl	$bit_SSE4_2, CPUID_OFFSET+index_SSE4_2+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__strcasecmp_ssse3)
+	HAS_CPU_FEATURE (SSE4_2)
 	jz	2f
-	testl	$bit_Slow_SSE4_2, FEATURE_OFFSET+index_Slow_SSE4_2+__cpu_features@GOTOFF(%ebx)
+	HAS_ARCH_FEATURE (Slow_SSE4_2)
 	jnz	2f
-	leal	__strcasecmp_sse4_2@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
-END(__strcasecmp)
-#else
-	.text
-ENTRY(__strcasecmp)
-	.type	__strcasecmp, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__strcasecmp_ia32, %eax
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features
-	jz	2f
-	leal	__strcasecmp_ssse3, %eax
-	testl	$bit_SSE4_2, CPUID_OFFSET+index_SSE4_2+__cpu_features
-	jz	2f
-	testl	$bit_Slow_SSE4_2, FEATURE_OFFSET+index_Slow_SSE4_2+__cpu_features
-	jnz	2f
-	leal	__strcasecmp_sse4_2, %eax
+	LOAD_FUNC_GOT_EAX (__strcasecmp_sse4_2)
 2:	ret
 END(__strcasecmp)
-#endif
 
 weak_alias (__strcasecmp, strcasecmp)
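
Several of the SSE4.2 resolvers (strcasecmp here, and strcmp/strncase below) gate the SSE4.2 implementation on two conditions: the SSE4_2 CPUID bit must be set and the Slow_SSE4_2 feature bit must be clear. A small sketch of that double gate in C, this time with a lazily initialised function pointer instead of an ifunc; sse4_2_is_slow and the my_strcasecmp_* names are hypothetical stand-ins for data glibc reads from the cpu_features block in _rtld_global_ro.

#include <stdbool.h>
#include <string.h>
#include <strings.h>

static bool sse4_2_is_slow;	/* Hypothetical stand-in for Slow_SSE4_2.  */

static int
my_strcasecmp_ia32 (const char *a, const char *b)
{
  return strcasecmp (a, b);	/* Baseline stand-in.  */
}

static int
my_strcasecmp_ssse3 (const char *a, const char *b)
{
  return strcasecmp (a, b);	/* Stand-in for the SSSE3 version.  */
}

static int
my_strcasecmp_sse4_2 (const char *a, const char *b)
{
  return strcasecmp (a, b);	/* Stand-in for the SSE4.2 version.  */
}

/* Same decision order as the resolver above: baseline unless SSSE3 is
   present; upgrade to SSE4.2 only when the CPUID bit is set and the
   slow flag is clear.  */
static __typeof__ (my_strcasecmp_ia32) *
pick_strcasecmp (void)
{
  __builtin_cpu_init ();
  if (!__builtin_cpu_supports ("ssse3"))
    return my_strcasecmp_ia32;
  if (__builtin_cpu_supports ("sse4.2") && !sse4_2_is_slow)
    return my_strcasecmp_sse4_2;
  return my_strcasecmp_ssse3;
}

static int
my_strcasecmp (const char *a, const char *b)
{
  static int (*fn) (const char *, const char *);
  if (fn == NULL)
    fn = pick_strcasecmp ();
  return fn (a, b);
}
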
diff --git a/sysdeps/i386/i686/multiarch/strcat.S b/sysdeps/i386/i686/multiarch/strcat.S
index 474f753..45d84cd 100644
--- a/sysdeps/i386/i686/multiarch/strcat.S
+++ b/sysdeps/i386/i686/multiarch/strcat.S
@@ -45,52 +45,22 @@
    need strncat before the initialization happened.  */
 #if IS_IN (libc)
 
-# ifdef SHARED
 	.text
 ENTRY(STRCAT)
 	.type	STRCAT, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	STRCAT_IA32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
-	jz	2f
-	leal	STRCAT_SSE2@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Unaligned_Load, FEATURE_OFFSET+index_Fast_Unaligned_Load+__cpu_features@GOTOFF(%ebx)
-	jnz	2f
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
-	jz	2f
-	leal	STRCAT_SSSE3@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
-END(STRCAT)
-# else
-
-ENTRY(STRCAT)
-	.type	STRCAT, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features
-	jne	1f
-	call	__init_cpu_features
-1:	leal	STRCAT_IA32, %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (STRCAT_IA32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	STRCAT_SSE2, %eax
-	testl	$bit_Fast_Unaligned_Load, FEATURE_OFFSET+index_Fast_Unaligned_Load+__cpu_features
+	LOAD_FUNC_GOT_EAX (STRCAT_SSE2)
+	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
 	jnz	2f
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
-	leal	STRCAT_SSSE3, %eax
+	LOAD_FUNC_GOT_EAX (STRCAT_SSSE3)
 2:	ret
 END(STRCAT)
 
-# endif
-
 # undef ENTRY
 # define ENTRY(name) \
 	.type STRCAT_IA32, @function; \
diff --git a/sysdeps/i386/i686/multiarch/strchr.S b/sysdeps/i386/i686/multiarch/strchr.S
index 45624fd..6b46565 100644
--- a/sysdeps/i386/i686/multiarch/strchr.S
+++ b/sysdeps/i386/i686/multiarch/strchr.S
@@ -25,24 +25,15 @@
 	.text
 ENTRY(strchr)
 	.type	strchr, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__strchr_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__strchr_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__strchr_sse2_bsf@GOTOFF(%ebx), %eax
-	testl	$bit_Slow_BSF, FEATURE_OFFSET+index_Slow_BSF+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__strchr_sse2_bsf)
+	HAS_ARCH_FEATURE (Slow_BSF)
 	jz	2f
-	leal	__strchr_sse2@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4);
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__strchr_sse2)
+2:	ret
 END(strchr)
 
 # undef ENTRY
diff --git a/sysdeps/i386/i686/multiarch/strcmp.S b/sysdeps/i386/i686/multiarch/strcmp.S
index 9df4008..cad179d 100644
--- a/sysdeps/i386/i686/multiarch/strcmp.S
+++ b/sysdeps/i386/i686/multiarch/strcmp.S
@@ -51,50 +51,21 @@
    define multiple versions for strncmp in static library since we
    need strncmp before the initialization happened.  */
 #if (defined SHARED || !defined USE_AS_STRNCMP) && IS_IN (libc)
-# ifdef SHARED
 	.text
 ENTRY(STRCMP)
 	.type	STRCMP, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__STRCMP_IA32@GOTOFF(%ebx), %eax
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__STRCMP_IA32)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
-	leal	__STRCMP_SSSE3@GOTOFF(%ebx), %eax
-	testl	$bit_SSE4_2, CPUID_OFFSET+index_SSE4_2+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__STRCMP_SSSE3)
+	HAS_CPU_FEATURE (SSE4_2)
 	jz	2f
-	testl	$bit_Slow_SSE4_2, FEATURE_OFFSET+index_Slow_SSE4_2+__cpu_features@GOTOFF(%ebx)
+	HAS_ARCH_FEATURE (Slow_SSE4_2)
 	jnz	2f
-	leal	__STRCMP_SSE4_2@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
-END(STRCMP)
-# else
-	.text
-ENTRY(STRCMP)
-	.type	STRCMP, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__STRCMP_IA32, %eax
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features
-	jz	2f
-	leal	__STRCMP_SSSE3, %eax
-	testl	$bit_SSE4_2, CPUID_OFFSET+index_SSE4_2+__cpu_features
-	jz	2f
-	testl	$bit_Slow_SSE4_2, FEATURE_OFFSET+index_Slow_SSE4_2+__cpu_features
-	jnz	2f
-	leal	__STRCMP_SSE4_2, %eax
+	LOAD_FUNC_GOT_EAX (__STRCMP_SSE4_2)
 2:	ret
 END(STRCMP)
-# endif
 
 # undef ENTRY
 # define ENTRY(name) \
diff --git a/sysdeps/i386/i686/multiarch/strcpy.S b/sysdeps/i386/i686/multiarch/strcpy.S
index c279d46..e9db766 100644
--- a/sysdeps/i386/i686/multiarch/strcpy.S
+++ b/sysdeps/i386/i686/multiarch/strcpy.S
@@ -61,52 +61,22 @@
    need strncpy before the initialization happened.  */
 #if IS_IN (libc)
 
-# ifdef SHARED
 	.text
 ENTRY(STRCPY)
 	.type	STRCPY, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	STRCPY_IA32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (STRCPY_IA32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	STRCPY_SSE2@GOTOFF(%ebx), %eax
-	testl	$bit_Fast_Unaligned_Load, FEATURE_OFFSET+index_Fast_Unaligned_Load+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (STRCPY_SSE2)
+	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
 	jnz	2f
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
-	leal	STRCPY_SSSE3@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
-END(STRCPY)
-# else
-
-ENTRY(STRCPY)
-	.type	STRCPY, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features
-	jne	1f
-	call	__init_cpu_features
-1:	leal	STRCPY_IA32, %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features
-	jz	2f
-	leal	STRCPY_SSE2, %eax
-	testl	$bit_Fast_Unaligned_Load, FEATURE_OFFSET+index_Fast_Unaligned_Load+__cpu_features
-	jnz	2f
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features
-	jz	2f
-	leal	STRCPY_SSSE3, %eax
+	LOAD_FUNC_GOT_EAX (STRCPY_SSSE3)
 2:	ret
 END(STRCPY)
 
-# endif
-
 # undef ENTRY
 # define ENTRY(name) \
 	.type STRCPY_IA32, @function; \
diff --git a/sysdeps/i386/i686/multiarch/strcspn.S b/sysdeps/i386/i686/multiarch/strcspn.S
index e6ea454..b669b97 100644
--- a/sysdeps/i386/i686/multiarch/strcspn.S
+++ b/sysdeps/i386/i686/multiarch/strcspn.S
@@ -42,40 +42,16 @@
    define multiple versions for strpbrk in static library since we
    need strpbrk before the initialization happened.  */
 #if (defined SHARED || !defined USE_AS_STRPBRK) && IS_IN (libc)
-# ifdef SHARED
 	.text
 ENTRY(STRCSPN)
 	.type	STRCSPN, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	STRCSPN_IA32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE4_2, CPUID_OFFSET+index_SSE4_2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (STRCSPN_IA32)
+	HAS_CPU_FEATURE (SSE4_2)
 	jz	2f
-	leal	STRCSPN_SSE42@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4);
-	cfi_restore (ebx)
-	ret
-END(STRCSPN)
-# else
-	.text
-ENTRY(STRCSPN)
-	.type	STRCSPN, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features
-	jne	1f
-	call	__init_cpu_features
-1:	leal	STRCSPN_IA32, %eax
-	testl	$bit_SSE4_2, CPUID_OFFSET+index_SSE4_2+__cpu_features
-	jz	2f
-	leal	STRCSPN_SSE42, %eax
+	LOAD_FUNC_GOT_EAX (STRCSPN_SSE42)
 2:	ret
 END(STRCSPN)
-# endif
 
 # undef ENTRY
 # define ENTRY(name) \
diff --git a/sysdeps/i386/i686/multiarch/strlen.S b/sysdeps/i386/i686/multiarch/strlen.S
index 2e6993b..613559c 100644
--- a/sysdeps/i386/i686/multiarch/strlen.S
+++ b/sysdeps/i386/i686/multiarch/strlen.S
@@ -28,24 +28,15 @@
 	.text
 ENTRY(strlen)
 	.type	strlen, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__strlen_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__strlen_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__strlen_sse2_bsf@GOTOFF(%ebx), %eax
-	testl	$bit_Slow_BSF, FEATURE_OFFSET+index_Slow_BSF+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__strlen_sse2_bsf)
+	HAS_ARCH_FEATURE (Slow_BSF)
 	jz	2f
-	leal	__strlen_sse2@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4);
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__strlen_sse2)
+2:	ret
 END(strlen)
 
 # undef ENTRY
diff --git a/sysdeps/i386/i686/multiarch/strncase.S b/sysdeps/i386/i686/multiarch/strncase.S
index c2cb03c..0cdbeff 100644
--- a/sysdeps/i386/i686/multiarch/strncase.S
+++ b/sysdeps/i386/i686/multiarch/strncase.S
@@ -20,49 +20,20 @@
 #include <sysdep.h>
 #include <init-arch.h>
 
-#ifdef SHARED
 	.text
 ENTRY(__strncasecmp)
 	.type	__strncasecmp, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__strncasecmp_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__strncasecmp_ia32)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
-	leal	__strncasecmp_ssse3@GOTOFF(%ebx), %eax
-	testl	$bit_SSE4_2, CPUID_OFFSET+index_SSE4_2+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__strncasecmp_ssse3)
+	HAS_CPU_FEATURE (SSE4_2)
 	jz	2f
-	testl	$bit_Slow_SSE4_2, FEATURE_OFFSET+index_Slow_SSE4_2+__cpu_features@GOTOFF(%ebx)
+	HAS_ARCH_FEATURE (Slow_SSE4_2)
 	jnz	2f
-	leal	__strncasecmp_sse4_2@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
-END(__strncasecmp)
-#else
-	.text
-ENTRY(__strncasecmp)
-	.type	__strncasecmp, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__strncasecmp_ia32, %eax
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features
-	jz	2f
-	leal	__strncasecmp_ssse3, %eax
-	testl	$bit_SSE4_2, CPUID_OFFSET+index_SSE4_2+__cpu_features
-	jz	2f
-	testl	$bit_Slow_SSE4_2, FEATURE_OFFSET+index_Slow_SSE4_2+__cpu_features
-	jnz	2f
-	leal	__strncasecmp_sse4_2, %eax
+	LOAD_FUNC_GOT_EAX (__strncasecmp_sse4_2)
 2:	ret
 END(__strncasecmp)
-#endif
 
 weak_alias (__strncasecmp, strncasecmp)
diff --git a/sysdeps/i386/i686/multiarch/strnlen.S b/sysdeps/i386/i686/multiarch/strnlen.S
index 56a5136..baf21fc 100644
--- a/sysdeps/i386/i686/multiarch/strnlen.S
+++ b/sysdeps/i386/i686/multiarch/strnlen.S
@@ -25,21 +25,12 @@
 	.text
 ENTRY(__strnlen)
 	.type	__strnlen, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__strnlen_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__strnlen_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__strnlen_sse2@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4);
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__strnlen_sse2)
+2:	ret
 END(__strnlen)
 
 weak_alias(__strnlen, strnlen)
diff --git a/sysdeps/i386/i686/multiarch/strrchr.S b/sysdeps/i386/i686/multiarch/strrchr.S
index 91074b4..6aa3321 100644
--- a/sysdeps/i386/i686/multiarch/strrchr.S
+++ b/sysdeps/i386/i686/multiarch/strrchr.S
@@ -25,24 +25,15 @@
 	.text
 ENTRY(strrchr)
 	.type	strrchr, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__strrchr_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__strrchr_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__strrchr_sse2_bsf@GOTOFF(%ebx), %eax
-	testl	$bit_Slow_BSF, FEATURE_OFFSET+index_Slow_BSF+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__strrchr_sse2_bsf)
+	HAS_ARCH_FEATURE (Slow_BSF)
 	jz	2f
-	leal	__strrchr_sse2@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4);
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__strrchr_sse2)
+2:	ret
 END(strrchr)
 
 # undef ENTRY
diff --git a/sysdeps/i386/i686/multiarch/strspn.S b/sysdeps/i386/i686/multiarch/strspn.S
index 9d353a2..4ba87be 100644
--- a/sysdeps/i386/i686/multiarch/strspn.S
+++ b/sysdeps/i386/i686/multiarch/strspn.S
@@ -27,40 +27,16 @@
 
 /* Define multiple versions only for the definition in libc.  */
 #if IS_IN (libc)
-# ifdef SHARED
 	.text
 ENTRY(strspn)
 	.type	strspn, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__strspn_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE4_2, CPUID_OFFSET+index_SSE4_2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__strspn_ia32)
+	HAS_CPU_FEATURE (SSE4_2)
 	jz	2f
-	leal	__strspn_sse42@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4);
-	cfi_restore (ebx)
-	ret
-END(strspn)
-# else
-	.text
-ENTRY(strspn)
-	.type	strspn, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__strspn_ia32, %eax
-	testl	$bit_SSE4_2, CPUID_OFFSET+index_SSE4_2+__cpu_features
-	jz	2f
-	leal	__strspn_sse42, %eax
+	LOAD_FUNC_GOT_EAX (__strspn_sse42)
 2:	ret
 END(strspn)
-# endif
 
 # undef ENTRY
 # define ENTRY(name) \
diff --git a/sysdeps/i386/i686/multiarch/wcschr.S b/sysdeps/i386/i686/multiarch/wcschr.S
index 603d7d7..5918b12 100644
--- a/sysdeps/i386/i686/multiarch/wcschr.S
+++ b/sysdeps/i386/i686/multiarch/wcschr.S
@@ -25,21 +25,12 @@
 	.text
 ENTRY(__wcschr)
 	.type	wcschr, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__wcschr_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__wcschr_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__wcschr_sse2@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4);
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__wcschr_sse2)
+2:	ret
 END(__wcschr)
 weak_alias (__wcschr, wcschr)
 #endif
diff --git a/sysdeps/i386/i686/multiarch/wcscmp.S b/sysdeps/i386/i686/multiarch/wcscmp.S
index 92c2c84..db9c05a 100644
--- a/sysdeps/i386/i686/multiarch/wcscmp.S
+++ b/sysdeps/i386/i686/multiarch/wcscmp.S
@@ -28,21 +28,12 @@
 	.text
 ENTRY(__wcscmp)
 	.type	__wcscmp, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__wcscmp_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__wcscmp_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__wcscmp_sse2@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4);
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__wcscmp_sse2)
+2:	ret
 END(__wcscmp)
 weak_alias (__wcscmp, wcscmp)
 #endif
diff --git a/sysdeps/i386/i686/multiarch/wcscpy.S b/sysdeps/i386/i686/multiarch/wcscpy.S
index f7253c7..5f14970 100644
--- a/sysdeps/i386/i686/multiarch/wcscpy.S
+++ b/sysdeps/i386/i686/multiarch/wcscpy.S
@@ -26,20 +26,11 @@
 	.text
 ENTRY(wcscpy)
 	.type	wcscpy, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__wcscpy_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__wcscpy_ia32)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
-	leal	__wcscpy_ssse3@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__wcscpy_ssse3)
+2:	ret
 END(wcscpy)
 #endif
diff --git a/sysdeps/i386/i686/multiarch/wcslen.S b/sysdeps/i386/i686/multiarch/wcslen.S
index 3926a50..7740404 100644
--- a/sysdeps/i386/i686/multiarch/wcslen.S
+++ b/sysdeps/i386/i686/multiarch/wcslen.S
@@ -25,21 +25,12 @@
 	.text
 ENTRY(__wcslen)
 	.type	__wcslen, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__wcslen_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__wcslen_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__wcslen_sse2@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4);
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__wcslen_sse2)
+2:	ret
 END(__wcslen)
 
 weak_alias(__wcslen, wcslen)
diff --git a/sysdeps/i386/i686/multiarch/wcsrchr.S b/sysdeps/i386/i686/multiarch/wcsrchr.S
index 5c96129..9ed6810 100644
--- a/sysdeps/i386/i686/multiarch/wcsrchr.S
+++ b/sysdeps/i386/i686/multiarch/wcsrchr.S
@@ -25,20 +25,11 @@
 	.text
 ENTRY(wcsrchr)
 	.type	wcsrchr, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__wcsrchr_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSE2, CPUID_OFFSET+index_SSE2+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__wcsrchr_ia32)
+	HAS_CPU_FEATURE (SSE2)
 	jz	2f
-	leal	__wcsrchr_sse2@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4);
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__wcsrchr_sse2)
+2:	ret
 END(wcsrchr)
 #endif
diff --git a/sysdeps/i386/i686/multiarch/wmemcmp.S b/sysdeps/i386/i686/multiarch/wmemcmp.S
index 6ca6053..6025942 100644
--- a/sysdeps/i386/i686/multiarch/wmemcmp.S
+++ b/sysdeps/i386/i686/multiarch/wmemcmp.S
@@ -27,23 +27,14 @@
 	.text
 ENTRY(wmemcmp)
 	.type	wmemcmp, @gnu_indirect_function
-	pushl	%ebx
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (ebx, 0)
-	LOAD_PIC_REG(bx)
-	cmpl	$0, KIND_OFFSET+__cpu_features@GOTOFF(%ebx)
-	jne	1f
-	call	__init_cpu_features
-1:	leal	__wmemcmp_ia32@GOTOFF(%ebx), %eax
-	testl	$bit_SSSE3, CPUID_OFFSET+index_SSSE3+__cpu_features@GOTOFF(%ebx)
+	LOAD_GOT_AND_RTLD_GLOBAL_RO
+	LOAD_FUNC_GOT_EAX (__wmemcmp_ia32)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
-	leal	__wmemcmp_ssse3@GOTOFF(%ebx), %eax
-	testl	$bit_SSE4_2, CPUID_OFFSET+index_SSE4_2+__cpu_features@GOTOFF(%ebx)
+	LOAD_FUNC_GOT_EAX (__wmemcmp_ssse3)
+	HAS_CPU_FEATURE (SSE4_2)
 	jz	2f
-	leal	__wmemcmp_sse4_2@GOTOFF(%ebx), %eax
-2:	popl	%ebx
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (ebx)
-	ret
+	LOAD_FUNC_GOT_EAX (__wmemcmp_sse4_2)
+2:	ret
 END(wmemcmp)
 #endif

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=0b5395f052ee09cd7e3d219af4e805c38058afb5

commit 0b5395f052ee09cd7e3d219af4e805c38058afb5
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Aug 13 03:38:47 2015 -0700

    Update x86_64 multiarch functions for <cpu-features.h>
    
    This patch updates x86_64 multiarch functions to use the newly defined
    HAS_CPU_FEATURE, HAS_ARCH_FEATURE and LOAD_RTLD_GLOBAL_RO_RDX from
    <cpu-features.h>.
    
    	* sysdeps/x86_64/fpu/multiarch/e_asin.c: Replace HAS_XXX with
    	HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
    	* sysdeps/x86_64/fpu/multiarch/e_atan2.c: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/e_exp.c: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/e_log.c: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/e_pow.c: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_atan.c: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_fma.c: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_fmaf.c: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_sin.c: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_tan.c: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_ceil.S: Use
    	LOAD_RTLD_GLOBAL_RO_RDX and HAS_CPU_FEATURE (SSE4_1).
    	* sysdeps/x86_64/fpu/multiarch/s_ceilf.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_floor.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_floorf.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_nearbyint.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_rint.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_rintf.S: Likewise.
    	* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise.
    	* sysdeps/x86_64/multiarch/sched_cpucount.c: Likewise.
    	* sysdeps/x86_64/multiarch/strstr.c: Likewise.
    	* sysdeps/x86_64/multiarch/memmove.c: Likewise.
    	* sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
    	* sysdeps/x86_64/multiarch/test-multiarch.c: Likewise.
    	* sysdeps/x86_64/multiarch/memcmp.S: Remove __init_cpu_features
    	call.  Add LOAD_RTLD_GLOBAL_RO_RDX.  Replace HAS_XXX with
    	HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
    	* sysdeps/x86_64/multiarch/memcpy.S: Likewise.
    	* sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
    	* sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
    	* sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
    	* sysdeps/x86_64/multiarch/memset.S: Likewise.
    	* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
    	* sysdeps/x86_64/multiarch/strcat.S: Likewise.
    	* sysdeps/x86_64/multiarch/strchr.S: Likewise.
    	* sysdeps/x86_64/multiarch/strcmp.S: Likewise.
    	* sysdeps/x86_64/multiarch/strcpy.S: Likewise.
    	* sysdeps/x86_64/multiarch/strcspn.S: Likewise.
    	* sysdeps/x86_64/multiarch/strspn.S: Likewise.
    	* sysdeps/x86_64/multiarch/wcscpy.S: Likewise.
    	* sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
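
All of the converted C selectors follow the same shape: each implementation
is declared hidden, and libc_ifunc (or libm_ifunc) picks one based on
HAS_CPU_FEATURE or HAS_ARCH_FEATURE.  The fragment below is only a minimal
sketch of that shape; my_func and the __my_func_* variants are hypothetical
names, not part of this commit, and it assumes the glibc-internal build
environment where libc_ifunc, attribute_hidden and the feature macros are
in scope.

/* Minimal sketch of the post-patch selector style; all names here are
   hypothetical and used only for illustration.  */
#include <init-arch.h>	/* Assumed to pull in <cpu-features.h>.  */

extern int my_func (int);
extern __typeof (my_func) __my_func_avx2 attribute_hidden;
extern __typeof (my_func) __my_func_sse2 attribute_hidden;

/* The feature bits are read from _dl_x86_cpu_features, which ld.so fills
   in before any IFUNC selector runs, so the selector no longer needs to
   call __init_cpu_features itself.  */
libc_ifunc (my_func,
	    HAS_ARCH_FEATURE (AVX2_Usable)
	    ? __my_func_avx2
	    : __my_func_sse2);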

diff --git a/ChangeLog b/ChangeLog
index 2775dba..5ea2847 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,51 @@
 2015-08-13  H.J. Lu  <hongjiu.lu@intel.com>
 
+	* sysdeps/x86_64/fpu/multiarch/e_asin.c: Replace HAS_XXX with
+	HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
+	* sysdeps/x86_64/fpu/multiarch/e_atan2.c: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/e_exp.c: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/e_log.c: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/e_pow.c: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/s_atan.c: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/s_fma.c: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/s_fmaf.c: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/s_sin.c: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/s_tan.c: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/s_ceil.S: Use
+	LOAD_RTLD_GLOBAL_RO_RDX and HAS_CPU_FEATURE (SSE4_1).
+	* sysdeps/x86_64/fpu/multiarch/s_ceilf.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/s_floor.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/s_floorf.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/s_nearbyint.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/s_rint.S: Likewise.
+	* sysdeps/x86_64/fpu/multiarch/s_rintf.S: Likewise.
+	* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise.
+	* sysdeps/x86_64/multiarch/sched_cpucount.c: Likewise.
+	* sysdeps/x86_64/multiarch/strstr.c: Likewise.
+	* sysdeps/x86_64/multiarch/memmove.c: Likewise.
+	* sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
+	* sysdeps/x86_64/multiarch/test-multiarch.c: Likewise.
+	* sysdeps/x86_64/multiarch/memcmp.S: Remove __init_cpu_features
+	call.  Add LOAD_RTLD_GLOBAL_RO_RDX.  Replace HAS_XXX with
+	HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
+	* sysdeps/x86_64/multiarch/memcpy.S: Likewise.
+	* sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
+	* sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
+	* sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
+	* sysdeps/x86_64/multiarch/memset.S: Likewise.
+	* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
+	* sysdeps/x86_64/multiarch/strcat.S: Likewise.
+	* sysdeps/x86_64/multiarch/strchr.S: Likewise.
+	* sysdeps/x86_64/multiarch/strcmp.S: Likewise.
+	* sysdeps/x86_64/multiarch/strcpy.S: Likewise.
+	* sysdeps/x86_64/multiarch/strcspn.S: Likewise.
+	* sysdeps/x86_64/multiarch/strspn.S: Likewise.
+	* sysdeps/x86_64/multiarch/wcscpy.S: Likewise.
+	* sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
+
+2015-08-13  H.J. Lu  <hongjiu.lu@intel.com>
+
 	* sysdeps/i386/dl-machine.h: Include <cpu-features.c>.
 	(dl_platform_init): Call init_cpu_features.
 	* sysdeps/i386/dl-procinfo.c (_dl_x86_cpu_features): New.
diff --git a/sysdeps/x86_64/fpu/multiarch/e_asin.c b/sysdeps/x86_64/fpu/multiarch/e_asin.c
index 55865c0..a0edb96 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_asin.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_asin.c
@@ -9,11 +9,15 @@ extern double __ieee754_acos_fma4 (double);
 extern double __ieee754_asin_fma4 (double);
 
 libm_ifunc (__ieee754_acos,
-	    HAS_FMA4 ? __ieee754_acos_fma4 : __ieee754_acos_sse2);
+	    HAS_ARCH_FEATURE (FMA4_Usable)
+	    ? __ieee754_acos_fma4
+	    : __ieee754_acos_sse2);
 strong_alias (__ieee754_acos, __acos_finite)
 
 libm_ifunc (__ieee754_asin,
-	    HAS_FMA4 ? __ieee754_asin_fma4 : __ieee754_asin_sse2);
+	    HAS_ARCH_FEATURE (FMA4_Usable)
+	    ? __ieee754_asin_fma4
+	    : __ieee754_asin_sse2);
 strong_alias (__ieee754_asin, __asin_finite)
 
 # define __ieee754_acos __ieee754_acos_sse2
diff --git a/sysdeps/x86_64/fpu/multiarch/e_atan2.c b/sysdeps/x86_64/fpu/multiarch/e_atan2.c
index 547681c..269dcc9 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_atan2.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_atan2.c
@@ -8,14 +8,15 @@ extern double __ieee754_atan2_avx (double, double);
 # ifdef HAVE_FMA4_SUPPORT
 extern double __ieee754_atan2_fma4 (double, double);
 # else
-#  undef HAS_FMA4
-#  define HAS_FMA4 0
+#  undef HAS_ARCH_FEATURE
+#  define HAS_ARCH_FEATURE(feature) 0
 #  define __ieee754_atan2_fma4 ((void *) 0)
 # endif
 
 libm_ifunc (__ieee754_atan2,
-	    HAS_FMA4 ? __ieee754_atan2_fma4
-	    : (HAS_AVX ? __ieee754_atan2_avx : __ieee754_atan2_sse2));
+	    HAS_ARCH_FEATURE (FMA4_Usable) ? __ieee754_atan2_fma4
+	    : (HAS_ARCH_FEATURE (AVX_Usable)
+	       ? __ieee754_atan2_avx : __ieee754_atan2_sse2));
 strong_alias (__ieee754_atan2, __atan2_finite)
 
 # define __ieee754_atan2 __ieee754_atan2_sse2
diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp.c b/sysdeps/x86_64/fpu/multiarch/e_exp.c
index d244954..9c124ca 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_exp.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_exp.c
@@ -8,14 +8,15 @@ extern double __ieee754_exp_avx (double);
 # ifdef HAVE_FMA4_SUPPORT
 extern double __ieee754_exp_fma4 (double);
 # else
-#  undef HAS_FMA4
-#  define HAS_FMA4 0
+#  undef HAS_ARCH_FEATURE
+#  define HAS_ARCH_FEATURE(feature) 0
 #  define __ieee754_exp_fma4 ((void *) 0)
 # endif
 
 libm_ifunc (__ieee754_exp,
-	    HAS_FMA4 ? __ieee754_exp_fma4
-	    : (HAS_AVX ? __ieee754_exp_avx : __ieee754_exp_sse2));
+	    HAS_ARCH_FEATURE (FMA4_Usable) ? __ieee754_exp_fma4
+	    : (HAS_ARCH_FEATURE (AVX_Usable)
+	       ? __ieee754_exp_avx : __ieee754_exp_sse2));
 strong_alias (__ieee754_exp, __exp_finite)
 
 # define __ieee754_exp __ieee754_exp_sse2
diff --git a/sysdeps/x86_64/fpu/multiarch/e_log.c b/sysdeps/x86_64/fpu/multiarch/e_log.c
index 9805473..04e9ac5 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_log.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_log.c
@@ -8,14 +8,15 @@ extern double __ieee754_log_avx (double);
 # ifdef HAVE_FMA4_SUPPORT
 extern double __ieee754_log_fma4 (double);
 # else
-#  undef HAS_FMA4
-#  define HAS_FMA4 0
+#  undef HAS_ARCH_FEATURE
+#  define HAS_ARCH_FEATURE(feature) 0
 #  define __ieee754_log_fma4 ((void *) 0)
 # endif
 
 libm_ifunc (__ieee754_log,
-	    HAS_FMA4 ? __ieee754_log_fma4
-	    : (HAS_AVX ? __ieee754_log_avx : __ieee754_log_sse2));
+	    HAS_ARCH_FEATURE (FMA4_Usable) ? __ieee754_log_fma4
+	    : (HAS_ARCH_FEATURE (AVX_Usable)
+	       ? __ieee754_log_avx : __ieee754_log_sse2));
 strong_alias (__ieee754_log, __log_finite)
 
 # define __ieee754_log __ieee754_log_sse2
diff --git a/sysdeps/x86_64/fpu/multiarch/e_pow.c b/sysdeps/x86_64/fpu/multiarch/e_pow.c
index 433cce0..6d422d6 100644
--- a/sysdeps/x86_64/fpu/multiarch/e_pow.c
+++ b/sysdeps/x86_64/fpu/multiarch/e_pow.c
@@ -6,7 +6,10 @@
 extern double __ieee754_pow_sse2 (double, double);
 extern double __ieee754_pow_fma4 (double, double);
 
-libm_ifunc (__ieee754_pow, HAS_FMA4 ? __ieee754_pow_fma4 : __ieee754_pow_sse2);
+libm_ifunc (__ieee754_pow,
+	    HAS_ARCH_FEATURE (FMA4_Usable)
+	    ? __ieee754_pow_fma4
+	    : __ieee754_pow_sse2);
 strong_alias (__ieee754_pow, __pow_finite)
 
 # define __ieee754_pow __ieee754_pow_sse2
diff --git a/sysdeps/x86_64/fpu/multiarch/s_atan.c b/sysdeps/x86_64/fpu/multiarch/s_atan.c
index ae16d7c..57b5c65 100644
--- a/sysdeps/x86_64/fpu/multiarch/s_atan.c
+++ b/sysdeps/x86_64/fpu/multiarch/s_atan.c
@@ -7,13 +7,14 @@ extern double __atan_avx (double);
 # ifdef HAVE_FMA4_SUPPORT
 extern double __atan_fma4 (double);
 # else
-#  undef HAS_FMA4
-#  define HAS_FMA4 0
+#  undef HAS_ARCH_FEATURE
+#  define HAS_ARCH_FEATURE(feature) 0
 #  define __atan_fma4 ((void *) 0)
 # endif
 
-libm_ifunc (atan, (HAS_FMA4 ? __atan_fma4 :
-		   HAS_AVX ? __atan_avx : __atan_sse2));
+libm_ifunc (atan, (HAS_ARCH_FEATURE (FMA4_Usable) ? __atan_fma4 :
+		   HAS_ARCH_FEATURE (AVX_Usable)
+		   ? __atan_avx : __atan_sse2));
 
 # define atan __atan_sse2
 #endif
diff --git a/sysdeps/x86_64/fpu/multiarch/s_ceil.S b/sysdeps/x86_64/fpu/multiarch/s_ceil.S
index 00ecede..c1b9026 100644
--- a/sysdeps/x86_64/fpu/multiarch/s_ceil.S
+++ b/sysdeps/x86_64/fpu/multiarch/s_ceil.S
@@ -22,10 +22,9 @@
 
 ENTRY(__ceil)
 	.type	__ceil, @gnu_indirect_function
-	call	__get_cpu_features@plt
-	movq	%rax, %rdx
+	LOAD_RTLD_GLOBAL_RO_RDX
 	leaq	__ceil_sse41(%rip), %rax
-	testl	$bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx)
+	HAS_CPU_FEATURE (SSE4_1)
 	jnz	2f
 	leaq	__ceil_c(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/fpu/multiarch/s_ceilf.S b/sysdeps/x86_64/fpu/multiarch/s_ceilf.S
index c8ed705..7809e03 100644
--- a/sysdeps/x86_64/fpu/multiarch/s_ceilf.S
+++ b/sysdeps/x86_64/fpu/multiarch/s_ceilf.S
@@ -22,10 +22,9 @@
 
 ENTRY(__ceilf)
 	.type	__ceilf, @gnu_indirect_function
-	call	__get_cpu_features@plt
-	movq	%rax, %rdx
+	LOAD_RTLD_GLOBAL_RO_RDX
 	leaq	__ceilf_sse41(%rip), %rax
-	testl	$bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx)
+	HAS_CPU_FEATURE (SSE4_1)
 	jnz	2f
 	leaq	__ceilf_c(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/fpu/multiarch/s_floor.S b/sysdeps/x86_64/fpu/multiarch/s_floor.S
index 952ffaa..fa3f98e 100644
--- a/sysdeps/x86_64/fpu/multiarch/s_floor.S
+++ b/sysdeps/x86_64/fpu/multiarch/s_floor.S
@@ -22,10 +22,9 @@
 
 ENTRY(__floor)
 	.type	__floor, @gnu_indirect_function
-	call	__get_cpu_features@plt
-	movq	%rax, %rdx
+	LOAD_RTLD_GLOBAL_RO_RDX
 	leaq	__floor_sse41(%rip), %rax
-	testl	$bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx)
+	HAS_CPU_FEATURE (SSE4_1)
 	jnz	2f
 	leaq	__floor_c(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/fpu/multiarch/s_floorf.S b/sysdeps/x86_64/fpu/multiarch/s_floorf.S
index c8231e8..f60f662 100644
--- a/sysdeps/x86_64/fpu/multiarch/s_floorf.S
+++ b/sysdeps/x86_64/fpu/multiarch/s_floorf.S
@@ -22,10 +22,10 @@
 
 ENTRY(__floorf)
 	.type	__floorf, @gnu_indirect_function
-	call	__get_cpu_features@plt
+	LOAD_RTLD_GLOBAL_RO_RDX
 	movq	%rax, %rdx
 	leaq	__floorf_sse41(%rip), %rax
-	testl	$bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx)
+	HAS_CPU_FEATURE (SSE4_1)
 	jnz	2f
 	leaq	__floorf_c(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/fpu/multiarch/s_fma.c b/sysdeps/x86_64/fpu/multiarch/s_fma.c
index 0963a0b..78e7732 100644
--- a/sysdeps/x86_64/fpu/multiarch/s_fma.c
+++ b/sysdeps/x86_64/fpu/multiarch/s_fma.c
@@ -42,14 +42,15 @@ __fma_fma4 (double x, double y, double z)
   return x;
 }
 # else
-#  undef HAS_FMA4
-#  define HAS_FMA4 0
+#  undef HAS_ARCH_FEATURE
+#  define HAS_ARCH_FEATURE(feature) 0
 #  define __fma_fma4 ((void *) 0)
 # endif
 
 
-libm_ifunc (__fma, HAS_FMA
-	    ? __fma_fma3 : (HAS_FMA4 ? __fma_fma4 : __fma_sse2));
+libm_ifunc (__fma, HAS_ARCH_FEATURE (FMA_Usable)
+	    ? __fma_fma3 : (HAS_ARCH_FEATURE (FMA4_Usable)
+			    ? __fma_fma4 : __fma_sse2));
 weak_alias (__fma, fma)
 
 # define __fma __fma_sse2
diff --git a/sysdeps/x86_64/fpu/multiarch/s_fmaf.c b/sysdeps/x86_64/fpu/multiarch/s_fmaf.c
index 6046961..bebd3ee 100644
--- a/sysdeps/x86_64/fpu/multiarch/s_fmaf.c
+++ b/sysdeps/x86_64/fpu/multiarch/s_fmaf.c
@@ -41,14 +41,15 @@ __fmaf_fma4 (float x, float y, float z)
   return x;
 }
 # else
-#  undef HAS_FMA4
-#  define HAS_FMA4 0
+#  undef HAS_ARCH_FEATURE
+#  define HAS_ARCH_FEATURE(feature) 0
 #  define __fmaf_fma4 ((void *) 0)
 # endif
 
 
-libm_ifunc (__fmaf, HAS_FMA
-	    ? __fmaf_fma3 : (HAS_FMA4 ? __fmaf_fma4 : __fmaf_sse2));
+libm_ifunc (__fmaf, HAS_ARCH_FEATURE (FMA_Usable)
+	    ? __fmaf_fma3 : (HAS_ARCH_FEATURE (FMA4_Usable)
+			     ? __fmaf_fma4 : __fmaf_sse2));
 weak_alias (__fmaf, fmaf)
 
 # define __fmaf __fmaf_sse2
diff --git a/sysdeps/x86_64/fpu/multiarch/s_nearbyint.S b/sysdeps/x86_64/fpu/multiarch/s_nearbyint.S
index b5d32b5..109395c 100644
--- a/sysdeps/x86_64/fpu/multiarch/s_nearbyint.S
+++ b/sysdeps/x86_64/fpu/multiarch/s_nearbyint.S
@@ -22,10 +22,10 @@
 
 ENTRY(__nearbyint)
 	.type	__nearbyint, @gnu_indirect_function
-	call	__get_cpu_features@plt
+	LOAD_RTLD_GLOBAL_RO_RDX
 	movq	%rax, %rdx
 	leaq	__nearbyint_sse41(%rip), %rax
-	testl	$bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx)
+	HAS_CPU_FEATURE (SSE4_1)
 	jnz	2f
 	leaq	__nearbyint_c(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S b/sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S
index cd7e177..b870c0c 100644
--- a/sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S
+++ b/sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S
@@ -22,10 +22,9 @@
 
 ENTRY(__nearbyintf)
 	.type	__nearbyintf, @gnu_indirect_function
-	call	__get_cpu_features@plt
-	movq	%rax, %rdx
+	LOAD_RTLD_GLOBAL_RO_RDX
 	leaq	__nearbyintf_sse41(%rip), %rax
-	testl	$bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx)
+	HAS_CPU_FEATURE (SSE4_1)
 	jnz	2f
 	leaq	__nearbyintf_c(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/fpu/multiarch/s_rint.S b/sysdeps/x86_64/fpu/multiarch/s_rint.S
index f52cef6..b238d49 100644
--- a/sysdeps/x86_64/fpu/multiarch/s_rint.S
+++ b/sysdeps/x86_64/fpu/multiarch/s_rint.S
@@ -22,10 +22,9 @@
 
 ENTRY(__rint)
 	.type	__rint, @gnu_indirect_function
-	call	__get_cpu_features@plt
-	movq	%rax, %rdx
+	LOAD_RTLD_GLOBAL_RO_RDX
 	leaq	__rint_sse41(%rip), %rax
-	testl	$bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx)
+	HAS_CPU_FEATURE (SSE4_1)
 	jnz	2f
 	leaq	__rint_c(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/fpu/multiarch/s_rintf.S b/sysdeps/x86_64/fpu/multiarch/s_rintf.S
index e2608d4..8869196 100644
--- a/sysdeps/x86_64/fpu/multiarch/s_rintf.S
+++ b/sysdeps/x86_64/fpu/multiarch/s_rintf.S
@@ -22,10 +22,9 @@
 
 ENTRY(__rintf)
 	.type	__rintf, @gnu_indirect_function
-	call	__get_cpu_features@plt
-	movq	%rax, %rdx
+	LOAD_RTLD_GLOBAL_RO_RDX
 	leaq	__rintf_sse41(%rip), %rax
-	testl	$bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx)
+	HAS_CPU_FEATURE (SSE4_1)
 	jnz	2f
 	leaq	__rintf_c(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/fpu/multiarch/s_sin.c b/sysdeps/x86_64/fpu/multiarch/s_sin.c
index a0c2521..3bc7330 100644
--- a/sysdeps/x86_64/fpu/multiarch/s_sin.c
+++ b/sysdeps/x86_64/fpu/multiarch/s_sin.c
@@ -11,18 +11,20 @@ extern double __sin_avx (double);
 extern double __cos_fma4 (double);
 extern double __sin_fma4 (double);
 # else
-#  undef HAS_FMA4
-#  define HAS_FMA4 0
+#  undef HAS_ARCH_FEATURE
+#  define HAS_ARCH_FEATURE(feature) 0
 #  define __cos_fma4 ((void *) 0)
 #  define __sin_fma4 ((void *) 0)
 # endif
 
-libm_ifunc (__cos, (HAS_FMA4 ? __cos_fma4 :
-		    HAS_AVX ? __cos_avx : __cos_sse2));
+libm_ifunc (__cos, (HAS_ARCH_FEATURE (FMA4_Usable) ? __cos_fma4 :
+		    HAS_ARCH_FEATURE (AVX_Usable)
+		    ? __cos_avx : __cos_sse2));
 weak_alias (__cos, cos)
 
-libm_ifunc (__sin, (HAS_FMA4 ? __sin_fma4 :
-		    HAS_AVX ? __sin_avx : __sin_sse2));
+libm_ifunc (__sin, (HAS_ARCH_FEATURE (FMA4_Usable) ? __sin_fma4 :
+		    HAS_ARCH_FEATURE (AVX_Usable)
+		    ? __sin_avx : __sin_sse2));
 weak_alias (__sin, sin)
 
 # define __cos __cos_sse2
diff --git a/sysdeps/x86_64/fpu/multiarch/s_tan.c b/sysdeps/x86_64/fpu/multiarch/s_tan.c
index 904308f..d99d9db 100644
--- a/sysdeps/x86_64/fpu/multiarch/s_tan.c
+++ b/sysdeps/x86_64/fpu/multiarch/s_tan.c
@@ -7,13 +7,14 @@ extern double __tan_avx (double);
 # ifdef HAVE_FMA4_SUPPORT
 extern double __tan_fma4 (double);
 # else
-#  undef HAS_FMA4
-#  define HAS_FMA4 0
+#  undef HAS_ARCH_FEATURE
+#  define HAS_ARCH_FEATURE(feature) 0
 #  define __tan_fma4 ((void *) 0)
 # endif
 
-libm_ifunc (tan, (HAS_FMA4 ? __tan_fma4 :
-		  HAS_AVX ? __tan_avx : __tan_sse2));
+libm_ifunc (tan, (HAS_ARCH_FEATURE (FMA4_Usable) ? __tan_fma4 :
+		  HAS_ARCH_FEATURE (AVX_Usable)
+		  ? __tan_avx : __tan_sse2));
 
 # define tan __tan_sse2
 #endif
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index b64e4f1..f5a576c 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -39,48 +39,57 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
   /* Support sysdeps/x86_64/multiarch/memcmp.S.  */
   IFUNC_IMPL (i, name, memcmp,
-	      IFUNC_IMPL_ADD (array, i, memcmp, HAS_SSE4_1,
+	      IFUNC_IMPL_ADD (array, i, memcmp, HAS_CPU_FEATURE (SSE4_1),
 			      __memcmp_sse4_1)
-	      IFUNC_IMPL_ADD (array, i, memcmp, HAS_SSSE3, __memcmp_ssse3)
+	      IFUNC_IMPL_ADD (array, i, memcmp, HAS_CPU_FEATURE (SSSE3),
+			      __memcmp_ssse3)
 	      IFUNC_IMPL_ADD (array, i, memcmp, 1, __memcmp_sse2))
 
   /* Support sysdeps/x86_64/multiarch/memmove_chk.S.  */
   IFUNC_IMPL (i, name, __memmove_chk,
-	      IFUNC_IMPL_ADD (array, i, __memmove_chk, HAS_AVX,
+	      IFUNC_IMPL_ADD (array, i, __memmove_chk,
+			      HAS_ARCH_FEATURE (AVX_Usable),
 			      __memmove_chk_avx_unaligned)
-	      IFUNC_IMPL_ADD (array, i, __memmove_chk, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, __memmove_chk,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __memmove_chk_ssse3_back)
-	      IFUNC_IMPL_ADD (array, i, __memmove_chk, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, __memmove_chk,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __memmove_chk_ssse3)
 	      IFUNC_IMPL_ADD (array, i, __memmove_chk, 1,
 			      __memmove_chk_sse2))
 
   /* Support sysdeps/x86_64/multiarch/memmove.S.  */
   IFUNC_IMPL (i, name, memmove,
-	      IFUNC_IMPL_ADD (array, i, memmove, HAS_AVX,
+	      IFUNC_IMPL_ADD (array, i, memmove,
+			      HAS_ARCH_FEATURE (AVX_Usable),
 			      __memmove_avx_unaligned)
-	      IFUNC_IMPL_ADD (array, i, memmove, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, memmove, HAS_CPU_FEATURE (SSSE3),
 			      __memmove_ssse3_back)
-	      IFUNC_IMPL_ADD (array, i, memmove, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, memmove, HAS_CPU_FEATURE (SSSE3),
 			      __memmove_ssse3)
 	      IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_sse2))
 
 #ifdef HAVE_AVX2_SUPPORT
   /* Support sysdeps/x86_64/multiarch/memset_chk.S.  */
   IFUNC_IMPL (i, name, __memset_chk,
-	      IFUNC_IMPL_ADD (array, i, __memset_chk, 1, __memset_chk_sse2)
-	      IFUNC_IMPL_ADD (array, i, __memset_chk, HAS_AVX2,
+	      IFUNC_IMPL_ADD (array, i, __memset_chk, 1,
+			      __memset_chk_sse2)
+	      IFUNC_IMPL_ADD (array, i, __memset_chk,
+			      HAS_ARCH_FEATURE (AVX2_Usable),
 			      __memset_chk_avx2))
 
   /* Support sysdeps/x86_64/multiarch/memset.S.  */
   IFUNC_IMPL (i, name, memset,
 	      IFUNC_IMPL_ADD (array, i, memset, 1, __memset_sse2)
-	      IFUNC_IMPL_ADD (array, i, memset, HAS_AVX2, __memset_avx2))
+	      IFUNC_IMPL_ADD (array, i, memset,
+			      HAS_ARCH_FEATURE (AVX2_Usable),
+			      __memset_avx2))
 #endif
 
   /* Support sysdeps/x86_64/multiarch/stpncpy.S.  */
   IFUNC_IMPL (i, name, stpncpy,
-	      IFUNC_IMPL_ADD (array, i, stpncpy, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, stpncpy, HAS_CPU_FEATURE (SSSE3),
 			      __stpncpy_ssse3)
 	      IFUNC_IMPL_ADD (array, i, stpncpy, 1,
 			      __stpncpy_sse2_unaligned)
@@ -88,27 +97,34 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
   /* Support sysdeps/x86_64/multiarch/stpcpy.S.  */
   IFUNC_IMPL (i, name, stpcpy,
-	      IFUNC_IMPL_ADD (array, i, stpcpy, HAS_SSSE3, __stpcpy_ssse3)
+	      IFUNC_IMPL_ADD (array, i, stpcpy, HAS_CPU_FEATURE (SSSE3),
+			      __stpcpy_ssse3)
 	      IFUNC_IMPL_ADD (array, i, stpcpy, 1, __stpcpy_sse2_unaligned)
 	      IFUNC_IMPL_ADD (array, i, stpcpy, 1, __stpcpy_sse2))
 
   /* Support sysdeps/x86_64/multiarch/strcasecmp_l.S.  */
   IFUNC_IMPL (i, name, strcasecmp,
-	      IFUNC_IMPL_ADD (array, i, strcasecmp, HAS_AVX,
+	      IFUNC_IMPL_ADD (array, i, strcasecmp,
+			      HAS_ARCH_FEATURE (AVX_Usable),
 			      __strcasecmp_avx)
-	      IFUNC_IMPL_ADD (array, i, strcasecmp, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, strcasecmp,
+			      HAS_CPU_FEATURE (SSE4_2),
 			      __strcasecmp_sse42)
-	      IFUNC_IMPL_ADD (array, i, strcasecmp, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, strcasecmp,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __strcasecmp_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strcasecmp, 1, __strcasecmp_sse2))
 
   /* Support sysdeps/x86_64/multiarch/strcasecmp_l.S.  */
   IFUNC_IMPL (i, name, strcasecmp_l,
-	      IFUNC_IMPL_ADD (array, i, strcasecmp_l, HAS_AVX,
+	      IFUNC_IMPL_ADD (array, i, strcasecmp_l,
+			      HAS_ARCH_FEATURE (AVX_Usable),
 			      __strcasecmp_l_avx)
-	      IFUNC_IMPL_ADD (array, i, strcasecmp_l, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, strcasecmp_l,
+			      HAS_CPU_FEATURE (SSE4_2),
 			      __strcasecmp_l_sse42)
-	      IFUNC_IMPL_ADD (array, i, strcasecmp_l, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, strcasecmp_l,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __strcasecmp_l_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strcasecmp_l, 1,
 			      __strcasecmp_l_sse2))
@@ -119,7 +135,8 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
   /* Support sysdeps/x86_64/multiarch/strcat.S.  */
   IFUNC_IMPL (i, name, strcat,
-	      IFUNC_IMPL_ADD (array, i, strcat, HAS_SSSE3, __strcat_ssse3)
+	      IFUNC_IMPL_ADD (array, i, strcat, HAS_CPU_FEATURE (SSSE3),
+			      __strcat_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strcat, 1, __strcat_sse2_unaligned)
 	      IFUNC_IMPL_ADD (array, i, strcat, 1, __strcat_sse2))
 
@@ -130,48 +147,57 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
   /* Support sysdeps/x86_64/multiarch/strcmp.S.  */
   IFUNC_IMPL (i, name, strcmp,
-	      IFUNC_IMPL_ADD (array, i, strcmp, HAS_SSE4_2, __strcmp_sse42)
-	      IFUNC_IMPL_ADD (array, i, strcmp, HAS_SSSE3, __strcmp_ssse3)
+	      IFUNC_IMPL_ADD (array, i, strcmp, HAS_CPU_FEATURE (SSE4_2),
+			      __strcmp_sse42)
+	      IFUNC_IMPL_ADD (array, i, strcmp, HAS_CPU_FEATURE (SSSE3),
+			      __strcmp_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strcmp, 1, __strcmp_sse2_unaligned)
 	      IFUNC_IMPL_ADD (array, i, strcmp, 1, __strcmp_sse2))
 
   /* Support sysdeps/x86_64/multiarch/strcpy.S.  */
   IFUNC_IMPL (i, name, strcpy,
-	      IFUNC_IMPL_ADD (array, i, strcpy, HAS_SSSE3, __strcpy_ssse3)
+	      IFUNC_IMPL_ADD (array, i, strcpy, HAS_CPU_FEATURE (SSSE3),
+			      __strcpy_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strcpy, 1, __strcpy_sse2_unaligned)
 	      IFUNC_IMPL_ADD (array, i, strcpy, 1, __strcpy_sse2))
 
   /* Support sysdeps/x86_64/multiarch/strcspn.S.  */
   IFUNC_IMPL (i, name, strcspn,
-	      IFUNC_IMPL_ADD (array, i, strcspn, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, strcspn, HAS_CPU_FEATURE (SSE4_2),
 			      __strcspn_sse42)
 	      IFUNC_IMPL_ADD (array, i, strcspn, 1, __strcspn_sse2))
 
   /* Support sysdeps/x86_64/multiarch/strncase_l.S.  */
   IFUNC_IMPL (i, name, strncasecmp,
-	      IFUNC_IMPL_ADD (array, i, strncasecmp, HAS_AVX,
+	      IFUNC_IMPL_ADD (array, i, strncasecmp,
+			      HAS_ARCH_FEATURE (AVX_Usable),
 			      __strncasecmp_avx)
-	      IFUNC_IMPL_ADD (array, i, strncasecmp, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, strncasecmp,
+			      HAS_CPU_FEATURE (SSE4_2),
 			      __strncasecmp_sse42)
-	      IFUNC_IMPL_ADD (array, i, strncasecmp, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, strncasecmp,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __strncasecmp_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strncasecmp, 1,
 			      __strncasecmp_sse2))
 
   /* Support sysdeps/x86_64/multiarch/strncase_l.S.  */
   IFUNC_IMPL (i, name, strncasecmp_l,
-	      IFUNC_IMPL_ADD (array, i, strncasecmp_l, HAS_AVX,
+	      IFUNC_IMPL_ADD (array, i, strncasecmp_l,
+			      HAS_ARCH_FEATURE (AVX_Usable),
 			      __strncasecmp_l_avx)
-	      IFUNC_IMPL_ADD (array, i, strncasecmp_l, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, strncasecmp_l,
+			      HAS_CPU_FEATURE (SSE4_2),
 			      __strncasecmp_l_sse42)
-	      IFUNC_IMPL_ADD (array, i, strncasecmp_l, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, strncasecmp_l,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __strncasecmp_l_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strncasecmp_l, 1,
 			      __strncasecmp_l_sse2))
 
   /* Support sysdeps/x86_64/multiarch/strncat.S.  */
   IFUNC_IMPL (i, name, strncat,
-	      IFUNC_IMPL_ADD (array, i, strncat, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, strncat, HAS_CPU_FEATURE (SSSE3),
 			      __strncat_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strncat, 1,
 			      __strncat_sse2_unaligned)
@@ -179,7 +205,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
   /* Support sysdeps/x86_64/multiarch/strncpy.S.  */
   IFUNC_IMPL (i, name, strncpy,
-	      IFUNC_IMPL_ADD (array, i, strncpy, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, strncpy, HAS_CPU_FEATURE (SSSE3),
 			      __strncpy_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strncpy, 1,
 			      __strncpy_sse2_unaligned)
@@ -187,14 +213,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
   /* Support sysdeps/x86_64/multiarch/strpbrk.S.  */
   IFUNC_IMPL (i, name, strpbrk,
-	      IFUNC_IMPL_ADD (array, i, strpbrk, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, strpbrk, HAS_CPU_FEATURE (SSE4_2),
 			      __strpbrk_sse42)
 	      IFUNC_IMPL_ADD (array, i, strpbrk, 1, __strpbrk_sse2))
 
 
   /* Support sysdeps/x86_64/multiarch/strspn.S.  */
   IFUNC_IMPL (i, name, strspn,
-	      IFUNC_IMPL_ADD (array, i, strspn, HAS_SSE4_2, __strspn_sse42)
+	      IFUNC_IMPL_ADD (array, i, strspn, HAS_CPU_FEATURE (SSE4_2),
+			      __strspn_sse42)
 	      IFUNC_IMPL_ADD (array, i, strspn, 1, __strspn_sse2))
 
   /* Support sysdeps/x86_64/multiarch/strstr.c.  */
@@ -204,65 +231,75 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
   /* Support sysdeps/x86_64/multiarch/wcscpy.S.  */
   IFUNC_IMPL (i, name, wcscpy,
-	      IFUNC_IMPL_ADD (array, i, wcscpy, HAS_SSSE3, __wcscpy_ssse3)
+	      IFUNC_IMPL_ADD (array, i, wcscpy, HAS_CPU_FEATURE (SSSE3),
+			      __wcscpy_ssse3)
 	      IFUNC_IMPL_ADD (array, i, wcscpy, 1, __wcscpy_sse2))
 
   /* Support sysdeps/x86_64/multiarch/wmemcmp.S.  */
   IFUNC_IMPL (i, name, wmemcmp,
-	      IFUNC_IMPL_ADD (array, i, wmemcmp, HAS_SSE4_1,
+	      IFUNC_IMPL_ADD (array, i, wmemcmp, HAS_CPU_FEATURE (SSE4_1),
 			      __wmemcmp_sse4_1)
-	      IFUNC_IMPL_ADD (array, i, wmemcmp, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, wmemcmp, HAS_CPU_FEATURE (SSSE3),
 			      __wmemcmp_ssse3)
 	      IFUNC_IMPL_ADD (array, i, wmemcmp, 1, __wmemcmp_sse2))
 
 #ifdef SHARED
   /* Support sysdeps/x86_64/multiarch/memcpy_chk.S.  */
   IFUNC_IMPL (i, name, __memcpy_chk,
-	      IFUNC_IMPL_ADD (array, i, __memcpy_chk, HAS_AVX,
+	      IFUNC_IMPL_ADD (array, i, __memcpy_chk,
+			      HAS_ARCH_FEATURE (AVX_Usable),
 			      __memcpy_chk_avx_unaligned)
-	      IFUNC_IMPL_ADD (array, i, __memcpy_chk, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, __memcpy_chk,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __memcpy_chk_ssse3_back)
-	      IFUNC_IMPL_ADD (array, i, __memcpy_chk, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, __memcpy_chk,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __memcpy_chk_ssse3)
 	      IFUNC_IMPL_ADD (array, i, __memcpy_chk, 1,
 			      __memcpy_chk_sse2))
 
   /* Support sysdeps/x86_64/multiarch/memcpy.S.  */
   IFUNC_IMPL (i, name, memcpy,
-	      IFUNC_IMPL_ADD (array, i, memcpy, HAS_AVX,
+	      IFUNC_IMPL_ADD (array, i, memcpy,
+			      HAS_ARCH_FEATURE (AVX_Usable),
 			      __memcpy_avx_unaligned)
-	      IFUNC_IMPL_ADD (array, i, memcpy, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, memcpy, HAS_CPU_FEATURE (SSSE3),
 			      __memcpy_ssse3_back)
-	      IFUNC_IMPL_ADD (array, i, memcpy, HAS_SSSE3, __memcpy_ssse3)
+	      IFUNC_IMPL_ADD (array, i, memcpy, HAS_CPU_FEATURE (SSSE3),
+			      __memcpy_ssse3)
 	      IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_sse2_unaligned)
 	      IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_sse2))
 
   /* Support sysdeps/x86_64/multiarch/mempcpy_chk.S.  */
   IFUNC_IMPL (i, name, __mempcpy_chk,
-	      IFUNC_IMPL_ADD (array, i, __mempcpy_chk, HAS_AVX,
+	      IFUNC_IMPL_ADD (array, i, __mempcpy_chk,
+			      HAS_ARCH_FEATURE (AVX_Usable),
 			      __mempcpy_chk_avx_unaligned)
-	      IFUNC_IMPL_ADD (array, i, __mempcpy_chk, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, __mempcpy_chk,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __mempcpy_chk_ssse3_back)
-	      IFUNC_IMPL_ADD (array, i, __mempcpy_chk, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, __mempcpy_chk,
+			      HAS_CPU_FEATURE (SSSE3),
 			      __mempcpy_chk_ssse3)
 	      IFUNC_IMPL_ADD (array, i, __mempcpy_chk, 1,
 			      __mempcpy_chk_sse2))
 
   /* Support sysdeps/x86_64/multiarch/mempcpy.S.  */
   IFUNC_IMPL (i, name, mempcpy,
-	      IFUNC_IMPL_ADD (array, i, mempcpy, HAS_AVX,
+	      IFUNC_IMPL_ADD (array, i, mempcpy,
+			      HAS_ARCH_FEATURE (AVX_Usable),
 			      __mempcpy_avx_unaligned)
-	      IFUNC_IMPL_ADD (array, i, mempcpy, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, mempcpy, HAS_CPU_FEATURE (SSSE3),
 			      __mempcpy_ssse3_back)
-	      IFUNC_IMPL_ADD (array, i, mempcpy, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, mempcpy, HAS_CPU_FEATURE (SSSE3),
 			      __mempcpy_ssse3)
 	      IFUNC_IMPL_ADD (array, i, mempcpy, 1, __mempcpy_sse2))
 
   /* Support sysdeps/x86_64/multiarch/strncmp.S.  */
   IFUNC_IMPL (i, name, strncmp,
-	      IFUNC_IMPL_ADD (array, i, strncmp, HAS_SSE4_2,
+	      IFUNC_IMPL_ADD (array, i, strncmp, HAS_CPU_FEATURE (SSE4_2),
 			      __strncmp_sse42)
-	      IFUNC_IMPL_ADD (array, i, strncmp, HAS_SSSE3,
+	      IFUNC_IMPL_ADD (array, i, strncmp, HAS_CPU_FEATURE (SSSE3),
 			      __strncmp_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strncmp, 1, __strncmp_sse2))
 #endif
diff --git a/sysdeps/x86_64/multiarch/memcmp.S b/sysdeps/x86_64/multiarch/memcmp.S
index f8b4636..871a081 100644
--- a/sysdeps/x86_64/multiarch/memcmp.S
+++ b/sysdeps/x86_64/multiarch/memcmp.S
@@ -26,16 +26,13 @@
 	.text
 ENTRY(memcmp)
 	.type	memcmp, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features(%rip)
-	jne	1f
-	call	__init_cpu_features
-
-1:	testl	$bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+	HAS_CPU_FEATURE (SSSE3)
 	jnz	2f
 	leaq	__memcmp_sse2(%rip), %rax
 	ret
 
-2:	testl	$bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+2:	HAS_CPU_FEATURE (SSE4_1)
 	jz	3f
 	leaq	__memcmp_sse4_1(%rip), %rax
 	ret
diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
index 4e18cd3..7e119d3 100644
--- a/sysdeps/x86_64/multiarch/memcpy.S
+++ b/sysdeps/x86_64/multiarch/memcpy.S
@@ -29,19 +29,17 @@
 	.text
 ENTRY(__new_memcpy)
 	.type	__new_memcpy, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features(%rip)
-	jne	1f
-	call	__init_cpu_features
-1:	leaq	__memcpy_avx_unaligned(%rip), %rax
-	testl	$bit_AVX_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_AVX_Fast_Unaligned_Load(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+	leaq	__memcpy_avx_unaligned(%rip), %rax
+	HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
 	jz 1f
 	ret
 1:	leaq	__memcpy_sse2(%rip), %rax
-	testl	$bit_Slow_BSF, __cpu_features+FEATURE_OFFSET+index_Slow_BSF(%rip)
+	HAS_ARCH_FEATURE (Slow_BSF)
 	jnz	2f
 	leaq	__memcpy_sse2_unaligned(%rip), %rax
 	ret
-2:	testl   $bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip)
+2:	HAS_CPU_FEATURE (SSSE3)
 	jz 3f
 	leaq    __memcpy_ssse3(%rip), %rax
 3:	ret
diff --git a/sysdeps/x86_64/multiarch/memcpy_chk.S b/sysdeps/x86_64/multiarch/memcpy_chk.S
index 1e756ea..81f83dd 100644
--- a/sysdeps/x86_64/multiarch/memcpy_chk.S
+++ b/sysdeps/x86_64/multiarch/memcpy_chk.S
@@ -29,17 +29,15 @@
 	.text
 ENTRY(__memcpy_chk)
 	.type	__memcpy_chk, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features(%rip)
-	jne	1f
-	call	__init_cpu_features
-1:	leaq	__memcpy_chk_sse2(%rip), %rax
-	testl	$bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+	leaq	__memcpy_chk_sse2(%rip), %rax
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
 	leaq	__memcpy_chk_ssse3(%rip), %rax
-	testl	$bit_Fast_Copy_Backward, __cpu_features+FEATURE_OFFSET+index_Fast_Copy_Backward(%rip)
+	HAS_ARCH_FEATURE (Fast_Copy_Backward)
 	jz	2f
 	leaq	__memcpy_chk_ssse3_back(%rip), %rax
-	testl   $bit_AVX_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_AVX_Fast_Unaligned_Load(%rip)
+	HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
 	jz  2f
 	leaq    __memcpy_chk_avx_unaligned(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/multiarch/memmove.c b/sysdeps/x86_64/multiarch/memmove.c
index dd153a3..bbddbc1 100644
--- a/sysdeps/x86_64/multiarch/memmove.c
+++ b/sysdeps/x86_64/multiarch/memmove.c
@@ -49,10 +49,10 @@ extern __typeof (__redirect_memmove) __memmove_avx_unaligned attribute_hidden;
    ifunc symbol properly.  */
 extern __typeof (__redirect_memmove) __libc_memmove;
 libc_ifunc (__libc_memmove,
-	    HAS_AVX_FAST_UNALIGNED_LOAD
+	    HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
 	    ? __memmove_avx_unaligned
-	    : (HAS_SSSE3
-	       ? (HAS_FAST_COPY_BACKWARD
+	    : (HAS_CPU_FEATURE (SSSE3)
+	       ? (HAS_ARCH_FEATURE (Fast_Copy_Backward)
 	          ? __memmove_ssse3_back : __memmove_ssse3)
 	       : __memmove_sse2));
 
diff --git a/sysdeps/x86_64/multiarch/memmove_chk.c b/sysdeps/x86_64/multiarch/memmove_chk.c
index 8b12d00..5f70e3a 100644
--- a/sysdeps/x86_64/multiarch/memmove_chk.c
+++ b/sysdeps/x86_64/multiarch/memmove_chk.c
@@ -30,8 +30,8 @@ extern __typeof (__memmove_chk) __memmove_chk_avx_unaligned attribute_hidden;
 #include "debug/memmove_chk.c"
 
 libc_ifunc (__memmove_chk,
-	    HAS_AVX_FAST_UNALIGNED_LOAD ? __memmove_chk_avx_unaligned :
-	    (HAS_SSSE3
-	    ? (HAS_FAST_COPY_BACKWARD
+	    HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load) ? __memmove_chk_avx_unaligned :
+	    (HAS_CPU_FEATURE (SSSE3)
+	    ? (HAS_ARCH_FEATURE (Fast_Copy_Backward)
 	       ? __memmove_chk_ssse3_back : __memmove_chk_ssse3)
 	    : __memmove_chk_sse2));
diff --git a/sysdeps/x86_64/multiarch/mempcpy.S b/sysdeps/x86_64/multiarch/mempcpy.S
index 2eaacdf..ad36840 100644
--- a/sysdeps/x86_64/multiarch/mempcpy.S
+++ b/sysdeps/x86_64/multiarch/mempcpy.S
@@ -27,17 +27,15 @@
 #if defined SHARED && IS_IN (libc)
 ENTRY(__mempcpy)
 	.type	__mempcpy, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features(%rip)
-	jne	1f
-	call	__init_cpu_features
-1:	leaq	__mempcpy_sse2(%rip), %rax
-	testl	$bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+	leaq	__mempcpy_sse2(%rip), %rax
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
 	leaq	__mempcpy_ssse3(%rip), %rax
-	testl	$bit_Fast_Copy_Backward, __cpu_features+FEATURE_OFFSET+index_Fast_Copy_Backward(%rip)
+	HAS_ARCH_FEATURE (Fast_Copy_Backward)
 	jz	2f
 	leaq	__mempcpy_ssse3_back(%rip), %rax
-	testl	$bit_AVX_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_AVX_Fast_Unaligned_Load(%rip)
+	HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
 	jz	2f
 	leaq	__mempcpy_avx_unaligned(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/multiarch/mempcpy_chk.S b/sysdeps/x86_64/multiarch/mempcpy_chk.S
index 17b8470..0a46b56 100644
--- a/sysdeps/x86_64/multiarch/mempcpy_chk.S
+++ b/sysdeps/x86_64/multiarch/mempcpy_chk.S
@@ -29,17 +29,15 @@
 	.text
 ENTRY(__mempcpy_chk)
 	.type	__mempcpy_chk, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features(%rip)
-	jne	1f
-	call	__init_cpu_features
-1:	leaq	__mempcpy_chk_sse2(%rip), %rax
-	testl	$bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+	leaq	__mempcpy_chk_sse2(%rip), %rax
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
 	leaq	__mempcpy_chk_ssse3(%rip), %rax
-	testl	$bit_Fast_Copy_Backward, __cpu_features+FEATURE_OFFSET+index_Fast_Copy_Backward(%rip)
+	HAS_ARCH_FEATURE (Fast_Copy_Backward)
 	jz	2f
 	leaq	__mempcpy_chk_ssse3_back(%rip), %rax
-	testl	$bit_AVX_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_AVX_Fast_Unaligned_Load(%rip)
+	HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
 	jz	2f
 	leaq	__mempcpy_chk_avx_unaligned(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/multiarch/memset.S b/sysdeps/x86_64/multiarch/memset.S
index c5f1fb3..16fefa7 100644
--- a/sysdeps/x86_64/multiarch/memset.S
+++ b/sysdeps/x86_64/multiarch/memset.S
@@ -26,11 +26,9 @@
 # if IS_IN (libc)
 ENTRY(memset)
 	.type	memset, @gnu_indirect_function
-	cmpl	$0, __cpu_features+KIND_OFFSET(%rip)
-	jne	1f
-	call	__init_cpu_features
-1:	leaq	__memset_sse2(%rip), %rax
-	testl	$bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+	leaq	__memset_sse2(%rip), %rax
+	HAS_ARCH_FEATURE (AVX2_Usable)
 	jz	2f
 	leaq	__memset_avx2(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/multiarch/memset_chk.S b/sysdeps/x86_64/multiarch/memset_chk.S
index 64fed31..ef8c64f 100644
--- a/sysdeps/x86_64/multiarch/memset_chk.S
+++ b/sysdeps/x86_64/multiarch/memset_chk.S
@@ -25,11 +25,9 @@
 # if defined SHARED && defined HAVE_AVX2_SUPPORT
 ENTRY(__memset_chk)
 	.type	__memset_chk, @gnu_indirect_function
-	cmpl	$0, __cpu_features+KIND_OFFSET(%rip)
-	jne	1f
-	call	__init_cpu_features
-1:	leaq	__memset_chk_sse2(%rip), %rax
-	testl	$bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+	leaq	__memset_chk_sse2(%rip), %rax
+	HAS_ARCH_FEATURE (AVX2_Usable)
 	jz	2f
 	leaq	__memset_chk_avx2(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/multiarch/sched_cpucount.c b/sysdeps/x86_64/multiarch/sched_cpucount.c
index 72ad7b0..e9391a2 100644
--- a/sysdeps/x86_64/multiarch/sched_cpucount.c
+++ b/sysdeps/x86_64/multiarch/sched_cpucount.c
@@ -33,4 +33,4 @@
 #undef __sched_cpucount
 
 libc_ifunc (__sched_cpucount,
-	    HAS_POPCOUNT ? popcount_cpucount : generic_cpucount);
+	    HAS_CPU_FEATURE (POPCOUNT) ? popcount_cpucount : generic_cpucount);
diff --git a/sysdeps/x86_64/multiarch/strcat.S b/sysdeps/x86_64/multiarch/strcat.S
index 44993fa..25d926c 100644
--- a/sysdeps/x86_64/multiarch/strcat.S
+++ b/sysdeps/x86_64/multiarch/strcat.S
@@ -47,14 +47,12 @@
 	.text
 ENTRY(STRCAT)
 	.type	STRCAT, @gnu_indirect_function
-	cmpl	$0, __cpu_features+KIND_OFFSET(%rip)
-	jne	1f
-	call	__init_cpu_features
-1:	leaq	STRCAT_SSE2_UNALIGNED(%rip), %rax
-	testl	$bit_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_Fast_Unaligned_Load(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+	leaq	STRCAT_SSE2_UNALIGNED(%rip), %rax
+	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
 	jnz	2f
 	leaq	STRCAT_SSE2(%rip), %rax
-	testl	$bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
 	leaq	STRCAT_SSSE3(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/multiarch/strchr.S b/sysdeps/x86_64/multiarch/strchr.S
index af55fac..0c5fdd9 100644
--- a/sysdeps/x86_64/multiarch/strchr.S
+++ b/sysdeps/x86_64/multiarch/strchr.S
@@ -25,11 +25,9 @@
 	.text
 ENTRY(strchr)
 	.type	strchr, @gnu_indirect_function
-	cmpl	$0, __cpu_features+KIND_OFFSET(%rip)
-	jne	1f
-	call	__init_cpu_features
-1:	leaq	__strchr_sse2(%rip), %rax
-2:	testl	$bit_Slow_BSF, __cpu_features+FEATURE_OFFSET+index_Slow_BSF(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+	leaq	__strchr_sse2(%rip), %rax
+2:	HAS_ARCH_FEATURE (Slow_BSF)
 	jz	3f
 	leaq    __strchr_sse2_no_bsf(%rip), %rax
 3:	ret
diff --git a/sysdeps/x86_64/multiarch/strcmp.S b/sysdeps/x86_64/multiarch/strcmp.S
index f50f26c..c180ce6 100644
--- a/sysdeps/x86_64/multiarch/strcmp.S
+++ b/sysdeps/x86_64/multiarch/strcmp.S
@@ -84,24 +84,20 @@
 	.text
 ENTRY(STRCMP)
 	.type	STRCMP, @gnu_indirect_function
-	/* Manually inlined call to __get_cpu_features.  */
-	cmpl	$0, __cpu_features+KIND_OFFSET(%rip)
-	jne	1f
-	call	__init_cpu_features
-1:
+	LOAD_RTLD_GLOBAL_RO_RDX
 #ifdef USE_AS_STRCMP
 	leaq	__strcmp_sse2_unaligned(%rip), %rax
-	testl   $bit_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_Fast_Unaligned_Load(%rip)
+	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
 	jnz     3f
 #else
-	testl	$bit_Slow_SSE4_2, __cpu_features+FEATURE_OFFSET+index_Slow_SSE4_2(%rip)
+	HAS_ARCH_FEATURE (Slow_SSE4_2)
 	jnz	2f
 	leaq	STRCMP_SSE42(%rip), %rax
-	testl	$bit_SSE4_2, __cpu_features+CPUID_OFFSET+index_SSE4_2(%rip)
+	HAS_CPU_FEATURE (SSE4_2)
 	jnz	3f
 #endif
 2:	leaq	STRCMP_SSSE3(%rip), %rax
-	testl	$bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip)
+	HAS_CPU_FEATURE (SSSE3)
 	jnz	3f
 	leaq	STRCMP_SSE2(%rip), %rax
 3:	ret
@@ -110,23 +106,19 @@ END(STRCMP)
 # ifdef USE_AS_STRCASECMP_L
 ENTRY(__strcasecmp)
 	.type	__strcasecmp, @gnu_indirect_function
-	/* Manually inlined call to __get_cpu_features.  */
-	cmpl	$0, __cpu_features+KIND_OFFSET(%rip)
-	jne	1f
-	call	__init_cpu_features
-1:
+	LOAD_RTLD_GLOBAL_RO_RDX
 #  ifdef HAVE_AVX_SUPPORT
 	leaq	__strcasecmp_avx(%rip), %rax
-	testl	$bit_AVX_Usable, __cpu_features+FEATURE_OFFSET+index_AVX_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX_Usable)
 	jnz	3f
 #  endif
-	testl	$bit_Slow_SSE4_2, __cpu_features+FEATURE_OFFSET+index_Slow_SSE4_2(%rip)
+	HAS_ARCH_FEATURE (Slow_SSE4_2)
 	jnz	2f
 	leaq	__strcasecmp_sse42(%rip), %rax
-	testl	$bit_SSE4_2, __cpu_features+CPUID_OFFSET+index_SSE4_2(%rip)
+	HAS_CPU_FEATURE (SSE4_2)
 	jnz	3f
 2:	leaq	__strcasecmp_ssse3(%rip), %rax
-	testl	$bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip)
+	HAS_CPU_FEATURE (SSSE3)
 	jnz	3f
 	leaq	__strcasecmp_sse2(%rip), %rax
 3:	ret
@@ -136,23 +128,19 @@ weak_alias (__strcasecmp, strcasecmp)
 # ifdef USE_AS_STRNCASECMP_L
 ENTRY(__strncasecmp)
 	.type	__strncasecmp, @gnu_indirect_function
-	/* Manually inlined call to __get_cpu_features.  */
-	cmpl	$0, __cpu_features+KIND_OFFSET(%rip)
-	jne	1f
-	call	__init_cpu_features
-1:
+	LOAD_RTLD_GLOBAL_RO_RDX
 #  ifdef HAVE_AVX_SUPPORT
 	leaq	__strncasecmp_avx(%rip), %rax
-	testl	$bit_AVX_Usable, __cpu_features+FEATURE_OFFSET+index_AVX_Usable(%rip)
+	HAS_ARCH_FEATURE (AVX_Usable)
 	jnz	3f
 #  endif
-	testl	$bit_Slow_SSE4_2, __cpu_features+FEATURE_OFFSET+index_Slow_SSE4_2(%rip)
+	HAS_ARCH_FEATURE (Slow_SSE4_2)
 	jnz	2f
 	leaq	__strncasecmp_sse42(%rip), %rax
-	testl	$bit_SSE4_2, __cpu_features+CPUID_OFFSET+index_SSE4_2(%rip)
+	HAS_CPU_FEATURE (SSE4_2)
 	jnz	3f
 2:	leaq	__strncasecmp_ssse3(%rip), %rax
-	testl	$bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip)
+	HAS_CPU_FEATURE (SSSE3)
 	jnz	3f
 	leaq	__strncasecmp_sse2(%rip), %rax
 3:	ret
diff --git a/sysdeps/x86_64/multiarch/strcpy.S b/sysdeps/x86_64/multiarch/strcpy.S
index 9464ee8..3aae8ee 100644
--- a/sysdeps/x86_64/multiarch/strcpy.S
+++ b/sysdeps/x86_64/multiarch/strcpy.S
@@ -61,14 +61,12 @@
 	.text
 ENTRY(STRCPY)
 	.type	STRCPY, @gnu_indirect_function
-	cmpl	$0, __cpu_features+KIND_OFFSET(%rip)
-	jne	1f
-	call	__init_cpu_features
-1:	leaq	STRCPY_SSE2_UNALIGNED(%rip), %rax
-	testl	$bit_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_Fast_Unaligned_Load(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+	leaq	STRCPY_SSE2_UNALIGNED(%rip), %rax
+	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
 	jnz	2f
 	leaq	STRCPY_SSE2(%rip), %rax
-	testl	$bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip)
+	HAS_CPU_FEATURE (SSSE3)
 	jz	2f
 	leaq	STRCPY_SSSE3(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/multiarch/strcspn.S b/sysdeps/x86_64/multiarch/strcspn.S
index 95e882c..45c69b3 100644
--- a/sysdeps/x86_64/multiarch/strcspn.S
+++ b/sysdeps/x86_64/multiarch/strcspn.S
@@ -45,11 +45,9 @@
 	.text
 ENTRY(STRCSPN)
 	.type	STRCSPN, @gnu_indirect_function
-	cmpl	$0, __cpu_features+KIND_OFFSET(%rip)
-	jne	1f
-	call	__init_cpu_features
-1:	leaq	STRCSPN_SSE2(%rip), %rax
-	testl	$bit_SSE4_2, __cpu_features+CPUID_OFFSET+index_SSE4_2(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+	leaq	STRCSPN_SSE2(%rip), %rax
+	HAS_CPU_FEATURE (SSE4_2)
 	jz	2f
 	leaq	STRCSPN_SSE42(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/multiarch/strspn.S b/sysdeps/x86_64/multiarch/strspn.S
index b734c17..c4d3b27 100644
--- a/sysdeps/x86_64/multiarch/strspn.S
+++ b/sysdeps/x86_64/multiarch/strspn.S
@@ -30,11 +30,9 @@
 	.text
 ENTRY(strspn)
 	.type	strspn, @gnu_indirect_function
-	cmpl	$0, __cpu_features+KIND_OFFSET(%rip)
-	jne	1f
-	call	__init_cpu_features
-1:	leaq	__strspn_sse2(%rip), %rax
-	testl	$bit_SSE4_2, __cpu_features+CPUID_OFFSET+index_SSE4_2(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+	leaq	__strspn_sse2(%rip), %rax
+	HAS_CPU_FEATURE (SSE4_2)
 	jz	2f
 	leaq	__strspn_sse42(%rip), %rax
 2:	ret
diff --git a/sysdeps/x86_64/multiarch/strstr.c b/sysdeps/x86_64/multiarch/strstr.c
index 507994b..b8827f0 100644
--- a/sysdeps/x86_64/multiarch/strstr.c
+++ b/sysdeps/x86_64/multiarch/strstr.c
@@ -41,7 +41,10 @@ extern __typeof (__redirect_strstr) __strstr_sse2 attribute_hidden;
 /* Avoid DWARF definition DIE on ifunc symbol so that GDB can handle
    ifunc symbol properly.  */
 extern __typeof (__redirect_strstr) __libc_strstr;
-libc_ifunc (__libc_strstr, HAS_FAST_UNALIGNED_LOAD ? __strstr_sse2_unaligned : __strstr_sse2)
+libc_ifunc (__libc_strstr,
+	    HAS_ARCH_FEATURE (Fast_Unaligned_Load)
+	    ? __strstr_sse2_unaligned
+	    : __strstr_sse2)
 
 #undef strstr
 strong_alias (__libc_strstr, strstr)
diff --git a/sysdeps/x86_64/multiarch/test-multiarch.c b/sysdeps/x86_64/multiarch/test-multiarch.c
index 949d26e..e893894 100644
--- a/sysdeps/x86_64/multiarch/test-multiarch.c
+++ b/sysdeps/x86_64/multiarch/test-multiarch.c
@@ -75,12 +75,18 @@ do_test (int argc, char **argv)
   int fails;
 
   get_cpuinfo ();
-  fails = check_proc ("avx", HAS_AVX, "HAS_AVX");
-  fails += check_proc ("fma4", HAS_FMA4, "HAS_FMA4");
-  fails += check_proc ("sse4_2", HAS_SSE4_2, "HAS_SSE4_2");
-  fails += check_proc ("sse4_1", HAS_SSE4_1, "HAS_SSE4_1");
-  fails += check_proc ("ssse3", HAS_SSSE3, "HAS_SSSE3");
-  fails += check_proc ("popcnt", HAS_POPCOUNT, "HAS_POPCOUNT");
+  fails = check_proc ("avx", HAS_ARCH_FEATURE (AVX_Usable),
+		      "HAS_ARCH_FEATURE (AVX_Usable)");
+  fails += check_proc ("fma4", HAS_ARCH_FEATURE (FMA4_Usable),
+		       "HAS_ARCH_FEATURE (FMA4_Usable)");
+  fails += check_proc ("sse4_2", HAS_CPU_FEATURE (SSE4_2),
+		       "HAS_CPU_FEATURE (SSE4_2)");
+  fails += check_proc ("sse4_1", HAS_CPU_FEATURE (SSE4_1),
+		       "HAS_CPU_FEATURE (SSE4_1)");
+  fails += check_proc ("ssse3", HAS_CPU_FEATURE (SSSE3),
+		       "HAS_CPU_FEATURE (SSSE3)");
+  fails += check_proc ("popcnt", HAS_CPU_FEATURE (POPCOUNT),
+		       "HAS_CPU_FEATURE (POPCOUNT)");
 
   printf ("%d differences between /proc/cpuinfo and glibc code.\n", fails);
 
diff --git a/sysdeps/x86_64/multiarch/wcscpy.S b/sysdeps/x86_64/multiarch/wcscpy.S
index ff2f5a7..c47c51c 100644
--- a/sysdeps/x86_64/multiarch/wcscpy.S
+++ b/sysdeps/x86_64/multiarch/wcscpy.S
@@ -27,11 +27,8 @@
 	.text
 ENTRY(wcscpy)
 	.type	wcscpy, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features(%rip)
-	jne	1f
-	call	__init_cpu_features
-
-1:	testl	$bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+	HAS_CPU_FEATURE (SSSE3)
 	jnz	2f
 	leaq	__wcscpy_sse2(%rip), %rax
 	ret
diff --git a/sysdeps/x86_64/multiarch/wmemcmp.S b/sysdeps/x86_64/multiarch/wmemcmp.S
index 109e245..62215f4 100644
--- a/sysdeps/x86_64/multiarch/wmemcmp.S
+++ b/sysdeps/x86_64/multiarch/wmemcmp.S
@@ -26,16 +26,13 @@
 	.text
 ENTRY(wmemcmp)
 	.type	wmemcmp, @gnu_indirect_function
-	cmpl	$0, KIND_OFFSET+__cpu_features(%rip)
-	jne	1f
-	call	__init_cpu_features
-
-1:	testl	$bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip)
+	LOAD_RTLD_GLOBAL_RO_RDX
+	HAS_CPU_FEATURE (SSSE3)
 	jnz	2f
 	leaq	__wmemcmp_sse2(%rip), %rax
 	ret
 
-2:	testl	$bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+2:	HAS_CPU_FEATURE (SSE4_1)
 	jz	3f
 	leaq	__wmemcmp_sse4_1(%rip), %rax
 	ret

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=e2e4f56056adddc3c1efe676b40a4b4f2453103b

commit e2e4f56056adddc3c1efe676b40a4b4f2453103b
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Aug 13 03:37:47 2015 -0700

    Add _dl_x86_cpu_features to rtld_global
    
    This patch adds _dl_x86_cpu_features to rtld_global in the x86 ld.so
    and initializes it early, before __libc_start_main is called, so that
    cpu_features is always available when it is used and IFUNC selectors
    no longer need to call __init_cpu_features.
    
    	* sysdeps/i386/dl-machine.h: Include <cpu-features.c>.
    	(dl_platform_init): Call init_cpu_features.
    	* sysdeps/i386/dl-procinfo.c (_dl_x86_cpu_features): New.
    	* sysdeps/i386/i686/cacheinfo.c
    	(DISABLE_PREFERRED_MEMORY_INSTRUCTION): Removed.
    	* sysdeps/i386/i686/multiarch/Makefile (aux): Remove init-arch.
    	* sysdeps/i386/i686/multiarch/Versions: Removed.
    	* sysdeps/i386/i686/multiarch/ifunc-defines.sym (KIND_OFFSET):
    	Removed.
    	* sysdeps/i386/ldsodefs.h: Include <cpu-features.h>.
    	* sysdeps/unix/sysv/linux/x86/Makefile
    	(libpthread-sysdep_routines): Remove init-arch.
    	* sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c: Include
    	<sysdeps/x86_64/dl-procinfo.c> instead of
    	<sysdeps/generic/dl-procinfo.c>.
    	* sysdeps/x86/Makefile [$(subdir) == csu] (gen-as-const-headers):
    	Add cpu-features-offsets.sym and rtld-global-offsets.sym.
    	[$(subdir) == elf] (sysdep-dl-routines): Add dl-get-cpu-features.
    	[$(subdir) == elf] (tests): Add tst-get-cpu-features.
    	[$(subdir) == elf] (tests-static): Add
    	tst-get-cpu-features-static.
    	* sysdeps/x86/Versions: New file.
    	* sysdeps/x86/cpu-features-offsets.sym: Likewise.
    	* sysdeps/x86/cpu-features.c: Likewise.
    	* sysdeps/x86/cpu-features.h: Likewise.
    	* sysdeps/x86/dl-get-cpu-features.c: Likewise.
    	* sysdeps/x86/libc-start.c: Likewise.
    	* sysdeps/x86/rtld-global-offsets.sym: Likewise.
    	* sysdeps/x86/tst-get-cpu-features-static.c: Likewise.
    	* sysdeps/x86/tst-get-cpu-features.c: Likewise.
    	* sysdeps/x86_64/dl-procinfo.c: Likewise.
    	* sysdeps/x86_64/cacheinfo.c (__cpuid_count): Removed.
    	Assume USE_MULTIARCH is defined and don't check it.
    	(is_intel): Replace __cpu_features with GLRO(dl_x86_cpu_features).
    	(is_amd): Likewise.
    	(max_cpuid): Likewise.
    	(intel_check_word): Likewise.
    	(__cache_sysconf): Don't call __init_cpu_features.
    	(__x86_preferred_memory_instruction): Removed.
    	(init_cacheinfo): Don't call __init_cpu_features. Replace
    	__cpu_features with GLRO(dl_x86_cpu_features).
    	* sysdeps/x86_64/dl-machine.h: Include <cpu-features.c>.
    	(dl_platform_init): Call init_cpu_features.
    	* sysdeps/x86_64/ldsodefs.h: Include <cpu-features.h>.
    	* sysdeps/x86_64/multiarch/Makefile (aux): Remove init-arch.
    	* sysdeps/x86_64/multiarch/Versions: Removed.
    	* sysdeps/x86_64/multiarch/cacheinfo.c: Likewise.
    	* sysdeps/x86_64/multiarch/init-arch.c: Likewise.
    	* sysdeps/x86_64/multiarch/ifunc-defines.sym (KIND_OFFSET):
    	Removed.
    	* sysdeps/x86_64/multiarch/init-arch.h: Rewrite.
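
The net effect is that IFUNC resolvers can simply read feature bits that
ld.so has already computed.  As a rough, stand-alone analogue of the
technique (not glibc's internal mechanism: it uses GCC's ifunc attribute
and __builtin_cpu_supports instead of _dl_x86_cpu_features and
HAS_CPU_FEATURE), the resolver below runs once during relocation and
returns the variant to bind.

/* Stand-alone sketch of IFUNC dispatch using compiler builtins; this only
   mirrors the technique, it is not the glibc-internal code.
   Build on x86-64 Linux with GCC: gcc -O2 ifunc-demo.c -o ifunc-demo  */
#include <stdio.h>

static int impl_sse41 (void) { return 41; }
static int impl_generic (void) { return 0; }

/* The resolver plays the role of the @gnu_indirect_function selectors in
   the patch: it runs early, picks an implementation from the CPU
   features, and the chosen address is bound to my_func.  */
static int (*resolve_my_func (void)) (void)
{
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("sse4.1") ? impl_sse41 : impl_generic;
}

int my_func (void) __attribute__ ((ifunc ("resolve_my_func")));

int
main (void)
{
  printf ("selected variant returns %d\n", my_func ());
  return 0;
}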

diff --git a/ChangeLog b/ChangeLog
index d056197..2775dba 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,57 @@
+2015-08-13  H.J. Lu  <hongjiu.lu@intel.com>
+
+	* sysdeps/i386/dl-machine.h: Include <cpu-features.c>.
+	(dl_platform_init): Call init_cpu_features.
+	* sysdeps/i386/dl-procinfo.c (_dl_x86_cpu_features): New.
+	* sysdeps/i386/i686/cacheinfo.c
+	(DISABLE_PREFERRED_MEMORY_INSTRUCTION): Removed.
+	* sysdeps/i386/i686/multiarch/Makefile (aux): Remove init-arch.
+	* sysdeps/i386/i686/multiarch/Versions: Removed.
+	* sysdeps/i386/i686/multiarch/ifunc-defines.sym (KIND_OFFSET):
+	Removed.
+	* sysdeps/i386/ldsodefs.h: Include <cpu-features.h>.
+	* sysdeps/unix/sysv/linux/x86/Makefile
+	(libpthread-sysdep_routines): Remove init-arch.
+	* sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c: Include
+	<sysdeps/x86_64/dl-procinfo.c> instead of
+	sysdeps/generic/dl-procinfo.c>.
+	* sysdeps/x86/Makefile [$(subdir) == csu] (gen-as-const-headers):
+	Add cpu-features-offsets.sym and rtld-global-offsets.sym.
+	[$(subdir) == elf] (sysdep-dl-routines): Add dl-get-cpu-features.
+	[$(subdir) == elf] (tests): Add tst-get-cpu-features.
+	[$(subdir) == elf] (tests-static): Add
+	tst-get-cpu-features-static.
+	* sysdeps/x86/Versions: New file.
+	* sysdeps/x86/cpu-features-offsets.sym: Likewise.
+	* sysdeps/x86/cpu-features.c: Likewise.
+	* sysdeps/x86/cpu-features.h: Likewise.
+	* sysdeps/x86/dl-get-cpu-features.c: Likewise.
+	* sysdeps/x86/libc-start.c: Likewise.
+	* sysdeps/x86/rtld-global-offsets.sym: Likewise.
+	* sysdeps/x86/tst-get-cpu-features-static.c: Likewise.
+	* sysdeps/x86/tst-get-cpu-features.c: Likewise.
+	* sysdeps/x86_64/dl-procinfo.c: Likewise.
+	* sysdeps/x86_64/cacheinfo.c (__cpuid_count): Removed.
+	Assume USE_MULTIARCH is defined and don't check it.
+	(is_intel): Replace __cpu_features with GLRO(dl_x86_cpu_features).
+	(is_amd): Likewise.
+	(max_cpuid): Likewise.
+	(intel_check_word): Likewise.
+	(__cache_sysconf): Don't call __init_cpu_features.
+	(__x86_preferred_memory_instruction): Removed.
+	(init_cacheinfo): Don't call __init_cpu_features. Replace
+	__cpu_features with GLRO(dl_x86_cpu_features).
+	* sysdeps/x86_64/dl-machine.h: <cpu-features.c>.
+	(dl_platform_init): Call init_cpu_features.
+	* sysdeps/x86_64/ldsodefs.h: Include <cpu-features.h>.
+	* sysdeps/x86_64/multiarch/Makefile (aux): Remove init-arch.
+	* sysdeps/x86_64/multiarch/Versions: Removed.
+	* sysdeps/x86_64/multiarch/cacheinfo.c: Likewise.
+	* sysdeps/x86_64/multiarch/init-arch.c: Likewise.
+	* sysdeps/x86_64/multiarch/ifunc-defines.sym (KIND_OFFSET):
+	Removed.
+	* sysdeps/x86_64/multiarch/init-arch.h: Rewrite.
+
 2015-08-12  Paul Pluzhnikov  <ppluzhnikov@google.com>
 
 	[BZ #18820]
diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
index 04f9247..4a28eb3 100644
--- a/sysdeps/i386/dl-machine.h
+++ b/sysdeps/i386/dl-machine.h
@@ -25,6 +25,7 @@
 #include <sysdep.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
+#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -235,6 +236,8 @@ dl_platform_init (void)
   if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
     /* Avoid an empty string which would disturb us.  */
     GLRO(dl_platform) = NULL;
+
+  init_cpu_features (&GLRO(dl_x86_cpu_features));
 }
 
 static inline Elf32_Addr
diff --git a/sysdeps/i386/dl-procinfo.c b/sysdeps/i386/dl-procinfo.c
index b673b3c..e95f335 100644
--- a/sysdeps/i386/dl-procinfo.c
+++ b/sysdeps/i386/dl-procinfo.c
@@ -43,6 +43,22 @@
 # define PROCINFO_CLASS
 #endif
 
+#if !IS_IN (ldconfig)
+# if !defined PROCINFO_DECL && defined SHARED
+  ._dl_x86_cpu_features
+# else
+PROCINFO_CLASS struct cpu_features _dl_x86_cpu_features
+# endif
+# ifndef PROCINFO_DECL
+= { }
+# endif
+# if !defined SHARED || defined PROCINFO_DECL
+;
+# else
+,
+# endif
+#endif
+
 #if !defined PROCINFO_DECL && defined SHARED
   ._dl_x86_cap_flags
 #else
diff --git a/sysdeps/i386/i686/cacheinfo.c b/sysdeps/i386/i686/cacheinfo.c
index 0f869df..0b50c6d 100644
--- a/sysdeps/i386/i686/cacheinfo.c
+++ b/sysdeps/i386/i686/cacheinfo.c
@@ -1,4 +1,3 @@
 #define DISABLE_PREFETCHW
-#define DISABLE_PREFERRED_MEMORY_INSTRUCTION
 
 #include <sysdeps/x86_64/cacheinfo.c>
diff --git a/sysdeps/i386/i686/multiarch/Makefile b/sysdeps/i386/i686/multiarch/Makefile
index 11ce4ba..31bfd39 100644
--- a/sysdeps/i386/i686/multiarch/Makefile
+++ b/sysdeps/i386/i686/multiarch/Makefile
@@ -1,5 +1,4 @@
 ifeq ($(subdir),csu)
-aux += init-arch
 tests += test-multiarch
 gen-as-const-headers += ifunc-defines.sym
 endif
diff --git a/sysdeps/i386/i686/multiarch/ifunc-defines.sym b/sysdeps/i386/i686/multiarch/ifunc-defines.sym
index eb1538a..96e9cfa 100644
--- a/sysdeps/i386/i686/multiarch/ifunc-defines.sym
+++ b/sysdeps/i386/i686/multiarch/ifunc-defines.sym
@@ -4,7 +4,6 @@
 --
 
 CPU_FEATURES_SIZE	sizeof (struct cpu_features)
-KIND_OFFSET		offsetof (struct cpu_features, kind)
 CPUID_OFFSET		offsetof (struct cpu_features, cpuid)
 CPUID_SIZE		sizeof (struct cpuid_registers)
 CPUID_EAX_OFFSET	offsetof (struct cpuid_registers, eax)
diff --git a/sysdeps/i386/ldsodefs.h b/sysdeps/i386/ldsodefs.h
index d80cf01..dae2d04 100644
--- a/sysdeps/i386/ldsodefs.h
+++ b/sysdeps/i386/ldsodefs.h
@@ -20,6 +20,7 @@
 #define	_I386_LDSODEFS_H	1
 
 #include <elf.h>
+#include <cpu-features.h>
 
 struct La_i86_regs;
 struct La_i86_retval;
diff --git a/sysdeps/unix/sysv/linux/x86/Makefile b/sysdeps/unix/sysv/linux/x86/Makefile
index d6be472..9e6ec44 100644
--- a/sysdeps/unix/sysv/linux/x86/Makefile
+++ b/sysdeps/unix/sysv/linux/x86/Makefile
@@ -15,7 +15,6 @@ sysdep_headers += sys/elf.h sys/perm.h sys/reg.h sys/vm86.h sys/debugreg.h sys/i
 endif
 
 ifeq ($(subdir),nptl)
-libpthread-sysdep_routines += init-arch
 libpthread-sysdep_routines += elision-lock elision-unlock elision-timed \
 			      elision-trylock
 endif
diff --git a/sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c b/sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c
index 8ac351e..a3c0c19 100644
--- a/sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c
+++ b/sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c
@@ -1,5 +1,5 @@
 #if IS_IN (ldconfig)
 # include <sysdeps/i386/dl-procinfo.c>
 #else
-# include <sysdeps/generic/dl-procinfo.c>
+# include <sysdeps/x86_64/dl-procinfo.c>
 #endif
diff --git a/sysdeps/x86/Makefile b/sysdeps/x86/Makefile
index 19f5eca..c262fdf 100644
--- a/sysdeps/x86/Makefile
+++ b/sysdeps/x86/Makefile
@@ -8,3 +8,14 @@ $(objpfx)tst-ld-sse-use.out: ../sysdeps/x86/tst-ld-sse-use.sh $(objpfx)ld.so
 	$(BASH) $< $(objpfx) '$(NM)' '$(OBJDUMP)' '$(READELF)' > $@; \
 	$(evaluate-test)
 endif
+
+ifeq ($(subdir),csu)
+gen-as-const-headers += cpu-features-offsets.sym rtld-global-offsets.sym
+endif
+
+ifeq ($(subdir),elf)
+sysdep-dl-routines += dl-get-cpu-features
+
+tests += tst-get-cpu-features
+tests-static += tst-get-cpu-features-static
+endif
diff --git a/sysdeps/i386/i686/multiarch/Versions b/sysdeps/x86/Versions
similarity index 87%
rename from sysdeps/i386/i686/multiarch/Versions
rename to sysdeps/x86/Versions
index 59b185a..e029237 100644
--- a/sysdeps/i386/i686/multiarch/Versions
+++ b/sysdeps/x86/Versions
@@ -1,4 +1,4 @@
-libc {
+ld {
   GLIBC_PRIVATE {
     __get_cpu_features;
   }
diff --git a/sysdeps/x86/cpu-features-offsets.sym b/sysdeps/x86/cpu-features-offsets.sym
new file mode 100644
index 0000000..a9d53d1
--- /dev/null
+++ b/sysdeps/x86/cpu-features-offsets.sym
@@ -0,0 +1,7 @@
+#define SHARED 1
+
+#include <ldsodefs.h>
+
+#define rtld_global_ro_offsetof(mem) offsetof (struct rtld_global_ro, mem)
+
+RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET rtld_global_ro_offsetof (_dl_x86_cpu_features)
diff --git a/sysdeps/x86_64/multiarch/init-arch.c b/sysdeps/x86/cpu-features.c
similarity index 65%
rename from sysdeps/x86_64/multiarch/init-arch.c
rename to sysdeps/x86/cpu-features.c
index aaad5fa..587080c 100644
--- a/sysdeps/x86_64/multiarch/init-arch.c
+++ b/sysdeps/x86/cpu-features.c
@@ -1,7 +1,6 @@
 /* Initialize CPU feature data.
    This file is part of the GNU C Library.
    Copyright (C) 2008-2015 Free Software Foundation, Inc.
-   Contributed by Ulrich Drepper <drepper@redhat.com>.
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -17,48 +16,40 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include <atomic.h>
 #include <cpuid.h>
-#include "init-arch.h"
+#include <cpu-features.h>
 
-
-struct cpu_features __cpu_features attribute_hidden;
-
-
-static void
-get_common_indeces (unsigned int *family, unsigned int *model)
+static inline void
+get_common_indeces (struct cpu_features *cpu_features,
+		    unsigned int *family, unsigned int *model)
 {
-  __cpuid (1, __cpu_features.cpuid[COMMON_CPUID_INDEX_1].eax,
-	   __cpu_features.cpuid[COMMON_CPUID_INDEX_1].ebx,
-	   __cpu_features.cpuid[COMMON_CPUID_INDEX_1].ecx,
-	   __cpu_features.cpuid[COMMON_CPUID_INDEX_1].edx);
-
-  unsigned int eax = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].eax;
+  unsigned int eax;
+  __cpuid (1, eax, cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx,
+	   cpu_features->cpuid[COMMON_CPUID_INDEX_1].ecx,
+	   cpu_features->cpuid[COMMON_CPUID_INDEX_1].edx);
+  GLRO(dl_x86_cpu_features).cpuid[COMMON_CPUID_INDEX_1].eax = eax;
   *family = (eax >> 8) & 0x0f;
   *model = (eax >> 4) & 0x0f;
 }
 
-
-void
-__init_cpu_features (void)
+static inline void
+init_cpu_features (struct cpu_features *cpu_features)
 {
-  unsigned int ebx;
-  unsigned int ecx;
-  unsigned int edx;
+  unsigned int ebx, ecx, edx;
   unsigned int family = 0;
   unsigned int model = 0;
   enum cpu_features_kind kind;
 
-  __cpuid (0, __cpu_features.max_cpuid, ebx, ecx, edx);
+  __cpuid (0, cpu_features->max_cpuid, ebx, ecx, edx);
 
   /* This spells out "GenuineIntel".  */
   if (ebx == 0x756e6547 && ecx == 0x6c65746e && edx == 0x49656e69)
     {
       kind = arch_kind_intel;
 
-      get_common_indeces (&family, &model);
+      get_common_indeces (cpu_features, &family, &model);
 
-      unsigned int eax = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].eax;
+      unsigned int eax = cpu_features->cpuid[COMMON_CPUID_INDEX_1].eax;
       unsigned int extended_family = (eax >> 20) & 0xff;
       unsigned int extended_model = (eax >> 12) & 0xf0;
       if (family == 0x0f)
@@ -68,14 +59,14 @@ __init_cpu_features (void)
 	}
       else if (family == 0x06)
 	{
-	  ecx = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].ecx;
+	  ecx = cpu_features->cpuid[COMMON_CPUID_INDEX_1].ecx;
 	  model += extended_model;
 	  switch (model)
 	    {
 	    case 0x1c:
 	    case 0x26:
 	      /* BSF is slow on Atom.  */
-	      __cpu_features.feature[index_Slow_BSF] |= bit_Slow_BSF;
+	      cpu_features->feature[index_Slow_BSF] |= bit_Slow_BSF;
 	      break;
 
 	    case 0x37:
@@ -91,7 +82,7 @@ __init_cpu_features (void)
 #if index_Fast_Unaligned_Load != index_Slow_SSE4_2
 # error index_Fast_Unaligned_Load != index_Slow_SSE4_2
 #endif
-	      __cpu_features.feature[index_Fast_Unaligned_Load]
+	      cpu_features->feature[index_Fast_Unaligned_Load]
 		|= (bit_Fast_Unaligned_Load
 		    | bit_Prefer_PMINUB_for_stringop
 		    | bit_Slow_SSE4_2);
@@ -121,7 +112,7 @@ __init_cpu_features (void)
 #if index_Fast_Rep_String != index_Prefer_PMINUB_for_stringop
 # error index_Fast_Rep_String != index_Prefer_PMINUB_for_stringop
 #endif
-	      __cpu_features.feature[index_Fast_Rep_String]
+	      cpu_features->feature[index_Fast_Rep_String]
 		|= (bit_Fast_Rep_String
 		    | bit_Fast_Copy_Backward
 		    | bit_Fast_Unaligned_Load
@@ -135,31 +126,31 @@ __init_cpu_features (void)
     {
       kind = arch_kind_amd;
 
-      get_common_indeces (&family, &model);
+      get_common_indeces (cpu_features, &family, &model);
 
-      ecx = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].ecx;
+      ecx = cpu_features->cpuid[COMMON_CPUID_INDEX_1].ecx;
 
       unsigned int eax;
       __cpuid (0x80000000, eax, ebx, ecx, edx);
       if (eax >= 0x80000001)
 	__cpuid (0x80000001,
-		 __cpu_features.cpuid[COMMON_CPUID_INDEX_80000001].eax,
-		 __cpu_features.cpuid[COMMON_CPUID_INDEX_80000001].ebx,
-		 __cpu_features.cpuid[COMMON_CPUID_INDEX_80000001].ecx,
-		 __cpu_features.cpuid[COMMON_CPUID_INDEX_80000001].edx);
+		 cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].eax,
+		 cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].ebx,
+		 cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].ecx,
+		 cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].edx);
     }
   else
     kind = arch_kind_other;
 
-  if (__cpu_features.max_cpuid >= 7)
+  if (cpu_features->max_cpuid >= 7)
     __cpuid_count (7, 0,
-		   __cpu_features.cpuid[COMMON_CPUID_INDEX_7].eax,
-		   __cpu_features.cpuid[COMMON_CPUID_INDEX_7].ebx,
-		   __cpu_features.cpuid[COMMON_CPUID_INDEX_7].ecx,
-		   __cpu_features.cpuid[COMMON_CPUID_INDEX_7].edx);
+		   cpu_features->cpuid[COMMON_CPUID_INDEX_7].eax,
+		   cpu_features->cpuid[COMMON_CPUID_INDEX_7].ebx,
+		   cpu_features->cpuid[COMMON_CPUID_INDEX_7].ecx,
+		   cpu_features->cpuid[COMMON_CPUID_INDEX_7].edx);
 
   /* Can we call xgetbv?  */
-  if (CPUID_OSXSAVE)
+  if (HAS_CPU_FEATURE (OSXSAVE))
     {
       unsigned int xcrlow;
       unsigned int xcrhigh;
@@ -169,15 +160,15 @@ __init_cpu_features (void)
 	  (bit_YMM_state | bit_XMM_state))
 	{
 	  /* Determine if AVX is usable.  */
-	  if (CPUID_AVX)
-	    __cpu_features.feature[index_AVX_Usable] |= bit_AVX_Usable;
+	  if (HAS_CPU_FEATURE (AVX))
+	    cpu_features->feature[index_AVX_Usable] |= bit_AVX_Usable;
 #if index_AVX2_Usable != index_AVX_Fast_Unaligned_Load
 # error index_AVX2_Usable != index_AVX_Fast_Unaligned_Load
 #endif
 	  /* Determine if AVX2 is usable.  Unaligned load with 256-bit
 	     AVX registers are faster on processors with AVX2.  */
-	  if (CPUID_AVX2)
-	    __cpu_features.feature[index_AVX2_Usable]
+	  if (HAS_CPU_FEATURE (AVX2))
+	    cpu_features->feature[index_AVX2_Usable]
 	      |= bit_AVX2_Usable | bit_AVX_Fast_Unaligned_Load;
 	  /* Check if OPMASK state, upper 256-bit of ZMM0-ZMM15 and
 	     ZMM16-ZMM31 state are enabled.  */
@@ -186,38 +177,26 @@ __init_cpu_features (void)
 	      (bit_Opmask_state | bit_ZMM0_15_state | bit_ZMM16_31_state))
 	    {
 	      /* Determine if AVX512F is usable.  */
-	      if (CPUID_AVX512F)
+	      if (HAS_CPU_FEATURE (AVX512F))
 		{
-		  __cpu_features.feature[index_AVX512F_Usable]
+		  cpu_features->feature[index_AVX512F_Usable]
 		    |= bit_AVX512F_Usable;
 		  /* Determine if AVX512DQ is usable.  */
-		  if (CPUID_AVX512DQ)
-		    __cpu_features.feature[index_AVX512DQ_Usable]
+		  if (HAS_CPU_FEATURE (AVX512DQ))
+		    cpu_features->feature[index_AVX512DQ_Usable]
 		      |= bit_AVX512DQ_Usable;
 		}
 	    }
 	  /* Determine if FMA is usable.  */
-	  if (CPUID_FMA)
-	    __cpu_features.feature[index_FMA_Usable] |= bit_FMA_Usable;
+	  if (HAS_CPU_FEATURE (FMA))
+	    cpu_features->feature[index_FMA_Usable] |= bit_FMA_Usable;
 	  /* Determine if FMA4 is usable.  */
-	  if (CPUID_FMA4)
-	    __cpu_features.feature[index_FMA4_Usable] |= bit_FMA4_Usable;
+	  if (HAS_CPU_FEATURE (FMA4))
+	    cpu_features->feature[index_FMA4_Usable] |= bit_FMA4_Usable;
 	}
     }
 
-  __cpu_features.family = family;
-  __cpu_features.model = model;
-  atomic_write_barrier ();
-  __cpu_features.kind = kind;
-}
-
-#undef __get_cpu_features
-
-const struct cpu_features *
-__get_cpu_features (void)
-{
-  if (__cpu_features.kind == arch_kind_unknown)
-    __init_cpu_features ();
-
-  return &__cpu_features;
+  cpu_features->family = family;
+  cpu_features->model = model;
+  cpu_features->kind = kind;
 }
diff --git a/sysdeps/x86_64/multiarch/init-arch.h b/sysdeps/x86/cpu-features.h
similarity index 59%
copy from sysdeps/x86_64/multiarch/init-arch.h
copy to sysdeps/x86/cpu-features.h
index cfc6e70..22e5abb 100644
--- a/sysdeps/x86_64/multiarch/init-arch.h
+++ b/sysdeps/x86/cpu-features.h
@@ -15,6 +15,9 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#ifndef cpu_features_h
+#define cpu_features_h
+
 #define bit_Fast_Rep_String		(1 << 0)
 #define bit_Fast_Copy_Backward		(1 << 1)
 #define bit_Slow_BSF			(1 << 2)
@@ -56,14 +59,15 @@
 #define bit_ZMM16_31_state	(1 << 7)
 
 /* The integer bit array index for the first set of internal feature bits.  */
-# define FEATURE_INDEX_1 0
+#define FEATURE_INDEX_1 0
 
 /* The current maximum size of the feature integer bit array.  */
-# define FEATURE_INDEX_MAX 1
+#define FEATURE_INDEX_MAX 1
 
 #ifdef	__ASSEMBLER__
 
 # include <ifunc-defines.h>
+# include <rtld-global-offsets.h>
 
 # define index_SSE2	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_EDX_OFFSET
 # define index_SSSE3	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
@@ -86,9 +90,59 @@
 # define index_AVX512F_Usable		FEATURE_INDEX_1*FEATURE_SIZE
 # define index_AVX512DQ_Usable		FEATURE_INDEX_1*FEATURE_SIZE
 
-#else	/* __ASSEMBLER__ */
+# if defined (_LIBC) && !IS_IN (nonlib)
+#  ifdef __x86_64__
+#   ifdef SHARED
+#    if IS_IN (rtld)
+#     define LOAD_RTLD_GLOBAL_RO_RDX
+#     define HAS_FEATURE(offset, name) \
+  testl $(bit_##name), _rtld_local_ro+offset+(index_##name)(%rip)
+#    else
+#      define LOAD_RTLD_GLOBAL_RO_RDX \
+  mov _rtld_global_ro@GOTPCREL(%rip), %RDX_LP
+#     define HAS_FEATURE(offset, name) \
+  testl $(bit_##name), \
+	RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+offset+(index_##name)(%rdx)
+#    endif
+#   else /* SHARED */
+#    define LOAD_RTLD_GLOBAL_RO_RDX
+#    define HAS_FEATURE(offset, name) \
+  testl $(bit_##name), _dl_x86_cpu_features+offset+(index_##name)(%rip)
+#   endif /* !SHARED */
+#  else  /* __x86_64__ */
+#   ifdef SHARED
+#    define LOAD_FUNC_GOT_EAX(func) \
+  leal func@GOTOFF(%edx), %eax
+#    if IS_IN (rtld)
+#    define LOAD_GOT_AND_RTLD_GLOBAL_RO \
+  LOAD_PIC_REG(dx)
+#     define HAS_FEATURE(offset, name) \
+  testl $(bit_##name), offset+(index_##name)+_rtld_local_ro@GOTOFF(%edx)
+#    else
+#     define LOAD_GOT_AND_RTLD_GLOBAL_RO \
+  LOAD_PIC_REG(dx); \
+  mov _rtld_global_ro@GOT(%edx), %ecx
+#     define HAS_FEATURE(offset, name) \
+  testl $(bit_##name), \
+	RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+offset+(index_##name)(%ecx)
+#    endif
+#   else  /* SHARED */
+#    define LOAD_FUNC_GOT_EAX(func) \
+  leal func, %eax
+#    define LOAD_GOT_AND_RTLD_GLOBAL_RO
+#    define HAS_FEATURE(offset, name) \
+  testl $(bit_##name), _dl_x86_cpu_features+offset+(index_##name)
+#   endif /* !SHARED */
+#  endif /* !__x86_64__ */
+# else /* _LIBC && !nonlib */
+#  error "Sorry, <cpu-features.h> is unimplemented for assembler"
+# endif /* !_LIBC || nonlib */
+
+/* HAS_* evaluates to true if we may use the feature at runtime.  */
+# define HAS_CPU_FEATURE(name)	HAS_FEATURE (CPUID_OFFSET, name)
+# define HAS_ARCH_FEATURE(name) HAS_FEATURE (FEATURE_OFFSET, name)
 
-# include <sys/param.h>
+#else	/* __ASSEMBLER__ */
 
 enum
   {
@@ -99,7 +153,7 @@ enum
     COMMON_CPUID_INDEX_MAX
   };
 
-extern struct cpu_features
+struct cpu_features
 {
   enum cpu_features_kind
     {
@@ -119,60 +173,53 @@ extern struct cpu_features
   unsigned int family;
   unsigned int model;
   unsigned int feature[FEATURE_INDEX_MAX];
-} __cpu_features attribute_hidden;
-
+};
 
-extern void __init_cpu_features (void) attribute_hidden;
-# define INIT_ARCH() \
-  do							\
-    if (__cpu_features.kind == arch_kind_unknown)	\
-      __init_cpu_features ();				\
-  while (0)
-
-/* Used from outside libc.so to get access to the CPU features structure.  */
+/* Used from outside of glibc to get access to the CPU features
+   structure.  */
 extern const struct cpu_features *__get_cpu_features (void)
      __attribute__ ((const));
 
-# if IS_IN (libc)
-#  define __get_cpu_features()	(&__cpu_features)
+# if defined (_LIBC) && !IS_IN (nonlib)
+/* Unused for x86.  */
+#  define INIT_ARCH()
+#  define __get_cpu_features()	(&GLRO(dl_x86_cpu_features))
 # endif
 
-# define HAS_CPU_FEATURE(idx, reg, bit) \
-  ((__get_cpu_features ()->cpuid[idx].reg & (bit)) != 0)
-
-/* Following are the feature tests used throughout libc.  */
-
-/* CPUID_* evaluates to true if the feature flag is enabled.
-   We always use &__cpu_features because the HAS_CPUID_* macros
-   are called only within __init_cpu_features, where we can't
-   call __get_cpu_features without infinite recursion.  */
-# define HAS_CPUID_FLAG(idx, reg, bit) \
-  (((&__cpu_features)->cpuid[idx].reg & (bit)) != 0)
-
-# define CPUID_OSXSAVE \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_1, ecx, bit_OSXSAVE)
-# define CPUID_AVX \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_1, ecx, bit_AVX)
-# define CPUID_FMA \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_1, ecx, bit_FMA)
-# define CPUID_FMA4 \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_80000001, ecx, bit_FMA4)
-# define CPUID_RTM \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_7, ebx, bit_RTM)
-# define CPUID_AVX2 \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_7, ebx, bit_AVX2)
-# define CPUID_AVX512F \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_7, ebx, bit_AVX512F)
-# define CPUID_AVX512DQ \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_7, ebx, bit_AVX512DQ)
 
 /* HAS_* evaluates to true if we may use the feature at runtime.  */
-# define HAS_SSE2	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, edx, bit_SSE2)
-# define HAS_POPCOUNT	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, ecx, bit_POPCOUNT)
-# define HAS_SSSE3	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, ecx, bit_SSSE3)
-# define HAS_SSE4_1	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, ecx, bit_SSE4_1)
-# define HAS_SSE4_2	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, ecx, bit_SSE4_2)
-# define HAS_RTM	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_7, ebx, bit_RTM)
+# define HAS_CPU_FEATURE(name) \
+  ((__get_cpu_features ()->cpuid[index_##name].reg_##name & (bit_##name)) != 0)
+# define HAS_ARCH_FEATURE(name) \
+  ((__get_cpu_features ()->feature[index_##name] & (bit_##name)) != 0)
+
+# define index_SSE2		COMMON_CPUID_INDEX_1
+# define index_SSSE3		COMMON_CPUID_INDEX_1
+# define index_SSE4_1		COMMON_CPUID_INDEX_1
+# define index_SSE4_2		COMMON_CPUID_INDEX_1
+# define index_AVX		COMMON_CPUID_INDEX_1
+# define index_AVX2		COMMON_CPUID_INDEX_7
+# define index_AVX512F		COMMON_CPUID_INDEX_7
+# define index_AVX512DQ		COMMON_CPUID_INDEX_7
+# define index_RTM		COMMON_CPUID_INDEX_7
+# define index_FMA		COMMON_CPUID_INDEX_1
+# define index_FMA4		COMMON_CPUID_INDEX_80000001
+# define index_POPCOUNT		COMMON_CPUID_INDEX_1
+# define index_OSXSAVE		COMMON_CPUID_INDEX_1
+
+# define reg_SSE2		edx
+# define reg_SSSE3		ecx
+# define reg_SSE4_1		ecx
+# define reg_SSE4_2		ecx
+# define reg_AVX		ecx
+# define reg_AVX2		ebx
+# define reg_AVX512F		ebx
+# define reg_AVX512DQ		ebx
+# define reg_RTM		ebx
+# define reg_FMA		ecx
+# define reg_FMA4		ecx
+# define reg_POPCOUNT		ecx
+# define reg_OSXSAVE		ecx
 
 # define index_Fast_Rep_String		FEATURE_INDEX_1
 # define index_Fast_Copy_Backward	FEATURE_INDEX_1
@@ -188,19 +235,6 @@ extern const struct cpu_features *__get_cpu_features (void)
 # define index_AVX512F_Usable		FEATURE_INDEX_1
 # define index_AVX512DQ_Usable		FEATURE_INDEX_1
 
-# define HAS_ARCH_FEATURE(name) \
-  ((__get_cpu_features ()->feature[index_##name] & (bit_##name)) != 0)
+#endif	/* !__ASSEMBLER__ */
 
-# define HAS_FAST_REP_STRING		HAS_ARCH_FEATURE (Fast_Rep_String)
-# define HAS_FAST_COPY_BACKWARD		HAS_ARCH_FEATURE (Fast_Copy_Backward)
-# define HAS_SLOW_BSF			HAS_ARCH_FEATURE (Slow_BSF)
-# define HAS_FAST_UNALIGNED_LOAD	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
-# define HAS_AVX			HAS_ARCH_FEATURE (AVX_Usable)
-# define HAS_AVX2			HAS_ARCH_FEATURE (AVX2_Usable)
-# define HAS_AVX512F			HAS_ARCH_FEATURE (AVX512F_Usable)
-# define HAS_AVX512DQ			HAS_ARCH_FEATURE (AVX512DQ_Usable)
-# define HAS_FMA			HAS_ARCH_FEATURE (FMA_Usable)
-# define HAS_FMA4			HAS_ARCH_FEATURE (FMA4_Usable)
-# define HAS_AVX_FAST_UNALIGNED_LOAD	HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
-
-#endif	/* __ASSEMBLER__ */
+#endif  /* cpu_features_h */
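
The C-level HAS_CPU_FEATURE and HAS_ARCH_FEATURE macros above resolve a
feature name purely by token pasting with the index_* and reg_* tables.  For
example, given index_AVX2 = COMMON_CPUID_INDEX_7 and reg_AVX2 = ebx,
HAS_CPU_FEATURE (AVX2) essentially expands to (expansion sketch, not code
from the patch):

    /* Test the saved EBX word of CPUID leaf 7 for the AVX2 bit.  */
    ((__get_cpu_features ()->cpuid[COMMON_CPUID_INDEX_7].ebx & (bit_AVX2)) != 0)

and HAS_ARCH_FEATURE (AVX2_Usable) likewise tests
feature[FEATURE_INDEX_1] & bit_AVX2_Usable.
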
diff --git a/sysdeps/x86/dl-get-cpu-features.c b/sysdeps/x86/dl-get-cpu-features.c
new file mode 100644
index 0000000..080e5e8
--- /dev/null
+++ b/sysdeps/x86/dl-get-cpu-features.c
@@ -0,0 +1,27 @@
+/* This file is part of the GNU C Library.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#include <ldsodefs.h>
+
+#undef __get_cpu_features
+
+const struct cpu_features *
+__get_cpu_features (void)
+{
+  return &GLRO(dl_x86_cpu_features);
+}
diff --git a/sysdeps/i386/ldsodefs.h b/sysdeps/x86/libc-start.c
similarity index 50%
copy from sysdeps/i386/ldsodefs.h
copy to sysdeps/x86/libc-start.c
index d80cf01..9f0c045 100644
--- a/sysdeps/i386/ldsodefs.h
+++ b/sysdeps/x86/libc-start.c
@@ -1,5 +1,4 @@
-/* Run-time dynamic linker data structures for loaded ELF shared objects.
-   Copyright (C) 1995-2015 Free Software Foundation, Inc.
+/* Copyright (C) 2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -16,25 +15,27 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#ifndef	_I386_LDSODEFS_H
-#define	_I386_LDSODEFS_H	1
-
-#include <elf.h>
-
-struct La_i86_regs;
-struct La_i86_retval;
-
-#define ARCH_PLTENTER_MEMBERS						\
-    Elf32_Addr (*i86_gnu_pltenter) (Elf32_Sym *, unsigned int, uintptr_t *, \
-				    uintptr_t *, struct La_i86_regs *,	\
-				    unsigned int *, const char *name,	\
-				    long int *framesizep)
-
-#define ARCH_PLTEXIT_MEMBERS						\
-    unsigned int (*i86_gnu_pltexit) (Elf32_Sym *, unsigned int, uintptr_t *, \
-				     uintptr_t *, const struct La_i86_regs *, \
-				     struct La_i86_retval *, const char *)
-
-#include_next <ldsodefs.h>
-
+#ifdef SHARED
+# include <csu/libc-start.c>
+# else
+/* The main work is done in the generic function.  */
+# define LIBC_START_DISABLE_INLINE
+# define LIBC_START_MAIN generic_start_main
+# include <csu/libc-start.c>
+# include <cpu-features.h>
+# include <cpu-features.c>
+
+extern struct cpu_features _dl_x86_cpu_features;
+
+int
+__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
+		   int argc, char **argv,
+		   __typeof (main) init,
+		   void (*fini) (void),
+		   void (*rtld_fini) (void), void *stack_end)
+{
+  init_cpu_features (&_dl_x86_cpu_features);
+  return generic_start_main (main, argc, argv, init, fini, rtld_fini,
+			     stack_end);
+}
 #endif
diff --git a/sysdeps/x86/rtld-global-offsets.sym b/sysdeps/x86/rtld-global-offsets.sym
new file mode 100644
index 0000000..a9d53d1
--- /dev/null
+++ b/sysdeps/x86/rtld-global-offsets.sym
@@ -0,0 +1,7 @@
+#define SHARED 1
+
+#include <ldsodefs.h>
+
+#define rtld_global_ro_offsetof(mem) offsetof (struct rtld_global_ro, mem)
+
+RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET rtld_global_ro_offsetof (_dl_x86_cpu_features)
diff --git a/sysdeps/x86/tst-get-cpu-features-static.c b/sysdeps/x86/tst-get-cpu-features-static.c
new file mode 100644
index 0000000..03f5906
--- /dev/null
+++ b/sysdeps/x86/tst-get-cpu-features-static.c
@@ -0,0 +1 @@
+#include "tst-get-cpu-features.c"
diff --git a/sysdeps/i386/ldsodefs.h b/sysdeps/x86/tst-get-cpu-features.c
similarity index 50%
copy from sysdeps/i386/ldsodefs.h
copy to sysdeps/x86/tst-get-cpu-features.c
index d80cf01..c17060f 100644
--- a/sysdeps/i386/ldsodefs.h
+++ b/sysdeps/x86/tst-get-cpu-features.c
@@ -1,5 +1,5 @@
-/* Run-time dynamic linker data structures for loaded ELF shared objects.
-   Copyright (C) 1995-2015 Free Software Foundation, Inc.
+/* Test case for x86 __get_cpu_features interface
+   Copyright (C) 2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -16,25 +16,16 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#ifndef	_I386_LDSODEFS_H
-#define	_I386_LDSODEFS_H	1
+#include <stdlib.h>
+#include <cpu-features.h>
 
-#include <elf.h>
+static int
+do_test (void)
+{
+  if (__get_cpu_features ()->kind == arch_kind_unknown)
+    abort ();
+  return 0;
+}
 
-struct La_i86_regs;
-struct La_i86_retval;
-
-#define ARCH_PLTENTER_MEMBERS						\
-    Elf32_Addr (*i86_gnu_pltenter) (Elf32_Sym *, unsigned int, uintptr_t *, \
-				    uintptr_t *, struct La_i86_regs *,	\
-				    unsigned int *, const char *name,	\
-				    long int *framesizep)
-
-#define ARCH_PLTEXIT_MEMBERS						\
-    unsigned int (*i86_gnu_pltexit) (Elf32_Sym *, unsigned int, uintptr_t *, \
-				     uintptr_t *, const struct La_i86_regs *, \
-				     struct La_i86_retval *, const char *)
-
-#include_next <ldsodefs.h>
-
-#endif
+#define TEST_FUNCTION do_test ()
+#include "../../test-skeleton.c"
diff --git a/sysdeps/x86_64/cacheinfo.c b/sysdeps/x86_64/cacheinfo.c
index b99fb9a..0ff5309 100644
--- a/sysdeps/x86_64/cacheinfo.c
+++ b/sysdeps/x86_64/cacheinfo.c
@@ -21,40 +21,11 @@
 #include <stdlib.h>
 #include <unistd.h>
 #include <cpuid.h>
+#include "multiarch/init-arch.h"
 
-#ifndef __cpuid_count
-/* FIXME: Provide __cpuid_count if it isn't defined.  Copied from gcc
-   4.4.0.  Remove this if gcc 4.4 is the minimum requirement.  */
-# if defined(__i386__) && defined(__PIC__)
-/* %ebx may be the PIC register.  */
-#  define __cpuid_count(level, count, a, b, c, d)		\
-  __asm__ ("xchg{l}\t{%%}ebx, %1\n\t"			\
-	   "cpuid\n\t"					\
-	   "xchg{l}\t{%%}ebx, %1\n\t"			\
-	   : "=a" (a), "=r" (b), "=c" (c), "=d" (d)	\
-	   : "0" (level), "2" (count))
-# else
-#  define __cpuid_count(level, count, a, b, c, d)		\
-  __asm__ ("cpuid\n\t"					\
-	   : "=a" (a), "=b" (b), "=c" (c), "=d" (d)	\
-	   : "0" (level), "2" (count))
-# endif
-#endif
-
-#ifdef USE_MULTIARCH
-# include "multiarch/init-arch.h"
-
-# define is_intel __cpu_features.kind == arch_kind_intel
-# define is_amd __cpu_features.kind == arch_kind_amd
-# define max_cpuid __cpu_features.max_cpuid
-#else
-  /* This spells out "GenuineIntel".  */
-# define is_intel \
-  ebx == 0x756e6547 && ecx == 0x6c65746e && edx == 0x49656e69
-  /* This spells out "AuthenticAMD".  */
-# define is_amd \
-  ebx == 0x68747541 && ecx == 0x444d4163 && edx == 0x69746e65
-#endif
+#define is_intel GLRO(dl_x86_cpu_features).kind == arch_kind_intel
+#define is_amd GLRO(dl_x86_cpu_features).kind == arch_kind_amd
+#define max_cpuid GLRO(dl_x86_cpu_features).max_cpuid
 
 static const struct intel_02_cache_info
 {
@@ -235,21 +206,8 @@ intel_check_word (int name, unsigned int value, bool *has_level_2,
 	      /* Intel reused this value.  For family 15, model 6 it
 		 specifies the 3rd level cache.  Otherwise the 2nd
 		 level cache.  */
-	      unsigned int family;
-	      unsigned int model;
-#ifdef USE_MULTIARCH
-	      family = __cpu_features.family;
-	      model = __cpu_features.model;
-#else
-	      unsigned int eax;
-	      unsigned int ebx;
-	      unsigned int ecx;
-	      unsigned int edx;
-	      __cpuid (1, eax, ebx, ecx, edx);
-
-	      family = ((eax >> 20) & 0xff) + ((eax >> 8) & 0xf);
-	      model = (((eax >>16) & 0xf) << 4) + ((eax >> 4) & 0xf);
-#endif
+	      unsigned int family = GLRO(dl_x86_cpu_features).family;
+	      unsigned int model = GLRO(dl_x86_cpu_features).model;
 
 	      if (family == 15 && model == 6)
 		{
@@ -476,18 +434,6 @@ long int
 attribute_hidden
 __cache_sysconf (int name)
 {
-#ifdef USE_MULTIARCH
-  if (__cpu_features.kind == arch_kind_unknown)
-    __init_cpu_features ();
-#else
-  /* Find out what brand of processor.  */
-  unsigned int max_cpuid;
-  unsigned int ebx;
-  unsigned int ecx;
-  unsigned int edx;
-  __cpuid (0, max_cpuid, ebx, ecx, edx);
-#endif
-
   if (is_intel)
     return handle_intel (name, max_cpuid);
 
@@ -523,18 +469,6 @@ long int __x86_raw_shared_cache_size attribute_hidden = 1024 * 1024;
 int __x86_prefetchw attribute_hidden;
 #endif
 
-#ifndef DISABLE_PREFERRED_MEMORY_INSTRUCTION
-/* Instructions preferred for memory and string routines.
-
-  0: Regular instructions
-  1: MMX instructions
-  2: SSE2 instructions
-  3: SSSE3 instructions
-
-  */
-int __x86_preferred_memory_instruction attribute_hidden;
-#endif
-
 
 static void
 __attribute__((constructor))
@@ -551,14 +485,6 @@ init_cacheinfo (void)
   unsigned int level;
   unsigned int threads = 0;
 
-#ifdef USE_MULTIARCH
-  if (__cpu_features.kind == arch_kind_unknown)
-    __init_cpu_features ();
-#else
-  int max_cpuid;
-  __cpuid (0, max_cpuid, ebx, ecx, edx);
-#endif
-
   if (is_intel)
     {
       data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, max_cpuid);
@@ -574,34 +500,13 @@ init_cacheinfo (void)
 	  shared = handle_intel (_SC_LEVEL2_CACHE_SIZE, max_cpuid);
 	}
 
-      unsigned int ebx_1;
-
-#ifdef USE_MULTIARCH
-      eax = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].eax;
-      ebx_1 = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].ebx;
-      ecx = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].ecx;
-      edx = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].edx;
-#else
-      __cpuid (1, eax, ebx_1, ecx, edx);
-#endif
-
-      unsigned int family = (eax >> 8) & 0x0f;
-      unsigned int model = (eax >> 4) & 0x0f;
-      unsigned int extended_model = (eax >> 12) & 0xf0;
-
-#ifndef DISABLE_PREFERRED_MEMORY_INSTRUCTION
-      /* Intel prefers SSSE3 instructions for memory/string routines
-	 if they are available.  */
-      if ((ecx & 0x200))
-	__x86_preferred_memory_instruction = 3;
-      else
-	__x86_preferred_memory_instruction = 2;
-#endif
-
       /* Figure out the number of logical threads that share the
 	 highest cache level.  */
       if (max_cpuid >= 4)
 	{
+	  unsigned int family = GLRO(dl_x86_cpu_features).family;
+	  unsigned int model = GLRO(dl_x86_cpu_features).model;
+
 	  int i = 0;
 
 	  /* Query until desired cache level is enumerated.  */
@@ -653,7 +558,6 @@ init_cacheinfo (void)
 	  threads += 1;
 	  if (threads > 2 && level == 2 && family == 6)
 	    {
-	      model += extended_model;
 	      switch (model)
 		{
 		case 0x57:
@@ -676,7 +580,9 @@ init_cacheinfo (void)
 	intel_bug_no_cache_info:
 	  /* Assume that all logical threads share the highest cache level.  */
 
-	  threads = (ebx_1 >> 16) & 0xff;
+	  threads
+	    = ((GLRO(dl_x86_cpu_features).cpuid[COMMON_CPUID_INDEX_1].ebx
+		>> 16) & 0xff);
 	}
 
       /* Cap usage of highest cache level to the number of supported
@@ -691,25 +597,6 @@ init_cacheinfo (void)
       long int core = handle_amd (_SC_LEVEL2_CACHE_SIZE);
       shared = handle_amd (_SC_LEVEL3_CACHE_SIZE);
 
-#ifndef DISABLE_PREFERRED_MEMORY_INSTRUCTION
-# ifdef USE_MULTIARCH
-      eax = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].eax;
-      ebx = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].ebx;
-      ecx = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].ecx;
-      edx = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].edx;
-# else
-      __cpuid (1, eax, ebx, ecx, edx);
-# endif
-
-      /* AMD prefers SSSE3 instructions for memory/string routines
-	 if they are avaiable, otherwise it prefers integer
-	 instructions.  */
-      if ((ecx & 0x200))
-	__x86_preferred_memory_instruction = 3;
-      else
-	__x86_preferred_memory_instruction = 0;
-#endif
-
       /* Get maximum extended function. */
       __cpuid (0x80000000, max_cpuid_ex, ebx, ecx, edx);
 
diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index cae6db3..d22359d 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -26,6 +26,7 @@
 #include <sysdep.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
+#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -205,6 +206,8 @@ dl_platform_init (void)
   if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
     /* Avoid an empty string which would disturb us.  */
     GLRO(dl_platform) = NULL;
+
+  init_cpu_features (&GLRO(dl_x86_cpu_features));
 }
 
 static inline ElfW(Addr)
diff --git a/sysdeps/i386/dl-procinfo.c b/sysdeps/x86_64/dl-procinfo.c
similarity index 61%
copy from sysdeps/i386/dl-procinfo.c
copy to sysdeps/x86_64/dl-procinfo.c
index b673b3c..851681a 100644
--- a/sysdeps/i386/dl-procinfo.c
+++ b/sysdeps/x86_64/dl-procinfo.c
@@ -1,7 +1,6 @@
-/* Data for i386 version of processor capability information.
-   Copyright (C) 2001-2015 Free Software Foundation, Inc.
+/* Data for x86-64 version of processor capability information.
+   Copyright (C) 2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
-   Contributed by Ulrich Drepper <drepper@redhat.com>, 2001.
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -17,10 +16,7 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-/* This information must be kept in sync with the _DL_HWCAP_COUNT and
-   _DL_PLATFORM_COUNT definitions in procinfo.h.
-
-   If anything should be added here check whether the size of each string
+/* If anything should be added here check whether the size of each string
    is still ok with the given array size.
 
    All the #ifdefs in the definitions are quite irritating but
@@ -44,33 +40,12 @@
 #endif
 
 #if !defined PROCINFO_DECL && defined SHARED
-  ._dl_x86_cap_flags
-#else
-PROCINFO_CLASS const char _dl_x86_cap_flags[32][8]
-#endif
-#ifndef PROCINFO_DECL
-= {
-    "fpu", "vme", "de", "pse", "tsc", "msr", "pae", "mce",
-    "cx8", "apic", "10", "sep", "mtrr", "pge", "mca", "cmov",
-    "pat", "pse36", "pn", "clflush", "20", "dts", "acpi", "mmx",
-    "fxsr", "sse", "sse2", "ss", "ht", "tm", "ia64", "pbe"
-  }
-#endif
-#if !defined SHARED || defined PROCINFO_DECL
-;
-#else
-,
-#endif
-
-#if !defined PROCINFO_DECL && defined SHARED
-  ._dl_x86_platforms
+  ._dl_x86_cpu_features
 #else
-PROCINFO_CLASS const char _dl_x86_platforms[4][5]
+PROCINFO_CLASS struct cpu_features _dl_x86_cpu_features
 #endif
 #ifndef PROCINFO_DECL
-= {
-    "i386", "i486", "i586", "i686"
-  }
+= { }
 #endif
 #if !defined SHARED || defined PROCINFO_DECL
 ;
diff --git a/sysdeps/x86_64/ldsodefs.h b/sysdeps/x86_64/ldsodefs.h
index 84d36e8..e3f2da2 100644
--- a/sysdeps/x86_64/ldsodefs.h
+++ b/sysdeps/x86_64/ldsodefs.h
@@ -20,6 +20,7 @@
 #define	_X86_64_LDSODEFS_H	1
 
 #include <elf.h>
+#include <cpu-features.h>
 
 struct La_x86_64_regs;
 struct La_x86_64_retval;
diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index d7002a9..d10b4d4 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -1,5 +1,4 @@
 ifeq ($(subdir),csu)
-aux += init-arch
 tests += test-multiarch
 gen-as-const-headers += ifunc-defines.sym
 endif
diff --git a/sysdeps/x86_64/multiarch/Versions b/sysdeps/x86_64/multiarch/Versions
deleted file mode 100644
index 59b185a..0000000
--- a/sysdeps/x86_64/multiarch/Versions
+++ /dev/null
@@ -1,5 +0,0 @@
-libc {
-  GLIBC_PRIVATE {
-    __get_cpu_features;
-  }
-}
diff --git a/sysdeps/x86_64/multiarch/cacheinfo.c b/sysdeps/x86_64/multiarch/cacheinfo.c
deleted file mode 100644
index f87b8dc..0000000
--- a/sysdeps/x86_64/multiarch/cacheinfo.c
+++ /dev/null
@@ -1,2 +0,0 @@
-#define DISABLE_PREFERRED_MEMORY_INSTRUCTION
-#include "../cacheinfo.c"
diff --git a/sysdeps/x86_64/multiarch/ifunc-defines.sym b/sysdeps/x86_64/multiarch/ifunc-defines.sym
index a410d88..3df946f 100644
--- a/sysdeps/x86_64/multiarch/ifunc-defines.sym
+++ b/sysdeps/x86_64/multiarch/ifunc-defines.sym
@@ -4,7 +4,6 @@
 --
 
 CPU_FEATURES_SIZE	sizeof (struct cpu_features)
-KIND_OFFSET		offsetof (struct cpu_features, kind)
 CPUID_OFFSET		offsetof (struct cpu_features, cpuid)
 CPUID_SIZE		sizeof (struct cpuid_registers)
 CPUID_EAX_OFFSET	offsetof (struct cpuid_registers, eax)
diff --git a/sysdeps/x86_64/multiarch/init-arch.h b/sysdeps/x86_64/multiarch/init-arch.h
index cfc6e70..2b9988e 100644
--- a/sysdeps/x86_64/multiarch/init-arch.h
+++ b/sysdeps/x86_64/multiarch/init-arch.h
@@ -15,192 +15,8 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define bit_Fast_Rep_String		(1 << 0)
-#define bit_Fast_Copy_Backward		(1 << 1)
-#define bit_Slow_BSF			(1 << 2)
-#define bit_Fast_Unaligned_Load		(1 << 4)
-#define bit_Prefer_PMINUB_for_stringop	(1 << 5)
-#define bit_AVX_Usable			(1 << 6)
-#define bit_FMA_Usable			(1 << 7)
-#define bit_FMA4_Usable			(1 << 8)
-#define bit_Slow_SSE4_2			(1 << 9)
-#define bit_AVX2_Usable			(1 << 10)
-#define bit_AVX_Fast_Unaligned_Load	(1 << 11)
-#define bit_AVX512F_Usable		(1 << 12)
-#define bit_AVX512DQ_Usable		(1 << 13)
-
-/* CPUID Feature flags.  */
-
-/* COMMON_CPUID_INDEX_1.  */
-#define bit_SSE2	(1 << 26)
-#define bit_SSSE3	(1 << 9)
-#define bit_SSE4_1	(1 << 19)
-#define bit_SSE4_2	(1 << 20)
-#define bit_OSXSAVE	(1 << 27)
-#define bit_AVX		(1 << 28)
-#define bit_POPCOUNT	(1 << 23)
-#define bit_FMA		(1 << 12)
-#define bit_FMA4	(1 << 16)
-
-/* COMMON_CPUID_INDEX_7.  */
-#define bit_RTM		(1 << 11)
-#define bit_AVX2	(1 << 5)
-#define bit_AVX512F	(1 << 16)
-#define bit_AVX512DQ	(1 << 17)
-
-/* XCR0 Feature flags.  */
-#define bit_XMM_state  (1 << 1)
-#define bit_YMM_state  (2 << 1)
-#define bit_Opmask_state	(1 << 5)
-#define bit_ZMM0_15_state	(1 << 6)
-#define bit_ZMM16_31_state	(1 << 7)
-
-/* The integer bit array index for the first set of internal feature bits.  */
-# define FEATURE_INDEX_1 0
-
-/* The current maximum size of the feature integer bit array.  */
-# define FEATURE_INDEX_MAX 1
-
-#ifdef	__ASSEMBLER__
-
-# include <ifunc-defines.h>
-
-# define index_SSE2	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_EDX_OFFSET
-# define index_SSSE3	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
-# define index_SSE4_1	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
-# define index_SSE4_2	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
-# define index_AVX	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
-# define index_AVX2	COMMON_CPUID_INDEX_7*CPUID_SIZE+CPUID_EBX_OFFSET
-
-# define index_Fast_Rep_String		FEATURE_INDEX_1*FEATURE_SIZE
-# define index_Fast_Copy_Backward	FEATURE_INDEX_1*FEATURE_SIZE
-# define index_Slow_BSF			FEATURE_INDEX_1*FEATURE_SIZE
-# define index_Fast_Unaligned_Load	FEATURE_INDEX_1*FEATURE_SIZE
-# define index_Prefer_PMINUB_for_stringop FEATURE_INDEX_1*FEATURE_SIZE
-# define index_AVX_Usable		FEATURE_INDEX_1*FEATURE_SIZE
-# define index_FMA_Usable		FEATURE_INDEX_1*FEATURE_SIZE
-# define index_FMA4_Usable		FEATURE_INDEX_1*FEATURE_SIZE
-# define index_Slow_SSE4_2		FEATURE_INDEX_1*FEATURE_SIZE
-# define index_AVX2_Usable		FEATURE_INDEX_1*FEATURE_SIZE
-# define index_AVX_Fast_Unaligned_Load	FEATURE_INDEX_1*FEATURE_SIZE
-# define index_AVX512F_Usable		FEATURE_INDEX_1*FEATURE_SIZE
-# define index_AVX512DQ_Usable		FEATURE_INDEX_1*FEATURE_SIZE
-
-#else	/* __ASSEMBLER__ */
-
-# include <sys/param.h>
-
-enum
-  {
-    COMMON_CPUID_INDEX_1 = 0,
-    COMMON_CPUID_INDEX_7,
-    COMMON_CPUID_INDEX_80000001,	/* for AMD */
-    /* Keep the following line at the end.  */
-    COMMON_CPUID_INDEX_MAX
-  };
-
-extern struct cpu_features
-{
-  enum cpu_features_kind
-    {
-      arch_kind_unknown = 0,
-      arch_kind_intel,
-      arch_kind_amd,
-      arch_kind_other
-    } kind;
-  int max_cpuid;
-  struct cpuid_registers
-  {
-    unsigned int eax;
-    unsigned int ebx;
-    unsigned int ecx;
-    unsigned int edx;
-  } cpuid[COMMON_CPUID_INDEX_MAX];
-  unsigned int family;
-  unsigned int model;
-  unsigned int feature[FEATURE_INDEX_MAX];
-} __cpu_features attribute_hidden;
-
-
-extern void __init_cpu_features (void) attribute_hidden;
-# define INIT_ARCH() \
-  do							\
-    if (__cpu_features.kind == arch_kind_unknown)	\
-      __init_cpu_features ();				\
-  while (0)
-
-/* Used from outside libc.so to get access to the CPU features structure.  */
-extern const struct cpu_features *__get_cpu_features (void)
-     __attribute__ ((const));
-
-# if IS_IN (libc)
-#  define __get_cpu_features()	(&__cpu_features)
-# endif
-
-# define HAS_CPU_FEATURE(idx, reg, bit) \
-  ((__get_cpu_features ()->cpuid[idx].reg & (bit)) != 0)
-
-/* Following are the feature tests used throughout libc.  */
-
-/* CPUID_* evaluates to true if the feature flag is enabled.
-   We always use &__cpu_features because the HAS_CPUID_* macros
-   are called only within __init_cpu_features, where we can't
-   call __get_cpu_features without infinite recursion.  */
-# define HAS_CPUID_FLAG(idx, reg, bit) \
-  (((&__cpu_features)->cpuid[idx].reg & (bit)) != 0)
-
-# define CPUID_OSXSAVE \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_1, ecx, bit_OSXSAVE)
-# define CPUID_AVX \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_1, ecx, bit_AVX)
-# define CPUID_FMA \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_1, ecx, bit_FMA)
-# define CPUID_FMA4 \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_80000001, ecx, bit_FMA4)
-# define CPUID_RTM \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_7, ebx, bit_RTM)
-# define CPUID_AVX2 \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_7, ebx, bit_AVX2)
-# define CPUID_AVX512F \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_7, ebx, bit_AVX512F)
-# define CPUID_AVX512DQ \
-  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_7, ebx, bit_AVX512DQ)
-
-/* HAS_* evaluates to true if we may use the feature at runtime.  */
-# define HAS_SSE2	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, edx, bit_SSE2)
-# define HAS_POPCOUNT	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, ecx, bit_POPCOUNT)
-# define HAS_SSSE3	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, ecx, bit_SSSE3)
-# define HAS_SSE4_1	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, ecx, bit_SSE4_1)
-# define HAS_SSE4_2	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, ecx, bit_SSE4_2)
-# define HAS_RTM	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_7, ebx, bit_RTM)
-
-# define index_Fast_Rep_String		FEATURE_INDEX_1
-# define index_Fast_Copy_Backward	FEATURE_INDEX_1
-# define index_Slow_BSF			FEATURE_INDEX_1
-# define index_Fast_Unaligned_Load	FEATURE_INDEX_1
-# define index_Prefer_PMINUB_for_stringop FEATURE_INDEX_1
-# define index_AVX_Usable		FEATURE_INDEX_1
-# define index_FMA_Usable		FEATURE_INDEX_1
-# define index_FMA4_Usable		FEATURE_INDEX_1
-# define index_Slow_SSE4_2		FEATURE_INDEX_1
-# define index_AVX2_Usable		FEATURE_INDEX_1
-# define index_AVX_Fast_Unaligned_Load	FEATURE_INDEX_1
-# define index_AVX512F_Usable		FEATURE_INDEX_1
-# define index_AVX512DQ_Usable		FEATURE_INDEX_1
-
-# define HAS_ARCH_FEATURE(name) \
-  ((__get_cpu_features ()->feature[index_##name] & (bit_##name)) != 0)
-
-# define HAS_FAST_REP_STRING		HAS_ARCH_FEATURE (Fast_Rep_String)
-# define HAS_FAST_COPY_BACKWARD		HAS_ARCH_FEATURE (Fast_Copy_Backward)
-# define HAS_SLOW_BSF			HAS_ARCH_FEATURE (Slow_BSF)
-# define HAS_FAST_UNALIGNED_LOAD	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
-# define HAS_AVX			HAS_ARCH_FEATURE (AVX_Usable)
-# define HAS_AVX2			HAS_ARCH_FEATURE (AVX2_Usable)
-# define HAS_AVX512F			HAS_ARCH_FEATURE (AVX512F_Usable)
-# define HAS_AVX512DQ			HAS_ARCH_FEATURE (AVX512DQ_Usable)
-# define HAS_FMA			HAS_ARCH_FEATURE (FMA_Usable)
-# define HAS_FMA4			HAS_ARCH_FEATURE (FMA4_Usable)
-# define HAS_AVX_FAST_UNALIGNED_LOAD	HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
-
-#endif	/* __ASSEMBLER__ */
+#ifdef  __ASSEMBLER__
+# include <cpu-features.h>
+#else
+# include <ldsodefs.h>
+#endif

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                                          |  199 ++++++++++++++++
 math/Makefile                                      |    2 +-
 sysdeps/i386/dl-machine.h                          |    3 +
 sysdeps/i386/dl-procinfo.c                         |   16 ++
 sysdeps/i386/i686/cacheinfo.c                      |    1 -
 sysdeps/i386/i686/fpu/multiarch/e_expf.c           |    8 +-
 sysdeps/i386/i686/fpu/multiarch/s_cosf.c           |    2 +-
 sysdeps/i386/i686/fpu/multiarch/s_sincosf.c        |    3 +-
 sysdeps/i386/i686/fpu/multiarch/s_sinf.c           |    2 +-
 sysdeps/i386/i686/multiarch/Makefile               |    1 -
 sysdeps/i386/i686/multiarch/Versions               |    5 -
 sysdeps/i386/i686/multiarch/bcopy.S                |   45 +---
 sysdeps/i386/i686/multiarch/bzero.S                |   39 +---
 sysdeps/i386/i686/multiarch/ifunc-defines.sym      |    1 -
 sysdeps/i386/i686/multiarch/ifunc-impl-list.c      |  199 ++++++++++------
 sysdeps/i386/i686/multiarch/memchr.S               |   36 +---
 sysdeps/i386/i686/multiarch/memcmp.S               |   39 +---
 sysdeps/i386/i686/multiarch/memcpy.S               |   29 +--
 sysdeps/i386/i686/multiarch/memcpy_chk.S           |   29 +--
 sysdeps/i386/i686/multiarch/memmove.S              |   60 ++----
 sysdeps/i386/i686/multiarch/memmove_chk.S          |   50 +----
 sysdeps/i386/i686/multiarch/mempcpy.S              |   29 +--
 sysdeps/i386/i686/multiarch/mempcpy_chk.S          |   29 +--
 sysdeps/i386/i686/multiarch/memrchr.S              |   36 +---
 sysdeps/i386/i686/multiarch/memset.S               |   39 +---
 sysdeps/i386/i686/multiarch/memset_chk.S           |   40 +---
 sysdeps/i386/i686/multiarch/rawmemchr.S            |   36 +---
 sysdeps/i386/i686/multiarch/s_fma.c                |    3 +-
 sysdeps/i386/i686/multiarch/s_fmaf.c               |    3 +-
 sysdeps/i386/i686/multiarch/strcasecmp.S           |   43 +---
 sysdeps/i386/i686/multiarch/strcat.S               |   44 +---
 sysdeps/i386/i686/multiarch/strchr.S               |   23 +--
 sysdeps/i386/i686/multiarch/strcmp.S               |   43 +---
 sysdeps/i386/i686/multiarch/strcpy.S               |   44 +---
 sysdeps/i386/i686/multiarch/strcspn.S              |   32 +---
 sysdeps/i386/i686/multiarch/strlen.S               |   23 +--
 sysdeps/i386/i686/multiarch/strncase.S             |   43 +---
 sysdeps/i386/i686/multiarch/strnlen.S              |   19 +--
 sysdeps/i386/i686/multiarch/strrchr.S              |   23 +--
 sysdeps/i386/i686/multiarch/strspn.S               |   32 +---
 sysdeps/i386/i686/multiarch/wcschr.S               |   19 +--
 sysdeps/i386/i686/multiarch/wcscmp.S               |   19 +--
 sysdeps/i386/i686/multiarch/wcscpy.S               |   19 +--
 sysdeps/i386/i686/multiarch/wcslen.S               |   19 +--
 sysdeps/i386/i686/multiarch/wcsrchr.S              |   19 +--
 sysdeps/i386/i686/multiarch/wmemcmp.S              |   23 +--
 sysdeps/i386/ldsodefs.h                            |    1 +
 sysdeps/unix/sysv/linux/x86/Makefile               |    1 -
 sysdeps/unix/sysv/linux/x86/elision-conf.c         |    4 +-
 sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c       |    2 +-
 sysdeps/x86/Makefile                               |   11 +
 sysdeps/x86/Versions                               |    5 +
 sysdeps/x86/cpu-features-offsets.sym               |    7 +
 sysdeps/x86/cpu-features.c                         |  202 ++++++++++++++++
 sysdeps/x86/cpu-features.h                         |  240 ++++++++++++++++++++
 sysdeps/x86/dl-get-cpu-features.c                  |   27 +++
 sysdeps/x86/libc-start.c                           |   41 ++++
 sysdeps/x86/rtld-global-offsets.sym                |    7 +
 sysdeps/x86/tst-get-cpu-features-static.c          |    1 +
 sysdeps/x86/tst-get-cpu-features.c                 |   31 +++
 sysdeps/x86_64/cacheinfo.c                         |  137 +----------
 sysdeps/x86_64/dl-machine.h                        |    3 +
 sysdeps/x86_64/dl-procinfo.c                       |   57 +++++
 sysdeps/x86_64/fpu/Makefile                        |    2 +-
 sysdeps/x86_64/fpu/math-tests-arch.h               |   42 +---
 sysdeps/x86_64/fpu/multiarch/e_asin.c              |    8 +-
 sysdeps/x86_64/fpu/multiarch/e_atan2.c             |    9 +-
 sysdeps/x86_64/fpu/multiarch/e_exp.c               |    9 +-
 sysdeps/x86_64/fpu/multiarch/e_log.c               |    9 +-
 sysdeps/x86_64/fpu/multiarch/e_pow.c               |    5 +-
 sysdeps/x86_64/fpu/multiarch/s_atan.c              |    9 +-
 sysdeps/x86_64/fpu/multiarch/s_ceil.S              |    5 +-
 sysdeps/x86_64/fpu/multiarch/s_ceilf.S             |    5 +-
 sysdeps/x86_64/fpu/multiarch/s_floor.S             |    5 +-
 sysdeps/x86_64/fpu/multiarch/s_floorf.S            |    4 +-
 sysdeps/x86_64/fpu/multiarch/s_fma.c               |    9 +-
 sysdeps/x86_64/fpu/multiarch/s_fmaf.c              |    9 +-
 sysdeps/x86_64/fpu/multiarch/s_nearbyint.S         |    4 +-
 sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S        |    5 +-
 sysdeps/x86_64/fpu/multiarch/s_rint.S              |    5 +-
 sysdeps/x86_64/fpu/multiarch/s_rintf.S             |    5 +-
 sysdeps/x86_64/fpu/multiarch/s_sin.c               |   14 +-
 sysdeps/x86_64/fpu/multiarch/s_tan.c               |    9 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.S    |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.S    |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S    |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.S    |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.S    |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S    |   10 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.S    |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.S    |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S    |   10 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.S    |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.S    |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S    |   10 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S    |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S    |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S    |   10 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.S |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.S |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S |   10 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core.S  |   10 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core.S   |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core.S   |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S  |   10 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S   |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S   |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core.S  |   10 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_logf4_core.S   |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_logf8_core.S   |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core.S  |   10 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core.S   |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core.S   |    8 +-
 .../x86_64/fpu/multiarch/svml_s_sincosf16_core.S   |   10 +-
 .../x86_64/fpu/multiarch/svml_s_sincosf4_core.S    |    8 +-
 .../x86_64/fpu/multiarch/svml_s_sincosf8_core.S    |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S  |   10 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S   |    8 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S   |    6 +-
 sysdeps/x86_64/ldsodefs.h                          |    1 +
 sysdeps/x86_64/multiarch/Makefile                  |    1 -
 sysdeps/x86_64/multiarch/Versions                  |    5 -
 sysdeps/x86_64/multiarch/cacheinfo.c               |    2 -
 sysdeps/x86_64/multiarch/ifunc-defines.sym         |    1 -
 sysdeps/x86_64/multiarch/ifunc-impl-list.c         |  139 +++++++----
 sysdeps/x86_64/multiarch/init-arch.c               |  223 ------------------
 sysdeps/x86_64/multiarch/init-arch.h               |  194 +---------------
 sysdeps/x86_64/multiarch/memcmp.S                  |    9 +-
 sysdeps/x86_64/multiarch/memcpy.S                  |   12 +-
 sysdeps/x86_64/multiarch/memcpy_chk.S              |   12 +-
 sysdeps/x86_64/multiarch/memmove.c                 |    6 +-
 sysdeps/x86_64/multiarch/memmove_chk.c             |    6 +-
 sysdeps/x86_64/multiarch/mempcpy.S                 |   12 +-
 sysdeps/x86_64/multiarch/mempcpy_chk.S             |   12 +-
 sysdeps/x86_64/multiarch/memset.S                  |    8 +-
 sysdeps/x86_64/multiarch/memset_chk.S              |    8 +-
 sysdeps/x86_64/multiarch/sched_cpucount.c          |    2 +-
 sysdeps/x86_64/multiarch/strcat.S                  |   10 +-
 sysdeps/x86_64/multiarch/strchr.S                  |    8 +-
 sysdeps/x86_64/multiarch/strcmp.S                  |   42 ++---
 sysdeps/x86_64/multiarch/strcpy.S                  |   10 +-
 sysdeps/x86_64/multiarch/strcspn.S                 |    8 +-
 sysdeps/x86_64/multiarch/strspn.S                  |    8 +-
 sysdeps/x86_64/multiarch/strstr.c                  |    5 +-
 sysdeps/x86_64/multiarch/test-multiarch.c          |   18 +-
 sysdeps/x86_64/multiarch/wcscpy.S                  |    7 +-
 sysdeps/x86_64/multiarch/wmemcmp.S                 |    9 +-
 147 files changed, 1600 insertions(+), 1906 deletions(-)
 delete mode 100644 sysdeps/i386/i686/multiarch/Versions
 create mode 100644 sysdeps/x86/Versions
 create mode 100644 sysdeps/x86/cpu-features-offsets.sym
 create mode 100644 sysdeps/x86/cpu-features.c
 create mode 100644 sysdeps/x86/cpu-features.h
 create mode 100644 sysdeps/x86/dl-get-cpu-features.c
 create mode 100644 sysdeps/x86/libc-start.c
 create mode 100644 sysdeps/x86/rtld-global-offsets.sym
 create mode 100644 sysdeps/x86/tst-get-cpu-features-static.c
 create mode 100644 sysdeps/x86/tst-get-cpu-features.c
 create mode 100644 sysdeps/x86_64/dl-procinfo.c
 delete mode 100644 sysdeps/x86_64/multiarch/Versions
 delete mode 100644 sysdeps/x86_64/multiarch/cacheinfo.c
 delete mode 100644 sysdeps/x86_64/multiarch/init-arch.c


hooks/post-receive
-- 
GNU C Library master sources

