[PATCH 1/1] x86: Tuning NT Threshold parameter for AMD machines.

Sajan Karumanchi sajan.karumanchi@gmail.com
Wed Aug 19 10:45:39 GMT 2020


Tune the non-temporal (NT) store threshold to improve memcpy/memmove
performance on AMD CPUs.

Based on results from the Large and Walk bench variants,
setting __x86_shared_non_temporal_threshold to 2/3 of the shared cache size
improves memcpy/memmove performance on AMD machines.
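
As a rough illustration of the selection rule described above, here is a
minimal standalone sketch. The names nt_threshold, enum cpu_vendor and the
function parameters are hypothetical and are not glibc identifiers; the sketch
also assumes the shared cache size has already been determined and ignores any
rounding applied in init_cacheinfo.

```c
/* Hypothetical stand-in for the vendor check; this is not glibc's
   cpu_features structure.  */
enum cpu_vendor { VENDOR_AMD, VENDOR_OTHER };

/* Return the non-temporal store threshold in bytes.
   A nonzero user-supplied value (tunable) always wins.  Otherwise AMD
   uses 2/3 of the shared cache size, and every other vendor keeps the
   existing rule of shared * threads * 3/4.  */
unsigned long
nt_threshold (enum cpu_vendor vendor, unsigned long shared,
              unsigned int threads, unsigned long tunable)
{
  if (tunable != 0)
    return tunable;
  if (vendor == VENDOR_AMD)
    return shared * 2 / 3;
  return shared * threads * 3 / 4;
}
```

For a 12 MiB shared cache with 8 threads and no override, this sketch yields
8388608 bytes (8 MiB) for AMD, against 75497472 bytes (72 MiB) under the
generic rule.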

Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
Signed-off-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
Signed-off-by: Sajan Karumanchi <sajan.karumanchi@amd.com>
---
 sysdeps/x86/cacheinfo.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index 217c21c34f..5487f382a8 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -829,7 +829,8 @@ init_cacheinfo (void)
     }
 
   if (cpu_features->data_cache_size != 0)
-    data = cpu_features->data_cache_size;
+    if (data == 0 || cpu_features->basic.kind != arch_kind_amd)
+      data = cpu_features->data_cache_size;
 
   if (data > 0)
     {
@@ -842,7 +843,8 @@ init_cacheinfo (void)
     }
 
   if (cpu_features->shared_cache_size != 0)
-    shared = cpu_features->shared_cache_size;
+    if (shared == 0 || cpu_features->basic.kind != arch_kind_amd)
+      shared = cpu_features->shared_cache_size;
 
   if (shared > 0)
     {
@@ -854,6 +856,17 @@ init_cacheinfo (void)
       __x86_shared_cache_size = shared;
     }
 
+  if (cpu_features->basic.kind == arch_kind_amd)
+  {
+  /* The Large and Walk benchmarks in glibc show that 2/3 of the shared cache
+     size is the threshold above which non-temporal stores perform better.  */
+  __x86_shared_non_temporal_threshold
+    = (cpu_features->non_temporal_threshold != 0
+       ? cpu_features->non_temporal_threshold
+       : __x86_shared_cache_size * 2 / 3);
+  }
+  else
+  {
   /* The large memcpy micro benchmark in glibc shows that 6 times of
      shared cache size is the approximate value above which non-temporal
      store becomes faster on a 8-core processor.  This is the 3/4 of the
@@ -862,6 +875,7 @@ init_cacheinfo (void)
     = (cpu_features->non_temporal_threshold != 0
        ? cpu_features->non_temporal_threshold
        : __x86_shared_cache_size * threads * 3 / 4);
+  }
 
   /* NB: The REP MOVSB threshold must be greater than VEC_SIZE * 8.  */
   unsigned int minimum_rep_movsb_threshold;
-- 
2.17.1


