[PATCH 1/1] x86: Tuning NT Threshold parameter for AMD machines.
Sajan Karumanchi
sajan.karumanchi@gmail.com
Wed Aug 19 10:45:39 GMT 2020
Tune the non-temporal (NT) threshold parameter to improve memcpy/memmove
performance on AMD CPUs.
Based on results from the Large and Walk benchmark variants, setting
__x86_shared_non_temporal_threshold to 2/3 of the shared cache size
improves memcpy/memmove performance on AMD machines.
Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
Signed-off-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
Signed-off-by: Sajan Karumanchi <sajan.karumanchi@amd.com>
---
sysdeps/x86/cacheinfo.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index 217c21c34f..5487f382a8 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -829,7 +829,8 @@ init_cacheinfo (void)
}
if (cpu_features->data_cache_size != 0)
- data = cpu_features->data_cache_size;
+ if (data == 0 || cpu_features->basic.kind != arch_kind_amd)
+ data = cpu_features->data_cache_size;
if (data > 0)
{
@@ -842,7 +843,8 @@ init_cacheinfo (void)
}
if (cpu_features->shared_cache_size != 0)
- shared = cpu_features->shared_cache_size;
+ if (shared == 0 || cpu_features->basic.kind != arch_kind_amd)
+ shared = cpu_features->shared_cache_size;
if (shared > 0)
{
@@ -854,6 +856,17 @@ init_cacheinfo (void)
__x86_shared_cache_size = shared;
}
+ if (cpu_features->basic.kind == arch_kind_amd)
+ {
+ /* The Large and Walk benchmarks in glibc show that 2/3 of the shared cache
+ size is the threshold above which non-temporal stores perform better.  */
+ __x86_shared_non_temporal_threshold
+ = (cpu_features->non_temporal_threshold != 0
+ ? cpu_features->non_temporal_threshold
+ : __x86_shared_cache_size * 2 / 3);
+ }
+ else
+ {
/* The large memcpy micro benchmark in glibc shows that 6 times of
shared cache size is the approximate value above which non-temporal
store becomes faster on a 8-core processor. This is the 3/4 of the
@@ -862,6 +875,7 @@ init_cacheinfo (void)
= (cpu_features->non_temporal_threshold != 0
? cpu_features->non_temporal_threshold
: __x86_shared_cache_size * threads * 3 / 4);
+ }
/* NB: The REP MOVSB threshold must be greater than VEC_SIZE * 8. */
unsigned int minimum_rep_movsb_threshold;
--
2.17.1