[v2 3/3] x86: Enable non-temporal memset for Hygon processors

H.J. Lu hjl.tools@gmail.com
Sat Aug 24 20:32:59 GMT 2024


On Sun, Aug 18, 2024 at 11:59 PM Feifei Wang <wangfeifei@hygon.cn> wrote:
>
> This patch uses 'Avoid_Non_Temporal_Memset' flag to access
> the non-temporal memset implementation for hygon processors.
>
> Test Results:
>
> hygon1 arch
> x86_memset_non_temporal_threshold = 8MB
> size                          new performance time / old performance time
> 1MB                           0.994
> 4MB                           0.996
> 8MB                           0.670
> 16MB                          0.343
> 32MB                          0.355
>
> hygon2 arch
> x86_memset_non_temporal_threshold = 8MB
> size                          new performance time / old performance time
> 1MB                           1
> 4MB                           1
> 8MB                           1.312
> 16MB                          0.822
> 32MB                          0.830
>
> hygon3 arch
> x86_memset_non_temporal_threshold = 8MB
> size                          new performance time / old performance time
> 1MB                           1
> 4MB                           0.990
> 8MB                           0.737
> 16MB                          0.390
> 32MB                          0.401
>
> For hygon arch with this patch, non-temporal stores can improve
> performance by 20% - 65%.
>
> Signed-off-by: Feifei Wang <wangfeifei@hygon.cn>
> Reviewed-by: Jing Li <lijing@hygon.cn>
> ---
>  sysdeps/x86/cpu-features.c | 9 +++++++--
>  sysdeps/x86/dl-cacheinfo.h | 2 +-
>  2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index e6139e2837..1f30e237f5 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -756,9 +756,9 @@ init_cpu_features (struct cpu_features *cpu_features)
>    unsigned int stepping = 0;
>    enum cpu_features_kind kind;
>
> -  /* Default is avoid non-temporal memset for non Intel/AMD hardware. This is,
> +  /* Default is avoid non-temporal memset for non Intel/AMD/Hygon hardware. This is,
>       as of writing this, we only have benchmarks indicatings it profitability
> -     on Intel/AMD.  */
> +     on Intel/AMD/Hygon.  */
>    cpu_features->preferred[index_arch_Avoid_Non_Temporal_Memset]
>        |= bit_arch_Avoid_Non_Temporal_Memset;
>
> @@ -1116,6 +1116,11 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
>        get_extended_indices (cpu_features);
>
>        update_active (cpu_features);
> +
> +      /* Benchmarks indicate non-temporal memset can be profitable on Hygon
> +       hardware.  */
> +      cpu_features->preferred[index_arch_Avoid_Non_Temporal_Memset]
> +           &= ~bit_arch_Avoid_Non_Temporal_Memset;
>      }
>    else
>      {
> diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
> index 8f4fe98d88..e9579505a3 100644
> --- a/sysdeps/x86/dl-cacheinfo.h
> +++ b/sysdeps/x86/dl-cacheinfo.h
> @@ -1071,7 +1071,7 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
>
>    /* Non-temporal stores are more performant on some hardware above
>       non_temporal_threshold. Currently Prefer_Non_Temporal is set for for both
> -     Intel and AMD hardware. */
> +     Intel, AMD and Hygon hardware. */
>    unsigned long int memset_non_temporal_threshold = SIZE_MAX;
>    if (!CPU_FEATURES_ARCH_P (cpu_features, Avoid_Non_Temporal_Memset))
>      memset_non_temporal_threshold = non_temporal_threshold;
> --
> 2.43.0
>

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.

-- 
H.J.


More information about the Libc-alpha mailing list