This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Enable AVX_Fast_Unaligned_Load by default for Zen
- From: Carlos O'Donell <carlos at redhat dot com>
- To: "Pawar, Amit" <Amit dot Pawar at amd dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Cc: "H.J. Lu" <hjl dot tools at gmail dot com>
- Date: Thu, 5 Jul 2018 16:09:56 -0400
- Subject: Re: Enable AVX_Fast_Unaligned_Load by default for Zen
- References: <MWHPR12MB19495CFD46E8BF180C0C45E397430@MWHPR12MB1949.namprd12.prod.outlook.com>
On 07/02/2018 05:56 AM, Pawar, Amit wrote:
> Hi all,
>
> Attached patch enables AVX_Fast_Unaligned_Load flag by default for Zen and Zen+ except Excavator case. Identifying this flag is moved to common path now.
>
> OK to commit ? if so please do it from my side.
>
> Thanks
> Amit Pawar
>
>
> 0001-Preferring-AVX_Fast_Unaligned_Load-as-default-from-Z.patch
>
>
> From d751c3ba0b98bbdc0c7363980eacb7b29657cb59 Mon Sep 17 00:00:00 2001
> From: Amit Pawar <Amit.Pawar@amd.com>
> Date: Mon, 2 Jul 2018 14:52:36 +0530
> Subject: [PATCH] Preferring AVX_Fast_Unaligned_Load as default from Zen.
>
> From Zen onwards this will be enabled. It was disabled for Excavator
> and that will remain unchanged.
>
> * sysdeps/x86/cpu-features.c (get_common_indeces):
> AVX_Fast_Unaligned_Load is enabled when AVX2 is detected.
> * sysdeps/x86/cpu-features.c (init_cpu_features):
> AVX_Fast_Unaligned_Load is disabled for Excavator core.
> ---
> sysdeps/x86/cpu-features.c | 16 +++++++++++-----
> 1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index 0fc3674..01525db 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -78,8 +78,15 @@ get_common_indeces (struct cpu_features *cpu_features,
> /* The following features depend on AVX being usable. */
> /* Determine if AVX2 is usable. */
> if (CPU_FEATURES_CPU_P (cpu_features, AVX2))
> + {
> cpu_features->feature[index_arch_AVX2_Usable]
> |= bit_arch_AVX2_Usable;
> +
> + /* Unaligned load with 256-bit AVX registers are faster on
> + * Intel/AMD processors with AVX2. */
> + cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
> + |= bit_arch_AVX_Fast_Unaligned_Load;
OK. This moves the check to the common path, so Intel and AMD checks can
use the feature.
> + }
> /* Determine if FMA is usable. */
> if (CPU_FEATURES_CPU_P (cpu_features, FMA))
> cpu_features->feature[index_arch_FMA_Usable]
> @@ -298,11 +305,6 @@ init_cpu_features (struct cpu_features *cpu_features)
> }
> }
>
> - /* Unaligned load with 256-bit AVX registers are faster on
> - Intel processors with AVX2. */
> - if (CPU_FEATURES_ARCH_P (cpu_features, AVX2_Usable))
> - cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
> - |= bit_arch_AVX_Fast_Unaligned_Load;
OK. Removes it from the Intel-only path.
>
> /* Since AVX512ER is unique to Xeon Phi, set Prefer_No_VZEROUPPER
> if AVX512ER is available. Don't use AVX512 to avoid lower CPU
> @@ -354,6 +356,10 @@ init_cpu_features (struct cpu_features *cpu_features)
> cpu_features->feature[index_arch_Fast_Unaligned_Load]
> |= (bit_arch_Fast_Unaligned_Load
> | bit_arch_Fast_Copy_Backward);
> +
> + /* Unaligned AVX loads are slower.*/
> + cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
> + &= ~bit_arch_AVX_Fast_Unaligned_Load;
This disables it for all family == 0x15.
Existing code notes Excavator is model >= 0x60 && model <= 0x7f.
If that's the case then the above code should be in a new set
of brackets.
e.g.
      /* "Excavator" */
      if (model >= 0x60 && model <= 0x7f)
        {
          cpu_features->feature[index_arch_Fast_Unaligned_Load]
            |= (bit_arch_Fast_Unaligned_Load
                | bit_arch_Fast_Copy_Backward);

          /* Unaligned AVX loads are slower.  */
          cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
            &= ~bit_arch_AVX_Fast_Unaligned_Load;
        }
Can you confirm that this is correct?
Which set of conditionals identifies Excavator?
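
To make the intended behaviour concrete, here is a minimal standalone sketch of the set-then-clear ordering discussed above. It is not glibc code: the struct, the bit constants and the sketch_* function names are all hypothetical stand-ins, and it assumes that Excavator is family 0x15 with model in 0x60..0x7f, which is exactly the point that still needs confirming.

```c
#include <stdbool.h>

/* Hypothetical bit values for illustration only; glibc's real
   index_arch_* / bit_arch_* layout is different.  */
enum
{
  SKETCH_AVX2_USABLE = 1 << 0,
  SKETCH_FAST_UNALIGNED_LOAD = 1 << 1,
  SKETCH_FAST_COPY_BACKWARD = 1 << 2,
  SKETCH_AVX_FAST_UNALIGNED_LOAD = 1 << 3
};

/* Hypothetical, simplified stand-in for struct cpu_features.  */
struct sketch_cpu
{
  bool has_avx2;
  unsigned int family;
  unsigned int model;
  unsigned int arch_bits;
};

/* Common path: any CPU with usable AVX2 gets the fast unaligned AVX
   load preference, regardless of vendor.  */
void
sketch_common_indices (struct sketch_cpu *cpu)
{
  if (cpu->has_avx2)
    cpu->arch_bits |= (SKETCH_AVX2_USABLE
                       | SKETCH_AVX_FAST_UNALIGNED_LOAD);
}

/* AMD path: clear the preference again, but only inside the model
   range assumed here to identify Excavator.  */
void
sketch_amd_tuning (struct sketch_cpu *cpu)
{
  if (cpu->family == 0x15 && cpu->model >= 0x60 && cpu->model <= 0x7f)
    {
      cpu->arch_bits |= (SKETCH_FAST_UNALIGNED_LOAD
                         | SKETCH_FAST_COPY_BACKWARD);
      /* Unaligned AVX loads are slower on this core.  */
      cpu->arch_bits &= ~(unsigned int) SKETCH_AVX_FAST_UNALIGNED_LOAD;
    }
}

/* Run both steps in order and report the final preference.  */
bool
sketch_avx_fast_unaligned_load (bool has_avx2, unsigned int family,
                                unsigned int model)
{
  struct sketch_cpu cpu = { has_avx2, family, model, 0 };
  sketch_common_indices (&cpu);
  sketch_amd_tuning (&cpu);
  return (cpu.arch_bits & SKETCH_AVX_FAST_UNALIGNED_LOAD) != 0;
}
```

With the clear placed inside the model check, a family 0x15 part outside 0x60..0x7f keeps the bit set by the common path, whereas the patch as posted would clear it for every family 0x15 model.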
> }
> }
> else
> -- 2.7.4
--
Cheers,
Carlos.