[PATCH 1/3] x86: Set preferred CPU features on the KH-40000 and KX-7000 Zhaoxin processors

Florian Weimer fweimer@redhat.com
Thu Jun 27 06:32:30 GMT 2024


> On Wed, Jun 16, 2024 7:01 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * MayShao:
>> 
>> > From: MayShao <mayshao-oc@zhaoxin.com>
>> >
>> > Fix code indentation issues under the Zhaoxin branch.
>> >
>> > Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable
>> > AVX_Fast_Unaligned_Load.
>> >
>> > Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
>> > use the sse2_unaligned versions of memset, strcpy, and strcat.
>> 
>> Somewhat related to that, do you have documentation of the behavior of
>> *aligned* 128-bit loads?  Are they guaranteed to be atomic?
>> At least if MOVAPD, MOVAPS, MOVDQA are used?
>
> I can confirm that aligned 128-bit loads (such as MOVAPD, MOVAPS,
> MOVDQA) from WB memory are atomic, and unaligned 128-bit loads are
> also guaranteed to be atomic as long as they stay within a single cache line.
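
(To make that concrete, a minimal sketch of the two cases using the
standard SSE2 intrinsics; the helper names and the 64-byte cache-line
size are my own illustration, not something taken from the patch.)

  #include <emmintrin.h>
  #include <stdint.h>

  /* Aligned 16-byte load (MOVDQA/VMOVDQA): per the statement above,
     atomic when the address is 16-byte aligned and the memory is WB.  */
  static inline __m128i
  load_128_aligned (const void *p)
  {
    return _mm_load_si128 ((const __m128i *) p);
  }

  /* Unaligned 16-byte load (MOVDQU/VMOVDQU): per the statement above,
     atomic only if all 16 bytes fall within one cache line.  */
  static inline int
  load_128_within_line (const void *p, __m128i *out)
  {
    if (((uintptr_t) p & 63) > 48)   /* assumes a 64-byte cache line */
      return 0;                      /* load would span two lines */
    *out = _mm_loadu_si128 ((const __m128i *) p);
    return 1;
  }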

That confirmation is great news.  Could you update this GCC bug with the information?

  Bug 104688 - gcc and libatomic can use SSE for 128-bit atomic loads
  on Intel and AMD CPUs with AVX
  <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688>

I think it means we can teach GCC to use 128-bit atomic loads
unconditionally for AVX targets (bypassing libatomic).
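
As a rough sketch of the difference that would make (the helper below
is hypothetical, not the actual GCC change): today a 16-byte
__atomic_load is typically emitted as a call into libatomic, whereas
under the guarantee above the compiler could inline a single aligned
128-bit load, roughly:

  #include <emmintrin.h>

  /* Hypothetical inlined form: a relaxed 16-byte atomic load done as
     one aligned SSE load (MOVDQA/VMOVDQA), valid only under the quoted
     guarantee for aligned 128-bit loads from WB memory.  */
  static inline __m128i
  atomic_load_16_relaxed (const void *p)
  {
    /* P must be 16-byte aligned; the aligned load faults otherwise.  */
    return _mm_load_si128 ((const __m128i *) p);
  }

(This covers loads only, in line with the bug above; I am not assuming
anything here about 128-bit stores.)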

Thanks,
Florian


