[PATCH 1/3] x86: Set preferred CPU features on the KH-40000 and KX-7000 Zhaoxin processors
Florian Weimer
fweimer@redhat.com
Thu Jun 27 06:32:30 GMT 2024
> On Wed, Jun 16, 2024 7:01 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * MayShao:
>>
>> > From: MayShao <mayshao-oc@zhaoxin.com>
>> >
>> > Fix code indentation issues under the Zhaoxin branch.
>> >
>> > Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable
>> > AVX_Fast_Unaligned_Load.
>> >
>> > Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
>> > use the sse2_unaligned versions of memset, strcpy, and strcat.
>>
>> Somewhat related to that, do you have documentation of the behavior of
>> *aligned* 128-bit loads? Are they guaranteed to be atomic?
>> At least if MOVAPD, MOVAPS, MOVDQA are used?
>
> I can confirm that aligned 128-bit loads (such as MOVAPD, MOVAPS,
> MOVDQA) from WB memory regions are atomic, and unaligned 128-bit
> loads are also guaranteed to be atomic as long as they fall within a
> single cache line.
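(For reference, the kind of aligned load under discussion looks roughly
like the sketch below.  The helper name is made up for illustration; the
atomicity rests on the vendor statement above and on the Intel/AMD
statements referenced in the GCC bug, not on anything ISO C guarantees.)

  #include <emmintrin.h>   /* SSE2: _mm_load_si128 */

  /* Illustrative helper, not from the patch: a single aligned 128-bit
     load (MOVDQA, or VMOVDQA when compiled with AVX).  The pointer must
     be 16-byte aligned and refer to ordinary write-back memory.  */
  static inline __m128i
  load_128bit_aligned (const __m128i *p)
  {
    return _mm_load_si128 (p);
  }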
This is great news. Could you update this GCC bug with the information?
Bug 104688 - gcc and libatomic can use SSE for 128-bit atomic loads
on Intel and AMD CPUs with AVX
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688>
I think it means we can teach GCC to use 128-bit atomic loads
unconditionally for AVX targets (bypassing libatomic).
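To make the implication concrete (a sketch of the effect only, not the
actual GCC change): a 16-byte atomic load such as the one below is
currently expanded as a library call, and with the change above GCC
could instead emit a single aligned 128-bit vector load inline on AVX
targets.  The type and names here are purely illustrative.

  #include <stdatomic.h>
  #include <stdint.h>

  /* Hypothetical 16-byte payload, for illustration only.  */
  typedef struct { uint64_t lo, hi; } u128_pair;

  static _Atomic u128_pair shared;

  u128_pair
  read_shared (void)
  {
    /* Today this typically becomes a __atomic_load_16 call into
       libatomic.  If GCC can assume aligned 128-bit vector loads are
       atomic on AVX targets, it could emit a VMOVDQA-style load here
       instead of the call.  */
    return atomic_load_explicit (&shared, memory_order_acquire);
  }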
Thanks,
Florian