Recently a "haswell" sub-arch was introduced, analogous to the old i686 sub-arch for x86. It is documented as requiring BMI1, BMI2, LZCNT, MOVBE, POPCNT, AVX2 and FMA, but it also, undocumented, checks that the CPU is an Intel CPU before using the faster paths. Considering how similar this is to the old Intel compiler scandal, I would suggest that glibc fix this before it becomes public knowledge.
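For reference, the documented requirements map onto a handful of CPUID feature bits. The sketch below is a minimal illustration, not glibc's code. It takes raw register values in a hypothetical `cpuid_snapshot` struct and checks only the listed instruction sets. The bit positions are the architectural ones. The struct, macro, and function names are invented here. Note that the vendor is never consulted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Raw CPUID register values (hypothetical container for this sketch). */
struct cpuid_snapshot {
    uint32_t leaf1_ecx; /* CPUID.(EAX=1):ECX */
    uint32_t leaf7_ebx; /* CPUID.(EAX=7,ECX=0):EBX */
    uint32_t ext1_ecx;  /* CPUID.(EAX=80000001h):ECX */
};

#define BIT(n) (UINT32_C(1) << (n))

/* Architectural feature bit positions. */
#define L1_ECX_FMA    BIT(12)
#define L1_ECX_MOVBE  BIT(22)
#define L1_ECX_POPCNT BIT(23)
#define L7_EBX_BMI1   BIT(3)
#define L7_EBX_AVX2   BIT(5)
#define L7_EBX_BMI2   BIT(8)
#define E1_ECX_LZCNT  BIT(5) /* LZCNT, part of AMD's ABM */

static bool has_all(uint32_t reg, uint32_t mask)
{
    return (reg & mask) == mask;
}

/* True if every instruction set documented for the "haswell" directory is
   reported.  OS support for AVX state (OSXSAVE/XCR0) is a separate check
   that is not shown here. */
bool cpu_has_haswell_isa(const struct cpuid_snapshot *s)
{
    return has_all(s->leaf1_ecx, L1_ECX_FMA | L1_ECX_MOVBE | L1_ECX_POPCNT)
        && has_all(s->leaf7_ebx, L7_EBX_BMI1 | L7_EBX_AVX2 | L7_EBX_BMI2)
        && has_all(s->ext1_ecx, E1_ECX_LZCNT);
}
```

On GCC and Clang, the three register values can be filled in with `__get_cpuid` and `__get_cpuid_count` from `<cpuid.h>`. Keeping the decoding pure makes it testable with synthetic values.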
We fix performance issues as they are identified, see bug 19467 for an example. However, changes in this area require a deep understanding of CPU architecture (and, preferably, future plans, so that deployed code does not take a performance hit when switching to newer CPUs). Just because something is implemented at the instruction set level doesn't mean the implementation is efficient. For example, the first CPU generation with wider vector registers often still uses old, narrower execution units, so not using the wider registers can be more efficient.
What does that comment have to do with the fact that you are providing the optimized path to all Intel CPUs, including future ones if they implement the necessary extensions, but deliberately do not give AMD processors the same fast path? Current-generation AMD CPUs are more likely to benefit from Haswell-optimized binaries than some future Intel chip, or a mobile Intel chip.
Please also note that I am talking about how third-party libraries are picked up, since they could potentially use this new sub-arch directory 'lib64/haswell'. I am not talking about glibc's internal CPU-specific dispatching, which is completely separate and very well optimized for every architecture. The problem is one specific line of code: cpu_features.c:400.
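The difference between the current behaviour and what is being asked for can be modelled in a few lines. This is a hypothetical, heavily simplified model, not the code at cpu_features.c:400. The enum, the function names, and the boolean `haswell_isa` input (the result of the feature-bit check) are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vendor classification for this sketch only. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/* Vendor-gated policy, as described in this report: the feature check only
   matters on Intel, so an AMD CPU with identical features gets no platform. */
const char *platform_vendor_gated(enum cpu_vendor v, bool haswell_isa)
{
    if (v != VENDOR_INTEL)
        return NULL;
    return haswell_isa ? "haswell" : NULL;
}

/* Feature-based policy: only the reported instruction sets matter. */
const char *platform_feature_based(enum cpu_vendor v, bool haswell_isa)
{
    (void)v; /* vendor deliberately ignored */
    return haswell_isa ? "haswell" : NULL;
}
```

For an AMD CPU that reports all required features, the first policy returns `NULL`, so `lib64/haswell` is never searched. The second policy returns `"haswell"`.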
Okay, feel free to bring this up on libc-alpha. Let's see what the Intel and AMD maintainers think.
FWIW, I verified that with glibc 2.28, the "haswell" subdirectories are not searched on an AMD EPYC 7251 CPU.
*** Bug 24979 has been marked as a duplicate of this bug. ***
Sorry to necro an old thread, but I wanted to know if there was any more discussion around this? I patched cpu-features.c to allow arch_kind_amd CPUs to match the haswell platform definition and have been getting very good performance results with the glibc-bench benchmarks, as well as with a Haswell-optimized OpenBLAS, which in turn improved results for R and Octave (on a znver2 CPU). It seems Intel has already done most of the hard work for AMD here?
(In reply to Joey Riches from comment #7) > Sorry to necro an old thread but I wanted to know if there was any more > discussion around this? I'm still blocked until I get definitive feedback from AMD.
I'm happy to report that I've been in contact with the right people at AMD for a while. I do not know yet what the exact outcome will be (if the “haswell” directory will be used), but there will be a way to automatically load AVX2-optimized libraries on AMD CPUs as well.
I surveyed the existing code and wrote a summary: hwcaps subdirectory selection in the dynamic loader <https://sourceware.org/pipermail/libc-alpha/2020-May/113757.html>
I agree that a "generational" revision scheme would be helpful in the long run. For that information we can consult compiler databases (gcc/gcc/common/config/i386/i386-common.c + gcc/config/i386/i386.h or llvm/clang/include/clang/Basic/X86Target.def) and make some sort of table to work with. The LLVM project's version seems to already imply a generational scheme, although they appear to be taking a lot of liberties by eliding a lot of the stuff they don't use. They also don't seem to care about less-used stuff like VIA. The GCC database does include the VIA CPUs, but the part about eden-x4 not having any sort of AVX is a bit dubious. Hell, their documentation mentions it having AVX2...
Regarding VIA, the GCC onlinedocs say that eden-x4 supports AVX and AVX2, but as you have pointed out, they are not enabled in GCC's i386-common.c. I had a look at a CPUID dump from the instlatx64 project for an eden-x4 (see http://users.atw.hu/instlatx64/CentaurHauls/CentaurHauls00006FE_CNR_Isaiah_CPUID.txt ) and, if I decode it correctly, the GCC onlinedocs also miss some supported instruction sets: MOVBE, POPCNT, AES, PCLMUL, FSGSBASE, RDRND, BMI, BMI2 and F16C. This means that all instruction sets listed for Haswell should be supported except for FMA. As far as I can tell, if RDRND is also removed from the list, this is the lowest common denominator among all AVX2 CPUs (technically bdver4 supports RDRND, but Linux disables it by default due to buggy BIOS support, see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c49a0a80137c7ca7d6ced4c812c9e07a949f6f24)
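The common-denominator reasoning above is a bitwise AND over feature sets. The following is a minimal sketch with invented bit positions. It uses only the two lists discussed here: GCC's documented Haswell list, excluding the baseline SSE family, and the Eden-X4, which is that list minus FMA according to the decoded CPUID dump.

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
enum {
    F_MOVBE = 1u << 0, F_POPCNT = 1u << 1, F_AVX = 1u << 2, F_AVX2 = 1u << 3,
    F_AES = 1u << 4, F_PCLMUL = 1u << 5, F_FSGSBASE = 1u << 6, F_RDRND = 1u << 7,
    F_FMA = 1u << 8, F_BMI = 1u << 9, F_BMI2 = 1u << 10, F_F16C = 1u << 11,
};

/* Haswell as listed in the GCC documentation (non-baseline part). */
#define HASWELL_SET (F_MOVBE | F_POPCNT | F_AVX | F_AVX2 | F_AES | F_PCLMUL | \
                     F_FSGSBASE | F_RDRND | F_FMA | F_BMI | F_BMI2 | F_F16C)
/* Eden-X4 per the decoded CPUID dump: everything above except FMA. */
#define EDEN_X4_SET (HASWELL_SET & ~F_FMA)

/* Lowest common denominator of n feature sets: the AND of all of them. */
uint32_t common_features(const uint32_t *sets, int n)
{
    uint32_t acc = n > 0 ? sets[0] : 0;
    for (int i = 1; i < n; i++)
        acc &= sets[i];
    return acc;
}
```

Dropping RDRND as well, for the bdver4 case, is then just `lcd & ~F_RDRND` on the result. Adding more CPUs only means appending their masks to the array.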
> Regarding VIA, the GCC onlinedocs say that eden-x4 supports AVX and AVX2, but as you have pointed out, they are not enabled in GCC's i386-common.c. I have reported the problem to gcc as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95030. Let me forward your cpuid links.
(In reply to Mingye Wang from comment #13) > > Regarding VIA, the GCC onlinedocs say that eden-x4 supports AVX and AVX2, but as you have pointed out, they are not enabled in GCC's i386-common.c. > > I have reported the problem to gcc as > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95030. Let me forward your > cpuid links. Thanks for forwarding the information. I just realized that when I had a look at the CPUID dump, I overlooked something, as I only focused on the instruction set list for Haswell, but the eden-x4 actually supports a few more that are not in that list: PREFETCHW, RDSEED, ADX, ABM, XSAVE, and maybe also XSAVEOPT and XSAVEC, but I am not sure about the latter two.
Created attachment 12811: What I think the discussion is going for
Sorry, I didn't update the ticket. I think the consensus is not to make further changes to the existing hwcaps mechanism due to the issues mentioned in comment 10. Instead, we are focusing on a new approach: https://sourceware.org/pipermail/libc-alpha/2020-June/115250.html The psABI document has been updated: https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9a6b9396884b67c7c
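The psABI levels are cumulative feature sets. This is the kind of "generational" scheme discussed earlier, and none of the levels mentions a vendor. The sketch below picks the highest satisfied level. The feature grouping follows the psABI levels v2, v3, and v4. The bit numbering and function name are invented. A real implementation would also need to confirm OS support for AVX state rather than rely on raw CPUID bits alone.

```c
#include <stdint.h>

/* Hypothetical bit assignments; only the grouping follows the psABI. */
enum {
    /* x86-64-v2 */
    X_CX16 = 1u << 0, X_LAHF_SAHF = 1u << 1, X_POPCNT = 1u << 2,
    X_SSE3 = 1u << 3, X_SSE4_1 = 1u << 4, X_SSE4_2 = 1u << 5, X_SSSE3 = 1u << 6,
    /* x86-64-v3 */
    X_AVX = 1u << 7, X_AVX2 = 1u << 8, X_BMI1 = 1u << 9, X_BMI2 = 1u << 10,
    X_F16C = 1u << 11, X_FMA = 1u << 12, X_LZCNT = 1u << 13,
    X_MOVBE = 1u << 14, X_OSXSAVE = 1u << 15,
    /* x86-64-v4 */
    X_AVX512F = 1u << 16, X_AVX512BW = 1u << 17, X_AVX512CD = 1u << 18,
    X_AVX512DQ = 1u << 19, X_AVX512VL = 1u << 20,
};

/* Each level includes everything from the level below it. */
#define V2_REQ (X_CX16 | X_LAHF_SAHF | X_POPCNT | X_SSE3 | X_SSE4_1 | \
                X_SSE4_2 | X_SSSE3)
#define V3_REQ (V2_REQ | X_AVX | X_AVX2 | X_BMI1 | X_BMI2 | X_F16C | X_FMA | \
                X_LZCNT | X_MOVBE | X_OSXSAVE)
#define V4_REQ (V3_REQ | X_AVX512F | X_AVX512BW | X_AVX512CD | X_AVX512DQ | \
                X_AVX512VL)

/* Returns 4, 3, 2, or 1 (baseline x86-64, assumed always present). */
int x86_64_level(uint32_t have)
{
    static const uint32_t req[] = { V4_REQ, V3_REQ, V2_REQ };
    for (int i = 0; i < 3; i++)
        if ((have & req[i]) == req[i])
            return 4 - i;
    return 1;
}
```

A CPU missing any single v3 feature, for example FMA as on the Eden-X4, falls back to v2 regardless of who made it.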
This is effectively fixed in glibc 2.33 by: commit f267e1c9dd7fb8852cc32d6eafd96bbcfd5cbb2b Author: Florian Weimer <fweimer@redhat.com> Date: Fri Dec 4 09:13:43 2020 +0100 x86_64: Add glibc-hwcaps support The subdirectories match those in the x86-64 psABI: https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9a6b9396884b67c7c Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Libraries for Epyc and other current AMD CPUs need to be installed into the glibc-hwcaps/x86-64-v3 subdirectory in order to be picked up.
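The sketch below models the directory order the loader tries under one library path for a CPU at a given level. As I understand the 2.33 behaviour, it tries the supported glibc-hwcaps levels from highest to lowest, then the directory itself. Legacy hwcaps subdirectories such as `haswell` are omitted. The function name and the fixed 128-byte buffers are assumptions for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified model: fill `out` with the directories, in search order, for
   one library directory `base` on a CPU at psABI level `level` (1..4).
   Returns the number of entries written (at most `max`). */
size_t hwcaps_search_order(const char *base, int level,
                           char out[][128], size_t max)
{
    size_t n = 0;
    /* Supported glibc-hwcaps levels, most capable first; v1 has none. */
    for (int v = level; v >= 2 && n < max; v--)
        snprintf(out[n++], 128, "%s/glibc-hwcaps/x86-64-v%d", base, v);
    /* Finally the plain directory itself. */
    if (n < max)
        snprintf(out[n++], 128, "%s", base);
    return n;
}
```

For `/usr/lib64` at level 3, this yields `/usr/lib64/glibc-hwcaps/x86-64-v3`, then `/usr/lib64/glibc-hwcaps/x86-64-v2`, then `/usr/lib64`. A library installed under `x86-64-v3` is therefore found on any CPU that reaches that level, whatever the vendor.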
Will glibc still load from `/usr/lib64/haswell`, or will that functionality be stripped out? Unfortunately, x86_64-v3 includes XSAVE instructions, which excludes Haswell, so moving to x86_64-v3 will be a performance regression for our Haswell users.
(In reply to Joey Riches from comment #18) > Will glibc still load from from `/usr/lib64/haswell` or will that > functionality be stripped out? No formal decision has been made about this. > Unfortunately x86_64-v3 includes XSAVE instructions which excludes haswell > so moving to x86_64-v3 will be a performance regression for our haswell > users. XSAVE is *required* for AVX register support. If a Haswell-based system does not support this feature, the kernel has been booted with the “noxsave” option, or a hypervisor has been misconfigured.
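XSAVE is unavoidable because of how AVX usability is determined. The sketch below shows the usual check. It takes the raw CPUID leaf 1 ECX value and the XCR0 value, as read by XGETBV with ECX=0, as inputs. The function and macro names are hypothetical; the bit positions are the architectural ones. With `noxsave`, the kernel does not set CR4.OSXSAVE, so CPUID reports OSXSAVE as clear and the check fails.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_XSAVE   (UINT32_C(1) << 26)
#define CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27) /* mirrors CR4.OSXSAVE */
#define CPUID1_ECX_AVX     (UINT32_C(1) << 28)
#define XCR0_SSE_STATE     (UINT64_C(1) << 1)
#define XCR0_AVX_STATE     (UINT64_C(1) << 2)

/* AVX registers are usable only if the CPU has AVX and XSAVE, the OS has
   enabled XSAVE (OSXSAVE), and XCR0 shows that both SSE and AVX state are
   managed by the OS across context switches. */
bool avx_state_usable(uint32_t leaf1_ecx, uint64_t xcr0)
{
    const uint32_t need = CPUID1_ECX_XSAVE | CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    const uint64_t state = XCR0_SSE_STATE | XCR0_AVX_STATE;
    if ((leaf1_ecx & need) != need)
        return false;
    return (xcr0 & state) == state;
}
```

XGETBV may only be executed once OSXSAVE is confirmed. That is why the CPUID test comes first in a real probe.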
Thanks for the clarification; the confusion stemmed from the GCC docs not listing XSAVE for Haswell. https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html Thanks again for all the work on this issue.
(In reply to Joey Riches from comment #20) > Thanks for the clarification, the confusion stemmed from gcc docs not > listing xsave for haswell. > https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html Ah, right, that's one of the issues raised in the sister bug 24080.