have_avx returns 1 if CPUID.01H:ECX.AVX is set. This is wrong: the AVX bit indicates that the CPU understands AVX instructions, but it can be set even if the kernel hasn't enabled YMM state.
The correct check is to also read CPUID.01H:ECX.OSXSAVE to confirm that xgetbv is enabled, and then to use xgetbv to check that both XCR0.SSE and XCR0.AVX are set.
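For illustration, here is a rough C sketch of that sequence. It is not the glibc code: it assumes GCC or Clang on x86 (for `<cpuid.h>` and inline asm), and the helper names (`cpuid_reports_avx_and_osxsave`, `xcr0_enables_ymm`, `read_xcr0_low`, `have_usable_avx`) are made up for this example. On non-x86 targets it simply reports no AVX.

```c
#include <stdint.h>

#define CPUID_ECX_OSXSAVE (1u << 27)  /* CPUID.01H:ECX.OSXSAVE */
#define CPUID_ECX_AVX     (1u << 28)  /* CPUID.01H:ECX.AVX */
#define XCR0_SSE          (1u << 1)   /* XMM state enabled by the OS */
#define XCR0_AVX          (1u << 2)   /* YMM state enabled by the OS */

/* True only if BOTH the AVX and OSXSAVE bits are set in ECX. */
static int cpuid_reports_avx_and_osxsave(uint32_t ecx)
{
    const uint32_t mask = CPUID_ECX_OSXSAVE | CPUID_ECX_AVX;
    return (ecx & mask) == mask;
}

/* True only if BOTH the SSE and AVX state bits are set in XCR0. */
static int xcr0_enables_ymm(uint32_t xcr0)
{
    const uint32_t mask = XCR0_SSE | XCR0_AVX;
    return (xcr0 & mask) == mask;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read the low 32 bits of XCR0. Only call this after OSXSAVE has been
   confirmed, because xgetbv raises #UD otherwise. */
static uint32_t read_xcr0_low(void)
{
    uint32_t eax, edx;
    __asm__ volatile ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (0));
    (void) edx;  /* high half is not needed for this check */
    return eax;
}

static int have_usable_avx(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if leaf 1 is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!cpuid_reports_avx_and_osxsave(ecx))
        return 0;
    return xcr0_enables_ymm(read_xcr0_low());
}
#else
/* Assumption for this sketch: anything that is not x86 has no AVX. */
static int have_usable_avx(void)
{
    return 0;
}
#endif
```

The ordering matters: xgetbv is only executed once the OSXSAVE bit has been seen, so a kernel that never set CR4.OSXSAVE takes the early return instead of faulting.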
This comes from an Intel blog post here:
and is borne out by the Exceptions Type 6 section of the SDM, which indicates that VEX-coded instructions will #UD if XCR0 does not have those two bits set.
This bug presumably caused this crash:
That's the generic model, but why should anyone care on Linux?
See the gmane link. There is apparently at least one person running a legacy kernel on Sandy Bridge hardware, and glibc gets SIGILL on that box.
I'm not affected myself. I suspect that only servers will ever see this, because desktop machines will probably have other problems (like no graphics).
I checked in a patch.
Which is of course wrong.
Don't reopen bugs like this.
Should it be a new bug, then? From my admittedly imperfect recollection of x86 assembly, this:
+ testl $((1 << 28) | (1 << 27)), %ecx
+ je 2f
checks whether at least one of the bits is set, not whether both are set. I think you'll get SIGILL from xgetbv if OSXSAVE isn't set. (And you'll get a different failure if YMM is enabled but AVX isn't present, in the unlikely event that such a CPU exists.)
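To make the distinction concrete, here is a small C sketch of the two tests. The function names are invented for this comment; the bit positions (27 for OSXSAVE, 28 for AVX) are the ones used in the patch. `testl $mask, %ecx` followed by `je` only skips the AVX path when the AND result is zero, which corresponds to `any_bit_set` below, whereas the check needs the behaviour of `all_bits_set`.

```c
#include <stdint.h>

#define CPUID_ECX_OSXSAVE (1u << 27)
#define CPUID_ECX_AVX     (1u << 28)
#define AVX_MASK          (CPUID_ECX_OSXSAVE | CPUID_ECX_AVX)

/* What "testl $mask, %ecx; je skip" lets through: nonzero if ANY of the
   masked bits is set. */
static int any_bit_set(uint32_t ecx)
{
    return (ecx & AVX_MASK) != 0;
}

/* What the check needs: nonzero only if ALL of the masked bits are set.
   In assembly this would be an and with the mask followed by a compare
   against the same mask. */
static int all_bits_set(uint32_t ecx)
{
    return (ecx & AVX_MASK) == AVX_MASK;
}
```

With AVX set but OSXSAVE clear (the legacy-kernel case), `any_bit_set` still passes and the code goes on to execute xgetbv, while `all_bits_set` rejects it.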