New x86-64 micro-architecture levels

Mon Jul 13 07:40:29 GMT 2020

* Richard Biener:

>> Looks good.  I like it.
>
> Likewise.  Btw, did you check that VIA family chips slot into Level A
> at least?

Those seem to lack SSE4.2, so they land in the baseline.

> Where do AMD bdverN slot in?

bdver1 to bdver3 (as defined by GCC) should land in Level B (so Level A
if that is dropped).  bdver4 and znver1 (and later) should land in
Level C.

>>  My only concerns are
>>
>> 1. Names like “x86-100”, “x86-101”, what features do they support?
>
> Indeed I didn't get the -100, -101 part.  On the GCC side I'd have
> suggested -march=generic-{A,B,C,D} implying the respective
> -mtune.

With literal A, B, C, D, or are they just placeholders?  If not literal
levels, then what we should use there?

I like the simplicity of numbers.  I used letters in the proposal to
avoid confusion if we alter the proposal by dropping or levels, shifting
the meaning of those that come later.  I expect to switch back to
numbers again for the final version.

> Do the patches end up annotating ELF binaries with the architecture
> level and does ld.so check that info?

This is a separate feature that H.J. has been working on.

> For example IIRC there's a penalty to switch between VEX and
> not VEX encoded instructions so even on AVX capable hardware
> it might be profitable to use non-AVX libraries if the program is
> using only architecture level A?

But this is impossible to know in general.  It may also be possible that
the library contains an inner loop that can be nicely vectorized with
AVX instructions, but not with SSE4.2 instructions and earlier.  Then
preferring the non-AVX version would be a mistake.

Regarding the transition penalty, I believe this is mostly addressed by
those VZEROUPPER instructions?  I've already explained why I think those
aren't a viable optimization target, given the current calling
convention.

My glibc patches already provide a way to mask subdirectories which
would otherwise be selected, so manual optimization is still possible.

> On that side, does architecture level B+ suggest using VEX encoding
> everywhere?  It would be indeed nice to have the architecture levels
> documented in the psABI.

I think this falls under optimization, and I really did not want to
discuss.

If there is a plan to change/amend the calling convention and some of
the levels should prefer to that, it's a different matter, of course.
(glibc can only give you four callee-saved 256-bit wide registers
easily, though, more would need close cooperation with GCC.)

The new glibc-hwcaps scheme in glibc scales a bit better than the old
one, so we do not have to settle this immediately and could add
additional subdirectories for objects that follow new calling convention
requirements.

>> 2. I have a library with AVX2 and FMA, which directory should it go?
>
> Eventually GCC/gas can annotate objects with the lowest architecture
> level that is applicable?

H.J. has patches for ELF program properties.  I think
GNU_PROPERTY_X86_ISA_1_NEEDED would convey this information.  This
proposal and the glibc patches are independent of that.

If that function ever gets deployed, I plan to add those notes to
ld.so.cache, so that ld.so can select shared objects based on them (or
any allocated ELF note, really).  Efficient LD_LIBRARY_PATH support is
not possible, I think, so those designated glibc-hwcaps subdirectories
still have a place.

Thanks,
Florian