[PATCH 5/5] x86: support AVX10.1 vector size restrictions

Thu Aug 31 07:18:08 GMT 2023

On 31.08.2023 07:56, Jiang, Haochen wrote:
>>>>> I agree it isn't necessary, but as expressed before I view it as desirable.
>>>>> Apart from the sentence you quoted the spec later also says "There are
>>>>> currently no plans to support an Intel AVX10/128 implementation." For my
>>>>> choice of also supporting the 128-bit restriction I'd like to put emphasis
>>>>> on "currently". I think I said before that emulation environments (qemu,
>>>>> sde to name just two well-known examples) are free to implement such
>>>>> further restricted ISAs without then becoming out-of-spec.
>>>>>
>>>>> Plus supporting this mode right away has made me make certain adjustments
>>>>> in what I'd call a cleaner way, which I view as desirable as well.
>>>>
>>>> Since AVX10 spec doesn't specify if mask registers should be limited to
>>>> 16 bits for AVX10/128, doing it in assembler is premature.
>>>
>>> It's hard to see why they would remain wider, all the more so since they
>>> were only 16 bits in AVX512F.
>>>
>>> Plus of course nobody needs to use the options to enforce the 128-bit
>>> limit. The way I've coded it, it matches what the specification says.
>>>
>>
>> AVX10 spec only has
>>
>> Quadword opmask instructions will only be supported on processors
>> supporting vector lengths of 512 bits.
>>
>> It doesn't say anything about 32-bit masks. A 32-bit mask can be useful
>> even with 16-byte vectors.

How's that any different for 64-bit mask with 32-byte vector?
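
To spell out the arithmetic behind that question, here is a minimal sketch
(not taken from the spec; the helper name mask_bits_needed is made up for
illustration). It assumes one opmask bit per vector element, so byte
elements give the widest mask any given vector length can consume:

```python
def mask_bits_needed(vector_bits: int, element_bits: int = 8) -> int:
    """Return the number of opmask bits one vector can consume.

    Assumes one mask bit per element; byte elements (the default)
    give the largest count for a given vector length.
    """
    if vector_bits <= 0 or element_bits <= 0:
        raise ValueError("sizes must be positive")
    if vector_bits % element_bits:
        raise ValueError("vector size must be a multiple of the element size")
    return vector_bits // element_bits


if __name__ == "__main__":
    # Widest mask needed per vector length, using byte elements.
    for vl in (128, 256, 512):
        print(f"{vl}-bit vectors: {mask_bits_needed(vl)} mask bits")
```

Under that assumption a 32-bit mask exceeds what a 16-byte vector can
consume in just the same way a 64-bit mask exceeds what a 32-byte vector
can consume, and the latter is what the quoted sentence restricts to
512-bit implementations.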

> The concern from my side is that in an extreme case of register
> pressure, the compiler might need to spill a 32-bit register to a
> 32-bit mask register.

How's that any different for spilling of 64-bit registers?

> Another minor concern: if there eventually is an AVX10/128, although I
> do not see that happening, and we make the wrong choice here, it will
> take more time for a corrected assembler to reach users, by which I
> mean real OS installations.
> 
> However, I am fine either way on allowing 32-bit masks, since AVX10/128
> is nowhere in the near future and this is toy code to play with. We
> could be conservative at first by allowing only 16-bit mask registers.
> Also, the code change is quite easy, so there is not much worry about
> changing it later.

Exactly: I'd rather be overly restrictive initially (which people can
easily work around by using .arch suitably around individual insns)
than be too permissive and then fail to flag mistakes.

Jan