[PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions

Jan Beulich jbeulich@suse.com
Fri Jul 9 12:16:36 GMT 2021


On 09.07.2021 13:47, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, July 5, 2021 2:30 PM
>> To: Cui, Lili <lili.cui@intel.com>; hjl.tools@gmail.com
>> Cc: binutils@sourceware.org
>> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
>>
>> On 01.07.2021 09:47, Cui,Lili wrote:
>>>  opcodes/i386-opc.tbl           | 376 +++++++++++++++++++++
>> VCVT{,T}SH2{,U}SI should have EvexWIG for their non-64bit encodings.
>> But really it's unclear why each of them has three templates when the
>> corresponding pre-existing SD and SS insns get away with two. I would have
>> expected new templates to have been cloned from similar existing ones,
>> rather than introducing new ones (with new inconsistencies). Of course
>> there's (again) the possibility that you've spotted a bug with pre-existing
>> templates, but then - if you don't want to fix those right away - I'd expect you
>> to at least point out why you deviate from what we've got.
>>
> vcvtss2si has two templates because there is a special judgment in check_long_reg function, then the instruction can encode as EVEX.W = 1 without explicit VexW1.
> 
> if (intel_syntax
>     && i.tm.opcode_modifier.toqword
>     && i.types[0].bitfield.class != RegSIMD)
>           {
>             /* Convert to QWORD.  We want REX byte. */
>             i.suffix = QWORD_MNEM_SUFFIX;
>           }
> 
> I add a special judgment in check_word_reg function, then VCVT{,T}SH2{,U}SI can also have two templates. I think that in order to reduce the number of templates and make the code less readable, this is a trade-off. I changed it anyway. Jan, what are your thoughts here?
> 
>     else if (i.types[op].bitfield.qword
>              && (i.tm.operand_types[op].bitfield.class == Reg
>                  || i.tm.operand_types[op].bitfield.instance == Accum)
>              && i.tm.operand_types[op].bitfield.qword)
>       {
>         if (intel_syntax
>             && i.tm.opcode_modifier.toqword
>             && i.types[0].bitfield.class != RegSIMD)
>           {
>             /* Convert to QWORD.  We want REX byte. */
>             i.suffix = QWORD_MNEM_SUFFIX;
>           }
>       }

I think this is the right thing to do, to keep things as symmetric / 
consistent as possible. The one part I don't understand though is
the check against Accum. I'm also not sure you really need to check
both i.types[] and i.tm.operand_types[] for qword: Doesn't this
function run after template matching, in which case you only need
to check the actual register type, not the one(s) the template
permits? And finally - but without seeing the context I may be
wrong here - as presented I'd suggest the two nested if()-s to be
folded.

Jan



More information about the Binutils mailing list