[PATCH] Support APX zero-upper

Thu May 9 07:56:39 GMT 2024

> On 28.04.2024 12:54, Cui, Lili wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -1920,7 +1920,7 @@ static INLINE bool need_evex_encoding (const insn_template *t)
> >    return i.encoding == encoding_evex
> >  	|| i.encoding == encoding_evex512
> >  	|| (t->opcode_modifier.vex && i.encoding == encoding_egpr)
> > @@ -4285,8 +4286,9 @@ build_apx_evex_prefix (void)
> >      i.vex.bytes[3] &= ~0x08;
> >
> >    /* Encode the NDD bit of the instruction promoted from the legacy
> > -     space.  */
> > -  if (i.vex.register_specifier && i.tm.opcode_space == SPACE_EVEXMAP4)
> > +     space. ZU shares the same bit with NDD.  */
> > +  if ((i.vex.register_specifier && i.tm.opcode_space == SPACE_EVEXMAP4)
> > +      || i.tm.opcode_modifier.zu)
> >      i.vex.bytes[3] |= 0x10;
> >
> >    /* Encode the NF bit.  */
> > @@ -9204,7 +9206,7 @@ match_template (char mnem_suffix)
> >    /* APX insns acting on byte operands are WIG, yet that can't be expressed
> >       in the templates (they're also covering word/dword/qword operands).  */
> >    if (t->opcode_space == SPACE_EVEXMAP4 && !t->opcode_modifier.vexw &&
> > -      i.types[i.operands - 1].bitfield.byte)
> > +      i.types[i.operands - 1].bitfield.byte &&
> > +      !t->opcode_modifier.zu)
> 
> With a change request at the bottom this won't be needed anymore either, I
> think.
> 

That's a good idea.

> > @@ -14060,3 +14077,15 @@ JMPABS_Fixup (instr_info *ins, int bytemode, int sizeflag)
> >      return OP_IMREG (ins, bytemode, sizeflag);
> >    return OP_OFF64 (ins, bytemode, sizeflag);
> >  }
> > +
> > +static bool
> > +IMUL_Fixup (instr_info *ins, int bytemode, int sizeflag)
> > +{
> > +  /* Although imul do not support NDD, the EVEX.ND bit is used to control
> > +     whether its destination register has its upper bits zeroed when OSIZE
> > +     is 16b.  */
> > +  if (ins->vex.nd)
> > +    ins->mnemonicendp = stpcpy (ins->obuf, "imulzu");
> 
> Despite the comment this handling isn't restricted to 16-bit operand size.
> 
> > +  return OP_G (ins, bytemode, sizeflag);
> > +}
> 
> Further for SETZUcc I can't even spot how you check that EVEX.NDD=1. With
> EVEX.NDD=0 aiui this is ordinary SETcc, just EVEX-encoded.
> 

Good point. I also found other issues with "{nf} imulzu" (the {nf} was flushed), so I added a %ZU macro for them and dropped IMUL_Fixup. I also added more test cases for them.

+       case 'U':
+         if (l == 1 && (last[0] == 'Z'))
+           {
+             /* Although IMUL/SETcc does not support NDD, the EVEX.ND bit is
+                used to control whether its destination register has its upper
+                bits zeroed when OSIZE is 16b/8b.  */
+             if (ins->vex.nd)
+               {
+                 oappend (ins, "zu");
+                 /* When we print zu for the EVEX instruction, we no longer
+                    need prefix {evex}. */
+                 if (evex_printed == true && startswith (ins->obufp, "{evex}"))
+                   ins->obufp += 6;
+               }
+           }
+         else
+           abort ();
+         break;
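
For illustration only, here is a minimal standalone sketch of the behaviour the %ZU macro is meant to produce. It is not binutils code: print_mnemonic, EVEX_ND_BIT and the parameter names are hypothetical, and the only detail taken from the patch is that ND/ZU is the 0x10 bit of the fourth EVEX prefix byte. It assumes an EVEX-encoded SETcc with ND=0 is printed with a "{evex}" pseudo prefix, while ND=1 selects the "zu" form, which makes the pseudo prefix redundant.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name.  The patch ORs 0x10 into i.vex.bytes[3] for NDD/ZU.  */
#define EVEX_ND_BIT 0x10

/* Hypothetical helper (not part of binutils): build the printed mnemonic
   from a base name ("set", "imul"), an optional condition-code suffix and
   the fourth EVEX prefix byte.  */
static void
print_mnemonic (char *out, size_t outsz, const char *base, const char *cc,
		unsigned char evex_byte3, bool want_evex_prefix)
{
  bool nd = (evex_byte3 & EVEX_ND_BIT) != 0;

  /* "zu" already implies an EVEX encoding, so "{evex}" is only emitted
     when ND is clear (assumption, mirroring the %ZU handling above).  */
  snprintf (out, outsz, "%s%s%s%s",
	    (want_evex_prefix && !nd) ? "{evex} " : "",
	    base, nd ? "zu" : "", cc);
}
```

Under these assumptions, print_mnemonic (buf, sizeof buf, "set", "b", 0x10, true) yields "setzub", and the same call with byte 0x00 yields "{evex} setb", i.e. ordinary SETcc that merely happens to be EVEX-encoded.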

> > @@ -528,6 +530,7 @@ loopne, 0xe0, x64, JumpByte|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Disp8 }
> >
> >  // Set byte on flag instructions.
> >  set<cc>, 0xf9<cc:opc>/0, i386, Modrm|No_wSuf|No_lSuf|No_sSuf|No_qSuf, { Reg8|Unspecified|BaseIndex }
> > +setzu<cc>, 0xf24<cc:opc>/0, APX_F, Modrm|No_wSuf|No_lSuf|No_sSuf|No_qSuf|EVexMap4|ZU, { Reg8 }
> 
> Didn't we kind of agree to also permit
> 
> set<cc>, 0xf24<cc:opc>/0, APX_F, Modrm|No_bSuf|No_sSuf|EVexMap4|ZU,
> { Reg32|Reg64 }
> 

We discussed this internally, and the spec folks thought that adding two SETZU formats to the spec would be somewhat redundant and might confuse users. The spec will therefore not be updated, and it would be a bit strange for binutils to add a separate format on its own.

Thanks,
Lili.