[PATCH V2] Support APX CFCMOV

Cui, Lili lili.cui@intel.com
Mon Jul 1 06:37:14 GMT 2024


> On 25.06.2024 11:56, Cui, Lili wrote:
> > Changes in V2
> > 1. Added EVEX_NF to operandconstraint to indicate setting EVEX.NF to 1.
> > 2. Refined test cases to cover all cc formats.
> > 3. Added invalid test case `{nf} cfcmovb %dx,%ax,%r31w` and reported an error for it.
> > 4. Used CFCMOV_Fixup_op instead of CFCMOV_Fixup_op0 and CFCMOV_Fixup_op1 to handle both operands as NOP_Fixup().
> 
> Just like NOP_Fixup() (and others), the new one would better be just
> CFCMOV_Fixup() (without the _op suffix).
> 

Your suggestion is better; renamed it to CFCMOV_Fixup().

> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -4426,7 +4426,9 @@ build_apx_evex_prefix (void)
> >      }
> >
> >    /* Encode the NF bit.  */
> > -  if (i.has_nf)
> > +  /* For CFCMOV, when the insn template supports EVEX_NF, it means that it
> > +     requires EVEX.NF to be 1.  */
> > +  if (i.has_nf || i.tm.opcode_modifier.operandconstraint == EVEX_NF)
> >      i.vex.bytes[3] |= 0x04;
> >  }
> 
> Especially with it now being an operand constraint, would you mind leaving
> the original comment alone? It would only be at risk of going stale when
> another EVexNF insn is added.
>

Removed the additional comment.
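So the hunk keeps the original comment and only extends the condition; assuming nothing else in build_apx_evex_prefix() changes, it ends up as:

  /* Encode the NF bit.  */
  if (i.has_nf || i.tm.opcode_modifier.operandconstraint == EVEX_NF)
    i.vex.bytes[3] |= 0x04;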

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-cfcmov.s
> > @@ -0,0 +1,74 @@
> > +# Check 64bit EVEX-promoted CMOVcc instructions
> > +
> > +	.text
> > +_start:
> > +	cfcmovb	%dx,%ax
> > +	cfcmovb	%dx,%ax,%r31w
> > +	cfcmovb	%dx,291(%r8,%rax,4)
> > +	cfcmovb	%ecx,%edx
> > +	cfcmovb	%ecx,%edx,%r10d
> > +	cfcmovb	%ecx,291(%r8,%rax,4)
> > +	cfcmovb	%r31,%r15
> > +	cfcmovb	%r31,%r15,%r11
> > +	cfcmovb	%r31,291(%r8,%rax,4)
> > +	cfcmovb	291(%r8,%rax,4),%dx
> > +	cfcmovb	291(%r8,%rax,4),%dx,%ax
> > +	cfcmovb	291(%r8,%rax,4),%ecx
> > +	cfcmovb	291(%r8,%rax,4),%ecx,%edx
> > +	cfcmovb	291(%r8,%rax,4),%r31
> > +	cfcmovb	291(%r8,%rax,4),%r31,%r15
> > +	cmovb	%dx,%ax,%r31w
> > +	cmovb	%ecx,%edx,%r10d
> > +	cmovb	%r31,%r15,%r11
> > +	cmovb	291(%r8,%rax,4),%dx,%ax
> > +	cmovb	291(%r8,%rax,4),%ecx,%edx
> > +	cmovb	291(%r8,%rax,4),%r31,%r15
> > +
> > +	.irp m, cfcmovbe, cfcmovl, cfcmovle, cfcmovnb, cfcmovnbe, cfcmovnl, cfcmovnle, cfcmovno, cfcmovnp, cfcmovns, cfcmovnz, cfcmovo, cfcmovp, cfcmovs, cfcmovz, cfcmovae, cfcmove, cfcmovne, cfcmova, cfcmovge, cfcmovg
> > +	\m	%dx,%ax
> > +	.endr
> 
> To keep line length under control and to help readability:
> 
> 	.irp cc, be, l, le, nb, nbe, nl, nle, no, np, ns, nz, o, p, s, z, ae, e, ne, a, ge, g
> 	cfcmov\cc	%dx,%ax
> 	.endr
> 
> This way it's also much easier to see that some forms are still missing:
> na, nae, c, nc, pe, and po come to mind. Not sure whether that's then an
> exhaustive list. It may also help if the list was sorted by some suitable criteria,
> such that it's easier to locate individual items.
> 
> > +	.irp m, cmovbe, cmovl, cmovle, cmovnb, cmovnbe, cmovnl, cmovnle, cmovno, cmovnp, cmovns, cmovnz, cmovo, cmovp, cmovs, cmovz, cmovae, cmove, cmovne, cmova, cmovge, cmovg
> > +	\m	%dx,%ax,%r31w
> > +	.endr
> 
> This then also makes it noticeable that you don't need two .irp-s here
> (which would always need keeping in sync):
> 
> 	.irp cc, be, l, le, nb, nbe, nl, nle, no, np, ns, nz, o, p, s, z, ae, e, ne, a, ge, g
> 	cfcmov\cc	%dx,%ax
> 	cmov\cc		%dx,%ax,%r31w
> 	.endr
> 

Your suggestion is tidier; I folded the .irp blocks and added all the cc forms in SDM order. In V2 I had only added the Jcc subset.
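For reference, the folded AT&T block now looks roughly like this (full cc list in SDM order, including the aliases you mentioned; treat the exact list as my reading of the SDM table rather than the final test file):

	.irp cc, o, no, b, c, nae, ae, nb, nc, e, z, ne, nz, be, na, a, nbe, s, ns, p, pe, np, po, l, nge, ge, nl, le, ng, g, nle
	cfcmov\cc	%dx,%ax
	cmov\cc		%dx,%ax,%r31w
	.endr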

> I'm even inclined to ask that ...
> 
> > +	cfcmovb		%dx,%ax
> > +	cfcmovb.s	%dx,%ax
> > +	{load} cfcmovb	%dx,%ax
> > +	{store} cfcmovb	%dx,%ax
> > +
> > +	.intel_syntax noprefix
> > +	cfcmovb	ax,dx
> > +	cfcmovb	r31w,ax,dx
> > +	cfcmovb	WORD PTR [r8+rax*4+291],dx
> > +	cfcmovb	edx,ecx
> > +	cfcmovb	r10d,edx,ecx
> > +	cfcmovb	DWORD PTR [r8+rax*4+291],ecx
> > +	cfcmovb	r15,r31
> > +	cfcmovb	r11,r15,r31
> > +	cfcmovb	QWORD PTR [r8+rax*4+291],r31
> > +	cfcmovb	dx,WORD PTR [r8+rax*4+291]
> > +	cfcmovb	ax,dx,WORD PTR [r8+rax*4+291]
> > +	cfcmovb	ecx,DWORD PTR [r8+rax*4+291]
> > +	cfcmovb	edx,ecx,DWORD PTR [r8+rax*4+291]
> > +	cfcmovb	r31,QWORD PTR [r8+rax*4+291]
> > +	cfcmovb	r15,r31,QWORD PTR [r8+rax*4+291]
> > +	cmovb r31w,ax,dx
> > +	cmovb r10d,edx,ecx
> > +	cmovb r11,r15,r31
> > +	cmovb ax,dx,WORD PTR [r8+rax*4+291]
> > +	cmovb edx,ecx,DWORD PTR [r8+rax*4+291]
> > +	cmovb r15,r31,QWORD PTR [r8+rax*4+291]
> > +
> > +	.irp m, cfcmovbe, cfcmovl, cfcmovle, cfcmovnb, cfcmovnbe, cfcmovnl, cfcmovnle, cfcmovno, cfcmovnp, cfcmovns, cfcmovnz, cfcmovo, cfcmovp, cfcmovs, cfcmovz, cfcmovae, cfcmove, cfcmovne, cfcmova, cfcmovge, cfcmovg
> > +	\m ax,dx
> > +	.endr
> > +
> > +	.irp m, cmovbe, cmovl, cmovle, cmovnb, cmovnbe, cmovnl, cmovnle, cmovno, cmovnp, cmovns, cmovnz, cmovo, cmovp, cmovs, cmovz, cmovae, cmove, cmovne, cmova, cmovge, cmovg
> > +	\m r31w,ax,dx
> > +	.endr
> 
> ... these be folded with the earlier ones, too.
> 

I prefer to keep the .irp blocks under the .intel_syntax noprefix part as well; that keeps the two halves of the test symmetrical.
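I.e. the Intel syntax half keeps a matching folded block, along the lines of (same caveat about the exact cc list):

	.irp cc, o, no, b, c, nae, ae, nb, nc, e, z, ne, nz, be, na, a, nbe, s, ns, p, pe, np, po, l, nge, ge, nl, le, ng, g, nle
	cfcmov\cc	ax,dx
	cmov\cc		r31w,ax,dx
	.endr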

> > @@ -14039,3 +14047,26 @@ JMPABS_Fixup (instr_info *ins, int bytemode, int sizeflag)
> >      return OP_IMREG (ins, bytemode, sizeflag);
> >    return OP_OFF64 (ins, bytemode, sizeflag);
> >  }
> > +
> > +static bool
> > +CFCMOV_Fixup_op (instr_info *ins, int opnd, int sizeflag)
> > +{
> > +  /* EVEX.NF is used as a direction bit in the 2-operand case to reverse the
> > +     source and destination operands.  */
> > +  if (!ins->vex.nd && ins->vex.nf)
> > +    {
> > +      if (opnd == 0)
> > +	return OP_E (ins, v_swap_mode, sizeflag);
> 
> There's still no testing of this use of v_swap_mode afaics.
> 

I thought adding -Msuffix to x86-64-apx-cfcmov-intel.d and the following tests were what you wanted, but it seems I'm still missing something. Could you give more details? Thanks.

[       ]*[a-f0-9]+:[   ]*62 f4 7d 08 42 c2[    ]+cfcmovb ax,dx
[       ]*[a-f0-9]+:[   ]*62 f4 7d 0c 42 d0[    ]+cfcmovb.s ax,dx
[       ]*[a-f0-9]+:[   ]*62 f4 7d 08 42 c2[    ]+cfcmovb ax,dx
[       ]*[a-f0-9]+:[   ]*62 f4 7d 0c 42 d0[    ]+cfcmovb.s ax,dx
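If I decode these correctly, the two encodings differ only in the fourth prefix byte (0x08 vs 0x0c), i.e. exactly the EVEX.NF bit (0x04) that build_apx_evex_prefix() sets, plus the ModRM byte (c2 vs d0), so reg and rm are swapped:

  62 f4 7d 08 42 c2    NF=0, ModRM.reg=ax, ModRM.rm=dx
  62 f4 7d 0c 42 d0    NF=1, ModRM.reg=dx, ModRM.rm=ax (.s form, swapped operands)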

> > +      /* These bits have been consumed and should be cleared.  */
> > +      ins->vex.nf = false;
> > +      ins->vex.mask_register_specifier = 0;
> 
> Hmm, I thought I had asked for this and ...
> 
> > +      return OP_G (ins, v_mode, sizeflag);
> > +    }
> > +
> > +  if (opnd == 0)
> > +    return OP_G (ins, v_mode, sizeflag);
> > +  /* These bits have been consumed and should be cleared.  */
> > +  ins->vex.nf = false;
> > +  ins->vex.mask_register_specifier = 0;
> 
> ... this state update to be folded; I may be misremembering though. Any
> future updating shouldn't require touching two entirely identical places in
> the same (small) function. Plus when taking the first OP_E() path, you fail
> to clear ->vex.nf right now anyway (i.e. another reason to do it once
> uniformly).
> 

I think you mean clearing vex.nf and vex.mask_register_specifier in only one place.
I checked the logic and I don't think we can fold them: we want to clear vex.nf before returning the last operand, and the last operand has two cases:

Case 1: (!ins->vex.nd && ins->vex.nf) is true, and vex.nf needs to be cleared after that check.
Case 2: ins->vex.nf is true, and vex.nf needs to be cleared. (I added a condition so it is only cleared when actually set.)

   if (opnd == 0)
     return OP_G (ins, v_mode, sizeflag);
-  /* These bits have been consumed and should be cleared.  */
-  ins->vex.nf = false;
-  ins->vex.mask_register_specifier = 0;
+  if (ins->vex.nf)
+    {
+      /* These bits have been consumed and should be cleared.  */
+      ins->vex.nf = false;
+      ins->vex.mask_register_specifier = 0;
+    }
   return OP_E (ins, v_mode, sizeflag);
 }
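Putting the pieces together, the whole function (renamed to CFCMOV_Fixup() as suggested above) would then look like this; this is a sketch of my intended V3, not yet the committed code:

static bool
CFCMOV_Fixup (instr_info *ins, int opnd, int sizeflag)
{
  /* EVEX.NF is used as a direction bit in the 2-operand case to reverse the
     source and destination operands.  */
  if (!ins->vex.nd && ins->vex.nf)
    {
      if (opnd == 0)
	return OP_E (ins, v_swap_mode, sizeflag);
      /* These bits have been consumed and should be cleared.  */
      ins->vex.nf = false;
      ins->vex.mask_register_specifier = 0;
      return OP_G (ins, v_mode, sizeflag);
    }

  if (opnd == 0)
    return OP_G (ins, v_mode, sizeflag);
  if (ins->vex.nf)
    {
      /* These bits have been consumed and should be cleared.  */
      ins->vex.nf = false;
      ins->vex.mask_register_specifier = 0;
    }
  return OP_E (ins, v_mode, sizeflag);
}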

Thanks,
Lili.

