x86: Add support for Intel AMX instructions

Cui, Lili lili.cui@intel.com
Wed Jul 8 08:49:29 GMT 2020


> What about the high bit of VEX.VVVV being zero?
> 
> What about the case of there not being a SIB byte?
> 
> What about the case of any two operands being the same, which I think the
> assembler also still doesn't error on, as one can see ...
>
I added them this time.
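
For readers following the review questions above, here is a rough sketch (not
part of the patch) of how the fields of a 3-byte VEX prefix can be pulled
apart.  The struct and function names are hypothetical and do not exist in
binutils.  The validity check only mirrors the register-form conditions that
x86-64-amx-bad.s exercises for tdpbf16ps, so it is an assumption about what
counts as "bad" rather than the disassembler's actual logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 3-byte VEX prefix (0xc4, byte 1, byte 2).  R, X, B and
   vvvv are stored inverted in the encoding; they are un-inverted here.  */
struct vex3_fields
{
  unsigned int r, x, b;	/* Register extension bits, un-inverted.  */
  unsigned int map;	/* mmmmm opcode map selector.  */
  unsigned int w;	/* VEX.W.  */
  unsigned int vvvv;	/* Extra register operand number, 0..15.  */
  unsigned int l;	/* VEX.L.  */
  unsigned int pp;	/* Implied prefix: 0 none, 1 66, 2 F3, 3 F2.  */
};

/* Hypothetical helper: decode the three prefix bytes at BYTES.  */
struct vex3_fields
decode_vex3 (const uint8_t *bytes)
{
  struct vex3_fields f;

  assert (bytes[0] == 0xc4);
  f.r = !(bytes[1] & 0x80);
  f.x = !(bytes[1] & 0x40);
  f.b = !(bytes[1] & 0x20);
  f.map = bytes[1] & 0x1f;
  f.w = (bytes[2] >> 7) & 1;
  f.vvvv = ((bytes[2] >> 3) & 0xf) ^ 0xf;
  f.l = (bytes[2] >> 2) & 1;
  f.pp = bytes[2] & 3;
  return f;
}

/* Hypothetical check for a register-form tile instruction: W and L must
   be clear, and no operand may name a register above tmm7.  B only
   matters here because ModRM.rm is assumed to encode a register.  */
bool
vex3_ok_for_tile_regs (const struct vex3_fields *f)
{
  return f->w == 0 && f->l == 0 && f->r == 0 && f->b == 0 && f->vvvv < 8;
}
```

With the bytes c4 e2 52 5c dc from the tests, vvvv decodes to 5 (%tmm5) and
the check passes; with c4 e2 32 5c dc, vvvv decodes to 9, which is the "high
bit of VEX.VVVV" case asked about.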

> tmm3,tmm4,tmm5
> > +[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3
> > +[ 	]*[a-f0-9]+:[ 	]*c4 e2 73 5e c1[ 	]*tdpbssd tmm0,tmm1,tmm1
> > +[ 	]*[a-f0-9]+:[ 	]*c4 e2 73 5e c8[ 	]*tdpbssd tmm1,tmm0,tmm1
> > +[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 5e c9[ 	]*tdpbssd tmm1,tmm1,tmm0
> 
> ... here (in my earlier reply I had specifically given the comment in the
> context of the "inval" test).

Sorry, I misunderstood it before.  Below is my updated patch, thanks.
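
As a minimal standalone sketch of the rule the "inval" test now covers (all
tile register operands of tdpbssd and friends must differ), the check can be
pictured as a pairwise comparison of register numbers.  The function below is
hypothetical and works on plain integers; the patch itself compares
register_number () results for the three operands in check_VecOperands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true when no two of the COUNT register
   numbers in REGS are equal.  */
bool
tmm_regs_distinct (const unsigned int *regs, size_t count)
{
  assert (regs != NULL || count == 0);

  for (size_t i = 0; i < count; i++)
    for (size_t j = i + 1; j < count; j++)
      if (regs[i] == regs[j])
	return false;

  return true;
}
```

Operand sets such as {1, 1, 0}, {1, 0, 1} and {0, 1, 1} all fail this check,
matching the three rejected tdpbssd forms in x86-64-amx-inval.s, while
{3, 2, 1} passes.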

Subject: [PATCH] x86: Add support for Intel AMX instructions

gas/
	* doc/c-i386.texi: Document amx_int8, amx_bf16 and amx_tile.
	* config/tc-i386.c (i386_error): Add invalid_sib_address and
	invalid_tmm_register_set.
	(_i386_insn): Add has_regtmm.
	(cpu_arch): Add .amx_int8, .amx_bf16 and .amx_tile.
	(cpu_noarch): Add noamx_int8, noamx_bf16 and noamx_tile.
	(match_simd_size): Add tmmword check.
	(operand_type_match): Add tmmword.
	(type_names): Add rTMM.
	(check_VecOperands): Handle invalid_sib_address and
	invalid_tmm_register_set.
	(match_template): Handle invalid_sib_address and
	invalid_tmm_register_set.
	(build_modrm_byte): Handle non-vector SIB and tmmword.
	(i386_index_check): Disallow RegIP for non-vector SIB.
	(check_register): Handle tmmword.
	* testsuite/gas/i386/i386.exp: Add new AMX tests.
	* testsuite/gas/i386/intel-regs.d: Add tmm.
	* testsuite/gas/i386/intel-regs.s: Add tmm.
	* testsuite/gas/i386/x86-64-amx-intel.d: New.
	* testsuite/gas/i386/x86-64-amx-inval.l: New.
	* testsuite/gas/i386/x86-64-amx-inval.s: New.
	* testsuite/gas/i386/x86-64-amx.d: New.
	* testsuite/gas/i386/x86-64-amx.s: New.
	* testsuite/gas/i386/x86-64-amx-bad.d: New.
	* testsuite/gas/i386/x86-64-amx-bad.s: New.

opcodes/
	* i386-dis.c (TMM): New.
	(EXtmm): Likewise.
	(VexTmm): Likewise.
	(MVexSIBMEM): Likewise.
	(vex_sibmem_mode): Likewise.
	(tmm_mode): Likewise.
	(REG_VEX_0F3849_P_0_W_0_M_1): Likewise.
	(MOD_VEX_0F3849_P_0_W_0): Likewise.
	(MOD_VEX_0F3849_P_2_W_0): Likewise.
	(MOD_VEX_0F3849_P_3_W_0): Likewise.
	(MOD_VEX_0F384B_P_1_W_0): Likewise.
	(MOD_VEX_0F384B_P_2_W_0): Likewise.
	(MOD_VEX_0F384B_P_3_W_0): Likewise.
	(MOD_VEX_0F385C_P_1_W_0): Likewise.
	(MOD_VEX_0F385E_P_0_W_0): Likewise.
	(MOD_VEX_0F385E_P_1_W_0): Likewise.
	(MOD_VEX_0F385E_P_2_W_0): Likewise.
	(MOD_VEX_0F385E_P_3_W_0): Likewise.
	(RM_VEX_0F3849_P_0_W_0_M_1_R_0): Likewise.
	(PREFIX_VEX_0F3849): Likewise.
	(PREFIX_VEX_0F384B): Likewise.
	(PREFIX_VEX_0F385C): Likewise.
	(PREFIX_VEX_0F385E): Likewise.
	(X86_64_0F01_REG_3): Likewise.
	(X86_64_VEX_0F3849_P_0_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0): Likewise.
	(X86_64_VEX_0F3849_P_2_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F3849_P_3_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F384B_P_1_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F384B_P_2_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F384B_P_3_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385C_P_1_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_0_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_1_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_2_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_3_W_0_M_0_L_0): Likewise.
	(VEX_W_0F3849_P_0): Likewise.
	(VEX_W_0F3849_P_2): Likewise.
	(VEX_W_0F3849_P_3): Likewise.
	(VEX_W_0F384B_P_1): Likewise.
	(VEX_W_0F384B_P_2): Likewise.
	(VEX_W_0F384B_P_3): Likewise.
	(VEX_W_0F385C_P_1): Likewise.
	(VEX_W_0F385E_P_0): Likewise.
	(VEX_W_0F385E_P_1): Likewise.
	(VEX_W_0F385E_P_2): Likewise.
	(VEX_W_0F385E_P_3): Likewise.
	(VEX_LEN_0F3849_P_0_W_0_M_0): Likewise.
	(VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0): Likewise.
	(VEX_LEN_0F3849_P_2_W_0_M_0): Likewise.
	(VEX_LEN_0F3849_P_3_W_0_M_0): Likewise.
	(VEX_LEN_0F384B_P_1_W_0_M_0): Likewise.
	(VEX_LEN_0F384B_P_2_W_0_M_0): Likewise.
	(VEX_LEN_0F384B_P_3_W_0_M_0): Likewise.
	(VEX_LEN_0F385C_P_1_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_0_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_1_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_2_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_3_W_0_M_0): Likewise.
	(names_tmm): Likewise.
	(att_names_tmm): Likewise.
	(intel_operand_size): Handle void_mode.
	(OP_XMM): Handle tmm_mode.
	(OP_EX): Likewise.
	(OP_VEX): Likewise.
	* i386-gen.c (cpu_flag_init): Add entries for
	CpuAMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	(operand_type_shorthands): Add RegTMM.
	(operand_type_init): Likewise.
	(operand_types): Add Tmmword.
	(cpu_flag_init): Add CPU_AMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	(cpu_flags): Add CpuAMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	* i386-opc.h (CpuAMX_INT8): New.
	(CpuAMX_BF16): Likewise.
	(CpuAMX_TILE): Likewise.
	(SIBMEM): Likewise.
	(Tmmword): Likewise.
	(i386_cpu_flags): Add cpuamx_int8, cpuamx_bf16 and cpuamx_tile.
	(i386_opcode_modifier): Extend width of fields vexvvvv and sib.
	(i386_operand_type): Add tmmword.
	* i386-opc.tbl: Add AMX instructions.
	* i386-reg.tbl: Add AMX registers.
	* i386-init.h: Regenerated.
	* i386-tbl.h: Likewise.
---
 gas/config/tc-i386.c                      |  97 +++++-
 gas/doc/c-i386.texi                       |   7 +
 gas/testsuite/gas/i386/i386.exp           |   4 +
 gas/testsuite/gas/i386/intel-regs.d       |   4 +
 gas/testsuite/gas/i386/intel-regs.s       |   4 +
 gas/testsuite/gas/i386/x86-64-amx-bad.d   |  20 ++
 gas/testsuite/gas/i386/x86-64-amx-bad.s   |  40 +++
 gas/testsuite/gas/i386/x86-64-amx-intel.d |  70 ++++
 gas/testsuite/gas/i386/x86-64-amx-inval.l |  17 +
 gas/testsuite/gas/i386/x86-64-amx-inval.s |  22 ++
 gas/testsuite/gas/i386/x86-64-amx.d       |  70 ++++
 gas/testsuite/gas/i386/x86-64-amx.s       |  61 ++++
 opcodes/i386-dis.c                        | 390 +++++++++++++++++++++-
 opcodes/i386-gen.c                        |  18 +
 opcodes/i386-opc.h                        |  16 +-
 opcodes/i386-opc.tbl                      |  23 ++
 opcodes/i386-reg.tbl                      |   9 +
 17 files changed, 853 insertions(+), 19 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-bad.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-bad.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 2e0eb24753..96f9d2a926 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -290,8 +290,10 @@ enum i386_error
     unsupported_with_intel_mnemonic,
     unsupported_syntax,
     unsupported,
+    invalid_sib_address,
     invalid_vsib_address,
     invalid_vector_register_set,
+    invalid_tmm_register_set,
     unsupported_vector_index_register,
     unsupported_broadcast,
     broadcast_needed,
@@ -372,6 +374,9 @@ struct _i386_insn
     /* Has ZMM register operands.  */
     bfd_boolean has_regzmm;
 
+    /* Has TMM register operands.  */
+    bfd_boolean has_regtmm;
+
     /* Has GOTPC or TLS relocation.  */
     bfd_boolean has_gotpc_tls_reloc;
 
@@ -1201,6 +1206,12 @@ static const arch_entry cpu_arch[] =
     CPU_WAITPKG_FLAGS, 0 },
   { STRING_COMMA_LEN (".cldemote"), PROCESSOR_UNKNOWN,
     CPU_CLDEMOTE_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_int8"), PROCESSOR_UNKNOWN,
+    CPU_AMX_INT8_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_bf16"), PROCESSOR_UNKNOWN,
+    CPU_AMX_BF16_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_tile"), PROCESSOR_UNKNOWN,
+    CPU_AMX_TILE_FLAGS, 0 },
   { STRING_COMMA_LEN (".movdiri"), PROCESSOR_UNKNOWN,
     CPU_MOVDIRI_FLAGS, 0 },
   { STRING_COMMA_LEN (".movdir64b"), PROCESSOR_UNKNOWN,
@@ -1259,6 +1270,9 @@ static const noarch_entry cpu_noarch[] =
   { STRING_COMMA_LEN ("noavx512_bitalg"), CPU_ANY_AVX512_BITALG_FLAGS },
   { STRING_COMMA_LEN ("noibt"), CPU_ANY_IBT_FLAGS },
   { STRING_COMMA_LEN ("noshstk"), CPU_ANY_SHSTK_FLAGS },
+  { STRING_COMMA_LEN ("noamx_int8"), CPU_ANY_AMX_INT8_FLAGS },
+  { STRING_COMMA_LEN ("noamx_bf16"), CPU_ANY_AMX_BF16_FLAGS },
+  { STRING_COMMA_LEN ("noamx_tile"), CPU_ANY_AMX_TILE_FLAGS },
   { STRING_COMMA_LEN ("nomovdiri"), CPU_ANY_MOVDIRI_FLAGS },
   { STRING_COMMA_LEN ("nomovdir64b"), CPU_ANY_MOVDIR64B_FLAGS },
   { STRING_COMMA_LEN ("noavx512_bf16"), CPU_ANY_AVX512_BF16_FLAGS },
@@ -2159,7 +2173,9 @@ match_simd_size (const insn_template *t, unsigned int wanted,
 	   || (i.types[given].bitfield.ymmword
 	       && !t->operand_types[wanted].bitfield.ymmword)
 	   || (i.types[given].bitfield.zmmword
-	       && !t->operand_types[wanted].bitfield.zmmword));
+	       && !t->operand_types[wanted].bitfield.zmmword)
+	   || (i.types[given].bitfield.tmmword
+	       && !t->operand_types[wanted].bitfield.tmmword));
 }
 
 /* Return 1 if there is no conflict in any size between operand GIVEN
@@ -2296,6 +2312,7 @@ operand_type_match (i386_operand_type overlap,
   temp.bitfield.xmmword = 0;
   temp.bitfield.ymmword = 0;
   temp.bitfield.zmmword = 0;
+  temp.bitfield.tmmword = 0;
   if (operand_type_all_zero (&temp))
     goto mismatch;
 
@@ -3304,6 +3321,7 @@ const type_names[] =
   { OPERAND_TYPE_REGXMM, "rXMM" },
   { OPERAND_TYPE_REGYMM, "rYMM" },
   { OPERAND_TYPE_REGZMM, "rZMM" },
+  { OPERAND_TYPE_REGTMM, "rTMM" },
   { OPERAND_TYPE_REGMASK, "Mask reg" },
 };
 
@@ -5790,7 +5808,7 @@ check_VecOperands (const insn_template *t)
 
   /* For VSIB byte, we need a vector register for index, and all vector
      registers must be distinct.  */
-  if (t->opcode_modifier.sib)
+  if (t->opcode_modifier.sib && t->opcode_modifier.sib != SIBMEM)
     {
       if (!i.index_reg
 	  || !((t->opcode_modifier.sib == VECSIB128
@@ -5849,6 +5867,23 @@ check_VecOperands (const insn_template *t)
 	}
     }
 
+  /* For AMX instructions with three tmmword operands, all tmmword operands must
+     be distinct.  */
+  if (t->operand_types[0].bitfield.tmmword
+      && i.reg_operands == 3)
+    {
+      if (register_number (i.op[0].regs)
+          == register_number (i.op[1].regs)
+          || register_number (i.op[0].regs)
+             == register_number (i.op[2].regs)
+          || register_number (i.op[1].regs)
+             == register_number (i.op[2].regs))
+	{
+	  i.error = invalid_tmm_register_set;
+	  return 1;
+	}
+    }
+
   /* Check if broadcast is supported by the instruction and is applied
      to the memory operand.  */
   if (i.broadcast)
@@ -6584,12 +6619,18 @@ match_template (char mnem_suffix)
 	  as_bad (_("unsupported instruction `%s'"),
 		  current_templates->start->name);
 	  return NULL;
+	case invalid_sib_address:
+	  err_msg = _("invalid SIB address");
+	  break;
 	case invalid_vsib_address:
 	  err_msg = _("invalid VSIB address");
 	  break;
 	case invalid_vector_register_set:
 	  err_msg = _("mask, index, and destination registers must be distinct");
 	  break;
+	case invalid_tmm_register_set:
+	  err_msg = _("tmm register must be distinct");
+	  break;
 	case unsupported_vector_index_register:
 	  err_msg = _("unsupported vector index register");
 	  break;
@@ -7923,8 +7964,11 @@ build_modrm_byte (void)
 	  else if (i.op[dest].regs->reg_type.bitfield.class == RegSIMD
 		   || i.op[source].regs->reg_type.bitfield.class == RegSIMD)
 	    {
-	      if (i.types[dest].bitfield.zmmword
-		  || i.types[source].bitfield.zmmword)
+	      if (i.types[dest].bitfield.tmmword
+		  || i.types[source].bitfield.tmmword)
+		i.has_regtmm = TRUE;
+	      else if (i.types[dest].bitfield.zmmword
+		       || i.types[source].bitfield.zmmword)
 		i.has_regzmm = TRUE;
 	      else if (i.types[dest].bitfield.ymmword
 		       || i.types[source].bitfield.ymmword)
@@ -7966,7 +8010,9 @@ build_modrm_byte (void)
 
 	  if (i.tm.opcode_modifier.sib)
 	    {
-	      if (i.index_reg->reg_num == RegIZ)
+	      /* The index register of VSIB shouldn't be RegIZ.  */
+	      if (i.tm.opcode_modifier.sib != SIBMEM
+		  && i.index_reg->reg_num == RegIZ)
 		abort ();
 
 	      i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
@@ -7989,8 +8035,19 @@ build_modrm_byte (void)
 		      i.types[op].bitfield.disp32s = 1;
 		    }
 		}
-	      i.sib.index = i.index_reg->reg_num;
-	      set_rex_vrex (i.index_reg, REX_X, FALSE);
+
+	      /* The mandatory SIB always has an index register, so the
+		 code logic remains unchanged.  A non-mandatory SIB
+		 without an index register is allowed and is handled
+		 later.  */
+	      if (i.index_reg)
+		{
+		  if (i.index_reg->reg_num == RegIZ)
+		    i.sib.index = NO_INDEX_REGISTER;
+		  else
+		    i.sib.index = i.index_reg->reg_num;
+		  set_rex_vrex (i.index_reg, REX_X, FALSE);
+		}
 	    }
 
 	  default_seg = &ds;
@@ -8004,7 +8061,9 @@ build_modrm_byte (void)
 		{
 		  i386_operand_type newdisp;
 
-		  gas_assert (!i.tm.opcode_modifier.sib);
+		  /* Check for both VSIB and mandatory non-vector SIB.  */
+		  gas_assert (!i.tm.opcode_modifier.sib
+			      || i.tm.opcode_modifier.sib == SIBMEM);
 		  /* Operand is just <disp>  */
 		  if (flag_code == CODE_64BIT)
 		    {
@@ -8142,7 +8201,11 @@ build_modrm_byte (void)
 	      i.sib.scale = i.log2_scale_factor;
 	      if (i.index_reg == 0)
 		{
-		  gas_assert (!i.tm.opcode_modifier.sib);
+		  /* Only check for VSIB. */
+		  gas_assert (i.tm.opcode_modifier.sib != VECSIB128
+			      && i.tm.opcode_modifier.sib != VECSIB256
+			      && i.tm.opcode_modifier.sib != VECSIB512);
+
 		  /* <disp>(%esp) becomes two byte modrm with no index
 		     register.  We've already stored the code for esp
 		     in i.rm.regmem ie. ESCAPE_TO_TWO_BYTE_ADDRESSING.
@@ -8267,7 +8330,9 @@ build_modrm_byte (void)
 		break;
 	      if (i.types[op].bitfield.class == RegSIMD)
 		{
-		  if (i.types[op].bitfield.zmmword)
+		  if (i.types[op].bitfield.tmmword)
+		    i.has_regtmm = TRUE;
+		  else if (i.types[op].bitfield.zmmword)
 		    i.has_regzmm = TRUE;
 		  else if (i.types[op].bitfield.ymmword)
 		    i.has_regymm = TRUE;
@@ -10926,9 +10991,10 @@ i386_index_check (const char *operand_string)
 		      || !i.index_reg->reg_type.bitfield.baseindex)))
 	    goto bad_address;
 
-	  /* bndmk, bndldx, and bndstx have special restrictions. */
+	  /* bndmk, bndldx, bndstx and mandatory non-vector SIB have special restrictions. */
 	  if (current_templates->start->base_opcode == 0xf30f1b
-	      || (current_templates->start->base_opcode & ~1) == 0x0f1a)
+	      || (current_templates->start->base_opcode & ~1) == 0x0f1a
+	      || current_templates->start->opcode_modifier.sib == SIBMEM)
 	    {
 	      /* They cannot use RIP-relative addressing. */
 	      if (i.base_reg && i.base_reg->reg_num == RegIP)
@@ -10938,7 +11004,7 @@ i386_index_check (const char *operand_string)
 		}
 
 	      /* bndldx and bndstx ignore their scale factor. */
-	      if (current_templates->start->base_opcode != 0xf30f1b
+	      if ((current_templates->start->base_opcode & ~1) == 0x0f1a
 		  && i.log2_scale_factor)
 		as_warn (_("register scaling is being ignored here"));
 	    }
@@ -12440,6 +12506,11 @@ static bfd_boolean check_register (const reg_entry *r)
 	}
     }
 
+  if (r->reg_type.bitfield.tmmword
+      && (!cpu_arch_flags.bitfield.cpuamx_tile
+          || flag_code != CODE_64BIT))
+    return FALSE;
+
   if (r->reg_type.bitfield.class == RegBND && !cpu_arch_flags.bitfield.cpumpx)
     return FALSE;
 
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index d4e6fcb698..cb86cc7968 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -226,6 +226,12 @@ accept various extension mnemonics.  For example,
 @code{noenqcmd},
 @code{noserialize},
 @code{notsxldtrk},
+@code{amx_int8},
+@code{noamx_int8},
+@code{amx_bf16},
+@code{noamx_bf16},
+@code{amx_tile},
+@code{noamx_tile},
 @code{vmx},
 @code{vmfunc},
 @code{smx},
@@ -1504,6 +1510,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
+@item @samp{.amx_int8} @tab @samp{.amx_bf16} @tab @samp{.amx_tile}
 @item @samp{.3dnow} @tab @samp{.3dnowa} @tab @samp{.sse4a} @tab @samp{.sse5}
 @item @samp{.syscall} @tab @samp{.rdtscp} @tab @samp{.svme}
 @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 55929d3acb..bd4adb07ef 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -1137,6 +1137,10 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-lfence-ret-d"
     run_dump_test "x86-64-lfence-ret-e"
     run_dump_test "x86-64-lfence-byte"
+    run_list_test "x86-64-amx-inval"
+    run_dump_test "x86-64-amx"
+    run_dump_test "x86-64-amx-intel"
+    run_dump_test "x86-64-amx-bad"
 
     if { ![istarget "*-*-aix*"]
       && ![istarget "*-*-beos*"]
diff --git a/gas/testsuite/gas/i386/intel-regs.d b/gas/testsuite/gas/i386/intel-regs.d
index 65bcb6ca7d..480b291c91 100644
--- a/gas/testsuite/gas/i386/intel-regs.d
+++ b/gas/testsuite/gas/i386/intel-regs.d
@@ -6,6 +6,7 @@
 
 Disassembly of section \.text:
 0+0 <.*>:
+.*[ 	]+R_386_32[ 	]+tmm1
 .*[ 	]+R_386_16[ 	]+eax
 .*[ 	]+R_386_16[ 	]+rax
 .*[ 	]+R_386_16[ 	]+axl
@@ -53,4 +54,7 @@ Disassembly of section \.text:
 
 .* <ymm8>:
 .*[ 	]+<ymm8>
+
+.* <tmm0>:
+.*[ 	]+<tmm0>
 #pass
diff --git a/gas/testsuite/gas/i386/intel-regs.s b/gas/testsuite/gas/i386/intel-regs.s
index 66ab16dfc5..44e369bb0f 100644
--- a/gas/testsuite/gas/i386/intel-regs.s
+++ b/gas/testsuite/gas/i386/intel-regs.s
@@ -1,6 +1,8 @@
 	.text
 	.intel_syntax noprefix
 
+	mov	eax, tmm1
+
 	.arch i286
 	.code16
 	mov	ax, eax			; add	[bx+si], al
@@ -59,3 +61,5 @@
 	mov	rax, r8
 ymm8:
 	jmp	ymm8
+tmm0:
+	jmp	tmm0
diff --git a/gas/testsuite/gas/i386/x86-64-amx-bad.d b/gas/testsuite/gas/i386/x86-64-amx-bad.d
new file mode 100644
index 0000000000..2957b6a15b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-bad.d
@@ -0,0 +1,20 @@
+#as:
+#objdump: -drw
+#name: x86_64 AMX insns
+#source: x86-64-amx-bad.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <\.text>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 d2 5c[ 	]*\(bad\)[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*dc 90 90 90 90 90[ 	]*fcoml.*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 56 5c[ 	]*\(bad\)[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*dc 90 90 90 90 90[ 	]*fcoml.*
+[ 	]*[a-f0-9]+:[ 	]*c4 62 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,\(bad\)
+[ 	]*[a-f0-9]+:[ 	]*c4 c2 52 5c dc[ 	]*tdpbf16ps %tmm5,\(bad\),%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 32 5c dc[ 	]*tdpbf16ps \(bad\),%tmm4,%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 09[ 	]*tileloadd \(bad\),%tmm1
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx-bad.s b/gas/testsuite/gas/i386/x86-64-amx-bad.s
new file mode 100644
index 0000000000..f0db1a9493
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-bad.s
@@ -0,0 +1,40 @@
+.text
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.W = 1 (illegal value).
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0xd2
+	.byte 0x5c
+	.byte 0xdc
+	.fill 0x05, 0x01, 0x90
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.L = 1 (illegal value).
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0x56
+	.byte 0x5c
+	.byte 0xdc
+	.fill 0x05, 0x01, 0x90
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.R = 0 (illegal value).
+	.byte 0xc4
+	.byte 0x62
+	.byte 0x52
+	.byte 0x5c
+	.byte 0xdc
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.B = 0 (illegal value).
+	.byte 0xc4
+	.byte 0xc2
+	.byte 0x52
+	.byte 0x5c
+	.byte 0xdc
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.VVVV = 0110 (illegal value).
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0x32
+	.byte 0x5c
+	.byte 0xdc
+	#tileloadd (%rax),%tmm1 set R/M= 001 (illegal value) without SIB.
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0x7b
+	.byte 0x4b
+	.byte 0x09
+
diff --git a/gas/testsuite/gas/i386/x86-64-amx-intel.d b/gas/testsuite/gas/i386/x86-64-amx-intel.d
new file mode 100644
index 0000000000..fc5e0745ea
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d
@@ -0,0 +1,70 @@
+#as:
+#objdump: -d -Mintel
+#name: x86_64 AMX insns in Intel syntax
+#source: x86-64-amx.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 04 51[ 	]*ldtilecfg \[rcx\+rdx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 04 51[ 	]*sttilecfg \[rcx\+rdx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps tmm3,tmm4,tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 tmm1,\[rcx\+riz\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored \[rcx\+riz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored \[ecx\+eiz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored \[rcx\+rdx\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored \[ecx\+edx\*2\],tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero tmm7
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 01[ 	]*ldtilecfg \[rcx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 03[ 	]*ldtilecfg \[rbx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 01[ 	]*sttilecfg \[rcx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 03[ 	]*sttilecfg \[rbx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps tmm3,tmm4,tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 tmm1,\[rcx\+riz\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored \[rcx\+riz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored \[ecx\+eiz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored \[rcx\+rdx\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored \[ecx\+edx\*2\],tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero tmm7
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx-inval.l b/gas/testsuite/gas/i386/x86-64-amx-inval.l
new file mode 100644
index 0000000000..e7c284fd71
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-inval.l
@@ -0,0 +1,17 @@
+.* Assembler messages:
+.*:5: Error: `\(%rip\)' cannot be used here
+.*:6: Error: `\(%rip\)' cannot be used here
+.*:7: Error: `\(%rip\)' cannot be used here
+.*:8: Error: operand size mismatch for `tdpbssd'
+.*:9: Error: operand size mismatch for `vaddps'
+.*:10: Error: tmm register must be distinct for `tdpbssd'
+.*:11: Error: tmm register must be distinct for `tdpbssd'
+.*:12: Error: tmm register must be distinct for `tdpbssd'
+.*:15: Error: `\[rip\]' cannot be used here
+.*:16: Error: `\[rip\]' cannot be used here
+.*:17: Error: `\[rip\]' cannot be used here
+.*:18: Error: operand size mismatch for `tdpbssd'
+.*:19: Error: operand size mismatch for `vaddps'
+.*:20: Error: tmm register must be distinct for `tdpbssd'
+.*:21: Error: tmm register must be distinct for `tdpbssd'
+.*:22: Error: tmm register must be distinct for `tdpbssd'
diff --git a/gas/testsuite/gas/i386/x86-64-amx-inval.s b/gas/testsuite/gas/i386/x86-64-amx-inval.s
new file mode 100644
index 0000000000..6e29453669
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-inval.s
@@ -0,0 +1,22 @@
+# Check illegal SIBMEM and register size used in AMX instructions
+
+    .text
+_start:
+    tileloadd (%rip), %tmm1
+    tileloaddt1 (%rip), %tmm1
+    tilestored %tmm1, (%rip)
+    tdpbssd %xmm1, %xmm2, %xmm3
+    vaddps %tmm1, %tmm2, %tmm3
+    tdpbssd %tmm1, %tmm1, %tmm0
+    tdpbssd %tmm1, %tmm0, %tmm1
+    tdpbssd %tmm0, %tmm1, %tmm1
+
+    .intel_syntax noprefix
+    tileloadd tmm1, [rip]
+    tileloaddt1 tmm1, [rip]
+    tilestored [rip], tmm1
+    tdpbssd xmm3, xmm2, xmm1
+    vaddps %tmm1, %tmm2, %tmm3
+    tdpbssd tmm0, tmm1, tmm1
+    tdpbssd tmm1, tmm0, tmm1
+    tdpbssd tmm1, tmm1, tmm0
diff --git a/gas/testsuite/gas/i386/x86-64-amx.d b/gas/testsuite/gas/i386/x86-64-amx.d
new file mode 100644
index 0000000000..ad6f42240b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx.d
@@ -0,0 +1,70 @@
+#as:
+#objdump: -d
+#name: x86_64 AMX insns
+#source: x86-64-amx.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 04 51[ 	]*ldtilecfg \(%rcx,%rdx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 04 51[ 	]*sttilecfg \(%rcx,%rdx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 \(%rcx,%riz,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%rcx,%riz,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%ecx,%eiz,1\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored %tmm5,\(%rcx,%rdx,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored %tmm1,\(%ecx,%edx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero %tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero %tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero %tmm7
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 01[ 	]*ldtilecfg \(%rcx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 03[ 	]*ldtilecfg \(%rbx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 01[ 	]*sttilecfg \(%rcx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 03[ 	]*sttilecfg \(%rbx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 \(%rcx,%riz,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%rcx,%riz,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%ecx,%eiz,1\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored %tmm5,\(%rcx,%rdx,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored %tmm1,\(%ecx,%edx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero %tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero %tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero %tmm7
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx.s b/gas/testsuite/gas/i386/x86-64-amx.s
new file mode 100644
index 0000000000..c70543152b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx.s
@@ -0,0 +1,61 @@
+
+  .allow_index_reg
+  .text
+_start:
+  ldtilecfg  (%rcx,%rdx,2)
+  sttilecfg  (%rcx,%rdx,2)
+  tdpbf16ps %tmm5, %tmm4, %tmm3
+  tdpbssd %tmm3, %tmm2, %tmm1
+  tdpbsud %tmm3, %tmm2, %tmm1
+  tdpbusd %tmm3, %tmm2, %tmm1
+  tdpbuud %tmm3, %tmm2, %tmm1
+  tileloadd foo, %tmm5
+  tileloadd (%rcx), %tmm5
+  tileloadd (%ecx), %tmm5
+  tileloadd (%rcx,%rdx,1), %tmm5
+  tileloadd (%ecx,%edx,2), %tmm1
+  tileloaddt1 foo, %tmm5
+  tileloaddt1 (%rcx), %tmm5
+  tileloaddt1 (%ecx), %tmm5
+  tileloaddt1 (%rcx,%rdx,1), %tmm5
+  tileloaddt1 (%ecx,%edx,2), %tmm1
+  tileloaddt1 (%rcx,%riz,2), %tmm1
+  tilerelease
+  tilestored %tmm5, (%rcx)
+  tilestored %tmm5, (%ecx)
+  tilestored %tmm5, (%rcx,%rdx,1)
+  tilestored %tmm1, (%ecx,%edx,2)
+  tilezero %tmm0
+  tilezero %tmm5
+  tilezero %tmm7
+
+
+  .intel_syntax noprefix
+  ldtilecfg  [rcx]
+  ldtilecfg  [rbx]
+  sttilecfg  [rcx]
+  sttilecfg  [rbx]
+  tdpbf16ps tmm3, tmm4, tmm5
+  tdpbssd tmm1, tmm2, tmm3
+  tdpbsud tmm1, tmm2, tmm3
+  tdpbusd tmm1, tmm2, tmm3
+  tdpbuud tmm1, tmm2, tmm3
+  tileloadd tmm5, foo
+  tileloadd tmm5, [rcx]
+  tileloadd tmm5, [ecx]
+  tileloadd tmm5, [rcx+rdx]
+  tileloadd tmm1, [ecx+edx*2]
+  tileloaddt1 tmm5, foo
+  tileloaddt1 tmm5, [rcx]
+  tileloaddt1 tmm5, [ecx]
+  tileloaddt1 tmm5, [rcx+rdx]
+  tileloaddt1 tmm1, [ecx+edx*2]
+  tileloaddt1 tmm1, [rcx+riz*2]
+  tilerelease
+  tilestored [rcx], tmm5
+  tilestored [ecx], tmm5
+  tilestored [rcx+rdx], tmm5
+  tilestored [ecx+edx*2], tmm1
+  tilezero tmm0
+  tilezero tmm5
+  tilezero tmm7
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 956e2c3539..2b4ad3cd4e 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -375,6 +375,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define XMScalar { OP_XMM, scalar_mode }
 #define XMGatherQ { OP_XMM, vex_vsib_q_w_dq_mode }
 #define XMM { OP_XMM, xmm_mode }
+#define TMM { OP_XMM, tmm_mode }
 #define XMxmmq { OP_XMM, xmmq_mode }
 #define EM { OP_EM, v_mode }
 #define EMS { OP_EM, v_swap_mode }
@@ -391,6 +392,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define EXxS { OP_EX, x_swap_mode }
 #define EXxmm { OP_EX, xmm_mode }
 #define EXymm { OP_EX, ymm_mode }
+#define EXtmm { OP_EX, tmm_mode }
 #define EXxmmq { OP_EX, xmmq_mode }
 #define EXEvexHalfBcstXmmq { OP_EX, evex_half_bcst_xmmq_mode }
 #define EXxmm_mb { OP_EX, xmm_mb_mode }
@@ -421,6 +423,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define Vex128 { OP_VEX, vex128_mode }
 #define Vex256 { OP_VEX, vex256_mode }
 #define VexGdq { OP_VEX, dq_mode }
+#define VexTmm { OP_VEX, tmm_mode }
 #define EXdVexScalarS { OP_EX_Vex, d_scalar_swap_mode }
 #define EXqVexScalarS { OP_EX_Vex, q_scalar_swap_mode }
 #define EXVexW { OP_EX_VexW, x_mode }
@@ -451,6 +454,8 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define MVexVSIBQWpX { OP_M, vex_vsib_q_w_dq_mode }
 #define MVexVSIBQDWpX { OP_M, vex_vsib_q_w_d_mode }
 
+#define MVexSIBMEM { OP_M, vex_sibmem_mode }
+
 /* Used handle "rep" prefix for string instructions.  */
 #define Xbr { REP_Fixup, eSI_reg }
 #define Xvr { REP_Fixup, eSI_reg }
@@ -542,6 +547,8 @@ enum
   ymmq_mode,
   /* 32-byte YMM or 16-byte word operand */
   ymmxmm_mode,
+  /* TMM operand */
+  tmm_mode,
   /* d_mode in 32bit, q_mode in 64bit mode.  */
   m_mode,
   /* pair of v_mode operands */
@@ -595,6 +602,8 @@ enum
   vex_vsib_q_w_dq_mode,
   /* Similar to vex_vsib_q_w_dq_mode, with smaller memory.  */
   vex_vsib_q_w_d_mode,
+  /* Mandatory non-vector SIB.  */
+  vex_sibmem_mode,
 
   /* scalar, ignore vector length.  */
   scalar_mode,
@@ -743,6 +752,7 @@ enum
   REG_VEX_0F72,
   REG_VEX_0F73,
   REG_VEX_0FAE,
+  REG_VEX_0F3849_P_0_W_0_M_1,
   REG_VEX_0F38F3,
   REG_XOP_LWPCB,
   REG_XOP_LWP,
@@ -826,6 +836,17 @@ enum
   MOD_0FE7_PREFIX_2,
   MOD_0FF0_PREFIX_3,
   MOD_0F382A_PREFIX_2,
+  MOD_VEX_0F3849_P_0_W_0,
+  MOD_VEX_0F3849_P_2_W_0,
+  MOD_VEX_0F3849_P_3_W_0,
+  MOD_VEX_0F384B_P_1_W_0,
+  MOD_VEX_0F384B_P_2_W_0,
+  MOD_VEX_0F384B_P_3_W_0,
+  MOD_VEX_0F385C_P_1_W_0,
+  MOD_VEX_0F385E_P_0_W_0,
+  MOD_VEX_0F385E_P_1_W_0,
+  MOD_VEX_0F385E_P_2_W_0,
+  MOD_VEX_0F385E_P_3_W_0,
   MOD_0F38F5_PREFIX_2,
   MOD_0F38F6_PREFIX_0,
   MOD_0F38F8_PREFIX_1,
@@ -963,6 +984,7 @@ enum
   RM_0F1E_P_1_MOD_3_REG_7,
   RM_0FAE_REG_6_MOD_3_P_0,
   RM_0FAE_REG_7_MOD_3,
+  RM_VEX_0F3849_P_0_W_0_M_1_R_0
 };
 
 enum
@@ -1298,9 +1320,13 @@ enum
   PREFIX_VEX_0F3845,
   PREFIX_VEX_0F3846,
   PREFIX_VEX_0F3847,
+  PREFIX_VEX_0F3849,
+  PREFIX_VEX_0F384B,
   PREFIX_VEX_0F3858,
   PREFIX_VEX_0F3859,
   PREFIX_VEX_0F385A,
+  PREFIX_VEX_0F385C,
+  PREFIX_VEX_0F385E,
   PREFIX_VEX_0F3878,
   PREFIX_VEX_0F3879,
   PREFIX_VEX_0F388C,
@@ -1673,7 +1699,19 @@ enum
   X86_64_0F01_REG_0,
   X86_64_0F01_REG_1,
   X86_64_0F01_REG_2,
-  X86_64_0F01_REG_3
+  X86_64_0F01_REG_3,
+  X86_64_VEX_0F3849_P_0_W_0_M_0_L_0,
+  X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0,
+  X86_64_VEX_0F3849_P_2_W_0_M_0_L_0,
+  X86_64_VEX_0F3849_P_3_W_0_M_0_L_0,
+  X86_64_VEX_0F384B_P_1_W_0_M_0_L_0,
+  X86_64_VEX_0F384B_P_2_W_0_M_0_L_0,
+  X86_64_VEX_0F384B_P_3_W_0_M_0_L_0,
+  X86_64_VEX_0F385C_P_1_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_0_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_1_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_2_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_3_W_0_M_0_L_0
 };
 
 enum
@@ -1758,7 +1796,19 @@ enum
   VEX_LEN_0F381A_P_2_M_0,
   VEX_LEN_0F3836_P_2,
   VEX_LEN_0F3841_P_2,
+  VEX_LEN_0F3849_P_0_W_0_M_0,
+  VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0,
+  VEX_LEN_0F3849_P_2_W_0_M_0,
+  VEX_LEN_0F3849_P_3_W_0_M_0,
+  VEX_LEN_0F384B_P_1_W_0_M_0,
+  VEX_LEN_0F384B_P_2_W_0_M_0,
+  VEX_LEN_0F384B_P_3_W_0_M_0,
   VEX_LEN_0F385A_P_2_M_0,
+  VEX_LEN_0F385C_P_1_W_0_M_0,
+  VEX_LEN_0F385E_P_0_W_0_M_0,
+  VEX_LEN_0F385E_P_1_W_0_M_0,
+  VEX_LEN_0F385E_P_2_W_0_M_0,
+  VEX_LEN_0F385E_P_3_W_0_M_0,
   VEX_LEN_0F38DB_P_2,
   VEX_LEN_0F38F2_P_0,
   VEX_LEN_0F38F3_R_1_P_0,
@@ -1926,9 +1976,20 @@ enum
   VEX_W_0F382F_P_2_M_0,
   VEX_W_0F3836_P_2,
   VEX_W_0F3846_P_2,
+  VEX_W_0F3849_P_0,
+  VEX_W_0F3849_P_2,
+  VEX_W_0F3849_P_3,
+  VEX_W_0F384B_P_1,
+  VEX_W_0F384B_P_2,
+  VEX_W_0F384B_P_3,
   VEX_W_0F3858_P_2,
   VEX_W_0F3859_P_2,
   VEX_W_0F385A_P_2_M_0,
+  VEX_W_0F385C_P_1,
+  VEX_W_0F385E_P_0,
+  VEX_W_0F385E_P_1,
+  VEX_W_0F385E_P_2,
+  VEX_W_0F385E_P_3,
   VEX_W_0F3878_P_2,
   VEX_W_0F3879_P_2,
   VEX_W_0F38CF_P_2,
@@ -3045,6 +3106,16 @@ static const char *att_names_zmm[] = {
   "%zmm28", "%zmm29", "%zmm30", "%zmm31"
 };
 
+static const char **names_tmm;
+static const char *intel_names_tmm[] = {
+  "tmm0", "tmm1", "tmm2", "tmm3",
+  "tmm4", "tmm5", "tmm6", "tmm7"
+};
+static const char *att_names_tmm[] = {
+  "%tmm0", "%tmm1", "%tmm2", "%tmm3",
+  "%tmm4", "%tmm5", "%tmm6", "%tmm7"
+};
+
 static const char **names_mask;
 static const char *intel_names_mask[] = {
   "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7"
@@ -3413,6 +3484,10 @@ static const struct dis386 reg_table[][8] = {
     { MOD_TABLE (MOD_VEX_0FAE_REG_2) },
     { MOD_TABLE (MOD_VEX_0FAE_REG_3) },
   },
+  /* REG_VEX_0F3849_P_0_W_0_M_1 */
+  {
+    { RM_TABLE (RM_VEX_0F3849_P_0_W_0_M_1_R_0) },
+  },
   /* REG_VEX_0F38F3 */
   {
     { Bad_Opcode },
@@ -5794,6 +5869,22 @@ static const struct dis386 prefix_table[][4] = {
     { "vpsllv%LW", { XM, Vex, EXx }, 0 },
   },
 
+  /* PREFIX_VEX_0F3849 */
+  {
+    { VEX_W_TABLE (VEX_W_0F3849_P_0) },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F3849_P_2) },
+    { VEX_W_TABLE (VEX_W_0F3849_P_3) },
+  },
+
+  /* PREFIX_VEX_0F384B */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F384B_P_1) },
+    { VEX_W_TABLE (VEX_W_0F384B_P_2) },
+    { VEX_W_TABLE (VEX_W_0F384B_P_3) },
+  },
+
   /* PREFIX_VEX_0F3858 */
   {
     { Bad_Opcode },
@@ -5815,6 +5906,21 @@ static const struct dis386 prefix_table[][4] = {
     { MOD_TABLE (MOD_VEX_0F385A_PREFIX_2) },
   },
 
+  /* PREFIX_VEX_0F385C */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F385C_P_1) },
+    { Bad_Opcode },
+  },
+
+  /* PREFIX_VEX_0F385E */
+  {
+    { VEX_W_TABLE (VEX_W_0F385E_P_0) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_1) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_2) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_3) },
+  },
+
   /* PREFIX_VEX_0F3878 */
   {
     { Bad_Opcode },
@@ -6830,6 +6936,78 @@ static const struct dis386 x86_64_table[][2] = {
     { "lidt{Q|Q}", { M }, 0 },
     { "lidt", { M }, 0 },
   },
+
+  /* X86_64_VEX_0F3849_P_0_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "ldtilecfg", { M }, 0 },
+  },
+
+  /* X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tilerelease", { Skip_MODRM }, 0 },
+  },
+
+  /* X86_64_VEX_0F3849_P_2_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "sttilecfg", { M }, 0 },
+  },
+
+  /* X86_64_VEX_0F3849_P_3_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tilezero", { TMM, Skip_MODRM }, 0 },
+  },
+
+  /* X86_64_VEX_0F384B_P_1_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tilestored", { MVexSIBMEM, TMM }, 0 },
+  },
+
+  /* X86_64_VEX_0F384B_P_2_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tileloaddt1", { TMM, MVexSIBMEM }, 0 },
+  },
+
+  /* X86_64_VEX_0F384B_P_3_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tileloadd", { TMM, MVexSIBMEM }, 0 },
+  },
+
+  /* X86_64_VEX_0F385C_P_1_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbf16ps", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_0_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbuud", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_1_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbsud", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_2_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbusd", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_3_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbssd", { TMM, EXtmm, VexTmm }, 0 },
+  },
 };
 
 static const struct dis386 three_byte_table[][256] = {
@@ -8671,9 +8849,9 @@ static const struct dis386 vex_table[][256] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3847) },
     /* 48 */
     { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F3849) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F384B) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -8692,9 +8870,9 @@ static const struct dis386 vex_table[][256] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3859) },
     { PREFIX_TABLE (PREFIX_VEX_0F385A) },
     { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F385C) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F385E) },
     { Bad_Opcode },
     /* 60 */
     { Bad_Opcode },
@@ -9432,12 +9610,72 @@ static const struct dis386 vex_len_table[][2] = {
     { "vphminposuw",	{ XM, EXx }, 0 },
   },
 
+  /* VEX_LEN_0F3849_P_0_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_0_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0) },
+  },
+
+  /* VEX_LEN_0F3849_P_2_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_2_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F3849_P_3_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_3_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F384B_P_1_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F384B_P_1_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F384B_P_2_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F384B_P_2_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F384B_P_3_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F384B_P_3_W_0_M_0_L_0) },
+  },
+
   /* VEX_LEN_0F385A_P_2_M_0 */
   {
     { Bad_Opcode },
     { VEX_W_TABLE (VEX_W_0F385A_P_2_M_0) },
   },
 
+  /* VEX_LEN_0F385C_P_1_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385C_P_1_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_0_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_0_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_1_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_1_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_2_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_2_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_3_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_3_W_0_M_0_L_0) },
+  },
+
   /* VEX_LEN_0F38DB_P_2 */
   {
     { "vaesimc",	{ XM, EXx }, 0 },
@@ -9930,6 +10168,30 @@ static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F3846_P_2 */
     { "vpsravd",	{ XM, Vex, EXx }, 0 },
   },
+  {
+    /* VEX_W_0F3849_P_0 */
+    { MOD_TABLE (MOD_VEX_0F3849_P_0_W_0) },
+  },
+  {
+    /* VEX_W_0F3849_P_2 */
+    { MOD_TABLE (MOD_VEX_0F3849_P_2_W_0) },
+  },
+  {
+    /* VEX_W_0F3849_P_3 */
+    { MOD_TABLE (MOD_VEX_0F3849_P_3_W_0) },
+  },
+  {
+    /* VEX_W_0F384B_P_1 */
+    { MOD_TABLE (MOD_VEX_0F384B_P_1_W_0) },
+  },
+  {
+    /* VEX_W_0F384B_P_2 */
+    { MOD_TABLE (MOD_VEX_0F384B_P_2_W_0) },
+  },
+  {
+    /* VEX_W_0F384B_P_3 */
+    { MOD_TABLE (MOD_VEX_0F384B_P_3_W_0) },
+  },
   {
     /* VEX_W_0F3858_P_2 */
     { "vpbroadcastd", { XM, EXxmm_md }, 0 },
@@ -9942,6 +10204,26 @@ static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F385A_P_2_M_0 */
     { "vbroadcasti128", { XM, Mxmm }, 0 },
   },
+  {
+    /* VEX_W_0F385C_P_1 */
+    { MOD_TABLE (MOD_VEX_0F385C_P_1_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_0 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_0_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_1 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_1_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_2 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_2_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_3 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_3_W_0) },
+  },
   {
     /* VEX_W_0F3878_P_2 */
     { "vpbroadcastb",	{ XM, EXxmm_mb }, 0 },
@@ -10388,6 +10670,57 @@ static const struct dis386 mod_table[][2] = {
     /* MOD_0F382A_PREFIX_2 */
     { "movntdqa",	{ XM, Mx }, 0 },
   },
+  {
+    /* MOD_VEX_0F3849_P_0_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_0_W_0_M_0) },
+    { REG_TABLE (REG_VEX_0F3849_P_0_W_0_M_1) },
+  },
+  {
+    /* MOD_VEX_0F3849_P_2_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_2_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F3849_P_3_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_3_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F384B_P_1_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_P_1_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F384B_P_2_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_P_2_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F384B_P_3_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_P_3_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385C_P_1_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385C_P_1_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_0_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_0_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_1_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_1_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_2_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_2_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_3_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_3_W_0_M_0) },
+  },
   {
     /* MOD_0F38F5_PREFIX_2 */
     { "wrussK",		{ M, Gdq }, PREFIX_OPCODE },
@@ -10949,6 +11282,10 @@ static const struct dis386 rm_table[][8] = {
     { "sfence",		{ Skip_MODRM }, 0 },
 
   },
+  {
+    /* RM_VEX_0F3849_P_0_W_0_M_1_R_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0) },
+  },
 };
 
 #define INTERNAL_DISASSEMBLER_ERROR _("<internal disassembler error>")
@@ -11845,6 +12182,7 @@ print_insn (bfd_vma pc, disassemble_info *info)
       names_xmm = intel_names_xmm;
       names_ymm = intel_names_ymm;
       names_zmm = intel_names_zmm;
+      names_tmm = intel_names_tmm;
       index64 = intel_index64;
       index32 = intel_index32;
       names_mask = intel_names_mask;
@@ -11867,6 +12205,7 @@ print_insn (bfd_vma pc, disassemble_info *info)
       names_xmm = att_names_xmm;
       names_ymm = att_names_ymm;
       names_zmm = att_names_zmm;
+      names_tmm = att_names_tmm;
       index64 = att_index64;
       index32 = att_index32;
       names_mask = att_names_mask;
@@ -14023,6 +14362,15 @@ OP_E_memory (int bytemode, int sizeflag)
 	  base = sib.base;
 	  codep++;
 	}
+      else
+	{
+	  /* Mandatory non-vector SIB requires a SIB byte.  */
+	  if (bytemode == vex_sibmem_mode)
+	    {
+	      oappend ("(bad)");
+	      return;
+	    }
+	}
       rbase = base + add;
 
       switch (modrm.mod)
@@ -15050,6 +15398,7 @@ OP_XMM (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       && bytemode != xmmq_mode
       && bytemode != evex_half_bcst_xmmq_mode
       && bytemode != ymm_mode
+      && bytemode != tmm_mode
       && bytemode != scalar_mode)
     {
       switch (vex.length)
@@ -15088,6 +15437,16 @@ OP_XMM (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	  abort ();
 	}
     }
+  else if (bytemode == tmm_mode)
+    {
+      if (reg >= 8)
+        {
+	  oappend ("(bad)");
+	  return;
+        }
+      names = names_tmm;
+    }
+
   else if (bytemode == ymm_mode)
     names = names_ymm;
   else
@@ -15212,6 +15571,7 @@ OP_EX (int bytemode, int sizeflag)
       && bytemode != xmmq_mode
       && bytemode != evex_half_bcst_xmmq_mode
       && bytemode != ymm_mode
+      && bytemode != tmm_mode
       && bytemode != d_scalar_swap_mode
       && bytemode != q_scalar_swap_mode
       && bytemode != vex_scalar_w_dq_mode)
@@ -15247,6 +15607,15 @@ OP_EX (int bytemode, int sizeflag)
 	  abort ();
 	}
     }
+  else if (bytemode == tmm_mode)
+    {
+      if (reg >= 8)
+        {
+	  oappend ("(bad)");
+	  return;
+        }
+      names = names_tmm;
+    }
   else if (bytemode == ymm_mode)
     names = names_ymm;
   else
@@ -15802,6 +16171,17 @@ OP_VEX (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       return;
     }
 
+  if (bytemode == tmm_mode)
+    {
+      if (reg >= 8)
+        {
+	  oappend ("(bad)");
+	  return;
+        }
+      oappend (names_tmm[reg]);
+      return;
+    }
+
   switch (vex.length)
     {
     case 128:
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 7230f87344..3334155071 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -297,6 +297,12 @@ static initializer cpu_flag_init[] =
     "CpuWAITPKG" },
   { "CPU_CLDEMOTE_FLAGS",
     "CpuCLDEMOTE" },
+  { "CPU_AMX_INT8_FLAGS",
+    "CpuAMX_INT8" },
+  { "CPU_AMX_BF16_FLAGS",
+    "CpuAMX_BF16" },
+  { "CPU_AMX_TILE_FLAGS",
+    "CpuAMX_TILE" },
   { "CPU_MOVDIRI_FLAGS",
     "CpuMOVDIRI" },
   { "CPU_MOVDIR64B_FLAGS",
@@ -383,6 +389,12 @@ static initializer cpu_flag_init[] =
     "CpuAVX512_BITALG" },
   { "CPU_ANY_AVX512_BF16_FLAGS",
     "CpuAVX512_BF16" },
+  { "CPU_ANY_AMX_INT8_FLAGS",
+    "CpuAMX_INT8" },
+  { "CPU_ANY_AMX_BF16_FLAGS",
+    "CpuAMX_BF16" },
+  { "CPU_ANY_AMX_TILE_FLAGS",
+    "CpuAMX_TILE|CpuAMX_INT8|CpuAMX_BF16" },
   { "CPU_ANY_MOVDIRI_FLAGS",
     "CpuMOVDIRI" },
   { "CPU_ANY_MOVDIR64B_FLAGS",
@@ -459,6 +471,8 @@ static initializer operand_type_init[] =
     "Class=RegSIMD|Ymmword" },
   { "OPERAND_TYPE_REGZMM",
     "Class=RegSIMD|Zmmword" },
+  { "OPERAND_TYPE_REGTMM",
+    "Class=RegSIMD|Tmmword" },
   { "OPERAND_TYPE_REGMASK",
     "Class=RegMask" },
   { "OPERAND_TYPE_REGBND",
@@ -611,6 +625,9 @@ static bitfield cpu_flags[] =
   BITFIELD (CpuPCONFIG),
   BITFIELD (CpuWAITPKG),
   BITFIELD (CpuCLDEMOTE),
+  BITFIELD (CpuAMX_INT8),
+  BITFIELD (CpuAMX_BF16),
+  BITFIELD (CpuAMX_TILE),
   BITFIELD (CpuMOVDIRI),
   BITFIELD (CpuMOVDIR64B),
   BITFIELD (CpuENQCMD),
@@ -741,6 +758,7 @@ static bitfield operand_types[] =
   BITFIELD (Xmmword),
   BITFIELD (Ymmword),
   BITFIELD (Zmmword),
+  BITFIELD (Tmmword),
   BITFIELD (Unspecified),
 #ifdef OTUnused
   BITFIELD (OTUnused),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index c65febbe81..b8a6dfc25c 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -223,6 +223,12 @@ enum
   /* CET instructions support required */
   CpuIBT,
   CpuSHSTK,
+  /* AMX-INT8 instructions required */
+  CpuAMX_INT8,
+  /* AMX-BF16 instructions required */
+  CpuAMX_BF16,
+  /* AMX-TILE instructions required */
+  CpuAMX_TILE,
   /* GFNI instructions required */
   CpuGFNI,
   /* VAES instructions required */
@@ -372,6 +378,9 @@ typedef union i386_cpu_flags
       unsigned int cpuptwrite:1;
       unsigned int cpuibt:1;
       unsigned int cpushstk:1;
+      unsigned int cpuamx_int8:1;
+      unsigned int cpuamx_bf16:1;
+      unsigned int cpuamx_tile:1;
       unsigned int cpugfni:1;
       unsigned int cpuvaes:1;
       unsigned int cpuvpclmulqdq:1;
@@ -574,7 +583,9 @@ enum
 #define VECSIB128	1
 #define VECSIB256	2
 #define VECSIB512	3
+#define SIBMEM		4
   SIB,
+
   /* SSE to AVX support required */
   SSE2AVX,
   /* No AVX equivalent */
@@ -702,7 +713,7 @@ typedef struct i386_opcode_modifier
   unsigned int vexw:2;
   unsigned int vexopcode:3;
   unsigned int vexsources:2;
-  unsigned int sib:2;
+  unsigned int sib:3;
   unsigned int sse2avx:1;
   unsigned int noavx:1;
   unsigned int evex:3;
@@ -807,6 +818,8 @@ enum
   Ymmword,
   /* ZMMWORD size.  */
   Zmmword,
+  /* TMMWORD size.  */
+  Tmmword,
   /* Unspecified memory size.  */
   Unspecified,
 
@@ -851,6 +864,7 @@ typedef union i386_operand_type
       unsigned int xmmword:1;
       unsigned int ymmword:1;
       unsigned int zmmword:1;
+      unsigned int tmmword:1;
       unsigned int unspecified:1;
 #ifdef OTUnused
       unsigned int unused:(OTNumOfBits - OTUnused);
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index cd6833c5ae..2a8ec52b41 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -52,6 +52,7 @@
 #define RegXMM Class=RegSIMD|Xmmword
 #define RegYMM Class=RegSIMD|Ymmword
 #define RegZMM Class=RegSIMD|Zmmword
+#define RegTMM Class=RegSIMD|Tmmword
 
 #define RegMask Class=RegMask
 
@@ -88,6 +89,7 @@
 #define VecSIB128 SIB=VECSIB128
 #define VecSIB256 SIB=VECSIB256
 #define VecSIB512 SIB=VECSIB512
+#define Sibmem SIB=SIBMEM|Modrm
 
 #define EVex128 EVex=EVEX128
 #define EVex256 EVex=EVEX256
@@ -4093,3 +4095,24 @@ xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|
 xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }
 
 // TSXLDTRK instructions end.
+
+// AMX instructions.
+
+ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }
+sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }
+
+tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+
+tileloadd, 2, 0xf24b, None, 1, CpuAMX_TILE|Cpu64, Sibmem|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }
+tileloaddt1, 2, 0x664b, None, 1, CpuAMX_TILE|Cpu64, Sibmem|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }
+tilestored, 2, 0xf34b, None, 1, CpuAMX_TILE|Cpu64, Sibmem|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }
+
+tilerelease, 0, 0x49c0, None, 2, CpuAMX_TILE|Cpu64, Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }
+
+tilezero, 1, 0xf249, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM }
+
+// AMX instructions end.
diff --git a/opcodes/i386-reg.tbl b/opcodes/i386-reg.tbl
index cdff763ca7..ca7eeba488 100644
--- a/opcodes/i386-reg.tbl
+++ b/opcodes/i386-reg.tbl
@@ -278,6 +278,15 @@ zmm28, Class=RegSIMD|Zmmword, RegVRex|RegRex, 4, Dw2Inval, Dw2Inval
 zmm29, Class=RegSIMD|Zmmword, RegVRex|RegRex, 5, Dw2Inval, Dw2Inval
 zmm30, Class=RegSIMD|Zmmword, RegVRex|RegRex, 6, Dw2Inval, Dw2Inval
 zmm31, Class=RegSIMD|Zmmword, RegVRex|RegRex, 7, Dw2Inval, Dw2Inval
+// TMM registers for AMX
+tmm0, Class=RegSIMD|Tmmword, 0, 0, Dw2Inval, Dw2Inval
+tmm1, Class=RegSIMD|Tmmword, 0, 1, Dw2Inval, Dw2Inval
+tmm2, Class=RegSIMD|Tmmword, 0, 2, Dw2Inval, Dw2Inval
+tmm3, Class=RegSIMD|Tmmword, 0, 3, Dw2Inval, Dw2Inval
+tmm4, Class=RegSIMD|Tmmword, 0, 4, Dw2Inval, Dw2Inval
+tmm5, Class=RegSIMD|Tmmword, 0, 5, Dw2Inval, Dw2Inval
+tmm6, Class=RegSIMD|Tmmword, 0, 6, Dw2Inval, Dw2Inval
+tmm7, Class=RegSIMD|Tmmword, 0, 7, Dw2Inval, Dw2Inval
 // Bound registers for MPX
 bnd0, Class=RegBND, 0, 0, Dw2Inval, Dw2Inval
 bnd1, Class=RegBND, 0, 1, Dw2Inval, Dw2Inval
-- 
2.17.1
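
As a reading aid only (this is not part of the patch), here is a minimal sketch of how the three-byte VEX prefix in the test encodings breaks down, and how VEX.pp relates to the P_0..P_3 suffixes used in the new table names. The names vex3, decode_vex3 and prefix_row are made up for illustration and do not exist in i386-dis.c. The row order (none, F3, 66, F2) is inferred from the tables above, where sttilecfg (0x6649) sits under P_2 and tilestored (0xf34b) under P_1.

```c
#include <stdint.h>

/* Fields of a three-byte (0xC4) VEX prefix.  Illustrative only.  */
struct vex3
{
  unsigned map;   /* mmmmm; 2 selects the 0F38 opcode map.  */
  unsigned w;     /* VEX.W.  */
  unsigned vvvv;  /* Extra register operand, already un-inverted.  */
  unsigned l;     /* VEX.L.  */
  unsigned pp;    /* 0 = none, 1 = 66, 2 = F3, 3 = F2.  */
};

/* Decode the two payload bytes that follow the 0xC4 escape byte.
   P points at the 0xC4 byte itself.  */
static struct vex3
decode_vex3 (const uint8_t *p)
{
  struct vex3 v;

  v.map = p[1] & 0x1f;
  v.w = (p[2] >> 7) & 1;
  /* vvvv is stored inverted in the prefix.  */
  v.vvvv = ((p[2] >> 3) & 0xf) ^ 0xf;
  v.l = (p[2] >> 2) & 1;
  v.pp = p[2] & 3;
  return v;
}

/* Map VEX.pp to the row order (none, F3, 66, F2) that the P_0..P_3
   names in the patch appear to follow.  */
static unsigned
prefix_row (unsigned pp)
{
  static const unsigned row[4] = { 0, 2, 1, 3 };

  return row[pp & 3];
}
```

For "c4 e2 63 5e ca" (tdpbssd tmm1,tmm2,tmm3 in Intel syntax) this gives map 2, W 0, L 0, vvvv 3 and pp 3. That selects row 3 of PREFIX_VEX_0F385E, with tmm3 coming from VEX.vvvv.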

Thanks,
Lili.
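
P.S. The two new "(bad)" conditions in the disassembler can be summarised with a small standalone sketch. It is a simplified model under my own assumptions, not the code in i386-dis.c: has_sib and tmm_name are hypothetical helpers, and reg is assumed to be the register number after any REX/VEX extension bit has been added.

```c
#include <stdint.h>

/* AT&T names of the eight tile registers, as in att_names_tmm.  */
static const char *const tmm_names[8] = {
  "%tmm0", "%tmm1", "%tmm2", "%tmm3",
  "%tmm4", "%tmm5", "%tmm6", "%tmm7"
};

/* Return nonzero if MODRM is followed by a SIB byte under 32-bit or
   64-bit addressing, i.e. mod != 3 and rm == 4.  An operand in
   vex_sibmem_mode is printed as "(bad)" when this is zero.  */
static int
has_sib (uint8_t modrm)
{
  return (modrm >> 6) != 3 && (modrm & 7) == 4;
}

/* Return the name for tile register REG, or "(bad)" when REG is out
   of range, mirroring the reg >= 8 checks added for tmm_mode.  */
static const char *
tmm_name (unsigned reg)
{
  return reg < 8 ? tmm_names[reg] : "(bad)";
}
```

For example, ModRM 0x2c from "c4 e2 7a 4b 2c 11" has mod 0 and rm 4, so a SIB byte follows and the operand is accepted. ModRM 0x29, a plain (%rcx) with no SIB byte, would be rejected.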
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-x86-Add-support-for-Intel-AMX-instructions.patch
Type: application/octet-stream
Size: 55075 bytes
Desc: 0001-x86-Add-support-for-Intel-AMX-instructions.patch
URL: <https://sourceware.org/pipermail/binutils/attachments/20200708/9036bfc3/attachment-0001.obj>

