[PATCH v2 1/2] gas, aarch64: Add AdvSIMD lut extension

Wed May 22 10:17:54 GMT 2024

Hi Andrew,

Thanks for the comments. Please find my responses inline.

On 5/21/2024 2:57 PM, Andrew Carlotti wrote:
> On Thu, May 16, 2024 at 11:35:18AM +0100, Saurabh Jha wrote:
>> Introduces instructions for the Advanced SIMD lut extension for AArch64.
>> They are documented in the following links:
>> * luti2: https://developer.arm.com/documentation/ddi0602/2024-03/SIMD-FP-Instructions/LUTI2--Lookup-table-read-with-2-bit-indices-?lang=en
>> * luti4: https://developer.arm.com/documentation/ddi0602/2024-03/SIMD-FP-Instructions/LUTI4--Lookup-table-read-with-4-bit-indices-?lang=en
>>
>> These instructions needed definition of some new operands. We will first
>> discuss operands for the third operand of the instructions and then
>> discuss a vector register list operand needed for the second operand.
>>
>> The third operands are vectors with bit indices and without type
>> qualifiers. They are called Em_INDEX1_14, Em_INDEX2_13, and Em_INDEX3_12
>> and they have 1 bit, 2 bit, and 3 bit indices respectively. For these
>> new operands, we defined new parsing case branch and a new instruction
>> class. We also modified the existing reglane inserters and extractors
>> to handle the new operands. The lsb and width of these operands are
>> the same as many existing operands but the convention is to give
>> different names to fields that serve different purpose so we
>> introduced new fields in aarch64-opc.c and aarch64-opc.h for these
>> operands.
>>
>> For the second operand of these instructions, we introduced a new
>> operand called LVn_LUT. This represents a vector register list with
>> stride 1. We defined new inserter and extractor for this new operand and
>> it is encoded in FLD_Rn. We are enforcing the number of registers in the
>> reglist using opcode flag rather than operand flag as this is what other
>> SIMD vector register list operands are doing. The disassembly also uses
>> opcode flag to print the correct number of registers.
>> ---
>> Hi,
>>
>> Regression tested for aarch64-none-elf and found no regressions.
>>
>> Ok for binutils-master? I don't have commit access so can someone please
>> commit on my behalf?
>>
>> Regards,
>> Saurabh
> 
>> diff --git a/gas/config/tc-aarch64.c b/gas/config/tc-aarch64.c
>> index 6ad4fae8b0ece71e5ac448be889846369c657420..bfba6efc6417e15887b0349c671e074e2238adc0 100644
>> --- a/gas/config/tc-aarch64.c
>> +++ b/gas/config/tc-aarch64.c
>> @@ -1513,6 +1513,54 @@ parse_vector_reg_list (char **ccp, aarch64_reg_type type,
>>     return error ? PARSE_FAIL : (ret_val << 2) | (nb_regs - 1);
>>   }
>>   
>> +/* Parse a SIMD vector register with a bit index. The SIMD vectors with
>> +   bit indices don't have type qualifiers.
>> +
>> +   Return null if the string pointed to by *CCP is not a valid AdvSIMD
>> +   vector register with a bit index.
>> +
>> +   Otherwise return the register and the bit index information
>> +   in *typeinfo.
>> +
>> +   The validity of the bit index itself is checked separately in encoding.
>> + */
>> +
>> +static const reg_entry *
>> +parse_simd_vector_with_bit_index (char **ccp, struct vector_type_el *typeinfo)
>> +{
>> +  char *str = *ccp;
>> +  const reg_entry *reg = parse_reg (&str);
>> +  struct vector_type_el atype;
>> +
>> +  // Setting it here as this is the convention followed in the
>> +  // rest of the code with indices.
>> +  atype.defined = NTA_HASINDEX;
>> +  // This will be set to correct value in parse_index_expressions.
>> +  atype.index = 0;
>> +  // The rest of the fields are not applicable for this operand.
>> +  atype.type = NT_invtype;
>> +  atype.width = -1;
>> +  atype.element_size = 0;
>> +
>> +  if (reg == NULL)
>> +    return NULL;
>> +
>> +  if (reg->type != REG_TYPE_V)
>> +    return NULL;
>> +
>> +  // Parse the bit index.
>> +  if (!skip_past_char (&str, '['))
>> +    return NULL;
>> +  if (!parse_index_expression (&str, &atype.index))
>> +    return NULL;
>> +  if (!skip_past_char (&str, ']'))
>> +    return NULL;
>> +
>> +  *typeinfo = atype;
>> +  *ccp = str;
>> +  return reg;
>> +}
>> +
>>   /* Directives: register aliases.  */
>>   
>>   static reg_entry *
>> @@ -6761,6 +6809,23 @@ parse_operands (char *str, const aarch64_opcode *opcode)
>>   	  reg_type = REG_TYPE_Z;
>>   	  goto vector_reg_index;
>>   
>> +	case AARCH64_OPND_Em_INDEX1_14:
>> +	case AARCH64_OPND_Em_INDEX2_13:
>> +	case AARCH64_OPND_Em_INDEX3_12:
>> +	  // These are SIMD vector operands with bit indices. For example,
>> +	  // 'V27[3]'. These operands don't have type qualifiers before
>> +	  // indices.
>> +	  reg = parse_simd_vector_with_bit_index(&str, &vectype);
>> +
>> +	  if (!reg)
>> +	    goto failure;
>> +	  gas_assert (vectype.defined & NTA_HASINDEX);
>> +
>> +	  info->qualifier = AARCH64_OPND_QLF_NIL;
>> +	  info->reglane.regno = reg->number;
>> +	  info->reglane.index = vectype.index;
>> +	  break;
>> +
> 
> Is the new function and separate handling necessary?  There's already support
> in the section below for indexed operands without qualifiers on SVE registers.
> I tested removing the reg->type check from the below line, and nothing broke in
> the testsuite, so maybe that's an option.
>> if (reg->type == REG_TYPE_Z && vectype.type == NT_invtype)
> 
> If that's not an option, could you move this block of code so it isn't in the
> middle of the vector_reg_index cases?
I tried removing the reg->type check but couldn't get it to work. That
code path appears to assume that the register has a qualifier, so I have
kept the new function and the separate handling. I have, however, moved
the block of code further down, out of the vector_reg_index cases, as
you suggested.
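
For context, the syntax in question looks like 'v27[3]': a register name
followed directly by a bracketed index, with no '.16b'-style qualifier in
between. Below is a minimal standalone sketch of that parse. It is not
the gas code: the struct and function names are made up for illustration,
it uses no gas internals, and unlike gas it neither skips whitespace nor
evaluates index expressions. As in the patch, the index value itself is
not range-checked at parse time.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the information the real parser returns
   through reg_entry and vector_type_el.  */
struct bit_indexed_reg
{
  unsigned regno;   /* 0-31 for V0-V31.  */
  long index;       /* Bit index; its range is checked later.  */
};

/* Parse "v<N>[<index>]" with no type qualifier.  On success, fill
   *OUT, advance *CCP past the operand and return true.  On failure,
   leave *CCP untouched and return false.  */
static bool
parse_bit_indexed_reg (const char **ccp, struct bit_indexed_reg *out)
{
  assert (ccp != NULL && *ccp != NULL && out != NULL);

  const char *str = *ccp;
  char *end;

  /* Register name: 'v' followed by a decimal number up to 31.  */
  if (tolower ((unsigned char) *str) != 'v')
    return false;
  str++;
  if (!isdigit ((unsigned char) *str))
    return false;
  unsigned long regno = strtoul (str, &end, 10);
  if (regno > 31)
    return false;
  str = end;

  /* A qualifier such as ".16b" is rejected here, because '[' must
     follow the register name directly.  */
  if (*str != '[')
    return false;
  str++;
  if (!isdigit ((unsigned char) *str))
    return false;
  long index = strtol (str, &end, 10);
  str = end;
  if (*str != ']')
    return false;
  str++;

  out->regno = (unsigned) regno;
  out->index = index;
  *ccp = str;
  return true;
}
```

With this sketch, "v27[3]" parses to register 27 and index 3, while
"v27.16b[3]" is rejected because of the qualifier.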

> 
>>   	case AARCH64_OPND_Ed:
>>   	case AARCH64_OPND_En:
>>   	case AARCH64_OPND_Em:
>> @@ -6812,6 +6877,7 @@ parse_operands (char *str, const aarch64_opcode *opcode)
>>   	  goto vector_reg_list;
>>   
>>   	case AARCH64_OPND_LVn:
>> +	case AARCH64_OPND_LVn_LUT:
>>   	case AARCH64_OPND_LVt:
>>   	case AARCH64_OPND_LVt_AL:
>>   	case AARCH64_OPND_LEt:
>> @@ -10477,6 +10543,7 @@ static const struct aarch64_option_cpu_value_table aarch64_features[] = {
>>     {"rcpc3",		AARCH64_FEATURE (RCPC3), AARCH64_FEATURE (RCPC2)},
>>     {"cpa",		AARCH64_FEATURE (CPA), AARCH64_NO_FEATURES},
>>     {"faminmax",		AARCH64_FEATURE (FAMINMAX), AARCH64_FEATURE (SIMD)},
>> +  {"lut",		AARCH64_FEATURE (LUT), AARCH64_FEATURE (SIMD)},
>>     {NULL,		AARCH64_NO_FEATURES, AARCH64_NO_FEATURES},
>>   };
>>   
> ...
>> diff --git a/include/opcode/aarch64.h b/include/opcode/aarch64.h
>> index 2fca9528c2012be983c2414a30fa5930e57e5c92..63456021a1d167c747ca913a355dd02cf90fc726 100644
>> --- a/include/opcode/aarch64.h
>> +++ b/include/opcode/aarch64.h
>> @@ -232,6 +232,8 @@ enum aarch64_feature_bit {
>>     AARCH64_FEATURE_CPA,
>>     /* FAMINMAX instructions.  */
>>     AARCH64_FEATURE_FAMINMAX,
>> +  /* LUT instructions.  */
>> +  AARCH64_FEATURE_LUT,
>>     AARCH64_NUM_FEATURES
>>   };
>>   
>> @@ -518,10 +520,14 @@ enum aarch64_opnd
>>     AARCH64_OPND_Em,	/* AdvSIMD Vector Element Vm.  */
>>     AARCH64_OPND_Em16,	/* AdvSIMD Vector Element Vm restricted to V0 - V15 when
>>   			   qualifier is S_H.  */
>> +  AARCH64_OPND_Em_INDEX1_14,  /* AdvSIMD 1-bit encoded index in Vm at [14]  */
>> +  AARCH64_OPND_Em_INDEX2_13,  /* AdvSIMD 2-bit encoded index in Vm at [14:13]  */
>> +  AARCH64_OPND_Em_INDEX3_12,  /* AdvSIMD 3-bit encoded index in Vm at [14:12]  */
>>     AARCH64_OPND_LVn,	/* AdvSIMD Vector register list used in e.g. TBL.  */
>>     AARCH64_OPND_LVt,	/* AdvSIMD Vector register list used in ld/st.  */
>>     AARCH64_OPND_LVt_AL,	/* AdvSIMD Vector register list for loading single
>>   			   structure to all lanes.  */
>> +  AARCH64_OPND_LVn_LUT,	/* AdvSIMD Vector register list used in lut.  */
>>     AARCH64_OPND_LEt,	/* AdvSIMD Vector Element list.  */
>>   
>>     AARCH64_OPND_CRn,	/* Co-processor register in CRn field.  */
>> @@ -1018,7 +1024,8 @@ enum aarch64_insn_class
>>     the,
>>     sve2_urqvs,
>>     sve_index1,
>> -  rcpc3
>> +  rcpc3,
>> +  lut
>>   };
>>   
>>   /* Opcode enumerators.  */
>> diff --git a/opcodes/aarch64-asm.h b/opcodes/aarch64-asm.h
>> index 88e389bfebda001efbb578a6e144dd5e2513cf78..edeb6d8de7e2c3e117e0ad91a02b93c0e040a061 100644
>> --- a/opcodes/aarch64-asm.h
>> +++ b/opcodes/aarch64-asm.h
>> @@ -47,6 +47,7 @@ AARCH64_DECL_OPD_INSERTER (ins_reglane);
>>   AARCH64_DECL_OPD_INSERTER (ins_reglist);
>>   AARCH64_DECL_OPD_INSERTER (ins_ldst_reglist);
>>   AARCH64_DECL_OPD_INSERTER (ins_ldst_reglist_r);
>> +AARCH64_DECL_OPD_INSERTER (ins_lut_reglist);
>>   AARCH64_DECL_OPD_INSERTER (ins_ldst_elemlist);
>>   AARCH64_DECL_OPD_INSERTER (ins_advsimd_imm_shift);
>>   AARCH64_DECL_OPD_INSERTER (ins_imm);
>> diff --git a/opcodes/aarch64-asm.c b/opcodes/aarch64-asm.c
>> index 5a55ca2f86db2d45b6cb54b5ee22606ec27c51fd..338ed54165d26cec2f0634bc62c1d7355ca4956a 100644
>> --- a/opcodes/aarch64-asm.c
>> +++ b/opcodes/aarch64-asm.c
>> @@ -168,6 +168,27 @@ aarch64_ins_reglane (const aarch64_operand *self, const aarch64_opnd_info *info,
>>         assert (reglane_index < 4);
>>         insert_field (FLD_SM3_imm2, code, reglane_index, 0);
>>       }
>> +  else if (inst->opcode->iclass == lut)
>> +    {
>> +      unsigned reglane_index = info->reglane.index;
>> +      switch (info->type)
>> +	{
>> +	case AARCH64_OPND_Em_INDEX1_14:
>> +	  assert (reglane_index < 2);
>> +	  insert_field (FLD_imm1_14, code, reglane_index, 0);
>> +	  break;
>> +	case AARCH64_OPND_Em_INDEX2_13:
>> +	  assert (reglane_index < 4);
>> +	  insert_field (FLD_imm2_13, code, reglane_index, 0);
>> +	  break;
>> +	case AARCH64_OPND_Em_INDEX3_12:
>> +	  assert (reglane_index < 8);
>> +	  insert_field (FLD_imm3_12, code, reglane_index, 0);
>> +	  break;
>> +	default:
>> +	  return false;
>> +	}
>> +    }
>>     else
>>       {
>>         /* index for e.g. SQDMLAL <Va><d>, <Vb><n>, <Vm>.<Ts>[<index>]
>> @@ -286,6 +307,17 @@ aarch64_ins_ldst_reglist_r (const aarch64_operand *self ATTRIBUTE_UNUSED,
>>     return true;
>>   }
>>   
>> +/* Insert regnos of register list operand for AdvSIMD lut instructions.  */
>> +bool
>> +aarch64_ins_lut_reglist (const aarch64_operand *self, const aarch64_opnd_info *info,
>> +		     aarch64_insn *code,
>> +		     const aarch64_inst *inst ATTRIBUTE_UNUSED,
>> +		     aarch64_operand_error *errors ATTRIBUTE_UNUSED)
>> +{
>> +  insert_field (self->fields[0], code, info->reglist.first_regno, 0);
>> +  return true;
>> +}
>> +
>>   /* Insert Q, opcode<2:1>, S, size and Rt fields for a register element list
>>      operand e.g. Vt in AdvSIMD load/store single element instructions.  */
>>   bool
>> diff --git a/opcodes/aarch64-dis.h b/opcodes/aarch64-dis.h
>> index 86494cc30937b1d7e4caf90630caec30c8b31d3e..9e8f7c214d70390a72f93e38655a5ac0f562d085 100644
>> --- a/opcodes/aarch64-dis.h
>> +++ b/opcodes/aarch64-dis.h
>> @@ -70,6 +70,7 @@ AARCH64_DECL_OPD_EXTRACTOR (ext_reglane);
>>   AARCH64_DECL_OPD_EXTRACTOR (ext_reglist);
>>   AARCH64_DECL_OPD_EXTRACTOR (ext_ldst_reglist);
>>   AARCH64_DECL_OPD_EXTRACTOR (ext_ldst_reglist_r);
>> +AARCH64_DECL_OPD_EXTRACTOR (ext_lut_reglist);
>>   AARCH64_DECL_OPD_EXTRACTOR (ext_ldst_elemlist);
>>   AARCH64_DECL_OPD_EXTRACTOR (ext_advsimd_imm_shift);
>>   AARCH64_DECL_OPD_EXTRACTOR (ext_shll_imm);
>> diff --git a/opcodes/aarch64-dis.c b/opcodes/aarch64-dis.c
>> index 96f42ae862a395bf3aa498c495fdcea9a3d12a41..130d2c1fae005c25a4615a88190b62ffd059cdb1 100644
>> --- a/opcodes/aarch64-dis.c
>> +++ b/opcodes/aarch64-dis.c
>> @@ -398,6 +398,23 @@ aarch64_ext_reglane (const aarch64_operand *self, aarch64_opnd_info *info,
>>         /* index for e.g. SM3TT2A <Vd>.4S, <Vn>.4S, <Vm>S[<imm2>].  */
>>         info->reglane.index = extract_field (FLD_SM3_imm2, code, 0);
>>       }
>> +  else if (inst->opcode->iclass == lut)
>> +    {
>> +      switch (info->type)
>> +	{
>> +	case AARCH64_OPND_Em_INDEX1_14:
>> +	  info->reglane.index = extract_field (FLD_imm1_14, code, 0);
>> +	  break;
>> +	case AARCH64_OPND_Em_INDEX2_13:
>> +	  info->reglane.index = extract_field (FLD_imm2_13, code, 0);
>> +	  break;
>> +	case AARCH64_OPND_Em_INDEX3_12:
>> +	  info->reglane.index = extract_field (FLD_imm3_12, code, 0);
>> +	  break;
>> +	default:
>> +	  return false;
>> +	}
>> +    }
>>     else
>>       {
>>         /* Index only for e.g. SQDMLAL <Va><d>, <Vb><n>, <Vm>.<Ts>[<index>]
>> @@ -533,6 +550,21 @@ aarch64_ext_ldst_reglist_r (const aarch64_operand *self ATTRIBUTE_UNUSED,
>>     return true;
>>   }
>>   
>> +/* Decode AdvSIMD vector register list for AdvSIMD lut instructions.
>> +   The number of registers in the list is determined by the opcode
>> +   flag.  */
>> +bool
>> +aarch64_ext_lut_reglist (const aarch64_operand *self, aarch64_opnd_info *info,
>> +		     const aarch64_insn code,
>> +		     const aarch64_inst *inst ATTRIBUTE_UNUSED,
>> +		     aarch64_operand_error *errors ATTRIBUTE_UNUSED)
>> +{
>> +  info->reglist.first_regno = extract_field (self->fields[0], code, 0);
>> +  info->reglist.num_regs = get_opcode_dependent_value (inst->opcode);
>> +  info->reglist.stride = 1;
>> +  return true;
>> +}
>> +
>>   /* Decode Q, opcode<2:1>, S, size and Rt fields of Vt in AdvSIMD
>>      load/store single element instructions.  */
>>   bool
>> diff --git a/opcodes/aarch64-opc.h b/opcodes/aarch64-opc.h
>> index 4e781f000cc38c12058530e5851b08083d42af52..23e634f1250de579661bbeb14d611b868b76bc8d 100644
>> --- a/opcodes/aarch64-opc.h
>> +++ b/opcodes/aarch64-opc.h
>> @@ -147,6 +147,7 @@ enum aarch64_field_kind
>>     FLD_imm1_2,
>>     FLD_imm1_8,
>>     FLD_imm1_10,
>> +  FLD_imm1_14,
>>     FLD_imm1_15,
>>     FLD_imm1_16,
>>     FLD_imm2_0,
>> @@ -154,6 +155,7 @@ enum aarch64_field_kind
>>     FLD_imm2_8,
>>     FLD_imm2_10,
>>     FLD_imm2_12,
>> +  FLD_imm2_13,
>>     FLD_imm2_15,
>>     FLD_imm2_16,
>>     FLD_imm2_19,
>> diff --git a/opcodes/aarch64-opc.c b/opcodes/aarch64-opc.c
>> index e88c616f4a9f3657756b919dc1196c08831c3cc5..61ab4c14a6393150f29a3fa1679a30b642bf8844 100644
>> --- a/opcodes/aarch64-opc.c
>> +++ b/opcodes/aarch64-opc.c
>> @@ -337,6 +337,7 @@ const aarch64_field fields[] =
>>       {  2,  1 },	/* imm1_2: general immediate in bits [2].  */
>>       {  8,  1 },	/* imm1_8: general immediate in bits [8].  */
>>       { 10,  1 },	/* imm1_10: general immediate in bits [10].  */
>> +    { 14,  1 },	/* imm1_14: general immediate in bits [14].  */
>>       { 15,  1 },	/* imm1_15: general immediate in bits [15].  */
>>       { 16,  1 },	/* imm1_16: general immediate in bits [16].  */
>>       {  0,  2 },	/* imm2_0: general immediate in bits [1:0].  */
>> @@ -344,6 +345,7 @@ const aarch64_field fields[] =
>>       {  8,  2 },	/* imm2_8: general immediate in bits [9:8].  */
>>       { 10,  2 }, /* imm2_10: 2-bit immediate, bits [11:10] */
>>       { 12,  2 }, /* imm2_12: 2-bit immediate, bits [13:12] */
>> +    { 13,  2 }, /* imm2_13: 2-bit immediate, bits [14:13] */
>>       { 15,  2 }, /* imm2_15: 2-bit immediate, bits [16:15] */
>>       { 16,  2 }, /* imm2_16: 2-bit immediate, bits [17:16] */
>>       { 19,  2 }, /* imm2_19: 2-bit immediate, bits [20:19] */
>> @@ -2554,6 +2556,10 @@ operand_general_constraint_met_p (const aarch64_opnd_info *opnds, int idx,
>>         num = get_opcode_dependent_value (opcode);
>>         switch (type)
>>   	{
>> +	case AARCH64_OPND_LVn_LUT:
>> +	  if (!check_reglist (opnd, mismatch_detail, idx, num, 1))
>> +	    return 0;
>> +	  break;
>>   	case AARCH64_OPND_LVt:
>>   	  assert (num >= 1 && num <= 4);
>>   	  /* Unless LD1/ST1, the number of registers should be equal to that
>> @@ -3165,6 +3171,14 @@ operand_general_constraint_met_p (const aarch64_opnd_info *opnds, int idx,
>>   	   and is halfed because complex numbers take two elements.  */
>>   	num = aarch64_get_qualifier_nelem (opnds[0].qualifier)
>>   	      * aarch64_get_qualifier_esize (opnds[0].qualifier) / 2;
>> +      else if (opcode->iclass == lut)
>> +	{
>> +	  size = get_operand_fields_width (get_operand_from_code (type)) - 5;
>> +	  if (!check_reglane (opnd, mismatch_detail, idx, "v", 0, 31,
>> +			      0, (1 << size) - 1))
>> +	    return 0;
>> +	  break;
>> +	}
>>         else
>>   	num = 16;
>>         num = num / aarch64_get_qualifier_esize (qualifier) - 1;
>> @@ -4069,6 +4083,14 @@ aarch64_print_operand (char *buf, size_t size, bfd_vma pc,
>>   		style_imm (styler, "%" PRIi64, opnd->reglane.index));
>>         break;
>>   
>> +    case AARCH64_OPND_Em_INDEX1_14:
>> +    case AARCH64_OPND_Em_INDEX2_13:
>> +    case AARCH64_OPND_Em_INDEX3_12:
>> +      snprintf (buf, size, "%s[%s]",
>> +		style_reg (styler, "v%d", opnd->reglane.regno),
>> +		style_imm (styler, "%" PRIi64, opnd->reglane.index));
>> +      break;
>> +
>>       case AARCH64_OPND_VdD1:
>>       case AARCH64_OPND_VnD1:
>>         snprintf (buf, size, "%s[%s]",
>> @@ -4077,6 +4099,7 @@ aarch64_print_operand (char *buf, size_t size, bfd_vma pc,
>>         break;
>>   
>>       case AARCH64_OPND_LVn:
>> +    case AARCH64_OPND_LVn_LUT:
>>       case AARCH64_OPND_LVt:
>>       case AARCH64_OPND_LVt_AL:
>>       case AARCH64_OPND_LEt:
>> diff --git a/opcodes/aarch64-tbl.h b/opcodes/aarch64-tbl.h
>> index 5b1c8561ac6147e64ba99b6e9fba85ed8ee712c4..6d7aa3d770ad34071bfe67f95d974eaa7b6cdbbd 100644
>> --- a/opcodes/aarch64-tbl.h
>> +++ b/opcodes/aarch64-tbl.h
>> @@ -1004,6 +1004,24 @@
>>     QLF3(V_16B, V_16B, V_16B),	\
>>   }
>>   
>> +/* e.g. luti2 <Vd>.16B, { <Vn>.16B }, <Vm>[index].  */
>> +/* The third operand is an AdvSIMD vector with a bit index
>> +   and without a type qualifier and is checked separately
>> +   based on operand enum.  */
>> +#define QL_VVUB			\
>> +{				\
>> +  QLF3(V_16B , V_16B , NIL),	\
>> +}
>> +
>> +/* e.g. luti2 <Vd>.8H, { <Vn>.8H }, <Vm>[index].  */
>> +/* The third operand is an AdvSIMD vector with a bit index
>> +   and without a type qualifier and is checked separately
>> +   based on operand enum.  */
>> +#define QL_VVUH			\
>> +{				\
>> +  QLF3(V_8H , V_8H , NIL),	\
>> +}
>> +
>>   /* e.g. EXT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>, #<index>.  */
>>   #define QL_VEXT			\
>>   {					\
>> @@ -2669,6 +2687,8 @@ static const aarch64_feature_set aarch64_feature_faminmax_sve2 =
>>     AARCH64_FEATURES (2, FAMINMAX, SVE2);
>>   static const aarch64_feature_set aarch64_feature_faminmax_sme2 =
>>     AARCH64_FEATURES (3, SVE2, FAMINMAX, SME2);
>> +static const aarch64_feature_set aarch64_feature_lut =
>> +  AARCH64_FEATURE (LUT);
>>   
>>   #define CORE		&aarch64_feature_v8
>>   #define FP		&aarch64_feature_fp
>> @@ -2740,6 +2760,7 @@ static const aarch64_feature_set aarch64_feature_faminmax_sme2 =
>>   #define FAMINMAX  &aarch64_feature_faminmax
>>   #define FAMINMAX_SVE2  &aarch64_feature_faminmax_sve2
>>   #define FAMINMAX_SME2  &aarch64_feature_faminmax_sme2
>> +#define LUT &aarch64_feature_lut
>>   
>>   #define CORE_INSN(NAME,OPCODE,MASK,CLASS,OP,OPS,QUALS,FLAGS) \
>>     { NAME, OPCODE, MASK, CLASS, OP, CORE, OPS, QUALS, FLAGS, 0, 0, NULL }
>> @@ -2925,6 +2946,8 @@ static const aarch64_feature_set aarch64_feature_faminmax_sme2 =
>>   #define FAMINMAX_SME2_INSN(NAME,OPCODE,MASK,OPS,QUALS) \
>>     { NAME, OPCODE, MASK, sme_size_22_hsd, 0, FAMINMAX_SME2, OPS, QUALS, \
>>       F_STRICT | 0, 0, 1, NULL }
>> +#define LUT_INSN(NAME,OPCODE,MASK,OPS,QUALS,FLAGS)		\
>> +  { NAME, OPCODE, MASK, lut, 0, LUT, OPS, QUALS, FLAGS, 0, 0, NULL }
>>   
>>   #define MOPS_CPY_OP1_OP2_PME_INSN(NAME, OPCODE, MASK, FLAGS, CONSTRAINTS) \
>>     MOPS_INSN (NAME, OPCODE, MASK, 0, \
>> @@ -4275,6 +4298,11 @@ const struct aarch64_opcode aarch64_opcode_table[] =
>>     FAMINMAX_SME2_INSN ("famax", 0xc120b940, 0xff23ffe3, OP3 (SME_Zdnx4, SME_Zdnx4, SME_Zmx4), OP_SVE_VVV_HSD),
>>     FAMINMAX_SME2_INSN ("famin", 0xc120b141, 0xff21ffe1, OP3 (SME_Zdnx2, SME_Zdnx2, SME_Zmx2), OP_SVE_VVV_HSD),
>>     FAMINMAX_SME2_INSN ("famin", 0xc120b941, 0xff23ffe3, OP3 (SME_Zdnx4, SME_Zdnx4, SME_Zmx4), OP_SVE_VVV_HSD),
>> +  /* AdvSIMD lut.  */
>> +  LUT_INSN ("luti2", 0x4e801000, 0xffe09c00, OP3 (Vd, LVn_LUT, Em_INDEX2_13), QL_VVUB, F_OD(1)),
>> +  LUT_INSN ("luti2", 0x4ec00000, 0xffe08c00, OP3 (Vd, LVn_LUT, Em_INDEX3_12), QL_VVUH, F_OD(1)),
>> +  LUT_INSN ("luti4", 0x4e402000, 0xffe0bc00, OP3 (Vd, LVn_LUT, Em_INDEX1_14), QL_VVUB, F_OD(1)),
>> +  LUT_INSN ("luti4", 0x4e401000, 0xffe09c00, OP3 (Vd, LVn_LUT, Em_INDEX2_13), QL_VVUH, F_OD(2)),
>>     /* Move wide (immediate).  */
>>     CORE_INSN ("movn", 0x12800000, 0x7f800000, movewide, OP_MOVN, OP2 (Rd, HALF), QL_DST_R, F_SF | F_HAS_ALIAS),
>>     CORE_INSN ("mov",  0x12800000, 0x7f800000, movewide, OP_MOV_IMM_WIDEN, OP2 (Rd, IMM_MOV), QL_DST_R, F_SF | F_ALIAS | F_CONV),
>> @@ -6531,12 +6559,20 @@ const struct aarch64_opcode aarch64_opcode_table[] =
>>         "a SIMD vector element")						\
>>       Y(SIMD_ELEMENT, reglane, "Em16", 0, F(FLD_Rm),			\
>>         "a SIMD vector element limited to V0-V15")			\
>> +    Y(SIMD_ELEMENT, reglane, "Em_INDEX1_14", 0, F(FLD_Rm, FLD_imm1_14),	\
>> +      "a SIMD vector without a type qualifier encoding a bit index")	\
>> +    Y(SIMD_ELEMENT, reglane, "Em_INDEX2_13", 0, F(FLD_Rm, FLD_imm2_13),	\
>> +      "a SIMD vector without a type qualifier encoding a bit index")	\
>> +    Y(SIMD_ELEMENT, reglane, "Em_INDEX3_12", 0, F(FLD_Rm, FLD_imm3_12),	\
>> +      "a SIMD vector without a type qualifier encoding a bit index")	\
> 
> I think this is a better fit for simple_index (instead of reglane).  See also
> how the existing SME luti operands work.  Unless I've missed something, using
> simple_index would mean that you don't need to edit the inserter or extractor
> functions.
> 
Yes, you are right. I switched to simple_index and it worked. I have
also removed the references to the modified inserters and extractors
from the cover letter in the new version of this patch:
https://sourceware.org/pipermail/binutils/2024-May/134230.html
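
To illustrate why a generic, field-driven inserter is enough here, below
is a rough standalone sketch. The register number goes into Rm (bits
[20:16], as elsewhere in AArch64) and the index into whichever index
field the operand lists ([14], [14:13] or [14:12]), so no per-operand
switch is needed. The names are hypothetical and this is not the opcodes
code; it only mirrors the { lsb, width } field layout and the "total
field width minus 5" range calculation from the v1 patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A bit field described by its least significant bit and its width,
   in the same spirit as the { lsb, width } entries in aarch64-opc.c.  */
struct field
{
  unsigned lsb;
  unsigned width;
};

/* Field positions: Rm is bits [20:16]; the index fields are [14],
   [14:13] and [14:12] as described in the patch.  */
static const struct field fld_rm = { 16, 5 };
static const struct field fld_imm1_14 = { 14, 1 };
static const struct field fld_imm2_13 = { 13, 2 };
static const struct field fld_imm3_12 = { 12, 3 };

/* Return CODE with VALUE written into field F.  */
static uint32_t
insert_bits (uint32_t code, struct field f, uint32_t value)
{
  uint32_t mask = (UINT32_C (1) << f.width) - 1;
  assert (value <= mask);
  return (code & ~(mask << f.lsb)) | (value << f.lsb);
}

/* Return the value stored in field F of CODE.  */
static uint32_t
extract_bits (uint32_t code, struct field f)
{
  uint32_t mask = (UINT32_C (1) << f.width) - 1;
  return (code >> f.lsb) & mask;
}

/* Check INDEX against the width of the index field.  The operand's
   fields are Rm plus one index field, so the number of index bits is
   the total width minus the 5 bits of Rm.  */
static bool
index_in_range (struct field index_field, int64_t index)
{
  unsigned total_width = fld_rm.width + index_field.width;
  unsigned index_bits = total_width - 5;
  return index >= 0 && index <= (INT64_C (1) << index_bits) - 1;
}

/* Encode <Vm>[index]: the register number first, then the index.
   The same code serves all three operands; only INDEX_FIELD differs.  */
static uint32_t
encode_indexed_reg (uint32_t code, struct field index_field,
                    unsigned regno, unsigned index)
{
  code = insert_bits (code, fld_rm, regno);
  return insert_bits (code, index_field, index);
}
```

For example, starting from the luti2 (16B) base opcode 0x4e801000 in the
table above, encoding v3[2] with the [14:13] field gives 0x4e835000, and
extracting the two fields from that word returns 3 and 2 again.
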
>>       Y(SIMD_REGLIST, reglist, "LVn", 0, F(FLD_Rn),			\
>>         "a SIMD vector register list")					\
>>       Y(SIMD_REGLIST, ldst_reglist, "LVt", 0, F(),			\
>>         "a SIMD vector register list")					\
>>       Y(SIMD_REGLIST, ldst_reglist_r, "LVt_AL", 0, F(),			\
>>         "a SIMD vector register list")					\
>> +    Y(SIMD_REGLIST, lut_reglist, "LVn_LUT", 0, F(FLD_Rn),		\
>> +      "a SIMD vector register list")					\
>>       Y(SIMD_REGLIST, ldst_elemlist, "LEt", 0, F(),			\
>>         "a SIMD vector element list")					\
>>       Y(IMMEDIATE, imm, "CRn", 0, F(FLD_CRn),				\
>