[RFC] x86: proposal for a new .insn directive

Fri Jan 13 11:58:16 GMT 2023

All,

certain other architectures (Arm, RISC-V) already have such a directive, and
x86 would imo benefit from one even more: It is notoriously difficult to
encode new insns with operands when a given version of gas doesn't support
them yet. The difficulty lies in particular in building the ModR/M and SIB
bytes as well as the VEX/XOP/EVEX prefixes.

I would appreciate feedback on the proposal (in the form of an assembly
source file, which provides examples at the same time). Besides pointing
out issues / oversights, thoughts on the various TBDs would be helpful.

Thanks, Jan

	.text
insn:

#	.insn [<prefix>] [<encoding>] <major-opcode>[+r|/<extension>] [,<operand>[,...]]

# Legacy encoding prefixes altering encoding space (0x0f, 0x0f38, 0x0f3a)
# have to be specified as high byte(s) of <major-opcode>. This also extends
# to certain FPU opcodes or sub-spaces like that of major opcode 0x0f01.

# Legacy encoding prefixes altering meaning (0x66, 0xF2, 0xF3) may be
# specified as high byte of <major-opcode> (perhaps already including an
# encoding space prefix). Other prefixes should be spelled out as usual
# ahead of <major-opcode> or, for segment overrides, with the memory
# operand.

# Operand order may not match that of the instruction actually being
# expressed: While for a memory operand (of which there can be only one) it
# is clear how to encode it in the resulting ModR/M byte, register operands
# are encoded strictly in the order
# - ModR/M.rm, ModR/M.reg for 2-operand insns,
# - ModR/M.rm, {E,}VEX.vvvv, ModR/M.reg for 3-operand insns, and
# - Imm{4,5}, ModR/M.rm, {E,}VEX.vvvv, ModR/M.reg for 4-operand insns,
# obviously with the ModR/M.rm slot skipped when there is a memory operand,
# and obviously with the ModR/M.reg slot skipped when there is an extension
# opcode. (For Intel syntax of course all in the opposite order.)

# Immediate operands (including immediate-like displacements, i.e. when not
# part of ModR/M addressing) should be specified by separate .byte / .word /
# .long / .quad (or similar) directives.
# TBD: How to deal with this for RIP-relative addressing?
# TBD: How to deal with this for 4-operand insns?

# When register operand size varies for an actual insn (like e.g. for MOVZX or
# VPMOVZX*), registers nevertheless need spelling out in a uniform manner, such
# that any of them could be used to derive operand size attributes (e.g.
# operand size prefix, REX.W, VEX.W, or VEX.L) as well as the EVEX Disp8
# scaling factor.
# TBD: Could also go from largest operand size, albeit that may end up confusing
#      in AT&T mode, where memory operands don't have size, yet the memory
#      operand may have larger size than the register one(s) (and would hence be
#      the one which the <len> attribute - see below - needs deriving from).

# For VEX / XOP / EVEX <encoding> is arranged like this:
# {VEX,XOP,EVEX}[.<len>][.<prefix>][.<space>][.<w>]
# where
# - <len> can be LIG, 128, 256, or (EVEX only) 512 as well as L0/L1 for
#   VEX / XOP and L0-L3 for EVEX,
# - <prefix> can be NP, 66, F3, or F2,
# - <space> can be
#   - 0f, 0f38, 0f3a, or M0...M31 for VEX,
#   - 08...3f (hex) for XOP,
#   - 0f, 0f38, 0f3a, or M0...M15 for EVEX,
# - <w> can be WIG, W0, or W1.
# Omitted <len> means "infer from operand size" if there is at least one
# sized operand, or LIG otherwise.
# Omitted <prefix> means NP.
# Omitted <space> implies encoding is taken from <major-opcode>.
# Omitted <w> means "infer from GPR operand size" if there is at least
# one GPR operand, or WIG otherwise.

# TBD: Is operand order being dependent on AT&T vs Intel syntax okay?

	.insn 0x90					# nop
	.insn 0xf390					# pause
	.insn rep 0x90					# pause
	.insn 0xd9c9					# fxch
	.insn 0xf30f01d9				# vmgexit
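
# For comparison, a hand-written sketch of the bytes the five directives above
# are expected to produce. This assumes that <major-opcode> is emitted most
# significant byte first and that "rep" contributes a plain 0xF3 prefix.
	.byte 0x90					# nop
	.byte 0xf3, 0x90				# pause (either spelling)
	.byte 0xd9, 0xc9				# fxch
	.byte 0xf3, 0x0f, 0x01, 0xd9			# vmgexit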

	.insn 0x89, %ecx, %eax				# mov %eax, %ecx
	.insn 0x89, %ax, %cx				# mov %cx, %ax

	.insn 0x8b, (%eax), %ecx			# mov (%eax), %ecx

	.insn 0x0fc8+r, %edx				# bswap %edx
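
# Hand-assembled equivalents of the ModR/M and +r examples above, as a sketch
# assuming 32-bit mode and the rm-then-reg register order described earlier.
# ModR/M is built as (mod << 6) | (reg << 3) | rm, with mod = 3 for register
# operands and mod = 0 for plain (%eax).
	.byte 0x89, 0xc1				# .insn 0x89, %ecx, %eax (rm=%ecx, reg=%eax)
	.byte 0x66, 0x89, 0xc8				# .insn 0x89, %ax, %cx (rm=%ax, reg=%cx, operand size prefix)
	.byte 0x8b, 0x08				# .insn 0x8b, (%eax), %ecx
	.byte 0x0f, 0xca				# .insn 0x0fc8+r, %edx (0xc8 + 2)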

	.insn lock 0x80/0, %fs:(%eax); .byte 1		# lock addb $1, %fs:(%eax)

1:
	.insn 0xe2; .byte 1b-.-1			# loop 1b
	.insn 0xc7f8; .long 1b-.-4			# xbegin 1b

	.insn 0x0fb6, %ax, %cx				# movzx %al, %cx
	.insn 0x0fb7, %eax, %ecx			# movzx %ax, %ecx
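
# Sketch of the expected bytes for the two MOVZX examples above, assuming
# 32-bit mode. The operand size prefix in the first line is assumed to be
# derived from the uniformly spelled 16-bit registers.
	.byte 0x66, 0x0f, 0xb6, 0xc8			# movzx %al, %cx
	.byte 0x0f, 0xb7, 0xc8				# movzx %ax, %ecx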

	.insn VEX.66.0F 0x58, %xmm0, %xmm1, %xmm2	# vaddpd %xmm0, %xmm1, %xmm2
	.insn VEX.66 0x0f58, %ymm0, %ymm1, %ymm2	# vaddpd %ymm0, %ymm1, %ymm2
	.insn VEX.LIG.F3.0F 0x58, %xmm0, %xmm1, %xmm2	# vaddss %xmm0, %xmm1, %xmm2
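
# The three VEX examples above spelled out by hand, as a sketch. Assumed here:
# the 2-byte (0xC5) VEX form is picked whenever it can express the insn, and
# LIG is emitted as L=0. The second byte holds inverted R, inverted vvvv
# (1110 for register 1), L, and pp.
	.byte 0xc5, 0xf1, 0x58, 0xd0			# vaddpd %xmm0, %xmm1, %xmm2
	.byte 0xc5, 0xf5, 0x58, 0xd0			# vaddpd %ymm0, %ymm1, %ymm2
	.byte 0xc5, 0xf2, 0x58, 0xd0			# vaddss %xmm0, %xmm1, %xmm2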

	.insn VEX.66.0F3A.W0 0x68, %xmm0, %xmm1, (%edx), %xmm3		# vfmaddps %xmm0, %xmm1, (%edx), %xmm3
	.insn VEX.66.0F3A.W1 0x68, %xmm0, %xmm1, (%edx), %xmm3		# vfmaddps %xmm0, %xmm1, %xmm3, (%edx)
	.insn VEX.66.0F3A.W1 0x68, %xmm0, %xmm1, %xmm2, (%ebx)		# vfmaddps %xmm0, %xmm1, %xmm2, (%ebx)

	.insn VEX.66.0F3A.W0 0x48, $0, %xmm0, %xmm1, (%edx), %xmm3	# vpermil2ps $0, %xmm0, %xmm1, (%edx), %xmm3
	.insn VEX.66.0F3A.W1 0x48, $1, %xmm0, %xmm1, (%edx), %xmm3	# vpermil2ps $1, %xmm0, %xmm1, %xmm3, (%edx)
	.insn VEX.66.0F3A.W1 0x48, $2, %xmm0, %xmm1, %xmm2, (%ebx)	# vpermil2ps $2, %xmm0, %xmm1, %xmm2, (%ebx)

	.insn VEX.L0.0F.W0 0x93, %eax, %k0		# kmovw %eax, %k0

	.insn VEX.256.0F.WIG 0x77			# vzeroall

	.insn EVEX.NP.0F.W0 0x58, {rn-sae}, %zmm0, %zmm1, %zmm2		# vaddps {rn-sae}, %zmm0, %zmm1, %zmm2
	.insn EVEX.66.0F.W1 0x58, 8(%eax){1to8}, %zmm1, %zmm2{%k2}{z}	# vaddpd 8(%eax){1to8}, %zmm1, %zmm2{%k2}{z}
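
# Hand-derived bytes for the two EVEX examples above, as a sketch assuming
# 32-bit mode. For the second one the displacement of 8 is assumed to be
# compressed using the broadcast element size (8 bytes with W1), which gives
# a Disp8 of 1.
	.byte 0x62, 0xf1, 0x74, 0x18, 0x58, 0xd0		# vaddps {rn-sae}, %zmm0, %zmm1, %zmm2
	.byte 0x62, 0xf1, 0xf5, 0xda, 0x58, 0x50, 0x01		# vaddpd 8(%eax){1to8}, %zmm1, %zmm2{%k2}{z}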

# TBD: How to specify the Disp8 scaling factor here? (In Intel syntax we can simply
#      use memory operand size.)
	.insn EVEX.66.0F38.W0 0x88, 4(%eax), %ymm1	# vexpandps 4(%eax), %ymm1