Common SSE4.1/SSE5 insns broken

H.J. Lu hjl@lucon.org
Fri Dec 28 15:45:00 GMT 2007


On Fri, Dec 28, 2007 at 10:10:34AM +0100, Jakub Jelinek wrote:
> Hi!
> 
> Doesn't CpuSSE4_1|CpuSSE5 mean it requires SSE4.1 AND SSE5 rather than
> SSE4.1 OR SSE5?
> 
> ptest, 2, 0x660f3817, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
> roundpd, 3, 0x660f3a09, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
> roundps, 3, 0x660f3a08, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
> roundsd, 3, 0x660f3a0b, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
> roundss, 3, 0x660f3a0a, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
> 
> Say e.g.:
> 
> .arch generic64
> .arch .sse5
> ptest           %xmm1,%xmm0
> frczss          %xmm2, %xmm1
> 
> fails to assemble with
> Warning: `ptest' is not supported on `generic64.sse5'
> Error: suffix or operands invalid for `ptest'
> 
> and likewise for .arch .sse4.1.  Works if both .sse5 and .sse4.1
> are present.  Do we need yet another bit for the common
> SSE4.1 / SSE5 instructions, which .sse4.1, .sse5 would
> both set (and be set in unknown too)?
> 

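I am checking in this patch to fix it.  It adds a new CpuSSE4_1_Or_5
flag, set in CPU_SSE4_1_FLAGS, CPU_SSE4_2_FLAGS and CPU_SSE5_FLAGS, and
uses it for the instructions common to SSE4.1 and SSE5.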
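
The AND-versus-OR question quoted above can be illustrated with a
small sketch.  This is not the assembler's real matching code: the
F_* names, insn_supported and check_model are hypothetical, and the
rule "every required bit must be enabled" is an assumption chosen
because it reproduces the reported behaviour (ptest fails with only
.sse5 or only .sse4.1, and works with both).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model: one bit per CPU feature.  */
enum
{
  F_SSE4_1      = 1u << 0,
  F_SSE5        = 1u << 1,
  F_SSE4_1_OR_5 = 1u << 2	/* models the new common bit */
};

/* Assumed rule: an insn is accepted only if every bit it requires
   is enabled for the current .arch setting.  */
static bool
insn_supported (unsigned int required, unsigned int enabled)
{
  return (required & ~enabled) == 0;
}

/* What .arch .sse4.1 and .arch .sse5 enable after the patch
   (all unrelated bits omitted).  */
static const unsigned int arch_sse4_1 = F_SSE4_1 | F_SSE4_1_OR_5;
static const unsigned int arch_sse5 = F_SSE5 | F_SSE4_1_OR_5;

static void
check_model (void)
{
  /* Old table entry: CpuSSE4_1|CpuSSE5 demands both bits.  */
  unsigned int old_ptest = F_SSE4_1 | F_SSE5;
  assert (!insn_supported (old_ptest, F_SSE4_1));
  assert (!insn_supported (old_ptest, F_SSE5));
  assert (insn_supported (old_ptest, F_SSE4_1 | F_SSE5));

  /* New table entry: a single bit that either extension sets.  */
  unsigned int new_ptest = F_SSE4_1_OR_5;
  assert (insn_supported (new_ptest, arch_sse4_1));
  assert (insn_supported (new_ptest, arch_sse5));
}
```

Calling check_model () from a test program exercises both cases.
Under this model, OR-ing two bits in a table entry can only tighten
the requirement, so expressing "either extension" needs a dedicated
bit that both extensions enable.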

Thanks.


H.J.
----
gas/testsuite/

2007-12-28  H.J. Lu  <hongjiu.lu@intel.com>

	* gas/i386/arch-1.d: New file.
	* gas/i386/arch-1.s: Likewise.
	* gas/i386/arch-2.d: Likewise.
	* gas/i386/arch-2.s: Likewise.
	* gas/i386/arch-3.d: Likewise.
	* gas/i386/arch-3.s: Likewise.
	* gas/i386/arch-4.d: Likewise.
	* gas/i386/arch-4.s: Likewise.

	* gas/i386/i386.exp: Run arch-1, arch-2, arch-3 and arch-4.

opcodes/

2007-12-28  H.J. Lu  <hongjiu.lu@intel.com>

	* i386-gen.c (cpu_flag_init): Add CpuSSE4_1_Or_5 to
	CPU_SSE4_1_FLAGS, CPU_SSE4_2_FLAGS and CPU_SSE5_FLAGS.
	(cpu_flags): Add CpuSSE4_1_Or_5.

	* i386-init.h: Regenerated.
	* i386-tbl.h: Likewise.

	* i386-opc.h (CpuSSE4_1_Or_5): New.
	(CpuLM): Updated.
	(i386_cpu_flags): Add cpusse4_1_or_5.

	* i386-opc.tbl: Use CpuSSE4_1_Or_5 instead of CpuSSE4_1|CpuSSE5
	on ptest, roundpd, roundps, roundsd and roundss.

--- binutils/gas/testsuite/gas/i386/arch-1.d.arch	2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-1.d	2007-12-28 07:42:49.000000000 -0800
@@ -0,0 +1,15 @@
+#objdump: -dw
+#name: i386 arch 1
+
+.*:     file format .*
+
+Disassembly of section .text:
+
+0+ <.text>:
+[ 	]*[a-f0-9]+:	66 0f 38 17 c1       	ptest  %xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 09 c1 00    	roundpd \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 08 c1 00    	roundps \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 0b c1 00    	roundsd \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 0a c1 00    	roundss \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 38 41 d9       	phminposuw %xmm1,%xmm3
+#pass
--- binutils/gas/testsuite/gas/i386/arch-1.s.arch	2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-1.s	2007-12-28 07:41:44.000000000 -0800
@@ -0,0 +1,9 @@
+# Test .arch .sse4.1
+.arch generic32
+.arch .sse4.1
+ptest		%xmm1,%xmm0
+roundpd		$0,%xmm1,%xmm0
+roundps		$0,%xmm1,%xmm0
+roundsd		$0,%xmm1,%xmm0
+roundss		$0,%xmm1,%xmm0
+phminposuw	%xmm1,%xmm3
--- binutils/gas/testsuite/gas/i386/arch-2.d.arch	2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-2.d	2007-12-28 07:42:59.000000000 -0800
@@ -0,0 +1,15 @@
+#objdump: -dw
+#name: i386 arch 2
+
+.*:     file format .*
+
+Disassembly of section .text:
+
+0+ <.text>:
+[ 	]*[a-f0-9]+:	66 0f 38 17 c1       	ptest  %xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 09 c1 00    	roundpd \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 08 c1 00    	roundps \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 0b c1 00    	roundsd \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 0a c1 00    	roundss \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	f2 0f 38 f1 d9       	crc32l %ecx,%ebx
+#pass
--- binutils/gas/testsuite/gas/i386/arch-2.s.arch	2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-2.s	2007-12-28 07:41:48.000000000 -0800
@@ -0,0 +1,9 @@
+# Test .arch .sse4.2
+.arch generic32
+.arch .sse4.2
+ptest		%xmm1,%xmm0
+roundpd		$0,%xmm1,%xmm0
+roundps		$0,%xmm1,%xmm0
+roundsd		$0,%xmm1,%xmm0
+roundss		$0,%xmm1,%xmm0
+crc32		%ecx,%ebx
--- binutils/gas/testsuite/gas/i386/arch-3.d.arch	2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-3.d	2007-12-28 07:43:08.000000000 -0800
@@ -0,0 +1,15 @@
+#objdump: -dw
+#name: i386 arch 3
+
+.*:     file format .*
+
+Disassembly of section .text:
+
+0+ <.text>:
+[ 	]*[a-f0-9]+:	66 0f 38 17 c1       	ptest  %xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 09 c1 00    	roundpd \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 08 c1 00    	roundps \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 0b c1 00    	roundsd \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 0a c1 00    	roundss \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	f2 0f 38 f1 d9       	crc32l %ecx,%ebx
+#pass
--- binutils/gas/testsuite/gas/i386/arch-3.s.arch	2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-3.s	2007-12-28 07:41:53.000000000 -0800
@@ -0,0 +1,9 @@
+# Test .arch .sse4
+.arch generic32
+.arch .sse4
+ptest		%xmm1,%xmm0
+roundpd		$0,%xmm1,%xmm0
+roundps		$0,%xmm1,%xmm0
+roundsd		$0,%xmm1,%xmm0
+roundss		$0,%xmm1,%xmm0
+crc32		%ecx,%ebx
--- binutils/gas/testsuite/gas/i386/arch-4.d.arch	2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-4.d	2007-12-28 07:43:16.000000000 -0800
@@ -0,0 +1,15 @@
+#objdump: -dw
+#name: i386 arch 4
+
+.*:     file format .*
+
+Disassembly of section .text:
+
+0+ <.text>:
+[ 	]*[a-f0-9]+:	66 0f 38 17 c1       	ptest  %xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 09 c1 00    	roundpd \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 08 c1 00    	roundps \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 0b c1 00    	roundsd \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	66 0f 3a 0a c1 00    	roundss \$0x0,%xmm1,%xmm0
+[ 	]*[a-f0-9]+:	0f 7a 12 ca          	frczss %xmm2,%xmm1
+#pass
--- binutils/gas/testsuite/gas/i386/arch-4.s.arch	2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-4.s	2007-12-28 07:41:59.000000000 -0800
@@ -0,0 +1,9 @@
+# Test .arch .sse5
+.arch generic32
+.arch .sse5
+ptest           %xmm1,%xmm0
+roundpd		$0,%xmm1,%xmm0
+roundps		$0,%xmm1,%xmm0
+roundsd		$0,%xmm1,%xmm0
+roundss		$0,%xmm1,%xmm0
+frczss          %xmm2, %xmm1
--- binutils/gas/testsuite/gas/i386/i386.exp.arch	2007-12-23 21:28:11.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/i386.exp	2007-12-28 07:29:28.000000000 -0800
@@ -98,6 +98,10 @@ if [expr ([istarget "i*86-*-*"] ||  [ist
     run_dump_test "i386"
     run_dump_test "compat"
     run_dump_test "compat-intel"
+    run_dump_test "arch-1"
+    run_dump_test "arch-2"
+    run_dump_test "arch-3"
+    run_dump_test "arch-4"
 
     # These tests require support for 8 and 16 bit relocs,
     # so we only run them for ELF and COFF targets.
--- binutils/opcodes/i386-gen.c.arch	2007-12-23 21:28:11.000000000 -0800
+++ binutils/opcodes/i386-gen.c	2007-12-28 07:08:45.000000000 -0800
@@ -93,9 +93,9 @@ static initializer cpu_flag_init [] =
   { "CPU_SSSE3_FLAGS",
     "CpuMMX|CpuMMX2|CpuSSE|CpuSSE2|CpuSSE3|CpuSSSE3" },
   { "CPU_SSE4_1_FLAGS",
-    "CpuMMX|CpuMMX2|CpuSSE|CpuSSE2|CpuSSE3|CpuSSSE3|CpuSSE4_1" },
+    "CpuMMX|CpuMMX2|CpuSSE|CpuSSE2|CpuSSE3|CpuSSSE3|CpuSSE4_1|CpuSSE4_1_Or_5" },
   { "CPU_SSE4_2_FLAGS",
-    "CpuMMX|CpuMMX2|CpuSSE|CpuSSE2|CpuSSE3|CpuSSSE3|CpuSSE4_1|CpuSSE4_2" },
+    "CpuMMX|CpuMMX2|CpuSSE|CpuSSE2|CpuSSE3|CpuSSSE3|CpuSSE4_1|CpuSSE4_2|CpuSSE4_1_Or_5" },
   { "CPU_3DNOW_FLAGS",
     "CpuMMX|Cpu3dnow" },
   { "CPU_3DNOWA_FLAGS",
@@ -109,7 +109,7 @@ static initializer cpu_flag_init [] =
   { "CPU_ABM_FLAGS",
     "CpuABM" },
   { "CPU_SSE5_FLAGS",
-    "CpuMMX|CpuMMX2|CpuSSE|CpuSSE2|CpuSSE3|CpuSSE4a|CpuABM|CpuSSE5"}
+    "CpuMMX|CpuMMX2|CpuSSE|CpuSSE2|CpuSSE3|CpuSSE4a|CpuABM|CpuSSE5|CpuSSE4_1_Or_5"}
 };
 
 static initializer operand_type_init [] =
@@ -234,6 +234,7 @@ static bitfield cpu_flags[] =
   BITFIELD (CpuSSE4_2),
   BITFIELD (CpuSSE4a),
   BITFIELD (CpuSSE5),
+  BITFIELD (CpuSSE4_1_Or_5),
   BITFIELD (Cpu3dnow),
   BITFIELD (Cpu3dnowA),
   BITFIELD (CpuPadLock),
--- binutils/opcodes/i386-opc.h.arch	2007-12-23 21:28:11.000000000 -0800
+++ binutils/opcodes/i386-opc.h	2007-12-28 07:04:57.000000000 -0800
@@ -82,8 +82,10 @@
 #define CpuSSE4_2	(CpuSSE4_1 + 1)
 /* SSE5 support required */
 #define CpuSSE5		(CpuSSE4_2 + 1)
+/* SSE4.1 or SSE5 support required */
+#define CpuSSE4_1_Or_5	(CpuSSE5 + 1)
 /* 64bit support available, used by -march= in assembler.  */
-#define CpuLM		(CpuSSE5 + 1)
+#define CpuLM		(CpuSSE4_1_Or_5 + 1)
 /* 64bit support required  */
 #define Cpu64		(CpuLM + 1)
 /* Not supported in the 64bit mode  */
@@ -132,6 +134,7 @@ typedef union i386_cpu_flags
       unsigned int cpusse4_1:1;
       unsigned int cpusse4_2:1;
       unsigned int cpusse5:1;
+      unsigned int cpusse4_1_or_5:1;
       unsigned int cpulm:1;
       unsigned int cpu64:1;
       unsigned int cpuno64:1;
--- binutils/opcodes/i386-opc.tbl.arch	2007-12-23 21:28:11.000000000 -0800
+++ binutils/opcodes/i386-opc.tbl	2007-12-28 07:07:07.000000000 -0800
@@ -1373,11 +1373,11 @@ pmovzxwq, 2, 0x660f3834, None, 3, CpuSSE
 pmovzxdq, 2, 0x660f3835, None, 3, CpuSSE4_1, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
 pmuldq, 2, 0x660f3828, None, 3, CpuSSE4_1, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
 pmulld, 2, 0x660f3840, None, 3, CpuSSE4_1, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
-ptest, 2, 0x660f3817, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
-roundpd, 3, 0x660f3a09, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
-roundps, 3, 0x660f3a08, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
-roundsd, 3, 0x660f3a0b, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
-roundss, 3, 0x660f3a0a, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
+ptest, 2, 0x660f3817, None, 3, CpuSSE4_1_Or_5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
+roundpd, 3, 0x660f3a09, None, 3, CpuSSE4_1_Or_5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
+roundps, 3, 0x660f3a08, None, 3, CpuSSE4_1_Or_5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
+roundsd, 3, 0x660f3a0b, None, 3, CpuSSE4_1_Or_5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
+roundss, 3, 0x660f3a0a, None, 3, CpuSSE4_1_Or_5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
 
 // SSE4.2 instructions.
 


