As recently discussed on the Linux kernel mailing list, in gas the semantics for choosing the i386 NOP sequence are incorrect and will cause unexpected failures. They should chosen depending on march where new instructions can be used and not mtune where new instructions must not be used. http://thread.gmane.org/gmane.linux.kernel/730834/focus=731161 There is also another related issue with i686 class CPUs. Some do not actually support the NOPL instruction. Examples include Via C3, Via Eden, AMD Geode LX (as used in OLPC), Transmeta Crusoe and it appears broken on Virtual PC. Changing i686 to use f32_patt instead alt_long_patt allows binaries created with -march=i686 to run on all i686 class CPUs with only a minor optimization loss versus the gain from CMOV optimisation. The attached patch corrects both these issues.
Created attachment 2995 [details] binutils-cvs-fix-i386-nop.patch
Arch may be changed by the .arch directive. However, when we generate NOPs, we lost the .arch information. We need to know the arch for NOP, which may not be necessarily the same as the one specified by -march=.
But currently the only arch flags checked in i386_align_code was cpui686 which is where we should disable use of NOPL for mentioned reasons.
(In reply to comment #3) > But currently the only arch flags checked in i386_align_code was cpui686 which > is where we should disable use of NOPL for mentioned reasons. If you want to run the binary on CPU which doesn't support all i686 instructions, you shouldn't pass -march=i686 to assembler.
By default, assembler assumes CPU takes all instructions supported by assembler. You can use -march=XXX to limit instructions to only those supported by XXX.
Created attachment 2997 [details] binutils-cvs-fix2-i386-nop.patch Add bitfield checking back to patch for .arch settings. First point, do you not first agree that choosing instructions based on mtune is seriously broken? The other point here is about compatibility. Yes, these CPUs are missing this one instruction but since most builds these days are i686 by default this will break on these systems for users. It's only been fortunate that GCC strips out mtune options to gas. The optimisation lost is minor. I'm obviously an unknown person but the link given shows Linus Torvalds opinion on this which may or not mean much to you.
(In reply to comment #6) > Created an attachment (id=2997) > binutils-cvs-fix2-i386-nop.patch > > Add bitfield checking back to patch for .arch settings. > > First point, do you not first agree that choosing instructions based on mtune > is seriously broken? > > The other point here is about compatibility. Yes, these CPUs are missing this > one instruction but since most builds these days are i686 by default this will > break on these systems for users. It's only been fortunate that GCC strips out > mtune options to gas. The optimisation lost is minor. > > I'm obviously an unknown person but the link given shows Linus Torvalds opinion > on this which may or not mean much to you. For assembler, PROCESSOR_UNKNOWN means CPU will take any instructions, including those new NOPs. If you want to limit NOPs to certain ISA, XXX, you need to use -match=XXX/-mtune=XXX. I checked in a patch http://sourceware.org/ml/binutils/2008-10/msg00124.html Now you can use the .arch directive to limit NOPs to XXX.
(In reply to comment #6) > Created an attachment (id=2997) > binutils-cvs-fix2-i386-nop.patch > > Add bitfield checking back to patch for .arch settings. > > First point, do you not first agree that choosing instructions based on mtune > is seriously broken? > I agree. If it is true, assembler is broken. If you can show me a small testcase, I will fix it.
Sorry but how can PROCESSOR_UNKNOWN mean it can run any instructions? It's unknown! Therefore we must use lowest common denominator to expect it to work on all CPUs. Tune means scheduling. Arch means instructions. Test case. Pass -march=i486 and -mtune=i686 and it'll use alt_long_patt which won't work on i486. Fail.
(In reply to comment #9) > Sorry but how can PROCESSOR_UNKNOWN mean it can run any instructions? It's > unknown! Therefore we must use lowest common denominator to expect it to work on > all CPUs. If we do that, x86 assembler won't take any SSE instructions by default. It will break gcc. > Tune means scheduling. > Arch means instructions. That is true. > > Test case. Pass -march=i486 and -mtune=i686 and it'll use alt_long_patt which > won't work on i486. Fail. Where is the testcase?
This testcase works for me: [hjl@gnu-6 tmp]$ cat nops.s .text nop: movsbl %al,%esi .p2align 4 [hjl@gnu-6 tmp]$ as --32 -o nops.o nops.s -march=i486 -mtune=i686 [hjl@gnu-6 tmp]$ objdump -dw nops.o nops.o: file format elf32-i386 Disassembly of section .text: 00000000 <nop>: 0: 0f be f0 movsbl %al,%esi 3: 8d b6 00 00 00 00 lea 0x0(%esi),%esi 9: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi [hjl@gnu-6 tmp]$ as --32 -o nops.o nops.s -march=i686 -mtune=i686 [hjl@gnu-6 tmp]$ objdump -dw nops.o nops.o: file format elf32-i386 Disassembly of section .text: 00000000 <nop>: 0: 0f be f0 movsbl %al,%esi 3: 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:0x0(%eax,%eax,1) [hjl@gnu-6 tmp]$
Created attachment 4835 [details] testcase.s
(In reply to comment #8) > I agree. If it is true, assembler is broken. If you can show me > a small testcase, I will fix it. I can confirm the bug and I'd like to take you up on that offer, but I may have fallen short of the *small* test case :) I attached testcase.s which is some assembly caught from a gcc invokation inside glibc's compile process. It's 1300 lines (sorry!) but does confirm the usage of the nopl instruction when -mtune=i686 is passed : $ as -Qy -o foo testcase.s $ objdump -d foo | grep nopl $ as -Qy -mtune=i686 -o foo testcase.s $ objdump -d foo | grep nopl 3d: 0f 1f 00 nopl (%eax) Reproduced on both: GNU assembler version 2.19.51.0.14-37.fc12 20090722 GNU assembler version 2.20.51.0.2-17.fc14 20091009 Someone more savvy with assembly code could probably cut down my test case into something more trivial with a small amount of effort - I'm a bit out of my comfort zone.
(In reply to comment #12) > Created an attachment (id=4835) > testcase.s > Works for me: [hjl@gnu-6 tmp]$ as -Qy -mtune=i686 -o foo testcase.s -march=i486 --32 [hjl@gnu-6 tmp]$ objdump -d foo | grep nopl [hjl@gnu-6 tmp]$ You need -march=i486 to disable instructions.
Created attachment 4836 [details] testcase2.s Duh. The previous testcase.s was generated using "gcc -g" so its naturally huge. Taking off the -g results in a much smaller (95 line) test case which shows the issue in an identical manner.
Created attachment 4837 [details] testcase3.s One more. By deleting (more or less) random lines from testcase2.s I came up with a 9-line test case which produces 2 nopl instructions when assembled with -mtune=i686. $ as -mtune=i686 -o foo testcase3.s $ objdump -d foo Disassembly of section .text: 00000000 <.text>: 0: 55 push %ebp 1: 5d pop %ebp 2: c3 ret 3: 0f 1f 44 00 00 nopl 0x0(%eax,%eax,1) 8: c3 ret 9: 0f 1f 80 00 00 00 00 nopl 0x0(%eax)
Oops, sorry for the test case spam. I did not see your comment. I understand now. "as -mtune=i686" basically works out to be "as -march=i686 -mtune=i686" Hence generating i686-specific instructions is acceptable/expected. My confusion stemmed from the fact that this is a minor behavioural difference from the equivalently named parameters to gcc. (maybe this is why gcc does not pass those parameters down to the assembler by default)
That does not work for me. $ cat nopl.s cmove %eax,%edx .align 8 $ as --32 -mtune=i686 nopl.s $ objdump -dr | grep nopl 3: 0f 1f 44 00 00 nopl 0x0(%eax,%eax,1) $ as --32 -mtune=i686 -march=i486 nopl.s nopl.s: Assembler messages: nopl.s:1: Error: `cmove' is not supported on `i486' There should be a way to set -mtune while letting -march alone.
(In reply to comment #18) > That does not work for me. > > $ cat nopl.s > cmove %eax,%edx > .align 8 > $ as --32 -mtune=i686 nopl.s > $ objdump -dr | grep nopl > 3: 0f 1f 44 00 00 nopl 0x0(%eax,%eax,1) > $ as --32 -mtune=i686 -march=i486 nopl.s > nopl.s: Assembler messages: > nopl.s:1: Error: `cmove' is not supported on `i486' > > There should be a way to set -mtune while letting -march alone. Please state exactly what you have in mind. From $ cat nopl.s cmove %eax,%edx .align 8 you want i686 instructions, but you don't want to tune for i686. Is that correct?
I want that the assembler does not sneak in nopl.
(In reply to comment #20) > I want that the assembler does not sneak in nopl. Then you should use -mtune=i586 or an option not to tune for i686.
But the docs say that -mtune only uses non-generic insns when -march is also given.
-mtune uses all available instructions. By default, all instructions are enabled.
nopl is not always available on i686.
From Intel IA32/Intel64 SDM: The multi-byte form of NOP is available on processors with model encoding: • CPUID.01H.EAX[Bytes 11:8] = 0110B or 1111B Intel i686 has 111B.
What does that say about other vender's models?
Fixed by http://sourceware.org/ml/binutils-cvs/2010-08/msg00057.html
Still broken. $ cat nopl.s nop .align 8 $ gas/as-new --32 -mtune=i686 nopl.s $ objdump -d a.out: file format elf32-i386 Disassembly of section .text: 00000000 <.text>: 0: 90 nop 1: 0f 1f 80 00 00 00 00 nopl 0x0(%eax)
Fix: [rfc] nopl should not be output on -mtune=i686 http://sourceware.org/ml/binutils/2011-02/msg00070.html
CVSROOT: /cvs/src Module name: src Changes by: hjl@sourceware.org 2011-02-08 20:21:26 Modified files: gas : ChangeLog gas/config : tc-i386.c gas/testsuite : ChangeLog gas/testsuite/gas/i386: nops-1-i686.d nops-3-i686.d nops-4-i686.d Log message: Use f32_patt in i386_align_code when tuning for i686. gas/ 2011-02-08 H.J. Lu <hongjiu.lu@intel.com> PR gas/6957 * config/tc-i386.c (i386_align_code): Use f32_patt when tuning for i686. gas/testsuite/ 2011-02-08 H.J. Lu <hongjiu.lu@intel.com> PR gas/6957 * gas/i386/nops-1-i686.d: Updated. * gas/i386/nops-3-i686.d: Likewise. * gas/i386/nops-4-i686.d: Likewise. Patches: http://sourceware.org/cgi-bin/cvsweb.cgi/src/gas/ChangeLog.diff?cvsroot=src&r1=1.4399&r2=1.4400 http://sourceware.org/cgi-bin/cvsweb.cgi/src/gas/config/tc-i386.c.diff?cvsroot=src&r1=1.462&r2=1.463 http://sourceware.org/cgi-bin/cvsweb.cgi/src/gas/testsuite/ChangeLog.diff?cvsroot=src&r1=1.1846&r2=1.1847 http://sourceware.org/cgi-bin/cvsweb.cgi/src/gas/testsuite/gas/i386/nops-1-i686.d.diff?cvsroot=src&r1=1.4&r2=1.5 http://sourceware.org/cgi-bin/cvsweb.cgi/src/gas/testsuite/gas/i386/nops-3-i686.d.diff?cvsroot=src&r1=1.3&r2=1.4 http://sourceware.org/cgi-bin/cvsweb.cgi/src/gas/testsuite/gas/i386/nops-4-i686.d.diff?cvsroot=src&r1=1.3&r2=1.4
Fixed.
CVSROOT: /cvs/src Module name: src Branch: binutils-2_21-branch Changes by: gingold@sourceware.org 2011-03-23 16:00:21 Modified files: gas : ChangeLog gas/config : tc-i386.c gas/testsuite : ChangeLog gas/testsuite/gas/i386: nops-1-i686.d nops-3-i686.d nops-4-i686.d Log message: Backport of: Changes by: hjl@sourceware.org 2011-02-08 20:21:26 Modified files: gas : ChangeLog gas/config : tc-i386.c gas/testsuite : ChangeLog gas/testsuite/gas/i386: nops-1-i686.d nops-3-i686.d nops-4-i686.d Log message: Use f32_patt in i386_align_code when tuning for i686. gas/ 2011-02-08 H.J. Lu <hongjiu.lu@intel.com> PR gas/6957 * config/tc-i386.c (i386_align_code): Use f32_patt when tuning for i686. gas/testsuite/ 2011-02-08 H.J. Lu <hongjiu.lu@intel.com> PR gas/6957 * gas/i386/nops-1-i686.d: Updated. * gas/i386/nops-3-i686.d: Likewise. * gas/i386/nops-4-i686.d: Likewise. Patches: http://sourceware.org/cgi-bin/cvsweb.cgi/src/gas/ChangeLog.diff?cvsroot=src&only_with_tag=binutils-2_21-branch&r1=1.4320.2.22&r2=1.4320.2.23 http://sourceware.org/cgi-bin/cvsweb.cgi/src/gas/config/tc-i386.c.diff?cvsroot=src&only_with_tag=binutils-2_21-branch&r1=1.452&r2=1.452.2.1 http://sourceware.org/cgi-bin/cvsweb.cgi/src/gas/testsuite/ChangeLog.diff?cvsroot=src&only_with_tag=binutils-2_21-branch&r1=1.1802.2.7&r2=1.1802.2.8 http://sourceware.org/cgi-bin/cvsweb.cgi/src/gas/testsuite/gas/i386/nops-1-i686.d.diff?cvsroot=src&only_with_tag=binutils-2_21-branch&r1=1.4&r2=1.4.2.1 http://sourceware.org/cgi-bin/cvsweb.cgi/src/gas/testsuite/gas/i386/nops-3-i686.d.diff?cvsroot=src&only_with_tag=binutils-2_21-branch&r1=1.3&r2=1.3.2.1 http://sourceware.org/cgi-bin/cvsweb.cgi/src/gas/testsuite/gas/i386/nops-4-i686.d.diff?cvsroot=src&only_with_tag=binutils-2_21-branch&r1=1.3&r2=1.3.2.1
*** Bug 14066 has been marked as a duplicate of this bug. ***