Hi, I'm Soomin Kim from KAIST SoftSec Lab. We are reporting two x86-64 assembler bugs, both related to Intel assembly syntax. We found them while changing the label names of toy assembly programs.

--------------------------------

The first bug:

```
$ cat ./variant1.s
.intel_syntax noprefix
.text
or:
ret
call or
$ as -msyntax=intel -o ./variant1.o ./variant1.s
./variant1.s: Assembler messages:
./variant1.s:5: Error: invalid use of operator "or"
```

`GNU as` rejects this program because of the token `or`. The program was derived from the following one by renaming the label:

```
$ cat ./normal1.s
.intel_syntax noprefix
.text
LABEL:
ret
call LABEL
$ as -msyntax=intel -o ./normal1.o ./normal1.s
```

Unlike `variant1.s`, this program is assembled by `GNU as` without error. However, I could not easily find any explanation on the Internet of why the name `or` matters. For example, the Wikipedia page on x86 assembly (https://en.wikipedia.org/wiki/X86_assembly_language) lists several keywords but does not include `or`.

Surprisingly, `or` does not cause a problem in AT&T syntax. Please refer to the program below:

```
$ cat ./variant2.s
.text
or:
ret
call or
$ as -msyntax=att -o ./variant2.o ./variant2.s
```

We believe this is a bug in `GNU as` because (1) the AT&T version is accepted by `GNU as`, and (2) there is no reason to reject the Intel version. Other uses of `or` (an instruction mnemonic, for example) cannot appear as the operand of a `call` instruction, and the label `or` is clearly defined.
--------------------------------

The second bug:

```
$ cat ./variant1.s
.intel_syntax noprefix
.data
rsp:
.long 1
.long 2
.long 3
.long 4
.text
lea rax, [rsp] // rsp here is intended to refer to a pointer in .data section
$ as -msyntax=intel -o ./variant1.o ./variant1.s
$ objdump -d ./variant1.o
./variant1.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <.text>:
0: 48 8d 04 24 lea (%rsp),%rax
```

This bug is similar to the first one but differs in an important way: the program is accepted, and the wrong code is generated. The original assembly program makes this easier to see:

```
$ cat ./normal.s
.intel_syntax noprefix
.data
LABEL:
.long 1
.long 2
.long 3
.long 4
.text
lea rax, [LABEL]
$ as -msyntax=intel -o ./normal1.o ./normal1.s
```

The original program loads the address of `LABEL` into the register `rax`. However, after we rename the label to `rsp`, which is also a register name, the resulting program has different semantics: the code emitted by `GNU as` copies the value of the register `rsp` into `rax`. The problem is that the operand is ambiguous between the label `rsp` and the register `rsp`, yet `GNU as` silently picks one of them (here, the register) without any diagnostic, so the program behaves in an unintended way.

As with the first bug, this issue does not occur with AT&T syntax. Please refer to the code below:

```
$ cat ./variant2.s
.data
rsp:
.long 1
.long 2
.long 3
.long 4
.text
leaq (rsp), %rax
$ as -msyntax=att -o ./variant2.o ./variant2.s
$ objdump -d ./variant2.o
./variant2.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <.text>:
0: 48 8d 04 25 00 00 00 00 lea 0x0,%rax
```

The label `rsp` is successfully transformed into a relocation entry in the object file.

--------------------------------

We have seen two different situations in which the name of a label can confuse `GNU as`.
We find these cases interesting because it is hard to say strictly that `GNU as` is wrong. We see two possibilities: (1) Intel syntax forbids the use of an opcode name as a label, or (2) `GNU as` simply mishandles the label. In one sense, the underlying problem is the ambiguity of Intel syntax itself, since there is no official Intel assembly syntax manual. For decades, assemblers have been developed ad hoc without a common standard, so deciding which tokens to allow or reject, or which interpretation to choose, is genuinely hard.

On the other hand, `GNU as` needs to handle both cases, because they reduce its usability and correctness. A user might want to write a function named `or`, but it is rejected by `GNU as`. A user might want to load a data pointer named `rsp`, but the resulting program loads the stack pointer instead, which may not be what the user intended. We suggest that `GNU as` should assemble the first case, and should *not* assemble the second case, or should at least emit a warning for it.
Also this:

```
E:\lh_mouse\Desktop>as --version
GNU assembler (GNU Binutils) 2.40
Copyright (C) 2023 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `x86_64-w64-mingw32'.
E:\lh_mouse\Desktop>cat test.s
.intel_syntax noprefix
mov eax, dword ptr shr[rip]
E:\lh_mouse\Desktop>as test.s
test.s: Assembler messages:
test.s:2: Error: invalid use of operator "shr"
```

GCC generates `mov eax, dword ptr shr[rip]` and Clang generates `mov eax, dword ptr [rip + shr]`, but AS accepts neither.
Dup. *** This bug has been marked as a duplicate of bug 12240 ***
I see. The first bug seems to be a duplicate of bug 12240. How about the second one? Bug 12240 may not cover the second bug.
(In reply to Soomin Kim from comment #3)
> I see. The first bug seems to be a duplicate of bug 12240. How about the
> second one? Bug 12240 may not cover the second bug.

I think it is a feature.
There is apparently some confusion about using register names, instruction names, or other keywords as labels. People who write Intel assembly should know that those are 'bad' names for labels. That is to say, expecting GAS to parse `rsp` as a label is impractical. Rather, GAS should have rejected the `rsp:` definition. An error such as 'error: `rsp` cannot be used as a label name' would be clearer, more precise, and safer than silently producing unexpected code.
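Until GAS diagnoses this itself, the same check can be approximated outside the assembler. Below is a minimal sketch in Python of a pre-assembly lint that flags label definitions whose names collide with register or Intel-syntax operator names. The function name `risky_labels` and both name lists are hypothetical and deliberately incomplete; only `or`, `shr`, and `rsp` are confirmed as problematic by the reports above, and the real tables inside GAS may differ.

```python
import re

# Illustrative, incomplete name lists. They are assumptions for this sketch,
# not the tables GAS actually uses.
REGISTERS = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp", "rip"}
OPERATORS = {"and", "or", "xor", "not", "shl", "shr", "mod", "offset"}

# A label definition: optional whitespace, a symbol name, then a colon.
LABEL_RE = re.compile(r"^\s*([A-Za-z_.$][\w.$]*)\s*:")


def risky_labels(source):
    """Return (line_number, label, kind) for every label definition whose
    name collides with a register or operator name from the lists above."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = LABEL_RE.match(line)
        if not match:
            continue
        name = match.group(1)
        key = name.lower()  # conservative choice: compare case-insensitively
        if key in REGISTERS:
            findings.append((lineno, name, "register"))
        elif key in OPERATORS:
            findings.append((lineno, name, "operator"))
    return findings


if __name__ == "__main__":
    sample = """.intel_syntax noprefix
.data
rsp:
.long 1
.text
or:
ret
call or
"""
    for lineno, name, kind in risky_labels(sample):
        print(f"line {lineno}: label '{name}' is also a {kind} name")
```

This only inspects label definitions, not uses, and it does not know which syntax mode is active, so it would also flag names that are harmless under AT&T syntax. It is meant as a stopgap for generated assembly, not as a substitute for a diagnostic in GAS.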