Bug 30336 - The GNU Assembler has bugs in Intel syntax
Status: RESOLVED DUPLICATE of bug 12240
Alias: None
Product: binutils
Classification: Unclassified
Component: gas
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2023-04-12 01:51 UTC by Soomin Kim
Modified: 2024-01-26 19:04 UTC

Description Soomin Kim 2023-04-12 01:51:30 UTC
Hi, I'm Soomin Kim from KAIST SoftSec Lab.

We are reporting two x86-64 assembler bugs, both related to Intel assembly syntax. We found them while changing the label names of toy assembly programs.

--------------------------------

The first bug:
```
$ cat ./variant1.s
.intel_syntax noprefix
.text
or:
ret
call or
$ as -msyntax=intel -o ./variant1.o ./variant1.s
./variant1.s: Assembler messages:
./variant1.s:5: Error: invalid use of operator "or"
```
`GNU as` rejects this program because of the token `or`. The program was derived from the following one by renaming the label:
```
$ cat ./normal1.s
.intel_syntax noprefix
.text
LABEL:
ret
call LABEL
$ as -msyntax=intel -o ./normal1.o ./normal1.s
```
`GNU as` assembles this program, unlike `variant1.s`. However, I could not easily find out on the Internet why the name `or` matters. For example, the Wikipedia page on x86 assembly language (https://en.wikipedia.org/wiki/X86_assembly_language) lists several keywords but does not include `or`.

Surprisingly, `or` does not cause a problem in AT&T syntax. Consider the following program:
```
$ cat ./variant2.s
.text
or:
ret
call or
$ as -msyntax=att -o ./variant2.o ./variant2.s
```
We think this is a bug in `GNU as` because (1) the AT&T version is accepted by `GNU as`, and (2) there is no reason to reject the Intel version. Other uses of `or` (as an instruction mnemonic, for example) cannot appear as the operand of a `call` instruction, and the label `or` is clearly defined.

--------------------------------

The second bug:
```
$ cat ./variant1.s
.intel_syntax noprefix

.data
rsp:
.long 1
.long 2
.long 3
.long 4

.text
lea rax, [rsp] // rsp here is intended to refer to a pointer in .data section
$ as -msyntax=intel -o ./variant1.o ./variant1.s
$ objdump -d ./variant1.o
./variant1.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <.text>:
   0:   48 8d 04 24             lea    (%rsp),%rax
```
This bug is similar to the first one but differs in its effect: the program is accepted, yet the output is not what was intended. The original assembly program makes this easier to see:
```
$ cat ./normal.s
.intel_syntax noprefix

.data
LABEL:
.long 1
.long 2
.long 3
.long 4

.text
lea rax, [LABEL]
$ as -msyntax=intel -o ./normal.o ./normal.s
```
The original program loads the address of `LABEL` into the register `rax`. However, after we rename the label to `rsp`, which is also a register name, the resulting program has different semantics: the code emitted by `GNU as` copies the value of the register `rsp` into `rax`.

The problem is that the name is ambiguous between the label `rsp` and the register `rsp`, and `GNU as` silently picks the register without any diagnostic, so the program behaves in an unintended way.

This issue does not occur with AT&T syntax. Consider the following code:
```
$ cat ./variant2.s
.data
rsp:
.long 1
.long 2
.long 3
.long 4

.text
leaq (rsp), %rax
$ as -msyntax=att -o ./variant2.o ./variant2.s
$ objdump -d ./variant2.o

./variant2.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <.text>:
   0:   48 8d 04 25 00 00 00 00         lea    0x0,%rax
```
The label `rsp` is successfully transformed into a relocation entry in the object file.
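
The difference between the two syntaxes in both cases can be summarized with a small model. The following Python sketch is only an illustration under stated assumptions: it is not how `GNU as` is implemented, `classify_operand` is a hypothetical helper, and the operator and register sets are deliberately small subsets. It assumes that in Intel syntax with `noprefix` a bare word is checked against operator keywords and register names before it is treated as a symbol, while in AT&T syntax a register must carry a `%` prefix.

```python
# Simplified model of how a bare operand token might be classified.
# NOT gas's real parser; both name sets are illustrative subsets.

INTEL_OPERATORS = {"and", "or", "xor", "not", "shl", "shr", "mod"}
REGISTERS = {"rax", "rbx", "rcx", "rdx", "rsp", "rbp", "rsi", "rdi", "rip"}


def classify_operand(token, syntax):
    """Return 'operator', 'register', or 'symbol' for a bare operand token."""
    name = token.lower()  # case-insensitive matching is an assumption here
    if syntax == "intel":
        # Intel syntax with `noprefix`: keywords and registers are bare
        # words, so they take precedence over a label of the same name.
        if name in INTEL_OPERATORS:
            return "operator"
        if name in REGISTERS:
            return "register"
        return "symbol"
    if syntax == "att":
        # AT&T syntax: registers carry a '%' prefix, so a bare word
        # can only be a symbol.
        if name.startswith("%") and name[1:] in REGISTERS:
            return "register"
        return "symbol"
    raise ValueError("unknown syntax: %r" % (syntax,))


for tok in ("or", "rsp", "LABEL"):
    print(tok, classify_operand(tok, "intel"), classify_operand(tok, "att"))
```

Under this model, `or` and `rsp` are symbols only in AT&T syntax, which matches the behavior observed in both reports above.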

--------------------------------

We have shown two situations where label names confuse `GNU as`. We find them interesting because it is hard to say strictly that `GNU as` is wrong.

We think there are two possibilities:
(1) Intel syntax rejects the use of an opcode name as a label, or
(2) `GNU as` just mishandles the label.

In one sense, the ambiguity of Intel syntax (due to the absence of an official Intel assembly syntax manual) is the problem. For decades, many assemblers have been developed ad hoc without any standard. So deciding which tokens to allow or reject, and which interpretation to choose, is hard.

On the other hand, `GNU as` needs to handle both cases, because they may reduce its usability and correctness. A user might want to write a function named `or` but be rejected by `GNU as`. A user might want to load a data pointer named `rsp`, but the resulting program loads the stack pointer instead, contrary to the user's intention.

We suggest that `GNU as` should assemble the first case, and should either *not* assemble the second case or emit a diagnostic for it.
Comment 1 LIU Hao 2023-04-27 15:55:41 UTC
Also this:

```
E:\lh_mouse\Desktop>as --version
GNU assembler (GNU Binutils) 2.40
Copyright (C) 2023 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `x86_64-w64-mingw32'.

E:\lh_mouse\Desktop>cat test.s
.intel_syntax noprefix
mov eax, dword ptr shr[rip]

E:\lh_mouse\Desktop>as test.s
test.s: Assembler messages:
test.s:2: Error: invalid use of operator "shr"

```

GCC generates `mov eax, dword ptr shr[rip]` and Clang generates `mov eax, dword ptr [rip + shr]`, but AS accepts neither.
Comment 2 H.J. Lu 2023-04-28 15:46:38 UTC
Dup.

*** This bug has been marked as a duplicate of bug 12240 ***
Comment 3 Soomin Kim 2023-04-29 02:10:45 UTC
I see. The first bug seems to be a duplicate of bug 12240. How about the second one? Bug 12240 may not cover the second bug.
Comment 4 H.J. Lu 2023-05-01 16:49:04 UTC
(In reply to Soomin Kim from comment #3)
> I see. The first bug seems to be a duplicate of bug 12240. How about the
> second one? Bug 12240 may not cover the second bug.

I think it is a feature.
Comment 5 LIU Hao 2023-05-01 17:10:59 UTC
There is apparently confusion about using register names, instruction names, or other keywords as labels. People who write Intel assembly should know that those are 'bad' names for labels.

That is to say, expecting GAS to parse `rsp` as a label is impractical. Rather, GAS should have rejected the `rsp:` definition. A message such as 'error: `rsp` cannot be used as a label name' is clearer, more precise, and safer than silently producing unexpected code.
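
Until GAS itself emits such a diagnostic, a check along these lines could be run on a source file before assembling it. This is a minimal sketch in Python and not part of binutils: `find_risky_labels` is a hypothetical name, the label pattern is a simplification, and the operator and register sets are illustrative and far from complete.

```python
import re

# Illustrative, non-exhaustive name sets (assumptions of this sketch).
OPERATORS = {"and", "or", "xor", "not", "shl", "shr", "mod"}
REGISTERS = {"rax", "rbx", "rcx", "rdx", "rsp", "rbp", "rsi", "rdi", "rip"}

# A label definition: optional indentation, a name, then a colon.
LABEL_RE = re.compile(r"^\s*([A-Za-z_.$][\w.$]*)\s*:")


def find_risky_labels(source):
    """Return one diagnostic per label that collides with a reserved name."""
    diagnostics = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = LABEL_RE.match(line)
        if not match:
            continue
        name = match.group(1)
        lowered = name.lower()
        if lowered in REGISTERS:
            kind = "register"
        elif lowered in OPERATORS:
            kind = "operator"
        else:
            continue
        diagnostics.append(
            "%d: error: `%s` cannot be used as a label name (%s)"
            % (lineno, name, kind)
        )
    return diagnostics


example = """.intel_syntax noprefix
.data
rsp:
.long 1
.text
or:
ret
call or
"""
for message in find_risky_labels(example):
    print(message)
```

On the example input, which combines the two cases from the description, the sketch flags both `rsp:` and `or:` and leaves ordinary labels such as `LABEL:` alone.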