This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Re: x86 issues
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: "Jan Beulich" <jbeulich at novell dot com>
- Cc: binutils at sourceware dot org
- Date: Wed, 13 Feb 2008 07:46:30 -0800
- Subject: Re: x86 issues
- References: <47B30BE6.76E4.0078.0@novell.com>
Hi Jan,
Thanks for your patches. I will check them out.
I am planning to clean up the x86 assembler myself. The Linux x86
assembler supports both Intel and AT&T syntax. One difference is:
* In AT&T syntax the size of memory operands is determined from the
last character of the instruction mnemonic. Mnemonic suffixes of
`b', `w', `l' and `q' specify byte (8-bit), word (16-bit), long
(32-bit) and quadruple word (64-bit) memory references. Intel
syntax accomplishes this by prefixing memory operands (_not_ the
instruction mnemonics) with `byte ptr', `word ptr', `dword ptr'
and `qword ptr'. Thus, Intel `mov al, byte ptr FOO' is `movb FOO,
%al' in AT&T syntax.
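As a rough illustration of the rule quoted above, here is a minimal C sketch. It is not taken from the gas sources; the helper names att_suffix_size and intel_ptr_size are hypothetical, and the AT&T helper assumes it is called only after the full mnemonic has failed to match on its own (otherwise a mnemonic such as `sub' would be misread as having a `b' suffix).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: size in bytes implied by the last character of an
   AT&T mnemonic, or 0 if that character is not one of b, w, l, q.
   Assumption: the caller already knows the mnemonic carries a suffix. */
static int att_suffix_size(const char *mnemonic)
{
    size_t len = strlen(mnemonic);

    if (len == 0)
        return 0;
    switch (mnemonic[len - 1]) {
    case 'b': return 1;   /* byte, 8-bit */
    case 'w': return 2;   /* word, 16-bit */
    case 'l': return 4;   /* long, 32-bit */
    case 'q': return 8;   /* quadruple word, 64-bit */
    default:  return 0;
    }
}

/* Hypothetical helper: size in bytes implied by an Intel-style prefix on a
   memory operand, or 0 if the operand carries no such prefix. */
static int intel_ptr_size(const char *operand)
{
    static const struct {
        const char *prefix;
        int size;
    } table[] = {
        { "byte ptr ",  1 },
        { "word ptr ",  2 },
        { "dword ptr ", 4 },
        { "qword ptr ", 8 },
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strncmp(operand, table[i].prefix, strlen(table[i].prefix)) == 0)
            return table[i].size;
    return 0;
}
```

With these, att_suffix_size("movb") and intel_ptr_size("byte ptr FOO") both give 1, which matches the `mov al, byte ptr FOO' / `movb FOO, %al' pair above.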
The Linux assembler was originally written for AT&T syntax, and Intel
syntax support was added much later. There are some issues:
1. The assembler did not previously store size information for each
operand. I have changed the assembler infrastructure to add operand
size support. In Intel mode, we now have all operand sizes. In AT&T
mode, the memory operand has no size information.
2. The assembler uses the mnemonic suffix for both memory operand size
and instruction encoding. Intel syntax uses the mnemonic suffix for
encoding even when it isn't needed for operand size.
3. The assembler marks an instruction with No_bSuf, No_wSuf, No_lSuf,
No_sSuf, No_qSuf, No_ldSuf. These markers also serve 2 purposes: one for
valid register/memory operand size and the other for encoding.
We can clean up the assembler as follows:
1. Don't use mnemonic suffix for both memory operand size and
instruction encoding.
2. Replace No_bSuf, No_wSuf, No_lSuf, No_sSuf, No_qSuf, No_ldSuf in
the instruction template with the sizes allowed, like B, W, L, Q, S, LD.
The size list should only be used in AT&T syntax to match memory
operand size.
3. Add new fields in instruction template for encoding if needed.
4. Break one template into 2 or more if needed.
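To make items 2 and 3 more concrete, here is a minimal C sketch of what such a template could look like. It is not the existing gas data structure; the names size_mask, insn_template, suffix_to_size and template_allows_suffix, and the encoding field, are all hypothetical. LD is given a bit but no suffix character, since the text above does not say which character it uses.

```c
#include <stddef.h>

/* Hypothetical positive size flags, replacing the negative
   No_bSuf .. No_ldSuf markers. */
enum size_mask {
    SIZE_B  = 1 << 0,
    SIZE_W  = 1 << 1,
    SIZE_L  = 1 << 2,
    SIZE_Q  = 1 << 3,
    SIZE_S  = 1 << 4,
    SIZE_LD = 1 << 5
};

/* Hypothetical template layout: size matching and encoding are kept in
   separate fields so that neither has to stand in for the other. */
struct insn_template {
    const char *name;
    unsigned int sizes;      /* sizes allowed; AT&T memory operand matching only */
    unsigned int encoding;   /* assumed separate field for encoding choices */
};

/* Map an AT&T suffix character to its size flag, or 0 if unknown.
   LD is deliberately not mapped here. */
static unsigned int suffix_to_size(char suffix)
{
    switch (suffix) {
    case 'b': return SIZE_B;
    case 'w': return SIZE_W;
    case 'l': return SIZE_L;
    case 'q': return SIZE_Q;
    case 's': return SIZE_S;
    default:  return 0;
    }
}

/* Return nonzero if the template accepts a memory operand of the size
   named by the given AT&T suffix. */
static int template_allows_suffix(const struct insn_template *t, char suffix)
{
    unsigned int bit = suffix_to_size(suffix);

    return bit != 0 && (t->sizes & bit) != 0;
}
```

For a placeholder template declared as { "example", SIZE_W | SIZE_L | SIZE_Q, 0 }, template_allows_suffix accepts `w', `l' and `q' and rejects `b'. The encoding field is never consulted by the size check, which is the separation item 1 asks for.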
Do you have any comments?
Thanks.
H.J.
On Feb 13, 2008 6:25 AM, Jan Beulich <jbeulich@novell.com> wrote:
> As a follow-up to the changes just submitted/committed, here are two
> more things I think aren't working right at present:
>
> - There's no way to specify whether a floating point unit is available. I
> think that pseudo-ops .arch .{8087,287,387} should be added along
> with respective constants, so that instructions can be qualified
> accordingly and the floating point registers (it's really just st which
> ought to be affected) aren't visible without any of these enabled.
>
> - Currently, set_cpu_arch() infers 64-bit support from the current
> .codeXX setting. This seems rather odd, since this makes the following
> sequence behave rather paradoxically:
> .code64
> .arch i8086
> .code64
> (in that it fails on the second .code64). Instead, 64-bit mode should
> be a sub-feature like MMX/SSE etc. are, and perhaps entering (for
> example) 8086 should implicitly switch to 16-bit mode (or otherwise,
> the CPU change itself should fail).
>
> While I can certainly prepare patches to fix both, I'd first like to
> understand whether the described intended behavior is acceptable.
>
> Jan