[PATCH 1/3] RISC-V: re-arrange opcode table for consistent alias handling

Jan Beulich jbeulich@suse.com
Wed Jul 12 08:15:06 GMT 2023


On 11.07.2023 23:02, Fangrui Song wrote:
> On Fri, Sep 16, 2022 at 2:53 AM Nelson Chu <nelson@rivosinc.com> wrote:
>>
>> On Thu, Sep 15, 2022 at 3:42 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>
>>> On 15.09.2022 04:30, Nelson Chu wrote:
>>>> On Tue, Sep 13, 2022 at 9:02 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>> --- a/opcodes/riscv-opc.c
>>>>> +++ b/opcodes/riscv-opc.c
>>>>> @@ -290,9 +290,9 @@ const struct riscv_opcode riscv_opcodes[
>>>>>  {"jalr",        0, INSN_CLASS_I, "d,s,j",     MATCH_JALR, MASK_JALR, match_opcode, INSN_JSR },
>>>>>  {"j",           0, INSN_CLASS_C, "Ca",        MATCH_C_J, MASK_C_J, match_opcode, INSN_ALIAS|INSN_BRANCH },
>>>>>  {"j",           0, INSN_CLASS_I, "a",         MATCH_JAL, MASK_JAL|MASK_RD, match_opcode, INSN_ALIAS|INSN_BRANCH },
>>>>> +{"jal",         0, INSN_CLASS_I, "a",         MATCH_JAL|(X_RA << OP_SH_RD), MASK_JAL|MASK_RD, match_opcode, INSN_ALIAS|INSN_JSR },
>>>>>  {"jal",         0, INSN_CLASS_I, "d,a",       MATCH_JAL, MASK_JAL, match_opcode, INSN_JSR },
>>>>>  {"jal",        32, INSN_CLASS_C, "Ca",        MATCH_C_JAL, MASK_C_JAL, match_opcode, INSN_ALIAS|INSN_JSR },
>>>>> -{"jal",         0, INSN_CLASS_I, "a",         MATCH_JAL|(X_RA << OP_SH_RD), MASK_JAL|MASK_RD, match_opcode, INSN_ALIAS|INSN_JSR },
>>>>>  {"call",        0, INSN_CLASS_I, "d,c",       (X_T1 << OP_SH_RS1), (int) M_CALL, match_never, INSN_MACRO },
>>>>>  {"call",        0, INSN_CLASS_I, "c",         (X_RA << OP_SH_RS1)|(X_RA << OP_SH_RD), (int) M_CALL, match_never, INSN_MACRO },
>>>>>  {"tail",        0, INSN_CLASS_I, "c",         (X_T1 << OP_SH_RS1), (int) M_CALL, match_never, INSN_MACRO },
>>>>> @@ -310,13 +310,13 @@ const struct riscv_opcode riscv_opcodes[
>>>>>  {"move",        0, INSN_CLASS_C, "d,CV",      MATCH_C_MV, MASK_C_MV, match_c_add, INSN_ALIAS },
>>>>>  {"move",        0, INSN_CLASS_I, "d,s",       MATCH_ADDI, MASK_ADDI|MASK_IMM, match_opcode, INSN_ALIAS },
>>>>>  {"zext.b",      0, INSN_CLASS_I, "d,s",       MATCH_ANDI|ENCODE_ITYPE_IMM (255), MASK_ANDI | MASK_IMM, match_opcode, INSN_ALIAS },
>>>>> -{"andi",        0, INSN_CLASS_C, "Cs,Cw,Co",  MATCH_C_ANDI, MASK_C_ANDI, match_opcode, INSN_ALIAS },
>>>>> -{"andi",        0, INSN_CLASS_I, "d,s,j",     MATCH_ANDI, MASK_ANDI, match_opcode, 0 },
>>>>>  {"and",         0, INSN_CLASS_C, "Cs,Cw,Ct",  MATCH_C_AND, MASK_C_AND, match_opcode, INSN_ALIAS },
>>>>>  {"and",         0, INSN_CLASS_C, "Cs,Ct,Cw",  MATCH_C_AND, MASK_C_AND, match_opcode, INSN_ALIAS },
>>>>>  {"and",         0, INSN_CLASS_C, "Cs,Cw,Co",  MATCH_C_ANDI, MASK_C_ANDI, match_opcode, INSN_ALIAS },
>>>>>  {"and",         0, INSN_CLASS_I, "d,s,t",     MATCH_AND, MASK_AND, match_opcode, 0 },
>>>>>  {"and",         0, INSN_CLASS_I, "d,s,j",     MATCH_ANDI, MASK_ANDI, match_opcode, INSN_ALIAS },
>>>>> +{"andi",        0, INSN_CLASS_C, "Cs,Cw,Co",  MATCH_C_ANDI, MASK_C_ANDI, match_opcode, INSN_ALIAS },
>>>>> +{"andi",        0, INSN_CLASS_I, "d,s,j",     MATCH_ANDI, MASK_ANDI, match_opcode, 0 },
>>>>
>>>> Doesn't ANDI a base instruction?
>>>
>>> Of course. Like for all aliases, there is a corresponding base
>>> instruction. I guess I simply don't understand what you mean to
>>> express with the question.
>>>
>>>>  The operand "d,s,j" of AND is an
>>>> alias of ANDI, so the original order seems correct.  Always dump *.i
>>>> instructions to the non-i type looks weird, and llvm-dump seems has
>>>> the same behavior as current GNU objdump.
>>>>
>>>> % cat tmp.s
>>>> and a0, a1, 0x10
>>>> % riscv64-unknown-elf-as tmp.s -o tmp.o
>>>> % riscv64-unknown-elf-objdump -d tmp.o
>>>>
>>>> tmp.o:     file format elf64-littleriscv
>>>>
>>>>
>>>> Disassembly of section .text:
>>>>
>>>> 0000000000000000 <.text>:
>>>>
>>>>    0: 0105f513          and a0,a1,16
>>>
>>> What's weird about that? And if that's weird, would you mind spelling
>>> out the conditions under which aliases are to be preferred over base
>>> instructions when disassembling? There actually is a "These aliases are
>>> for assembly but not disassembly" comment somewhere in the file,
>>> clarifying for two of the aliases that they ought to come after their
>>> base insns. But for all other aliases which aren't simply a different
>>> (but not shorter) name for the same insn (e.g. "bgt" vs "blt") I'd
>>> assume the aliases should be preferred, for the reason stated in the
>>> patch description. That said - I can see it being a matter of taste
>>> for <insn>i vs <insn>, but if so this should be spelled out somewhere.
>>
>> Yeah, that's what I worried about.  At the beginning, I think dumping
>> a base instruction as another base instruction looks weird.  But these
>> days I also noticed that - we also dump compressed instructions as
>> base i without "c." prefixes, so why I feel weird is just that I'm
>> used to it because of historical behavior.  I have no objection to
>> this, so please go ahead if there are no objections for a period of
>> time.  But if there are any objections, then we probably can mark
>> these aliases by something like INSN_ALIAS_CANNOT_DUMP in the opcode
>> table, and that's what Kito suggested to me before, but I didn't think
>> it was a serious problem at the time.
>>
>> Thanks
>> Nelson
> 
> I apologize as I haven't read all prior discussions. For many
> instructions, the "i" form is written in the ISA manual and prevalent.

Why "prevalent"? The "i"-less forms are mentioned there as well, aren't
they? Then why not use them ...

> I wonder whether we can give these add/and/xor/etc without "i" lower
> priority so that objdump -d will not show them, even without using -M
> no-aliases.

... unless use of aliases was suppressed? In other arches' assembly
that I know (to some degree) and which knows the concept of aliases,
aliases are typically the preferred way of disassembling, for
typically producing easier to grok output.

There are other aspects to consider here, related to the handling of
equates in assembly sources. I did bring this up before, but it feels
like a minefield - it first would need firmly establishing what exactly
assembler behavior is intended to be. Aiui no-one has properly thought
of this, including for the (surprisingly similar) handling in tc-mips.c
(making me guess that RISC-V code may have been derived from that).

Jan

> % cat b.s
> add a0,a1,13
> and a2,a3,4
> xor a2,a3,4
> or a2,a3,4
> sll a2,a3,4
> % riscv64-linux-gnu-gcc -c b.s
> % ~/Dev/binutils-gdb/out/riscv64/binutils/objdump -d b.o
> 
> b.o:     file format elf64-littleriscv
> 
> 
> Disassembly of section .text:
> 
> 0000000000000000 <.text>:
>    0:   00d58513                add     a0,a1,13
>    4:   0046f613                and     a2,a3,4
>    8:   0046c613                xor     a2,a3,4
>    c:   0046e613                or      a2,a3,4
>   10:   00469613                sll     a2,a3,0x4
> 
> 
> When LLVM integrated assembler added these aliases
> (https://reviews.llvm.org/D50046), these instructions are assigned a
> low priority "let EmitPriority = 0" so llvm-objdump -d will never show
> them.



More information about the Binutils mailing list