[PATCH 0/2] RISC-V: Better support for long instructions (64 < x <= 176 [bits])
Nelson Chu
nelson@rivosinc.com
Tue Nov 22 00:43:15 GMT 2022
On Sat, Nov 19, 2022 at 3:11 PM Tsukasa OI <research_trasio@irq.a4lg.com> wrote:
>
> Hello,
>
> Commit bb996692bd96 ("RISC-V/gas: allow generating up to 176-bit
> instructions with .insn") was suddenly merged into master by Jan Beulich.
> Although this encoding is not considered frozen (see the section "Expanded
> Instruction-Length Encoding" in the ISA Manual), such attempt makes sense.
>
> However, it was insufficient in some ways. I quickly found a stack-based
> buffer overflow and fixed that. Besides that, I found following issues
> related to long (assembler: ".insn"-based, disassembler: unrecognized)
> instructions:
>
>
>
> [Assembler: False "value conflicts with instruction length" error]
>
> .insn 22, 0xfedcba98765432100123456789abcdef55aa33cc607f
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 44 hexadecimal digits (22 bytes)
>
> GAS successfully compiles this (0x607f is one of the 2-byte prefixes of
> 22-byte instructions). However, it fails on following examples.
>
> .insn 22, 0xfedcba98765432100123456789ab607f
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 32 hexadecimal digits (16 bytes)
>
> with following error:
>
> a.s: Assembler messages:
> a.s:1: Error: value conflicts with instruction length `22,0xfedcba98765432100123456789ab607f'
>
> This is not good. Yes, the value is 16 bytes but upper 6-bytes of a 22-byte
> instruction could be just zeroed out. It should be valid just like:
>
> .insn 22, 0x607f
>
> This is a testcase from insn.s and current GAS can handle this. The only
> reason which the example above fails is, the 16-byte constant is handled as
> O_big (a big number). The root cause of this is:
>
> if (bytes != imm_expr->X_add_number * CHARS_PER_LITTLENUM)
> return _("value conflicts with instruction length");
>
> where:
>
> bytes : Expected instruction length
> imm_expr->X_add_number : MINIMUM Number of entries (of generic_bignum)
> required to represent a big number.
> Non-zero if O_big.
> CHARS_PER_LITTLENUM : A big number is represented as an array of
> little numbers (current GAS: unsigned short).
> This is the number of chars of a little number.
>
> That means, if a big number can be represented with different bytes as
> expected, ".insn" fails with "value conflicts with instruction length".
>
> Replacing this with:
>
> if (bytes < imm_expr->X_add_number * CHARS_PER_LITTLENUM) /* == -> < */
> return _("value conflicts with instruction length");
>
> would resolve this problem on the current GAS but this depends on how
> the big numbers are represented. If big number changes its base type
> (such as 2^32), it will stop working.
>
> My solution (on PATCH 2/2) is much complex. It computes exact bytes
> required to repsent a big number and fails only if number of represented
> bytes exceeds expected number of bytes.
>
> unsigned int llen = 0;
> for (LITTLENUM_TYPE lval = generic_bignum[imm_expr->X_add_number - 1]; lval != 0; llen++)
> lval >>= BITS_PER_CHAR;
> unsigned int repr_bytes = (imm_expr->X_add_number - 1) * CHARS_PER_LITTLENUM + llen;
> if (bytes < repr_bytes)
> return _("value conflicts with instruction length");
>
> Although it may not work on all possible bigint encoding configurations, my
> implementation is flexible enough that any realistic bigint configurations
> would be compatible.
>
> This is the result after this fix (and the disassembler fix):
>
> 0: 607f 89ab 4567 0123 .byte 0x7f, 0x60, 0xab, 0x89, 0x67, 0x45, 0x23, 0x01, 0x10, 0x32, 0x54, 0x76, 0x98, 0xba, 0xdc, 0xfe, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
> 8: 3210 7654 ba98 fedc
> 10: 0000 0000 0000
>
>
>
> [Assembler: DWARF line number information is missing with big numbers]
>
> We can compile this on the current GAS (note: line number is prefixed):
>
> 1: .insn 22, 0x607f
> 2: .insn 22, 0xfedcba98765432100123456789abcdef55aa33cc607f
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 44 hexadecimal digits (22 bytes)
>
> with "as -gdwarf-2 -o a.o -c a.s"
> or "as -g -o a.o -c a.s".
>
> However, if we extract the debug information, the result will be rather
> strange. Here's the result of "objdump -WL a.o":
>
> a.o: file format elf64-littleriscv
>
> Contents of the .debug_line section:
>
> CU: a.s:
> File name Line number Starting address View Stmt
> a.s 1 0 x
> a.s - 0x2c
>
> A 22-byte instruction on the address 0 (line 1) is fine. But wait, where
> the another 22-byte instruction on the address 0x16 (line 2) go? Before
> describing, we'll see what's going on in the line 1.
>
> 1. s_riscv_insn (".insn" directive) is called
> a. riscv_ip (".insn" with known instruction encoding) is called
> but fails with "unrecognized opcode".
> b. riscv_ip_hardcode (".insn" with raw values) is called
> and succeeds.
> c. append_insn is called **when imm_expr.X_op is not O_big**)
> -> Go to 2
> 2. append_insn (output an instruction)
> a. dwarf2_emit_insn is called (DWARF debug information is appended)
> b. relocation / fixups
> (if relaxed instructions are emitted, return early)
> c. add_fixed_insn (add the instruction to the end of the output)
>
> The reason why it fails on line 2 (O_big) is 1.c. Because DWARF debug
> information is appended on 2.a., this part is not called because append_insn
> is not called when imm_expr.X_op is O_big. Instead, it does completely
> separate handling on the riscv_ip_hardcode function (1.b.).
>
> My patchset (PATCH 2/2) unifies this separate handling with regular
> instructions. That means, 1.c. is changed so that append_insn is called
> even if imm_expr.X_op is O_big. To unify handling, it adds insn_long_opcode
> (long opcode encoding up to 22-bytes) to struct riscv_cl_insn and the
> install_insn function (called by add_fixed_insn function) is modified to
> handle long instructions. riscv_ip_hardcode function is modified to store
> long instruction encodings to new insn_long_opcode field.
>
> As a result, ".insn" directive with a big number emits correct
> debug information like this:
>
> a.o: file format elf64-littleriscv
>
> Contents of the .debug_line section:
>
> CU: a.s:
> File name Line number Starting address View Stmt
> a.s 1 0 x
> a.s 2 0x16 1 x
> a.s - 0x2c
>
>
>
> [Disassembler: Instruction is trimmed with 64-bits]
>
> In this section, we reuse the object file generated by the section above
> (two 22-byte instructions).
>
> 0000000000000000 <.text>:
> 0: 607f 0000 0000 0000 .byte 0x7f, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
> 8: 0000 0000 0000 0000
> 10: 0000 0000 0000
> 16: 607f 33cc 55aa cdef .byte 0x7f, 0x60, 0xcc, 0x33, 0xaa, 0x55, 0xef, 0xcd, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
> 1e: 89ab 4567 0123 3210 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 26: 7654 ba98 fedc zeroed out after first 64-bits
>
> See ".byte" at the address 0x16. It's trimmed at 64-bits.
> The resolution is simple. If we simply add a char* argument (containing all
> instruction bits), we can easily resolve this.
>
> 0000000000000000 <.text>:
> 0: 607f 0000 0000 0000 .byte 0x7f, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
> 8: 0000 0000 0000 0000
> 10: 0000 0000 0000
> 16: 607f 33cc 55aa cdef .byte 0x7f, 0x60, 0xcc, 0x33, 0xaa, 0x55, 0xef, 0xcd, 0xab, 0x89, 0x67, 0x45, 0x23, 0x01, 0x10, 0x32, 0x54, 0x76, 0x98, 0xba, 0xdc, 0xfe
> 1e: 89ab 4567 0123 3210 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 26: 7654 ba98 fedc all instruction bits are correct
>
>
>
> With this patchset, although we don't support any "valid" long instructions
> yet, we can support long "invalid" instructions (only emitted through
> ".insn" and the encoded bits do not match to any valid instructions and
> printed as ".byte").
>
> This patchset is also important because it makes the assembler behavior
> consistent (does not depend whether the raw value can be represented with
> a small constant).
>
>
>
> Thanks,
> Tsukasa
>
>
>
>
> Tsukasa OI (2):
> RISC-V: Make .insn tests stricter
> RISC-V: Better support for long instructions
Thanks for Jan's feedback and it makes sense. I have no further
opinion here, so I will totally respect and follow Jan judgment for
this series. Anyway, thanks for spending time to fix this.
Nelson
> gas/config/tc-riscv.c | 50 +++++++++++++++++++++++-----
> gas/testsuite/gas/riscv/insn-dwarf.d | 10 +++++-
> gas/testsuite/gas/riscv/insn-na.d | 28 ++++++++++------
> gas/testsuite/gas/riscv/insn.d | 32 +++++++++++++++---
> gas/testsuite/gas/riscv/insn.s | 9 +++++
> opcodes/riscv-dis.c | 13 +++++---
> 6 files changed, 113 insertions(+), 29 deletions(-)
>
>
> base-commit: 15253318be0995200cc59929ca32eedbfd041e45
> --
> 2.38.1
>
More information about the Binutils
mailing list