Bug 31595 - Abort in AArch64 disassembler's get_sreg_qualifier_from_value() function
Summary: Abort in AArch64 disassembler's get_sreg_qualifier_from_value() function
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: binutils (show other bugs)
Version: 2.43 (HEAD)
: P2 normal
Target Milestone: ---
Assignee: Victor Do Nascimento
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-04-02 12:47 UTC by Nick Clifton
Modified: 2024-04-17 10:22 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Nick Clifton 2024-04-02 12:47:29 UTC
Attempting to disassemble the latest version of glibc compiled for the AArch64 for Fedora Rawhide results in:

  $ objdump -D lib64/libc.so.6
  objdump: opcodes/aarch64-dis.c:251: get_sreg_qualifier_from_value: 
   Assertion `value <= 0x4 && aarch64_get_qualifier_standard_value (qualifier) == value' failed.
  Abort (core dumped)

This was using the version of libc.so.6 obtained from glibc-2.39.9000-10.fc41.aarch64.rpm but I can also reproduce the problem with a libc.so.6 from RHEL-9.  I suspect that any recent-ish version of libc.so will do.

I suspect that the issue is with the processing of the rcpc3 size field, since the stack backtrace shows that get_sreg_qualifier_from_value is called from do_special_decoding at opcodes/aarch64-dis.c:2678.
Comment 1 Victor Do Nascimento 2024-04-05 14:14:41 UTC
I've managed to replicate the bug encountered attempting to generate the objdump on glibc-2.39.9000-10.fc41.aarch64.rpm.

I'm deciphering what the assembly instruction would have been that is encoded by those particular 32 bits which give rise to the error in question.

Comparing to the valid RCPC3 instructions, the combination of bits found in FLD_rcpc3_size and FLD_opc1 in the 2646841401 decimal value is not valid, so it seems to me that the disassembler is right to complain.

Something is therefore fishy somewhere in the assembly.

Will post more findings as I get them.
Comment 2 Nick Clifton 2024-04-08 08:37:05 UTC
Hi Victor,

  Thanks very much for taking a look at this. :-)

Cheers
  Nick
Comment 3 Victor Do Nascimento 2024-04-08 12:25:32 UTC
So a trivial reproducer for the reported issue would be attempting to disassemble `.inst 0x9dc39839'.

Looking at `readelf -S ./libc.so.6', we see that the crash happens within the .gnu.hash section of the elf file.  This, combined with the fact we used the -D flag when disassembling leads me to the conclusion that we're trying to disassemble non-instruction bytes, which due to ill-luck, look an awful lot like a valid instruction.

Only problem is, it differs from the relevant valid instruction by a combination of three bits which would encode an invalid operand qualifier. We thus get far enough into the disassembly of those 32 bits that objdump doesn't know it should display undef or similar.

This thus seems like a quality of implementation issue. Normal disassembly of executable sections of code appear to be functioning correctly, but I guess a rethink is needed in terms of how assertions are used in disassembly.

My impression is that their use in a context such as in the use of `objdump --disassemble-all` ought be predicated on whether or not we're disassembling in a strictly executable code-only section of the object file or not...
Comment 4 Nick Clifton 2024-04-08 12:45:11 UTC
(In reply to Victor Do Nascimento from comment #3)
Hi Victor,

> Looking at `readelf -S ./libc.so.6', we see that the crash happens within
> the .gnu.hash section of the elf file.  This, combined with the fact we used
> the -D flag when disassembling leads me to the conclusion that we're trying
> to disassemble non-instruction bytes, which due to ill-luck, look an awful
> lot like a valid instruction.

Ah - that makes sense.

 
> This thus seems like a quality of implementation issue. Normal disassembly
> of executable sections of code appear to be functioning correctly, but I
> guess a rethink is needed in terms of how assertions are used in disassembly.
> 
> My impression is that their use in a context such as in the use of `objdump
> --disassemble-all` ought be predicated on whether or not we're disassembling
> in a strictly executable code-only section of the object file or not...

In my opinion, the disassembler should never trigger an abort (or an assertion), even if it is being asked to decode an illegal bit sequence.  Instead it should just display the bits with an annotation that they are illegal.  In fact when a user is disassembling with the -D/--disassemble-all it should be clear that they expect illegal bit sequences to be encountered, and objdump should really be able to cope.

(This also goes back to my long standing opinion that library functions should never call abort.  Instead they should always report back to their caller that they have encountered some kind of problem, and allow the caller to decide what to do).

My suggestion is that you change get_sreg_qualifier_from_value() so that it returns AARCH64_OPND_QLF_NIL if it encounters an error.  (Or maybe a new aarch64_opnd_qualifier value such as AARCH64_OPND_QLF_ERR).  And then update the callers of get_sreg_qualifier_from_value to take some kind of action if this result is returned.  A bit if a hassle I know, but I think that it is the right thing to do.

Cheers
  Nick
Comment 5 Victor Do Nascimento 2024-04-08 12:52:22 UTC
(In reply to Nick Clifton from comment #4)
Hi Nick,

> In my opinion, the disassembler should never trigger an abort (or an
> assertion), even if it is being asked to decode an illegal bit sequence. 
> Instead it should just display the bits with an annotation that they are
> illegal.  In fact when a user is disassembling with the -D/--disassemble-all
> it should be clear that they expect illegal bit sequences to be encountered,
> and objdump should really be able to cope.

Agreed.

> (This also goes back to my long standing opinion that library functions
> should never call abort.  Instead they should always report back to their
> caller that they have encountered some kind of problem, and allow the caller
> to decide what to do).
> 
> My suggestion is that you change get_sreg_qualifier_from_value() so that it
> returns AARCH64_OPND_QLF_NIL if it encounters an error.  (Or maybe a new
> aarch64_opnd_qualifier value such as AARCH64_OPND_QLF_ERR).  And then update
> the callers of get_sreg_qualifier_from_value to take some kind of action if
> this result is returned.  A bit if a hassle I know, but I think that it is
> the right thing to do.

Thanks for the suggestion, it'll come in handy.  I do agree it's the right course of action to take.

cheers,
Victor

> Cheers
>   Nick
Comment 6 Victor Do Nascimento 2024-04-17 09:45:31 UTC
Proposed fix submitted:
  https://sourceware.org/pipermail/binutils/2024-April/133638.html