Bug 29267

Summary: readelf..info misreports DW_FORM_loclistx, DW_FORM_rnglistx
Product: binutils Reporter: Vsevolod Alekseyev <sevaa>
Component: binutilsAssignee: Nick Clifton <nickc>
Status: RESOLVED FIXED    
Severity: normal CC: h3xrabbit, nickc
Priority: P2    
Version: 2.39   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed: 2022-06-23 00:00:00
Attachments: Proposed Patch
Proposed Patch
Proposed Patch
Test binary
Proposed patch take 2

Description Vsevolod Alekseyev 2022-06-20 18:04:07 UTC
When readelf --debug-dump=info prints attributes of type DW_FORM_loclistx, DW_FORM_rnglistx, it prints what should be the offset of the loclist/rangelist in the respective section. The raw value of this attribute is supposed to be an index into the offsets table in the compilation unit header inside the debug_loclists  or debug_rnglists sections, respectively, while the address of the offset table itself should be taken from the DW_AT_loclists_base or DW_AT_rnglists_base, respecively, in the CU's top DIE.

But it looks like readelf looks up the offset in the debug_addr section, instead of debug_loclists/debug_rnglists section.

This even causes warnings of type "Offset into section .debug_addr too big:", because the offset was never meant to be in .debug_addr.

I've got a repro for that from clang++-11, but it's big. I'll try to find a smaller one.
Comment 1 Vsevolod Alekseyev 2022-06-22 13:56:01 UTC
The culprit is, probably, the case statement at dwarf.c:2774, where fetch_indexed_addr() is being called both for DW_FORM_addrxN (where it belongs) and for DW_FORM_loclistx/DW_FORM_rnglistx (where it does not). For the latter, fetch_indexed_value should be used instead.

The text description is off, too. The loclistx value doesn't represent an address index.
Comment 2 Nick Clifton 2022-06-22 15:06:03 UTC
Created attachment 14159 [details]
Proposed Patch

Based upon your observation, would you mind trying out this patch and letting me know if it works ?
Comment 3 Vsevolod Alekseyev 2022-06-22 15:30:28 UTC
Comment on attachment 14159 [details]
Proposed Patch

Does not compile, "index" is not declared.
Comment 4 Vsevolod Alekseyev 2022-06-22 15:59:42 UTC
Even declared as "dwarf_vma index", the numbers are all wrong.

Looks like fetch_indexed_value is not taking into account the DW_AT_loclists_base/DW_AT_rnglists_base attribute value, which stores the address of the offset table of the current compile unit's contribution in debug_loclists/debug_rnglists, respectively. Similar issue here: #29266.
Comment 5 Nick Clifton 2022-06-22 16:03:25 UTC
Created attachment 14160 [details]
Proposed Patch

oops - I missed out a thunk...
Comment 6 Nick Clifton 2022-06-22 16:26:10 UTC
Created attachment 14161 [details]
Proposed Patch

Ok, lets see if the third time is the charm.  

I am not 100% about the offset computation inside fetch_indexed_value() though, so some tweaking may still be needed.  It would be helpful if there was a test case I could examine...
Comment 7 Vsevolod Alekseyev 2022-06-22 16:33:55 UTC
Created attachment 14162 [details]
Test binary

Have a test binary.
Comment 8 Vsevolod Alekseyev 2022-06-22 17:00:28 UTC
The warnings that "readelf --debug-dump=info" is giving on that binary are all bogus; they all have to do with readelf looking for the offset in the wrong place.
Comment 9 Vsevolod Alekseyev 2022-06-22 17:11:57 UTC
The patches are supposed to be cumulative, right? And the first one is already not in the bug? So there is no way to apply them on top of HEAD anymore, unless I've saved all three?
Comment 10 Vsevolod Alekseyev 2022-06-22 20:05:39 UTC
There is a bit of parsing complication here that I don't think the current parser quite appreciates. The DWARF bitness may vary between CUs in indexed sections, and short of going through the headers, linked list style, there is no way to determine the bitness for any given section from the DIE data.

The structure of debug_loclists goes like this:

CU_header_0:
  length AKA bitness indicator (4 or 12)
  version (2)
  address_size (1)
  selector_size (1)
  offset_entry_count (4)
  offsets[0] (4 or 8)    <-- DW_AT_loclists_base points here
  offsets[1] (4 or 8)
  ...more offsets
...the actual loclists

The offsets table contain the offsets of the target loclists, relative to the offsets table start. The size of the offset (4 or 8) is determined by the length field in the header, the usual DWARF style.

Now, from the value of DW_AT_loclists_base alone, it's pretty much impossible to tell whether the section is 32- or 64-bit DWARF (except for the 0th CU, where the DW_AT_loclists_base can be either 12 or 20). In subsequent CUs, you can't seek back from the offset table to the top of header, because it's variable length. If you look at the dword at DW_AT_loclists_base-20, it may be 0xffffffff by pure accident. 

Similar situation in debug_rnglists, debug_addr, debug_str_offsets.

A proper parser should go through all CU headers in the section (you can sort of fast-forward through them by skipping by length), determine and store the bitness of each, and then recover the bitness of any particular CU by matching the DW_AT_rnglists_base against that.

One *slightly* better alternative would be - reuse the bitness from the 
corresponding CU in the debug_info. While it is possible to compose a correct DWARF dataset where the bitness between contributions from the same compile unit in different section varies, it would be a pain in the neck for the compiler vendor.

Or you can just assume 32 bits. The spec encourages the implementors not to use 64-bit DWARF unless absolutely necessary. How often does one see 4GB+ sections in a binary?
Comment 11 Nick Clifton 2022-06-23 10:52:25 UTC
(In reply to Vsevolod Alekseyev from comment #10)
Hi Vsevolod,

> The DWARF bitness may vary between CUs in indexed
> sections, 

Presumably this would be an unusual case.  At least I would hope so...

I think however that would should be able to handle it.  The code in binutils/dwarf.c already stores the loclists_base on a per-CU basis,
so we can easily add code to store the bitness too.  I am looking into
creating a patch that will do this...
Comment 12 Vsevolod Alekseyev 2022-06-26 14:37:17 UTC
It's not as bad; a gentleman at the DWARF mailing list pointed out there was a rule in section 7.4 near the end, that the bitness of the same CU's contributions in different sections should be the same:

"The 32-bit and 64-bit DWARF format conventions must not be intermixed within a single compilation unit."

So reusing the bitness from the corresponding CU in the debug_info is a proper and correct thing to do.
Comment 14 Sourceware Commits 2022-06-28 11:31:10 UTC
The master branch has been updated by Nick Clifton <nickc@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=dbcbf67ca565ec29f13a2302dcdf9b01ef7832ca

commit dbcbf67ca565ec29f13a2302dcdf9b01ef7832ca
Author: Nick Clifton <nickc@redhat.com>
Date:   Tue Jun 28 12:30:19 2022 +0100

    Fix the display of the idnex values for DW_FORM_loclistx and DW_FORM_rnglistx.  Correct the display of .debug.loclists sections.
    
            PR 29267
            * dwarf.c (display_debug_rnglists): New function, broken out of..
            (display_debug_ranges): ... here.
            (read_and_display_attr_value): Correct calculation of index
            displayed for DW_FORM_loclistx and DW_FORM_rnglistx.
            * testsuite/binutils-all/x86-64/pr26808.dump: Update expected
            output.
Comment 15 Nick Clifton 2022-06-28 11:43:21 UTC
Hi Vsevolod,

OK, I have checked in a patch which, combined with others now also checked in, should address this issue.  Please can you give the current binutils sources a whirl and see if readelf's output is now correct.

Cheers
  Nick
Comment 16 Vsevolod Alekseyev 2022-06-28 13:59:01 UTC
I see you've changed the output format. That will take some time to catch up.
Comment 17 Vsevolod Alekseyev 2022-06-30 00:52:09 UTC
The loclist entries in sections with nonblank offset tables in are dumped differently; the start/end address of location entries is not resolved relative to the corresponding CU's base PC. This is inconsistent with the past behavior, and rather misleading.

Didn't check the rangelists yet.
Comment 18 Vsevolod Alekseyev 2022-06-30 01:55:43 UTC
Same story in rnglists.
Comment 19 Nick Clifton 2022-07-01 13:29:53 UTC
(In reply to Vsevolod Alekseyev from comment #17)

> The loclist entries in sections with nonblank offset tables in are dumped
> differently; the start/end address of location entries is not resolved
> relative to the corresponding CU's base PC. This is inconsistent with the
> past behavior, and rather misleading.

I made these changes in order to bring readelf's output more inline with
the output from eu-readelf.  I have been using that tool's output as a 
comparison for the updates for this issue, and using the same general format
helps with that.

But I am also willing to undo unnecessary formatting changes, so please 
can you provide me an example of the before and after formatting, so that
I can be sure that I am changing the correct things. :-)
Comment 20 Vsevolod Alekseyev 2022-07-01 15:57:22 UTC
At this juncture, we gave up on using readelf as a reference implementation of DWARF parsing. You may do with this issue as you please.
Comment 21 Vsevolod Alekseyev 2023-10-02 19:42:49 UTC
Created attachment 15150 [details]
Proposed patch take 2

Tested against llvm-dwarfdump on the same corpus of binaries as #30880.
Comment 22 Sourceware Commits 2023-10-03 08:28:12 UTC
The master branch has been updated by Nick Clifton <nickc@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=8c7125feaaa495b2aa0fed6cd077e35d0d90df43

commit 8c7125feaaa495b2aa0fed6cd077e35d0d90df43
Author: Vsevolod Alekseyev <sevaa@sprynet.com>
Date:   Tue Oct 3 09:27:27 2023 +0100

    Fix: readelf..info misreports DW_FORM_loclistx, DW_FORM_rnglistx
    
      PR 29267
      * dwarf.c (fetch_indexed_value): Delete. (fetch_indexed_offset): Correct base address calculation. (read_and_display_attr_value): Replace uses of fetch_indexed_value with fetch_indexed_offset.
Comment 23 Nick Clifton 2023-10-03 08:34:43 UTC
Thanks - patch applied.
Comment 24 Vsevolod Alekseyev 2023-10-03 14:22:05 UTC
Thanks. I'll take a look at 30880 next.

On a side note - this one is marked as fixed, and 30792 is not, is that right?
Comment 25 Nick Clifton 2023-10-03 14:56:46 UTC
(In reply to Vsevolod Alekseyev from comment #24)
> On a side note - this one is marked as fixed, and 30792 is not, is that
> right?

Doh - sorry - forgot to close 30792.   Doing it now...