Bug 23504 - index cache: Produce and consume DWARF5 format
Summary: index cache: Produce and consume DWARF5 format
Status: NEW
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb
Version: HEAD
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on: 31132 24820
Blocks: 27453 31363
Reported: 2018-08-10 14:59 UTC by Simon Marchi
Modified: 2024-02-09 20:04 UTC (History)
CC List: 2 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Description Simon Marchi 2018-08-10 14:59:31 UTC
Currently, the DWARF index cache only produces and consumes files in GDB's own index format.  We should make it use the DWARF5 format too.

The main difficulty is that producing a DWARF5 index generates two files:

- The index itself (.debug_names)
- An addendum to the string section (.debug_str)

The .debug_str addendum is meant to be appended to the .debug_str section of the existing binary, if there is one.  Offsets in .debug_names that point to strings in the addendum are therefore greater than or equal to the size of the original .debug_str.

With the index cache, we don't modify the original binary.  So when loading a DWARF5 index from the index cache, we need some special treatment: if a string is requested at an offset at or beyond the size of the original .debug_str, we actually need to look it up in the .debug_str addendum (with the offset adjusted by subtracting that size, of course).
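
A minimal sketch of that lookup, in Python for brevity rather than GDB's C++. The function names and the sample section contents are hypothetical and are not taken from GDB's sources; the sketch only assumes that both sections hold NUL-terminated strings and that addendum offsets are based at the size of the original .debug_str.

```python
def read_cstring(section: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts at `offset` in `section`."""
    end = section.index(b"\0", offset)
    return section[offset:end].decode("utf-8")


def lookup_index_string(main_str: bytes, addendum: bytes, offset: int) -> str:
    """Resolve a .debug_names string offset without touching the binary.

    Offsets below len(main_str) refer to the binary's own .debug_str.
    Offsets at or past that size refer to the cached addendum, as if the
    addendum had been appended to the original section.
    """
    if offset < len(main_str):
        return read_cstring(main_str, offset)
    return read_cstring(addendum, offset - len(main_str))


# Hypothetical data: a 9-byte original .debug_str and a cached addendum.
main_str = b"main\0foo\0"
addendum = b"ns::bar\0"

print(lookup_index_string(main_str, addendum, 5))  # foo
print(lookup_index_string(main_str, addendum, 9))  # ns::bar
```

Note that an offset exactly equal to the size of the original .debug_str already belongs to the addendum, since it names the first byte past the original section.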
Comment 1 Tom Tromey 2020-11-12 15:49:47 UTC
I think gdb generates the DWARF 5 index incorrectly -- it fills in
the full (and canonicalized) names of symbols.  See PR 24820.

Anyway, if that is fixed, then I think the separate .debug_str problem
will go away -- because, IIUC, the index will only refer to strings
that are already present in the existing string section.
Comment 2 Tom Tromey 2021-02-21 22:33:44 UTC
(In reply to Tom Tromey from comment #1)

> Anyway, if that is fixed, then I think the separate .debug_str problem
> will go away -- because, IIUC, the index will only refer to strings
> that are already present in the existing string section.

This is unfortunately mistaken.  If the .debug_info uses an inline
string, then it won't appear in .debug_str.

Maybe we can hack around this somehow.
One idea would be to collect these strings (they should be rare,
I'd imagine) and have a supplementary .debug_str in the index file.
These could be addressed using the length of the main .debug_str
as a base, to make it easy to tell which section to consult.
Comment 3 Mark Wielaard 2021-02-21 22:48:18 UTC
(In reply to Tom Tromey from comment #2)
> (In reply to Tom Tromey from comment #1)
> 
> > Anyway, if that is fixed, then I think the separate .debug_str problem
> > will go away -- because, IIUC, the index will only refer to strings
> > that are already present in the existing string section.
> 
> This is unfortunately mistaken.  If the .debug_info uses an inline
> string, then it won't appear in .debug_str.

I might be missing some context on why the string representation is important. But if it is, then note that besides DW_FORM_strp, DW_FORM_string and DW_FORM_strx[1234] (the strx forms indirectly end up in .debug_str), there is also DW_FORM_line_strp (which points to .debug_line_str), which is used for file/path related strings (gcc 11, for example, will make sure that the CU DIE name and comp_dir end up there).

And of course there is DW_FORM_strp_sup and DW_FORM_GNU_strp_alt for shared strings in the multi/sup file .debug_str.
Comment 4 Tom Tromey 2021-02-22 02:27:03 UTC
The context here is GDB generating an index and putting it into the index
cache.  .debug_names refers to symbol names using their index in .debug_str,
so if a string does not appear there, then how would GDB cope?
Currently I think it writes out a new .debug_str section, but of course
(see the other PR), what's currently done is wrong; and also this seemed
weird for the index cache, because in that scenario nobody is rewriting
the original object file.

Not sure if this is clear or not...
Comment 5 Mark Wielaard 2021-02-24 14:22:18 UTC
(In reply to Tom Tromey from comment #4)
> The context here is GDB generating an index and putting it into the index
> cache.  .debug_names refers to symbol names using their index in .debug_str,
> so if a string does not appear there, then how would GDB cope?
> Currently I think it writes out a new .debug_str section, but of course
> (see the other PR), what's currently done is wrong; and also this seemed
> weird for the index cache, because in that scenario nobody is rewriting
> the original object file.
> 
> Not sure if this is clear or not...

It is clear, I just don't know enough about the index cache to understand which design makes the most sense. It seems that for strings you need some way to tell which section they came from: .debug_str (the default), .debug_info (for DW_FORM_string; if the index already associates a DIE offset with the name, maybe just have a flag saying to look up the DW_AT_name there?), or .debug_line_str (DW_FORM_line_strp) for those symbols whose name also represents a file/path (I don't know how to easily represent that). And then there are the DW_FORM_strp_sup/DW_FORM_GNU_strp_alt strings (in which case the .debug_str is in another file...).

I think the summary is that DWARF5 got a lot of ways to store a string :)
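
To make the list of forms concrete, here is a rough Python table of where each string form keeps its bytes. The table and the helper `needs_addendum` are illustrative names only, not GDB code, and the sketch ignores split DWARF (.debug_str.dwo). It assumes the index may only use offsets into the main file's .debug_str.

```python
# Where the bytes of a name live for each string form mentioned above.
STRING_FORM_LOCATION = {
    "DW_FORM_string": "inline in .debug_info",
    "DW_FORM_strp": ".debug_str",
    "DW_FORM_strx": ".debug_str, via .debug_str_offsets",
    "DW_FORM_strx1": ".debug_str, via .debug_str_offsets",
    "DW_FORM_strx2": ".debug_str, via .debug_str_offsets",
    "DW_FORM_strx3": ".debug_str, via .debug_str_offsets",
    "DW_FORM_strx4": ".debug_str, via .debug_str_offsets",
    "DW_FORM_line_strp": ".debug_line_str",
    "DW_FORM_strp_sup": ".debug_str of the supplementary file",
    "DW_FORM_GNU_strp_alt": ".debug_str of the alternate file",
}

# Forms whose string already has an offset in the main file's .debug_str.
MAIN_DEBUG_STR_FORMS = {
    "DW_FORM_strp",
    "DW_FORM_strx",
    "DW_FORM_strx1",
    "DW_FORM_strx2",
    "DW_FORM_strx3",
    "DW_FORM_strx4",
}


def needs_addendum(form: str) -> bool:
    """True if a name stored in `form` has no offset in the main .debug_str."""
    if form not in STRING_FORM_LOCATION:
        raise ValueError(f"not a string form covered here: {form}")
    return form not in MAIN_DEBUG_STR_FORMS


for form, where in STRING_FORM_LOCATION.items():
    print(f"{form:22} {where:40} addendum={needs_addendum(form)}")
```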
Comment 6 Tom Tromey 2021-03-28 16:15:25 UTC
(In reply to Mark Wielaard from comment #5)

> It is clear, I just don't know enough about the index cache to understand
> which design makes most sense. It seems for strings you need some way to
> tell which section they came from, either .debug_str [ ... ]

The issue is that .debug_names can only reference strings from .debug_str.
Quoting from DWARF 5:

  The string offsets in the first array refer to names in the .debug_str (or .debug_str.dwo) section.

So, if gdb tries to create a new index, and it needs a string that isn't in
.debug_str for some reason, then it must have a way to add a string to that section.

Now, currently this happens a lot, because gdb puts the wrong names into the index.
However, it's possible for this to happen even when gdb is changed to work correctly,
because nothing guarantees that some DIE's name attribute will be in .debug_str.

> These could be addressed using the length of the main .debug_str
> as a base, to make it easy to tell which section to consult.

It turns out gdb already does this.  Which Simon mentioned originally and
I somehow neglected to read and/or understand.

So I think what should probably happen is:

* Change the DWARF 5 index writer to use BFD to create a new file
  that has .debug_names and .debug_str (and maybe .debug_aranges) sections;
* Write only the newly-needed strings to .debug_str (what gdb already does, essentially);
* Change the index cache reader to know to use the extended .debug_str when
  necessary.

I think this would make DWARF 5 index cache management work the same as .gdb_index.
It would require some tweaks to gdb-add-index and to the manual.
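
The "write only the newly-needed strings" step could look roughly like the following Python sketch. It is not GDB's writer: the function names are made up for illustration, and it assumes a well-formed .debug_str made of NUL-terminated strings, matched whole (no suffix sharing).

```python
def index_existing_strings(debug_str: bytes) -> dict:
    """Map each NUL-terminated string in `debug_str` to its first offset."""
    offsets = {}
    pos = 0
    while pos < len(debug_str):
        end = debug_str.find(b"\0", pos)
        if end == -1:  # trailing bytes without a terminator: stop scanning
            break
        offsets.setdefault(debug_str[pos:end], pos)
        pos = end + 1
    return offsets


def assign_name_offsets(main_str: bytes, names: list) -> tuple:
    """Return (name -> offset, addendum bytes) for the index writer.

    Names already present in the main .debug_str reuse their offset there.
    Every other name is added once to the addendum and gets an offset
    based at len(main_str).
    """
    known = index_existing_strings(main_str)
    addendum = bytearray()
    result = {}
    for name in names:
        raw = name.encode("utf-8")
        if raw not in known:
            known[raw] = len(main_str) + len(addendum)
            addendum += raw + b"\0"
        result[name] = known[raw]
    return result, bytes(addendum)


# Hypothetical input: "foo" is already in .debug_str, the others are not.
offsets, addendum = assign_name_offsets(
    b"main\0foo\0", ["foo", "inline_name", "foo", "other"]
)
print(offsets)
print(addendum)
```

A reader that knows the size of the original .debug_str can then tell from the offset alone which of the two sections to consult.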
Comment 7 Tom Tromey 2021-03-28 19:41:42 UTC
> * Change the DWARF 5 index writer to use BFD to create a new file
>   that has .debug_names and .debug_str (and maybe .debug_aranges) sections;
> * Write only the newly-needed strings to .debug_str (what gdb already does,
> essentially);

I've done this part.
Comment 8 Tom Tromey 2021-03-29 15:51:26 UTC
> It would require some tweaks to gdb-add-index and to the manual.

It occurs to me that the new single-file mode could be used by
the index cache, but the "save" command could continue to work
the current way, so that these changes aren't needed.
Comment 9 Tom Tromey 2022-04-22 18:36:50 UTC
If .debug_aranges is missing from the main file,
then the writer has to create that as well.
Otherwise, it won't work when reading the index.
Comment 10 Tom Tromey 2022-09-20 22:41:10 UTC
(In reply to Tom Tromey from comment #9)
> If .debug_aranges is missing from the main file,
> then the writer has to create that as well.
> Otherwise, it won't work when reading the index.

I tend to think now that this should be a separate bug.
It is pre-existing after all.
Comment 11 Tom Tromey 2023-12-10 15:51:44 UTC
(In reply to Tom Tromey from comment #7)
> > * Change the DWARF 5 index writer to use BFD to create a new file
> >   that has .debug_names and .debug_str (and maybe .debug_aranges) sections;
> > * Write only the newly-needed strings to .debug_str (what gdb already does,
> > essentially);
> 
> I've done this part.

I lost this patch somewhere but it's not difficult to rewrite.