With target board cc-with-dwz: ... (gdb) break N1::C1::baz^M Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.^M (gdb) FAIL: gdb.cp/breakpoint-locs.exp: break N1::C1::baz ... with target board unix: ... (gdb) break N1::C1::baz^M Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations)^M (gdb) PASS: gdb.cp/breakpoint-locs.exp: break N1::C1::baz ...
Fixed by: ... diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c index dd4fac52ca8..9f36ce8d23a 100644 --- a/gdb/dwarf2/read.c +++ b/gdb/dwarf2/read.c @@ -16614,7 +16614,8 @@ cooked_indexer::index_dies (cutu_reader *reader, case DW_TAG_subprogram: if ((m_language == language_fortran - || m_language == language_ada) + || m_language == language_ada + || m_language == language_cplus) && this_entry != nullptr) { info_ptr = recurse (reader, info_ptr, this_entry, true); ...
gdb-12-branch -> gdb-13-branch regression
Created attachment 15044 [details] breakpoint-locs.gz (stripped with --keep-only-debug) Reproducer: ... $ gdb -q -batch -iex "maint set worker-threads 0" ./breakpoint-locs -ex "break N1::C1::baz" Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. ... With the set worker-thread 0, reproduces 100%: ... $ for n in $(seq 1 10); do ./gdb.sh -q -batch -iex "maint set worker-threads 0" ./breakpoint-locs -ex "break N1::C1::baz"; done Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. ... Without, it doesn't: ... $ for n in $(seq 1 10); do ./gdb.sh -q -batch ./breakpoint-locs -ex "break N1::C1::baz"; done Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations) Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations) Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations) Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations) Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations) Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations) ...
I'm starting to understand somewhat why it's not 100% reproducible. With this patch: ... diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c index dd4fac52ca8..ed36b5e890b 100644 --- a/gdb/dwarf2/read.c +++ b/gdb/dwarf2/read.c @@ -16889,6 +16889,13 @@ cooked_index_functions::expand_symtabs_matching = lookup_name_without_params.split_name (lang); std::string last_name = gdb::to_string (name_vec.back ()); + for (const cooked_index_entry *entry : table->find (last_name, + completing)) + { + QUIT; + fprintf (stderr, "Trying entry: %lx\n", (unsigned long)entry->die_offset); + } + for (const cooked_index_entry *entry : table->find (last_name, completing)) { @@ -16953,6 +16960,7 @@ cooked_index_functions::expand_symtabs_matching continue; } + fprintf (stderr, "matched entry: %lx\n", (unsigned long)entry->die_offset); if (!dw2_expand_symtabs_matching_one (entry->per_cu, per_objfile, file_matcher, expansion_notify)) ... I get: ... $ gdb -q -batch ./breakpoint-locs -ex "break N1::C1::baz" Trying entry: 14 Trying entry: 10e Trying entry: 199 matched entry: 14 Trying entry: 2e Trying entry: 2e Trying entry: 2e Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. ... or: ... $ gdb -q -batch ./breakpoint-locs -ex "break N1::C1::baz" Trying entry: 10e Trying entry: 14 Trying entry: 199 matched entry: 10e Trying entry: 2e matched entry: 2e Trying entry: 2e Trying entry: 2e Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations) ... The root cause of the difference seems to be that the order of table->find is not stable.
(In reply to Tom de Vries from comment #4) > The root cause of the difference seems to be that the order of table->find > is not stable. And that seems to be caused by cooked_index_entries originating from a PU to be included in the shard processing either one or another CU including the PU. More concretely, we have cooked_index_entries: ... PU: { 0x14 0x19 0x2e 0x2e 0x39 } CU1: { 0x10e 0x121 0x134 0x13b 0x13b } CU2: { 0x199 0x1ac 0x1ac 0x1c2 0x1c2 } ... And sometimes we have these two shards: ... CU1 + PU: { 0x14 0x19 0x2e 0x2e 0x39 0x10e 0x121 0x134 0x13b 0x13b } CU2: { 0x199 0x1ac 0x1ac 0x1c2 0x1c2 } ... and sometimes these two: ... CU1: { 0x10e 0x121 0x134 0x13b 0x13b } CU2 + PU: { 0x14 0x19 0x2e 0x2e 0x39 0x199 0x1ac 0x1ac 0x1c2 0x1c2 } ...
(In reply to Tom de Vries from comment #5) > (In reply to Tom de Vries from comment #4) > > The root cause of the difference seems to be that the order of table->find > > is not stable. > > And that seems to be caused by cooked_index_entries originating from a PU to > be included in the shard processing either one or another CU including the > PU. It's a race on this bit of code in cooked_indexer::index_imported_unit: ... dwarf2_per_objfile *per_objfile = reader->cu->per_objfile; cutu_reader *new_reader = ensure_cu_exists (reader, per_objfile, sect_off, is_dwz, true); if (new_reader != nullptr) { index_dies (new_reader, new_reader->info_ptr, nullptr, false); ... For the first thread to arrive new_reader != nullptr, after that new_reader == nullptr.
(In reply to Tom de Vries from comment #1) > Fixed by: > ... > diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c > index dd4fac52ca8..9f36ce8d23a 100644 > --- a/gdb/dwarf2/read.c > +++ b/gdb/dwarf2/read.c > @@ -16614,7 +16614,8 @@ cooked_indexer::index_dies (cutu_reader *reader, > > case DW_TAG_subprogram: > if ((m_language == language_fortran > - || m_language == language_ada) > + || m_language == language_ada > + || m_language == language_cplus) > && this_entry != nullptr) > { > info_ptr = recurse (reader, info_ptr, this_entry, true); > ... I can no longer reproduce that this works.
(In reply to Tom de Vries from comment #6) > It's a race on this bit of code in cooked_indexer::index_imported_unit: > ... The intent of this code is to scan a given PU just once. That is, the race is intentional. > The root cause of the difference seems to be that the order of table->find is not stable. I wonder what the entries are and why the order matters. Maybe some kind of inlining post-processing is needed, dunno.
With this patch we got a bit more info: ... diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c index 92ef0e35c5e..c53565dd317 100644 --- a/gdb/dwarf2/cooked-index.c +++ b/gdb/dwarf2/cooked-index.c @@ -583,6 +583,8 @@ cooked_index::dump (gdbarch *arch) const gdb_printf (" flags: %s\n", to_string (entry->flags).c_str ()); gdb_printf (" DIE offset: 0x%" PRIx64 "\n", to_underlying (entry->die_offset)); + gdb_printf (" per_cu offset: 0x%" PRIx64 "\n", + to_underlying (entry->per_cu->sect_off)); if (entry->parent_entry != nullptr) gdb_printf (" parent: ((cooked_index_entry *) %p) [%s]\n", ... In the failing case, we have: ... Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23. ... [5] ((cooked_index_entry *) 0x7f5384002f00) name: baz canonical: baz DWARF tag: DW_TAG_subprogram flags: 0x0 [] DIE offset: 0x2e per_cu offset: 0xe3 parent: ((cooked_index_entry *) 0x7f5384002ed0) [C1] ... and in the passing case we have: ... Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations) ... [12] ((cooked_index_entry *) 0x7f6968002f00) name: baz canonical: baz DWARF tag: DW_TAG_subprogram flags: 0x0 [] DIE offset: 0x2e per_cu offset: 0x16e parent: ((cooked_index_entry *) 0x7f6968002ed0) [C1] ... So, the per_cu doesn't refer to the PU, but rather to the importing CU that wins the race. The PU should have the info that it has two includer CU, and expansion of both should be triggered. So, my guess is that the PU dies have the wrong per_cu value.
Reexploring the DW_TAG_inlined_subroutine approach of comment 1 (in combination with the proposed fix for PR30739): ... diff --git a/gdb/dwarf2/cooked-index.h b/gdb/dwarf2/cooked-index.h index 5aacb321c91..d5a2339dd87 100644 --- a/gdb/dwarf2/cooked-index.h +++ b/gdb/dwarf2/cooked-index.h @@ -141,7 +141,7 @@ struct cooked_index_entry : public allocate_on_obstack || tag == DW_TAG_constant || tag == DW_TAG_enumerator); case FUNCTIONS_DOMAIN: - return tag == DW_TAG_subprogram; + return tag == DW_TAG_subprogram || tag == DW_TAG_inlined_subroutine; case TYPES_DOMAIN: return tag_is_type (tag); case MODULES_DOMAIN: diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c index a64f82bd24a..eef1cffa95d 100644 --- a/gdb/dwarf2/read.c +++ b/gdb/dwarf2/read.c @@ -16363,7 +16363,8 @@ cooked_indexer::scan_attributes (dwarf2_per_cu_data *scanning_per_cu, && new_info_ptr > watermark_ptr && *parent_entry == nullptr) *maybe_defer = form_addr (origin_offset, origin_is_dwz); - else if (*parent_entry == nullptr) + else if (*parent_entry == nullptr + || abbrev->tag == DW_TAG_inlined_subroutine) { CORE_ADDR lookup = form_addr (origin_offset, origin_is_dwz); void *obj = m_die_range_map.find (lookup); @@ -16616,7 +16617,8 @@ cooked_indexer::index_dies (cutu_reader *reader, case DW_TAG_subprogram: if ((m_language == language_fortran - || m_language == language_ada) + || m_language == language_ada + || m_language == language_cplus) && this_entry != nullptr) { info_ptr = recurse (reader, info_ptr, this_entry, true); ... What we're trying to achieve here is to add two DW_TAG_inlined_subroutine entries (one for each CU) to the cooked index table, with the parents pointing to the encapsulating namespace. But, we have: ... [17] ((cooked_index_entry *) 0x4b1f420) name: baz canonical: baz DWARF tag: DW_TAG_inlined_subroutine flags: 0x0 [] DIE offset: 0x148 parent: ((cooked_index_entry *) 0x4b1f240) [C1] ... [19] ((cooked_index_entry *) 0x4b1f570) name: baz canonical: baz DWARF tag: DW_TAG_inlined_subroutine flags: 0x0 [] DIE offset: 0x1cf parent: ((cooked_index_entry *) 0) ... or: ... [5] ((cooked_index_entry *) 0x7fc7b4002f90) name: baz canonical: baz DWARF tag: DW_TAG_inlined_subroutine flags: 0x0 [] DIE offset: 0x148 parent: ((cooked_index_entry *) 0) ... [16] ((cooked_index_entry *) 0x7fc7bc0030b0) name: baz canonical: baz DWARF tag: DW_TAG_inlined_subroutine flags: 0x0 [] DIE offset: 0x1cf parent: ((cooked_index_entry *) 0x7fc7bc002ed0) [C1] ... So, either one or the other has the C1 parent, but we need it for both. The problem is that the m_die_range_map data is limited to the scope of a CU reading session, and we need m_die_range_map data written in one session in another session. Of course that's going to get interesting when the CUs are in different shards.
Created attachment 15052 [details] WIP patch This WIP patch fixes the test-case. It moves m_deferred_entries and m_die_range_map into cooked_index_shard, and handles the deferred entries sequentially in a loop after the parallel_for_each creating the shards. That loop itself should still be wrapped in a "while (changed)", but that wasn't necessary for this particular test-case. It defers too many entries, but that's a performance problem, I first wanted to make a simple implementation that can demonstrate that it fixes the problem.
Posted RFC: https://sourceware.org/pipermail/gdb-patches/2023-August/201887.html
https://sourceware.org/pipermail/gdb-patches/2023-October/202891.html
I finally looked into this a bit. I think that for inline functions specifically, what gdb should do is track all the CUs that include a particular CU. Then if an inline function is found in an included CU, all including CUs should be expanded. It's unclear to me if any other tags need this treatment. Making it more targeted would mean that other requests would be faster, due to less CU expansion.
The inclusion handling code is pretty gross & wrong (or at least the comments are wrong); and then it touches the queue code, which we also know to be bad :(
I wonder if gdb-index and debug-names correctly attribute these symbols to the including CU. If not then we'll see the bug there as well.
(In reply to Tom Tromey from comment #16) > I wonder if gdb-index and debug-names correctly attribute these > symbols to the including CU. If not then we'll see the bug there as well. I forgot to mention -- if there is a DIE in the index that comes from a partial unit, the .debug_names writer must attribute that DIE to some other full CU. This has to be done because with .debug_names, gdb won't scan the DIE tree and so won't know about imports. However, it's also not really possible to read a PU in isolation -- only when reading a full CU. What I'd propose is having the .debug_names entry written as referring to the DIE in the PU, but emit the CU as the "canonical" includer... except when the DIE is an inline function, in which case all CUs must be listed. This would have also be documented as a gdb extension, because DWARF seems to ignore this issue.
(In reply to Tom Tromey from comment #17) > What I'd propose is having the .debug_names entry written > as referring to the DIE in the PU, but emit the CU as the > "canonical" includer... except when the DIE is an inline > function, in which case all CUs must be listed. > > This would have also be documented as a gdb extension, because > DWARF seems to ignore this issue. When re-reading an index like this, we'll end up with a different data structure than the one we started out with, which seems unfortunate. That is, the scanner will end up with the per-CU data having "includer" links, but the debug-names reader will just know that function "xyz" appears in CUs 1/2/3, if that makes sense. It'd be nicer to have a single data structure, but then we're limited by what .debug_names can represent.
Reproduces with current trunk.
(In reply to Tom Tromey from comment #18) > When re-reading an index like this, we'll end up with a different > data structure than the one we started out with, which seems > unfortunate. That is, the scanner will end up with the per-CU > data having "includer" links, but the debug-names reader will > just know that function "xyz" appears in CUs 1/2/3, if that > makes sense. > > It'd be nicer to have a single data structure, but then we're > limited by what .debug_names can represent. I was thinking about this again and I think the situation can be salvaged without more .debug_names extensions. The basic idea is to special case any DW_TAG_subroutine that is marked as having been inlined. If such a subroutine occurs in CU "A", then every outermost CU that includes "A" (either directly or indirectly) should get a copy of the subroutine's symbol -- that is, the symbol should be recorded as coming from the including CU. This way, the cooked index will arrange to expand the outermost CUs in response to "break function". And, similarly, .debug_names and .gdb_index will attribute these functions to their outermost CUs as well. (For .debug_names we can give it some phony DIE offset if needed as well, though I tend to think it isn't at the current time.) Maybe it's as easy as recording some kind of "proto symbol" on the per-CU as scanning is done. Copying them in would have to be a post-pass (i.e., part of finalization) due to how inclusions are done.
(In reply to Tom Tromey from comment #16) > I wonder if gdb-index and debug-names correctly attribute these > symbols to the including CU. If not then we'll see the bug there as well. Good point. FWIW, the approach of adding DW_TAG_inlined_subroutine entries to the index (I'm currently rebasing that series) seems to work. With .gdb_index, we get: ... [721] N1::C1::baz: 2 [global, function] 1 [global, other] 2 [global, other] ... and with .debug_names: ... [ 5] _ZN2N12C13bazEv: <4> DW_TAG_subprogram DW_IDX_compile_unit=1 DW_IDX_die_offset=<0x31> DW_IDX_GNU_language=33 DW_IDX_GNU_linkage_name=1 <5> DW_TAG_inlined_subroutine DW_IDX_compile_unit=1 DW_IDX_die_offset=<0xac> DW_IDX_GNU_language=33 DW_IDX_GNU_linkage_name=1 <5> DW_TAG_inlined_subroutine DW_IDX_compile_unit=2 DW_IDX_die_offset=<0x13d> DW_IDX_GNU_language=33 DW_IDX_GNU_linkage_name=1 [ 9] baz: <6> DW_TAG_subprogram DW_IDX_compile_unit=1 DW_IDX_die_offset=<0x31> DW_IDX_GNU_language=33 DW_IDX_parent=1 <7> DW_TAG_inlined_subroutine DW_IDX_compile_unit=1 DW_IDX_die_offset=<0xac> DW_IDX_GNU_language=33 DW_IDX_parent=1 <7> DW_TAG_inlined_subroutine DW_IDX_compile_unit=2 DW_IDX_die_offset=<0x13d> DW_IDX_GNU_language=33 DW_IDX_parent=1 ... and in both cases, doing "break N1::C1::baz" results in two breakpoint locations. For the .gdb_index case, the "other" might be need to be "function" instead, but regardless it seems to work.
(In reply to Tom de Vries from comment #21) > FWIW, the approach of adding DW_TAG_inlined_subroutine entries to the index > (I'm currently rebasing that series) seems to work. IIRC this requires scanning all function bodies. I think we definitely don't want to do that. It should be sufficient to notice inlined subroutines with DW_INL_*. I know some languages do recurse here, but IMO those aren't good examples to be emulated. It just makes gdb slower for these languages. IMO, no need to make C++ pay for those decisions.
Let me walk that back a little. Languages with named nested functions already require a full walk. This includes Ada and Fortran, maybe others (not sure). Perhaps we should this for GNU C (aka really just C since we probably don't want to do version sniffing for this) as well (though this has issues since we don't really have a syntax for this). So whether or not this is done for C++ et al is more of an empirical question. My intuition is that scanning is slow and that it is cheaper to skip DIE trees when possible. It's possible this intuition is incorrect, though. The way to tell is to do the experiment, but taking care to look at performance on large-enough C++ programs, preferably ones that have a reasonably ordinary distribution of function size. Any problems should be more noticeable as the function size grows.
It occurred to me today that I don't understand how recursing solves the CU inclusion part of the problem.
Ok, I see now. It will find all the spots where the function has actually been inlined.