30728 – [gdb/symtab, cc-with-dwz] FAIL: gdb.cp/breakpoint-locs.exp: break N1::C1::baz

Bug 30728 - [gdb/symtab, cc-with-dwz] FAIL: gdb.cp/breakpoint-locs.exp: break N1::C1::baz

Summary: [gdb/symtab, cc-with-dwz] FAIL: gdb.cp/breakpoint-locs.exp: break N1::C1::baz

Status:	NEW

Alias:	None

Product:	gdb
Classification:	Unclassified
Component:	symtab (show other bugs)
Version:	HEAD

Importance:	P2 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2023-08-06 15:50 UTC by Tom de Vries
Modified:	2024-05-06 13:57 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:

Attachments
breakpoint-locs.gz (stripped with --keep-only-debug) (3.14 KB, application/gzip) 2023-08-06 18:15 UTC, Tom de Vries	Details
WIP patch (2.59 KB, patch) 2023-08-10 08:49 UTC, Tom de Vries	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Tom de Vries 2023-08-06 15:50:44 UTC

With target board cc-with-dwz:
...
(gdb) break N1::C1::baz^M
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.^M
(gdb) FAIL: gdb.cp/breakpoint-locs.exp: break N1::C1::baz
...

with target board unix:
...
(gdb) break N1::C1::baz^M
Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations)^M
(gdb) PASS: gdb.cp/breakpoint-locs.exp: break N1::C1::baz
...

Comment 1 Tom de Vries 2023-08-06 16:16:06 UTC

Fixed by:
...
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index dd4fac52ca8..9f36ce8d23a 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -16614,7 +16614,8 @@ cooked_indexer::index_dies (cutu_reader *reader,
 
 	    case DW_TAG_subprogram:
 	      if ((m_language == language_fortran
-		   || m_language == language_ada)
+		   || m_language == language_ada
+		   || m_language == language_cplus)
 		  && this_entry != nullptr)
 		{
 		  info_ptr = recurse (reader, info_ptr, this_entry, true);
...

Comment 2 Tom de Vries 2023-08-06 16:31:25 UTC

gdb-12-branch -> gdb-13-branch regression

Comment 3 Tom de Vries 2023-08-06 18:15:14 UTC

Created attachment 15044 [details]
breakpoint-locs.gz (stripped with --keep-only-debug)

Reproducer:
...
$ gdb -q -batch -iex "maint set worker-threads 0" ./breakpoint-locs -ex "break N1::C1::baz"
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
...

With the set worker-thread 0, reproduces 100%:
...
$ for n in $(seq 1 10); do ./gdb.sh -q -batch -iex "maint set worker-threads 0" ./breakpoint-locs -ex "break N1::C1::baz"; done
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
...

Without, it doesn't:
...
$ for n in $(seq 1 10); do ./gdb.sh -q -batch ./breakpoint-locs -ex "break N1::C1::baz"; done
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations)
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations)
Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations)
Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations)
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations)
Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations)
...

Comment 4 Tom de Vries 2023-08-06 22:00:24 UTC

I'm starting to understand somewhat why it's not 100% reproducible.  With this patch:
...
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index dd4fac52ca8..ed36b5e890b 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -16889,6 +16889,13 @@ cooked_index_functions::expand_symtabs_matching
 	= lookup_name_without_params.split_name (lang);
       std::string last_name = gdb::to_string (name_vec.back ());
 
+      for (const cooked_index_entry *entry : table->find (last_name,
+							  completing))
+	{
+	  QUIT;
+	  fprintf (stderr, "Trying entry: %lx\n", (unsigned long)entry->die_offset);
+	}
+
       for (const cooked_index_entry *entry : table->find (last_name,
 							  completing))
 	{
@@ -16953,6 +16960,7 @@ cooked_index_functions::expand_symtabs_matching
 		continue;
 	    }
 
+	  fprintf (stderr, "matched entry: %lx\n", (unsigned long)entry->die_offset);
 	  if (!dw2_expand_symtabs_matching_one (entry->per_cu, per_objfile,
 						file_matcher,
 						expansion_notify))
...
I get:
...
$ gdb -q -batch ./breakpoint-locs -ex "break N1::C1::baz"
Trying entry: 14
Trying entry: 10e
Trying entry: 199
matched entry: 14
Trying entry: 2e
Trying entry: 2e
Trying entry: 2e
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
...
or:
...
$ gdb -q -batch ./breakpoint-locs -ex "break N1::C1::baz"
Trying entry: 10e
Trying entry: 14
Trying entry: 199
matched entry: 10e
Trying entry: 2e
matched entry: 2e
Trying entry: 2e
Trying entry: 2e
Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations)
...

The root cause of the difference seems to be that the order of table->find is not stable.

Comment 5 Tom de Vries 2023-08-07 10:23:49 UTC

(In reply to Tom de Vries from comment #4)
> The root cause of the difference seems to be that the order of table->find
> is not stable.

And that seems to be caused by cooked_index_entries originating from a PU to be included in the shard processing either one or another CU including the PU.

More concretely, we have cooked_index_entries:
...
PU: { 0x14 0x19 0x2e 0x2e 0x39 }
CU1: { 0x10e 0x121 0x134 0x13b 0x13b }
CU2: { 0x199 0x1ac 0x1ac 0x1c2 0x1c2 }
...

And sometimes we have these two shards:
...
CU1 + PU: { 0x14 0x19 0x2e 0x2e 0x39 0x10e 0x121 0x134 0x13b 0x13b } 
CU2: { 0x199 0x1ac 0x1ac 0x1c2 0x1c2 }
...
and sometimes these two:
...
CU1: { 0x10e 0x121 0x134 0x13b 0x13b } 
CU2 + PU: { 0x14 0x19 0x2e 0x2e 0x39 0x199 0x1ac 0x1ac 0x1c2 0x1c2 } 
...

Comment 6 Tom de Vries 2023-08-07 10:44:11 UTC

(In reply to Tom de Vries from comment #5)
> (In reply to Tom de Vries from comment #4)
> > The root cause of the difference seems to be that the order of table->find
> > is not stable.
> 
> And that seems to be caused by cooked_index_entries originating from a PU to
> be included in the shard processing either one or another CU including the
> PU.

It's a race on this bit of code in cooked_indexer::index_imported_unit:
...
  dwarf2_per_objfile *per_objfile = reader->cu->per_objfile;
  cutu_reader *new_reader = ensure_cu_exists (reader, per_objfile, sect_off,
                                              is_dwz, true);
  if (new_reader != nullptr)
    {
      index_dies (new_reader, new_reader->info_ptr, nullptr, false);
...

For the first thread to arrive new_reader != nullptr, after that new_reader == nullptr.

Comment 7 Tom de Vries 2023-08-07 15:14:02 UTC

(In reply to Tom de Vries from comment #1)
> Fixed by:
> ...
> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
> index dd4fac52ca8..9f36ce8d23a 100644
> --- a/gdb/dwarf2/read.c
> +++ b/gdb/dwarf2/read.c
> @@ -16614,7 +16614,8 @@ cooked_indexer::index_dies (cutu_reader *reader,
>  
>  	    case DW_TAG_subprogram:
>  	      if ((m_language == language_fortran
> -		   || m_language == language_ada)
> +		   || m_language == language_ada
> +		   || m_language == language_cplus)
>  		  && this_entry != nullptr)
>  		{
>  		  info_ptr = recurse (reader, info_ptr, this_entry, true);
> ...

I can no longer reproduce that this works.

Comment 8 Tom Tromey 2023-08-07 16:35:52 UTC

(In reply to Tom de Vries from comment #6)

> It's a race on this bit of code in cooked_indexer::index_imported_unit:
> ...

The intent of this code is to scan a given PU just once.
That is, the race is intentional.

> The root cause of the difference seems to be that the order of table->find is not stable.

I wonder what the entries are and why the order matters.
Maybe some kind of inlining post-processing is needed, dunno.

Comment 9 Tom de Vries 2023-08-08 23:32:52 UTC

With this patch we got a bit more info:
...
diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c
index 92ef0e35c5e..c53565dd317 100644
--- a/gdb/dwarf2/cooked-index.c
+++ b/gdb/dwarf2/cooked-index.c
@@ -583,6 +583,8 @@ cooked_index::dump (gdbarch *arch) const
       gdb_printf ("    flags:      %s\n", to_string (entry->flags).c_str ());
       gdb_printf ("    DIE offset: 0x%" PRIx64 "\n",
 		  to_underlying (entry->die_offset));
+      gdb_printf ("    per_cu offset: 0x%" PRIx64 "\n",
+		  to_underlying (entry->per_cu->sect_off));
 
       if (entry->parent_entry != nullptr)
 	gdb_printf ("    parent:     ((cooked_index_entry *) %p) [%s]\n",
...

In the failing case, we have:
...
Breakpoint 1 at 0x4004db: file /data/vries/gdb/src/gdb/testsuite/gdb.cp/breakpoint-locs.h, line 23.
   ...
   [5] ((cooked_index_entry *) 0x7f5384002f00)
    name:       baz
    canonical:  baz
    DWARF tag:  DW_TAG_subprogram
    flags:      0x0 []
    DIE offset: 0x2e
    per_cu offset: 0xe3
    parent:     ((cooked_index_entry *) 0x7f5384002ed0) [C1]
...
and in the passing case we have:
...
Breakpoint 1 at 0x4004db: N1::C1::baz. (2 locations)
  ...
   [12] ((cooked_index_entry *) 0x7f6968002f00)
    name:       baz
    canonical:  baz
    DWARF tag:  DW_TAG_subprogram
    flags:      0x0 []
    DIE offset: 0x2e
    per_cu offset: 0x16e
    parent:     ((cooked_index_entry *) 0x7f6968002ed0) [C1]
...

So, the per_cu doesn't refer to the PU, but rather to the importing CU that wins the race.

The PU should have the info that it has two includer CU, and expansion of both should be triggered.

So, my guess is that the PU dies have the wrong per_cu value.

Comment 10 Tom de Vries 2023-08-09 15:31:10 UTC

Reexploring the DW_TAG_inlined_subroutine approach of comment 1 (in combination with the proposed fix for PR30739):
...
diff --git a/gdb/dwarf2/cooked-index.h b/gdb/dwarf2/cooked-index.h
index 5aacb321c91..d5a2339dd87 100644
--- a/gdb/dwarf2/cooked-index.h
+++ b/gdb/dwarf2/cooked-index.h
@@ -141,7 +141,7 @@ struct cooked_index_entry : public allocate_on_obstack
 		|| tag == DW_TAG_constant
 		|| tag == DW_TAG_enumerator);
       case FUNCTIONS_DOMAIN:
-	return tag == DW_TAG_subprogram;
+	return tag == DW_TAG_subprogram	|| tag == DW_TAG_inlined_subroutine;
       case TYPES_DOMAIN:
 	return tag_is_type (tag);
       case MODULES_DOMAIN:
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index a64f82bd24a..eef1cffa95d 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -16363,7 +16363,8 @@ cooked_indexer::scan_attributes (dwarf2_per_cu_data *scanning_per_cu,
 	      && new_info_ptr > watermark_ptr
 	      && *parent_entry == nullptr)
 	    *maybe_defer = form_addr (origin_offset, origin_is_dwz);
-	  else if (*parent_entry == nullptr)
+	  else if (*parent_entry == nullptr
+		   || abbrev->tag == DW_TAG_inlined_subroutine)
 	    {
 	      CORE_ADDR lookup = form_addr (origin_offset, origin_is_dwz);
 	      void *obj = m_die_range_map.find (lookup);
@@ -16616,7 +16617,8 @@ cooked_indexer::index_dies (cutu_reader *reader,
 
 	    case DW_TAG_subprogram:
 	      if ((m_language == language_fortran
-		   || m_language == language_ada)
+		   || m_language == language_ada
+		   || m_language == language_cplus)
 		  && this_entry != nullptr)
 		{
 		  info_ptr = recurse (reader, info_ptr, this_entry, true);
...


What we're trying to achieve here is to add two DW_TAG_inlined_subroutine entries (one for each CU) to the cooked index table, with the parents pointing to the encapsulating namespace.

But, we have:
...
    [17] ((cooked_index_entry *) 0x4b1f420)
    name:       baz
    canonical:  baz
    DWARF tag:  DW_TAG_inlined_subroutine
    flags:      0x0 []
    DIE offset: 0x148
    parent:     ((cooked_index_entry *) 0x4b1f240) [C1]
  ...
   [19] ((cooked_index_entry *) 0x4b1f570)
    name:       baz
    canonical:  baz
    DWARF tag:  DW_TAG_inlined_subroutine
    flags:      0x0 []
    DIE offset: 0x1cf
    parent:     ((cooked_index_entry *) 0)
...
or:
...
    [5] ((cooked_index_entry *) 0x7fc7b4002f90)
    name:       baz
    canonical:  baz
    DWARF tag:  DW_TAG_inlined_subroutine
    flags:      0x0 []
    DIE offset: 0x148
    parent:     ((cooked_index_entry *) 0)
  ...
    [16] ((cooked_index_entry *) 0x7fc7bc0030b0)
    name:       baz
    canonical:  baz
    DWARF tag:  DW_TAG_inlined_subroutine
    flags:      0x0 []
    DIE offset: 0x1cf
    parent:     ((cooked_index_entry *) 0x7fc7bc002ed0) [C1]
...

So, either one or the other has the C1 parent, but we need it for both.

The problem is that the m_die_range_map data is limited to the scope of a CU reading session, and we need m_die_range_map data written in one session in another session.

Of course that's going to get interesting when the CUs are in different shards.

Comment 11 Tom de Vries 2023-08-10 08:49:49 UTC

Created attachment 15052 [details]
WIP patch

This WIP patch fixes the test-case.

It moves m_deferred_entries and m_die_range_map into cooked_index_shard, and handles the deferred entries sequentially in a loop after the parallel_for_each creating the shards.

That loop itself should still be wrapped in a "while (changed)", but that wasn't necessary for this particular test-case.

It defers too many entries, but that's a performance problem, I first wanted to make a simple implementation that can demonstrate that it fixes the problem.

Comment 12 Tom de Vries 2023-08-25 15:56:34 UTC

Posted RFC: https://sourceware.org/pipermail/gdb-patches/2023-August/201887.html

Comment 13 Tom de Vries 2023-10-02 12:55:42 UTC

https://sourceware.org/pipermail/gdb-patches/2023-October/202891.html

Comment 14 Tom Tromey 2024-01-15 00:47:52 UTC

I finally looked into this a bit.

I think that for inline functions specifically, what gdb should
do is track all the CUs that include a particular CU.
Then if an inline function is found in an included CU, all
including CUs should be expanded.

It's unclear to me if any other tags need this treatment.
Making it more targeted would mean that other requests would
be faster, due to less CU expansion.

Comment 15 Tom Tromey 2024-01-15 01:34:24 UTC

The inclusion handling code is pretty gross & wrong
(or at least the comments are wrong); and then it touches
the queue code, which we also know to be bad :(

Comment 16 Tom Tromey 2024-01-15 01:54:01 UTC

I wonder if gdb-index and debug-names correctly attribute these
symbols to the including CU.  If not then we'll see the bug there as well.

Comment 17 Tom Tromey 2024-02-09 19:48:27 UTC

(In reply to Tom Tromey from comment #16)
> I wonder if gdb-index and debug-names correctly attribute these
> symbols to the including CU.  If not then we'll see the bug there as well.

I forgot to mention -- if there is a DIE in the index that
comes from a partial unit, the .debug_names writer must attribute
that DIE to some other full CU.  This has to be done because
with .debug_names, gdb won't scan the DIE tree and so won't
know about imports.  However, it's also not really possible to
read a PU in isolation -- only when reading a full CU.

What I'd propose is having the .debug_names entry written
as referring to the DIE in the PU, but emit the CU as the
"canonical" includer... except when the DIE is an inline
function, in which case all CUs must be listed.

This would have also be documented as a gdb extension, because
DWARF seems to ignore this issue.

Comment 18 Tom Tromey 2024-02-16 02:42:02 UTC

(In reply to Tom Tromey from comment #17)

> What I'd propose is having the .debug_names entry written
> as referring to the DIE in the PU, but emit the CU as the
> "canonical" includer... except when the DIE is an inline
> function, in which case all CUs must be listed.
> 
> This would have also be documented as a gdb extension, because
> DWARF seems to ignore this issue.

When re-reading an index like this, we'll end up with a different
data structure than the one we started out with, which seems
unfortunate.  That is, the scanner will end up with the per-CU
data having "includer" links, but the debug-names reader will
just know that function "xyz" appears in CUs 1/2/3, if that
makes sense.

It'd be nicer to have a single data structure, but then we're
limited by what .debug_names can represent.

Comment 19 Tom de Vries 2024-05-06 13:57:18 UTC

Reproduces with current trunk.