Bug 29286

Summary: SUMMARY: ThreadSanitizer: data race gdb/dwarf2/read.c:4100 in dw_expand_symtabs_matching_file_matcher
Product: gdb Reporter: Tom de Vries <vries>
Component: gdbAssignee: Not yet assigned to anyone <unassigned>
Status: NEW ---    
Severity: normal CC: tromey
Priority: P2    
Version: HEAD   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Bug Depends on:    
Bug Blocks: 29366, 31751    
Attachments: gdb.log
gdb.log (bitfields eliminated in dwarf2_per_cu_data)
Tentative patch
gdb.log
Tentative patch, updated
gdb.log (gdb.dwarf2/dw2-symtab-includes.exp)
WIP patch
gdb.log (gdb.dwarf2/dwz.exp)
Patch fixing all detected data races

Description Tom de Vries 2022-06-25 09:19:43 UTC
Created attachment 14168 [details]
gdb.log

I build gdb with -fsanitizer=thread and gcc 7.5.0, using patches:
- https://sourceware.org/pipermail/gdb-patches/2022-June/190336.html
- https://sourceware.org/pipermail/gdb-patches/2022-June/190337.html

The latter is not necessary when using gcc 8 or later.  I'm guessing the former is not necessary on a PIE-by-default system.

When running the testsuite, I run into:
...
FAIL: gdb.ada/access_to_packed_array.exp: maint expand-symtabs
...

Due to:
...
SUMMARY: ThreadSanitizer: data race /home/vries/gdb_versions/devel/src/gdb/dwarf2/read.c:4100 in dw_expand_symtabs_matching_file_matcher^
...
See gdb.log attachement for more detail.

I see just one FAIL sofar, but the problem seems to be quite common:
...
$ grep -c "SUMMARY: ThreadSanitizer" gdb.log
253
...
Comment 1 Tom de Vries 2022-06-25 09:57:30 UTC
So in dw_expand_symtabs_matching_file_matcher we do:
...
      per_cu->mark = 0;
...
and in cooked_index::do_finalize:
...
      if ((entry->per_cu->lang != language_cplus
           && entry->per_cu->lang != language_ada)
          || (entry->flags & IS_LINKAGE) != 0)
        entry->canonical = entry->name;
...
and in read.h, definition struct dwarf2_per_cu_data we have:
...
  /* A temporary mark bit used when iterating over all CUs in                                 
     expand_symtabs_matching.  */
  unsigned int mark : 1;
...
and:
...
  /* The language of this CU.  */
  ENUM_BITFIELD (language) lang : LANGUAGE_BITS;
...
so members of the same bitfield.
Comment 2 Tom de Vries 2022-06-25 11:10:20 UTC
FWIW, I've also compiled with gcc-12, and verified that gdb has a dependency on the libtsan from the package libtsan2-gcc12, and I still run into the same issue.
Comment 3 Tom de Vries 2022-06-25 11:29:42 UTC
Created attachment 14169 [details]
gdb.log (bitfields eliminated in dwarf2_per_cu_data)

After making all the bitfields in the struct proper fields, we run into a different problem.

We have a race between prepare_one_comp_unit gdb/dwarf2/read.c:23527:
...
    cu->per_cu->lang = dwarf_lang_to_enum_language (attr->constant_value (0));
...
and :
...
cooked_index::do_finalize gdb/dwarf2/cooked-index.c:199:
...
      if ((entry->per_cu->lang != language_cplus
           && entry->per_cu->lang != language_ada)
          || (entry->flags & IS_LINKAGE) != 0)
...

Hmm, AFAIU, building with 0 worker threads would guarantee this order, so the entry->per_cu->lang will then always be unknown when tested in cooked-index?
Comment 4 Tom de Vries 2022-06-25 12:14:02 UTC
(In reply to Tom de Vries from comment #3)
> Hmm, AFAIU, building with 0 worker threads would guarantee this order, so
> the entry->per_cu->lang will then always be unknown when tested in
> cooked-index?

Not quite.  When building with 0 worker threads, I observed the following order:
- prepare_one_comp_unit is called for the first time, and
  cu->per_cu->language is set
- in cooked_index::do_finalize, the cu->per_cu->language field is read
- then prepare_one_comp_unit is called once more, overwriting the
  cu->per_cu->language field with the same value

AFAICT, the race condition that thread sanitizer complains about is that the second write happens after the read, but could happen before it.

I've tried only setting the the cu->per_cu->lang field if it's unknown, and that seems to fix this particular problem.

But when re-introducing the bitfields, the original problem reappears.
Comment 5 Tom Tromey 2022-06-25 12:26:10 UTC
(In reply to Tom de Vries from comment #1)
> So in dw_expand_symtabs_matching_file_matcher we do:
> ...
>       per_cu->mark = 0;
> ...
> and in cooked_index::do_finalize:

I think probably cooked_index_functions::expand_symtabs_matching
should wait for finalization to be done.

> When building with 0 worker threads

You can "maint set worker-threads 0"
Comment 6 Tom de Vries 2022-06-25 14:17:23 UTC
Created attachment 14170 [details]
Tentative patch

(In reply to Tom Tromey from comment #5)
> (In reply to Tom de Vries from comment #1)
> > So in dw_expand_symtabs_matching_file_matcher we do:
> > ...
> >       per_cu->mark = 0;
> > ...
> > and in cooked_index::do_finalize:
> 
> I think probably cooked_index_functions::expand_symtabs_matching
> should wait for finalization to be done.
> 

I gave this a try, it works for the example, now testing.
Comment 7 Tom de Vries 2022-06-25 16:03:06 UTC
Created attachment 14171 [details]
gdb.log

While testing, found another one.

Replicating the same fix works for the test-case:
...
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 6e581a6a8ea..f1a5b71b66e 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -4280,6 +4280,11 @@ dwarf2_base_index_functions::find_pc_sect_compunit_symtab
     warning (_("(Internal error: pc %s in read in CU, but not in symtab.)"),
             paddress (objfile->arch (), pc));
 
+  cooked_index_vector *table
+    = (static_cast<cooked_index_vector *>
+       (per_objfile->per_bfd->index_table.get ()));
+  table->wait ();
+
   result = recursively_find_pc_sect_compunit_symtab
     (dw2_instantiate_symtab (data, per_objfile, false), pc);
 
...
Comment 8 Tom Tromey 2022-06-25 16:17:21 UTC
Patch looks good, I wrote essentially the same one but then
didn't have time to upload it earlier.
Comment 9 Tom Tromey 2022-06-25 16:19:15 UTC
(In reply to Tom de Vries from comment #7)
> Created attachment 14171 [details]
> gdb.log
> 
> While testing, found another one.
> 
> Replicating the same fix works for the test-case:
> ...
> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
> index 6e581a6a8ea..f1a5b71b66e 100644
> --- a/gdb/dwarf2/read.c
> +++ b/gdb/dwarf2/read.c
> @@ -4280,6 +4280,11 @@
> dwarf2_base_index_functions::find_pc_sect_compunit_symtab
>      warning (_("(Internal error: pc %s in read in CU, but not in symtab.)"),
>              paddress (objfile->arch (), pc));
>  
> +  cooked_index_vector *table
> +    = (static_cast<cooked_index_vector *>
> +       (per_objfile->per_bfd->index_table.get ()));
> +  table->wait ();
> +
>    result = recursively_find_pc_sect_compunit_symtab
>      (dw2_instantiate_symtab (data, per_objfile, false), pc);

This will probably have to be a new override method in
cooked_index_functions that waits and then calls the
superclass method.  The base functions can't assume
there is a cooked index.
Comment 10 Tom de Vries 2022-06-25 20:33:32 UTC
Created attachment 14172 [details]
Tentative patch, updated
Comment 11 Tom de Vries 2022-06-25 21:26:58 UTC
Created attachment 14173 [details]
gdb.log (gdb.dwarf2/dw2-symtab-includes.exp)

Another one.

A data race between is_dwz in dwarf2_find_containing_comp_unit:
...
      if (mid_cu->is_dwz > offset_in_dwz
          || (mid_cu->is_dwz == offset_in_dwz
              && mid_cu->sect_off + mid_cu->length > sect_off))
...
and again lang in prepare_one_comp_unit:
...
    cu->per_cu->lang = dwarf_lang_to_enum_language (attr->constant_value (0));
...

Not sure how to fix this yet.
Comment 12 Tom de Vries 2022-06-25 22:38:18 UTC
(In reply to Tom de Vries from comment #10)
> Created attachment 14172 [details]
> Tentative patch, updated

Hm, the find_pc_sect_compunit_symtab bit causes regressions in test-cases gdb.dwarf2/dwzbuildid.exp and gdb.dwarf2/dw2-error.exp.
Comment 13 Tom de Vries 2022-06-26 13:16:23 UTC
Created attachment 14174 [details]
WIP patch
Comment 14 Tom de Vries 2022-06-26 13:36:32 UTC
Created attachment 14175 [details]
gdb.log (gdb.dwarf2/dwz.exp)

AFAIU, the test-case contains (PU: Partial unit, CU: Compilation Unit):
- PU A
- CU B
- CU C importing PU A

A parallel for is called, which starts calling process_psymtab_comp_unit in threads for A, B and C in parallel.

In the main thread, CU C is handled, the import is encountered, cooked_indexer::index_imported_unit is called, then cooked_indexer::ensure_cu_exists, which calls process_psymtab_comp_unit for PU A, which calls cutu_reader::cutu_reader for PU A.

In thread T3, PU A is handled, and cutu_reader::cutu_reader is called for PU A.

The thread sanitizer then detects that we're trying to read/write this_cu->dwarf_version in the same PU from different threads, and errors out.
Comment 15 Tom de Vries 2022-06-28 05:47:24 UTC
Created attachment 14180 [details]
Patch fixing all detected data races
Comment 16 Tom de Vries 2022-06-28 05:48:57 UTC
(In reply to Tom de Vries from comment #15)
> Created attachment 14180 [details]
> Patch fixing all detected data races

[gdb/symtab] Fix some data races

Fix some data races reported by thread sanitizer.

When build with -fsanitize=thread and gcc 12, passes testsuite with no
data races.

There are a few problems when testing:
- we have a heap-use-after-free in gdb.dwarf2/dw2-icc-opaque.exp, but
  that's a known issue, also triggers with -fsanitize-address.
  Reported at PR25723 comment 1.
- we run into timeouts in test-cases gdb.gdb/python-helper.exp and
  gdb.threads/detach-step-over.exp, presumably due to sanitizer overhead.
- with all tui test, at the point of calling newterm we run into
  "WARNING: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong
  thread)".  This also happens when first setting "maint set worker-threads
  0", so this doesn't seem to be related to gdb's multi-threading behaviour.
Comment 17 Tom de Vries 2022-06-28 12:38:30 UTC
(In reply to Tom de Vries from comment #16)
> - with all tui test, at the point of calling newterm we run into
>   "WARNING: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong
>   thread)".  This also happens when first setting "maint set worker-threads
>   0", so this doesn't seem to be related to gdb's multi-threading behaviour.

I can reproduce this with a standalone testcase callling newterm, so this is indeed unrelated to gdb.
Comment 18 Tom de Vries 2022-06-29 13:12:54 UTC
(In reply to Tom de Vries from comment #16)
> - we have a heap-use-after-free in gdb.dwarf2/dw2-icc-opaque.exp, but
>   that's a known issue, also triggers with -fsanitize-address.
>   Reported at PR25723 comment 1.

Filed as PR29295.
Comment 19 Tom de Vries 2022-06-29 15:34:48 UTC
Submitted RFC: https://sourceware.org/pipermail/gdb-patches/2022-June/190420.html
Comment 20 Tom de Vries 2022-07-12 14:59:29 UTC
(In reply to Tom de Vries from comment #17)
> (In reply to Tom de Vries from comment #16)
> > - with all tui test, at the point of calling newterm we run into
> >   "WARNING: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong
> >   thread)".  This also happens when first setting "maint set worker-threads
> >   0", so this doesn't seem to be related to gdb's multi-threading behaviour.
> 
> I can reproduce this with a standalone testcase callling newterm, so this is
> indeed unrelated to gdb.

Handled at https://sourceware.org/bugzilla/show_bug.cgi?id=29328
Comment 21 Tom de Vries 2022-07-14 07:10:10 UTC
(In reply to Tom de Vries from comment #14)
> Created attachment 14175 [details]
> gdb.log (gdb.dwarf2/dwz.exp)
> 
> AFAIU, the test-case contains (PU: Partial unit, CU: Compilation Unit):
> - PU A
> - CU B
> - CU C importing PU A
> 
> A parallel for is called, which starts calling process_psymtab_comp_unit in
> threads for A, B and C in parallel.
> 
> In the main thread, CU C is handled, the import is encountered,
> cooked_indexer::index_imported_unit is called, then
> cooked_indexer::ensure_cu_exists, which calls process_psymtab_comp_unit for
> PU A, which calls cutu_reader::cutu_reader for PU A.
> 
> In thread T3, PU A is handled, and cutu_reader::cutu_reader is called for PU
> A.
> 
> The thread sanitizer then detects that we're trying to read/write
> this_cu->dwarf_version in the same PU from different threads, and errors out.

https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=33fd0a33639897e1c3403389de882cadd344a494
Comment 22 Tom de Vries 2022-07-14 07:13:17 UTC
(In reply to Tom de Vries from comment #6)
> Created attachment 14170 [details]
> Tentative patch
> 
> (In reply to Tom Tromey from comment #5)
> > (In reply to Tom de Vries from comment #1)
> > > So in dw_expand_symtabs_matching_file_matcher we do:
> > > ...
> > >       per_cu->mark = 0;
> > > ...
> > > and in cooked_index::do_finalize:
> > 
> > I think probably cooked_index_functions::expand_symtabs_matching
> > should wait for finalization to be done.
> > 
> 
> I gave this a try, it works for the example, now testing.

Submitted as https://sourceware.org/pipermail/gdb-patches/2022-July/190743.html
Comment 23 Tom de Vries 2022-07-14 18:50:27 UTC
(In reply to Tom de Vries from comment #22)
> (In reply to Tom de Vries from comment #6)
> > Created attachment 14170 [details]
> > Tentative patch
> > 
> > (In reply to Tom Tromey from comment #5)
> > > (In reply to Tom de Vries from comment #1)
> > > > So in dw_expand_symtabs_matching_file_matcher we do:
> > > > ...
> > > >       per_cu->mark = 0;
> > > > ...
> > > > and in cooked_index::do_finalize:
> > > 
> > > I think probably cooked_index_functions::expand_symtabs_matching
> > > should wait for finalization to be done.
> > > 
> > 
> > I gave this a try, it works for the example, now testing.
> 
> Submitted as
> https://sourceware.org/pipermail/gdb-patches/2022-July/190743.html

https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=7d1a572d6b5194d36a96f36b3d28ce591341deb6
Comment 24 Tom de Vries 2022-07-14 18:53:20 UTC
There are a couple of "add wait() here" patches mentioned in this PR for which I can no longer trigger the offending thread sanitizer warning/error.

My understanding of those patches is not good enough to decide that they are needed anyway, so I'm basically waiting for the next thread sanitizer error to proceed with this.
Comment 25 Tom Tromey 2022-09-23 20:44:41 UTC
What's the state of this bug?
It's on the gdb 13 list, but I'm wondering if the known
issues are fixed.
Comment 26 Tom de Vries 2022-09-24 05:39:01 UTC
(In reply to Tom Tromey from comment #25)
> What's the state of this bug?
> It's on the gdb 13 list, but I'm wondering if the known
> issues are fixed.

The state of this bug is recorded in comment 24.