Summary: | ThreadSanitizer: data race gdb/dwarf2/read.c:4100 in dw_expand_symtabs_matching_file_matcher | |
---|---|---|---
Product: | gdb | Reporter: | Tom de Vries <vries>
Component: | gdb | Assignee: | Not yet assigned to anyone <unassigned>
Status: | NEW --- | |
Severity: | normal | CC: | tromey
Priority: | P2 | |
Version: | HEAD | |
Target Milestone: | --- | |
Host: | | Target: |
Build: | | Last reconfirmed: |
Bug Depends on: | | |
Bug Blocks: | 29366, 31751 | |

Attachments:
- gdb.log
- gdb.log (bitfields eliminated in dwarf2_per_cu_data)
- Tentative patch
- gdb.log
- Tentative patch, updated
- gdb.log (gdb.dwarf2/dw2-symtab-includes.exp)
- WIP patch
- gdb.log (gdb.dwarf2/dwz.exp)
- Patch fixing all detected data races
Description
Tom de Vries
2022-06-25 09:19:43 UTC
So in dw_expand_symtabs_matching_file_matcher we do:
...
per_cu->mark = 0;
...
and in cooked_index::do_finalize:
...
if ((entry->per_cu->lang != language_cplus
     && entry->per_cu->lang != language_ada)
    || (entry->flags & IS_LINKAGE) != 0)
  entry->canonical = entry->name;
...
and in read.h, in the definition of struct dwarf2_per_cu_data, we have:
...
/* A temporary mark bit used when iterating over all CUs in
   expand_symtabs_matching.  */
unsigned int mark : 1;
...
and:
...
/* The language of this CU.  */
ENUM_BITFIELD (language) lang : LANGUAGE_BITS;
...
so the two are members of the same bitfield, and the write to mark in one thread can race with the read of lang in another (a standalone sketch of this pattern follows the attachment note below).

FWIW, I've also compiled with gcc-12, and verified that gdb has a dependency on the libtsan from the package libtsan2-gcc12, and I still run into the same issue.

Created attachment 14169 [details]
gdb.log (bitfields eliminated in dwarf2_per_cu_data)
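For reference, a minimal standalone sketch of the pattern described above (hypothetical struct and thread bodies, not gdb code): mark and lang end up in the same memory location because they are adjacent bitfields, so a write to one and a read of the other from different threads is a data race.
...
/* Compile with -fsanitize=thread; the race below is reported even though
   the two threads touch different members.  */
#include <thread>

struct per_cu_sketch
{
  unsigned int mark : 1;   /* stands in for dwarf2_per_cu_data::mark */
  unsigned int lang : 5;   /* stands in for dwarf2_per_cu_data::lang */
};

int
main ()
{
  per_cu_sketch cu {};
  cu.lang = 2;

  /* Like per_cu->mark = 0 in dw_expand_symtabs_matching_file_matcher.  */
  std::thread writer ([&cu] { cu.mark = 0; });

  /* Like the entry->per_cu->lang reads in cooked_index::do_finalize.  */
  unsigned int lang_copy = 0;
  std::thread reader ([&cu] { lang_copy = cu.lang; });

  writer.join ();
  reader.join ();
  return lang_copy == 2 ? 0 : 1;
}
...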
After making all the bitfields in the struct proper fields, we run into a different problem.
We have a race between prepare_one_comp_unit gdb/dwarf2/read.c:23527:
...
cu->per_cu->lang = dwarf_lang_to_enum_language (attr->constant_value (0));
...
and cooked_index::do_finalize, gdb/dwarf2/cooked-index.c:199:
...
if ((entry->per_cu->lang != language_cplus
&& entry->per_cu->lang != language_ada)
|| (entry->flags & IS_LINKAGE) != 0)
...
Hmm, AFAIU, building with 0 worker threads would guarantee this order, so the entry->per_cu->lang will then always be unknown when tested in cooked-index?
(In reply to Tom de Vries from comment #3)
> Hmm, AFAIU, building with 0 worker threads would guarantee this order, so
> the entry->per_cu->lang will then always be unknown when tested in
> cooked-index?

Not quite.  When building with 0 worker threads, I observed the following order:
- prepare_one_comp_unit is called for the first time, and cu->per_cu->language is set
- in cooked_index::do_finalize, the cu->per_cu->language field is read
- then prepare_one_comp_unit is called once more, overwriting the cu->per_cu->language field with the same value

AFAICT, the race condition that thread sanitizer complains about is that the second write happens after the read, but could happen before it.

I've tried only setting the cu->per_cu->lang field if it's unknown, and that seems to fix this particular problem (a small standalone sketch of that idea is shown further below).  But when re-introducing the bitfields, the original problem reappears.

(In reply to Tom de Vries from comment #1)
> So in dw_expand_symtabs_matching_file_matcher we do:
> ...
> per_cu->mark = 0;
> ...
> and in cooked_index::do_finalize:

I think probably cooked_index_functions::expand_symtabs_matching should wait for finalization to be done.

> When building with 0 worker threads

You can "maint set worker-threads 0".

Created attachment 14170 [details]
Tentative patch

(In reply to Tom Tromey from comment #5)
> (In reply to Tom de Vries from comment #1)
> > So in dw_expand_symtabs_matching_file_matcher we do:
> > ...
> > per_cu->mark = 0;
> > ...
> > and in cooked_index::do_finalize:
>
> I think probably cooked_index_functions::expand_symtabs_matching
> should wait for finalization to be done.

I gave this a try, it works for the example, now testing.

Created attachment 14171 [details]
gdb.log
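A small standalone sketch of the "only set the field when it is still unknown" workaround mentioned above (hypothetical enum and struct names, not the actual gdb change): the redundant second prepare_one_comp_unit-style call then only reads the field, so there is no longer a concurrent write to race with the do_finalize-style read.
...
#include <thread>

enum sketch_language { language_unknown = 0, language_c = 1 };

struct cu_sketch
{
  sketch_language lang = language_unknown;
};

/* Write the language only if it has not been determined yet.  */
static void
set_lang_once (cu_sketch *cu, sketch_language lang)
{
  if (cu->lang == language_unknown)
    cu->lang = lang;
}

int
main ()
{
  cu_sketch cu;
  set_lang_once (&cu, language_c);  /* first call, before threads start */

  std::thread reader ([&cu] { (void) cu.lang; });                   /* finalize-style read */
  std::thread writer ([&cu] { set_lang_once (&cu, language_c); });  /* redundant second call */

  reader.join ();
  writer.join ();
  return cu.lang == language_c ? 0 : 1;
}
...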
While testing, found another one.
Replicating the same fix works for the test-case:
...
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 6e581a6a8ea..f1a5b71b66e 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -4280,6 +4280,11 @@ dwarf2_base_index_functions::find_pc_sect_compunit_symtab
warning (_("(Internal error: pc %s in read in CU, but not in symtab.)"),
paddress (objfile->arch (), pc));
+ cooked_index_vector *table
+ = (static_cast<cooked_index_vector *>
+ (per_objfile->per_bfd->index_table.get ()));
+ table->wait ();
+
result = recursively_find_pc_sect_compunit_symtab
(dw2_instantiate_symtab (data, per_objfile, false), pc);
...
Patch looks good, I wrote essentially the same one but then didn't have time to upload it earlier.

(In reply to Tom de Vries from comment #7)
> Created attachment 14171 [details]
> gdb.log
>
> While testing, found another one.
>
> Replicating the same fix works for the test-case:
> ...
> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
> index 6e581a6a8ea..f1a5b71b66e 100644
> --- a/gdb/dwarf2/read.c
> +++ b/gdb/dwarf2/read.c
> @@ -4280,6 +4280,11 @@ dwarf2_base_index_functions::find_pc_sect_compunit_symtab
>    warning (_("(Internal error: pc %s in read in CU, but not in symtab.)"),
>      paddress (objfile->arch (), pc));
>
> +  cooked_index_vector *table
> +    = (static_cast<cooked_index_vector *>
> +       (per_objfile->per_bfd->index_table.get ()));
> +  table->wait ();
> +
>    result = recursively_find_pc_sect_compunit_symtab
>      (dw2_instantiate_symtab (data, per_objfile, false), pc);

This will probably have to be a new override method in cooked_index_functions that waits and then calls the superclass method.  The base functions can't assume there is a cooked index.  (A self-contained sketch of that shape is shown after the attachment note below.)

Created attachment 14172 [details]
Tentative patch, updated
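A rough, self-contained sketch of the shape suggested above: an override that waits for finalization and then defers to the base implementation. The class and member names here are made up for illustration; the real gdb types and signatures differ.
...
#include <future>
#include <iostream>

struct base_index_functions
{
  virtual ~base_index_functions () = default;

  /* Stands in for dwarf2_base_index_functions::find_pc_sect_compunit_symtab.  */
  virtual int find_pc_compunit (int pc)
  {
    return pc / 2;
  }
};

struct cooked_index_functions_sketch : base_index_functions
{
  /* Stands in for the background finalization of the cooked index.  */
  std::shared_future<void> finalized;

  int find_pc_compunit (int pc) override
  {
    finalized.wait ();  /* like table->wait () in the patch above */
    return base_index_functions::find_pc_compunit (pc);
  }
};

int
main ()
{
  cooked_index_functions_sketch funcs;
  funcs.finalized
    = std::async (std::launch::async, [] { /* finalize in background */ }).share ();
  std::cout << funcs.find_pc_compunit (42) << '\n';
}
...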
Created attachment 14173 [details]
gdb.log (gdb.dwarf2/dw2-symtab-includes.exp)
Another one.
A data race between is_dwz in dwarf2_find_containing_comp_unit:
...
if (mid_cu->is_dwz > offset_in_dwz
|| (mid_cu->is_dwz == offset_in_dwz
&& mid_cu->sect_off + mid_cu->length > sect_off))
...
and again lang in prepare_one_comp_unit:
...
cu->per_cu->lang = dwarf_lang_to_enum_language (attr->constant_value (0));
...
Not sure how to fix this yet.
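One generic way to eliminate this class of race, offered only as a sketch of a possible direction and not as the actual gdb fix, is to stop packing independently-accessed members into one bitfield word, e.g. by giving each member its own small atomic field; the trade-off is a larger dwarf2_per_cu_data.
...
#include <atomic>
#include <cstdint>
#include <thread>

/* Hypothetical replacement layout: each member occupies its own atomic
   object, so concurrent access to different members is well-defined.  */
struct per_cu_sketch
{
  std::atomic<bool> is_dwz {false};      /* was a 1-bit bitfield */
  std::atomic<std::uint8_t> lang {0};    /* was ENUM_BITFIELD (language) lang : LANGUAGE_BITS */
  std::atomic<bool> mark {false};        /* was a 1-bit bitfield */
};

int
main ()
{
  per_cu_sketch cu;
  std::thread t1 ([&cu] { cu.lang.store (2, std::memory_order_relaxed); });
  std::thread t2 ([&cu] { (void) cu.is_dwz.load (std::memory_order_relaxed); });
  t1.join ();
  t2.join ();
}
...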
(In reply to Tom de Vries from comment #10)
> Created attachment 14172 [details]
> Tentative patch, updated

Hm, the find_pc_sect_compunit_symtab bit causes regressions in test-cases gdb.dwarf2/dwzbuildid.exp and gdb.dwarf2/dw2-error.exp.

Created attachment 14174 [details]
WIP patch
Created attachment 14175 [details]
gdb.log (gdb.dwarf2/dwz.exp)
AFAIU, the test-case contains (PU: Partial unit, CU: Compilation Unit):
- PU A
- CU B
- CU C importing PU A
A parallel for loop is started, which calls process_psymtab_comp_unit for A, B and C in parallel, in separate threads.
In the main thread, CU C is handled, the import is encountered, cooked_indexer::index_imported_unit is called, then cooked_indexer::ensure_cu_exists, which calls process_psymtab_comp_unit for PU A, which calls cutu_reader::cutu_reader for PU A.
In thread T3, PU A is handled, and cutu_reader::cutu_reader is called for PU A.
The thread sanitizer then detects that we're trying to read/write this_cu->dwarf_version in the same PU from different threads, and errors out.
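For illustration only (a hedged sketch of one common pattern, not necessarily what the eventual gdb change does), one way to keep two threads from initializing the same unit is to have each thread atomically claim a unit before touching it; the loser of the race then skips the initialization. A real fix also has to give the losing path a safe way to wait for, or reuse, the already-initialized data, which is what makes this harder in practice.
...
#include <atomic>
#include <thread>

struct unit_sketch
{
  std::atomic<bool> claimed {false};
  int dwarf_version = 0;  /* stands in for this_cu->dwarf_version */
};

/* Only the thread that flips 'claimed' from false to true performs the
   cutu_reader-style initialization.  */
static void
process_unit (unit_sketch &u)
{
  bool expected = false;
  if (u.claimed.compare_exchange_strong (expected, true))
    u.dwarf_version = 5;
}

int
main ()
{
  unit_sketch units[3];  /* PU A, CU B, CU C */

  /* Two workers race to handle PU A: one directly, one via the import
     from CU C.  Only one of them writes dwarf_version.  */
  std::thread t1 ([&units] { process_unit (units[0]); });
  std::thread t2 ([&units] { process_unit (units[0]); });
  t1.join ();
  t2.join ();
  return units[0].dwarf_version == 5 ? 0 : 1;
}
...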
Created attachment 14180 [details]
Patch fixing all detected data races
(In reply to Tom de Vries from comment #15)
> Created attachment 14180 [details]
> Patch fixing all detected data races

[gdb/symtab] Fix some data races

Fix some data races reported by thread sanitizer.

When built with -fsanitize=thread and gcc 12, gdb passes the testsuite with no data races.

There are a few problems when testing:
- we have a heap-use-after-free in gdb.dwarf2/dw2-icc-opaque.exp, but that's a known issue, which also triggers with -fsanitize=address.  Reported at PR25723 comment 1.
- we run into timeouts in test-cases gdb.gdb/python-helper.exp and gdb.threads/detach-step-over.exp, presumably due to sanitizer overhead.
- with all tui tests, at the point of calling newterm we run into "WARNING: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong thread)".  This also happens when first setting "maint set worker-threads 0", so this doesn't seem to be related to gdb's multi-threading behaviour.

(In reply to Tom de Vries from comment #16)
> - with all tui tests, at the point of calling newterm we run into
>   "WARNING: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong
>   thread)".  This also happens when first setting "maint set worker-threads
>   0", so this doesn't seem to be related to gdb's multi-threading behaviour.

I can reproduce this with a standalone testcase calling newterm, so this is indeed unrelated to gdb.

(In reply to Tom de Vries from comment #16)
> - we have a heap-use-after-free in gdb.dwarf2/dw2-icc-opaque.exp, but
>   that's a known issue, which also triggers with -fsanitize=address.
>   Reported at PR25723 comment 1.

Filed as PR29295.

(In reply to Tom de Vries from comment #17)
> (In reply to Tom de Vries from comment #16)
> > - with all tui tests, at the point of calling newterm we run into
> >   "WARNING: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong
> >   thread)".  This also happens when first setting "maint set worker-threads
> >   0", so this doesn't seem to be related to gdb's multi-threading behaviour.
>
> I can reproduce this with a standalone testcase calling newterm, so this is
> indeed unrelated to gdb.

Handled at https://sourceware.org/bugzilla/show_bug.cgi?id=29328

(In reply to Tom de Vries from comment #14)
> Created attachment 14175 [details]
> gdb.log (gdb.dwarf2/dwz.exp)
>
> AFAIU, the test-case contains (PU: Partial unit, CU: Compilation Unit):
> - PU A
> - CU B
> - CU C importing PU A
>
> A parallel for loop is started, which calls process_psymtab_comp_unit for
> A, B and C in parallel, in separate threads.
>
> In the main thread, CU C is handled, the import is encountered,
> cooked_indexer::index_imported_unit is called, then
> cooked_indexer::ensure_cu_exists, which calls process_psymtab_comp_unit for
> PU A, which calls cutu_reader::cutu_reader for PU A.
>
> In thread T3, PU A is handled, and cutu_reader::cutu_reader is called for
> PU A.
>
> The thread sanitizer then detects that we're trying to read/write
> this_cu->dwarf_version in the same PU from different threads, and errors out.

https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=33fd0a33639897e1c3403389de882cadd344a494

(In reply to Tom de Vries from comment #6)
> Created attachment 14170 [details]
> Tentative patch
>
> (In reply to Tom Tromey from comment #5)
> > (In reply to Tom de Vries from comment #1)
> > > So in dw_expand_symtabs_matching_file_matcher we do:
> > > ...
> > > per_cu->mark = 0;
> > > ...
> > > and in cooked_index::do_finalize:
> >
> > I think probably cooked_index_functions::expand_symtabs_matching
> > should wait for finalization to be done.
>
> I gave this a try, it works for the example, now testing.

Submitted as https://sourceware.org/pipermail/gdb-patches/2022-July/190743.html

(In reply to Tom de Vries from comment #22)
> (In reply to Tom de Vries from comment #6)
> > Created attachment 14170 [details]
> > Tentative patch
> >
> > (In reply to Tom Tromey from comment #5)
> > > (In reply to Tom de Vries from comment #1)
> > > > So in dw_expand_symtabs_matching_file_matcher we do:
> > > > ...
> > > > per_cu->mark = 0;
> > > > ...
> > > > and in cooked_index::do_finalize:
> > >
> > > I think probably cooked_index_functions::expand_symtabs_matching
> > > should wait for finalization to be done.
> >
> > I gave this a try, it works for the example, now testing.
>
> Submitted as
> https://sourceware.org/pipermail/gdb-patches/2022-July/190743.html

https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=7d1a572d6b5194d36a96f36b3d28ce591341deb6

There are a couple of "add wait() here" patches mentioned in this PR for which I can no longer trigger the offending thread sanitizer warning/error.  My understanding of those patches is not good enough to decide whether they are needed anyway, so I'm basically waiting for the next thread sanitizer error to proceed with this.

What's the state of this bug?  It's on the gdb 13 list, but I'm wondering if the known issues are fixed.

(In reply to Tom Tromey from comment #25)
> What's the state of this bug?
> It's on the gdb 13 list, but I'm wondering if the known issues are fixed.

The state of this bug is recorded in comment 24.