The DWARF reader has some "pretend language" code that tries to find the language for a partial unit. The idea is that the partial unit won't have a language setting of its own, so the importing CU's language should be used.

This approach is now wrong on two counts. First, a PU's language cannot be set, as dwarf2_per_cu::set_lang explicitly disallows this. Second, gdb.fortran/mixed-lang-stack.exp, when run with dwz (or maybe dwz-5, I don't recall), will create a file where C and C++ units import the same PU. So, a consistent language cannot be assigned.

Now, the specific case of a C/C++ clash could be resolved by assuming "C". Another approach is to re-read all the units for each language, though this could hugely impact the scanning time.
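To make the "assume C" idea concrete, here is a minimal sketch. It is not gdb code: `unit_lang` and `language_for_pu` are hypothetical names invented for illustration, and the rule it encodes (fall back to C only when every importer is C or C++) is just one reading of the suggestion above.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for gdb's language enumeration.
enum class unit_lang { c, cplus, fortran, ada };

// Given the languages of all CUs that import a partial unit, pick the
// language to read the PU with.  Returns std::nullopt when the importers
// disagree in a way this sketch does not know how to reconcile.
inline std::optional<unit_lang>
language_for_pu (const std::vector<unit_lang> &importers)
{
  if (importers.empty ())
    return std::nullopt;

  bool all_same = true;
  bool all_c_family = true;
  for (unit_lang l : importers)
    {
      if (l != importers.front ())
        all_same = false;
      if (l != unit_lang::c && l != unit_lang::cplus)
        all_c_family = false;
    }

  if (all_same)
    return importers.front ();
  // Assumption: a PU shared only by C and C++ units can be read as C.
  if (all_c_family)
    return unit_lang::c;
  return std::nullopt;
}
```

Any other mix, such as C and Fortran, is reported as unresolved here; that is the case where re-reading per language would be the fallback.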
(In reply to Tom Tromey from comment #0)
> Another approach is to re-read all the units for each
> language, though this could hugely impact the scanning time.

Huh, I thought that this is how it worked, that the partial unit was re-read in the context of each time it's DW_AT_import'ed, as if it was really copy pasted at each import location.

I see that in DWARF 5, a partial unit can have a DW_AT_language. Do we ever use this one?
(In reply to Simon Marchi from comment #1)
> (In reply to Tom Tromey from comment #0)
> > Another approach is to re-read all the units for each
> > language, though this could hugely impact the scanning time.
>
> Huh, I thought that this is how it worked, that the partial unit was re-read
> in the context of each time it's DW_AT_import'ed, as if it was really copy
> pasted at each import location.

That's how DWARF seems to conceptualize it, but this is very expensive, and gdb's implementation just assumes that each import is only at the top level, since this is what dwz does and dwz is the only known producer of this output.

> I see that in DWARF 5, a partial unit can have a DW_AT_language. Do we ever
> use this one?

Yes, if the DIE has the attribute, it will be used. See cutu_reader::prepare_one_comp_unit
(In reply to Tom Tromey from comment #2)
> (In reply to Simon Marchi from comment #1)
> > (In reply to Tom Tromey from comment #0)
> > > Another approach is to re-read all the units for each
> > > language, though this could hugely impact the scanning time.
> >
> > Huh, I thought that this is how it worked, that the partial unit was re-read
> > in the context of each time it's DW_AT_import'ed, as if it was really copy
> > pasted at each import location.
>
> That's how DWARF seems to conceptualize it but this is very expensive
> and gdb's implementation just assumes that each import is only at the
> top level, since this is what dwz does and dwz is the only known producer
> of this output.

But it would be just as expensive as if the de-duplication didn't happen, right?

> > I see that in DWARF 5, a partial unit can have a DW_AT_language. Do we ever
> > use this one?
>
> Yes, if the DIE has the attribute, it will be used.
> See cutu_reader::prepare_one_comp_unit

I had another question while preparing the DW_IDX_* DWARF proposal. I don't really see how an index is supposed to work with partial units. Suppose that you have two CUs that you run through a compression tool like dwz. The tool identifies a common sub-tree between the two CUs. It moves that sub-tree to a partial unit and replaces the two instances in the CUs with some DW_TAG_imported_unit DIEs. The sub-tree moved to the PU contains some names that should be indexed. What should be in the (.debug_names) index?

Index entries must point to a specific CU, and to a specific DIE by giving the offset within that CU. Should partial units be in the index CU list? Partial units are actually called "partial compilation units" (as opposed to full compilation units), so based on the vocabulary... yes? But even if an index entry pointed to a DIE in a PU, would it be useful to consumers? I guess not, since the contents of the PU don't make sense on their own.

I was thinking that such an entry would also need to reference a full CU that imports the PU, so that the consumer knows how to reach that DIE with the right context. It would be similar to how it works with foreign type units. From DWARF 5:

> When an index entry refers to a foreign type unit, it may have attributes for both CU and (foreign) TU. For such entries, the CU attribute gives the consumer a reference to the CU that may be used to locate a split DWARF object file that contains the type unit.
> But it would be just as expensive as if the de-duplication didn't happen,
> right?

I think it would have to be more expensive since (1) each new PU has some fixed overhead and (2) a PU might not be minimal so excess reading may be required.

Anyway this is an option if we want it.

> I had another question while preparing the DW_IDX_* DWARF proposal. I don't
> really see how an index is supposed to work with partial units. Suppose
> that you have two CUs that you run through a compression tool like dwz. The
> tool identifies a common sub-tree between the two CUs. It moves that
> sub-tree to a partial unit and replaces the two instances in the CUs with
> some DW_TAG_imported_unit DIEs. The sub-tree moved to the PU contains some
> names that should be indexed. What should be in the (.debug_names) index?

This was mentioned in that patch I linked to in the other bug. Here's the text I'll send when that series is ready:

+@item
+Definitions in partial units are handled differently. These most
+typically are seen in the output of @code{dwz}.
+
+In general, a DWARF partial unit cannot be read in isolation, but only
+by reading it in the context of some other unit that references it via
+@code{DW_TAG_imported_unit}.
+
+Therefore, an ordinary definition in a partial unit is attributed to
+one of the outermost containing units. This is done by referencing
+this containing CU in the @code{DW_IDX_compile_unit} attribute.
+
+A further special case applies to @code{DW_TAG_inlined_subroutine}
+entries. An inlined subroutine appearing in a partial unit may be
+inlined in all of the outermost compilation units that directly or
+indirectly include the partial unit. Therefore, in this case,
+@value{GDBN} will emit a separate index entry for the entry, once for
+each such containing unit.
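As an illustration of what "outermost containing units" means in the text above, here is a small sketch that walks an import graph upward from a partial unit. It is not how gdb's index writer is implemented: the string-keyed `import_graph` and the function `outermost_importers` are hypothetical, and units are assumed to be outermost exactly when nothing imports them.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: maps a unit name to the units that reference it
// via DW_TAG_imported_unit.
using import_graph = std::map<std::string, std::vector<std::string>>;

// Collect every outermost unit that directly or indirectly imports UNIT.
// A unit with no importers counts as outermost, so asking about a full
// CU simply yields that CU.
inline std::set<std::string>
outermost_importers (const import_graph &importers, const std::string &unit)
{
  std::set<std::string> result;
  std::set<std::string> seen;
  std::vector<std::string> todo { unit };

  while (!todo.empty ())
    {
      std::string cur = todo.back ();
      todo.pop_back ();
      // Skip units already visited; this also guards against cycles.
      if (!seen.insert (cur).second)
        continue;

      auto it = importers.find (cur);
      if (it == importers.end () || it->second.empty ())
        result.insert (cur);
      else
        todo.insert (todo.end (), it->second.begin (), it->second.end ());
    }
  return result;
}
```

Under this model, an ordinary definition would be attributed to any one element of the returned set, while a DW_TAG_inlined_subroutine entry would get one index entry per element.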
Found the other bug I mentioned.
My series for #30728 touches on this area a little. There I reason that if a PU is shared across languages, it most likely is semantically valid for both, and so the discrepancy can be ignored. While this is probably true in practice, note that it's not really guaranteed. For example an array type could be shared by C and Ada, and if the lower bound were omitted it would validly describe two different types. I consider this unlikely to happen, though, given the practicalities of the DWARF output (e.g., Ada emits encoded names only).
https://sourceware.org/pipermail/gdb-patches/2025-December/223204.html
The master branch has been updated by Tom Tromey <tromey@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=329a53a6d590e2e90f590c89473990040a86c8e0

commit 329a53a6d590e2e90f590c89473990040a86c8e0
Author: Tom Tromey <tom@tromey.com>
Date:   Sat Nov 22 11:03:57 2025 -0700

    Some cleanups to "pretend language" handling

    I noticed that the "pretend language" handling in the DWARF reader doesn't work as intended; the problem code in dwarf2_per_cu::set_lang is:

        if (unit_type () == DW_UT_partial)
          return;

    The issue here is that this subverts the very purpose of having a "pretend" language.

    Some background: when Jakub wrote dwz, we also added support for this style of DWARF compression to gdb. Now, dwz only shares DIEs in a "top level" way -- i.e., at the time (and as far as I know, continuing to today), it would not emit a DW_TAG_imported_unit inside a namespace.

    So, when implementing this we also implemented an optimization, namely that gdb would not re-read every imported unit a la '#include', but instead would make symtabs for each included unit (partial units didn't yet exist).

    However, an imported/partial unit might not have a language -- but a language is necessary for interpreting the DIEs. This is where the "pretend" language comes from. When reading a CU, any included partial units that do not have a language of their own will inherit that CU's language.

    This patch started by removing the DW_UT_partial check. This of course caused assertion failures in some modes, as set_lang also asserts that the language cannot change. But, it's possible for a CU to be prepared multiple times, and for different invocations to provide different languages.

    This is not a scenario we allowed for in the early days. Nowadays, though, it seems to me that it's basically fine in practice, with the reason being that sharing DIEs that differ semantically but not syntactically across different languages is hard to achieve.

    We do see some cross-language sharing in a limited way -- "dwz -5" will emit inclusions from both C and C++ CUs for the gdb.fortran/mixed-lang-stack.exp test -- but note that this sharing is limited to things that are common between C and C++, like "float".

    Therefore this patch replaces the assertions in set_lang with some compare-exchanges.

    Finally I changed cutu_reader to use a std::optional for the pretend language. I think this makes it more clear what is happening. And, while doing this I found a spot in the cooked indexer where language_minimal was passed in, but where the importing CU's language should have been used.

    I regression tested this on x86-64 Fedora 40 using the default board, plus the cc-with-gdb-index, cc-with-debug-names, and cc-with-dwz-5 boards.

    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=33661
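For readers who have not looked at the patch, the following is a rough sketch of the two ideas in the commit message: recording a language with a compare-exchange instead of asserting, and passing the pretend language as a std::optional. It is not the code from the commit. The `lang` enum, `per_cu_sketch`, and `choose_lang` are hypothetical names, and the "first writer wins" policy is an assumption about what the compare-exchange is meant to achieve.

```cpp
#include <atomic>
#include <cassert>
#include <optional>

// Hypothetical stand-in for gdb's language enumeration.
enum class lang { unknown, minimal, c, cplus, fortran };

struct per_cu_sketch
{
  // Record L only if no language has been recorded yet.  Returns the
  // language in effect afterwards, which may be one chosen earlier by a
  // different importer.
  lang set_lang (lang l)
  {
    lang expected = lang::unknown;
    // On failure, compare_exchange_strong stores the current value in
    // EXPECTED, so the first writer wins and later callers see its choice
    // rather than tripping an assertion.
    if (m_lang.compare_exchange_strong (expected, l))
      return l;
    return expected;
  }

  lang get_lang () const { return m_lang.load (); }

private:
  std::atomic<lang> m_lang { lang::unknown };
};

// Pick a unit's language: its own DW_AT_language if present, otherwise
// the importing CU's language passed as PRETEND, otherwise a minimal
// fallback (assumed here for illustration).
inline lang
choose_lang (std::optional<lang> own, std::optional<lang> pretend)
{
  if (own.has_value ())
    return *own;
  return pretend.value_or (lang::minimal);
}
```

With this shape, a PU imported first from a C unit and then from a C++ unit keeps the C language, which matches the reasoning that the shared DIEs are valid in both.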
Fixed.