Decl/def matching with templates without template parameters in the DW_AT_name

Simon Marchi simark@simark.ca
Thu Jan 12 01:46:32 GMT 2023



On 1/11/23 18:50, David Blaikie wrote:
> On Wed, Jan 11, 2023 at 10:24 AM Simon Marchi <simark@simark.ca> wrote:
>>
>>
>>
>> On 1/6/23 12:37, David Blaikie via Gdb wrote:
>>> I've been working on reducing debug info size, especially for
>>> expression-template heavy code like in Tensorflow and Eigen.
>>>
>>> To that end, I've implemented the flag -gsimple-template-names in
>>> Clang that simplifies the DW_AT_name for templates by omitting
>>> template parameters (eg: "vector" instead of "vector<int,
>>> std::allocator<int>>") when the DW_TAG_template_*_parameters contain
>>> sufficient information to reconstruct the original name.
>>
>> This reduces the length of strings, and more importantly it allows more
>> sharing, is that it?
> 
> Yep. Especially for complex expression templates - the fully expanded
> template names can be massive (some examples I had involve 50k
> character names... ) - they don't have the benefits of mangled names
> that include redundancy elimination/deduplication at least within a
> single name (admittedly the mangled names for these things are massive
> too - and at some point I might start some discussions about
> alternative mangling/symbol naming schemes, for those who can diverge
> from the Itanium ABI)
> 
>>  It doesn't change the layout of the DIE tree?
> 
> Correct...
> 
> well, in one case it does change the DIE tree - template declarations.
> 
> GCC and Clang produce template declarations without template parameter DIEs, eg:
> 
> $ cat > test.cpp
> template<typename T>
> struct t1 { };
> t1<int> *v;
> $ clang++-tot test.cpp -g -c && llvm-dwarfdump-tot test.o
> ...
> 0x0000002e:   DW_TAG_structure_type
>                 DW_AT_name      ("t1<int>")
>                 DW_AT_declaration       (true)
> 
> 0x00000030:   NULL
> 
> So if we dropped the `<int>` from that name, there'd be no hope to
> connect the declaration with the right definition.
> 
> So when using -gsimple-template-names, we enable template parameter
> DIEs on declarations. (Sony implemented this previously, presumably
> because their debugger needed it for something like this - but it
> wasn't enabled by defaulte because GDB and LLDB didn't seem to need
> these extra DIEs so it wasn't worth the extra size - but maybe it's
> not so much to be a real problem/maybe we should switch that on by
> default at some point if we're going further in this direction).
> 
>>> This generally seems to work in GDB - that looks intentional (perhaps
>>> because someone else implemented this feature elsewhere, or just for
>>> canonicalization reasons (the full string with template parameters
>>> might have different whitespace, parentheses, other formatting)), it
>>> seems unlikely that GDB would accidentally be able to connect two
>>> "vector" declarations up to the right "vector" definitions based on
>>> the template parameters without intentional code to support this
>>> scenario.
>>
>> Probably this code:
>>
>> https://gitlab.com/gnutools/binutils-gdb/-/blob/1b9af5b949bff0c750ededb459400c1857fec416/gdb/dwarf2/read.c#L9002
> 
> Something like that. Though the comment here is interesting:
> 
> https://gitlab.com/gnutools/binutils-gdb/-/blob/1b9af5b949bff0c750ededb459400c1857fec416/gdb/dwarf2/read.c#L8944
> 
>  /* Template parameters may be specified in the DIE's DW_AT_name, or
>     as children with DW_TAG_template_type_param or
>     DW_TAG_value_type_param.  If the latter, add them to the name
>     here.  If the name already has template parameters, then
>     skip this step; some versions of GCC emit both, and
>     it is more efficient to use the pre-computed name.
> 
> At least usually GCC and Clang produce both (DW_AT_name with
> parameters, and DIEs for the parameters) for definitions, and
> name-only, no DIEs for declarations.
> 
> I wonder which compiler has previously produced simplified names -
> it'd be nice as a proof-of-concept/existence proof/prior art for the
> new Clang feature.

Digging in the history leads me to:

  https://inbox.sourceware.org/gdb-patches/201007302017.41074.pedro@codesourcery.com/

So RVCT, the RealView compiler.  I don't have access to that,
unfortunately.  It seems obsolete, also.

> It's also interesting that it says to use the name if it's already got
> them - that suggests no canonicalization, but I'm pretty sure GDB does
> do some canonicalization to allow fuzzy matching of decls/defs between
> Clang and GCC (& whatever other compilers) which might have minor
> differences between how they print template parameters.

I don't know, unfortunately.

> 
> But yeah, that code that specifically says "if there's no '<' in the
> name, go and rebuild the name" is very explicit support for the
> -gsimple-template-names naming... someone must've done this before.
> 
> 
>>> But one place I found a problem was in pretty printers. I have
>>> situations where a type declaration isn't resolved to a definition
>>> when working within a pretty printer (resulting in the pretty printer
>>> being unable to find member variables in the object/of that type) -
>>> but if I print out the variable/member it works correctly - or if I
>>> "list" the source of the file where the definition of the type is,
>>> then the pretty printer works correctly.
>>>
>>> Does this ring a bell for anyone/sound like something sufficient to
>>> file a bug, or would I need to dig in further to create an isolated
>>> reproduction to help communicate the issue I'm seeing?
>>
>> I have no idea about what happens without digging into the code.  I just
>> remember some bug where GDB would generate the templated name with let's
>> say "<5u>" for an integer value template parameter, so if you entered
>> "<5>", it didn't match textually.  Maybe something like that is
>> happening.
>>
>> I also found:
>>
>>   https://sourceware.org/bugzilla/show_bug.cgi?id=17762
> 
> Yeah, I certainly hit a few cases like that when testing
> -gsimple-template-names canonicalization/roundtripping to make sure it
> wasn't lossy.

In order to avoid any bug like this, for you tests, maybe you can use
the -readnow switch to bypass the partial symbol steps.

> 
>> There's the possibility that the "partial symbol" and "expanded symbol"
>> don't have the same name.  So you could see a difference in behavior
>> depending on whether the CU has been expanded by GDB or not (I'm just
>> speculating).
>>
>> Is it valid DWARF (5) for DW_AT_name of a templated struct instantiation
>> to omit the template parameters?  I don't see DWARF mandating one or the
>> other, so I assume that both including them or not are valid.
> 
> Yeah, this is a case where DWARF is like "here are some tools you
> could use to express some language features, have at!" and doesn't say
> "to describe this particular language feature you must use DWARF in
> this particular way"

Typical "DWARF is a permissive standard, not a prescriptive one" thing.

> 
>> If so, I
>> think we can consider it a GDB bug if it doesn't process the "without"
>> version correctly.  Feel free to file a bug with the relevant background
>> information.  It would be useful to include some binaries, one compiled
>> with -gsimple-tempalte-names and one without, so it's easy to compare
>> the two.
> 
> Yeah, for now I'm only seeing the problem with pretty printers, so to
> provide repro steps will be a bit more involved. (I've also got a
> crash that seems to happen when using the feature without
> -gsplit-dwarf (it's probably reproducable with -gsplit-dwarf too, but
> we enable -Wl,-gdb-index in that case and so GDB doesn't parse as much
> DWARF on startup - but if we tickled it into loading more DWARF it'd
> probably hit the same crash, I'd wager))
> 
> I'll see what I can do.

If you end up merging this, it will be interesting to run the full GDB
testsuite against clang with that flag, it would cover a lot of things.

Simon


More information about the Gdb mailing list