Decl/def matching with templates without template parameters in the DW_AT_name

David Blaikie dblaikie@gmail.com
Wed Jan 11 23:50:11 GMT 2023


On Wed, Jan 11, 2023 at 10:24 AM Simon Marchi <simark@simark.ca> wrote:
>
>
>
> On 1/6/23 12:37, David Blaikie via Gdb wrote:
> > I've been working on reducing debug info size, especially for
> > expression-template heavy code like in Tensorflow and Eigen.
> >
> > To that end, I've implemented the flag -gsimple-template-names in
> > Clang that simplifies the DW_AT_name for templates by omitting
> > template parameters (eg: "vector" instead of "vector<int,
> > std::allocator<int>>") when the DW_TAG_template_*_parameters contain
> > sufficient information to reconstruct the original name.
>
> This reduces the length of strings, and more importantly it allows more
> sharing, is that it?

Yep. Especially for complex expression templates - the fully expanded
template names can be massive (some examples I had involve 50k
character names... ) - they don't have the benefits of mangled names
that include redundancy elimination/deduplication at least within a
single name (admittedly the mangled names for these things are massive
too - and at some point I might start some discussions about
alternative mangling/symbol naming schemes, for those who can diverge
from the Itanium ABI)

>  It doesn't change the layout of the DIE tree?

Correct...

well, in one case it does change the DIE tree - template declarations.

GCC and Clang produce template declarations without template parameter DIEs, eg:

$ cat > test.cpp
template<typename T>
struct t1 { };
t1<int> *v;
$ clang++-tot test.cpp -g -c && llvm-dwarfdump-tot test.o
...
0x0000002e:   DW_TAG_structure_type
                DW_AT_name      ("t1<int>")
                DW_AT_declaration       (true)

0x00000030:   NULL

So if we dropped the `<int>` from that name, there'd be no hope to
connect the declaration with the right definition.

So when using -gsimple-template-names, we enable template parameter
DIEs on declarations. (Sony implemented this previously, presumably
because their debugger needed it for something like this - but it
wasn't enabled by defaulte because GDB and LLDB didn't seem to need
these extra DIEs so it wasn't worth the extra size - but maybe it's
not so much to be a real problem/maybe we should switch that on by
default at some point if we're going further in this direction).

> > This generally seems to work in GDB - that looks intentional (perhaps
> > because someone else implemented this feature elsewhere, or just for
> > canonicalization reasons (the full string with template parameters
> > might have different whitespace, parentheses, other formatting)), it
> > seems unlikely that GDB would accidentally be able to connect two
> > "vector" declarations up to the right "vector" definitions based on
> > the template parameters without intentional code to support this
> > scenario.
>
> Probably this code:
>
> https://gitlab.com/gnutools/binutils-gdb/-/blob/1b9af5b949bff0c750ededb459400c1857fec416/gdb/dwarf2/read.c#L9002

Something like that. Though the comment here is interesting:

https://gitlab.com/gnutools/binutils-gdb/-/blob/1b9af5b949bff0c750ededb459400c1857fec416/gdb/dwarf2/read.c#L8944

 /* Template parameters may be specified in the DIE's DW_AT_name, or
    as children with DW_TAG_template_type_param or
    DW_TAG_value_type_param.  If the latter, add them to the name
    here.  If the name already has template parameters, then
    skip this step; some versions of GCC emit both, and
    it is more efficient to use the pre-computed name.

At least usually GCC and Clang produce both (DW_AT_name with
parameters, and DIEs for the parameters) for definitions, and
name-only, no DIEs for declarations.

I wonder which compiler has previously produced simplified names -
it'd be nice as a proof-of-concept/existence proof/prior art for the
new Clang feature.

It's also interesting that it says to use the name if it's already got
them - that suggests no canonicalization, but I'm pretty sure GDB does
do some canonicalization to allow fuzzy matching of decls/defs between
Clang and GCC (& whatever other compilers) which might have minor
differences between how they print template parameters.

But yeah, that code that specifically says "if there's no '<' in the
name, go and rebuild the name" is very explicit support for the
-gsimple-template-names naming... someone must've done this before.


> > But one place I found a problem was in pretty printers. I have
> > situations where a type declaration isn't resolved to a definition
> > when working within a pretty printer (resulting in the pretty printer
> > being unable to find member variables in the object/of that type) -
> > but if I print out the variable/member it works correctly - or if I
> > "list" the source of the file where the definition of the type is,
> > then the pretty printer works correctly.
> >
> > Does this ring a bell for anyone/sound like something sufficient to
> > file a bug, or would I need to dig in further to create an isolated
> > reproduction to help communicate the issue I'm seeing?
>
> I have no idea about what happens without digging into the code.  I just
> remember some bug where GDB would generate the templated name with let's
> say "<5u>" for an integer value template parameter, so if you entered
> "<5>", it didn't match textually.  Maybe something like that is
> happening.
>
> I also found:
>
>   https://sourceware.org/bugzilla/show_bug.cgi?id=17762

Yeah, I certainly hit a few cases like that when testing
-gsimple-template-names canonicalization/roundtripping to make sure it
wasn't lossy.

> There's the possibility that the "partial symbol" and "expanded symbol"
> don't have the same name.  So you could see a difference in behavior
> depending on whether the CU has been expanded by GDB or not (I'm just
> speculating).
>
> Is it valid DWARF (5) for DW_AT_name of a templated struct instantiation
> to omit the template parameters?  I don't see DWARF mandating one or the
> other, so I assume that both including them or not are valid.

Yeah, this is a case where DWARF is like "here are some tools you
could use to express some language features, have at!" and doesn't say
"to describe this particular language feature you must use DWARF in
this particular way"

> If so, I
> think we can consider it a GDB bug if it doesn't process the "without"
> version correctly.  Feel free to file a bug with the relevant background
> information.  It would be useful to include some binaries, one compiled
> with -gsimple-tempalte-names and one without, so it's easy to compare
> the two.

Yeah, for now I'm only seeing the problem with pretty printers, so to
provide repro steps will be a bit more involved. (I've also got a
crash that seems to happen when using the feature without
-gsplit-dwarf (it's probably reproducable with -gsplit-dwarf too, but
we enable -Wl,-gdb-index in that case and so GDB doesn't parse as much
DWARF on startup - but if we tickled it into loading more DWARF it'd
probably hit the same crash, I'd wager))

I'll see what I can do.


More information about the Gdb mailing list