30164 – Restructure symbol domains

Status:	RESOLVED FIXED

Alias:	None

Product:	gdb
Classification:	Unclassified
Component:	symtab
Version:	HEAD

Importance:	P2 normal
Target Milestone:	15.1
Assignee:	Tom Tromey

Blocks:	24870

Reported:	2023-02-24 17:27 UTC by Tom Tromey
Modified:	2024-01-28 23:45 UTC
CC List:	0 users

Description Tom Tromey 2023-02-24 17:27:35 UTC

Right now, symbol domains generally follow C.
That is, there is a domain for tags, some for non-C things
(Fortran stuff), a domain for labels, and then VAR_DOMAIN
for everything else -- types, functions, variables.

This leads to bad results, like bug #30158, where an
attempt to look up a function instead finds a namespace.

I think it would be better to drastically overhaul this code.
There should be many more domains, as many as we think
we'll need.  Types, variables, and functions should all
be separate.  Probably namespaces should also be their own thing.

Then, the symbol lookup functions should accept an enum flag
type of all the domains that should be searched.
This way, the C parser can implement its own semantics by
searching the relevant C domains -- but other language parsers
can do as they like.
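
As a rough illustration of the flag-based lookup proposed above, here is a
minimal sketch. The enum, the symbol struct, and lookup_symbol are all
hypothetical names made up for this example; they are not gdb's actual API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical domain flags, one bit per kind of symbol.
enum domain_flag : std::uint32_t
{
  FLAG_VARIABLE  = 1u << 0,
  FLAG_FUNCTION  = 1u << 1,
  FLAG_TYPE      = 1u << 2,
  FLAG_NAMESPACE = 1u << 3,
};

// Hypothetical symbol record.
struct symbol
{
  std::string name;
  domain_flag domain;  // exactly one bit is set per symbol
};

// Return the first symbol called NAME whose domain is one of DOMAINS,
// or nullptr if there is none.
inline const symbol *
lookup_symbol (const std::vector<symbol> &table, const std::string &name,
               std::uint32_t domains)
{
  for (const symbol &sym : table)
    if (sym.name == name && (sym.domain & domains) != 0)
      return &sym;
  return nullptr;
}

// A namespace and a function share a name, as in the failure mode
// described above.
inline void
demo_function_lookup ()
{
  const std::vector<symbol> table = {
    { "foo", FLAG_NAMESPACE },
    { "foo", FLAG_FUNCTION },
  };

  // Asking only for functions skips the namespace entry.
  const symbol *sym = lookup_symbol (table, "foo", FLAG_FUNCTION);
  assert (sym != nullptr && sym->domain == FLAG_FUNCTION);

  // A parser wanting C-like semantics can pass several flags at once.
  sym = lookup_symbol (table, "foo",
                       FLAG_VARIABLE | FLAG_FUNCTION | FLAG_TYPE);
  assert (sym != nullptr && sym->domain == FLAG_FUNCTION);
}
```

With separate bits, each language parser decides which combination of
domains to pass, instead of relying on one catch-all domain.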

Comment 1 Tom Tromey 2023-03-01 13:30:47 UTC

These values are baked into the .gdb_index format.
That isn't fatal but it does mean the index would be a bit
less efficient.

Moving more to .debug_names or changing the format of .gdb_index
would both be options here.

Comment 2 Tom Tromey 2023-09-14 16:28:59 UTC

I've been slowly working on this.

Lately I've been thinking that perhaps STRUCT_DOMAIN could
be removed.  It's only needed for C and is the source of a
hack in symbol_matches_domain.

Instead, any C-specific type-lookup code could just search
TYPE_DOMAIN and look to see if the type is "tagged".

Comment 3 Tom Tromey 2023-09-15 01:34:19 UTC

(In reply to Tom Tromey from comment #2)
> I've been slowly working on this.
> 
> Lately I've been thinking that perhaps STRUCT_DOMAIN could
> be removed.  It's only needed for C and is the source of a
> hack in symbol_matches_domain.

On further reflection, I don't think this will work properly,
because in C it is fine to have a 'struct name' and a typedef
for 'name' that are different -- they really are separate
namespaces.
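
To make the objection concrete, here is a small model of the two schemes.
It is only a sketch: the domain enum, type_entry, and the map-based tables
are hypothetical stand-ins, not gdb data structures.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for symbol domains and type records.
enum class domain { STRUCT, TYPE };

struct type_entry
{
  std::string description;
  bool tagged;  // true for struct/union/enum types
};

using key = std::pair<std::string, domain>;

// Models these two C declarations, which may coexist in one scope:
//   struct name { int x; };
//   typedef double name;
inline void
demo_separate_tag_namespace ()
{
  // With separate domains, both symbols fit side by side.
  std::map<key, type_entry> by_domain;
  by_domain[key ("name", domain::STRUCT)] = type_entry { "struct name", true };
  by_domain[key ("name", domain::TYPE)]
    = type_entry { "typedef of double", false };
  assert (by_domain.size () == 2);
  assert (by_domain.at (key ("name", domain::STRUCT)).tagged);
  assert (!by_domain.at (key ("name", domain::TYPE)).tagged);

  // With a single type domain keyed by name alone, the second
  // declaration cannot be entered next to the first.
  std::map<std::string, type_entry> single;
  bool first
    = single.emplace ("name", type_entry { "struct name", true }).second;
  bool second
    = single.emplace ("name", type_entry { "typedef of double", false }).second;
  assert (first && !second);
}
```

Checking whether the one entry found is "tagged" does not help here, because
the other declaration is simply not reachable under the same name.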

Comment 4 Tom Tromey 2023-11-19 16:55:35 UTC

The rot here goes really deep :(

symbol_matches_domain has a C++-specific hack
(which was later extended to other languages).

However, the stabs reader, and some other readers,
take care to handle the C++ typedef case in a more
principled way: by creating a typedef symbol.
One wonders why this wasn't done for DWARF...

Comment 5 Tom Tromey 2023-11-21 21:57:47 UTC

https://sourceware.org/pipermail/gdb-patches/2023-November/204295.html

Comment 6 Sourceware Commits 2024-01-28 23:44:48 UTC

The master branch has been updated by Tom Tromey <tromey@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=974b36c2ae2b351d022cc62579656f722da6e17a

commit 974b36c2ae2b351d022cc62579656f722da6e17a
Author: Tom Tromey <tom@tromey.com>
Date:   Thu Mar 2 07:44:11 2023 -0700

    Use the new symbol domains
    
    This patch changes the DWARF reader to use the new symbol domains.  It
    also adjusts many bits of associated code to adapt to this change.
    
    The non-DWARF readers are updated on a best-effort basis.  This is
    somewhat simpler since most of them only support C and C++.  I have no
    way to test a few of these.
    
    I went back and forth a few times on how to handle the "tag"
    situation.  The basic problem is that C has a special namespace for
    tags, which is separate from the type namespace.  Other languages
    don't do this.  So, the question is, should a DW_TAG_structure_type
    end up in the tag domain, or the type domain, or should it be
    language-dependent?
    
    I settled on making it language-dependent using a thought experiment.
    Suppose there was a Rust compiler that only emitted nameless
    DW_TAG_structure_type objects, and specified all structure type names
    using DW_TAG_typedef.  This DWARF would be correct, in that it
    faithfully represents the source language -- but would not work with a
    purely struct-domain implementation in gdb.  Therefore gdb would be
    wrong.
    
    Now, this approach is a little tricky for C++, which uses tags but
    also enters a typedef for them.  I notice that some other readers --
    like stabsread -- actually emit a typedef symbol as well.  And, I
    think this is a reasonable approach.  It uses more memory, but it
    makes the internals simpler.  However, DWARF never did this for
    whatever reason, and so in the interest of keeping the series slightly
    shorter, I've left some C++-specific hacks in place here.
    
    Note that this patch includes language_minimal as a language that uses
    tags.  I did this to avoid regressing gdb.dwarf2/debug-names-tu.exp,
    which doesn't specify the language for a type unit.  Arguably this
    test case is wrong.
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30164
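
The language-dependent choice described in the commit message can be
sketched roughly as follows. The enums and both function names are
hypothetical and cover only the languages the message mentions; the real
change lives in the commit linked above.

```cpp
#include <cassert>

// Hypothetical subset of languages and domains, for illustration only.
enum class language { MINIMAL, C, CPLUS, RUST };
enum class domain { STRUCT, TYPE };

// True if LANG keeps struct/union/enum tags in their own namespace.
// MINIMAL is treated as tag-using, as the commit message describes.
inline bool
language_uses_tags (language lang)
{
  switch (lang)
    {
    case language::MINIMAL:
    case language::C:
    case language::CPLUS:
      return true;
    case language::RUST:
      return false;
    }
  return false;
}

// Pick the domain for a named structure type read from debug info.
inline domain
domain_for_structure_type (language lang)
{
  return language_uses_tags (lang) ? domain::STRUCT : domain::TYPE;
}
```

Under this scheme a Rust structure type lands in the type domain whether
the compiler names it directly or through a typedef, which is the case the
thought experiment is about.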

Comment 7 Tom Tromey 2024-01-28 23:45:34 UTC

Fixed.