adding namespace support to GDB
David Carlton
carlton@math.stanford.edu
Fri Aug 23 17:19:00 GMT 2002
In article <ro1hehloa33.fsf@jackfruit.Stanford.EDU>, David Carlton
<carlton@math.Stanford.EDU> writes:
> For the time being, I'm going to reread that thread more closely,
> look at Petr Sorfa's module patch, look at the DWARF-3 standard,
> look at existing GDB code, and think about this for a while.
I haven't really done much of that yet, but I did look over some of
the messages in that thread and take some notes, combining them with
some thoughts of my own on the issue. The notes don't contain
much (any?) information along the lines of "exactly what in GDB do we
have to change to make this work", since I'm still too new to GDB's
internals to have a global feel for them or to be confident about
this, and I didn't have time today to poke into GDB's code all that
much. But at least it's a concrete list of issues to keep in mind.
So here are the notes; any comments would be greatly appreciated.
This is, of course, a rough draft: I'm sending it off now more because
it's 5:00 and I want to head home for the weekend than because I think
it's particularly complete or anything.
Some notes on namespace-related issues:
* The goal is to provide a general framework for associating data
(types, locations, etc.) with names (variable names, type names, etc.).
Let's tentatively call this an 'environment'.
* For some discussion of this, see the thread starting from
<http://sources.redhat.com/ml/gdb/2002-04/msg00072.html>.
* Language contexts where this happens:
* Compound statements. In C, function bodies, bodies of loops, etc.
* Compound data structures. E.g. classes in C++ or Java. (Warning:
I'm learning C++ reasonably quickly, I hope, but there's a lot to
learn. And I read a book about Java once...) Not to mention
simpler examples: C structures, unions.
* Compound name structures. C++ namespaces, Fortran modules
(warning: I know zero about Fortran), Java packages. Are there
any other such structures in languages that GDB supports?
Probably files combined with static global variables in C go in
here. (For that matter, extern global variables also go in here.)
This trichotomy is not a hard-and-fast distinction, needless to say.
For example, in C++, you often have the choice about whether to use
a namespace or a class with static members in a given situation, and
similarly a Java programmer would use a class with static members in
many situations where a C++ programmer would use a namespace.
I'll typically stick to C++ examples.
* According to Jim Blandy (in the message referenced above), existing
GDB constructs that these environments could replace are:
* In 'struct block', to represent local variables, replacing 'nsym'
and 'sym'.
* In 'struct type', to represent fields, replacing 'struct fields'.
But here I think he's only referring to local environments: there's
also the global environment.
* Here are some issues surrounding environments:
* How does GDB initialize the environment structures?
* How does GDB figure out in what environments to search for a name
that a user types in?
* How should we implement environments internally?
* Right now, I'm not worried about the first problem so much. Having
said that, here are some issues that are relevant to it:
* If the compiler generates rich enough debugging information, then
we don't have to worry too much about how to initialize the
environment structures: we have everything we need given to us.
* If the compiler doesn't generate rich enough debugging
information, then we can still do a decent approximation to the
correct information by, say, looking at mangled linkage names for
symbols. It's not perfect, but it'll do fine.
* I'm not sure _exactly_ how we'll detect whether or not we've got
enough debugging information when reading the info for a given
file, but we can figure out something. (E.g. for C++, use mangled
linkage names until we first see a DW_TAG_namespace.)
* For some environments, we can count on being able to easily figure
out a complete picture of what it looks like: this should be true
for compound data structures and compound statements. But it's
not true for many sorts of compound name structures: stuff can get
added to the global environment or to namespaces in somewhat
unpredictable ways. Still, I don't think this is _too_ serious:
having an analogue to the partial symbol table around plus reading
in detailed debugging information for an entire file at a time
should mean that we never miss information that we need. (How
does the minimal symbol table come into play?)
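
To make the "approximate it from linkage names" idea concrete, here is a
minimal sketch in Python (chosen for brevity; it is not GDB code). It
assumes the names have already been demangled, and the helpers
split_qualified_name and build_scopes are hypothetical names invented for
this illustration. It also shows why the approach is imperfect: a name
alone cannot say whether an enclosing scope is a namespace or a class.

```python
def split_qualified_name(name):
    """Split a demangled C++ name on '::' that is not nested in <> or ().

    Limitation of this sketch: names such as 'operator<' confuse the
    bracket counting and are not handled.
    """
    parts, current, depth = [], [], 0
    i = 0
    while i < len(name):
        ch = name[i]
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        if depth == 0 and name.startswith("::", i):
            # A top-level separator: close off the current component.
            parts.append("".join(current))
            current = []
            i += 2
            continue
        current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def build_scopes(demangled_names):
    """Group symbols under the chain of enclosing scopes their names imply."""
    root = {"symbols": set(), "children": {}}
    for name in demangled_names:
        *scopes, leaf = split_qualified_name(name)
        node = root
        for scope in scopes:
            # We cannot tell here whether 'scope' is a namespace or a class.
            node = node["children"].setdefault(
                scope, {"symbols": set(), "children": {}})
        node["symbols"].add(leaf)
    return root
```

Calling build_scopes(["A::B::f(int)", "A::g()", "h"]) files h under the
root, g() under A, and f(int) under A::B.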
* The second problem seems to me to be considerably more subtle; even
with perfect debugging information, it's not clear to me that, at
least initially, we'd implement C++'s name lookup rules completely.
* Different languages vary considerably in exactly what information
is accessible at any given point.
* Environments usually form a "tree" in some vague sense, but
exactly what that tree means (and its implications in terms of
environment search rules) varies considerably based on the type of
environment. For example, if you don't find a symbol in the
current compound statement, you can always go up to the enclosing
compound statement. Whereas, if you have a C++ namespace B nested
inside a C++ namespace A, then even if you make symbols inside
A::B accessible via a 'using' declaration, symbols inside A aren't
necessarily accessible.
* Often, language constructs for making compound name structures
accessible (C++ 'using' declarations, etc.) permit some amount of
renaming.
* Some compound name structures don't have names. One example is
files + static global variables in C; another example is anonymous
namespaces in C++. (Note that the second example is a superset of
the first example: the first example is basically like the special
case of the second example in which the parent of the anonymous
namespace is the global namespace.)
* If the compiler doesn't generate rich enough debugging
information, we simply won't be able to do a perfect job here.
(Though I think we'll be able to do a good enough job that users
will forgive us.)
* When doing a lookup, the user may provide part of the name prefix
in addition to the variable name.
* Functions can be overloaded, so sometimes you need types as well
as the name.
* Do virtual member functions and/or virtual base classes pose
problems? I don't think they do, but I'll list them just in case.
* Anything else? I think I'm probably leaving out stuff here. I'm
not too familiar with what GDB's current data structures are for
representing what names are accessible at a given point.
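
The difference between "go up to the enclosing scope" and "names opened
by a using-directive" can be sketched as follows. This is a deliberately
simplified model in Python, not C++'s real lookup rules and not GDB code;
the Environment class and its fields (parent, imports, children) are
assumptions made for illustration.

```python
class Environment:
    def __init__(self, name=None, parent=None):
        self.name = name        # None models an unnamed scope (e.g. a block)
        self.parent = parent    # lexically enclosing environment
        self.symbols = {}       # names declared directly in this environment
        self.children = {}      # nested named environments
        self.imports = []       # environments opened by a using-directive
        if parent is not None and name is not None:
            parent.children[name] = self

    def lookup_local(self, name):
        """Search this environment and its imports, but not their parents."""
        if name in self.symbols:
            return self.symbols[name]
        for env in self.imports:
            if name in env.symbols:
                return env.symbols[name]
        return None

    def lookup(self, name):
        """Unqualified lookup: walk outward through enclosing environments."""
        env = self
        while env is not None:
            found = env.lookup_local(name)
            if found is not None:
                return found
            env = env.parent
        return None

    def lookup_qualified(self, path):
        """Resolve 'A::B::x': find 'A' in an enclosing scope, then descend."""
        *prefix, leaf = path.split("::")
        if not prefix:
            return self.lookup(leaf)
        env = self
        while env is not None and prefix[0] not in env.children:
            env = env.parent
        for component in prefix:
            if env is None:
                return None
            env = env.children.get(component)
        return env.symbols.get(leaf) if env is not None else None


# namespace A { int x; namespace B { int y; } }
global_env = Environment()
a = Environment("A", global_env)
b = Environment("B", a)
a.symbols["x"] = "A::x"
b.symbols["y"] = "A::B::y"

# using namespace A::B;  at global scope, then a function body with a loop
global_env.imports.append(b)
block = Environment(parent=global_env)
block.symbols["i"] = "local i"
inner = Environment(parent=block)
```

From the innermost block, inner.lookup("i") finds the local by walking up,
and inner.lookup("y") finds A::B::y through the import, but
inner.lookup("x") fails because importing A::B does not expose A. The user
can still reach it with a prefix, as in inner.lookup_qualified("A::x").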
* Then there's the issue of implementing environments internally.
* One issue to keep in mind is that different environments can have
dramatically different numbers of names. E.g. the global
environment is potentially extremely large, as is C++'s 'std'
namespace; but a struct is typically quite small, as are the
namespaces in code that I write myself.
So we need a data structure that can deal with these extremes. In
particular, linear lists of names and fixed-size hash tables both
sound like bad ideas to me. Does GDB or libiberty or whatever
have tools for dealing with heaps or hash tables that grow as
necessary? (Is it even a good idea to grow your hash tables as
you add entries to them? My theoretical background in algorithms
and data structures is weak.)
* If we look up a name in an environment, what data do we want that
name lookup to return to us? If what we're looking up is a
variable, then candidates are type information and location
information. But of course we might want to look up other things
(structures/classes/unions, typedefs, enums, namespaces, etc.).
Should we only search based on names, or search based on names +
what kind of object we want to associate with that name? (Probably
the latter.)
* Are we going to try to implement this incrementally or not? On a
separate branch or on the mainline branch? What recent patches to
GDB (whether proposed or approved) help with this effort?
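
On the data-structure question, one option is a chained hash table that
doubles its bucket array once it gets too full, keyed on the name plus the
kind of object wanted. Doubling keeps insertion cheap on average, so the
same structure serves both a three-field struct and a huge global
environment. The sketch below is in Python for brevity and is not GDB or
libiberty code; the class name GrowableEnvironment, the kind strings, and
the 0.75 load threshold are all assumptions made for illustration.

```python
class GrowableEnvironment:
    """Chained hash table that doubles its bucket count as entries are added."""

    def __init__(self, initial_buckets=4, max_load=0.75):
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0
        self.max_load = max_load

    def _bucket(self, key, buckets=None):
        buckets = self.buckets if buckets is None else buckets
        return buckets[hash(key) % len(buckets)]

    def insert(self, name, kind, data):
        key = (name, kind)          # the kind is part of the key
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, data)   # replace an existing entry
                return
        bucket.append((key, data))
        self.count += 1
        if self.count > self.max_load * len(self.buckets):
            self._grow()

    def _grow(self):
        """Double the number of buckets and redistribute every entry."""
        new_buckets = [[] for _ in range(2 * len(self.buckets))]
        for bucket in self.buckets:
            for key, data in bucket:
                self._bucket(key, new_buckets).append((key, data))
        self.buckets = new_buckets

    def lookup(self, name, kind):
        for key, data in self._bucket((name, kind)):
            if key == (name, kind):
                return data
        return None
```

With this keying, a variable and a struct tag that share a name coexist:
after env.insert("stat", "variable", ...) and env.insert("stat", "struct",
...), each lookup returns its own entry.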
David Carlton
carlton@math.stanford.edu