adding namespace support to GDB

Fri Aug 23 17:19:00 GMT 2002

In article <ro1hehloa33.fsf@jackfruit.Stanford.EDU>, David Carlton
<carlton@math.Stanford.EDU> writes:

> For the time being, I'm going to reread that thread more closely,
> look at Petr Sorfa's module patch, look at the DWARF-3 standard,
> look at existing GDB code, and think about this for a while.

I haven't really done much of that yet, but I did look over some of
the messages in that thread and took some notes, combining what I read
there with some thoughts of my own on the issue.  The notes don't
contain much (any?) information along the lines of "exactly what in
GDB do we have to change to make this work": I'm still too new to
GDB's internals to have a global feel for them or to be confident
about this, and I didn't have time today to poke into GDB's code all
that much.  But at least it's a concrete list of issues to keep in mind.

So here are the notes; any comments would be greatly appreciated.
This is, of course, a rough draft: I'm sending it off now more because
it's 5:00 and I want to head home for the weekend than because I think
it's particularly complete or anything.

Some notes on namespace-related issues:

* The goal is to provide a general framework for associating data
  (types, locations, etc.) to names (variable names, type names, etc.)
  Let's tentatively call this an 'environment'.

* For some discussion of this, see the thread starting from
  <http://sources.redhat.com/ml/gdb/2002-04/msg00072.html>.

* Language contexts where this happens:

  * Compound statements.  In C, function bodies, bodies of loops, etc.

  * Compound data structures.  E.g. classes in C++ or Java.  (Warning:
    I'm learning C++ reasonably quickly, I hope, but there's a lot to
    learn.  And I read a book about Java once...)  Not to mention
    simpler examples: C structures, unions.

  * Compound name structures.  C++ namespaces, Fortran modules
    (warning: I know zero about Fortran), Java packages.  Are there
    any other such structures in languages that GDB supports?
    Probably files combined with static global variables in C go in
    here.  (For that matter, extern global variables also go in here.)

  This trichotomy is not a hard-and-fast distinction, needless to say.
  For example, in C++, you often have the choice about whether to use
  a namespace or a class with static members in a given situation, and
  similarly a Java programmer would use a class with static members in
  many situations where a C++ programmer would use a namespace.

  I'll typically stick to C++ examples.
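
  As a small illustration of how blurry the line between the second
  and third categories is, here is the same pair of names grouped
  once with a namespace and once with a class with static members.
  The identifiers (counters_ns, CountersClass, total, bump) are made
  up for this sketch.  In both cases a user would refer to the names
  as Scope::name, so a debugger has to resolve both spellings.

```cpp
// Grouping via a namespace (a "compound name structure").
namespace counters_ns {
int total = 0;
int bump() { return ++total; }
}  // namespace counters_ns

// Grouping via a class with static members (a "compound data structure").
struct CountersClass {
  static int total;
  static int bump() { return ++total; }
};
int CountersClass::total = 0;
```

  The two groups are independent: counters_ns::total and
  CountersClass::total are different objects even though the
  unqualified name is the same.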

* According to Jim Blandy (in the message referenced above), existing
  GDB constructs that these environments could replace are:

  * In 'struct block', to represent local variables, replacing 'nsyms'
    and 'sym'.

  * In 'struct type', to represent fields, replacing 'struct fields'.

  But here I think he's only referring to local environments: there's
  also the global environment.

* Here are some issues surrounding environments:

  * How does GDB initialize the environment structures?

  * How does GDB figure out in what environments to search for a name
    that a user types in?

  * How should we implement environments internally?

* Right now, I'm not worried about the first problem so much.  Having
  said that, here are some issues that are relevant to it:

  * If the compiler generates rich enough debugging information, then
    we don't have to worry too much about how to initialize the
    environment structures: we have everything we need given to us.

  * If the compiler doesn't generate rich enough debugging
    information, then we can still do a decent approximation to the
    correct information by, say, looking at mangled linkage names for
    symbols.  It's not perfect, but it'll do fine.

  * I'm not sure _exactly_ how we'll detect whether or not we've got
    enough debugging information when reading the info for a given
    file, but we can figure out something.  (E.g. for C++, use mangled
    linkage names until we first see a DW_TAG_namespace.)

  * For some environments, we can count on being able to easily figure
    out a complete picture of what it looks like: this should be true
    for compound data structures and compound statements.  But it's
    not true for many sorts of compound name structures: stuff can get
    added to the global environment or to namespaces in somewhat
    unpredictable ways.  Still, I don't think this is _too_ serious:
    having an analogue to the partial symbol table around plus reading
    in detailed debugging information for an entire file at a time
    should mean that we never miss information that we need.  (How
    does the minimal symbol table come into play?)
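
  The fallback of recovering namespace structure from linkage names
  could look roughly like the sketch below.  It assumes the mangled
  name has already been demangled into something like "A::B::f", and
  the function name split_qualified_name is hypothetical, not an
  existing GDB routine.  It splits on "::" only outside template
  arguments and parameter lists.  It is only an approximation: names
  such as "operator<" or "operator->" would confuse the bracket
  counting, and the result cannot tell a namespace from a class.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a demangled name such as "A::B::f" into {"A", "B", "f"}.
// A "::" nested inside <...> or (...) is kept as part of its component.
std::vector<std::string> split_qualified_name(const std::string& name) {
  std::vector<std::string> parts;
  std::string current;
  int depth = 0;  // nesting depth of <...> and (...)
  for (std::size_t i = 0; i < name.size(); ++i) {
    const char c = name[i];
    if (c == '<' || c == '(') {
      ++depth;
    } else if (c == '>' || c == ')') {
      --depth;
    }
    if (depth == 0 && c == ':' && i + 1 < name.size() && name[i + 1] == ':') {
      parts.push_back(current);  // finish the component before "::"
      current.clear();
      ++i;  // skip the second ':'
      continue;
    }
    current += c;
  }
  parts.push_back(current);  // the last component is the symbol itself
  return parts;
}
```

  Every component except the last names an enclosing scope, so a
  reader could use those prefixes to create approximate environments
  until real namespace debugging information shows up.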

* The second problem seems to me to be considerably more subtle; even
  with perfect debugging information, it's not clear to me that, at
  least initially, we'd implement C++'s name lookup rules completely.

  * Different languages vary considerably in exactly what information
    is accessible at any given point.

  * Environments usually form a "tree" in some vague sense, but
    exactly what that tree means (and its implications in terms of
    environment search rules) varies considerably based on the type of
    environment.  For example, if you don't find a symbol in the
    current compound statement, you can always go up to the enclosing
    compound statement.  By contrast, if you have a C++ namespace B
    nested inside a C++ namespace A, then even if you make symbols
    inside A::B accessible via a 'using' declaration, symbols inside A
    aren't necessarily accessible.

  * Often, language constructs for making compound name structures
    accessible (C++ 'using' declarations, etc.) permit some amount of
    renaming.

  * Some compound name structures don't have names.  One example is
    files + static global variables in C; another example is anonymous
    namespaces in C++.  (Note that the second example is a superset of
    the first example: the first example is basically like the special
    case of the second example in which the parent of the anonymous
    namespace is the global namespace.)

  * If the compiler doesn't generate rich enough debugging
    information, we simply won't be able to do a perfect job here.
    (Though I think we'll be able to do a good enough job that users
    will forgive us.)

  * When doing a lookup, the user may provide part of the name prefix
    in addition to the variable name.

  * Functions can be overloaded, so sometimes you need types as well
    as the name.

  * Do virtual member functions and/or virtual base classes pose
    problems?  I don't think they do, but I'll list them just in case.

  * Anything else?  I think I'm probably leaving out stuff here.  I'm
    not too familiar with what GDB's current data structures are for
    representing what names are accessible at a given point.
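
  The difference between the two kinds of "tree" edges mentioned
  above can be sketched as follows.  Env and find_env are
  hypothetical names, not GDB structures, and the search rule is a
  deliberate simplification of C++'s real lookup rules: the search
  walks outward through enclosing environments, and at each level it
  also looks in the environments opened there via 'using', but it
  does not continue to the parents of those imported environments.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical environment node for illustration only.
struct Env {
  std::string label;                 // e.g. "A::B"; empty for unnamed scopes
  const Env* parent = nullptr;       // lexically enclosing environment
  std::vector<const Env*> imported;  // environments opened here via 'using'
  std::map<std::string, int> names;  // name -> opaque payload
};

// Return the environment that supplies `name` when searching from
// `start`, or nullptr if the name is not visible.
const Env* find_env(const Env* start, const std::string& name) {
  for (const Env* e = start; e != nullptr; e = e->parent) {
    if (e->names.count(name) != 0) {
      return e;
    }
    for (const Env* u : e->imported) {
      // Only the imported environment itself is searched here,
      // not the environments that enclose it.
      if (u->names.count(name) != 0) {
        return u;
      }
    }
  }
  return nullptr;
}
```

  With a block nested in a function that imports A::B, a name
  defined in A::B is found and a name defined in the global
  environment is found through the parent chain, but a name defined
  only in A is not.  Overloading, renaming, and partially qualified
  names typed by the user are all left out of this sketch.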

* Then there's the issue of implementing environments internally.

  * One issue to keep in mind is that different environments can have
    dramatically different numbers of names.  E.g. the global
    environment is potentially extremely large, as is C++'s 'std'
    namespace; but a struct is typically quite small, as are the
    namespaces in code that I write myself.

    So we need a data structure that can deal with these extremes.  In
    particular, linear lists of names and fixed-size hash tables both
    sound like bad ideas to me.  Does GDB or libiberty or whatever
    have tools for dealing with heaps or hash tables that grow as
    necessary?  (Is it even a good idea to grow your hash tables as
    you add entries to them?  My theoretical background in algorithms
    and data structures is weak.)

  * If we look up a name in an environment, what data do we want that
    name lookup to return to us?  If what we're looking up is a
    variable, then candidates are type information and location
    information.  But of course we might want to look up other things
    (structures/classes/unions, typedefs, enums, namespaces, etc.)
    Should we only search based on names, or search based on names +
    what kind of object we want to associate to that name?  (Probably
    the latter.)
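
  One possible shape for such a structure, searching on name plus
  kind, is sketched below.  Everything here (Kind, Key, Entry,
  Environment) is hypothetical, and std::unordered_map merely stands
  in for "a hash table that grows as necessary": GDB itself is
  written in C, so a real implementation would use a different
  container.  Entry is a placeholder for whatever a lookup should
  return.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical categories of things a name can denote.
enum class Kind { Variable, Type, Namespace };

struct Key {
  std::string name;
  Kind kind;
  bool operator==(const Key& other) const {
    return name == other.name && kind == other.kind;
  }
};

struct KeyHash {
  std::size_t operator()(const Key& k) const {
    // Combine the name hash with the kind so that the same name can
    // denote, say, both a type and a variable.
    return std::hash<std::string>()(k.name) * 31u +
           static_cast<std::size_t>(k.kind);
  }
};

// Placeholder payload: a type description and a location.
struct Entry {
  std::string type_name;
  unsigned long address;
};

class Environment {
 public:
  void add(const std::string& name, Kind kind, const Entry& entry) {
    table_[Key{name, kind}] = entry;
  }
  // Returns nullptr when no entry of the requested kind has this name.
  const Entry* find(const std::string& name, Kind kind) const {
    auto it = table_.find(Key{name, kind});
    return it == table_.end() ? nullptr : &it->second;
  }
  std::size_t size() const { return table_.size(); }

 private:
  std::unordered_map<Key, Entry, KeyHash> table_;  // rehashes as it grows
};
```

  Keying on (name, kind) lets a small struct and a very large global
  environment share one interface, and lets "stat" the type and
  "stat" the variable coexist in the same environment.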

* Are we going to try to implement this incrementally or not?  On a
  separate branch or on the mainline branch?  What recent patches to
  GDB (whether proposed or approved) help with this effort?

David Carlton
carlton@math.stanford.edu