[RFC] Symbol table improvements

Daniel Berlin dan@cgsoftware.com
Mon Jun 11 09:55:00 GMT 2001


Problems right now:

Worst case is currently symbol not existing. We search *everything*.
Search isn't particularly fast. It actually depends on where the
symbol will be found.  We repeatedly search the same blocks.

Previous attempt:

Hash table the block symbol array.  

Requires too much in the way of changes to mdebugread (complete
rewrite).

Pretty invasive to block structure (IE it's a major structure change).  

Requires hash function to take into account whatever semantics we use
for block lookup.  (IE case insensitive, whitespace insensitive,
etc). This is difficult. 

Still have to search blocks where the symbol may not appear. we can
just do it faster.  Ideally, we should have a fast way of testing
whether the block will contain the symbol, without actually having to
look at the symbols in the block.  

Why would this help?  

I've got a perfect example.  A python development team member wrote a
small gdb command script to print out the python stack.  It works by
just walking the C stack, printing the python stack, until we get to
the frame for the routine where python's evaluation starts.

This works remarkably slowly. Much slower than one would expect. You'd cry.
It *trickles* out.
There is nothing in the script that is doing anything weird.
Profiling shows that it ends up making 23,200 lookup_symbol calls. 
This is fine, right?
These 23,200 lookup_symbols make a total of 50 million
lookup_block_symbol calls.
Not pretty.
Not pretty at all.

Current attempt:

All blocks now have a unique block id that is just based off a simple
incrementing counter. (IE every time we create a block and assign it
an id from the counter, and we increment the id counter by one).  So
if two blocks are really the same block, they'll have the same block
ID.  This number is much easier to track than trying to do it by the
pointers (you'd have to know which bits of the pointer to throw away,
etc).

Very fast bitmap from gcc cleaned up and used in gdb for tracking the
blocks we've already searched.  That way, we never search the same
block twice during a symbol lookup.

Global splay tree of names that appear in blocks, with a bitmap
stating what block id's they appear in.  Splay tree insertion is done
in the same loop that is putting the symbols in the actual blocks.

Memory efficient bitmap from gcc cleaned up and used in gdb as this
bitmap.  (You don't want an sbitmap for this. sbitmaps are of fixed
size, and always waste the same amount of memory. bitmaps expand
gracefully, and you only use memory for the blocks of bits containing
the bits we've got set.)

Splay tree ensures that symbols accessed the most frequently, are the
fastest to look for.  In fact, it ensures symbols *looked for* most
frequently, are the fastest we can find/not find in the splay tree.

When we go to do a lookup_symbol, we first check the splay tree.
If the name isn't in the splay tree, there is no point in any of the 
lookup_block_symbol's. It can't appear in a block without being in the
splay tree.
It may still be a partial or minimal symbol.
Greatly improves time for misses.

If it's in the splay tree, we walk the blocks in the same order we do
now (skipping blocks we've already checked, obviously), but only
perform a lookup_block_symbol if the block id of the block appears in
the bitmap of blocks that symbol appears in. In other words, we only
lookup_block_symbol when we know we will find it.

This allows us to achieve the goal of being able to determine if a
block will have a symbol, without having to look at that block's
symbols.

There aren't enough non-lookup_symbol callers of lookup_block_symbol
(two outside of symtab.c, neither in critical paths, one inside
symtab.c, which can be changed to know about the splay tree) to
justify putting the splay lookup/block id bitmap check inside
lookup_block_symbol.  

Since we have no hash table, we don't have to deal with tricky hash
function semantics (we have to have a comparison function that can do
the semantics, but we already do. That's how we get them in the first
place. :)) .  We also have only added one member to the block
structure, an id that nothing except the symbol lookup routines care
about (IE it doesn't affect any code that walks block structures, or
anything).

Thoughts, comments?

I've got the global splay tree stuff done, and it's noticeably faster.
The only thing not done is the not searching of duplicate blocks.

But since we only search blocks we know the symbol will appear in,
this actually may now be pointless (once we find the symbol, we return
anyway).

In effect, i've divorced the lookup from the block structures, as it
should be.

-- 
"My roommate got a pet elephant.  Then it got lost.  It's in the
apartment somewhere.
"-Steven Wright



More information about the Gdb mailing list