Horrible GDB symbol table performance

Mon Jun 5 12:39:00 GMT 2000

>>>>> "Daniel" == Daniel Berlin <dan@cgsoftware.com> writes:
Daniel> And i'll put money these are either java, or C++ programs,
Daniel> right?

You'd lose that bet.  

Our system is a monolithic single address space architecture with an
os kernel, device drivers, network stacks, networking protocols,
routing protocols, network applications, crypto and compression
libraries, the kitchen sink, ...  It's mostly written in C, the
balance (perhaps 2%) in assembly. Our image has about 30,000 symbols
and about 2000 object files.

My test is running a user defined function which grovels through the
chain of blocks which have been allocated by malloc() and prints the
size of each block and the contents of the first four words.  I'm
using a crash dump instead of a live system so I have a repeatable
scenerio.

In my test case, lookup_symbol() is called 13691 times, but it calls
lookup_partial_symbol() 8549573 times.  60% of the time, it's called
with the global flag set.  Even though the average call is .04 ms,
it's called so many times that it adds up to two-thirds of the total
execution time.

Daniel> I had a theory a while ago most of that most of the
Daniel> performance problem in lookup_partial_symbol is because we
Daniel> can't do the binary search, or force a linear lookup, a lot of
Daniel> the time.

Given that we're starting from different assumptions, it shouldn't be
surprising that my theory differs from yours.  My current theory is
that multiple partial symbol tables for each objfile does not scale
well to large applications, and we may need a combined symbol table
for each objfile, or perhaps even a single universal symbol table.

It's a long time since I did any algorithm analysis, so here goes.  If
N is the number of symbols, and M is the number of psymtabs, and if we
for simplicity's sake assume that N is distributed evenly across M;
for a symbol that is present we will observe worst case lookup for a
binary search (which I don't remember and I'm too lazy to look up) of
N/M elements * M/2.  A search for a non existant symbol will be the 
worst case * M.;

If there was a single partial symbol table, we'd observe the average
time for a symbol that is present, and worst case for a non-existant
symbol over N.  My guess is that there is little practical difference
between N and N/M, but the extra factor of M in the present scheme is
significant.

        --jtc

-- 
J.T. Conklin
RedBack Networks