Horrible GDB symbol table performance

Paul Hilfinger hilfingr@EECS.Berkeley.EDU
Mon Jun 5 15:25:00 GMT 2000


> It's a long time since I did any algorithm analysis, so here goes.  If
> N is the number of symbols, and M is the number of psymtabs, and if we
> for simplicity's sake assume that N is distributed evenly across M;
> for a symbol that is present we will observe worst case lookup for a
> binary search (which I don't remember and I'm too lazy to look up) of
> N/M elements * M/2.  A search for a non-existent symbol will be the
> worst case * M.

Well, just for the record, with binary search the figure would be
roughly

      M * lg (N/M)			lg = log base 2

comparisons for a failing search vs

      lg N

for a single table.   Now you say that for your data, N=30000 and 
lookup_partial_symbol is called 8549573 times vs. 13691 calls to
lookup_symbol.  I think that in a failing search, lookup_symbol calls
lookup_partial_symbol twice for each partial symbol table (well, that
doesn't quite jibe with your 60% figure for global flag set, but it's
not too far off for an estimate), so that suggests 

      M = 1/2 * 8549573 / 30000 ~= 140
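For what it's worth, that back-of-the-envelope estimate is easy to
check mechanically (a sketch; the factor of two calls per partial
symbol table is the assumption stated above, not a measured figure):

```python
# Rough estimate of M (number of partial symbol tables) from the
# reported call counts, assuming a failing lookup_symbol scans each
# partial symbol table about twice.
partial_calls = 8549573   # reported calls to lookup_partial_symbol
N = 30000                 # total number of symbols

M = 0.5 * partial_calls / N
print(M)  # roughly 142, i.e. ~140
```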

Something's off, given that you have 2000 object files.  Ignoring
that, we have

      M * lg (N/M) = 140 * lg (30000/140) ~= 1080

vs.

      lg N = lg 30000 ~= 15

Quite a significant difference, I agree. In general, the ratio between
the two methods is

    M * lg (N/M) / lg N = M*(1 - (lg M/lg N))

and rises with M.  For linear searches, of course, the two methods are
essentially identical; dividing a linear search into M linear searches
has no effect on overall time.  
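To make the comparison concrete, the same arithmetic in Python (a
sketch, using the lg = log base 2 convention and the N and M values
estimated above):

```python
import math

def lg(x):
    # lg = log base 2, as in the formulas above
    return math.log2(x)

N = 30000   # total symbols
M = 140     # estimated number of partial symbol tables

# Comparisons for a failing binary search split across M tables (~1080)
split_cost = M * lg(N / M)
# vs. a single binary search over one combined table (~15)
single_cost = lg(N)

# The ratio M * lg(N/M) / lg N equals M * (1 - lg M / lg N),
# so it grows with M.
ratio = split_cost / single_cost
assert abs(ratio - M * (1 - lg(M) / lg(N))) < 1e-9
print(round(split_cost), round(single_cost), round(ratio))
```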

Which leads not just to the question of why not have a single partial
symbol table, but also why not just juice up the minimal symbol table
a bit and dispense with partial symbols altogether?

P. Hilfinger

