Horrible GDB symbol table performance
Paul Hilfinger
hilfingr@EECS.Berkeley.EDU
Mon Jun 5 15:25:00 GMT 2000
> It's a long time since I did any algorithm analysis, so here goes. If
> N is the number of symbols, and M is the number of psymtabs, and if we
> for simplicity's sake assume that N is distributed evenly across M;
> for a symbol that is present we will observe worst case lookup for a
> binary search (which I don't remember and I'm too lazy to look up) of
> N/M elements * M/2. A search for a non-existent symbol will be the
> worst case * M.
Well, just for the record, actually, using binary search the figure
would be roughly
    M * lg(N/M)        (lg = log base 2)
comparisons for a failing search vs.
    lg N
for a single table.
for a single table. Now you say that for your data, N=30000 and
lookup_partial_symbol is called 8549573 times vs. 13691 calls to
lookup_symbol. I think that in a failing search, lookup_symbol calls
lookup_partial_symbol twice for each partial symbol table (well, that
doesn't quite jibe with your 60% figure for global flag set, but it's
not too far off for an estimate), so that suggests
M = 1/2 * 8549573 / 30000 ~= 140
Something's off, given that you have 2000 object files. Ignoring
that, we have
    M * lg(N/M) = 140 * lg(30000/140) ~= 1080
vs.
    lg N = lg 30000 ~= 15
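These rough figures are easy to check numerically; a quick sketch, using N = 30000 from the reported data and the M ~= 140 estimated above:

```python
import math

N = 30_000  # total number of symbols (from the reported data)
M = 140     # estimated number of partial symbol tables

# Worst-case comparisons for a failing search: one binary search
# of N/M entries in each of the M tables...
split = M * math.log2(N / M)
# ...vs. a single binary search over one combined table.
single = math.log2(N)

print(round(split), round(single))  # roughly 1080 vs. 15
```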
Quite a significant difference, I agree. In general, the ratio between
the two methods is
    M * lg(N/M) / lg N = M * (1 - lg M / lg N)
and rises with M. For linear searches, of course, the two methods are
essentially identical; dividing a linear search into M linear searches
has no effect on overall time.
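Both the simplification and the trend are easy to verify; a small sketch (the helper name and the sample M values are illustrative, not from the reported data):

```python
import math

def split_over_single(M, N):
    # M * lg(N/M) / lg N, simplified to M * (1 - lg M / lg N)
    return M * (1 - math.log2(M) / math.log2(N))

# For fixed N, the ratio rises with M: the finer the partitioning,
# the worse a failing search fares relative to one big table.
# M = 1 gives exactly 1.0, as it must (one table vs. itself).
N = 30_000
for M in (1, 10, 140, 2000):
    print(M, round(split_over_single(M, N), 1))
```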
Which leads not just to the question of why not have a single partial
symbol table, but also why not just juice up the minimal symbol table
a bit and dispense with partial symbols altogether?
P. Hilfinger