This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]

Re: Removal of demangled names from partial symbols

To: "Peter.Schauer" <Peter dot Schauer at regent dot e-technik dot tu-muenchen dot de>
Subject: Re: Removal of demangled names from partial symbols
From: Daniel Berlin <dberlin at redhat dot com>
Date: 04 Dec 2000 18:36:32 -0500
Cc: gdb at sources dot redhat dot com, gdb-patches at sources dot redhat dot com
References: <200012042053.VAA28364@reisser.regent.e-technik.tu-muenchen.de>

"Peter.Schauer" <Peter.Schauer@regent.e-technik.tu-muenchen.de> writes:

Just a little more data.

I rigged lookup_minimal_symbol to print the number of times (to a
file, one number per line) it had to
do "msymbol = msymbol->hash_next", if the number was > 0.

All I did for this print out is start gdb like "gdb ~/corr101", then
quit as soon as we got a prompt.

corr101 has  27905 minsyms in it (according to maint print msymbols).

We do at least 1 strcmp, possibly 2, for each of the compares in the
loop looking for the minimal symbol name (which ends when we find the
symbol, or msymbol is NULL), because it uses SYMBOL_MATCHES_NAME.

The max number of times we ended up doing this SYMBOL_MATCHES_NAME, is
559 (which occurs ~1500 times).

We had 544476 lookups which took greater than 0 of these "compares"
(which each is 1 strcmp, and maybe 1 strcmp_iw).

We did 40875096 "compares" in these lookups.

Or, an average of 75 "compares" per lookup of a given name.

So, even if the ternary search tree was the equivalent of 3 compares
(due to overhead or whatever.), we'd still be able to do the partial
demangling fillin (which is completely dominated, at 99.99% of the
time being spent looking up the minsym to get the demangled name right
now) an average 25 times faster than lookup_minimal_symbol does it. So
i'm pretty confident we can completely eliminate the 20% slowdown i
was seeing.
Note, i'm not counting the compares both passes lookup_minimal_symbol does, just
the first pass. The second pass is over the minsym demangled hash
table, and goes slower, and is a waste of time for looking up the
symbol to get the mangled name, but i'm just trying to prove a point about how
much faster the demangling by lookup i want to use in the partial
symbol stuff could be.


Let me finish implementing and see if i'm right.

--Dan



> : Why do we not do a lookup_minimal_symbol in
> : a new function, add_psymbol_and_dem_name_to_list, on the mangled name,
> : and if we get  back a symbol, use the demangled name from that,
> : otherwise, demangle it.
> 
> For some symbol formats (e.g. a.out) the linkage and debugging symbols are
> intermixed. By the time you want to record a partial symbol, the minimal
> symbol might have not been seen yet. Or the minimal symbol has been seen,
> but the minimal symbols are not yet installed, so lookup_minimal_symbol
> will fail.
> 
> You might be able to work around this by going over all psymbols and fill
> in the demangled name via lookup_minimal_symbol _after_ the minimal symbols
> are installed, but I am not yet convinced that you don't have to pay your
> price on slower systems. After all, 5 secs to 6 secs is a 20% slowdown.
> 
> > Demangled names were removed from partial symbols to speed start up
> > times a few years ago.
> > 
> > However, with the minsym demangled hash table now around, we demangle
> > all minimal symbols when we install minimal symbols (IE we init the
> > demangled name on them,unconditionally).
> > 
> > Since the minimal symbol table ends up including a large subset of the
> > mangled partial symbols (if not all of them), this means we already have a large
> > subset of the partial symbol names demangled for us at start up
> > anyway.
> > 
> > Why do we not do a lookup_minimal_symbol in
> > a new function, add_psymbol_and_dem_name_to_list, on the mangled name,
> > and if we get  back a symbol, use the demangled name from that,
> > otherwise, demangle it.
> > 
> > Even tests on 100 meg of debug info show we barely add any startup
> > time at all (5 seconds without, 6 seconds with) . 
> > In fact, all added startup time is attributable to the
> > fact that to save memory, I had it bcache the demangled name in
> > SYMBOL_INIT_DEMANGLED_NAME.  If you don't bcache it (like right now),
> > it's in memory  in at least the full symbol, and the minimal
> > symbol (it's  actually in memory once for every time
> > SYMBOL_INIT_DEMANGLED_NAME is called on a symbol, and the demangling succeeds).
> > 
> > I think 1 second on 100 meg of debug info is worth it to not have to
> > linear search on every symbol lookup, which is amazingly 
> > slow, and if you have gdb using swap at all because of the number of
> > symbols, you are almost guaranteed to hit the swap 
> > hard on *every* single lookup, since we have to go through every
> > single symbol. 
> > 
> > This would solve the problem of not being able to lookup partial
> > symbols by demangled name, and allow us to binary search them without
> > fear of missing a symbol.
> > 
> > Would this be acceptable?
> > 
> > My next trick after that would be to add a mangled->demangled mapping
> > structure, if it's necessary to improve speed, and just use that to
> > lookup the names before demangling the 
> > name over again, in cases where we do (ie SYMBOL_INIT_DEMANGLED) need
> > to find a demangled name for a mangled one, and use that
> > rather than the minimal symbol table to try to find the name.
> > The reason for this is that a hash table (in this case, we are
> > using the minimal symbol demangled hash table as a lookup table) is the wrong structure
> > for this, since demangled names can be *very* large (average of 82
> > chars on my large C++ programs), and we always have to hash the entire
> > string, then do a whole bunch of string compares, because the chains are
> > long. This is okay when we hit (except for the long chains), but on
> > misses we waste the same amount of times as hits, if not more. The
> > string compares on hits also cost a lot because of the length of the string.
> > We really should use a ternary search tree or some structure like it,
> > which on hits is actually faster (since we don't need multiple
> > string compares), and on misses is a whole ton faster, since we abort
> > much sooner.
> > 
> > --Dan
> > 
> > 
> > 
> 
> -- 
> Peter Schauer			pes@regent.e-technik.tu-muenchen.de

References:
- Re: Removal of demangled names from partial symbols
  - From: Peter.Schauer

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]