This is the mail archive of the gdb@sourceware.cygnus.com mailing list for the GDB project.



Re: problems with gdb


Chris Blizzard wrote:
> 
> So, one of the problems that I've been having is that some large .so libraries
> take forever to load.  One of the libraries is about 28 meg with debugging
> symbols in it.  I've let it run for about 10 mins and it's never finished
> loading.  Here's what gprof says for loading a reasonable sized library ( 5
> meg or so ):
> 
> Flat profile:
> 
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  70.98     19.47    19.47    11954     1.63     2.25  lookup_minimal_symbol
>  27.12     26.91     7.44 33240213     0.00     0.00  strcmp_iw
>   0.33     27.00     0.09        3    30.00  9038.95  read_dbx_symtab
>   0.15     27.04     0.04    44554     0.00     0.00  hash
>   0.11     27.07     0.03   337915     0.00     0.00  bfd_getl32
>   0.11     27.10     0.03    44554     0.00     0.00  bcache
>   0.11     27.13     0.03      150     0.20     0.20  end_psymtab
> 
> Uhh...that's 33 _million_ calls.  That looks like this chunk of code:
> 
>  for (objfile = object_files;
>        objfile != NULL && found_symbol == NULL;
>        objfile = objfile->next)
>     {
>       if (objf == NULL || objf == objfile)
>         {
>           for (msymbol = objfile->msymbols;
>                msymbol != NULL && SYMBOL_NAME (msymbol) != NULL &&
>                found_symbol == NULL;
>                msymbol++)
>             {
>               if (SYMBOL_MATCHES_NAME (msymbol, name))
>                 {
>                   switch (MSYMBOL_TYPE (msymbol))
>                     {
>                     case mst_file_text:
> 
> I'm sorry, is that looking over a linked list?  SYMBOL_MATCHES_NAME() is a
> macro that does some mangling magic so we can't use a standard hash lookup
> table but there has to be something we can do to speed that up.
> 
> --Chris
> 

	We at HP have been running into many of these issues
and have substantially re-architected the symbol table management
to improve performance. Here are some of the things we did:


	(1) The function lookup_minimal_symbol, believe it or
not, performs a linear search. The minimal symbol table is sorted
by symbol address and not by name, so binary searches are
possible in lookup_minimal_symbol_by_pc but not in
lookup_minimal_symbol.

	The difficulty here is that a name lookup could be
based on either a demangled name or a mangled name. So unless we
sort the table by both, we have to do a linear search. Sorting
the table by both carries a heavy penalty at startup, as that
entails three sorts with different keys (PC, demangled name and
mangled name). We eliminated one of the keys, the demangled name,
and do a double sort (by PC and by mangled name).
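
	To make the double sort in (1) concrete, here is a minimal
sketch in C. It is not the HP code: struct msym, build_name_index
and lookup_by_name are hypothetical names, and the struct is a
simplified stand-in for a minimal symbol. The primary table stays
in address order; a second array of pointers is sorted by mangled
name so that name lookups can use a binary search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a minimal symbol. */
struct msym {
    const char *name;       /* mangled (linker) name */
    unsigned long address;  /* symbol address */
};

/* qsort comparator: both arguments point at index entries. */
static int cmp_index_entries(const void *a, const void *b)
{
    const struct msym *x = *(const struct msym *const *)a;
    const struct msym *y = *(const struct msym *const *)b;
    return strcmp(x->name, y->name);
}

/* bsearch comparator: key is a name, entry points at an index entry. */
static int cmp_key_to_entry(const void *key, const void *entry)
{
    const struct msym *sym = *(const struct msym *const *)entry;
    return strcmp((const char *)key, sym->name);
}

/* Build a name-sorted index over a table that stays in address order.
   Returns a malloc'd array of COUNT pointers, or NULL on failure. */
const struct msym **build_name_index(const struct msym *syms, size_t count)
{
    const struct msym **index = malloc(count * sizeof *index);
    if (index == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++)
        index[i] = &syms[i];
    qsort(index, count, sizeof *index, cmp_index_entries);
    return index;
}

/* O(log n) lookup by mangled name; returns NULL if NAME is absent. */
const struct msym *lookup_by_name(const struct msym *const *index,
                                  size_t count, const char *name)
{
    assert(name != NULL);
    const struct msym *const *hit =
        bsearch(name, index, count, sizeof *index, cmp_key_to_entry);
    return hit != NULL ? *hit : NULL;
}
```

	The index costs one extra sort at load time and one pointer
per symbol, in exchange for replacing the linear scan with a
logarithmic one. It only works when lookups are done on the
mangled name, which is why the demangled key had to go.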


	(2) Just-in-time demangling: Right now, gdb demangles
the linker symbol table at startup. This simply does not scale
well to large applications. We implemented a scheme by which we
eliminate all anticipatory demangling and do it just in time, only
for the set of symbols the user refers to in a debugging session.
Other than the "minimal" symbol table, there could be other
spots where wholesale anticipatory demangling is going
on: in build_psymtabs() etc. Avoiding anticipatory demangling
allowed us to eliminate one of the sort keys for (1) above. This
may or may not be possible to do with your compiler.
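
	The just-in-time scheme in (2) amounts to a lazily filled
cache. The sketch below is an illustration only, not what gdb or
the HP port does internally: struct lazy_sym, toy_demangle and
lazy_demangled_name are hypothetical, and toy_demangle merely
strips a leading underscore where a real demangler would be
called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol record: the demangled form is filled in lazily. */
struct lazy_sym {
    const char *mangled;  /* always present, read from the symbol table */
    char *demangled;      /* NULL until somebody asks for it */
};

/* Counts how often the (expensive) demangler actually runs. */
static unsigned long demangle_calls;

/* Toy stand-in for a real demangler: strips one leading underscore.
   Returns a malloc'd string, or NULL if allocation fails. */
static char *toy_demangle(const char *mangled)
{
    const char *p = (mangled[0] == '_') ? mangled + 1 : mangled;
    char *copy = malloc(strlen(p) + 1);
    if (copy != NULL)
        strcpy(copy, p);
    demangle_calls++;
    return copy;
}

/* Demangle on first use and remember the result for later calls. */
const char *lazy_demangled_name(struct lazy_sym *sym)
{
    assert(sym != NULL && sym->mangled != NULL);
    if (sym->demangled == NULL)
        sym->demangled = toy_demangle(sym->mangled);
    return sym->demangled;
}
```

	Loading a symbol table then costs nothing for demangling,
and the cost is paid once per symbol the user actually touches.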

	(3) GDB internalizes the native debugging information
in its entirety. Studies show a "typical" debugging session
uses less than 15% of all the debug info in a binary, so most
of that up-front work is never needed.

	(4) For C++, it is very likely the linker (minimal)
symbol table contains all kinds of compiler-generated symbols
for such things as vtables, exception handling, RTTI etc.
Also, depending on your compiler, lots of internal symbols generated
for the purpose of relocation could be making their way into
the linker symbol table. Do a maint print msymbols <filename>
and look for symbols that don't look like user-space
symbols. Some of these symbols are crucial for gdb to interact
properly with the runtime. But there could be others....

	(5) The bcache could be extremely expensive on some platforms.
What this module does is try to eliminate duplicates in the
debug info. Some platforms (like HP) eliminate all (or most)
redundancy in the debug info in a hidden linker pass. For HP
we simply disconnected the bcache and saw a 50% improvement in
memory use and a performance improvement of 20% or so.
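
	For readers who have not looked at that module, the idea
behind a duplicate-eliminating byte cache can be sketched as
below. This is a generic illustration and not gdb's bcache source:
struct byte_cache, cache_bytes, the bucket count and the
FNV-1a-style hash are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024  /* arbitrary size chosen for this sketch */

/* One cached byte string, chained per hash bucket. */
struct entry {
    struct entry *next;
    size_t len;
    unsigned char bytes[];  /* C99 flexible array member */
};

/* Hypothetical cache; zero-initialise it before first use. */
struct byte_cache {
    struct entry *buckets[NBUCKETS];
    size_t unique;  /* distinct byte strings stored */
    size_t hits;    /* requests answered by an existing copy */
};

/* FNV-1a-style hash over LEN bytes. */
static unsigned long hash_bytes(const void *p, size_t len)
{
    const unsigned char *b = p;
    unsigned long h = 2166136261UL;
    for (size_t i = 0; i < len; i++) {
        h ^= b[i];
        h *= 16777619UL;
    }
    return h;
}

/* Return a pointer to the single shared copy of DATA, adding it to the
   cache if it is not there yet. Returns NULL if allocation fails. */
const void *cache_bytes(struct byte_cache *c, const void *data, size_t len)
{
    assert(c != NULL && data != NULL);
    struct entry **slot = &c->buckets[hash_bytes(data, len) % NBUCKETS];

    for (struct entry *e = *slot; e != NULL; e = e->next) {
        if (e->len == len && memcmp(e->bytes, data, len) == 0) {
            c->hits++;
            return e->bytes;
        }
    }

    struct entry *e = malloc(sizeof *e + len);
    if (e == NULL)
        return NULL;
    e->len = len;
    memcpy(e->bytes, data, len);
    e->next = *slot;
    *slot = e;
    c->unique++;
    return e->bytes;
}
```

	Every request pays for a hash and at least one comparison.
When the input already has little or no duplication, almost every
request is a miss, so that cost plus the per-entry overhead buys
nothing.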
	
	(6) Compilers could be emitting more debug info
than is necessary. Some compilers try to be optimal but some
don't.

	Hope this is helpful.

Srikanth
