This is the mail archive of the archer@sourceware.org mailing list for the Archer project.



Re: [RFC] Proposal for a new DWARF name index section


On Wed, Dec 2, 2009 at 11:38 AM, Daniel Jacobowitz <drow@false.org> wrote:

> Well, inherent in the cache approach (IMO) is a system-provided cache;
> for installed libraries, the cache data could be added to a debuginfo
> file.  Of course, that assumes GDB's format stays "relatively stable"
> across GDB updates.

FWIW, I've used the following approach on a previous product X:

- When a new binary is detected, a copy of X is invoked to parse all
  the needed debug info into internal form and write it to a cache file.
- Once the copy exits, the cache file is directly mmap()ed by X.
- Cache files older than 1 week, and cache files prepared from
  binaries which no longer exist in their original location are
  pruned to keep cache size down.
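
To make the pruning rule concrete, here is a minimal sketch in Python.
It is not the code from X. The function names, the use of the cache
file's mtime as its age, and the dict mapping each cache file to its
original binary path are all assumptions made for illustration.

```python
import os
import time

ONE_WEEK = 7 * 24 * 60 * 60  # seconds

def should_prune(cache_mtime, binary_path, now, max_age=ONE_WEEK):
    """Return True if a cache entry is too old or its binary is gone."""
    too_old = (now - cache_mtime) > max_age
    binary_missing = not os.path.exists(binary_path)
    return too_old or binary_missing

def prune_cache(entries, now=None):
    """Delete stale cache files and return the paths that were removed.

    entries is a hypothetical dict: cache file path -> original binary path.
    """
    now = time.time() if now is None else now
    removed = []
    for cache_path, binary_path in entries.items():
        try:
            cache_mtime = os.stat(cache_path).st_mtime
        except FileNotFoundError:
            continue  # cache file already gone, nothing to prune
        if should_prune(cache_mtime, binary_path, now):
            os.remove(cache_path)
            removed.append(cache_path)
    return removed
```

A real implementation might prefer last-access time over mtime, so that
caches still in regular use are not rebuilt every week.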

The cache file contains the version of X, so when a new version of X
is shipped, the cache is automatically rebuilt.

It also contains path/timestamp/inode/size for the target binary,
so if e.g. one of the shared libs has been rebuilt since the last run,
only that one shared library must be re-processed.
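
The validity check described above can be sketched roughly as follows.
This is an illustration in Python, not the format X actually used:
CacheKey, make_key, cache_is_valid and the TOOL_VERSION string are
hypothetical names, and the key is assumed to be stored in the cache
file header.

```python
import os
from typing import NamedTuple

TOOL_VERSION = "1.0"  # hypothetical version of the tool that wrote the cache

class CacheKey(NamedTuple):
    tool_version: str
    path: str
    mtime_ns: int
    inode: int
    size: int

def make_key(binary_path, tool_version=TOOL_VERSION):
    """Build the key that would be stored alongside the cached data."""
    st = os.stat(binary_path)
    return CacheKey(tool_version, os.path.abspath(binary_path),
                    st.st_mtime_ns, st.st_ino, st.st_size)

def cache_is_valid(stored_key, binary_path, tool_version=TOOL_VERSION):
    """True only if the tool version and the binary's identity both match."""
    try:
        return stored_key == make_key(binary_path, tool_version)
    except OSError:
        return False  # binary missing or unreadable: treat the cache as stale
```

Because each binary gets its own key, rebuilding one shared library
invalidates only that library's cache entry.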

This trades startup speed against disk space, and disk is usually
very cheap now.

One of our typical usage scenarios is a tiny executable linked with
1000+ C++ shared libraries. Simply re-running the test a second time
in a row in GDB takes 1+ minutes, as GDB discards and re-reads the
debug info for each solib (it used to take 6+ minutes before my dwarf
mmap changes).

The major CPU consumers in my tests are now:

CPU: AMD64 processors, speed 2200 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
43092     8.2847  read_partial_die
38243     7.3525  strcmp_iw_ordered
36744     7.0643  read_attribute_value
28887     5.5537  cpname_parse
28849     5.5464  d_print_comp
27731     5.3315  htab_hash_string
21975     4.2248  cp_canonicalize_string
20736     3.9866  load_partial_dies
18098     3.4795  cpname_lex
15649     3.0086  lookup_minimal_symbol
15156     2.9138  msymbol_hash_iw
14185     2.7272  htab_find_slot_with_hash

I am guessing that a GDB cache of pre-canonicalized strings would
save a *lot* of CPU under this scenario, and there is no reason
you can't put other indices into the cache as well. Nor does the
cache file need a stable format: a newer version of GDB will simply
rebuild what it needs on demand.
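
As a rough illustration of the pre-canonicalized string idea, here is a
Python sketch of a name table that is filled once and reloaded on later
runs. It is not GDB code: slow_canonicalize is a stand-in that only
collapses whitespace, and the class name and JSON on-disk format are
assumptions chosen to keep the example short.

```python
import json

def slow_canonicalize(name):
    """Stand-in for an expensive canonicalizer; only collapses whitespace."""
    return " ".join(name.split())

class CanonicalNameCache:
    """Maps raw names to canonical names, computing each one at most once."""

    def __init__(self, canonicalize=slow_canonicalize):
        self._canonicalize = canonicalize
        self._table = {}
        self.misses = 0  # how many names had to be canonicalized

    def lookup(self, name):
        try:
            return self._table[name]
        except KeyError:
            self.misses += 1
            result = self._table[name] = self._canonicalize(name)
            return result

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self._table, f)

    def load(self, path):
        with open(path, encoding="utf-8") as f:
            self._table.update(json.load(f))
```

On a second run, loading the saved table means lookups for names already
seen never reach the canonicalizer.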


-- 
Paul Pluzhnikov

