This is the mail archive of the gdb-patches@sourceware.cygnus.com mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: (patch) hpjyg09: bcache optimizations



The first red flag here is that this patch adds a new target macro,
BCACHE_ALLOW_DUPLICATES (in config/pa/tm-hppa.h), which is actually
dependent on the application being debugged, not the architecture.
Whether the bcache helps you depends on the contents of your debug
info, not whether it's running on a PA microprocessor.  Hmm.


I'd like to look at this problem on a larger scale:

I've read Srikanth's comment in the patch, explaining why the bcache
imposes too much overhead for some code while it is still quite
effective for other code.

But this situation still sounds weird.  The bcache is a very simple
thing --- bcache.h and bcache.c total 289 lines of code.  You give it
a string of bytes, and it either adds a copy of your string to its
hash table, or finds an identical copy already present.  Either way,
it hands you back a pointer to its stashed copy.  So, a bcache is
helpful if you expect a lot of duplicates.  In Linux's libc.so, the
bcache reduces the space required for the psymtabs by half.

What I can't understand is why such a simple data structure needs to
have such a terrible overhead.  Srikanth writes:

    See that the overhead (which is really every byte that is not used to
    store GDB's data i.e., cells used for housekeeping info like pointers,
    hash chain heads etc.,) is of the order of O(m * n * 64k) where m is
    the number of load modules compiled with -g, and n is the number of
    strings that have unique length. This spells doom for applications wih
    a large number of shared libraries (m increases) and C++ (n increases
    since we also stick demangled names into the cache.)  When debugging
    HP's C compiler more than 140 MB or about 48% of memory is due to the
    bcache overhead. Not the data just the overhead !

I'm sure that's true, but it seems completely unnecessary.  One should
be able to design a bcache with much less overhead.

Looking at the output of "maint print statistics" from running GDB on
itself, I see:

Average hash table population: 8%
Average hash table population: (not applicable)
Average hash table population: (not applicable)
Average hash table population: 0%
Average hash table population: 1%
Average hash table population: 3%
Average hash table population: 1%

(The "not applicable" items are for bcaches with no entries.)

I don't think those lower-level hash tables are being used very
effectively, if the hash function is distributing the items across ~3%
of the buckets.  A good hash function distributes items across as many
different buckets as possible.  I would guess that the bcache has
never been tuned.

So, while I understand the impulse to just disable the bcache, because
it's causing you trouble in your driving uses of the debugger, I think
the better solution, for both Cygnus and HP, is to fix the data
structure so it actually works.  I would recommend:

- using a single-level hash table, with an initial size of about 32k
  buckets (based on looking at GDB debugging GDB)
- growing the hash table when the average chain length grows beyond 
  a certain limit, so the time overhead remains the same as the
  problem size grows
- choosing a hash function by running GDB on your C compiler, doing
  "maint print statistics" to see how it's distributing items across
  buckets, and then tweaking the function to minimize your maximum
  chain length (this is kind of fun, and takes less time than you'd
  think, since hash functions have almost no *required* semantics)
- adding code to print_bcache_statistics to print the number of
  bytes spend on bcache overhead

Yes, this is more work than commenting it out.  But we can't
perpetually choose the quick fix over the real cure.  Especially when
the quick fix adds complexity, like more CPP conditionals and target
parameters that aren't target parameters.

In this light, I don't think we should apply this patch.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]