Initial psymtab replacement results

Tue Dec 15 23:39:00 GMT 2009

>>>>> "Daniel" == Daniel Jacobowitz <drow@false.org> writes:

Daniel> Or you could drag another bit of GDB into this century, and use
Daniel> SQLite or some other in-process database.

I played with this a bit today.  In particular I changed my gdb to dump
an SQLite database from the psymtabs.  I used this schema:

CREATE TABLE IF NOT EXISTS index_version (version INTEGER);
CREATE TABLE IF NOT EXISTS objfile (name, mtime INTEGER, size INTEGER,
                inode INTEGER);
CREATE TABLE IF NOT EXISTS cus (offset INTEGER UNIQUE);
CREATE TABLE IF NOT EXISTS filenames (cu INTEGER REFERENCES cus (offset),
                name, is_primary INTEGER);
CREATE TABLE IF NOT EXISTS symbols (cu INTEGER REFERENCES cus (offset),
                name NOT NULL, psym_domain INTEGER, psym_class INTEGER,
                is_public INTEGER);
CREATE INDEX IF NOT EXISTS byname ON symbols (name);
CREATE TABLE IF NOT EXISTS addresses (cu INTEGER REFERENCES cus (offset),
                low INTEGER, high INTEGER);

The 'byname' index really slows down populating the database.  It took
more than a minute to write out the database for gdb.  However, without
the index, name lookups are extremely slow (as in, I can count to 2
seconds when running a select command in sqlite3).
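
One common way to soften the cost of an index during a bulk load is to
insert every row inside a single transaction and create the index only
after the load has finished.  The sketch below is an illustration, not
what the dumper above does: it uses Python's sqlite3 module, a cut-down
two-column version of the symbols table, and hypothetical helper names
(populate, lookup_plan).  Whether this helps for a gdb-sized symbol set
would have to be measured.

```python
import sqlite3

def populate(conn, symbols):
    """Bulk-load (cu, name) rows, then build the name index afterwards.

    Assumption: a reduced version of the 'symbols' table; the real schema
    has more columns.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS symbols (cu INTEGER, name NOT NULL)")
    # One transaction for the whole load, instead of one per INSERT.
    with conn:
        conn.executemany(
            "INSERT INTO symbols (cu, name) VALUES (?, ?)", symbols)
    # Build the index once, after all rows are in place.
    conn.execute("CREATE INDEX IF NOT EXISTS byname ON symbols (name)")
    conn.commit()

def lookup_plan(conn, name):
    """Return SQLite's query plan text for a lookup by name."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT cu FROM symbols WHERE name = ?",
        (name,)).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " ".join(str(row[-1]) for row in rows)
```

Calling lookup_plan after populate is a quick way to confirm that a name
lookup searches the 'byname' index rather than scanning the whole table.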

Maybe I'm misusing SQLite somehow; I'll try to look at this a little
more.  I'm not an SQL wizard, so if I've done something weird, please
let me know, as I'm sure there was no good reason for it.

The other issue is that the resulting database is very big.  For
example, the database for gdb is 72M, compared with 119M for the gdb
executable itself.

I didn't write the reader side of this yet, but that won't be too bad.

I guess one idea would be to write the database in a background thread.
Or just not write it at all by default.
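
As a rough illustration of the background-thread idea, here is a
minimal Python sketch.  The function name write_index_in_background and
the reduced table are hypothetical; in gdb this would be C code, and the
caller would have to pass a snapshot of the symbols that the main thread
no longer modifies.  The connection is opened inside the worker because
Python's sqlite3 connections are, by default, usable only from the
thread that created them.

```python
import sqlite3
import threading

def write_index_in_background(path, symbols):
    """Start a thread that writes (cu, name) rows to the database at path.

    'symbols' is assumed to be a snapshot (e.g. a list) that the caller
    does not mutate while the thread is running.  Returns the thread so
    the caller can join() it if needed.
    """
    def worker():
        # Open the connection in this thread; it is never shared.
        conn = sqlite3.connect(path)
        try:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS symbols "
                "(cu INTEGER, name NOT NULL)")
            with conn:  # single transaction for the whole write
                conn.executemany(
                    "INSERT INTO symbols (cu, name) VALUES (?, ?)",
                    symbols)
        finally:
            conn.close()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The caller carries on immediately and only needs to join() the thread if
it wants to be sure the file is complete, for example before exiting.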

Daniel> Mappable data structures are tricky; one thing I'd definitely
Daniel> insist on is host neutrality.  IMO that is not optional.

Yeah, that does make it trickier.

I'm starting to get a bit discouraged by this project.  I think at this
point we've got strikes against all the ideas:

* .debug_pub*.  These require DWARF extensions and GCC bug fixes, and
  don't work nicely with comdat or the other (planned) post-processors.

* .debug_gnu_index.  Pretty much the same problems except that we're
  also inventing it ourselves.

* Mappable data structure.  A pain to make host-independent, and I
  suspect that would kill performance.

* SQLite.  Too big and too slow to create.

I suppose we could write out the equivalent of .debug_gnu_index, only
not as an ELF section, and not as a mappable data structure.  We already
know that will perform adequately.  This won't meet all of my goals but
it would definitely help some use cases.
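
To make the "plain file, read rather than mapped" idea concrete, here is
a small Python sketch of a host-neutral dump.  The layout is invented
purely for illustration (the magic bytes, field widths, and the names
dump_index and load_index are all assumptions) and is not the
.debug_gnu_index format.  The only point is that every integer is
written with an explicit width and byte order, so the file reads back
the same on any host.

```python
import struct

MAGIC = b"PIDX"  # hypothetical file signature, for illustration only

def dump_index(symbols):
    """Serialize (name, cu_offset) pairs into host-neutral bytes.

    All integers are little-endian with fixed widths ('<' disables
    native alignment and padding).
    """
    entries = sorted(symbols)
    out = [MAGIC, struct.pack("<II", 1, len(entries))]  # version, count
    for name, cu in entries:
        raw = name.encode("utf-8")
        out.append(struct.pack("<QI", cu, len(raw)))  # CU offset, name length
        out.append(raw)
    return b"".join(out)

def load_index(data):
    """Parse bytes from dump_index; return (version, [(name, cu), ...])."""
    if data[:4] != MAGIC:
        raise ValueError("not an index file")
    version, count = struct.unpack_from("<II", data, 4)
    pos = 12
    entries = []
    for _ in range(count):
        cu, length = struct.unpack_from("<QI", data, pos)
        pos += 12
        entries.append((data[pos:pos + length].decode("utf-8"), cu))
        pos += length
    return version, entries
```

Because the reader parses the bytes explicitly instead of casting them
to in-memory structures, the same file works regardless of the host's
endianness or word size, at the cost of a parsing pass at load time.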

Maybe I can add code to do a psymtab-like scan of .debug_info in a
background thread.  That might make "gdb gdb" feel faster.

I think we ought to change GCC to drop the .debug_pub* sections (and
maybe .debug_aranges), at least on Linux.  AFAIK they aren't used by
anything, and indeed are barely usable due to historic bugs -- so they
are just wasting time and space.

Let me know what you think,
Tom