This is the mail archive of the archer@sourceware.org mailing list for the Archer project.



Re: Initial psymtab replacement results


>>>>> "Daniel" == Daniel Jacobowitz <drow@false.org> writes:

Daniel> Did you create the index, then populate the table?  Is it quicker to
Daniel> populate the table, and then create the index?

I tried both; both are slow.

Originally I didn't have an index, and when I was playing with the
SQLite shell I noticed that searches were slow.  So, I made an index --
which was very slow to create in the shell.

Then I thought that maybe making the index before populating the table
would be faster.  I made that change to gdb, but it was still quite
slow.

Another idea I have is to make a new column holding a hash code, and not
use an index; or maybe use that for the index (indexing on an integer
column may be faster).
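
A minimal sketch of the hash-column idea, using Python's sqlite3 module
rather than gdb's C code.  Everything except the "symbols" table and its
"name" column is an assumption made for illustration: the "name_hash"
and "cu_offset" columns, the index name, and the choice of CRC-32 as the
hash.  It shows only the variant where the integer column is indexed;
the name is still compared in the query because different names can
share a hash.

```python
import sqlite3
import zlib


def name_hash(name):
    # CRC-32 is an arbitrary stand-in for whatever hash would really be used.
    return zlib.crc32(name.encode("utf-8"))


def make_hashed_db(names):
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: an integer hash column stored next to the name.
    conn.execute(
        "CREATE TABLE symbols (name_hash INTEGER, name TEXT, cu_offset INTEGER)"
    )
    with conn:  # one transaction for all the inserts
        conn.executemany(
            "INSERT INTO symbols (name_hash, name, cu_offset) VALUES (?, ?, ?)",
            ((name_hash(n), n, i) for i, n in enumerate(names)),
        )
    # Index the integer column instead of the text column.
    conn.execute("CREATE INDEX symbols_hash ON symbols (name_hash)")
    return conn


def lookup_by_hash(conn, name):
    # Narrow by hash first, then confirm the name to rule out collisions.
    rows = conn.execute(
        "SELECT cu_offset FROM symbols WHERE name_hash = ? AND name = ?",
        (name_hash(name), name),
    )
    return [row[0] for row in rows]


hashed_db = make_hashed_db(["main", "gdb_init", "lookup_symbol"])
```

Whether this is actually faster than indexing the text column would need
to be measured; the sketch only shows the shape of the schema and query.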

I was experimenting just now, and removing the "CREATE INDEX" and
changing the schema to mark symbols.name as "PRIMARY KEY" made database
creation much faster -- for gdb, down from 60 seconds to 19.  I still
think that is too slow though.
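
For reference, the two schema variants can be sketched like this, again
in Python's sqlite3 module rather than gdb's C code.  The "cu_offset"
column, the index name, and the synthetic symbol names are assumptions;
only "symbols.name" comes from the experiment above.  The sketch uses
unique names, since a PRIMARY KEY column rejects duplicates, and it
makes no claim about reproducing the 60-versus-19-second numbers.

```python
import sqlite3
import time


def build(schema_sql, index_sql, names):
    """Create and populate a symbols table; return (connection, seconds)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema_sql)
    start = time.perf_counter()
    with conn:  # one transaction for all the inserts
        conn.executemany(
            "INSERT INTO symbols (name, cu_offset) VALUES (?, ?)",
            ((n, i) for i, n in enumerate(names)),
        )
    if index_sql is not None:
        conn.execute(index_sql)
    return conn, time.perf_counter() - start


def lookup(conn, name):
    row = conn.execute(
        "SELECT cu_offset FROM symbols WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else row[0]


names = [f"symbol_{i:06d}" for i in range(20000)]

# Variant 1: plain table, separate CREATE INDEX after populating.
conn_index, t_index = build(
    "CREATE TABLE symbols (name TEXT, cu_offset INTEGER)",
    "CREATE INDEX symbols_name ON symbols (name)",
    names,
)

# Variant 2: no explicit index; the name column is the primary key.
conn_pk, t_pk = build(
    "CREATE TABLE symbols (name TEXT PRIMARY KEY, cu_offset INTEGER)",
    None,
    names,
)

print(f"CREATE INDEX: {t_index:.3f}s  PRIMARY KEY: {t_pk:.3f}s")
```

Note that SQLite backs a non-integer PRIMARY KEY with an automatic
unique index, so the second variant is still indexed; it just builds the
index as rows are inserted.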

Daniel> Frank made a good point about putting host characteristics in the
Daniel> cache key.  By careful choice of the types stored, we should be able
Daniel> to create a mapped data structure that is in practice dependent only
Daniel> on endianness and maybe pointer size.  WDYT?

Yeah, I may give that a try.
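
A rough sketch of what such a cache key might look like, assuming the
mapped data really does depend only on byte order and pointer size.  The
function name, the key format, and the "file_id" argument (some
identifier for the debug info being cached) are all hypothetical.

```python
import struct
import sys


def cache_key(file_id, byteorder=sys.byteorder,
              pointer_size=struct.calcsize("P")):
    """Build a cache key from a file identifier plus host characteristics.

    byteorder is "little" or "big"; pointer_size is in bytes.  Both
    default to the values for the host running this code.
    """
    return f"{file_id}.{byteorder}.ptr{pointer_size}"


# Two hosts that differ in byte order or pointer size get different
# keys, so one never maps a cache file written by the other.
key_here = cache_key("libfoo-debuginfo")
key_big_endian_32 = cache_key("libfoo-debuginfo", "big", 4)
```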

Daniel> I know you've done a lot of work to kill psymtabs.  Do we populate
Daniel> psymtabs from the index, or are they pretty much optional now?  In
Daniel> other words, can we reclaim and reuse the memory formerly spent on
Daniel> psymtabs?

What I did was introduce a new struct of function pointers, alongside
struct sym_fns.  This provides an abstraction that replaces direct uses
of partial symbols.  The API "design" is completely ad hoc, based on
what previously existed.  So, it is rather weird and large; e.g., it has
a special function just for Ada, because ada-lang.c directly examines
psymtabs.

Then I moved all the uses of partial symbols into a new file, psymtab.c,
and made a new rule: only psymtab.c and the debuginfo readers are
allowed to directly manipulate these data structures.

Finally, I changed dwarf2read.c to have a separate implementation of
these functions, and to use its own indexing data structures.
dwarf2read now decides per-objfile whether to use partial symbols or the
new code.
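
The overall shape of this arrangement, as a loose Python analogue of a
C struct of function pointers.  None of the names below are taken from
the gdb sources; the two-entry table, the dict-based "psymbols" and
"index" fields, and the selection rule are simplifications made up for
illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class QuickFns:
    """Stand-in for the struct of function pointers."""
    has_symbols: Callable[["Objfile"], bool]
    lookup_symbol: Callable[["Objfile", str], Optional[str]]


@dataclass
class Objfile:
    name: str
    psymbols: Dict[str, str] = field(default_factory=dict)  # name -> CU
    index: Optional[Dict[str, str]] = None                  # name -> CU
    qf: Optional[QuickFns] = None


# Implementation 1: backed by partial symbols (the psymtab.c role).
PSYM_FNS = QuickFns(
    has_symbols=lambda objfile: bool(objfile.psymbols),
    lookup_symbol=lambda objfile, name: objfile.psymbols.get(name),
)

# Implementation 2: backed by the reader's own index structures.
INDEX_FNS = QuickFns(
    has_symbols=lambda objfile: bool(objfile.index),
    lookup_symbol=lambda objfile, name: objfile.index.get(name),
)


def read_objfile(objfile):
    # The per-objfile decision: use the index if one is available,
    # otherwise fall back to partial symbols.
    objfile.qf = INDEX_FNS if objfile.index is not None else PSYM_FNS
    return objfile


def find_cu(objfile, name):
    # Callers go through the table and never touch psymbols directly.
    return objfile.qf.lookup_symbol(objfile, name)


with_index = read_objfile(Objfile("a.out", index={"main": "main.c"}))
without_index = read_objfile(Objfile("b.out", psymbols={"main": "b.c"}))
```

The point of the indirection is that callers such as find_cu behave the
same way regardless of which implementation was selected for a given
objfile.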

I did all this because I did not think it was possible to really create
psymbols from the DWARF indices.

This approach saves a bit of memory when using the index.  I don't have
numbers handy, but my recollection is that the savings aren't very
dramatic.

I have considered modifying dwarf2read to create "new-style" data
structures when the indices are not available.  I haven't implemented
this yet, though, because it is more work and the payoff doesn't seem to
be huge.

The new code could free some memory whenever it reads full symbols for a
CU.  I haven't implemented that yet.

Finally, with "-readnow", dwarf2read no longer reads partial symbols or
the indices; it skips directly to just reading everything.  I only did
this because it was easy to implement; I actually consider -readnow to
be fairly useless.

Another idea I have is to change the threaded-dwarf branch to read
psymtabs in the background thread.  This isn't too terribly hard, now
that psymtabs are fully segregated.

Tom

