This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



Re: Dynamic linking optimizations


On Tue, 2009-03-03 at 19:37 +0100, Jakub Jelinek wrote:
> On Tue, Mar 03, 2009 at 10:22:39AM -0800, Ian Lance Taylor wrote:
> > John Moser <nigelenki@comcast.net> writes:
> > 
> > >> Character comparisons are fast.  It doesn't help to minimize them if it
> > >> makes each operation cost more on average.
> > >
> > > Well, the "operations" would  be following a branch, which involves
> > > jumping by a computed offset and reading in a byte (to use as an index).
> > 
> > Yes.  On a modern processor, a character comparison is much faster than
> > a mispredicted branch (I would guess that the memory cache will hit
> > about as often for both algorithms).  The branches in your algorithm
> > would be difficult for the processor to predict correctly, so I think
> > one must assume that half of them will be mispredicted.
> 
> Also, for symbol lookup speed the most important thing is how many
> cache lines you need to read, and, as in typical programs these days there are
> dozens to hundreds of shared libraries, not finding a symbol you're looking
> for is more probable than that you find it.  So you really want to optimize
> for as few cache lines touched as possible for the case a symbol isn't found
> in a shared library.  DT_GNU_HASH section layout is quite good in this.
> 

Yes, that's true.  My layout eliminates repetitive data in an attempt to
address this, but it's not always going to work out that way.  Let's
say we have

some_funky_symbol_a
some_funky_symbol_b

would each be an O(1) lookup, jumped to through a hash table, if each
is alone in its own bucket; in a trie you'd have

some_funky_symbol_{a,b}

with a few bytes of control data at each node:

{h:1B}some_funky_symbol_\0
{'a':1B,off:4B}{'b':1B,off:4B}
{h:1B}\0{\0,off}
{h:1B}\0{\0,off}

Of course, you'd have to pad the control data to 8 bytes to get it to
play nicely with C struct logic (unless you want to do pointer
arithmetic instead, which is probably just as fast; I think that's what
the generated assembly amounts to anyway), but anyway.
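To make that concrete, here's a rough C sketch of what one node of that
layout might look like, plus one step of a lookup.  The struct names,
field widths, and the "caller already matched the prefix" assumption are
my own guesses for illustration, not an actual proposal:

    #include <stdint.h>

    /* Hypothetical on-disk layout for one trie node, as sketched above:
       a one-byte edge count, the shared prefix (NUL-terminated), then an
       edge table of (character, 4-byte offset) pairs. */
    struct trie_edge {
        uint8_t  ch;   /* next character on this edge; '\0' marks a leaf edge */
        uint32_t off;  /* offset of the child node, or the symbol entry at a leaf */
    } __attribute__((packed));

    struct trie_node {
        uint8_t nedges;  /* the {h:1B} byte: number of outgoing edges */
        /* followed in memory by the prefix string, its NUL, and then
           nedges trie_edge entries; a real layout would pad here so the
           edge table lands on an aligned boundary, per the note above. */
    };

    /* One step of a lookup: find the edge labelled c and return the child
       offset, or UINT32_MAX if there is no such edge. */
    static uint32_t trie_step(const uint8_t *base, uint32_t node_off, uint8_t c)
    {
        const struct trie_node *n = (const struct trie_node *)(base + node_off);
        const uint8_t *p = (const uint8_t *)(n + 1);

        while (*p != '\0')   /* the caller is assumed to have already matched */
            p++;             /* the node's prefix, so just skip past it here */
        p++;                 /* and past the terminating NUL */

        const struct trie_edge *e = (const struct trie_edge *)p;
        for (uint8_t i = 0; i < n->nedges; i++)
            if (e[i].ch == c)
                return e[i].off;
        return UINT32_MAX;
    }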

The minimum?  In this example, 22 bytes of control data plus 21 bytes of
character string: 43 bytes instead of 40.  Add a symbol
"some_funky_symbol_c" and you pay 13 bytes of overhead to save 18 bytes
of string, so a small win.

So in practice, this could produce more data.  Or less data, if there
are A) long prefixes, or B) lots of symbols with overlapping prefixes.

If the symbols are evaluated in alphabetical order, the access pattern
should be somewhat random (read: ugly) but confined to fairly tight
areas of memory.  That is again an attempt to minimize cache misses, by
keeping the working set for any small run of sequential lookups close
together and letting the CPU's prefetching algorithms do their job.

At least in theory.

And yeah, DT_GNU_HASH is good at landing you in one bucket (or none)
containing one symbol and saying "hmm, this doesn't match" after
something like 4 character comparisons or some other small number
(usually) ("libF"?  Wait, I'm looking for libBar_Execute(); this is not
it).
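For comparison, here's a condensed sketch of how a DT_GNU_HASH lookup
rejects a wrong symbol cheaply.  The hash function is the standard
.gnu.hash one; the struct and its field names are my own simplification
(real ld.so also tests the Bloom filter words before ever touching the
buckets, which throws out most misses with a couple of bit tests):

    #include <stdint.h>
    #include <string.h>

    /* The .gnu.hash hash function (DJB-style). */
    static uint32_t gnu_hash(const char *s)
    {
        uint32_t h = 5381;
        for (; *s != '\0'; s++)
            h = h * 33 + (unsigned char)*s;
        return h;
    }

    /* Simplified view of a parsed .gnu.hash section. */
    struct gnu_hash_table {
        uint32_t nbuckets;
        uint32_t symoffset;       /* index of the first hashed dynsym entry */
        const uint32_t *buckets;  /* nbuckets entries */
        const uint32_t *chains;   /* one entry per hashed symbol */
    };

    /* Returns the dynsym index of `name`, or 0 if it isn't here.  `symname`
       maps a dynsym index to its string, standing in for dynsym/dynstr. */
    static uint32_t lookup(const struct gnu_hash_table *ht,
                           const char *(*symname)(uint32_t),
                           const char *name)
    {
        uint32_t h = gnu_hash(name);
        uint32_t i = ht->buckets[h % ht->nbuckets];
        if (i == 0)
            return 0;                 /* empty bucket: done, no strcmp at all */

        for (;; i++) {
            uint32_t hv = ht->chains[i - ht->symoffset];
            /* Compare the stored hashes first; only on a hash match do we
               pay for a string comparison, so most wrong symbols cost one
               32-bit compare from an already-hot cache line. */
            if ((hv | 1) == (h | 1) && strcmp(symname(i), name) == 0)
                return i;
            if (hv & 1)               /* low bit marks the end of the chain */
                return 0;
        }
    }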

> 	Jakub

