This is the mail archive of the mailing list for the binutils project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: DT_GNU_HASH: ~ 50% dynamic linking improvement

Michael Meeks wrote:
> and here was I thinking I was going to have to write a length
> paper to try to convince Ulrich :-) You made my month (really).

What are you talking about?  I never said that I'm opposed to changing
the hash table.  I'm not going to agree to direct bindings, that's all,
and you constantly insisted on bringing that nonsense up.

> 	Anyhow - I've been thinking about various other optimisations, many of
> which we can easily & neatly fit into this nice framework, I've numbered
> them to help chewing them over:
> 	1. Extensibility
> 		+ I want to see a multiply per lookup.
> 		+ ie. since the L2 cache misses are what hammer us -
> 		  if [sic] we want to extend the .chain -Bdirect
> 		  support in future - being able to put that in
> 		  the same record is really important.
> 		+ a 'record size' field gives us that with ~no
> 		  bloat as of now.

You absolutely cannot add direct bindings as an optional feature.
Period.  A binary using direct bindings can and often will have
different effective bindings then a binary which uses normal lookup.
It's complete impossible.  A program can either be compiled and even
_designed_ for direct binding or not.

Adding support in any form for such an incompatible feature to the hash
table which is purely an optional extension is unacceptable.

> 		+ Since we can sort (almost all) of .dynsym -
> 		  there is no problem ensuring that ~all symbols
> 		  that we don't want in .hash occur at the end
> 		  of the .dynsym table [ cf. the next UNDEF point ].

This can only possibly work if you have spare bits...

> 		+ Furthermore - by inspection it can be seen that
> 		  1/2^32 ~=== 1/2^30 [1]
> 			+ using this theorem, we can easily steal
> 			  1/2 bits from the top of the stored hash
> 			  and use them as chain terminators.

... and this is nonsense.  You cannot compute the probabilities by
looking at the complete set of symbols.  You have to determine which
kind of symbols have the same hash value after then change and then make
sure they probability of having those is the same as in the complete
set.  This (most) likely is not the case.

Instead what will happen is that the symbols which are used together in
a program have a higher likelihood to collide (common prefixes etc).  If
even the reduction from 32 to 31 bits produces on the full set a factor
o >2 more duplicates (Jakub tested it) this number is likely higher in
the real world.

On the other hand, what is gained?  You already said we have rather
short hash chains.  Most likely not longer than 5 or 6.  I can count the
number of DSOs with longer chains on my hands and there is not more than
one longer chain in each DSO.  Now do the math: even assuming even
distribution of the chain we have on average 6 chain elements per cache
line.  If the linker sorts them cache lines we can fit in 14.  That's
enough even for the worst chain length.

So, on the one hand you save one 4-byte word which in almost no case
will cause an additional cache line to be loaded.  And even if it would,
the cache lines are consecutive the the processors prefetching takes
case of it.  On the other hand you more than double the likelihood to
have to compare strings.  This is IMO reason enough to not change the

> 	3. What we hash:
> 		+ it is incredible to me that UNDEF symbols are

You didn't read what Jakub wrote.  We don't hash UNDEF records in the
new format.

â Ulrich Drepper â Red Hat, Inc. â 444 Castro St â Mountain View, CA â

Attachment: signature.asc
Description: OpenPGP digital signature

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]