GNU_HASH section format

Tue Oct 31 17:27:00 GMT 2006

On Tue, Oct 31, 2006 at 01:44:40PM +0200, Alex Kogan wrote:
> I see a couple of posts in the mailing list dealing with new format of
> symbol hash table.
> Where can I find the final specification of new format of GNU_HASH section,
> which is currently implemented in binutils?

The initial public specification was given in
http://sources.redhat.com/ml/binutils/2006-06/msg00418.html
but for the final specification I'm afraid you need to read binutils
and/or libc sources.

Or, I'll try to summarize:

The .gnu.hash section uses sh_entsize 4 for 32-bit ELF and sh_entsize 0
for 64-bit ELF (as it has then non-uniform entity size; 4 32-bit words,
variable count of ELFCLASS sized words and then variable count of 32-bit
words; it doesn't repeat the historic mistakes on Alpha/s390x with .hash).
The first 4 32-bit words are:
nbuckets	symndx		maskwords	shift2
- nbuckets is like in .hash section, number of hash buckets (i.e. number
  of 32-bit words in the third part of the .gnu.hash section)
- symndx is the number of .dynsym symbols which cannot be looked up using
  .gnu.hash (i.e. .dynsym[symndx] .. .dynsym[dynsymcount-1] symbols
  are sorted by dl_new_hash (&.dynstr[s.st_name]) % nbuckets values)
- maskwords is the number of ELFCLASS sized words in the second part of
  .gnu.hash section
- shift2 is a shift count used in the bloom filter

When .gnu.hash section (with the associated DT_GNU_HASH dynamic tag pointing
to its start) is present, symbols in .dynsym section have to be sorted
with symbols which might be looked up using the .gnu.hash section
in the second part of the .dynsym section.  In this second part
of .dynsym, for any M and N where symndx <= M < dynsymcount and
symndx <= N < dynsymcount,
M <= N iff ((dl_new_hash (&.dynstr[.dynsym[M].st_name]) % nbuckets)
             <= (dl_new_hash (&.dynstr[.dynsym[N].st_name]) % nbuckets))
(i.e. symbols are sorted by increasing hash % nbuckets value).  Symbols
in the first part of the .dynsym section (i.e. those that can't be looked
up using DT_GNU_HASH, such as special symbol 0, STT_SECTION, STT_FILE,
STB_LOCAL etc. symbols) don't need to be sorted in any particular order
for .gnu.hash, though gABI or psABI rules might mandate some order.

The second part of the .gnu.hash section is the bloom filter, consisting of
maskwords 32-bit words for ELFCLASS32 and maskwords 64-bit words for
ELFCLASS64.  Let C below be 32 for ELFCLASS32 and 64 for ELFCLASS64.
Bloom filter word N has bit B cleared if there is no M, M >= symndx which
satisfies
((dl_new_hash (&.dynstr[.dynsym[M].st_name]) / C) % maskwords) == N
&& ((dl_new_hash (&.dynstr[.dynsym[M].st_name]) % C) == B
    || ((dl_new_hash (&.dynstr[.dynsym[M].st_name]) >> shift2) % C) == B)
If the bloom filter is not needed, maskwords should be 1 and the only
word should have all bits set.

The third part of the .gnu.hash section contains nbuckets 32-bit
words.  Word N in this part of the .gnu.hash section contains lowest M
for which 
(dl_new_hash (&.dynstr[.dynsym[M].st_name]) % nbuckets) == N,
or, if there is no such M, contains 0.

The fourth part of the .gnu.hash section contains dynsymcount - symndx
32-bit words.  Word N in this part of the .gnu.hash section contains
(dl_new_hash (&.dynstr[.dynsym[N].st_name]) & ~1)
| (N == dynsymcount - 1
   || (dl_new_hash (&.dynstr[.dynsym[N].st_name]) % nbuckets)
      != (dl_new_hash (&.dynstr[.dynsym[N + 1].st_name]) % nbuckets))

The hash function used is

static uint_fast32_t
dl_new_hash (const char *s)
{
  uint_fast32_t h = 5381;
  for (unsigned char c = *s; c != '\0'; c = *++s)
    h = h * 33 + c;
  return h & 0xffffffff;
}

(Dan Bernstein's string hash function posted eons ago on comp.lang.c.)

	Jakub