This is the mail archive of the
cgen@sources.redhat.com
mailing list for the CGEN project.
hash function for assembly hash table
- To: cgen at sources dot redhat dot com
- Subject: hash function for assembly hash table
- From: Ben Elliston <bje at redhat dot com>
- Date: Tue, 27 Mar 2001 09:19:56 +1000 (EST)
Doing some debugging yesterday, I noticed that the cgen-generated
assembler uses a very simplistic hash function when building the hash
table used in assembly. It uses:
mnemonic[0] % 127
As you might expect, the mnemonics for many assembly languages are
clustered -- for example, there might be a dozen "load" (`l')
instructions and a dozen corresponding "store" (`s') instructions.
Some architectures feature multimedia instructions in which *every*
instruction is prefixed by `m'.
To see how bad this is, I did some plotting to show how the
distribution looks in the hash table for a chosen port:
http://www.air.net.au/~bje/char0-mod127.ps
By changing the hash function to something more slightly
sophisticated:
sum (all mnemonic chars) % 127
This yields a *much* better distribution with a lower average number
of entries per hash value:
http://www.air.net.au/~bje/ascii-sum-mod127.ps
Any objections if we pursue a hash function like this?
Ben