This is the mail archive of the cgen@sources.redhat.com mailing list for the CGEN project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

hash function for assembly hash table


Doing some debugging yesterday, I noticed that the cgen-generated
assembler uses a very simplistic hash function when building the hash
table used in assembly.  It uses:

      mnemonic[0] % 127

As you might expect, the mnemonics for many assembly languages are
clustered -- for example, there might be a dozen "load" (`l')
instructions and a dozen corresponding "store" (`s') instructions.
Some architectures feature multimedia instructions in which *every*
instruction is prefixed by `m'.

To see how bad this is, I did some plotting to show how the
distribution looks in the hash table for a chosen port:

    http://www.air.net.au/~bje/char0-mod127.ps

By changing the hash function to something more slightly
sophisticated:

     sum (all mnemonic chars) % 127

This yields a *much* better distribution with a lower average number
of entries per hash value:

    http://www.air.net.au/~bje/ascii-sum-mod127.ps

Any objections if we pursue a hash function like this?

Ben


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]