This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: Scheme style auto-resizing hashtable



jglascoe@jay.giss.nasa.gov writes:
> I've been thinking about adding a couple new associations to my hashtab
> header:
> 
> (number-nonempty-buckets . 0)
> (sum-buckets-size . 0)
> 
> Then I could hold off on resizing until the mean bucket size exceeds
> 'max-bucket-size (or rather, 'max-mean-bucket-size).
> 
> It'll add a little extra book-keeping overhead, but may pay off for large
> hashtables or for cases where the keys are hashing too closely together.
> Anyone have any thoughts?
> 

This isn't directly on-topic, but I was going to suggest earlier that
you change the header alist to a vector despite your readability
concerns; it should be trivial to write a hashtab-header-alist or
hashtab-properties in Scheme to translate it, and the more I think
about it, the more convinced I am it would be a noticeable performance
win.

Regarding this proposal, basically nothing can save you from a hash
function that hashes a lot of things to the exact same value; no
number of resizes will help. The only hope for salvation here is for
hashing to significantly change when the size of the range changes, so
that resizing will actually solve such collisions.

Another note: I use both eq? and equal? hashing in my code a fair bit.
Also it would be nice if your code were able to hash by any of eq?,
eqv? or equal? (the hashers for the first two should not be hard).  At
that point, I would suggest that a suitably cleaned up version of your
code should replace Guile hash tables entirely, even 3% or 5%
performance overhead is a small price to pay for auto-resizing.

Another reason that maybe not considering the ultimate hash range in
the hash function may be a lose: the obvious way to implement `hashq'
with no mod is to cast the SCM value as an int, this will never
collide in a word-size space but almost certainly will in many common
hashtable sizes.

 - Maciej Stachowiak