This is the mail archive of the guile@cygnus.com mailing list for the guile project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
jglascoe@jay.giss.nasa.gov writes: > I've been thinking about adding a couple new associations to my hashtab > header: > > (number-nonempty-buckets . 0) > (sum-buckets-size . 0) > > Then I could hold off on resizing until the mean bucket size exceeds > 'max-bucket-size (or rather, 'max-mean-bucket-size). > > It'll add a little extra book-keeping overhead, but may pay off for large > hashtables or for cases where the keys are hashing too closely together. > Anyone have any thoughts? > This isn't directly on-topic, but I was going to suggest earlier that you change the header alist to a vector despite your readability concerns; it should be trivial to write a hashtab-header-alist or hashtab-properties in Scheme to translate it, and the more I think about it, the more convinced I am it would be a noticeable performance win. Regarding this proposal, basically nothing can save you from a hash function that hashes a lot of things to the exact same value; no number of resizes will help. The only hope for salvation here is for hashing to significantly change when the size of the range changes, so that resizing will actually solve such collisions. Another note: I use both eq? and equal? hashing in my code a fair bit. Also it would be nice if your code were able to hash by any of eq?, eqv? or equal? (the hashers for the first two should not be hard). At that point, I would suggest that a suitably cleaned up version of your code should replace Guile hash tables entirely, even 3% or 5% performance overhead is a small price to pay for auto-resizing. Another reason that maybe not considering the ultimate hash range in the hash function may be a lose: the obvious way to implement `hashq' with no mod is to cast the SCM value as an int, this will never collide in a word-size space but almost certainly will in many common hashtable sizes. - Maciej Stachowiak