This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: hash table subtleties


> The magic ratio is chosen so that the expected mean nonempty bucket size
> after upsize is the same as it is after downsize.  (and then, e.g. if a
> user inserts just enough entries for the table to grow to 4096 buckets, he
> must delete half of them before it will shrink back down to 2048). 

This sounds like the right idea, it will prevent overly frequent resize.
I don't follow the reasoning behind your magic ratio but if it works then
it works.

> I look at the number of nonempty buckets rather than the total number of
> buckets since empty buckets have no effect on lookup, re-insertion, and
> deletion (assuming you're using keys that already exist in the table).

Hmmm, since you resize by powers of two, each single bucket in the small
table splits into exactly two buckets in the large table so a bucket that is
empty in the small table is guaranteed to stay empty after a grow.

That means that, when growing the table, you can:
* make a bitmask equal to the size of the table before the grow (i.e. old size)
* for each non-empty bucket (indexed 0, 1, 2...) in the old table:
*** grab the list of keys from this bucket
*** for each key in the list:
---- do a bitwise AND of the bitmask against the hash of the key
---- add the result of the previous step to the current bucket index and
     use this new index to locate a destination bucket in the new table to
     throw the key into

This skull-duggery only works for powers of two because the size of the
table is always internally represented by a number containing only a single
bit 1 and the rest 0 (e.g. 4, 8, 16, 32... )

When shrinking a table, it's even better because you just append
the upper and lower bucket lists together, don't even look at the
hashes <grin>. I'll guess that should speed your deletion up a bit.
This works for any shrink operation that halves the size of the table.

	- Tel

PS: I NEVER did see your source code posting,
this list seems to discard large messages.

If you do post it to a web/ftp server somewhere, can you include the test
scripts that you are using for timing?