This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: avl-trees vs. hashes


Jay Glascoe <jglascoe@jay.giss.nasa.gov> writes:

 > 1.  not storing the key's hash in the entry would be a big win for
 > deletion in certain, relatively common situations.  For example, if the
 > keys and values you use are small (relative to the size of the [big]
 > integer hash object), like small strings, integers, booleans, then
 > 
 > (hash (key . value))
 > 
 > is much larger than
 > 
 > (key . value)
 > 
 > and guile will have a much easier time gc'ing the smaller of the two.

What do you mean, "for deletion"?  Because of the GC, more memory =
more CPU = slower in general (because more memory gets scanned), so
including the hash value makes the program slower in general.

OTOH, all hash operations requiring looking for items might be faster
because you can test for hash equality before testing key equality &
the hash equality test might be faster.

 > 2.  not storing the hash would be a big loss for resize since it's
 > certainly faster (err, at least I hope it is) to do this
 > 
 > index = (scm_num2long(SCM_CAR(entry), "foo", "bar")) % vec_len;
 > 
 > rather than
 > 
 > hash = my_fave_hasher(SCM_CAR(entry));
 > index = hash % vec_len;
 > 
 > but then again, the speed of the hasher depends on the size of the key
 > you're hashing.  So if you've small keys...

I can't imagine the hasher being faster than just looking up the
number, so it should always be faster to look them up than to
recompute all the hashes.  However, it might be a minor point for
small keys because of the other work needed when rehashing.

But, using the above trick it can make all ops that require locating
an object faster (as mentioned above).

-- 
Harvey J. Stein
BFM Financial Research
hjstein@bfr.co.il