This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: avl-trees vs. hashes


On 28 Oct 1998, Harvey J. Stein wrote:

> Jay Glascoe <jglascoe@jay.giss.nasa.gov> writes:
> 
>  > 1.  not storing the key's hash in the entry would be a big win for
>  > deletion in certain, relatively common situations.  For example, if the
>  > keys and values you use are small (relative to the size of the [big]
>  > integer hash object), like small strings, integers, booleans, then
>  > 
>  > (hash (key . value))
>  > 
>  > is much larger than
>  > 
>  > (key . value)
>  > 
>  > and guile will have a much easier time gc'ing the smaller of the two.
> 
> What do you mean, "for deletion"?  Because of the GC, more memory =
> more CPU = slower in general (because more memory gets scanned), so
> including the hash value makes the program slower in general.
> 
> OTOH, all hash operations requiring looking for items might be faster
> because you can test for hash equality before testing key equality &
> the hash equality test might be faster.
> 
>  > 2.  not storing the hash would be a big loss for resize since it's
>  > certainly faster (err, at least I hope it is) to do this
>  > 
>  > index = (scm_num2long(SCM_CAR(entry), "foo", "bar")) % vec_len;
>  > 
>  > rather than
>  > 
>  > hash = my_fave_hasher(SCM_CAR(entry));
>  > index = hash % vec_len;
>  > 
>  > but then again, the speed of the hasher depends on the size of the key
>  > you're hashing.  So if you've small keys...
> 
> I can't imagine the hasher being faster than just looking up the
> number, so it should always be faster to look them up than to
> recompute all the hashes.  However, it might be a minor point for
> small keys because of the other work needed when rehashing.
> 
> But, using the above trick it can make all ops that require locating
> an object faster (as mentioned above).

I think the best solution is to _not_ store the hash by default. Reason: The
user could do so if he wishes. In a generic hashtable implementation, which
allows to provide a hash-function and an equality predicate, you could do the
following:

* Keys are pairs (hash . real-key)
* The hash-function which is provided to make-hashtable is simply 'car.
* The equality function first compares the cars, then performs the real
  equality test on the real-keys:
  (define (my-hash-equal? k1 k2)
    (and (eq? (car k1) (car k2))
         (real-equality-test (cdr k1) (cdr k2))))
* Instead of (hashtable-set! table key value) you do 
  (hashtable-set! table (cons (real-hasher key) key) value)
* make-hashtable is called as follows (or similarly):
  (make-hashtable car my-hash-equal?)

By _not_ adding the hash you have the possibility to get the best of all
worlds.

Jay, you have done a great job providing the auto-resizing hashtables. 
What do you think of my suggestion?

Best regards, 
Dirk Herrmann