This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Scheme style auto-resizing hashtable


hello,

I've been working on a Guile extension; auto-resizing hashtables.  The
extension is basically a bunch of functions which operate on a pair

(cons header
      vector)

where header is an alist

(list (cons 'number-buckets 4)   ;; 4 is default initial size
      (cons 'max-bucket-size 5)  ;; 5 seems to work best
      (cons 'largest-bucket 0)
      (cons 'number-entries 0))

and vector is, well, a vector of "buckets". Each bucket looks like

(N (hash1 key1 . value1) (hash2 key2 . value2) ... (hashN keyN . valueN))

i.e.

(cons N (list (cons 'hash1 (cons 'key1 'value1))
              (cons 'hash2 (cons 'key2 'value2))
              ...
              (cons 'hashN (cons 'keyN 'valueN))))

so the default bucket is simply

(cons 0 (list))

i.e.

(0) ;; cool

Anyway, I re-timed the infamous python dictionaries (as ported to Guile),
Guile's hash tables, and these things I made, "hashtabs".  The result is
that the hash tables and the hashtabs are equally fast (NOTE: this time
around I disabled debugging, and chose a better initial size for the hash
tables: 529 for 3000 entries).  The python dictionary type is only about
5% faster than the others. 

The advantage (well, some might call it a disadvantage) that the hash
tables and hashtabs have over the python dictionaries is that the
dictionaries are "opaque".  You can't see, or screw up, a dictionary
because its representation is completely internal.  Guile's hash tables,
on the other hand, are just vectors of alists.  And the resizable hashtabs
are pairs of alists and vectors.  In fact, except for the hashing
functions, both the hash tables and the hashtabs could easily be rewritten
in Scheme, with only a modest speed penalty (umm, 100% slower?).

fun with hashtabs:

guile> (make-hashtab 5)
  (((number-buckets . 4) (max-bucket-size . 5) (largest-bucket . 0)
    (number-entries . 0))
   #((0) (0) (0) (0)))
guile> (define foo (make-hashtab 5))
guile> (list? foo)
  #f
guile> (pair? foo)
  #t
guile> (do ((i 1 (+ i 1)))((> i 10))(hashtab-set! foo (number->string i) i))
guile> foo
  (((number-buckets . 4) (max-bucket-size . 5) (largest-bucket . 3)
    (number-entries . 10)) . 
   #((2 (-57 "8" . 8) (-53 "4" . 4))
     (3 (-58 "9" . 9) (-54 "5" . 5) (-50 "1" . 1))
     (2 (-55 "6" . 6) (-51 "2" . 2))
     (3 (1953351192 "10" . 10) (-56 "7" . 7) (-52 "3" . 3))))
guile> (hashtab-ref foo "7")
  7
guile> (hashtab-ref foo "123457")
  #f
guile> (do ((i 11 (+ i 1)))((> i 100))(hashtab-set! foo (number->string i)i))
guile> (car foo)
  ((number-buckets . 32) (max-bucket-size . 5) (largest-bucket . 5)
   (number-entries . 100))
guile> (define header (car foo))
guile> (assq-ref header 'number-buckets)
  32
guile> (/ (assq-ref header 'number-entries)
          (assq-ref header 'number-buckets))
  3.125
...

	Jay
	jglascoe@jay.giss.nasa.gov