This is the mail archive of the guile@cygnus.com mailing list for the guile project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
hello, I've been working on a Guile extension; auto-resizing hashtables. The extension is basically a bunch of functions which operate on a pair (cons header vector) where header is an alist (list (cons 'number-buckets 4) ;; 4 is default initial size (cons 'max-bucket-size 5) ;; 5 seems to work best (cons 'largest-bucket 0) (cons 'number-entries 0)) and vector is, well, a vector of "buckets". Each bucket looks like (N (hash1 key1 . value1) (hash2 key2 . value2) ... (hashN keyN . valueN)) i.e. (cons N (list (cons 'hash1 (cons 'key1 'value1)) (cons 'hash2 (cons 'key2 'value2)) ... (cons 'hashN (cons 'keyN 'valueN)))) so the default bucket is simply (cons 0 (list)) i.e. (0) ;; cool Anyway, I re-timed the infamous python dictionaries (as ported to Guile), Guile's hash tables, and these things I made, "hashtabs". The result is that the hash tables and the hashtabs are equally fast (NOTE: this time around I disabled debugging, and chose a better initial size for the hash tables: 529 for 3000 entries). The python dictionary type is only about 5% faster than the others. The advantage (well, some might call it a disadvantage) that the hash tables and hashtabs have over the python dictionaries is that the dictionaries are "opaque". You can't see, or screw up, a dictionary because its representation is completely internal. Guile's hash tables, on the other hand, are just vectors of alists. And the resizable hashtabs are pairs of alists and vectors. In fact, except for the hashing functions, both the hash tables and the hashtabs could easily be rewritten in Scheme, with only a modest speed penalty (umm, 100% slower?). fun with hashtabs: guile> (make-hashtab 5) (((number-buckets . 4) (max-bucket-size . 5) (largest-bucket . 0) (number-entries . 0)) #((0) (0) (0) (0))) guile> (define foo (make-hashtab 5)) guile> (list? foo) #f guile> (pair? foo) #t guile> (do ((i 1 (+ i 1)))((> i 10))(hashtab-set! foo (number->string i) i)) guile> foo (((number-buckets . 4) (max-bucket-size . 5) (largest-bucket . 3) (number-entries . 10)) . #((2 (-57 "8" . 8) (-53 "4" . 4)) (3 (-58 "9" . 9) (-54 "5" . 5) (-50 "1" . 1)) (2 (-55 "6" . 6) (-51 "2" . 2)) (3 (1953351192 "10" . 10) (-56 "7" . 7) (-52 "3" . 3)))) guile> (hashtab-ref foo "7") 7 guile> (hashtab-ref foo "123457") #f guile> (do ((i 11 (+ i 1)))((> i 100))(hashtab-set! foo (number->string i)i)) guile> (car foo) ((number-buckets . 32) (max-bucket-size . 5) (largest-bucket . 5) (number-entries . 100)) guile> (define header (car foo)) guile> (assq-ref header 'number-buckets) 32 guile> (/ (assq-ref header 'number-entries) (assq-ref header 'number-buckets)) 3.125 ... Jay jglascoe@jay.giss.nasa.gov