This is the mail archive of the guile@cygnus.com mailing list for the guile project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
On 28 Oct 1998, Harvey J. Stein wrote: > Jay Glascoe <jglascoe@jay.giss.nasa.gov> writes: > > > 1. not storing the key's hash in the entry would be a big win for > > deletion in certain, relatively common situations. For example, if the > > What do you mean, "for deletion"? Because of the GC, more memory = > more CPU = slower in general (because more memory gets scanned), so > including the hash value makes the program slower in general. > All I know is that my hash tables outperform (speed-wise) Guile's hash tables at everything except deletion. If I test (set! mytab (make-proc)) (for-each (insert-entry mytab insert-proc) lines) (for-each (find-entry mytab find-proc) lines) (for-each (delete-entry mytab delete-proc) lines) auto-shrink enabled *time: 557* (gc-time-taken . 332) (cells-allocated . 35957) (cell-heap-size . 98304) (bytes-malloced . 93495) (gc-malloc-threshold . 150000) (cell-heap-segments (537251512 . 536989368) (537909288 . 537385000)) auto-shrink disabled *time: 564* (gc-time-taken . 323) (cells-allocated . 35945) (cell-heap-size . 98304) (bytes-malloced . 93497) (gc-malloc-threshold . 150000) (cell-heap-segments (537251512 . 536989368) (537963016 . 537438728)) Guile hash *time: 448* (gc-time-taken . 241) (cells-allocated . 36002) (cell-heap-size . 98304) (bytes-malloced . 126346) (gc-malloc-threshold . 238159) (cell-heap-segments (537251512 . 536989368) (537962712 . 537438424)) in fact, Guile's hashes are faster at deletion than they are at insertion! if I test (set! mytab (make-proc)) (for-each (insert-entry mytab insert-proc) lines) (for-each (find-entry mytab find-proc) lines) (for-each (insert-entry mytab insert-proc) lines) auto-shrink enabled *time: 398* (gc-time-taken . 185) (cells-allocated . 65968) (cell-heap-size . 294912) (bytes-malloced . 109871) (gc-malloc-threshold . 150000) (cell-heap-segments (537251512 . 536989368) (537909288 . 537385000) (539510888 . 537938024)) Guile hash *time: 635* (gc-time-taken . 433) (cells-allocated . 56009) (cell-heap-size . 98304) (bytes-malloced . 126346) (gc-malloc-threshold . 238159) (cell-heap-segments (537251512 . 536989368) (537962712 . 537438424)) > OTOH, all hash operations requiring looking for items might be faster > because you can test for hash equality before testing key equality & > the hash equality test might be faster. yes, in almost all cases. it's faster, e.g. to check whether two symbols are eq? because then, in the C code, you're comparing two long's, whereas testing two SCM numbers requires that you do two SCM_INUM()'s #define SCM_INUM(x) (((x)>>1)>>1) > I can't imagine the hasher being faster than just looking up the > number, so it should always be faster to look them up than to > recompute all the hashes. However, it might be a minor point for > small keys because of the other work needed when rehashing. the problem is that a full size long won't fit into a SCM INUM. I'm currently masking (and'ing) my hash with 2,097,157 which does fit, but obviously won't work work for extremely huge tables (>2,000,000 entries). If I insist on a range (- 2^31, 2^31 - 1) for my hashes, then they'll be stored as "big numbers". Converting a big number to a long is like converting a string to a number (not very fast).