This is the mail archive of the guile@cygnus.com mailing list for the guile project.
> Leaving a mark has the problem that a lot of insertions and deletions
> which don't substantially change the number of keys (i.e. which don't
> force a rehash) cause all the empty slots to change over to marks.

I'd be curious to know the results of a mathematical analysis of this.
First, deleted slots are of course equivalent to empty slots for
insertion (once we have verified that the key is not already in the
table).  Secondly, in practice, I think most tables have a relatively
low number of deletions.  Finally, you only need to leave a mark when
the "next" element in a scan is non-empty.  That is easy to determine
when doing a linear scan, but of course linear scanning is bad because
of clustering.  If scans are semi-random (i.e. use a secondary hash),
then figuring out whether a deletion mark is needed may be impractical.

> Generally the non-linked hash table is more memory efficient
> but slower (considering garbage collection overhead, maybe faster).

But note that because the non-linked hash table saves space, you can
(and in fact should) allocate more slots/buckets, and hence collisions
are less likely.

In any case, the various tradeoffs are complicated, and beyond my
knowledge and ability to analyze.  I'd be interested in an overview of
the various results, written by somebody familiar with the literature,
but written for a practitioner needing recommendations, not theorems.

Another complication: there are hash algorithms that are guaranteed
*always* (i.e. not amortized) O(1) (assuming the hashing is
appropriately random).  They do this by resizing incrementally.  I
vaguely remember one of these algorithms is called additive hashing.

A Computing Surveys-type article would be great.

	--Per Bothner
Cygnus Solutions     bothner@cygnus.com     http://www.cygnus.com/~bothner
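[To make the deletion-mark discussion above concrete, here is a minimal
sketch of an open-addressed table with tombstones, in Python.  This is a
hypothetical illustration, not Guile's implementation; names like
`OpenTable` and the linear probe sequence are my own choices.  It shows
the two points made above: a deleted slot is reusable for insertion once
the scan has verified the key is absent, and a lookup must scan *past*
marks rather than stopping at them.]

```python
# Open-addressed hash table with deletion marks ("tombstones").
# Hypothetical sketch only; fixed size, linear probing for simplicity.

EMPTY = object()    # slot never used: a scan may stop here
DELETED = object()  # slot freed by a deletion: a scan must continue past it

class OpenTable:
    def __init__(self, nslots=8):
        self.slots = [EMPTY] * nslots  # live slots hold (key, value) pairs

    def _probe(self, key):
        """Linear probe sequence starting at the key's home slot."""
        i = hash(key) % len(self.slots)
        while True:
            yield i
            i = (i + 1) % len(self.slots)

    def lookup(self, key, default=None):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return default          # key cannot be further along
            if slot is not DELETED and slot[0] == key:
                return slot[1]          # a mark does NOT stop the scan

    def insert(self, key, value):
        # Remember the first reusable slot, but keep scanning until EMPTY
        # to verify the key is not already in the table.
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                target = first_free if first_free is not None else i
                self.slots[target] = (key, value)
                return
            if slot is DELETED:
                if first_free is None:
                    first_free = i      # deleted slot is as good as empty
            elif slot[0] == key:
                self.slots[i] = (key, value)  # key present: overwrite
                return

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return
            if slot is not DELETED and slot[0] == key:
                self.slots[i] = DELETED  # leave a mark so later scans continue
                return
```

The cost the quoted text worries about shows up in `lookup`: once many
slots hold `DELETED`, unsuccessful lookups scan further before hitting
`EMPTY`, even though the table holds few live keys.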