This is the mail archive of the guile@cygnus.com mailing list for the guile project.



Re: avl-trees vs. hashes


Jay Glascoe <jglascoe@jay.giss.nasa.gov> writes:

 > On Mon, 19 Oct 1998, Klaus Schilling wrote:
 > 
 > > What are the most striking advantages of hashes over avl-trees (or other
 > > tree-based search structures) that pushed the guile developers into supporting
 > > only hashtables directly? Are binary-tree structures particularly bad for
 > > Scheme?
 > 
 > Unless I'm mistaken, avl-trees do insertion and retrieval in O(log n)
 > time.  Whereas hashes can do both in O(1), constant, time.  But it's
 > possible avl-trees are faster at deletion? anyone... anyone? :)

They're not any faster at deletion.  It's true that AVL trees,
red/black trees, etc. do insertion, retrieval & deletion in O(log n)
vs hash tables, which do them in O(1), but this is misleading: log n
tends to be pretty small, & the O(1) carries a lot of constant
overhead - hash function computation & bucket traversal.  My guess is
that in practice they can be quite comparable, but you probably have
to be a pretty careful coder to get the tree-based overhead down to
the hashtable level.  After all, log_2(4096) = 12, so it takes a worst
case of ~12 comparisons or so to select an item out of 4096.  You also
might be able to do clever stuff with the keys to make each comparison
do extra work, in the case that the keys are strings.

What the tree-based structures give you that hash tables don't is
ordering.  You need a less-than operator as well as an equality
operator & then you can also do the following operations in O(log n):

   Get the smallest item.
   Get the largest item.
   Get the next item.
   Get the previous item.

And, if you do a little extra O(1) work per update (e.g. keeping the
nodes threaded together in sorted order), you can in fact make the
above 4 operations O(1).
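As a concrete sketch of where those ordered operations come from,
here's a minimal unbalanced binary search tree in Scheme (all names
are mine, not Guile's; a real AVL or red/black tree adds rebalancing
on insert, but the ordered traversals look the same):

```scheme
;; Node layout: a 3-element list (key left right); '() is the empty tree.
(define (make-node key left right) (list key left right))
(define (node-key n) (car n))
(define (node-left n) (cadr n))
(define (node-right n) (caddr n))

;; Non-destructive insert; no rebalancing, so only O(log n) on average.
(define (tree-insert tree key <?)
  (cond ((null? tree) (make-node key '() '()))
        ((<? key (node-key tree))
         (make-node (node-key tree)
                    (tree-insert (node-left tree) key <?)
                    (node-right tree)))
        (else
         (make-node (node-key tree)
                    (node-left tree)
                    (tree-insert (node-right tree) key <?)))))

;; Smallest item: follow left children all the way down.
;; Assumes a non-empty tree.
(define (tree-min tree)
  (if (null? (node-left tree))
      (node-key tree)
      (tree-min (node-left tree))))

;; Largest item: follow right children.
(define (tree-max tree)
  (if (null? (node-right tree))
      (node-key tree)
      (tree-max (node-right tree))))

;; Next item: smallest key strictly greater than KEY, or #f if none.
;; Each step descends one level, so it's O(log n) in a balanced tree.
(define (tree-next tree key <?)
  (let loop ((t tree) (best #f))
    (cond ((null? t) best)
          ((<? key (node-key t)) (loop (node-left t) (node-key t)))
          (else (loop (node-right t) best)))))
```

Get-the-previous-item is the mirror image of tree-next, with the
comparison & branch directions swapped.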

 > I'm working on deletion and downsizing for my hashtables right now.
 > Deletion seems to be REAL SLOW, because I basically have to rebuild a
 > bucket (basically an alist) every time I remove an entry.  I could use
 > "dummy entries" or something, but that seems like such an ugly solution.

Why should it be slow?  You cdr down the list & when you find the item
to delete you set-cdr! the item in front of it to the cdr of it.  If
it's the 1st item in the list you have to store the new list head back
into the vector.

Here's some code for it:

   ;; Destructively delete the first item matching OBJ from LST.
   ;; Returns the (possibly new) head of the list - the caller must
   ;; store the result back into the bucket vector, since a match at
   ;; the head changes the head.
   (define (delete-item! lst obj cmp?)
     (define (del-itm! lst tail obj cmp?)
       (cond ((null? tail))                  ; not found: nothing to do
             ((cmp? obj (car tail))          ; found: splice it out
              (set-cdr! lst (cdr tail)))
             (else (del-itm! (cdr lst) (cdr tail) obj cmp?))))

     (cond ((null? lst) '())
           ((cmp? obj (car lst))             ; match at the head:
            (cdr lst))                       ; new head is the cdr
           (else (del-itm! lst (cdr lst) obj cmp?)
                 lst)))



Should only be about 2x-3x the time of a get.

And although it'd be nice to have the functionality of
shrinking/reforming the hashtable, I don't expect it to get much use.
In particular, I wouldn't automatically call it in the delete
function.

Also, why isn't shrinking the hashtable identical to growing it?  I'd
think you'd just have a function like move-hashtable-data which takes
a hashtable & a vector & moves the data from the hashtable into the
vector.  Then this would handle both growing & shrinking, just by
passing it vectors of different sizes.
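A sketch of what I mean, assuming the buckets are alists of
(key . value) pairs stored in a vector, & a hash procedure that takes
a key & a table size (the names here are illustrative, not Guile's
actual API):

```scheme
;; Rehash every entry of OLD-VEC into NEW-VEC & return NEW-VEC.
;; NEW-VEC may be larger (growing) or smaller (shrinking); either way
;; each entry just gets re-bucketed by (hash key new-size).
(define (move-hashtable-data old-vec new-vec hash)
  (do ((i 0 (+ i 1)))
      ((= i (vector-length old-vec)) new-vec)
    (for-each
     (lambda (entry)                    ; entry is a (key . value) pair
       (let ((j (hash (car entry) (vector-length new-vec))))
         (vector-set! new-vec j (cons entry (vector-ref new-vec j)))))
     (vector-ref old-vec i))))
```

Growing & shrinking are then both just
(move-hashtable-data old (make-vector new-size '()) hash).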

-- 
Harvey J. Stein
BFM Financial Research
hjstein@bfr.co.il