This is the mail archive of the guile@cygnus.com mailing list for the guile project.



Re: avl-trees vs. hashes


Keith Wright <kwright@tiac.net> writes:

 > > From: Per Bothner <bothner@cygnus.com>
 > > 
 > > > Unless I'm mistaken, avl-trees do insertion and retrieval in O(log n)
 > > > time.  Whereas hashes can do both in O(1), constant, time.  But it's
 > > >  possible avl-trees are faster at deletion? anyone... anyone? :)
 > > 
 > > Nope; deletion is O(1) in a properly-implemented hashtable.
 > 
 > Sorry, both wrong.  The hash table operations appear to be O(1)
 > as long as the number of entries (n) is less than the size
 > you initially allocated for the table.  But then, most anything
 > not willfully stupid is O(1) if you put a fixed bound on n.
 > 
 > As soon as n exceeds the size of the table you have two choices
 > (A) use some kind of balanced tree for the buckets resulting
 >   in O(log n) behaviour.  (Or worse if you use a linear search.)
 > (B) Make a bigger table and re-hash.  If you double the size of
 >   the table on each re-hash, then to add O(2^k) entries you
 >   must touch the first entry O(k) times, again giving O(log n)
 >   amortised cost.
 >
 > The hash table can help with the constants hidden in the big-O
 > notation, but it does not improve asymptotic complexity.  (Not
 > that this has much to do with Guile.)

I thought that would be the case, but then I did the following
computation & decided it wasn't the case:

For 2^k insertions, assuming a hash table doubling every power of 2,
you get approximately

Insertion times not counting hash table growth:  1 op 2^k times = 2^k

Hash table growth costs approximately one reinsertion per element each
time the # of elements reaches a power of 2, i.e.
2 + 4 + ... + 2^k < 2^(k+1) ops total.

Total time = 2^(k+1) + 2^k

Time per operation:

   2^(k+1) + 2^k
   -------------
         2^k

 = 3

Constant time.
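A quick way to check this is to simply count the operations.  The
sketch below (mine, not from the original post; it assumes the table
starts at capacity 1 and doubles when full) charges 1 op per insertion
plus 1 op per element reinserted at each doubling, and confirms the
per-insertion cost stays bounded by a small constant:

```python
def total_ops(k):
    """Count ops for 2**k insertions into a doubling hash table."""
    ops = 0
    capacity = 1   # assumed initial capacity
    n = 0          # current number of elements
    for _ in range(2 ** k):
        if n == capacity:   # table full: double and rehash
            ops += n        # reinsert every existing element
            capacity *= 2
        ops += 1            # the insertion itself
        n += 1
    return ops

for k in (4, 10, 20):
    # amortised cost per insertion; stays below the bound of 3 above
    print(k, total_ops(k) / 2 ** k)
```

The rehashes touch 1 + 2 + 4 + ... + 2^(k-1) = 2^k - 1 elements in
total, so the exact count is 2^(k+1) - 1 ops, i.e. just under 2 per
insertion with these assumptions, comfortably inside the bound of 3
derived above.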

The mistake in your computation is that you only touch the 1st
inserted element k times, the next one k times, the next 2 k-1 times,
the next 4 k-2 times, the next 8 k-3 times, ..., and this cost is
amortized over 2^k insertions.  You'd have to touch all 2^k elements
at every doubling to end up with k operations per insertion.

-- 
Harvey J. Stein
BFM Financial Research
hjstein@bfr.co.il