This is the mail archive of the gdb-patches@sourceware.cygnus.com mailing list for the GDB project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]

Re: (patch) hpjyg09: bcache optimizations

To: Jim Blandy <jimb at cygnus dot com>
Subject: Re: (patch) hpjyg09: bcache optimizations
From: jtc at redback dot com (J.T. Conklin)
Date: 05 Nov 1999 11:13:02 -0800
Cc: Jimmy Guo <guo at cup dot hp dot com>, gdb-patches at sourceware dot cygnus dot com
References: <Pine.LNX.4.10.9911041332020.15719-100000@hpcll168.cup.hp.com> <npaeoskevw.fsf@zwingli.cygnus.com>
Reply-To: jtc at redback dot com

>>>>> "Jim" == Jim Blandy <jimb@cygnus.com> writes:
Jim> But this situation still sounds weird.  The bcache is a very
Jim> simple thing --- bcache.h and bcache.c total 289 lines of code.
Jim> You give it a string of bytes, and it either adds a copy of your
Jim> string to its hash table, or finds an identical copy already
Jim> present.  Either way, it hands you back a pointer to its stashed
Jim> copy.  So, a bcache is helpful if you expect a lot of duplicates.

This brings back memories.  

Many years ago, I discovered CVS had terrible hash behavior.  Date and
Revision strings were used as keys, and each character (of which there
were only digits, '.', and '/') was weighted equally.  I can't recall
exactly, but I think the original hash function resulted in ~10% bucket
utilization --- the remaining ~90% would not be used by any valid keys.
And even among the remaning ~10%, the distribution was non-uniform.

I changed it to one of the functions compared in the Red Dragon book's
discussion of hash functions and their suitability for use for hashing
identifiers (which tend to have a lot of duplication/redundency). This
change resulted in a uniform distribution across all buckets and a
cooresponding performance improvement.

For the longest time, I used to keep a histogram plot of the original
and new hash functions in a folder, and would show them to anyone who
would listen to the tale: switching from a nieve to a complex data
structure will not automatically result in a performance improvement.

So I agree with Jim, fixing bcache to use a better hash table is a
much better approach than microoptimizing the existing hash function
by making it an inline function or a macro.  Many thanks for spending
the time and tracking down the root cause.

For reference, here is hash function I used:

    unsigned int 
    hashpjw(const char* x)
    {
      unsigned int h = 0;
      unsigned int g;

      while (*x != 0)
      {
        h = (h << 4) + *x++;
        if ((g = h & 0xf0000000) != 0)
          h = (h ^ (g >> 24)) ^ g;
      }
      return h;
    }

It will have to be modified slightly for use in bcache, but it may
result in acceptable behavior in this case as well.

        --jtc

-- 
J.T. Conklin
RedBack Networks

References:
- (patch) hpjyg09: bcache optimizations
  - From: Jimmy Guo
- Re: (patch) hpjyg09: bcache optimizations
  - From: Jim Blandy

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]