This is the mail archive of the mailing list for the elfutils project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 3/3] Add a simple array for CU abbreviations with low codes

On 12/12/2013 11:16 AM, Josh Stone wrote:
> On 12/12/2013 09:07 AM, Petr Machata wrote:
>> Josh Stone <> writes:
>>>  struct Dwarf_CU
>>>  {
>>> @@ -283,7 +285,9 @@ struct Dwarf_CU
>>>    size_t type_offset;
>>>    uint64_t type_sig8;
>>> -  /* Hash table for the abbreviations.  */
>>> +  /* Simple array for the abbreviations with low codes.  */
>>> +  Dwarf_Abbrev *abbrev_array[CU_ABBREV_ARRAY_SIZE];
>> This blows up Dwarf_CU from 100odd bytes to something like half the
>> page, I'm not entirely fond of that.  Especially since the hash table
>> that follows is an on-demand-growing structure, having half a page
>> reserved (possibly on stack) just in case seems wasteful.
>> I'm looking into some debuginfo files.  libc has an average of 28
>> abbreviations per CU (with 108 being the most), libstdc++ an average of
>> 77 (with 108 the most), gcc 88 (142), libbost_python 206 (290), vmlinux
>> 112 (185).  So reserving space for 256 seems overly generous.  I'd be
>> fine with a growable heap-allocated array capped at 256.  Hopefully that
>> would still be helpful performance-wise.
> OK, I'll experiment with less generous static sizes, on the heap, as
> well as dynamic/realloced size (still capped though).
> I also had the idea that maybe lookup can sometimes skip the modulus,
> which was by far the hotspot instruction in lookup() by ~72%.  Just
> basically moving my "is-it-a-small-code" branch into hash lookup.  So
> everything would stay in the hash table, hopefully just a bit faster.

This actually works pretty well:

diff --git a/lib/dynamicsizehash.c b/lib/dynamicsizehash.c
index 40f48d5ed119..eec9826c43be 100644
--- a/lib/dynamicsizehash.c
+++ b/lib/dynamicsizehash.c
@@ -50,7 +50,7 @@ lookup (htab, hval, val)
      TYPE val __attribute__ ((unused));
   /* First hash function: simply take the modul but prevent zero.  */
-  size_t idx = 1 + hval % htab->size;
+  size_t idx = 1 + (hval < htab->size ? hval : hval % htab->size);

   if (htab->table[idx].hashval != 0)

             libdw  varlocs  varlocs
              text     time   maxres
    Base:   243103   84.14s  237276k
    P1:     243295   74.90s  237276k
    P2:     243167   70.07s  237272k
    P3-old: 243583   57.14s  238464k
    P3-new: 243199   62.21s  237272k

This simple change (P3-new) is not quite as good as using the fixed
array (P3-old), but it's still a speedup and doesn't add to the memory
profile at all.

It's an improvement because well-behaved DWARF producers will use the
lowest abbrev codes to be more compact in uleb128 encoding.  Every file
I've seen also has codes ordered, so __libdw_findabbrev/getabbrev will
always build up the hash in increasing code order.  This dynamic hash
implementation already resizes itself to stay <90% full, which means
here hval is always less than htab->size.  A well-predicted branch
proves to be faster than hitting the 'div' for me.

So in this typical case, the hash table is acting like just a dynamic
array.  However, if some strange DWARF producer didn't use low abbrev
codes in order, the hash table will still do the right thing, no worse
than before.  And with other hash tables like sig8, where hval is large,
the branch in that version of lookup() should be well-predicted the
other way.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]