This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.



Re: [PATCH] libdw: pre-compute leb128 loop limits


On Mon, 2014-12-15 at 23:03 +0100, Mark Wielaard wrote:
> On Mon, 2014-12-15 at 22:42 +0100, Mark Wielaard wrote:
> > On Mon, 2014-12-15 at 12:18 -0800, Josh Stone wrote:
> > > On Fedora 21, this appears to be slightly faster, although pretty close
> > > to noise levels.  Mark, can you see if this helps the performance slip
> > > on your el7 system?
> > 
> > It is slightly faster ~0.5 secs on ~55 secs.
> 
> Wait, I wasn't testing on an idle system. One of the cores was pretty
> busy (with running a fuzzer...). I retested both the original
> (mjw/pending) and your patch with nothing else eating cpu. Now (best of
> 3) original was 54.90 vs patched 53.90. So a whole second won.
> 
> > >    /* Unrolling 0 like uleb128 didn't prove to benefit optimization.  */
> > > -  for (unsigned int i = 0; i < len_leb128 (acc) && *addrp < end; ++i)
> > > +  const size_t max = __libdw_max_len_leb128 (*addrp, end);
> > > +  for (size_t i = 0; i < max; ++i)
> > >      get_sleb128_step (acc, *addrp, i);
> > >    /* Other implementations set VALUE to INT_MAX in this
> > >       case.  So we better do this as well.  */
> > 
> > > Unrolling this does seem to give an additional ~0.2 seconds win.
> 
> Adding unrolling now (same idle system, best out of 3) gives me 54.28.
> So it does seem like another slight 0.6 second win.

I retested again and, at least for my rhel7 setup, both patches seem to
bring real wins. For my f21 setup things aren't so clear: the wins are
less than 0.5 seconds for both when taking the best of 3 runs. But I
don't fully trust my benchmarking on the f21 machine, since some of
those runs show differences of several seconds, with the worst being
70.62 seconds and the best 66.40 seconds (without patches applied). So
something feels fishy about that machine.

But when everything is applied, the new code is on average as fast as
0.160. On my f21 setup the best run for 0.160 was 66.54 seconds; for git
master plus the extra leb128 checking and performance tuning patches it
was 66.03. For my rhel7 setup the best run for 0.160 was 53.63 seconds
and with patches 53.93.

So at least compared to 0.160 we are not slower (although 0.160 was
slower than 0.158). But we do have a lot more robustness checking.

I would like to push the following patches, currently on mjw/pending, to
master:

commit 0f512f1201dc606fb1793f4a596d6b773033b10e
Author: Mark Wielaard <mjw@redhat.com>
Date:   Tue Dec 16 10:53:22 2014 +0100

    libdw: Unroll the first get_sleb128 step to help the compiler optimize.
    
    The common case is a single byte, so no extra (max len) calculation
    is necessary then.
    
    Signed-off-by: Mark Wielaard <mjw@redhat.com>

commit 56698acce00bd0cbc8ace327263dfe147fae18fa
Author: Josh Stone <jistone@redhat.com>
Date:   Mon Dec 15 12:18:25 2014 -0800

    libdw: pre-compute leb128 loop limits
    
    Signed-off-by: Josh Stone <jistone@redhat.com>

commit 1df99d104efaa7d0b824f0761493010027a7303e
Author: Mark Wielaard <mjw@redhat.com>
Date:   Sun Dec 14 21:48:23 2014 +0100

    libdw: Add get_uleb128 and get_sleb128 bounds checking.
    
    Both get_uleb128 and get_sleb128 now take an end pointer to prevent
    reading too much data. Adjust all callers to provide the end pointer.
    
    There are still two exceptions. "Raw" dwarf_getabbrevattr and
    read_encoded_valued don't have an end pointer associated yet.
    They will have to be provided in the future.
    
    Signed-off-by: Mark Wielaard <mjw@redhat.com>

commit 1e777486d0c0191c7cc3adc29738e51f348cf039
Author: Mark Wielaard <mjw@redhat.com>
Date:   Fri Dec 12 16:43:04 2014 +0100

    libdw: Make sure all attributes come with a (fake) CU for bound checks.
    
    All attributes now have a reference to a (fake) CU that has startp and
    endp set to the data section where the form data comes from. Use that
    for bounds checking in __libdw_form_val_len and dwarf_formblock to make
    sure data read doesn't overflow any data section. Remove libdwP.h cu_data
    and use cu startp and endp directly where appropriate.
    
    Signed-off-by: Mark Wielaard <mjw@redhat.com>
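
For readers following along, the three leb128 changes listed above (the
end pointer, the pre-computed loop limit, and the unrolled first step)
can be sketched roughly as below. This is only an illustrative sketch
and not the libdw code: read_uleb128, read_sleb128, max_len_leb128 and
sign_extend are hypothetical names, and returning UINT64_MAX/INT64_MAX
on truncated or overlong input is an assumption loosely modelled on the
"set VALUE to INT_MAX" comment in the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many bytes may be read starting at ADDR.
   A 64-bit value needs at most 10 LEB128 bytes, and we must never
   read at or beyond END.  */
static size_t
max_len_leb128 (const unsigned char *addr, const unsigned char *end)
{
  const size_t type_len = (sizeof (uint64_t) * 8 + 6) / 7;  /* 10 */
  const size_t avail = addr < end ? (size_t) (end - addr) : 0;
  return avail < type_len ? avail : type_len;
}

/* Hypothetical helper: sign-extend ACC from BITS bits when the last
   byte read has its sign bit (0x40) set.  The final conversion
   assumes a two's complement int64_t.  */
static int64_t
sign_extend (uint64_t acc, unsigned int bits, unsigned char last)
{
  if (bits < 64 && (last & 0x40) != 0)
    acc |= ~(uint64_t) 0 << bits;
  return (int64_t) acc;
}

/* Bounds-checked unsigned decode.  Returns UINT64_MAX (an assumed
   error value) if the data is truncated or overlong.  */
uint64_t
read_uleb128 (const unsigned char **addrp, const unsigned char *end)
{
  assert (addrp != NULL && *addrp != NULL && end != NULL);
  if (*addrp >= end)
    return UINT64_MAX;

  /* Unrolled first step: the common case is a single byte, so the
     loop limit does not need to be computed at all.  */
  unsigned char b = *(*addrp)++;
  uint64_t acc = b & 0x7f;
  if ((b & 0x80) == 0)
    return acc;

  /* Loop limit computed once, counted from the first byte, instead
     of testing both length and END on every iteration.  */
  const size_t max = max_len_leb128 (*addrp - 1, end);
  for (size_t i = 1; i < max; ++i)
    {
      b = *(*addrp)++;
      acc |= (uint64_t) (b & 0x7f) << (i * 7);
      if ((b & 0x80) == 0)
        return acc;
    }
  return UINT64_MAX;
}

/* Bounds-checked signed decode, same structure as above.  Returns
   INT64_MAX (an assumed error value) on truncated or overlong data.  */
int64_t
read_sleb128 (const unsigned char **addrp, const unsigned char *end)
{
  assert (addrp != NULL && *addrp != NULL && end != NULL);
  if (*addrp >= end)
    return INT64_MAX;

  unsigned char b = *(*addrp)++;
  uint64_t acc = b & 0x7f;
  if ((b & 0x80) == 0)
    return sign_extend (acc, 7, b);

  const size_t max = max_len_leb128 (*addrp - 1, end);
  for (size_t i = 1; i < max; ++i)
    {
      b = *(*addrp)++;
      acc |= (uint64_t) (b & 0x7f) << (i * 7);
      if ((b & 0x80) == 0)
        return sign_extend (acc, (unsigned int) (i + 1) * 7, b);
    }
  return INT64_MAX;
}
```

With this sketch the bytes e5 8e 26 decode to 624485 as unsigned and
c0 bb 78 decode to -123456 as signed, while a lone 0x80 with nothing
after it yields the error value instead of reading past the end.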

