This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] libdw: Make dwarf_getfuncs find all (defining) DW_TAG_subprogram DIEs.


On 09/18/2013 06:33 AM, Mark Wielaard wrote:
> dwarf_getfuncs used to return only the DW_TAG_subprogram DIEs that were
> direct children of the given CU. This is normally how GCC outputs the
> subprogram DIEs. But not always. For nested functions the subprogram DIE
> is placed under the subprogram DIE where it is nested. Other compilers
> might output the defining subprogram DIE of a C++ class function under
> the DW_TAG_namespace DIE where it was defined. Both such constructs seem
> allowed by the DWARF specification. So just searching the CU DIE children
> was wrong.
> 
> To find all (defining) subprogram DIEs in a CU dwarf_getfuncs should
> use __libdw_visit_scopes to walk the tree. The only tricky part is
> making sure the offset returned and used when the callback returns
> DWARF_CB_ABORT is correct and the search continues at the right spot
> in the CU DIE tree.

As one might expect, deeper iteration ends up quite a bit slower, but we
may have to pay that price for correctness.  However, before I get into
any numbers, I think you're *still* not catching everything.

I started experimenting with having stap do its own deep dive, rather
than using dwarf_getfuncs at all.  We have full DIE walks in a few
different contexts, and I figured that if dwarf_getfuncs is now going to
be more expensive, maybe I could instead just consolidate our own full
DIE walks and cache multiple things at once.

With systemtap-debuginfo-2.3-1.fc19.x86_64, my patched stap.git found
75713 probes in 'process("/usr/bin/stap").function("*")' (as counted by
the pass-2 -v line).  Unpatched stap.git with your patched elfutils.git
found only 61056 probes.  :(

Here's one example which is not found: at pc=0x5cee9, there is an inline
instance of allocate(a)ios_base.h.

 [ da48c]  compile_unit
           producer             (GNU_strp_alt) "GNU C++ [...]"
           language             (data1) C_plus_plus (4)
           name                 (strp) "parse.cxx"
...
 [10a491]          inlined_subroutine
                   abstract_origin      (ref_addr) [ 1f72b]
                   low_pc               (addr) +0x000000000005cee9
-->
 [ 1f72b]    subprogram
             specification        (ref_addr) [  11b0]
             inline               (data1) declared_inlined (3)
-->
 [  11b0]        subprogram
                 external             (flag_present) Yes
                 name                 (GNU_strp_alt) "allocate"
                 decl_file            (data1) 5   (->ios_base.h)
                 decl_line            (data1) 99


As far as I can tell, dwarf_getfuncs never reports 1f72b.  The trail
through partial units to that DIE should be:

 [ da48c]  compile_unit
...
 [ da4e6]    imported_unit
             import               (ref_addr) [ 21aad]
-->
 [ 21aad]  partial_unit
...
 [ 21abb]    imported_unit
             import               (ref_addr) [ 1e686]
-->
 [ 1e686]  partial_unit
...
 [ 1f72b]    subprogram
             specification        (ref_addr) [  11b0]
             inline               (data1) declared_inlined (3)


And now that I've laid that out, I think I know why it's missed.  Your
new tree_visitor is filtering out (die_offset < offset), in an attempt
to make the dwarf_getfuncs offset parameter work to resume.  This can
fail with partial_units, because the DIE offsets as they are traversed
are not necessarily ordered anymore.

If I just remove your line that updates v->offset, then suddenly it
works.  However, this means you won't have the right value to return at
the end of dwarf_getfuncs.  Maybe that should be tracked in a separate
field of visitor_info.

Resuming is still going to be problematic with a less-than comparison.
I think rather it needs to iterate until the exact requested offset is
found, without assuming any order.  Unless offset==0, of course, then
just start at the beginning.

I think this also means that you can't prune out whole subtrees, since
you don't know whether that offset could be found within, via an import.
 And if a tree happens to be imported multiple times (can it?), then
resuming on an offset within that is ambiguous -- oh well.


HTH,
Josh

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]