This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.



DWARF v4 support


I've now added libdw decoding support for all the v4 format features.

I'd like to release 0.148 by the end of next week.  So I want to tie up
loose ends on this stuff in the next several days, and could use some help.

Here's my outline of what there is to cover:

* DWARF 4
** .debug_line version 4
*** grokked.  still no interface to yield op_index, isa, discriminator
*** readelf MISSING
** .debug_frame version 4
*** address_size, segment_size in header
**** grokked, no support for non-defaults
*** readelf MISSING
** .debug_info version 4
*** new forms
**** sec_offset
***** grokked.  c++/ change?
**** exprloc
***** grokked.  c++/ change?
**** flag_present
***** grokked.  c++/ change?
**** ref_sig8
***** partially grokked, needs .debug_types parsing
** .debug_types
*** partial header grokkage, not yet used
*** __libdw_findcu
*** __libdwfl_nextcu?
*** dwarf_formref_die: sig8 hash lookup
*** readelf MISSING


The v4 .debug_line and .debug_frame formats are not actually generated by
anything.  So we don't have any way to test those.

I haven't actually tested any of the rest of it either.  GCC trunk/4.5 (and
I think 4.4-rh) with -gdwarf-4 will generate the new .debug_info formats.
It would be good to test on that format.  I'm not sure off hand how best to
do it.  Perhaps bootstrap trunk/4.5 gcc and then somehow make libdw read
everything from its cc1{,plus} binaries.

readelf has separate decoders for most of the stuff, and I haven't updated
those.  It's not really hard, just a bunch more little fiddling to do and
test.  Obviously that's very useful to get done and get right, since
looking at eu-readelf output is the usual way to debug everything else.

The other piece not quite complete is .debug_types support.  I've done most
of that work on the roland/debug_types branch, not merged yet.  The only
API change there is dwarf_next_unit, which obsoletes dwarf_nextcu and can
be used for either .debug_info or .debug_types parsing.  In the structured
libdw reader code, the only actual support is in dwarf_formref_die, which
should be able to resolve DW_FORM_ref_sig8 references to .debug_types.
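
For anyone who wants to poke at this without the branch, here is a rough
sketch of what iterating units in either section involves.  It is not the
libdw code and does not use the libdw API.  The struct and function names
are made up for illustration.  It only handles the 32-bit DWARF format and
little-endian data.  The field layout follows the DWARF 4 unit headers: a
.debug_types unit has the same first 11 bytes as a .debug_info unit,
followed by an 8-byte type signature and a 4-byte type offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct for this sketch; not a libdw type.  */
struct unit_header
{
  uint32_t unit_length;    /* Bytes following the length field itself.  */
  uint16_t version;
  uint32_t abbrev_offset;
  uint8_t address_size;
  uint64_t type_signature; /* Only set for .debug_types units.  */
  uint32_t type_offset;    /* Only set for .debug_types units.  */
  size_t header_size;      /* 11 for .debug_info, 23 for .debug_types.  */
};

/* Read an N-byte little-endian unsigned value.  */
uint64_t
read_le (const unsigned char *p, size_t n)
{
  uint64_t v = 0;
  for (size_t i = 0; i < n; ++i)
    v |= (uint64_t) p[i] << (8 * i);
  return v;
}

/* Parse one unit header at BUF.  IS_TYPES selects the .debug_types layout.
   Returns 0 on success, -1 if the buffer is too short or the unit uses the
   64-bit format escape (0xffffffff), which this sketch does not handle.  */
int
parse_unit_header (const unsigned char *buf, size_t len, int is_types,
                   struct unit_header *out)
{
  assert (buf != NULL && out != NULL);
  size_t need = is_types ? 23 : 11;
  if (len < need)
    return -1;
  memset (out, 0, sizeof *out);
  out->unit_length = (uint32_t) read_le (buf, 4);
  if (out->unit_length == 0xffffffffu)
    return -1;
  out->version = (uint16_t) read_le (buf + 4, 2);
  out->abbrev_offset = (uint32_t) read_le (buf + 6, 4);
  out->address_size = buf[10];
  if (is_types)
    {
      out->type_signature = read_le (buf + 11, 8);
      out->type_offset = (uint32_t) read_le (buf + 19, 4);
    }
  out->header_size = need;
  return 0;
}

/* Section offset of the next unit, given the offset OFF of this one.  */
size_t
next_unit_offset (size_t off, const struct unit_header *h)
{
  return off + 4 + h->unit_length;
}
```

A caller would loop from offset 0, parse a header, and advance with
next_unit_offset until it reaches the end of the section.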

I haven't decided quite what I think about other API additions for looking
at .debug_types.  Currently the only way to see any such DIE is
dwarf_formref_die.  You can easily mock up a Dwarf_Attribute with
DW_FORM_ref_sig8 to look up some known sig8, but that is obviously not
something to recommend.  The obvious thoughts are a lookup call that takes
a sig8 directly (i.e. API for the new innards of dwarf_formref_die), and an
equivalent to dwarf_offdie.  

With dwarf_next_unit, you can iterate over the type units and see each sig8
to do such a lookup.  But if there are (bogusly) multiple units for the
same sig8, there is no way to reach the second and later instances--a sig8
lookup will just hit the first one interned.  With just sig8 lookups, there
is also no direct way to see the type_unit and DIE structure under it.  The
sig8 lookup gets you directly to some owned DIE (e.g. inside layers of
namespace and class_type).  From there you can use dwarf_diecu to get back
to the type_unit, but that's a bit contorted.  So for pure iteration, an
analogue to dwarf_offdie (i.e. just the same, but offset into .debug_types
instead of .debug_info) makes sense.
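
To illustrate the duplicate-sig8 problem, here is a toy model of
"first one interned wins".  It is only a sketch: the table, its fixed size,
and the function names are hypothetical, and the real lookup in libdw is a
proper hash rather than a linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table for this sketch; libdw's real hash is different.  */
#define MAX_TYPE_UNITS 64

struct type_unit
{
  uint64_t sig8;   /* Type signature from the unit header.  */
  size_t offset;   /* Unit offset within .debug_types.  */
};

struct sig8_table
{
  struct type_unit units[MAX_TYPE_UNITS];
  size_t count;
};

/* Look up SIG8; returns the first unit interned with it, or NULL.  */
const struct type_unit *
sig8_lookup (const struct sig8_table *t, uint64_t sig8)
{
  assert (t != NULL);
  for (size_t i = 0; i < t->count; ++i)
    if (t->units[i].sig8 == sig8)
      return &t->units[i];
  return NULL;
}

/* Intern a unit.  Returns 1 if added, 0 if SIG8 was already present
   (the duplicate is dropped, so no lookup can ever reach it), and
   -1 if the table is full.  */
int
sig8_intern (struct sig8_table *t, uint64_t sig8, size_t offset)
{
  assert (t != NULL);
  if (sig8_lookup (t, sig8) != NULL)
    return 0;
  if (t->count == MAX_TYPE_UNITS)
    return -1;
  t->units[t->count].sig8 = sig8;
  t->units[t->count].offset = offset;
  t->count++;
  return 1;
}
```

With only this interface, a second unit carrying the same sig8 can be found
by walking unit offsets but never by signature, which is the argument for
an offset-based call alongside any sig8 lookup.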

So I haven't been sure what's worth adding.  You don't normally come up
with a sig8 anywhere but from a DW_FORM_ref_sig8, so I don't really know
what would use a public lookup by sig8 interface.  I guess dwarf_offdie is
not used very much either, but it's an obvious thing to have.  I suppose
mainly I've just hesitated because I can't decide on any name I like.
dwarf_types_offdie?  dwarf_offdie_types?  dwarf_offtypedie?  I don't know.
Of course, we can always just let 0.148 go out with the dwarf_formref_die
support but no new API, and add one later as needed.  But it might be
better to add something now.

Thoughts?

Of course, all the .debug_types stuff is entirely untested too.  Newer GCC
can emit it (and I think -gdwarf-4 does by default, not sure).  It would be
good to get some binaries like that (gcc bootstrap build or whatnot) and
drive them through the decoder.  I still haven't thought of what would do
that.  Maybe there is a way to use stap -L for this?


The "c++/ change?" notes above are to-do items.  I haven't merged all this
into the dwarf branch and probably won't do it until we have the final tree
ready for 0.148.  When we do, there are some subtleties to fix for the new
form handling.  The C++ stuff about forms and value spaces will have to
have some conditionals for CU version now.

In a v4 CU:
* data* forms can never be a *ptr, only sec_offset is.
  So now an expected set can include both *ptr and constant, and the form
  alone distinguishes them.
* block* forms can never be a location/DWARF expression, only exprloc is.
  So now an expected set can include both constant_block and location, and
  again the form distinguishes them.
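
The version conditional could look roughly like the following.  This is a
sketch in plain C, not the code on the dwarf branch.  The class bits and
function names are invented for illustration; the form codes are the
values from the DWARF spec.  It assumes that in v2/v3 only data4 and data8
can double as section offsets.

```c
#include <assert.h>

/* Form codes from the DWARF spec (names shortened for this sketch).  */
enum
{
  FORM_block2 = 0x03, FORM_block4 = 0x04, FORM_data4 = 0x06,
  FORM_data8 = 0x07, FORM_block = 0x09, FORM_block1 = 0x0a,
  FORM_sec_offset = 0x17, FORM_exprloc = 0x18
};

/* Hypothetical value-class bits.  */
enum
{
  CL_constant = 1 << 0,
  CL_secptr = 1 << 1,   /* lineptr, loclistptr, macptr, rangelistptr */
  CL_block = 1 << 2,
  CL_exprloc = 1 << 3   /* location or other DWARF expression */
};

/* Classes that FORM can carry in a CU of the given version.
   Returns 0 for forms this sketch does not cover or that are
   not valid in that version.  */
unsigned int
form_classes (unsigned int form, unsigned int cu_version)
{
  assert (cu_version >= 2 && cu_version <= 4);
  switch (form)
    {
    case FORM_data4:
    case FORM_data8:
      return cu_version >= 4 ? CL_constant : CL_constant | CL_secptr;
    case FORM_block1:
    case FORM_block2:
    case FORM_block4:
    case FORM_block:
      return cu_version >= 4 ? CL_block : CL_block | CL_exprloc;
    case FORM_sec_offset:
      return cu_version >= 4 ? CL_secptr : 0;
    case FORM_exprloc:
      return cu_version >= 4 ? CL_exprloc : 0;
    default:
      return 0;
    }
}

/* Intersect what the form can carry with what the attribute accepts.
   Exactly one bit left means the form alone settled the class.  */
unsigned int
resolve_class (unsigned int form, unsigned int cu_version,
               unsigned int expected)
{
  return form_classes (form, cu_version) & expected;
}
```

For an attribute that accepts both constant and *ptr, data4 in a v4 CU
resolves to constant only, while the same form in a v3 CU stays ambiguous
and still needs the old per-attribute guess.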

There are also a bunch of new to-do items for dwarflint.

* low-level
** grok new header formats
** grok new forms
** parse .debug_types too
** low-level checks
*** new forms in a CU with version<4 (invalid)
** mid-level checks
*** flag instead of flag_present used in CU v>=4 (suspicious/suboptimal)
*** duplicate sig8 in two type units
*** ref_sig8 connectivity with type units, unreferenced type unit
*** block* used where exprloc expected in CU v>=4 
*** data* used where sec_offset expected in CU v>=4 
** high-level
*** type unit DIEs containing code stuff, PC-related attrs, etc.
    I'm not really clear on what is or isn't supposed to be in there.
    Needs some clarification on dwarf-discuss.
*** (later) compute type signature md5 of type units, match compile-time sig8

The low-level and mid-level ones should be done sooner rather than later.
I think we'd probably like to have the compressor usually read in whatever
formats (i.e. v3 with gcc defaults in Fedora/RHEL6 today) and emit v4
format, so it's important to have good checking about the subtleties like
sec_offset vs data so we don't change the meaning of old data by morphing
it into new format.
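
As a starting point, two of the per-form checks from the list above can be
sketched like this.  It is not dwarflint code; the verdict enum and
function names are hypothetical, and only the form codes come from the
DWARF spec.  The block*/data* checks are left out because they need to
know which classes the attribute expects.

```c
#include <assert.h>
#include <stdbool.h>

/* Form codes from the DWARF spec (prefixed for this sketch).  */
enum
{
  LINT_FORM_flag = 0x0c,
  LINT_FORM_sec_offset = 0x17,
  LINT_FORM_exprloc = 0x18,
  LINT_FORM_flag_present = 0x19,
  LINT_FORM_ref_sig8 = 0x20
};

/* Hypothetical verdicts.  */
enum lint_verdict
{
  LINT_OK,
  LINT_INVALID_BEFORE_V4,  /* low-level: new form in a CU with version<4 */
  LINT_SUBOPTIMAL_FLAG     /* mid-level: flag used where flag_present fits */
};

/* True for the forms introduced in DWARF 4.  */
bool
is_v4_only_form (unsigned int form)
{
  return (form == LINT_FORM_sec_offset || form == LINT_FORM_exprloc
          || form == LINT_FORM_flag_present || form == LINT_FORM_ref_sig8);
}

/* Check one attribute form against the version of its CU.  */
enum lint_verdict
check_form (unsigned int form, unsigned int cu_version)
{
  assert (cu_version >= 2);
  if (cu_version < 4 && is_v4_only_form (form))
    return LINT_INVALID_BEFORE_V4;
  if (cu_version >= 4 && form == LINT_FORM_flag)
    return LINT_SUBOPTIMAL_FLAG;
  return LINT_OK;
}
```

The first verdict is an error and the second only a warning, matching the
invalid versus suspicious/suboptimal split in the list.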


Thanks,
Roland
