This is the mail archive of the
mailing list for the elfutils project.
Some DWARFv5 draft feedback
- From: Mark Wielaard <mjw at redhat dot com>
- To: elfutils-devel at lists dot fedorahosted dot org
- Date: Thu, 01 Dec 2016 02:24:24 +0100
- Subject: Some DWARFv5 draft feedback
I have been working on elfutils DWARFv5 support based on what GCC7
implements and the current public draft. I haven't had time to implement
everything. But since the public comment period was kept short I thought
it would be nice to at least document some of the things that I stumbled
over. Hopefully they help to understand why I made some of the choices
when reviewing the patches on the elfutils mjw/DWARF5 branch.
Some of these things seem simple to fix/clarify and I will also submit
Issues for them to dwarfstd.org so they don't get forgotten. Others are
just observations that I struggled with, which might just need
coordination between producers and consumers. Hope the notes are also
useful for writing other DWARFv5 consumers.
BTW. It would be handy if there were sources for the spec so one can
create patches for simple typos. Also it is somewhat opaque how Issues
are handled. Could they and any comments from the committee be sent to
the mailing list to make tracking changes to the draft easier?
New Language Encodings. The list still has DW_LANG_C_plus_plus_03 which
was discussed some time ago on the dwarf-discuss list. Since C++03
doesn't introduce any language changes it isn't clear what this
signifies. The original submitter also agreed it wasn't necessary. On
the other hand there is no DW_LANG_C_plus_plus_17, which seems
appropriate and necessary given that final DWARFv5 will probably come
out in 2017 as does GCC7 with C++17 language support.
Handling language specific DIE/Attribute properties in partial units.
This seems hard to handle in the abstract. I wonder if there is some
guidance for producers on what kind of things can be moved into a
partial unit to make things easier for consumers. For example, the size
of a variable or data structure. Or subrange types, which might omit the
DW_AT_lower_bound attribute, in which case the CU language context
determines the lower bound (0 for C, 1 for Fortran, etc.). What if e.g.
DW_TAG_subrange_type is placed in a partial unit? Then it seems good if
the partial unit DIE has a language attribute. In general it would be
good if the partial unit DIE had a language attribute so the context is
clear for a consumer/library that might need to provide language specific
properties for a DIE (given it might be unclear or hard to get at all
the (indirect) imports of the partial unit, and what if some of them
have different language attributes?). Alternatively a producer placing a
DIE into a partial unit might have to add any attributes, like the
lower bound, that might be implicit if the unit DIE had a language
New FORMs. DW_FORM_ref_sup doesn't describe how the offset is
represented. Currently the assumption in elfutils is that it is 4 or 8
bytes depending on whether the containing unit is 32bit or 64bit DWARF.
This would be consistent with DW_FORM_strp_sup. The consequence is that
if the supplemental file has really big data sections you need a 64bit
DWARF unit to reference everything in it. There is no description of the
representation of DW_FORM_line_strp, but DW_FORM_strp is mentioned
twice. I assumed the second should just be DW_FORM_line_strp.
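
To make the assumption above concrete, here is a minimal sketch of reading an offset-sized form value, where the width follows the 32bit/64bit DWARF format of the containing unit. The function names are made up for illustration and are not elfutils API. Treating DW_FORM_ref_sup like DW_FORM_strp_sup is the assumption described above, not something the draft states.

```python
import struct

def read_initial_length(data, pos, little_endian=True):
    """Return (unit_length, offset_size, new_pos) for a unit header.

    offset_size is 4 for 32bit DWARF and 8 for 64bit DWARF.
    """
    end = "<" if little_endian else ">"
    (length,) = struct.unpack_from(end + "I", data, pos)
    if length == 0xFFFFFFFF:
        # 64bit DWARF: escape value followed by an 8 byte length.
        (length,) = struct.unpack_from(end + "Q", data, pos + 4)
        return length, 8, pos + 12
    if length >= 0xFFFFFFF0:
        # 0xfffffff0..0xfffffffe are reserved initial length values.
        raise ValueError("reserved initial length value")
    return length, 4, pos + 4

def read_offset_form(data, pos, offset_size, little_endian=True):
    """Read an offset-sized form value; returns (value, new_pos).

    Assumed here to apply to DW_FORM_ref_sup as well as DW_FORM_strp_sup.
    """
    end = "<" if little_endian else ">"
    fmt = end + ("I" if offset_size == 4 else "Q")
    (value,) = struct.unpack_from(fmt, data, pos)
    return value, pos + offset_size
```

With this reading, a 32bit DWARF unit can only reach the first 4GB of the supplemental file sections, which is the consequence mentioned above.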
Classifying DW_FORM_data16 as a constant value is slightly confusing.
Having to handle a 128bit value everywhere a constant value class is
allowed is somewhat inconvenient. And such values really only make sense
given a specific data representation/type. As given it isn't immediately
clear in which context one might have to byte-swap for different
endianness (the fact that it is also used to represent the MD5 in the line
table confused me a bit, wrongly assuming it meant that I might need to
byte-swap because it was a constant value representation - it shouldn't
of course, it really is a hash represented by a block of bytes). For
these reasons in elfutils we currently handle it as (constant size)
block class (which I hear is also what gdb does). In practice it
seems to only impact DW_AT_const_value for which consumers already had
to handle blocks. Using it for other attributes doesn't really seem to
make sense. I suggest renaming it to DW_FORM_data16_block and putting it
in the block class instead of the constant class. The new
DW_FORM_implicit_const did eventually work out well, but there were
surprisingly many places that assumed abbrevs were simple and didn't use
much/any abstraction. The existing DW_FORM_indirect doesn't really seem
to be handled very well either, and it would break most of these places too.
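
As an illustration of why abbrevs are no longer "simple", here is a small sketch of reading the attribute specifications of one abbreviation. With DW_FORM_implicit_const the value is an SLEB128 stored in the abbrev itself, right after the form, so code that assumes plain (name, form) pairs goes wrong. The helper names are made up for this example. The form codes are the ones I believe are assigned in DWARF 5 (0x1e and 0x21). Returning DW_FORM_data16 as 16 raw bytes, without byte-swapping, reflects the block-like handling described above.

```python
DW_FORM_data16 = 0x1E
DW_FORM_implicit_const = 0x21

def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; returns (value, new_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos

def read_sleb128(data, pos):
    """Decode a signed LEB128 value; returns (value, new_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:
                result -= 1 << shift  # sign-extend
            return result, pos

def read_attr_specs(data, pos):
    """Read (name, form, implicit_value) triples up to the 0,0 terminator."""
    specs = []
    while True:
        name, pos = read_uleb128(data, pos)
        form, pos = read_uleb128(data, pos)
        implicit = None
        if form == DW_FORM_implicit_const:
            # The value lives in the abbrev, not in the DIE data.
            implicit, pos = read_sleb128(data, pos)
        if name == 0 and form == 0:
            return specs, pos
        specs.append((name, form, implicit))

def read_data16(data, pos):
    """Return a DW_FORM_data16 value as 16 raw bytes, no byte-swapping."""
    return bytes(data[pos:pos + 16]), pos + 16
```

A consumer that keeps the implicit value next to the form in its abbrev representation does not need to special-case the form anywhere else.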
Unit headers. Having extra padding fields for all unit types seems a bit
wasteful. Also there is not enough information for a consumer to know
whether it can handle anything from a unit whose unit type is unknown.
Which, if any, fields following the unit_type are valid? Or is just the
initial unit_length valid and is the only valid operation skipping the
whole unit? Having a place to store a unique identifier and a reference
to a primary/sub DIE inside the unit is nice and could be made more
generic by turning the unit_type field into a bit/flag field.
One flag to indicate it is a type unit, one for partial unit, one for
skeleton unit and one for split unit. Some combinations don't make sense
currently, but might in the future. Or keep the current DW_UT values
(1..6) as they are now. But limit the extensions to 15. Then use the
remaining 4 bits as flags to indicate whether a unit header contains
extra fields. You can define 2 already: one if the header contains an 8
byte ID field, and one if the header contains a DIE offset field (4 or
8 bytes). That basically gives you 16 values for describing the type and
16 for describing the header fields. But you could shift them a bit if
you think it is more important to have flexibility in unit types or
describing header fields/size.
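
A rough sketch of the second variant (low 4 bits for the unit type, high 4 bits as header field flags) could look like the following. This is only the proposal from this mail, not what the draft defines. The mask and flag names and their bit positions are invented for the example.

```python
UT_TYPE_MASK = 0x0F            # low 4 bits: unit type (hypothetical split)
UT_FLAG_HAS_ID = 0x10          # hypothetical: header carries an 8 byte ID
UT_FLAG_HAS_DIE_OFFSET = 0x20  # hypothetical: header carries a DIE offset
KNOWN_FLAGS = UT_FLAG_HAS_ID | UT_FLAG_HAS_DIE_OFFSET

def extra_header_fields(unit_type_byte, offset_size):
    """Split a unit_type byte into type, extra header fields and unknown flags.

    Returns (unit_type, [(field_name, size_in_bytes), ...], unknown_flags).
    """
    unit_type = unit_type_byte & UT_TYPE_MASK
    flags = unit_type_byte & 0xF0
    fields = []
    if flags & UT_FLAG_HAS_ID:
        fields.append(("id", 8))
    if flags & UT_FLAG_HAS_DIE_OFFSET:
        # 4 or 8 bytes depending on 32bit/64bit DWARF.
        fields.append(("die_offset", offset_size))
    return unit_type, fields, flags & ~KNOWN_FLAGS
```

The point is that a consumer seeing an unknown unit type with only known flags would still know the header size and where the DIEs start. If unknown flags are set it can only skip the whole unit using unit_length.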
Enumeration types. It is allowed to have a DW_AT_byte_size on a
DW_TAG_enumeration_type, but not DW_AT_encoding. To describe both size
and encoding one needs to use a DW_AT_type pointing to a base type that
represents the "underlying type". For languages where enumerations don't
have an underlying type, or for strongly typed enums, it is easier to
attach the encoding directly than to add an indirection to a base type.
Add DW_AT_encoding to the attribute list for DW_TAG_enumeration_type.
Macro Information Header. The macro information entries in the
opcode_operands_table may be described in the table. But some cannot be
described because some forms are not in the list of allowed forms. In
particular DW_FORM_strp_sup is missing so DW_MACRO_define_sup and
DW_MACRO_undef_sup cannot be described. And DW_FORM_ref_sup is missing,
making it impossible to describe DW_MACRO_import_sup. This makes the
code that checks for allowed forms slightly inconvenient (it should
reject these MACRO descriptions if those forms are used in the table,
but not if they are defined implicitly). Also DW_FORM_line_strp isn't
allowed. But it might be beneficial for describing files referenced by