This is the mail archive of the dwarf2@corp.sgi.com mailing list for the dwarf2 project.



Editorial ideas related to 991102.1, 000403.2, 000410.3 and others



Editorial ideas related to 991102.1, 991108.4, 000403.2, 000410.1, 000410.2,
000410.3, 000410.4, probably others...

    Note: In the following I will use the names "32-bit DWARF file format"
    and "64-bit DWARF file format" to refer to the old and new variants
    respectively -- not because I am advocating those names specifically, but
    because I need something to illustrate my several ideas. However, the
    examples should help make clear why I am seeking a *pair* of names...

One of the recurring topics in this group, which comes up in various guises,
has been how to talk about the offsets that "point" from one DWARF section
into another, or even the same section in the case of reference class operands.
(To a lesser extent there is a similar issue with section lengths.)

In his emails of 10 April, David Weatherford proposed several alternatives,
which inspired me to make a counter proposal suggested by his proposal 3.
Here I would like to restate my counter proposal, add some discussion, and
show how it might help unify other vocabulary in the document as well.

Note: Everything discussed here is almost completely editorial, in the sense
that it does not change any bits or code. However, the concepts and vocabulary
involved are pervasive so I think it important that we can all buy into
these recommendations.

Note: I assume that Weatherford's Proposal 2 (000410.2), which adds several
new DWARF forms (DW_FORM_linep, DW_FORM_locp, and DW_FORM_macp) and
corresponding classes (lineptr, loclist and macptr) will not be accepted
because it is a significant incompatible change. [Oh, if only DWARF had been
defined that way at the beginning...]

As you may recall, his proposal 3 (000410.3) introduced the names lineptr,
loclist and macptr as aliases for the class "constant", with the intent to
use those names in descriptions instead of "constant" where appropriate.


One toe in the water
--------------------

My variant of proposal 3 (email of 12 April, not assigned a number)
is similar in effect, but formalized differently. The key elements are:

  - Define lineptr, loclist and macptr as "real" classes (not just aliases).

  - Specify that DW_FORM_data4 and DW_FORM_data8 belong to more than one
    class. Heretofore each form belonged to exactly one class. So this is
    a concept change, but it will not actually affect any bits or any code.

  - Unlike proposal 3, DW_FORM_data1, DW_FORM_data2, DW_FORM_sdata and
    DW_FORM_udata belong *only* to class constant (and never to lineptr,
    loclist or macptr).

  - Explain the membership of DW_FORM_data4 and DW_FORM_data8 in class
    constant versus one of lineptr, loclist or macptr with wording along
    the following lines (duplicated from my earlier counter proposal):

	"The FORMs DW_FORM_data4 and DW_FORM_data8 are included in more than
	one class of operand. They are members of the constant class iff their
	data is not relocatable (even in a relocatable object file). They are
	included in the lineptr, loclist or macptr class iff their data is
	relocatable in a relocatable object file and is used to represent an
	offset into a related DWARF section (which section is indicated by
	the class name)."

    Question: is there any case where these forms might be used with a
    relocation when not representing an offset into a related DWARF
    section? I think not. (Recall that these forms *do not* get used to
    represent addresses in the user program -- DW_FORM_addr is used
    for that.) Even if there is such a case, it is not a serious impediment
    to this approach -- it would just complicate a bit what is now a nice
    simple dichotomy:

      . no associated relocation implies class constant,
      . has relocation implies some class other than constant (which other
        class depends on the attribute that uses it).

    A nice corollary of this is that the values of the constant class
    are all compile-time constant values (the link-time constant values
    belong to other classes named something else) -- that will help the
    presentation for sure!
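    The producer-side rule can be sketched in a few lines of Python. This
    is a minimal illustration under stated assumptions, not proposed spec
    text: the helper `operand_class` and its `relocated`/`pointer_class`
    parameters are hypothetical; the form codes are the DWARF Version 2
    values, and the section names are the V2 ones.

```python
# DWARF Version 2 form codes (subset relevant to this discussion).
DW_FORM_data2 = 0x05
DW_FORM_data4 = 0x06
DW_FORM_data8 = 0x07
DW_FORM_data1 = 0x0b
DW_FORM_sdata = 0x0d
DW_FORM_udata = 0x0f

# Under the proposal these forms belong *only* to class constant.
CONSTANT_ONLY_FORMS = {DW_FORM_data1, DW_FORM_data2, DW_FORM_sdata, DW_FORM_udata}
# These forms belong to constant *or* to one of the section-offset classes.
DUAL_CLASS_FORMS = {DW_FORM_data4, DW_FORM_data8}
# Section-offset classes and the section each one points into.
SECTION_POINTER_CLASSES = {
    "lineptr": ".debug_line",
    "loclist": ".debug_loc",
    "macptr": ".debug_macinfo",
}


def operand_class(form, relocated, pointer_class=None):
    """Return the operand class under the proposed membership rule (hypothetical helper).

    relocated:     whether the producer emits a relocation for this value.
    pointer_class: for relocated values, which section-offset class applies;
                   the attribute that uses the value decides this.
    """
    if form in CONSTANT_ONLY_FORMS:
        if relocated:
            raise ValueError("constant-only forms never carry a relocation")
        return "constant"
    if form in DUAL_CLASS_FORMS:
        if not relocated:
            # No relocation implies class constant.
            return "constant"
        if pointer_class not in SECTION_POINTER_CLASSES:
            raise ValueError("a relocated data4/data8 must be lineptr, loclist or macptr")
        return pointer_class
    raise ValueError(f"form 0x{form:02x} is outside this sketch")
```

    A consumer cannot apply this function as written, because it never
    sees the `relocated` flag; that limitation is exactly the point of the
    observation that follows.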

Observation: This is a nice conceptual distinction for a producer of DWARF:
to output/emit a constant you never need an associated relocation;
to output/emit a lineptr, loclist or macptr you must have an associated
relocation. Unfortunately, this does not directly help a consumer,
because a consumer has no way to ask "Is there (or was there) a
relocation associated with this DW_FORM_data4 or DW_FORM_data8 value?"
The consumer must use other context (like the using attribute kind)
to determine the interpretation -- but that is exactly what happens
today! Some italicized text along these lines would likely be helpful.


Two feet in the water
---------------------

Once we get the forms and classes properly defined, we can then exploit
the new vocabulary exactly as proposed by Weatherford -- the result will look
much as if his proposal 2 had been adopted. Better yet, we can fold in the 32-
vs 64-bit distinction in a clean way, perhaps something like this:

  o lineptr

    There are two forms of lineptrs: DW_FORM_data4 which is used always
    and only in the 32-bit DWARF file format, and DW_FORM_data8 which is
    used always and only in the 64-bit DWARF file format (see section 7.4).

Similarly for loclist and macptr.
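A sketch of how a producer might pick the form for any of the three
section-offset classes. It assumes a hypothetical `dwarf64` flag recording
which file format is being written and, for the byte encoding, a
little-endian target (an assumption: a real producer follows the target's
byte order). The function names are hypothetical.

```python
DW_FORM_data4 = 0x06
DW_FORM_data8 = 0x07

# The three section-offset classes and the section each points into.
POINTER_CLASSES = {"lineptr": ".debug_line", "loclist": ".debug_loc", "macptr": ".debug_macinfo"}


def pointer_form(pointer_class, dwarf64):
    """Form for a lineptr/loclist/macptr operand (hypothetical helper).

    Always DW_FORM_data4 in the 32-bit DWARF file format and always
    DW_FORM_data8 in the 64-bit one; the class name, not the form,
    says which section the offset points into.
    """
    if pointer_class not in POINTER_CLASSES:
        raise ValueError(f"not a section-offset class: {pointer_class}")
    return DW_FORM_data8 if dwarf64 else DW_FORM_data4


def encode_offset(offset, dwarf64, little_endian=True):
    """Encode a section offset at the width implied by the file format.

    int.to_bytes raises OverflowError if the offset does not fit in 4 bytes
    in the 32-bit format.
    """
    size = 8 if dwarf64 else 4
    return offset.to_bytes(size, "little" if little_endian else "big")
```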

All the rest just falls out, yes?


All the way into the water
--------------------------

This model also suggests an idiom/style for dealing with the 32- vs 64-bit
size issues in section 7 that feels much better than what we have now.

For example, section 7.5.1 re Compilation Unit Header currently (Draft 1)
reads in part (Draft 1A is slightly different but the differences do not
change this discussion):

    "The header for the series of debugging information entries contributed
    by a single compilation unit consists of the following information: 

     1. A 4-byte unsigned integer representing the length of the .debug_info
        contribution for that compilation unit, not including the length field
        itself. 

|       If the length field contains the distinguished value 0xffffffff,
|	then it is followed by an unsigned 8-byte unsigned integer that
|	gives the actual length. This encoding indicates that the section
|	is part of an 64-bit safe DWARF description.

|	In addition, distinguished values 0xffffff00 through 0xfffffffe
|	are reserved for future extension.

     2. A 2-byte unsigned integer representing the version of the DWARF
        information for that compilation unit. For DWARF Version 2, the
	value in this field is 2. 

     3. A 4-byte unsigned offset into the .debug_abbrev section. This offset
        associates the compilation unit with a particular set of debugging
        information entry abbreviations. 

|	In a 64-bit safe DWARF description, this field is an 8-byte
|	unsigned offset.

     4. A 1-byte unsigned integer representing the size in bytes of an address
        on the target architecture. If the system uses segmented addressing,
	this value represents the size of the offset portion of an address."

Note: "|" indicates text added for Draft 1, all the rest is unchanged from
V2.
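For reference, a minimal reader for the length field as the Draft 1 text
above describes it might look like this. `read_initial_length` is a
hypothetical name, and the little-endian default is an assumption. The
returned `dwarf64` flag is what then governs the size of later offset
fields such as the .debug_abbrev offset.

```python
import struct

DWARF64_ESCAPE = 0xffffffff
RESERVED_LOW = 0xffffff00   # 0xffffff00 through 0xfffffffe are reserved
RESERVED_HIGH = 0xfffffffe


def read_initial_length(buf, pos=0, endian="<"):
    """Decode a unit length field (hypothetical helper).

    Returns (length, dwarf64, next_pos); length excludes the length
    field itself, as the text above specifies.
    """
    (value,) = struct.unpack_from(endian + "I", buf, pos)
    pos += 4
    if value == DWARF64_ESCAPE:
        # 64-bit: the escape is followed by the real 8-byte length.
        (value,) = struct.unpack_from(endian + "Q", buf, pos)
        return value, True, pos + 8
    if RESERVED_LOW <= value <= RESERVED_HIGH:
        raise ValueError(f"reserved initial length value 0x{value:08x}")
    return value, False, pos
```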

This "style" has the slight advantage that it leaves the original V2 text
unchanged. It is a bit tacky, however, to tell a reader: "A foobar
is a mumble. [Period, full stop, sure sounds final and definitive to me.]
On the other hand, it could be a fratz."

My envisioned rewriting is:

    "The header for the series of debugging information entries contributed
    by a single compilation unit consists of the following information: 

|    1. A 4-byte or 12-byte value representing the length of the .debug_info
        contribution for that compilation unit, not including the length field
|       itself. In the 32-bit DWARF file format, this is a 4-byte unsigned
|	integer length (which must be less than 0xffffff00); in the 64-bit
|	DWARF file format, this is the 4-byte distinguished value 0xffffffff
|	followed by an 8-byte unsigned integer length (see section 7.4).

     2. A 2-byte unsigned integer representing the version of the DWARF
        information for that compilation unit. For DWARF Version 2, the
	value in this field is 2. 

|    3. A 4-byte or 8-byte unsigned offset into the .debug_abbrev section.
        This offset associates the compilation unit with a particular set
|	of debugging information entry abbreviations. In the 32-bit DWARF
|	file format, this is a 4-byte value; in the 64-bit DWARF file format,
|	this is an 8-byte value (see section 7.4).

     4. A 1-byte unsigned integer representing the size in bytes of an address
        on the target architecture. If the system uses segmented addressing,
	this value represents the size of the offset portion of an address."

Note: here "|" indicates changes relative to V2 or Draft 1, however you care
to think of it. [The "reservation" of values 0xffffff00 through 0xfffffffe
as escapes for future extension can and should be explained in 7.4 (per
000403.2) and not be repeated throughout the document as it is in Draft 1.]
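To see how the rewritten items fit together from a producer's point of
view, here is a sketch that emits just the header bytes in either file
format. The function name `cu_header` and its parameters are hypothetical,
little-endian byte order is assumed, and the DIEs themselves are
represented only by their length.

```python
import struct


def cu_header(die_bytes_len, abbrev_offset, address_size, dwarf64, version=2, endian="<"):
    """Build a compilation unit header following the rewritten text (hypothetical helper).

    Item 1: 4-byte length (32-bit format), or 0xffffffff + 8-byte length (64-bit).
    Item 2: 2-byte version.
    Item 3: 4- or 8-byte offset into .debug_abbrev.
    Item 4: 1-byte address size.
    """
    offset_size = 8 if dwarf64 else 4
    # The length covers items 2-4 plus the DIEs, but not the length field itself.
    unit_length = 2 + offset_size + 1 + die_bytes_len
    if dwarf64:
        length_field = struct.pack(endian + "IQ", 0xffffffff, unit_length)
        rest = struct.pack(endian + "HQB", version, abbrev_offset, address_size)
    else:
        if unit_length >= 0xffffff00:
            raise ValueError("too large for the 32-bit DWARF file format")
        length_field = struct.pack(endian + "I", unit_length)
        rest = struct.pack(endian + "HIB", version, abbrev_offset, address_size)
    return length_field + rest
```

The same code path handles both formats; only the width of the length
and offset fields changes, which is the uniformity the rewrite aims for.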


A Free Bonus Award
------------------

Think back to 000302.1, in which we changed the definition of a "constant"
class operand for a data member location from being an offset into the
location list section to being an immediate offset in the object. With
the above reformulation, we can have it both ways!

That is, the forms DW_FORM_data1, _data2, _sdata, and _udata (part but
not all of the possible FORMs of constant) can be defined as immediate offsets
into the object, while forms DW_FORM_data4 and DW_FORM_data8 can be
defined as being of class loclist (hence an offset in the .debug_loc section).
One way of conceptualizing this is to add something like the following
to the original description of these forms in part 1:

    When an attribute is described as allowing both a constant and one
    of the lineptr, loclist or macptr class operands, any occurrence
    of DW_FORM_data4 or DW_FORM_data8 is assumed to belong to the latter.

Then in Figure 18, we can have

    DW_AT_data_member_location	0x38	block, constant, loclist
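The precedence rule can be expressed as a small lookup that a consumer
might use when decoding an attribute value. It also illustrates the
consumer-side point from part One: the attribute, through its list of
allowed classes, decides the class, not the bits. `resolve_class` is
hypothetical, and the sketch assumes an attribute lists at most one of
lineptr, loclist or macptr.

```python
# DWARF Version 2 form codes (subset).
DW_FORM_block2 = 0x03
DW_FORM_block4 = 0x04
DW_FORM_data2 = 0x05
DW_FORM_data4 = 0x06
DW_FORM_data8 = 0x07
DW_FORM_block = 0x09
DW_FORM_block1 = 0x0a
DW_FORM_data1 = 0x0b
DW_FORM_sdata = 0x0d
DW_FORM_udata = 0x0f

BLOCK_FORMS = {DW_FORM_block1, DW_FORM_block2, DW_FORM_block4, DW_FORM_block}
CONSTANT_FORMS = {DW_FORM_data1, DW_FORM_data2, DW_FORM_data4,
                  DW_FORM_data8, DW_FORM_sdata, DW_FORM_udata}
POINTER_FORMS = {DW_FORM_data4, DW_FORM_data8}
POINTER_CLASSES = ("lineptr", "loclist", "macptr")


def resolve_class(form, allowed):
    """Class of `form` for an attribute whose Figure 18 entry lists `allowed`.

    When both constant and a pointer class are allowed, data4/data8 are
    taken to belong to the pointer class, per the proposed rule.
    """
    if form in BLOCK_FORMS and "block" in allowed:
        return "block"
    pointer = [c for c in POINTER_CLASSES if c in allowed]
    if form in POINTER_FORMS and pointer:
        return pointer[0]
    if form in CONSTANT_FORMS and "constant" in allowed:
        return "constant"
    raise ValueError(f"form 0x{form:02x} not allowed for this attribute")


# Allowed classes for DW_AT_data_member_location under the variation above.
DATA_MEMBER_LOCATION = {"block", "constant", "loclist"}
```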

This eliminates even the unlikely possibility of forcing an incompatible
change on some implementation that is currently supporting split lifetimes for
data members or of denying that possibility in the future. Such an
implementation must necessarily use DW_FORM_data4 (and in the 64-bit file
format, would have to use DW_FORM_data8) to refer to the .debug_loc section,
which leaves the other constant forms to be given some other interpretation.
Neat!

Of course, we as DWARF designers have to keep this in mind if and when we
want to specify any other attribute as allowing both a constant and a
lineptr, loclist or macptr class operand to make sure the "removal" of the
DW_FORM_data4 and DW_FORM_data8 forms from the constant class for that
operand does not pose a problem. (So far DW_AT_data_member_location is the
only case where this issue arises.) Since DW_FORM_sdata and DW_FORM_udata
are still available as constants and they can represent any compile-time
constant values that DW_FORM_data4 and DW_FORM_data8 can represent, this
is unlikely to ever be a problem.
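The last claim holds because DW_FORM_udata and DW_FORM_sdata use unsigned
and signed LEB128, which have no fixed width: any value that fits in 4 or
8 bytes can be encoded, at a cost of up to 10 bytes for a full 64-bit
value. A minimal encoder (function names hypothetical):

```python
def encode_uleb128(value):
    """Unsigned LEB128, the encoding used by DW_FORM_udata."""
    if value < 0:
        raise ValueError("unsigned value required")
    out = bytearray()
    while True:
        byte = value & 0x7f
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)          # final byte: high bit clear
            return bytes(out)


def encode_sleb128(value):
    """Signed LEB128, the encoding used by DW_FORM_sdata."""
    out = bytearray()
    while True:
        byte = value & 0x7f
        value >>= 7  # Python's >> on ints is an arithmetic shift
        # Stop once the remaining bits are pure sign extension of bit 6.
        done = (value == 0 and not byte & 0x40) or (value == -1 and byte & 0x40)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def decode_uleb128(data):
    """Inverse of encode_uleb128 for a single value."""
    result = shift = 0
    for byte in data:
        result |= (byte & 0x7f) << shift
        shift += 7
        if not byte & 0x80:
            break
    return result
```

For example, 128 encodes as the two bytes 0x80 0x01, and the largest
DW_FORM_data8 value, 2**64 - 1, encodes in 10 bytes.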

I am not actually proposing this variation of 000302.1 (yet). I mention it
because we *could* have specified 000302.1 to have this effect in the
first place -- but we (I) did not even consider it because the vocabulary and
concept foundation we had to work with did not make this an obvious
possibility. I think the reformulation above (part One in particular) does
provide the clarity and foundation we want and need.

