This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.
Re: dwarf4 Separate Type Unit Entries

From: Roland McGrath <roland at redhat dot com>
To: elfutils-devel at lists dot fedorahosted dot org
Date: Thu, 10 Mar 2011 11:27:11 -0800
Subject: Re: dwarf4 Separate Type Unit Entries
The libdw reader already supports type units and DW_FORM_ref_sig8,
since 0.148.  For iteration and direct lookup there is dwarf_next_unit
(generalization of dwarf_nextcu) and dwarf_offdie_types.
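
Among other header fields, dwarf_next_unit can report a type unit's type_signature and type_offset. To show where those values come from, here is a minimal sketch that decodes a 32-bit-format DWARF 4 .debug_types unit header from raw bytes. It assumes little-endian data and does not handle 64-bit DWARF. TypeUnitHeader and the helper functions are hypothetical illustrations, not libdw API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Fields of a 32-bit-format DWARF 4 type unit header (.debug_types).
struct TypeUnitHeader {
  uint32_t unit_length;     // bytes following this length field
  uint16_t version;         // 4 for DWARF 4
  uint32_t abbrev_offset;   // offset into .debug_abbrev
  uint8_t address_size;
  uint64_t type_signature;  // the value DW_FORM_ref_sig8 refers to
  uint32_t type_offset;     // from the start of this header to the primary type DIE
};

// Read an n-byte little-endian unsigned integer (assumes little-endian data).
static uint64_t read_le(const std::vector<uint8_t>& buf, size_t pos, size_t n) {
  uint64_t v = 0;
  for (size_t i = 0; i < n; ++i)
    v |= static_cast<uint64_t>(buf[pos + i]) << (8 * i);
  return v;
}

// Decode the header at `off`; nullopt for truncated data or 64-bit DWARF.
std::optional<TypeUnitHeader> decode_type_unit_header(const std::vector<uint8_t>& sec,
                                                      size_t off) {
  constexpr size_t kHeaderSize = 4 + 2 + 4 + 1 + 8 + 4;  // 23 bytes
  if (off + kHeaderSize > sec.size()) return std::nullopt;
  TypeUnitHeader h;
  h.unit_length = static_cast<uint32_t>(read_le(sec, off, 4));
  if (h.unit_length >= 0xfffffff0u) return std::nullopt;  // 64-bit or reserved: not handled
  h.version = static_cast<uint16_t>(read_le(sec, off + 4, 2));
  h.abbrev_offset = static_cast<uint32_t>(read_le(sec, off + 6, 4));
  h.address_size = sec[off + 10];
  h.type_signature = read_le(sec, off + 11, 8);
  h.type_offset = static_cast<uint32_t>(read_le(sec, off + 19, 4));
  return h;
}

// The next unit starts after the 4-byte length field plus unit_length bytes.
size_t next_unit_offset(size_t off, const TypeUnitHeader& h) {
  return off + 4 + h.unit_length;
}
```

Iterating all type units is then a loop of decode_type_unit_header and next_unit_offset until the end of the section, which is roughly what a dwarf_next_unit loop does for you.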

Any reference attribute can be encoded with DW_FORM_ref_sig8, and it
then means the single chosen DIE inside the DW_TAG_type_unit with that
signature (called the "primary type" in the specification wording).

There are several questions about .debug_types stuff that I have not
gotten around to asking on dwarf-discuss (or directly of Cary Coutant,
who invented it, or perhaps by grokking what GCC actually emits).  We
ought to figure out the answers to these, but at this point I am not
likely to drive that discussion myself.

1. What does DW_FORM_ref_addr mean inside .debug_types DIEs?

   I take it to mean it's always a pointer into .debug_info (that's
   how dwarf_formref_die implements it).  The other possible answer
   is that when it appears in .debug_types, it means an offset into
   .debug_types rather than into .debug_info--but that's not what the
   specification says.  So there is no way to have a reference from
   outside a type unit into a particular DIE in that type unit.  DIEs
   inside a type unit can use the CU-relative ref forms to point
   elsewhere in the same TU.  But the only way for another TU or any
   CU to point into a type unit is with ref_sig8.

2. What exactly is the plan for cases where a pointer into the middle
   of a type unit would seem to make sense?

   Consider:

	namespace ns
	{
	  class outer
	  {
	  public:
	    class inner
	    {
	      ...
	    };
	  };
	};

   Now somewhere in some scope in some CU, you have:

	ns::outer o;
	ns::outer::inner i;

   What is the DW_AT_type of each of those DW_TAG_variable entries?

   If there is only "o", it is straightforward enough.  You have:

	<variable name="o" type="#sig8_A"/>

   and then you have inside .debug_types:

	header: type_signature=A
		type_offset=0x123

	<type_unit>
	  <namespace name="ns">
0x123:	    <class_type name="outer">
	      <class_type name="inner">
	        ...
	      </class_type>
	    </class_type>
	  </namespace>
	</type_unit>

   To resolve that DW_AT_type, you see that it uses DW_FORM_ref_sig8,
   so you find the type unit header with type_signature A, look at the
   DIE that the header's type_offset points to, and there you have it.

   But what about "i"?  There is no way that a CU can point to any
   DIE inside that type_unit except for 0x123.

   My best guess is that you are supposed to emit two type units,
   one for "outer" and one for "inner", i.e.:

	header: type_signature=A
		type_offset=0x123

	<type_unit>
	  <namespace name="ns">
0x123:	    <class_type name="outer">
	      <class_type name="inner" declaration="1" signature="#sig8_B"/>
	    </class_type>
	  </namespace>
	</type_unit>

	header: type_signature=B
		type_offset=0x234

	<type_unit>
	  <namespace name="ns">
	    <class_type name="outer">
0x234:	      <class_type name="inner">
	        ...
	      </class_type>
	    </class_type>
	  </namespace>
	</type_unit>

   Then you can have:

	<variable name="o" type="#sig8_A"/>
	<variable name="i" type="#sig8_B"/>

3. Are type_units intended to be enumerated in the top level like
   compile_units are, e.g. for by-name searches?  Or are they meant
   only to be used by reference, like DW_TAG_partial_unit CUs?

   Say you are just trying to resolve user input of "ns::outer".
   You iterate through all the CUs looking for a type or namespace
   named "ns" that contains an entity named "outer".  Do you also
   look through all the TUs?  If not, then how do you find it?
   
4. Is DW_FORM_ref_sig8 valid for any reference?

   The specification describes DW_FORM_ref_sig8 as a type of
   reference that could appear in any attribute.  But perhaps
   the real intent is that it only appears in DW_AT_signature.
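
To make items 1, 2 and 4 concrete, here is a minimal sketch of how a reader might resolve a reference attribute under the reading given above: the unit-relative forms stay inside the same CU or TU, DW_FORM_ref_addr always points into .debug_info (as dwarf_formref_die does), and DW_FORM_ref_sig8 goes through a signature table and reaches only the type_offset DIE. The form codes are the DWARF 4 values; UnitContext, SignatureIndex and resolve_reference are hypothetical, loosely standing in for the hash table libdw keeps.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// DWARF 4 reference form codes.
enum Form : uint16_t {
  DW_FORM_ref_addr = 0x10,
  DW_FORM_ref1 = 0x11,
  DW_FORM_ref2 = 0x12,
  DW_FORM_ref4 = 0x13,
  DW_FORM_ref8 = 0x14,
  DW_FORM_ref_udata = 0x15,
  DW_FORM_ref_sig8 = 0x20,
};

enum class Section { debug_info, debug_types };

// Where a reference lands: a section and an absolute offset within it.
struct Target {
  Section section;
  uint64_t offset;
};

// The unit holding the DIE that carries the attribute (hypothetical).
struct UnitContext {
  Section section;
  uint64_t unit_offset;  // offset of that unit's header
};

// Hypothetical index: type signature -> TU header offset and type_offset.
struct TypeUnitEntry {
  uint64_t unit_offset;
  uint64_t type_offset;
};
using SignatureIndex = std::unordered_map<uint64_t, TypeUnitEntry>;

std::optional<Target> resolve_reference(Form form, uint64_t value,
                                        const UnitContext& unit,
                                        const SignatureIndex& sigs) {
  switch (form) {
    case DW_FORM_ref1: case DW_FORM_ref2: case DW_FORM_ref4:
    case DW_FORM_ref8: case DW_FORM_ref_udata:
      // Unit-relative: can only point elsewhere in the same CU or TU.
      return Target{unit.section, unit.unit_offset + value};
    case DW_FORM_ref_addr:
      // Reading taken above: always .debug_info, even inside .debug_types.
      return Target{Section::debug_info, value};
    case DW_FORM_ref_sig8: {
      // Only the primary type DIE of the matching type unit is reachable.
      auto it = sigs.find(value);
      if (it == sigs.end()) return std::nullopt;
      return Target{Section::debug_types,
                    it->second.unit_offset + it->second.type_offset};
    }
  }
  return std::nullopt;  // not a reference form
}
```

Note that nothing in this scheme can name "inner" in the single-TU layout: there is no form whose target is an arbitrary DIE inside another type unit.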

Perhaps the sensible answer is that a type_unit is really only
meant to be used as the referent of a DW_AT_signature attribute.
That is, each CU using a type would contain:

	<class_type declaration="1" signature="#sig8_A"/>

So that would mean that the CU is considered to contain "ns::outer"
and you'd find it by name (or enumeration) only when you scan that CU.
Should this class_type DIE have a DW_AT_name of "outer", then?  If it
does, does that mean it should be enclosed in a DW_TAG_namespace of
its own to make its scoped name right?  (It would seem to make more
sense if it appears on its own at top level and has no name, so that
its entire scope and name and all details come entirely from the
referent in the type_unit.)

Is it then that no DW_AT_type attribute should use DW_FORM_ref_sig8
directly, and instead they should point to the declaration="1" type
like above?

If that's not the answer, then must we iterate across all type units
as if they were first-class compile_units too, when looking for types?
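
Question 3 and the question just above come down to whether a by-name lookup such as "ns::outer" must also scan type units. Here is a hypothetical sketch over a toy DIE tree (not libdw) that walks the scopes of each CU and, only when asked, of each TU. With the TU scan turned off, a type that exists only inside a type unit is simply not found.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory DIE: a tag, a name and children.
struct Die {
  std::string tag;   // e.g. "namespace", "class_type"
  std::string name;  // empty if unnamed
  std::vector<Die> children;
};

// Match path[i], path[i+1], ... against nested scopes starting at `dies`.
static const Die* find_scoped(const std::vector<Die>& dies,
                              const std::vector<std::string>& path, size_t i) {
  for (const Die& d : dies) {
    if (d.name != path[i]) continue;
    if (i + 1 == path.size()) return &d;
    if (const Die* hit = find_scoped(d.children, path, i + 1)) return hit;
  }
  return nullptr;
}

// Search every CU's top-level DIEs, then every TU's if requested.
const Die* lookup_qualified(const std::vector<Die>& compile_units,
                            const std::vector<Die>& type_units,
                            const std::vector<std::string>& path,
                            bool search_type_units) {
  if (path.empty()) return nullptr;
  for (const Die& cu : compile_units)
    if (const Die* hit = find_scoped(cu.children, path, 0)) return hit;
  if (search_type_units)
    for (const Die& tu : type_units)
      if (const Die* hit = find_scoped(tu.children, path, 0)) return hit;
  return nullptr;
}
```

If instead every CU that uses the type carries a declaration stand-in, the CU scan alone would find it, which is the trade-off discussed above.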


Our dwarf_output algorithm today assumes that each compile_unit forms
a closed graph of references.  So, if any reference is a ref_sig8 then
it will bomb out because the tree walk never encountered the referent.
If any such referent must appear in the DW_AT_signature of some
declaration=1 entry inside the compile_unit, then we could handle
that and not really break our current model.  In essence, we would
treat that class_type above as a stand-in for the real class_type
inside the TU, similar to how we'd treat an imported_unit entry.  It's
not quite the same, since that entry stands for the whole tree inside
the type_unit (in the example, the "ns" DIE), while references to that
entry instead stand for references to the type_offset entry in the
type unit with matching signature (in the example, the "outer" DIE).

If a CU can contain DW_FORM_ref_sig8 attributes to a type that is not
anywhere in that CU's tree even by stand-in reference, then we have to
rethink our whole model somehow.  Note that in the example above, an
attribute pointing directly to "inner" is still OK, because by
recursive grafting of stand-in referents we'd find it inside the
"outer" tree that our own CU has a stand-in for.
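
The closed-graph condition above can be phrased as a reachability check: take the signatures of the CU's own declaration stand-ins as roots, follow any stand-ins nested inside the type units they graft (as "outer" carries one for "inner"), and require every DW_FORM_ref_sig8 in the CU to land in that set. This is a hypothetical sketch over summarized units, not code from dwarf_output.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical: TU signature -> signatures of declaration stand-ins
// (DW_AT_signature) nested inside that TU.
using TypeUnitStandIns = std::unordered_map<uint64_t, std::vector<uint64_t>>;

struct CompileUnitSummary {
  std::vector<uint64_t> stand_in_signatures;  // DW_AT_signature on declaration DIEs
  std::vector<uint64_t> ref_sig8_targets;     // every DW_FORM_ref_sig8 value used
};

// Signatures grafted into the CU, directly or recursively through TUs.
std::unordered_set<uint64_t> grafted_signatures(const CompileUnitSummary& cu,
                                                const TypeUnitStandIns& tus) {
  std::unordered_set<uint64_t> seen;
  std::deque<uint64_t> work(cu.stand_in_signatures.begin(),
                            cu.stand_in_signatures.end());
  while (!work.empty()) {
    uint64_t sig = work.front();
    work.pop_front();
    if (!seen.insert(sig).second) continue;  // already grafted
    auto it = tus.find(sig);
    if (it == tus.end()) continue;  // referent unknown: no nested stand-ins to follow
    for (uint64_t nested : it->second) work.push_back(nested);
  }
  return seen;
}

// True if every ref_sig8 in the CU points into some grafted tree.
bool is_closed(const CompileUnitSummary& cu, const TypeUnitStandIns& tus) {
  auto grafted = grafted_signatures(cu, tus);
  for (uint64_t target : cu.ref_sig8_targets)
    if (!grafted.count(target)) return false;
  return true;
}
```

With the two-TU layout from item 2, a CU holding only a stand-in for A may still reference B directly; a CU with no stand-ins at all fails the check, which is the case that would force rethinking the model.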


AFAICT, type units in final links are strictly inferior to imported
units.  To have attributes referring to inner types (which is certainly
common in C++) you have to have lots of type units being grafted
together, and each one repeats all the container DIEs of its scope.  To
look up DW_FORM_ref_sig8, you have to do a dynamic search across all
type units (in libdw, it's implemented with a hash table).  Doing it
instead with a partial_unit, you only have the unit header overhead for
the largest granularity of sharing since ref_addr can point directly to
a DIE at any depth inside a partial_unit.  To follow ref_addr you have a
zero-cost direct pointer rather than a search.
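
The overhead claim can be illustrated by counting DIEs for the ns::outer::inner example under the one-type-unit-per-class layout guessed in item 2, versus a single partial_unit. This is a toy cost model under stated assumptions: only scope DIEs are counted (members are ignored), each class's TU holds the unit DIE, a copy of every enclosing scope, the class, and a declaration stub per nested class. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scope tree: namespaces and classes only.
struct Scope {
  std::string name;
  bool is_class;
  std::vector<Scope> children;
};

struct DieCounts {
  size_t units = 0;  // unit headers
  size_t dies = 0;   // DIEs, including each unit's top-level DIE
};

// One TU per class: unit DIE + `depth` container copies + the class + stubs.
static void count_type_units(const Scope& s, size_t depth, DieCounts& out) {
  if (s.is_class) {
    size_t stubs = 0;
    for (const Scope& c : s.children) stubs += c.is_class ? 1 : 0;
    out.units += 1;
    out.dies += 1 + depth + 1 + stubs;
  }
  for (const Scope& c : s.children) count_type_units(c, depth + 1, out);
}

DieCounts type_unit_cost(const std::vector<Scope>& top) {
  DieCounts out;
  for (const Scope& s : top) count_type_units(s, 0, out);
  return out;
}

static size_t count_scopes(const Scope& s) {
  size_t n = 1;
  for (const Scope& c : s.children) n += count_scopes(c);
  return n;
}

// One partial_unit holding each scope once; ref_addr can reach any depth.
DieCounts partial_unit_cost(const std::vector<Scope>& top) {
  DieCounts out;
  out.units = 1;
  out.dies = 1;  // the partial_unit DIE itself
  for (const Scope& s : top) out.dies += count_scopes(s);
  return out;
}
```

For the example tree this gives two type units with 8 DIEs in total, against one partial_unit with 4 DIEs; the gap grows with nesting depth because every TU repeats its whole chain of containers.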

The benefit of type units is at link time and before, when they are
produced as COMDAT section groups with group names derived from the
signatures so that the linker (with no DWARF knowledge) consolidates the
duplicates by the vanilla COMDAT logic.  But, frankly, I don't
understand why .debug_types was introduced at all.  This COMDAT plan was
the original intent of partial_unit/imported_unit as well.  I don't see
why the compiler doesn't just produce a partial_unit in a COMDAT group
with a name derived by the same "signature" algorithm and imported_unit
grafts referring to that (both the imported_unit's import attribute and
any other references from outside CUs would use ref_addr values with
relocations to named symbols, those names being similarly chosen by a
signature-hashing algorithm--since using section-relative relocations
implies the unsafe presumption that every separate compiler run that
produced DWARF under the same COMDAT group name produced completely
identical data with exactly the same byte offsets of everything inside).
That's exactly what Appendix E describes, albeit it talks about
different schemes for coming up with the group name/signature than the
type-oriented one.
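
The COMDAT argument rests on the linker discarding duplicate section groups by name alone, with no DWARF knowledge. A minimal sketch of that rule follows; the group-naming scheme (a fixed prefix plus the 64-bit signature in hex) is hypothetical, and real compilers choose their own names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_set>
#include <vector>

struct SectionGroup {
  std::string name;      // COMDAT group name
  std::string contents;  // opaque to the linker
};

// Hypothetical naming scheme: prefix plus the signature in hex.
std::string group_name_for(uint64_t signature) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "debug_types.%016llx",
                static_cast<unsigned long long>(signature));
  return buf;
}

// Vanilla COMDAT rule: keep the first group with a given name, drop the rest.
std::vector<SectionGroup> link_comdat(const std::vector<SectionGroup>& inputs) {
  std::unordered_set<std::string> kept_names;
  std::vector<SectionGroup> out;
  for (const SectionGroup& g : inputs)
    if (kept_names.insert(g.name).second) out.push_back(g);
  return out;
}
```

The linker keeps whichever copy it sees first without comparing contents, which is why section-relative relocations into such a group presume byte-identical output from every compiler run, as noted above.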

So I think we want a writer that never produces .debug_types at all.
Instead, compression can just read references to type units as if they
were references to the stand-ins where the primary type inside a type
unit is grafted.  When writing, we produce partial_unit for the largest
granularity of true sharing and imported_unit for grafts.
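
On the writing side, part of that plan amounts to replacing each DW_FORM_ref_sig8 with a DW_FORM_ref_addr to wherever the primary type was placed inside an emitted partial_unit. A minimal sketch of just that rewriting step, assuming the writer already knows the .debug_info offset of each emitted primary type; AttrValue and PrimaryTypeOffsets are hypothetical, and relocations and the partial_unit/imported_unit emission itself are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// DWARF 4 form codes used here.
constexpr uint16_t DW_FORM_ref_addr = 0x10;
constexpr uint16_t DW_FORM_ref_sig8 = 0x20;

// Hypothetical attribute value as held by a writer.
struct AttrValue {
  uint16_t form;
  uint64_t value;
};

// Hypothetical: signature -> .debug_info offset of the primary type DIE
// as emitted inside some partial_unit.
using PrimaryTypeOffsets = std::unordered_map<uint64_t, uint64_t>;

// Turn every ref_sig8 into a ref_addr to the grafted copy; others pass through.
void rewrite_sig8_refs(std::vector<AttrValue>& attrs, const PrimaryTypeOffsets& where) {
  for (AttrValue& a : attrs) {
    if (a.form != DW_FORM_ref_sig8) continue;
    auto it = where.find(a.value);
    if (it == where.end()) throw std::runtime_error("ref_sig8 to unknown type unit");
    a.form = DW_FORM_ref_addr;
    a.value = it->second;
  }
}
```

Following the rewritten reference is then a direct offset rather than a signature search.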

But perhaps I am missing something about .debug_types that makes it
actually worthwhile.


Thanks,
Roland