This is the mail archive of the dwarf2@corp.sgi.com mailing list for the dwarf2 project.



Re: duplicate dwarf2 reduction via comdat [overlooked CC to dwarf2 list]


>>>>> "Ron" == Ron 603-884-2088 <brender@gemevn.zko.dec.com> writes:

>> The basic idea of my scheme is to separate chunks of potentially duplicated
>> information out into their own COMDAT sections, to be recombined into the
>> monolithic .debug_info section by the linker.  No attempt is currently made
>> to optimize the other DWARF sections, and we currently only do this for
>> header files.
>> 
>> Each header file, then, is given its own compilation unit, in addition to
>> the primary compilation unit corresponding to the primary source file.
>> References between these compilation units use FORM_ref_addr, of course.
>> 
>> The CU for the header file contains only the "interface" parts of the
>> header, namely types.  The "implementation" parts, i.e. anything with a
>> location attribute, remain in the primary CU, since they correspond to
>> actual code in the object file.
>> 
>> For consistency and collision protection, the COMDAT key for a particular
>> compilation unit is generated from the basename of the header file and a
>> checksum of the contents.  The (global) symbols used for references to DIEs
>> in these CUs are composed of this key and a sequence number.  References

> I'm with ya up to this point, but the next loses me...

>> from the header CUs to the primary CU use internal symbols; there is no
>> need for them to be consistent between two CUs for the same header, so long
>> as they refer to (semantically) the same thing.

> What do you mean by "internal symbols"?

In the sense that they are local to a single .o.  ".L1234" and the like.

> The "no need for consistency" assertion just doesn't seem obvious.
> Is there some way to explain this in more detail?

The symbols exported from a COMDAT CU must be consistent between multiple
instances, so that they are interchangeable.  If wa.h defines a class A,
and the DIE for A uses the symbol _DW.wa.h.92485121.4, other COMDAT CUs
for wa.h must use the same symbol if they are to be combined by the linker.
(Of course, different macro definitions can produce different information
that should not be combined; this will be handled by the checksum.)
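
To make the naming concrete, here is a rough sketch in Python rather than
the actual compiler code.  The function names are made up for illustration,
and CRC-32 is only an assumed stand-in for whatever checksum the compiler
really computes; the section and symbol shapes follow the wa.s listing
further down.

```python
import os
import zlib


def comdat_key(header_path: str, contents: bytes) -> str:
    """Hypothetical key: header basename plus a checksum of the contents."""
    base = os.path.basename(header_path)
    # CRC-32 is an assumption; any stable checksum would serve the same purpose.
    checksum = zlib.crc32(contents) & 0xFFFFFFFF
    return f"{base}.{checksum:08x}"


def section_name(key: str) -> str:
    """Link-once section that holds the header's compilation unit."""
    return f".gnu.linkonce.wi.{key}"


def die_symbol(key: str, seq: int) -> str:
    """Global symbol for one DIE: the key plus a sequence number."""
    return f"DW.{key}.{seq}"
```

Two compilations that see the same contents for wa.h get the same key and
therefore the same symbols, whatever directory the header was found in;
a change in the contents changes the key, so the two CUs are kept apart.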

The symbols that a particular CU references need not be consistent, so
long as they *refer* to the same thing.  For instance, my current
implementation leaves the base types in the primary CU.  So if in one
compilation, my CU for wa.h refers to the DIE for int using .LDIE0 and in
another compilation uses .LDIE25, this does not matter; the CUs are still
equivalent, and can be combined at link time.
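
A toy model of the link-time combination may help here.  This is a sketch
under my own assumptions, not what any real linker does: the ComdatCU
record and combine() are hypothetical, and a real linker simply keeps one
section per key without checking anything.  The check below is only there
to show which part has to match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComdatCU:
    key: str           # COMDAT key, e.g. "wa.h.92485121"
    exported: tuple    # global DIE symbols this CU defines
    referenced: tuple  # symbols this CU refers to; may be local to one .o
    origin: str        # object file the CU came from


def combine(cus):
    """Keep the first CU seen for each key, discarding later duplicates."""
    kept = {}
    for cu in cus:
        first = kept.setdefault(cu.key, cu)
        # Exported symbols must agree so that the copies are interchangeable.
        if first.exported != cu.exported:
            raise ValueError(
                f"{cu.key}: exported symbols differ between "
                f"{first.origin} and {cu.origin}"
            )
        # The referenced symbols are deliberately not compared.
    return list(kept.values())
```

Feeding it one CU for wa.h that refers to .LDIE0 and another that refers
to .LDIE25 yields a single CU; only a mismatch in the exported symbols is
treated as an error.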

>> There are some issues with the current draft that make this less effective
>> than it could be.  For one, 3.3.8.3 says that a concrete out-of-line
>> instance of an inline function needs to be owned by the same parent as the
>> abstract instance, which prevents us from putting them in different CUs.
>> Does anyone know what the rationale for this rule is?  It seems entirely
>> arbitrary to me.

> I can't think of a rationale either at the moment...

>> It would also be nice to be able to provide debugging info for an abstract
>> version of a template, to reduce the redundancy between instantiations.
>> 
>> Also, it should be possible to do the AT_declaration/AT_specification thing
>> with nested types; if a nested type is only defined in the implementation
>> .cc file for a class, the compiler should be able to put its definition at
>> file scope.  This also would remove the necessity for going back and
>> modifying previously generated information; nested types are the downfall
>> of the gcc dwarf1 generator, which tries hard to write everything out
>> immediately and forget about it.

> By "nested types" do you mean things like classes within classes?

Yes.

> It sounds like there is a sort of "if the source (or DWARF) looks like
> this, transform it to this other form" proposal here. An example sure
> would help.

Given code like

  struct A {
    struct N;
    ...
  };

  ....

  struct A::N { ... };

We should be able to generate DWARF like

  1: DW_TAG_structure_type "A"
    2: DW_TAG_structure_type "N"
      DW_AT_declaration 0x1
  ....
  DW_TAG_structure_type
    DW_AT_specification 2
    ...

rather than going back and changing DIE #2 when we see the later definition.
This is particularly important if the definitions of A and A::N are in
different files.  Make sense?
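
In generator terms, the point is that the DIE list can stay append-only.
A minimal sketch, assuming a made-up DieWriter class (nothing like the
real gcc data structures) that numbers DIEs by their position in the list:

```python
import copy


class DieWriter:
    """Toy append-only DIE list; DIE numbers are 1-based positions."""

    def __init__(self):
        self.dies = []

    def emit(self, tag, parent=None, **attrs):
        self.dies.append({"tag": tag, "parent": parent, "attrs": attrs})
        return len(self.dies)


def emit_nested_example():
    w = DieWriter()
    a = w.emit("DW_TAG_structure_type", DW_AT_name="A")
    n_decl = w.emit("DW_TAG_structure_type", parent=a,
                    DW_AT_name="N", DW_AT_declaration=1)
    before = copy.deepcopy(w.dies)
    # Later, on seeing "struct A::N { ... };", emit the definition at file
    # scope and point back at the declaration instead of editing DIE #2.
    n_def = w.emit("DW_TAG_structure_type", DW_AT_specification=n_decl)
    assert w.dies[:len(before)] == before  # earlier DIEs are untouched
    return w, n_decl, n_def
```

The definition ends up as a third, parentless DIE whose only link to A::N
is the DW_AT_specification reference.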

>> This scheme could be further extended by putting the information for
>> COMDATted code into a separate CU in the same COMDAT group with the
>> function itself.

> I'm not up to speed enough to make yet another generalization...

>> Thoughts?  Questions?

> Only "help me understand" sorts of questions (see above) at the moment.
> I think some kind of more concrete example would help a lot.

Here's a pretty minimal example.

wa.h:
struct A {
  int i;
};

wa.C:
#include "wa.h"

wa.s (abridged):
...
	.section	.gnu.linkonce.wi.wa.h.92485121
	.4byte	0x16b	# Length of Compilation Unit Info.
	.2byte	0x2	# DWARF version number
	.4byte	.Ldebug_abbrev0	# Offset Into Abbrev. Section
	.byte	0x4	# Pointer Size (in bytes)
	.byte	0x1	# ULEB128 0x1 (DIE (0xb) DW_TAG_compile_unit)
	.byte	0x4	# DW_AT_language
	.ascii "GNU C++ 2.97 20010119 (experimental)\0"	# DW_AT_producer
	.ascii "/home/jason/gtt\0"	# DW_AT_comp_dir
	.ascii "wa.h\0"	# DW_AT_name
...
.globl DW.wa.h.92485121.4
DW.wa.h.92485121.4:
	.byte	0x5	# ULEB128 0x5 (DIE (0x5e) DW_TAG_structure_type)
	.ascii "A\0"	# DW_AT_name
	.byte	0x4	# DW_AT_byte_size
	.byte	0x1	# DW_AT_decl_file
	.byte	0x1	# DW_AT_decl_line
	.byte	0x6	# ULEB128 0x6 (DIE (0x64) DW_TAG_member)
	.ascii "i\0"	# DW_AT_name
	.byte	0x1	# DW_AT_decl_file
	.byte	0x2	# DW_AT_decl_line
	.4byte	.LDIE0	# DW_AT_type
	.byte	0x2	# DW_AT_data_member_location
	.byte	0x23	# DW_OP_plus_uconst
	.byte	0x0	# ULEB128 0x0
...
	.byte	0x0	# end of children of DIE 0x5e
	.byte	0x0	# end of children of DIE 0xb

	.section	.debug_info
	.4byte	0xdc	# Length of Compilation Unit Info.
	.2byte	0x2	# DWARF version number
	.4byte	.Ldebug_abbrev0	# Offset Into Abbrev. Section
	.byte	0x4	# Pointer Size (in bytes)
	.byte	0xc	# ULEB128 0xc (DIE (0xb) DW_TAG_compile_unit)
	.ascii "wa.C\0"	# DW_AT_name
	.ascii "/home/jason/gtt\0"	# DW_AT_comp_dir
	.ascii "GNU C++ 2.97 20010119 (experimental)\0"	# DW_AT_producer
	.byte	0x4	# DW_AT_language
.LDIE0:
	.byte	0xd	# ULEB128 0xd (DIE (0x47) DW_TAG_base_type)
	.ascii "int\0"	# DW_AT_name
	.byte	0x4	# DW_AT_byte_size
	.byte	0x5	# DW_AT_encoding
...
	.byte	0x0	# end of children of DIE 0xb

	.section	.debug_abbrev
	.byte	0x1	# ULEB128 0x1 (abbrev code)
	.byte	0x11	# ULEB128 0x11 (TAG: DW_TAG_compile_unit)
	.byte	0x1	# DW_children_yes
	.byte	0x13	# ULEB128 0x13 (DW_AT_language)
	.byte	0xb	# ULEB128 0xb (DW_FORM_data1)
	.byte	0x25	# ULEB128 0x25 (DW_AT_producer)
	.byte	0x8	# ULEB128 0x8 (DW_FORM_string)
	.byte	0x1b	# ULEB128 0x1b (DW_AT_comp_dir)
	.byte	0x8	# ULEB128 0x8 (DW_FORM_string)
	.byte	0x3	# ULEB128 0x3 (DW_AT_name)
	.byte	0x8	# ULEB128 0x8 (DW_FORM_string)
	.byte	0,0
...
	.byte	0x5	# ULEB128 0x5 (abbrev code)
	.byte	0x13	# ULEB128 0x13 (TAG: DW_TAG_structure_type)
	.byte	0x1	# DW_children_yes
	.byte	0x3	# ULEB128 0x3 (DW_AT_name)
	.byte	0x8	# ULEB128 0x8 (DW_FORM_string)
	.byte	0xb	# ULEB128 0xb (DW_AT_byte_size)
	.byte	0xb	# ULEB128 0xb (DW_FORM_data1)
	.byte	0x3a	# ULEB128 0x3a (DW_AT_decl_file)
	.byte	0xb	# ULEB128 0xb (DW_FORM_data1)
	.byte	0x3b	# ULEB128 0x3b (DW_AT_decl_line)
	.byte	0xb	# ULEB128 0xb (DW_FORM_data1)
	.byte	0,0
	.byte	0x6	# ULEB128 0x6 (abbrev code)
	.byte	0xd	# ULEB128 0xd (TAG: DW_TAG_member)
	.byte	0x0	# DW_children_no
	.byte	0x3	# ULEB128 0x3 (DW_AT_name)
	.byte	0x8	# ULEB128 0x8 (DW_FORM_string)
	.byte	0x3a	# ULEB128 0x3a (DW_AT_decl_file)
	.byte	0xb	# ULEB128 0xb (DW_FORM_data1)
	.byte	0x3b	# ULEB128 0x3b (DW_AT_decl_line)
	.byte	0xb	# ULEB128 0xb (DW_FORM_data1)
	.byte	0x49	# ULEB128 0x49 (DW_AT_type)
	.byte	0x10	# ULEB128 0x10 (DW_FORM_ref_addr)
	.byte	0x38	# ULEB128 0x38 (DW_AT_data_member_location)
	.byte	0xa	# ULEB128 0xa (DW_FORM_block1)
	.byte	0,0
...
	.byte	0xc	# ULEB128 0xc (abbrev code)
	.byte	0x11	# ULEB128 0x11 (TAG: DW_TAG_compile_unit)
	.byte	0x1	# DW_children_yes
	.byte	0x3	# ULEB128 0x3 (DW_AT_name)
	.byte	0x8	# ULEB128 0x8 (DW_FORM_string)
	.byte	0x1b	# ULEB128 0x1b (DW_AT_comp_dir)
	.byte	0x8	# ULEB128 0x8 (DW_FORM_string)
	.byte	0x25	# ULEB128 0x25 (DW_AT_producer)
	.byte	0x8	# ULEB128 0x8 (DW_FORM_string)
	.byte	0x13	# ULEB128 0x13 (DW_AT_language)
	.byte	0xb	# ULEB128 0xb (DW_FORM_data1)
	.byte	0,0
	.byte	0xd	# ULEB128 0xd (abbrev code)
	.byte	0x24	# ULEB128 0x24 (TAG: DW_TAG_base_type)
	.byte	0x0	# DW_children_no
	.byte	0x3	# ULEB128 0x3 (DW_AT_name)
	.byte	0x8	# ULEB128 0x8 (DW_FORM_string)
	.byte	0xb	# ULEB128 0xb (DW_AT_byte_size)
	.byte	0xb	# ULEB128 0xb (DW_FORM_data1)
	.byte	0x3e	# ULEB128 0x3e (DW_AT_encoding)
	.byte	0xb	# ULEB128 0xb (DW_FORM_data1)
	.byte	0,0
...
	.byte	0
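
For anyone who wants to check the .debug_abbrev bytes above by hand, here
is a small decoding sketch.  The helper names are mine, and it handles
only what the listing uses: ULEB128 numbers and one abbreviation
declaration at a time.  MEMBER_ABBREV is the run of bytes for abbrev
code 0x6 (DW_TAG_member) copied from the listing.

```python
def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def read_abbrev(data, pos):
    """Decode one abbreviation declaration starting at pos."""
    code, pos = read_uleb128(data, pos)
    tag, pos = read_uleb128(data, pos)
    has_children = data[pos] == 1  # DW_children_yes is 1, DW_children_no is 0
    pos += 1
    attrs = []
    while True:
        name, pos = read_uleb128(data, pos)
        form, pos = read_uleb128(data, pos)
        if name == 0 and form == 0:  # the ".byte 0,0" terminator
            break
        attrs.append((name, form))
    return {"code": code, "tag": tag,
            "children": has_children, "attrs": attrs}, pos


# Abbrev code 0x6 (DW_TAG_member), bytes as they appear in the listing.
MEMBER_ABBREV = bytes([0x6, 0xd, 0x0, 0x3, 0x8, 0x3a, 0xb, 0x3b, 0xb,
                       0x49, 0x10, 0x38, 0xa, 0, 0])
```

Calling read_abbrev(MEMBER_ABBREV, 0) gives back the five attribute/form
pairs, including (0x49, 0x10), i.e. DW_AT_type with DW_FORM_ref_addr,
which is why the member's type is emitted as a 4-byte reference to .LDIE0.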
