This is the mail archive of the dwarf2@corp.sgi.com mailing list for the dwarf2 project.
Re: duplicate dwarf2 reduction via comdat [overlooked CC to dwarf2 list]
- To: brender at gemevn dot zko dot dec dot com (Ron 603-884-2088)
- Subject: Re: duplicate dwarf2 reduction via comdat [overlooked CC to dwarf2 list]
- From: Jason Merrill <jason at redhat dot com>
- Date: 22 Jan 2001 18:02:13 +0000
- Cc: DWARF2 at corp dot sgi dot com
- References: <01011917394020@gemevn.zko.dec.com>
- Reply-To: Jason Merrill <jason at redhat dot com>
>>>>> "Ron" == Ron 603-884-2088 <brender@gemevn.zko.dec.com> writes:
>> The basic idea of my scheme is to separate chunks of potentially duplicated
>> information out into their own COMDAT sections, to be recombined into the
>> monolithic .debug_info section by the linker. No attempt is currently made
>> to optimize the other DWARF sections, and we currently only do this for
>> header files.
>>
>> Each header file, then, is given its own compilation unit, in addition to
>> the primary compilation unit corresponding to the primary source file.
>> References between these compilation units use FORM_ref_addr, of course.
>>
>> The CU for the header file contains only the "interface" parts of the
>> header, namely types. The "implementation" parts, i.e. anything with a
>> location attribute, remain in the primary CU, since they correspond to
>> actual code in the object file.
>>
>> For consistency and collision protection, the COMDAT key for a particular
>> compilation unit is generated from the basename of the header file and a
>> checksum of the contents. The (global) symbols used for references to DIEs
>> in these CUs are composed of this key and a sequence number. References
> I'm with ya up to this point, but the next loses me...
>> from the header CUs to the primary CU use internal symbols; there is no
>> need for them to be consistent between two CUs for the same header, so long
>> as they refer to (semantically) the same thing.
> What do you mean by "internal symbols"?
In the sense that they are local to a single .o. ".L1234" and the like.
> The "no need for consistency" assertion just doesn't seem obvious.
> Is there some way to explain this in more detail?
The symbols exported from a COMDAT CU must be consistent between multiple
instances, so that they are interchangeable. If wa.h defines a class A,
and the DIE for A uses the symbol DW.wa.h.92485121.4, other COMDAT CUs
for wa.h must use the same symbol if they are to be combined by the linker
(of course, different macro definitions can produce different information
that should not be combined; this will be handled by the checksum).
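To make the naming scheme concrete, here is a minimal sketch of how such a key and symbol could be built. It is not the actual compiler code. The checksum function is an assumption (32-bit FNV-1a is used purely as a stand-in, since the message does not say which checksum is used), the helper names are hypothetical, and the "DW." prefix follows the assembly listing later in this message.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in checksum (32-bit FNV-1a). ASSUMPTION: the message does not say
// which checksum the compiler uses; any stable function of the header's
// debug contents would serve the same purpose in this sketch.
std::uint32_t checksum(const std::string& contents) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : contents) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Strip any directory part: "/usr/include/wa.h" -> "wa.h".
std::string basename_of(const std::string& path) {
    std::string::size_type slash = path.find_last_of('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// COMDAT key: basename of the header plus a checksum of its contents.
// Different macro settings change the contents, hence the key.
std::string comdat_key(const std::string& header_path,
                       const std::string& contents) {
    return basename_of(header_path) + "." + std::to_string(checksum(contents));
}

// Global symbol for an exported DIE: the key plus a sequence number.
std::string die_symbol(const std::string& key, unsigned seq) {
    return "DW." + key + "." + std::to_string(seq);
}
```

With these helpers, die_symbol("wa.h.92485121", 4) yields the same spelling as the label in the listing, and two compilations that see identical contents for wa.h get the same key regardless of the directory the header was found in.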
The symbols that a particular CU references need not be consistent, so
long as they *refer* to the same thing. For instance, my current
implementation leaves the base types in the primary CU. So if in one
compilation, my CU for wa.h refers to the DIE for int using .LDIE0 and in
another compilation uses .LDIE25, this does not matter; the CUs are still
equivalent, and can be combined at link time.
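The argument above can be sketched as a small model. This is only an illustration under stated assumptions: HeaderCU, LabelTable and the function names are hypothetical, and a real linker works on sections and relocations rather than on strings. The point it shows is that instances are compared by what their local references denote, not by how the labels are spelled, and that one instance per COMDAT key survives.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one header CU as it appears in a single .o file.
struct HeaderCU {
    std::string key;                      // COMDAT key, e.g. "wa.h.92485121"
    std::vector<std::string> local_refs;  // local labels it uses, e.g. ".LDIE0"
};

// Per-object table: what each local label denotes in that object's primary CU.
using LabelTable = std::map<std::string, std::string>;

// Replace each local label by the entity it denotes in its own object file.
// Throws std::out_of_range if a label is missing from the table.
std::vector<std::string> resolve(const HeaderCU& cu, const LabelTable& labels) {
    std::vector<std::string> out;
    for (const std::string& label : cu.local_refs)
        out.push_back(labels.at(label));
    return out;
}

// Two instances are interchangeable if their keys match and their references
// denote the same entities, whatever the labels happen to be called.
bool interchangeable(const HeaderCU& a, const LabelTable& la,
                     const HeaderCU& b, const LabelTable& lb) {
    return a.key == b.key && resolve(a, la) == resolve(b, lb);
}

// Linker-style selection: keep the first CU seen for each COMDAT key.
std::vector<HeaderCU> keep_one_per_key(const std::vector<HeaderCU>& all) {
    std::set<std::string> seen;
    std::vector<HeaderCU> kept;
    for (const HeaderCU& cu : all)
        if (seen.insert(cu.key).second)
            kept.push_back(cu);
    return kept;
}
```

In this model a CU for wa.h that refers to int as .LDIE0 and another that refers to it as .LDIE25 compare as interchangeable, provided each label denotes the int base type in its own object file.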
>> There are some issues with the current draft that make this less effective
>> than it could be. For one, 3.3.8.3 says that a concrete out-of-line
>> instance of an inline function needs to be owned by the same parent as the
>> abstract instance, which prevents us from putting them in different CUs.
>> Does anyone know what the rationale for this rule is? It seems entirely
>> arbitrary to me.
> I can't think of a rationale either at the moment...
>> It would also be nice to be able to provide debugging info for an abstract
>> version of a template, to reduce the redundancy between instantiations.
>>
>> Also, it should be possible to do the AT_declaration/AT_specification thing
>> with nested types; if a nested type is only defined in the implementation
>> .cc file for a class, the compiler should be able to put its definition at
>> file scope. This also would remove the necessity for going back and
>> modifying previously generated information; nested types are the downfall
>> of the gcc dwarf1 generator, which tries hard to write everything out
>> immediately and forget about it.
> By "nested types" do you mean things like classes within classes?
Yes.
> It sounds like there is a sort of "if the source (or DWARF) looks like
> this, transform it to this other form" proposal here. An example sure
> would help.
Given code like
  struct A {
    struct N;
    ...
  };
  ....
  struct A::N { ... };
We should be able to generate DWARF like
  1: DW_TAG_structure_type "A"
  2:   DW_TAG_structure_type "N"
         DW_AT_declaration 0x1
  ....
     DW_TAG_structure_type
       DW_AT_specification 2
       ...
rather than going back and changing DIE #2 when we see the later definition.
This is particularly important if the definitions of A and A::N are in
different files. Make sense?
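For reference, a compilable instance of that source pattern might look like the following sketch. The members next and value and the function read_value are hypothetical, added only so the example does something. The in-class declaration of N is the part that would stay as a declaration DIE, and the later definition is the part that could be emitted at file scope with a specification reference back to it.

```cpp
#include <cassert>

// Interface part, as it might appear in a header: A::N is only declared.
struct A {
    struct N;   // declaration only (the DIE carrying DW_AT_declaration)
    N* next;    // hypothetical member; a pointer to an incomplete type is fine
};

// Implementation part, possibly in a different file: the definition of A::N
// (the DIE that would carry DW_AT_specification).
struct A::N {
    int value;  // hypothetical member
};

// Only code that sees the definition of A::N can look inside one.
int read_value(const A& a) {
    return a.next->value;
}
```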
>> This scheme could be further extended by putting the information for
>> COMDATted code into a separate CU in the same COMDAT group with the
>> function itself.
> I'm not up to speed enough to make yet another generalization...
>> Thoughts? Questions?
> Completely "help me understand" sorts of questions (see above) at the moment.
> I think some kind of more concrete example would help a lot.
Here's a pretty minimal example.
wa.h:
struct A {
int i;
};
wa.C:
#include "wa.h"
wa.s (abridged):
...
.section .gnu.linkonce.wi.wa.h.92485121
.4byte 0x16b # Length of Compilation Unit Info.
.2byte 0x2 # DWARF version number
.4byte .Ldebug_abbrev0 # Offset Into Abbrev. Section
.byte 0x4 # Pointer Size (in bytes)
.byte 0x1 # ULEB128 0x1 (DIE (0xb) DW_TAG_compile_unit)
.byte 0x4 # DW_AT_language
.ascii "GNU C++ 2.97 20010119 (experimental)\0" # DW_AT_producer
.ascii "/home/jason/gtt\0" # DW_AT_comp_dir
.ascii "wa.h\0" # DW_AT_name
...
.globl DW.wa.h.92485121.4
DW.wa.h.92485121.4:
.byte 0x5 # ULEB128 0x5 (DIE (0x5e) DW_TAG_structure_type)
.ascii "A\0" # DW_AT_name
.byte 0x4 # DW_AT_byte_size
.byte 0x1 # DW_AT_decl_file
.byte 0x1 # DW_AT_decl_line
.byte 0x6 # ULEB128 0x6 (DIE (0x64) DW_TAG_member)
.ascii "i\0" # DW_AT_name
.byte 0x1 # DW_AT_decl_file
.byte 0x2 # DW_AT_decl_line
.4byte .LDIE0 # DW_AT_type
.byte 0x2 # DW_AT_data_member_location
.byte 0x23 # DW_OP_plus_uconst
.byte 0x0 # ULEB128 0x0
...
.byte 0x0 # end of children of DIE 0x5e
.byte 0x0 # end of children of DIE 0xb
.section .debug_info
.4byte 0xdc # Length of Compilation Unit Info.
.2byte 0x2 # DWARF version number
.4byte .Ldebug_abbrev0 # Offset Into Abbrev. Section
.byte 0x4 # Pointer Size (in bytes)
.byte 0xc # ULEB128 0xc (DIE (0xb) DW_TAG_compile_unit)
.ascii "wa.C\0" # DW_AT_name
.ascii "/home/jason/gtt\0" # DW_AT_comp_dir
.ascii "GNU C++ 2.97 20010119 (experimental)\0" # DW_AT_producer
.byte 0x4 # DW_AT_language
.LDIE0:
.byte 0xd # ULEB128 0xd (DIE (0x47) DW_TAG_base_type)
.ascii "int\0" # DW_AT_name
.byte 0x4 # DW_AT_byte_size
.byte 0x5 # DW_AT_encoding
...
.byte 0x0 # end of children of DIE 0xb
.section .debug_abbrev
.byte 0x1 # ULEB128 0x1 (abbrev code)
.byte 0x11 # ULEB128 0x11 (TAG: DW_TAG_compile_unit)
.byte 0x1 # DW_children_yes
.byte 0x13 # ULEB128 0x13 (DW_AT_language)
.byte 0xb # ULEB128 0xb (DW_FORM_data1)
.byte 0x25 # ULEB128 0x25 (DW_AT_producer)
.byte 0x8 # ULEB128 0x8 (DW_FORM_string)
.byte 0x1b # ULEB128 0x1b (DW_AT_comp_dir)
.byte 0x8 # ULEB128 0x8 (DW_FORM_string)
.byte 0x3 # ULEB128 0x3 (DW_AT_name)
.byte 0x8 # ULEB128 0x8 (DW_FORM_string)
.byte 0,0
...
.byte 0x5 # ULEB128 0x5 (abbrev code)
.byte 0x13 # ULEB128 0x13 (TAG: DW_TAG_structure_type)
.byte 0x1 # DW_children_yes
.byte 0x3 # ULEB128 0x3 (DW_AT_name)
.byte 0x8 # ULEB128 0x8 (DW_FORM_string)
.byte 0xb # ULEB128 0xb (DW_AT_byte_size)
.byte 0xb # ULEB128 0xb (DW_FORM_data1)
.byte 0x3a # ULEB128 0x3a (DW_AT_decl_file)
.byte 0xb # ULEB128 0xb (DW_FORM_data1)
.byte 0x3b # ULEB128 0x3b (DW_AT_decl_line)
.byte 0xb # ULEB128 0xb (DW_FORM_data1)
.byte 0,0
.byte 0x6 # ULEB128 0x6 (abbrev code)
.byte 0xd # ULEB128 0xd (TAG: DW_TAG_member)
.byte 0x0 # DW_children_no
.byte 0x3 # ULEB128 0x3 (DW_AT_name)
.byte 0x8 # ULEB128 0x8 (DW_FORM_string)
.byte 0x3a # ULEB128 0x3a (DW_AT_decl_file)
.byte 0xb # ULEB128 0xb (DW_FORM_data1)
.byte 0x3b # ULEB128 0x3b (DW_AT_decl_line)
.byte 0xb # ULEB128 0xb (DW_FORM_data1)
.byte 0x49 # ULEB128 0x49 (DW_AT_type)
.byte 0x10 # ULEB128 0x10 (DW_FORM_ref_addr)
.byte 0x38 # ULEB128 0x38 (DW_AT_data_member_location)
.byte 0xa # ULEB128 0xa (DW_FORM_block1)
.byte 0,0
...
.byte 0xc # ULEB128 0xc (abbrev code)
.byte 0x11 # ULEB128 0x11 (TAG: DW_TAG_compile_unit)
.byte 0x1 # DW_children_yes
.byte 0x3 # ULEB128 0x3 (DW_AT_name)
.byte 0x8 # ULEB128 0x8 (DW_FORM_string)
.byte 0x1b # ULEB128 0x1b (DW_AT_comp_dir)
.byte 0x8 # ULEB128 0x8 (DW_FORM_string)
.byte 0x25 # ULEB128 0x25 (DW_AT_producer)
.byte 0x8 # ULEB128 0x8 (DW_FORM_string)
.byte 0x13 # ULEB128 0x13 (DW_AT_language)
.byte 0xb # ULEB128 0xb (DW_FORM_data1)
.byte 0,0
.byte 0xd # ULEB128 0xd (abbrev code)
.byte 0x24 # ULEB128 0x24 (TAG: DW_TAG_base_type)
.byte 0x0 # DW_children_no
.byte 0x3 # ULEB128 0x3 (DW_AT_name)
.byte 0x8 # ULEB128 0x8 (DW_FORM_string)
.byte 0xb # ULEB128 0xb (DW_AT_byte_size)
.byte 0xb # ULEB128 0xb (DW_FORM_data1)
.byte 0x3e # ULEB128 0x3e (DW_AT_encoding)
.byte 0xb # ULEB128 0xb (DW_FORM_data1)
.byte 0,0
...
.byte 0
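To help read the .debug_abbrev bytes above, here is a small sketch of a decoder for one abbreviation declaration. It is an illustration, not code from any existing DWARF reader: the type and function names are hypothetical, and the byte array is copied from the listing for abbreviation code 5. Each declaration is a ULEB128 code, a ULEB128 tag, a one-byte children flag, and then (attribute, form) pairs in ULEB128 terminated by a 0,0 pair.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Decode one unsigned LEB128 value starting at bytes[pos]; advances pos.
// Throws on truncated input (via at()) or on values wider than 64 bits.
std::uint64_t read_uleb128(const std::vector<std::uint8_t>& bytes,
                           std::size_t& pos) {
    std::uint64_t result = 0;
    unsigned shift = 0;
    while (true) {
        if (shift >= 64) throw std::runtime_error("ULEB128 value too large");
        std::uint8_t b = bytes.at(pos++);
        result |= static_cast<std::uint64_t>(b & 0x7f) << shift;
        if ((b & 0x80) == 0) break;  // high bit clear: last byte of the value
        shift += 7;
    }
    return result;
}

struct Abbrev {
    std::uint64_t code = 0;
    std::uint64_t tag = 0;
    bool has_children = false;
    std::vector<std::pair<std::uint64_t, std::uint64_t>> attrs;  // (attribute, form)
};

// Parse one abbreviation declaration, stopping after its 0,0 terminator.
Abbrev read_abbrev(const std::vector<std::uint8_t>& bytes, std::size_t& pos) {
    Abbrev a;
    a.code = read_uleb128(bytes, pos);
    a.tag = read_uleb128(bytes, pos);
    a.has_children = bytes.at(pos++) != 0;
    while (true) {
        std::uint64_t attr = read_uleb128(bytes, pos);
        std::uint64_t form = read_uleb128(bytes, pos);
        if (attr == 0 && form == 0) break;
        a.attrs.emplace_back(attr, form);
    }
    return a;
}

// Bytes of abbreviation 5 (DW_TAG_structure_type), copied from the listing.
const std::vector<std::uint8_t> kAbbrev5 = {
    0x05, 0x13, 0x01, 0x03, 0x08, 0x0b, 0x0b,
    0x3a, 0x0b, 0x3b, 0x0b, 0x00, 0x00};
```

Running read_abbrev over kAbbrev5 from position 0 gives code 5, tag 0x13, the children flag set, and four (attribute, form) pairs, which matches the comments in the listing: name, byte_size, decl_file and decl_line.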