[WIP] New dwarf2 reader - updated 07-02-2001

Tue Jul 31 16:11:00 GMT 2001

Daniel Berlin <dan@cgsoftware.com> writes:
> > Using 64 bits, as you intended, is much better, but why not use the
> > full 128?  Honestly, why play this kind of game at all?
> Because it's faster than trying to keep all the die data.
> >  It's not that
> > hard to do the job perfectly --- just use the significant die contents
> > themselves as the tree key.  No collisions, no hash function, simpler.
> >
> 
> Um, think about how much memory that would use.
> We'd have to keep copies of die contents around.

Use the struct die_info objects themselves as the keys, and the
values.  Just define an arbitrary comparison function with the right
semantics.  We need the struct die_info objects around anyway.

> >> > - In read_comp_unit_dies, when you find a duplicate die, you skip to
> >> >   its sibling.  What if the parent die is identical, but the children
> >> >   dies differ?
> >>
> >> I don't believe this is possible in any language.
> >
> > How about this:
> >
> > namespace X {
> >   namespace A {
> >     int m, n;
> >   }
> > }
> >
> > namespace Y {
> >   namespace A {
> >     int o, p;
> >   }
> > }
> >
> > The dies for the two 'A' namespaces have the name DW_AT_name
> > attribute, but they're clearly different.
> >
> This won't do it, they'll have different decl line attributes.
> And we only eliminate at the top level anyway.

Okay, I hadn't noticed the `nesting_level == 1' test.  This still
isn't okay, though.  Consider the following C program:

    struct {
      int a, b;
    } x;

    struct {
      int c, d;
    } y;

We get the following dies for this, at top level:

        ...

	.byte	0x2	# uleb128 0x2; (DIE (0x5a) DW_TAG_structure_type)
	.long	0x7b	# DW_AT_sibling
	.byte	0x8	# DW_AT_byte_size
	.byte	0x1	# DW_AT_decl_file
	.byte	0x3	# DW_AT_decl_line
	.byte	0x3	# uleb128 0x3; (DIE (0x62) DW_TAG_member)
	.ascii "a\0"	# DW_AT_name
	.byte	0x1	# DW_AT_decl_file
	.byte	0x2	# DW_AT_decl_line
	.long	0x7b	# DW_AT_type
	.byte	0x2	# DW_AT_data_member_location
	.byte	0x23	# DW_OP_plus_uconst
	.byte	0x0	# uleb128 0x0
	.byte	0x3	# uleb128 0x3; (DIE (0x6e) DW_TAG_member)
	.ascii "b\0"	# DW_AT_name
	.byte	0x1	# DW_AT_decl_file
	.byte	0x2	# DW_AT_decl_line
	.long	0x7b	# DW_AT_type
	.byte	0x2	# DW_AT_data_member_location
	.byte	0x23	# DW_OP_plus_uconst
	.byte	0x4	# uleb128 0x4
	.byte	0x0	# end of children of DIE 0x5a
	.byte	0x4	# uleb128 0x4; (DIE (0x7b) DW_TAG_base_type)
	.ascii "int\0"	# DW_AT_name
	.byte	0x4	# DW_AT_byte_size
	.byte	0x5	# DW_AT_encoding
	.byte	0x2	# uleb128 0x2; (DIE (0x82) DW_TAG_structure_type)
	.long	0xa3	# DW_AT_sibling
	.byte	0x8	# DW_AT_byte_size
	.byte	0x1	# DW_AT_decl_file
	.byte	0x7	# DW_AT_decl_line
	.byte	0x3	# uleb128 0x3; (DIE (0x8a) DW_TAG_member)
	.ascii "c\0"	# DW_AT_name
	.byte	0x1	# DW_AT_decl_file
	.byte	0x6	# DW_AT_decl_line
	.long	0x7b	# DW_AT_type
	.byte	0x2	# DW_AT_data_member_location
	.byte	0x23	# DW_OP_plus_uconst
	.byte	0x0	# uleb128 0x0
	.byte	0x3	# uleb128 0x3; (DIE (0x96) DW_TAG_member)
	.ascii "d\0"	# DW_AT_name
	.byte	0x1	# DW_AT_decl_file
	.byte	0x6	# DW_AT_decl_line
	.long	0x7b	# DW_AT_type
	.byte	0x2	# DW_AT_data_member_location
	.byte	0x23	# DW_OP_plus_uconst
	.byte	0x4	# uleb128 0x4
	.byte	0x0	# end of children of DIE 0x82

        ...

The two DW_TAG_structure_type members are distinguished only by their
DW_AT_decl_line attributes.  But those are optional --- a compiler can
omit them entirely.

So this patch has the correctness of the Dwarf 2 reader depending on
the presence of optional source location information for declarations.

> > Just in principle, this seems sloppy.  Dies are just arbitrary tree
> > nodes.  There's no reason to assume that if two parent nodes are
> > identical, we can skip the second one and all its children.
> It's not sloppy.
> Elimination at the top level is normal in other debuggers i know of that do 
> elimination.

I've no objection to the idea, just the implementation.