This is the mail archive of the frysk@sourceware.org mailing list for the frysk project.
Re: Dwarf/libdw question
- From: Roland McGrath <roland at redhat dot com>
- To: Sami Wagiaalla <swagiaal at redhat dot com>
- Cc: frysk <frysk at sourceware dot org>
- Date: Mon, 1 Oct 2007 19:20:30 -0700 (PDT)
- Subject: Re: Dwarf/libdw question
- References: <470113F2.5000105@redhat.com>
Hi Sami. Please use more specific Subject lines in your postings.
Reading the list archives' index will not be very informative to
someone looking years from now for discussion on this particular topic.
> I am working on implementing c++ scoping rules in frysk. Is there
> elfutils API that I can use to figure out what class/struct a function
> belongs to, so that references to member variables can be resolved.
The key is DW_AT_specification. Let's take an example:
class c
{
    int m1() { return 17; }
    int m2();
public:
    int m() { return m1() + m2(); }
};

int c::m2() { return 23; }

int main()
{
    c x;
    return x.m();
}
The DIE tree for this is (explanations below):
[ b] compile_unit
    macro_info 0
    stmt_list 0
    producer "GNU C++ 4.1.2 20070502 (Red Hat 4.1.2-12)"
    language C++ (4)
    name "s.cxx"
    comp_dir "/home/roland/build/stock-elfutils"
  [ 67] structure_type
      sibling [ d4]
      name "c"
      byte_size 1
      decl_file 1
      decl_line 2
    [ 71] subprogram
        sibling [ 94]
        external
        name "m1"
        decl_file 1
        decl_line 3
        MIPS_linkage_name "_ZN1c2m1Ev"
        type [ d4]
        accessibility private (3)
        declaration
      [ 8d] formal_parameter
          type [ db]
          artificial
    [ 94] subprogram
        sibling [ b7]
        external
        name "m2"
        decl_file 1
        decl_line 4
        MIPS_linkage_name "_ZN1c2m2Ev"
        type [ d4]
        accessibility private (3)
        declaration
      [ b0] formal_parameter
          type [ db]
          artificial
    [ b7] subprogram
        external
        name "m"
        decl_file 1
        decl_line 6
        MIPS_linkage_name "_ZN1c1mEv"
        type [ d4]
        declaration
      [ cc] formal_parameter
          type [ db]
          artificial
  [ d4] base_type
      name "int"
      byte_size 4
      encoding signed (5)
  [ db] pointer_type
      byte_size 8
      type [ 67]
  [ e1] subprogram
      sibling [ 10d]
      specification [ 71]
      low_pc 0x000000000040054c
      high_pc 0x000000000040055b
      frame_base location list [ 0]
    [ fe] formal_parameter
        name "this"
        type [ 10d]
        artificial
        location 2 byte block
          [ 0] fbreg -24
  [ 10d] const_type
      type [ db]
  [ 112] subprogram
      sibling [ 13f]
      specification [ 94]
      decl_line 9
      low_pc 0x0000000000400528
      high_pc 0x0000000000400537
      frame_base location list [ 4c]
    [ 130] formal_parameter
        name "this"
        type [ 10d]
        artificial
        location 2 byte block
          [ 0] fbreg -24
  [ 13f] subprogram
      sibling [ 16b]
      specification [ b7]
      low_pc 0x000000000040055c
      high_pc 0x0000000000400587
      frame_base location list [ 98]
    [ 15c] formal_parameter
        name "this"
        type [ 10d]
        artificial
        location 2 byte block
          [ 0] fbreg -32
  [ 16b] subprogram
      external
      name "main"
      decl_file 1
      decl_line 11
      type [ d4]
      low_pc 0x0000000000400538
      high_pc 0x000000000040054b
      frame_base location list [ e4]
    [ 18c] variable
        name "x"
        decl_file 1
        decl_line 13
        type [ 67]
        location 2 byte block
          [ 0] fbreg -17
Note that the subprogram DIEs describing actual machine code are
top-level children of the CU. Here these are [e1], [112], [13f]. They
are not children of [67], the structure_type DIE describing the class.
This is sensible enough because these are global function definitions,
even if they have names and types with scope limited to the class.
Consider [112]. This has the attributes and children that refer to its
machine code (low_pc, high_pc, frame_base, formal_parameter). Note that
it has no name or type attributes of its own. Instead, it has a
specification attribute that points to [94]. specification is
analogous to abstract_origin, but rather than linking a concrete code
element to an abstract inline definition, it links a concrete code
element to an abstract declaration. So, [112] is the code for "m2",
and [94] is the specification for "m2".
dwarf_attr_integrate checks for specification as well as abstract_origin.
So, for common cases with attributes you just don't think about it.
dwarf_diename uses dwarf_attr_integrate, so you will see a name without
extra effort even if it's indirect.
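
To make the indirection concrete, here is a toy model of that lookup. It is
only a sketch: ToyDie and attr_integrate are names made up for this example,
not libdw types or functions, and the real dwarf_attr_integrate works on
Dwarf_Die and Dwarf_Attribute. The idea it illustrates is just "look on the
DIE itself, and if the attribute is missing, follow the reference".

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a DIE. This is NOT the libdw Dwarf_Die type.
struct ToyDie {
    std::map<std::string, std::string> attrs;  // attributes present on this DIE
    const ToyDie *abstract_origin = nullptr;   // models DW_AT_abstract_origin
    const ToyDie *specification = nullptr;     // models DW_AT_specification
};

// Return the attribute from the DIE itself, or else from the DIE reached by
// following abstract_origin/specification links; null if none of them has it.
const std::string *attr_integrate(const ToyDie *die, const std::string &name) {
    assert(!name.empty());
    // Bound the number of hops so that malformed, cyclic references terminate.
    for (int hops = 0; die != nullptr && hops < 16; ++hops) {
        auto it = die->attrs.find(name);
        if (it != die->attrs.end())
            return &it->second;
        die = die->abstract_origin ? die->abstract_origin : die->specification;
    }
    return nullptr;
}
```

With [94] modelled as a ToyDie carrying name "m2" and [112] as a ToyDie whose
specification points at it, asking [112] for "name" yields "m2" even though
[112] has no name attribute of its own.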
I used [112] as the example because m2 is defined outside the class
definition. As you can see, GCC does the same thing for m1 [e1] and m
[13f], though those definitions actually appear lexically inside the
class. Reading the DWARF spec one would expect these cases to use a
single DIE inside the class and not use DW_AT_specification at all. I
don't know if there is a particular reason GCC doesn't do that, and I
see no big benefit in changing what it does. But I think that DWARF
consumers should expect that either style might be used and work the
same with either.
Note how [112] has a decl_line attribute but no decl_file, while [e1]
and [13f] have neither. This is an example of the general rule with
specification (and abstract_origin): an attribute is elided from the
referring DIE when its value is no different from the one on the DIE it
refers to. Since m2's body was defined outside the class, [112] refers
to line 9.
If the class declaration were in a header file and the method definition
in another file, there would also be a decl_file attribute. (If
everything were all on one line and the compiler emitted column
information, there would be a decl_column but no decl_line. The
compiler does not yet emit decl_column attributes, but we should write
consumers as if it did.) Since [e1] and [13f] describe bodies defined
in their selfsame specification declarations, they would never have a
decl_{file,line,column} of their own.
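
A small sketch of how a consumer might combine the two sets of coordinates,
treating file, line and column independently. DeclAttrs and effective_decl
are hypothetical names for this illustration, and using 0 to mean "attribute
not present" is an assumption of the sketch, not something libdw does.

```cpp
#include <cassert>

// Toy model only: 0 stands for "attribute not present on this DIE".
struct DeclAttrs {
    int file;
    int line;
    int column;
};

// Combine the coordinates found on a defining DIE with those on the
// declaration its specification attribute refers to. Each field falls back
// to the declaration's value independently of the others.
DeclAttrs effective_decl(const DeclAttrs &definition, const DeclAttrs &declaration) {
    assert(definition.file >= 0 && definition.line >= 0 && definition.column >= 0);
    DeclAttrs out;
    out.file = definition.file ? definition.file : declaration.file;
    out.line = definition.line ? definition.line : declaration.line;
    out.column = definition.column ? definition.column : declaration.column;
    return out;
}
```

For m2, the definition [112] contributes only line 9 and the declaration [94]
contributes file 1 and line 4, so the result is file 1, line 9. For m1, [e1]
contributes nothing and the result is simply the declaration's file 1, line 3.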
So now I've told you the basics to work with, but not actually answered
your question. There are two parts to resolving class members.
First, the name resolution per se. Start with the scopes inside the
subprogram DIE, same as in C. When you are dealing with a class method,
the subprogram's specification attribute gives you the declaration
inside the class scope (use dwarf_formref_die (dwarf_attr (...))). Then
use dwarf_getscopes_die on that to see the class, namespace, etc. scopes
containing it. For each of those, see if they have DW_TAG_inheritance,
DW_TAG_imported_declaration, etc. children that contribute more scopes
to the name resolution logic for the language. Among those you find a
member, variable, subprogram, etc. DIE by the name you are looking for.
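
The outward walk can be sketched with a toy DIE tree. Everything here is
hypothetical (ScopeDie, add_child, find_in_scope, resolve); a real consumer
would get the declaration from dwarf_formref_die and the enclosing scopes
from dwarf_getscopes_die. The sketch also assumes a much simpler language
rule than real C++: it handles only direct children and DW_TAG_inheritance,
and ignores access control, imported declarations and the local scopes
inside the subprogram itself.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a DIE. This is NOT the libdw Dwarf_Die type.
struct ScopeDie {
    std::string tag;                 // e.g. "structure_type", "member", "inheritance"
    std::string name;                // empty if the DIE has no name
    const ScopeDie *parent = nullptr;  // enclosing DIE, null for the CU
    const ScopeDie *type = nullptr;    // for "inheritance": the base class DIE
    std::vector<const ScopeDie *> children;
    ScopeDie(std::string t, std::string n) : tag(std::move(t)), name(std::move(n)) {}
};

void add_child(ScopeDie &parent, ScopeDie &child) {
    child.parent = &parent;
    parent.children.push_back(&child);
}

// Search one scope: its direct children first, then any base classes that
// its inheritance children point to.
const ScopeDie *find_in_scope(const ScopeDie *scope, const std::string &name) {
    for (const ScopeDie *child : scope->children)
        if (child->tag != "inheritance" && child->name == name)
            return child;
    for (const ScopeDie *child : scope->children)
        if (child->tag == "inheritance" && child->type != nullptr)
            if (const ScopeDie *hit = find_in_scope(child->type, name))
                return hit;
    return nullptr;
}

// Start from the in-class declaration (the target of the specification
// attribute) and walk outward through class, namespace and CU scopes.
const ScopeDie *resolve(const ScopeDie *declaration, const std::string &name) {
    assert(declaration != nullptr && !name.empty());
    for (const ScopeDie *scope = declaration->parent; scope != nullptr; scope = scope->parent)
        if (const ScopeDie *hit = find_in_scope(scope, name))
            return hit;
    return nullptr;
}
```

The caller then looks at the tag of whatever resolve returns (member,
variable, subprogram, ...) to decide how to treat it.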
If you found a static member (aka class variable), i.e. DW_TAG_variable,
you are done. It gets treated just like other variable DIEs.
If you found a class member (aka instance variable), i.e. DW_TAG_member,
then it depends on how you plan to use it. In the context of a pointer
to member (as "mem" in "type cl::*p = &cl::mem;"), you are done.
The DW_AT_data_member_location tells you what value to use.
In a static method (aka class method), referring to a regular class
member (instance variable) is invalid.
In an instance method, "mem" is resolved the same as "this->mem". The
subprogram DIE for the method definition contains an automatically-inserted
first formal_parameter DIE, with the artificial attribute and named "this".
AFAICT, the only way to distinguish a static method from an instance method
in the DWARF tree is the presence of this first artificial formal_parameter.
(Though in practice it always has the name attribute of "this", I would
write it to detect a first formal_parameter with artificial rather than
looking at the name.) This formal_parameter is like any other aside from
being artificial, so you combine its location attribute with the PC context
you're looking from, and the data_member_location attribute of the member
DIE to find the member in the object from that PC context.
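
In toy form, the two checks look like this. ToyParam, ToySubprogram,
is_instance_method and member_address are invented for this sketch. It
assumes the value of "this" has already been obtained by evaluating the
parameter's location at the current PC, and that data_member_location
reduces to a constant byte offset; in general it can be a location
expression that has to be evaluated.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-ins for a formal_parameter DIE and a subprogram DIE.
struct ToyParam {
    std::string name;
    bool artificial;   // models the presence of DW_AT_artificial
};

struct ToySubprogram {
    std::vector<ToyParam> params;   // formal_parameter children, in order
};

// An instance method is recognized by its first formal_parameter being
// artificial. The parameter's name is deliberately not consulted.
bool is_instance_method(const ToySubprogram &sp) {
    return !sp.params.empty() && sp.params.front().artificial;
}

// Address of a member inside the object: the value of "this" plus the
// member's offset (assumed here to be a plain constant).
std::uint64_t member_address(std::uint64_t this_value, std::uint64_t member_offset) {
    assert(member_offset <= UINT64_MAX - this_value);  // no wrap-around
    return this_value + member_offset;
}
```

A consumer would refuse to resolve an instance member when
is_instance_method is false for the subprogram at the current PC.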
When the name resolved to a subprogram DIE, you have to do two things to
see how to treat it. First, if the DIE has DW_AT_declaration, then you
have to find the concrete code DIE whose DW_AT_specification points to it.
Then, you have to check (as above) whether it's a static method or an
instance method, so you know what "name(foo)" is supposed to mean if a user
gave that as a call.
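
Going from a declaration to its concrete code DIE is a reverse lookup, which
can be sketched as a linear scan. SubprogramDie and find_definition are
hypothetical names for this toy model. The sketch assumes the defining DIEs
are top-level children of the CU, as in the GCC output above; a robust
consumer should not rely on that placement.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a subprogram DIE. This is NOT the libdw Dwarf_Die type.
struct SubprogramDie {
    std::string name;                    // empty when the name is only on the declaration
    bool declaration;                    // models the presence of DW_AT_declaration
    const SubprogramDie *specification;  // models DW_AT_specification, or null
};

// Given the DIE that name resolution found, return the DIE describing the
// actual code: the DIE itself if it is not a mere declaration, otherwise the
// top-level DIE whose specification points back at it, or null if none.
const SubprogramDie *find_definition(const std::vector<const SubprogramDie *> &cu_children,
                                     const SubprogramDie *found) {
    assert(found != nullptr);
    if (!found->declaration)
        return found;
    for (const SubprogramDie *child : cu_children)
        if (child->specification == found)
            return child;
    return nullptr;
}
```

A null result corresponds to a method that is declared but has no code in
this CU, so a user's call to it cannot be carried out from this CU alone.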
Thanks,
Roland