This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.



dwarflint --stats


(Petr is on vacation, so it will most likely be at least a few weeks before
anyone starts to work on this.)

Jakub was interested in sampling some DWARF data to compare what one
compiler vs another is doing with some broad statistics of semantic
interest.  The first particular thing to measure is how much location/value
information we are getting for variables and parameters.

This is not at all an error check, but I think it fits well enough into
dwarflint as a special option to request statistics reports (in future
perhaps several different kinds of things) either in addition to or instead
of the normal warnings/errors.  The first statistic of interest draws on
data that I think dwarflint already examines for some of its checks.

So, it would go something like this:

Consider each DW_TAG_variable or DW_TAG_formal_parameter that might have a
location.  That is, ignore variables with DW_AT_declaration and ignore the
formal_parameter children of subprograms with DW_AT_declaration.  Look at
all the rest, and tally.  (Double-check the cases with no DW_AT_location or
DW_AT_const_value at all to see if there is some other exclusion from
"might have a location" that I'm overlooking at the moment.)  For each,
compute what percentage of the variable's scope has locations or values.

An extern global has DW_AT_external and no location at all, but is not
missing any info.  So exclude those from the tally.

Globals (with DW_AT_external) and statics (without) are identified by
having a non-list DW_AT_location that is a singleton DW_OP_addr expression.
Perhaps (optionally?) exclude these from the tally entirely, so they don't
dilute the cumulative percentages.  Every compiler always just emits those
locations regardless of optimization or debuginfo fanciness, so for
comparing compilers it is probably more meaningful to measure only among
the pool of variables with dynamic extent.
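
The exclusion rules so far can be sketched as a filter.  This is only an
illustration in Python over a hypothetical DIE model (a dict with "tag",
"attrs" and "parent"; a non-list location is ("expr", ops) and a location
list is ("loclist", entries)).  None of these names come from dwarflint or
libdw, and include_static is an invented switch for the "optionally?" above.

```python
# Hypothetical DIE model, not the dwarflint or libdw representation.
LOCATABLE_TAGS = {"DW_TAG_variable", "DW_TAG_formal_parameter"}


def is_static_storage(die):
    """True if DW_AT_location is a non-list, singleton DW_OP_addr expression."""
    loc = die["attrs"].get("DW_AT_location")
    if loc is None or loc[0] != "expr":
        return False
    ops = loc[1]  # list of (opcode, operand) tuples
    return len(ops) == 1 and ops[0][0] == "DW_OP_addr"


def should_tally(die, include_static=False):
    """Decide whether a DIE belongs in the location-coverage tally."""
    attrs = die["attrs"]
    if die["tag"] not in LOCATABLE_TAGS:
        return False
    # Declarations cannot have a location.
    if "DW_AT_declaration" in attrs:
        return False
    # Neither can the parameters of a subprogram declaration.
    parent = die.get("parent")
    if (die["tag"] == "DW_TAG_formal_parameter" and parent is not None
            and "DW_AT_declaration" in parent["attrs"]):
        return False
    # An extern global with no location is not missing any info.
    has_value = "DW_AT_location" in attrs or "DW_AT_const_value" in attrs
    if "DW_AT_external" in attrs and not has_value:
        return False
    # Globals and statics would dilute the percentages; skip unless asked.
    if is_static_storage(die) and not include_static:
        return False
    return True
```

A local variable with no DW_AT_location at all still passes this filter,
which is intended: it is the case that later counts as 0%.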

If it has DW_AT_const_value, then call that 100%.
If it has a non-list DW_AT_location with a nonempty expression,
also call that 100%.  If it has no DW_AT_location or a non-list
DW_AT_location with an empty expression, call that 0%.
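
As a rough sketch of those three rules, again in Python and again assuming
a made-up attribute model (a dict of attributes where a non-list location
is ("expr", ops) and a location list is ("loclist", entries)):

```python
def simple_coverage(attrs):
    """Return 100.0 or 0.0 for the easy cases, or None for a location list.

    `attrs` is a hypothetical dict of DWARF attributes, not a libdw object.
    """
    if "DW_AT_const_value" in attrs:
        return 100.0
    loc = attrs.get("DW_AT_location")
    if loc is None:
        return 0.0  # no location information at all
    kind, payload = loc
    if kind == "expr":
        # A nonempty expression covers the whole scope; an empty one nothing.
        return 100.0 if payload else 0.0
    return None  # location list: needs the range analysis
```

Returning None for the location list case is just one way to signal that
the per-range computation is still needed.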

Otherwise, it's a location list.  So, first find the "scope" ranges set for
this DIE.  That is, if the DIE itself has a DW_AT_start_scope that is a
rangelistptr, then exactly that is the set.  If the DIE itself has
DW_AT_{ranges,high_pc} then that's the set (but it won't).  Otherwise, look
back up parent DIEs until one has ranges/high_pc.  If the variable DIE has
a DW_AT_start_scope that is a constant, then exclude the portion of the
range before it (see DWARFv4 4.1 item 11).
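
A minimal sketch of that lookup, in Python over a hypothetical DIE model
(dicts with "attrs" and "parent").  It assumes all ranges are already
resolved to absolute half-open (lo, hi) pairs, that DW_AT_high_pc has been
converted to an address, and that a constant DW_AT_start_scope is an offset
from the lowest address of the enclosing scope.  Those are simplifications
for illustration, not a statement of how dwarflint does it.

```python
def own_ranges(attrs):
    """Ranges carried directly by a DIE, or None if it has none."""
    if "DW_AT_ranges" in attrs:
        return list(attrs["DW_AT_ranges"])
    if "DW_AT_low_pc" in attrs and "DW_AT_high_pc" in attrs:
        return [(attrs["DW_AT_low_pc"], attrs["DW_AT_high_pc"])]
    return None


def scope_ranges(die):
    """Find the set of address ranges that make up the DIE's scope."""
    start = die["attrs"].get("DW_AT_start_scope")
    # A rangelistptr start_scope (modelled as a list) is exactly the set.
    if isinstance(start, list):
        return list(start)
    # Otherwise use the DIE's own ranges, or walk up the parents.
    node, ranges = die, None
    while node is not None and ranges is None:
        ranges = own_ranges(node["attrs"])
        node = node.get("parent")
    if not ranges:
        return []
    # A constant start_scope trims everything before the cutoff address.
    if isinstance(start, int):
        cutoff = min(lo for lo, _ in ranges) + start
        ranges = [(max(lo, cutoff), hi) for lo, hi in ranges if hi > cutoff]
    return ranges
```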

If the location list covers any address bytes outside the "scope" set,
then exclude those portions from the location list set for further
consideration.  I think dwarflint might already have a warning about that
happening, or if not perhaps it should (though it is unclear whether such
coverage is actually acceptable in the data).

Exclude any location list entries whose expression is empty.  (You might
already have a warning about those too, since they are superfluous entries
to have.)
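
Both exclusions amount to interval arithmetic.  Here is one way it might
look in Python, assuming location list entries are (lo, hi, expr) triples
with half-open address ranges and expr a possibly empty list of operations.
The function names are invented for this sketch.

```python
def normalize(ranges):
    """Sort half-open [lo, hi) ranges and merge overlapping or adjacent ones."""
    merged = []
    for lo, hi in sorted(r for r in ranges if r[0] < r[1]):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def intersect(a, b):
    """Intersect two normalized range lists with a two-pointer sweep."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever range ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def covered_ranges(loclist, scope):
    """Address ranges inside the scope that have a nonempty location."""
    nonempty = [(lo, hi) for lo, hi, expr in loclist if expr]
    return intersect(normalize(nonempty), normalize(scope))
```

Entries that spill outside the scope are clipped rather than rejected, so a
separate warning about them could still be emitted independently.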

Finally, count up the cumulative bytes covered by the scope set and those
covered by the location set.  Tally the ratio of those two counts as a
percentage.  Perhaps produce a scatter plot with x axis the location/scope
rounded to an integer percentage and y axis the percentage of the DIEs
considered whose ratio is x.  Perhaps also report the min/max/avg(/median?)
ratios seen.
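
To make the arithmetic concrete, here is a small Python sketch.  It assumes
both range sets are already merged, non-overlapping, half-open (lo, hi)
pairs; the function names and the shape of the summary dict are made up for
illustration.

```python
from collections import Counter
from statistics import mean, median


def total_bytes(ranges):
    """Sum the sizes of non-overlapping half-open (lo, hi) ranges."""
    return sum(hi - lo for lo, hi in ranges)


def coverage_percent(covered, scope):
    """Percentage of scope bytes that have a location, or None if no scope."""
    scope_bytes = total_bytes(scope)
    if scope_bytes == 0:
        return None
    return 100.0 * total_bytes(covered) / scope_bytes


def summarize(percentages):
    """Distribution and summary statistics over per-DIE percentages."""
    values = [p for p in percentages if p is not None]
    if not values:
        return None
    # x: integer percentage (Python's round() rounds halves to even),
    # y: percentage of the considered DIEs that landed on x.
    buckets = Counter(round(p) for p in values)
    distribution = {x: 100.0 * n / len(values)
                    for x, n in sorted(buckets.items())}
    return {
        "distribution": distribution,
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
        "median": median(values),
    }
```

For example, summarize([0.0, 100.0, 100.0, 50.0]) puts 25% of the DIEs at
x=0, 25% at x=50 and 50% at x=100, with avg 62.5 and median 75.0.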

Another variant would be to also tally what portion of available locations
is mutable vs immutable (or distribution of the ratios, or whatever).  That
is, DW_AT_const_value is immutable.  A location expression is immutable if
it ends in DW_OP_implicit_value or DW_OP_stack_value.  If an expression
uses DW_OP_{,bit_}piece, then it can be partially mutable and partially
immutable.  You can probably just choose arbitrarily to count those on one
side or the other, or perhaps tally them as a separate third statistic.
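
One possible shape for that classification, sketched in Python with
expressions modelled simply as lists of opcode names.  Counting piece
expressions as a third bucket is just one of the arbitrary choices
mentioned above, and the bucket names are invented.

```python
from collections import Counter

IMMUTABLE_ENDINGS = {"DW_OP_implicit_value", "DW_OP_stack_value"}
PIECE_OPS = {"DW_OP_piece", "DW_OP_bit_piece"}


def classify_expr(ops):
    """Classify a location expression given as a list of opcode names."""
    if not ops:
        return "none"  # empty expression: no location at all
    if any(op in PIECE_OPS for op in ops):
        return "pieces"  # may be partly mutable, partly immutable
    return "immutable" if ops[-1] in IMMUTABLE_ENDINGS else "mutable"


def tally_mutability(expressions, const_values=0):
    """Count classifications; each DW_AT_const_value counts as immutable."""
    counts = Counter(classify_expr(ops) for ops in expressions)
    counts["immutable"] += const_values
    return counts
```

A finer variant could classify each piece separately and weight it by its
size, but that goes beyond what is described here.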

Jakub can say better than I which statistical analyses he is most
interested in seeing.


Thanks,
Roland
