This is the mail archive of the
mailing list for the elfutils project.
Re: incorrect signed data
- From: Mark Wielaard <mjw at redhat dot com>
- To: elfutils-devel at lists dot fedorahosted dot org
- Date: Thu, 06 Feb 2014 13:12:02 +0100
- Subject: Re: incorrect signed data
On Wed, 2014-02-05 at 09:36 -0800, Josh Stone wrote:
> On 02/05/2014 03:44 AM, Mark Wielaard wrote:
> > On Tue, 2014-02-04 at 18:24 -0800, Josh Stone wrote:
> >> [ 8f] subrange_type
> >> type (ref4) [ 78]
> >> lower_bound (data8) 18446744073709551596
> >> upper_bound (data8) 18446744073709551606
> >> Thus gfortran appears to support the current elfutils behavior - read it
> >> as unsigned and cast it without sign extension. It happily put 199 in
> >> data1, and went all the way to data8 for negative values. It could have
> >> been more compact with sdata instead of data8 though.
> >> Also, apparently eu-readelf is not using dwarf_formsdata for bounds, and
> >> it should.
> > It doesn't because it is very low-level and doesn't use any context. So
> > it just sees the DW_FORM_data8 and will print its value.
> That's a lazy answer, and not actually true. The encodings above are
> "just" data1, but it prints the meaning. And high_pc is often data8
> these days, but it computes and prints its symbolic meaning. So there's
> no reason it couldn't give lower/upper_bound some treatment.
I don't think it prints the symbolic meaning. Why I mention it, is
because the design of readelf is to print the lowest level
representation bits so you can see what there really is. And it tries
its best to parse as much low-level info as possible, even if the
higher-level semantics don't make sense (or are just broken/bad DWARF).
libdw does the higher-level view taking context in account. But it is
deliberate that readelf by default doesn't do that.
It also shows why DW_FORM_data are somewhat problematic because
they cannot be interpreted without some context. At the low-level that
readelf works at they are just blobs of bits.
Yes, it could in addition to the low level view also print some extra
info based on context (which is what the DW_AT_high_pc case does, but
that adds a dependency on higher-level libdw functions for parsing the
> And assuming signed is probably ok, even for C. (see more below)
Note that currently we handle all of DW_FORM_sec_offset, DW_FORM_udata,
DW_FORM_sdata, DW_FORM_data the same in readelf. We call
dwarf_formudata () on it and then special case some DW_AT_ attributes to
figure out whether it really represented a sec_offset after all (plus a
special case for DW_AT_high_pc as you noticed), if it isn't a special
case, then we just print the (unsigned) constant value.
Are you saying that the default should be to use dwarf_formsdata (), or
that we should add a special case for DW_AT_[lower|upper]_bound, or that
we should assume we should cast the result to a signed value in the
I think you mean to add a special case for DW_AT_[lower|upper]_bound,
with which I would probably agree.
> > But if I read
> > the DWARF issue correctly then a higher-level interface seeing a
> > DW_TAG_subrange_type would lookup the DW_TAG_type for the DIE first to
> > see whether it is signed or not to decide how to interpret the
> > DW_AT_lower and upper bound values. It can even be a reference or an
> > exprloc that represents the actual value. We might want to introduce a
> > dwarf_subrange_bounds () function that does that.
> The subrange_type encoding indicates signed index, but that doesn't
> actually change how GCC is using data. That data1=199 would be
> negative if you sign extended it, but for actual negative values it's
> using a full data8 with the MSB set. It seems pretty clear that GCC
> does not expect consumers to perform any sign extension on data.
At least for adding the subrange bounds it explicitly doesn't indeed. I
looked at the code in GCC and it does indeed cast the values for upper
and lower bounds to an unsigned value using the precision of its type
> Note, even though the subrange type here is signed "integer(kind=8)", I
> couldn't get gfortran to accept bounds greater than INT32_MAX. So
> there's no chance that 18446744073709551596 could be a legitimate
> positive bound, at least for this language.
> FWIW, C appears to get subrange_type unsigned "sizetype". So you could
> use this to determine how to show the resulting number, even though the
> data1 should never be sign-extended in the process. But if we just
> decide to always print bounds signed, this will be fine until some C
> program has an array bigger than INT64_MAX.
OK. I think I prefer adding the print the signed bounds based on the
subrange type just for correctness, even though I am convinced now that
at least for DW_AT_[lower|upper]_bound printing the result of
dwarf_formsdata () as implemented now (not sign extended) will be