This is the mail archive of the
elfutils-devel@sourceware.org
mailing list for the elfutils project.
Re: incorrect signed data
- From: Mark Wielaard <mjw at redhat dot com>
- To: elfutils-devel at lists dot fedorahosted dot org
- Date: Wed, 05 Feb 2014 12:44:52 +0100
- Subject: Re: incorrect signed data
On Tue, 2014-02-04 at 18:24 -0800, Josh Stone wrote:
> On 02/04/2014 03:12 PM, Josh Stone wrote:
> > There are only a few internal dwarf_formsdata calls: for the decls as I
> > mentioned, and in array_size() for DW_AT_lower/upper_bound. AFAICS the
> > spec doesn't explicitly call bounds signed or unsigned, but only
> > unsigned makes sense to me, so these also ought to use dwarf_formudata.
>
> http://www.dwarfstd.org/ShowIssue.php?issue=020702.1
>
> So Fortran allows negative bounds, yay, and this is the origin of the
> standard's vague statements about data[1248] context.
Thanks for finding this, it explains the context nicely.
> Here's a little experiment with gcc-gfortran-4.8.2-7.fc20.x86_64:
> (and forgive my fortran ignorance, but at least this compiles)
>
> PROGRAM main
> INTEGER A(10:199)
> INTEGER B(-20:-10)
> A(10) = B(-10)
> END
>
> yields:
>
> [ 67] array_type
> type (ref4) [ 7f]
> sibling (ref4) [ 78]
> [ 70] subrange_type
> type (ref4) [ 78]
> lower_bound (data1) 10
> upper_bound (data1) 199
> [ 78] base_type
> byte_size (data1) 8
> encoding (data1) signed (5)
> name (strp) "integer(kind=8)"
> [ 7f] base_type
> byte_size (data1) 4
> encoding (data1) signed (5)
> name (strp) "integer(kind=4)"
> [ 86] array_type
> type (ref4) [ 7f]
> sibling (ref4) [ a5]
> [ 8f] subrange_type
> type (ref4) [ 78]
> lower_bound (data8) 18446744073709551596
> upper_bound (data8) 18446744073709551606
>
> Thus gfortran appears to support the current elfutils behavior - read it
> as unsigned and cast it without sign extension. It happily put 199 in
> data1, and went all the way to data8 for negative values. It could have
> been more compact with sdata instead of data8 though.
>
> Also, apparently eu-readelf is not using dwarf_formsdata for bounds, and
> it should.
It doesn't because it is very low-level and doesn't use any context. So
it just sees the DW_FORM_data8 and will print its value. But if I read
the DWARF issue correctly then a higher-level interface seeing a
DW_TAG_subrange_type would lookup the DW_TAG_type for the DIE first to
see whether it is signed or not to decide how to interpret the
DW_AT_lower and upper bound values. It can even be a reference or an
exprloc that represents the actual value. We might want to introduce a
dwarf_subrange_bounds () function that does that.
> Binutils readelf prints those as hex, no better.
>
> FWIW, libdwarf's dwarfdump just reveals its indecision:
> DW_AT_lower_bound 10
> DW_AT_upper_bound 199(as signed = -57)
> and
> DW_AT_lower_bound 18446744073709551596(as signed = -20)
> DW_AT_upper_bound 18446744073709551606(as signed = -10)
Right, because dwarfdump is similar to eu-readelf, it doesn't use any
context and so it doesn't know how to represent the value encoded with
DW_FORM_data8. I actually like that it also prints the signed value if
different. Maybe we should make eu-readelf do the same? Printing is hex
like binutils readelf does is another way to mask the ambiguity at the
low-level.
> So now I'm not sure anything needs to change. At least dwarf_formsdata
> should stay as-is for gcc.
Are you sure? I think your original analysis is correct that
dwarf_formsdata () is wrong and really should sign-extend.
> We could conceivably use dwarf_formudata for
> DW_AT_decl_file/line/column, since those really are specified unsigned,
> but this is unlikely to ever make a difference. The values for
> dwarf_decl_line/column are asserted 0..INT_MAX, and people with more
> than INT64_MAX files are already insane.
You are right. Still using dwarf_formudata () would be more correct
IMHO.
Cheers,
Mark