[PATCH] Dwarf 5: Handle debug_str_offsets and indexed attributes that have base offsets.

Simon Marchi simon.marchi@polymtl.ca
Tue Nov 19 05:27:00 GMT 2019


On 2019-11-18 11:46 p.m., Ali Tamur wrote:
> Hi Simon,
> Thanks for the review. Please see my replies below.
> 
>> -  ULONGEST ranges_base = 0;
>> +  gdb::optional<ULONGEST> ranges_base = 0;
> ***  I am probably missing something, but I don't really understand the logic
> of making this field optional and initializing it to 0, or the need to make it
> optional at all. If the CU does not have a DW_AT_rnglists_base attribute,
> doesn't it implicitly mean that this CU's ranges offset is 0?  So it would seem
> fine to just leave it at 0 and keep the arithmetic as before.
> 
> Ok, I changed the initial value to 'no value' and thank you, it's better. It
> seems a good idea to differentiate between an unexisting value and an existing
> value of 0 even if the math stays the same. For example, for str_offsets_base,
> there are different code paths depending on whether there is a value (not
> depending whether the value is 0). It seems to me all of [addr_base,
> ranges_base, str_offsets_base] should have similar semantics. But I agree that
> currently there is no behavioral difference in ranges_base case, do you prefer
> it ULONGEST?

I don't know enough about the situation to be able to tell.  But in my mind the choice
would be: if a missing information is semantically equivalent to it being zero, then we
don't need the optional.  The value 0 is just a no-op.  But if a missing information has
a different meaning than an information present with a value 0, then the optional is
probably necessary because we'll need to do some branching based on whether the value is
present or not.

>> +#define DW_STRING_IS_STR_INDEX(attr) ((attr)->string_is_str_index)
> ***  I don't really see the advantage of those macros.
> 
> I agree and I don't like these macros at all. But I wanted to be
> consistent. (Maybe I should clear all of them as a separate refactoring cl
> sometime)

Being consistent with the code around is a good justification, so if you want to keep it
like that I am fine with that.

>> +  struct attribute *attr = dwarf2_attr (comp_unit_die,
>> +                                     DW_AT_str_offsets_base, cu);
>> +  if (attr)
> *** Use: if (attr != nullptr)
> 
> Done and sent another patch to replace all "if (attr)" with 

Thanks, very appreciated.

>>  static void
>>  init_cutu_and_read_dies_no_follow (struct dwarf2_per_cu_data *this_cu,
>> +                                struct dwarf2_cu *parent_cu,
> ***  Please document this new parameter. Since "parent" is passed only to get
> the base offsets, I think it would be clearer to pass the offsets directly. 
> If I look at the signature of the function and see "dwarf2_cu *parent_cu", I
> can't really tell what it's for (though a comment might help).  But if I see
> "ULONGEST str_offsets_base" and "ULONGEST addr_base", it's more self-explanatory.
> 
> I wrote a comment, I hope it's good. I prefer not changing the parameters to
> ULONGEST str_offsets_base and ULONGEST addr_base, because I am confused a little
> too: there is a case where there is no parent_cu and in that case I believe any
> existing value should not be overridden and it is difficult to do that on the
> caller site.

Ok, I had not noticed that.

> ***  I'm a bit confused with those different cases, could you please add a bit
> of comments to explain the logic for each case?  Why is it not possible to
> obtain the string value at this point?  If cu->str_offsets_base has no value,
> doesn't it mean that the offset is 0?
> 
> Maybe it would be more obvious if I saw the actual DWARF being parsed.  Could
> you suggest some gcc or clang commands to obtain some DWARF where we have a
> DW_AT_dwo_name that is an indirect string?
> 
> Reply:
> Sure, here's an example.
> $  cd /tmp/dw5 && echo "int calculate() { return 4; } int main(int argc, char** argv) { return calculate(); }" >>main.cc
> $  clang -gdwarf-5 -gsplit-dwarf main.cc
> $  ls
> a.out main.cc main.dwo 
> 
> $  llvm-dwarfdump --all a.out
> a.out:	file format ELF64-x86-64
> 
> .debug_abbrev contents:
> Abbrev table for offset: 0x00000000
> [1] DW_TAG_compile_unit	DW_CHILDREN_no
> 	DW_AT_stmt_list	DW_FORM_sec_offset
> 	DW_AT_str_offsets_base	DW_FORM_sec_offset
> 	DW_AT_comp_dir	DW_FORM_strx1
> 	DW_AT_GNU_pubnames	DW_FORM_flag_present
> 	DW_AT_GNU_dwo_name	DW_FORM_strx1
> 	DW_AT_low_pc	DW_FORM_addrx
> 	DW_AT_high_pc	DW_FORM_data4
> 	DW_AT_addr_base	DW_FORM_sec_offset
> 
> .debug_info contents:
> 0x00000000: Compile Unit: length = 0x00000024 version = 0x0005 unit_type = DW_UT_skeleton abbr_offset = 0x0000 addr_size = 0x08 DWO_id = 0xee1d4b42a2f0ca0b (next unit at 0x00000028)
> 
> 0x00000014: DW_TAG_compile_unit
>               DW_AT_stmt_list	(0x00000000)
>               DW_AT_str_offsets_base	(0x00000008)
>               DW_AT_comp_dir	("/tmp/dw5")
>               DW_AT_GNU_pubnames	(true)
>               DW_AT_GNU_dwo_name	("main.dwo")
>               DW_AT_low_pc	(0x0000000000401110)
>               DW_AT_high_pc	(0x0000000000401141)
>               DW_AT_addr_base	(0x00000008)
> ....
> .debug_str contents:
> 0x00000000: "/tmp/dw5"
> 0x00000009: "main.dwo"
> ....
> .debug_str_offsets contents:
> 0x00000000: Contribution size = 12, Format = DWARF32, Version = 5
> 0x00000008: 00000000 "/tmp/dw5"
> 0x0000000c: 00000009 "main.dwo"
> 
> Here is what happens. gdb starts to parse DW_TAG_compile_unit DIE. It comes
> to DW_AT_GNU_dwo_name. It is of form DW_FORM_strx1 and it has a value of 1.
> The actual value is somewhere in .debug_str section. To find it we need to
> process .debug_str_offsets (refer to 1st index somewhere within) and also know
> the value of DW_AT_str_offsets_base. However, we are in the middle of parsing
> the die and there is no guarantee that we have yet processed
> DW_AT_str_offsets_base, it may be parsed later. 
> 
> My solution is to temporarily write "1" as the value of the attribute
> DW_AT_GNU_dwo_name, and mark it as 'needs reprocessing'. After all the
> attributes of the die have been processed, DW_AT_str_offsets_base will hold the
> correct value if it exists. Then, we revisit the marked attributes. During
> reprocess, we don't need to read the binary file again, because we had already
> written the value we need (1) in the first pass. We calculate the correct
> address using that value, the contents of .debug_str* sections and
> DW_AT_str_offsets_base value.
> 
> I am not claiming this is the only or the best solution, but I think it is
> intuitive and changes relatively few lines of code.

Thanks for the details above.  It is a bit late here for me to process all of this, but I'll try
to give it a look (and at the revised patch) in the following days.

Thanks,

Simon



More information about the Gdb-patches mailing list