[PATCH] Dwarf 5: Handle debug_str_offsets and indexed attributes that have base offsets.

Simon Marchi simon.marchi@polymtl.ca
Fri Nov 15 03:05:00 GMT 2019


On 2019-11-01 7:43 p.m., Ali Tamur via gdb-patches wrote:
> * Process debug_str_offsets section. Handle DW_AT_str_offsets_base attribute and
> keep the value in dwarf2_cu.
> 
> * Make addr_base and ranges_base fields in dwarf2_cu optional to disambiguate 0
> value (absent or present and 0).
> 
> * During parsing, there is no guarantee that DW_AT_str_offsets_base and
> DW_AT_rnglists_base fields will be processed before the attributes that need
> those values for correct computation. So make two passes, on the first one mark
> the attributes that depend on *_base attributes and process only the others.
> On the second pass, only process the attributes that are marked on the first
> pass.
> 
> * For string attributes, differentiate between addresses that directly point to
> a string and those that point to an offset in debug_str_offsets section.
> 
> * There are now two attributes, DW_AT_addr_base and DW_AT_GNU_addr_base to read
> address offset base. Likewise, there are two attributes, DW_AT_rnglists_base
> and DW_AT_GNU_ranges_base to read ranges base. Since there is no guarantee which
> ones the compiler will generate, create helper functions to handle all cases.
> 
> Tested with CC=/usr/bin/gcc (version 8.3.0) against master branch (also with
> -gsplit-dwarf and -gdwarf-4 flags) and there was no increase in the set of
> tests that fails. (gdb still cannot debug a 'hello world' program with DWARF 5,
> so for the time being, this is all we care about).
> 
> This is part of an effort to support DWARF-5 in gdb.

Hi Ali,

I did a first pass, I have to admit I don't understand everything.  I have noted
some random comments and questions, hopefully it will become a bit clearer on the
second read.

> -  /* The DW_AT_ranges_base attribute if present, zero otherwise
> -     (zero is a valid value though).
> +  /* The DW_AT_rnglists_base attribute if present.
>       Note this value comes from the Fission stub CU/TU's DIE.
>       Also note that the value is zero in the non-DWO case so this value can
>       be used without needing to know whether DWO files are in use or not
>       N.B. This does not apply to DW_AT_ranges appearing in
>       DW_TAG_compile_unit dies.  This is a bit of a wart, consider if ever
>       DW_AT_ranges appeared in the DW_TAG_compile_unit of DWO DIEs: then
> -     DW_AT_ranges_base *would* have to be applied, and we'd have to care
> +     DW_AT_rnglists_base *would* have to be applied, and we'd have to care
>       whether the DW_AT_ranges attribute came from the skeleton or DWO.  */
> -  ULONGEST ranges_base = 0;
> +  gdb::optional<ULONGEST> ranges_base = 0;

I am probably missing something, but I don't really understand the logic of making
this field optional and initializing it to 0, or the need to make it optional at all.

The only way the "optional" is used is in cases like these:

	    int need_ranges_base = tag != DW_TAG_compile_unit
				   && cu->ranges_base.has_value();
	    unsigned int ranges_offset = (DW_UNSND (&attr)
					  + (need_ranges_base
					     ? *cu->ranges_base
					     : 0));

In which case, the old code:

	    int need_ranges_base = tag != DW_TAG_compile_unit;
	    unsigned int ranges_offset = (DW_UNSND (&attr)
					  + (need_ranges_base
					     ? cu->ranges_base
					     : 0));

If the CU does not have a DW_AT_rnglists_base attribute, doesn't it implicitly
mean that this CU's ranges offset is 0?  So it would seem fine to just leave it
at 0 and keep the arithmetic as before.

That question might apply to the other similar fields too.

>  
>    /* When reading debug info generated by older versions of rustc, we
>       have to rewrite some union types to be struct types with a
> @@ -532,6 +531,12 @@ public:
>       all such types here and process them after expansion.  */
>    std::vector<struct type *> rust_unions;
>  
> +  /* The DW_AT_str_offsets_base attribute if present. For DWARF 4 version DWO

Two spaces after period.

> +
>      union
>        {
>  	const char *str;
> @@ -1324,6 +1341,7 @@ struct die_info
>  
>  #define DW_STRING(attr)    ((attr)->u.str)
>  #define DW_STRING_IS_CANONICAL(attr) ((attr)->string_is_canonical)
> +#define DW_STRING_IS_STR_INDEX(attr) ((attr)->string_is_str_index)

I don't really mind if you want to keep the consistency, but for new stuff
I would favor either reading the attribute directly, or using getters/setters.
I don't really see the advantage of those macros.

> +static gdb::optional<ULONGEST>
> +lookup_addr_base (struct dwarf2_cu *cu, struct die_info* comp_unit_die,
> +		  bool will_follow)

Please provide a comment for this function.

> +{> +  struct attribute *attr;
> +  if (will_follow)
> +  {
> +    attr = dwarf2_attr (comp_unit_die, DW_AT_addr_base, cu);
> +    if (attr == nullptr)
> +      attr = dwarf2_attr (comp_unit_die, DW_AT_GNU_addr_base, cu);
> +  }
> +  else
> +  {
> +    attr = dwarf2_attr_no_follow (comp_unit_die, DW_AT_addr_base);
> +    if (attr == nullptr)
> +      attr = dwarf2_attr_no_follow (comp_unit_die, DW_AT_GNU_addr_base);
> +  }

The curly braces scopes above are missing an indentation level.

> +  if (attr == nullptr)
> +    return gdb::optional<ULONGEST> ();
> +  return DW_UNSND (attr);
> +}
> +
> +static gdb::optional<ULONGEST>
> +lookup_ranges_base (struct dwarf2_cu *cu, struct die_info* comp_unit_die)

Please provide a comment for this function too.

> @@ -7667,6 +7731,13 @@ init_cutu_and_read_dies (struct dwarf2_per_cu_data *this_cu,
>    init_cu_die_reader (&reader, cu, section, NULL, abbrev_table);
>    info_ptr = read_full_die (&reader, &comp_unit_die, info_ptr, &has_children);
>  
> +  /* DWO_AT_GNU_dwo_name and DW_AT_comp_dir in the stub may be using
> +     DW_FORM_GNU_str_index which needs DW_AT_str_offsets_base.  */
> +  struct attribute *attr = dwarf2_attr (comp_unit_die,
> +					DW_AT_str_offsets_base, cu);
> +  if (attr)

Use:

  if (attr != nullptr)

There are a few instances throughout the patch.

> +    cu->str_offsets_base = DW_UNSND (attr);
> +
>    if (skip_partial && comp_unit_die->tag == DW_TAG_partial_unit)
>      return;
>  
> @@ -7730,9 +7801,9 @@ init_cutu_and_read_dies (struct dwarf2_per_cu_data *this_cu,
>      }
>  }
>  
> -/* Read CU/TU THIS_CU but do not follow DW_AT_GNU_dwo_name if present.
> -   DWO_FILE, if non-NULL, is the DWO file to read (the caller is assumed
> -   to have already done the lookup to find the DWO file).
> +/* Read CU/TU THIS_CU but do not follow DW_AT_GNU_dwo_name (DW_AT_GNU_dwo_name)

You listed DW_AT_GNU_dwo_name twice, I don't think that was the intent.

> +   if present. DWO_FILE, if non-NULL, is the DWO file to read (the caller is
> +   assumed to have already done the lookup to find the DWO file).
>  
>     The caller is required to fill in THIS_CU->section, THIS_CU->offset, and
>     THIS_CU->is_debug_types, but nothing else.
> @@ -7748,6 +7819,7 @@ init_cutu_and_read_dies (struct dwarf2_per_cu_data *this_cu,
>  
>  static void
>  init_cutu_and_read_dies_no_follow (struct dwarf2_per_cu_data *this_cu,
> +				   struct dwarf2_cu *parent_cu,

Please document this new parameter.

>  				   struct dwo_file *dwo_file,
>  				   die_reader_func_ftype *die_reader_func,
>  				   void *data)
> @@ -7786,6 +7858,11 @@ init_cutu_and_read_dies_no_follow (struct dwarf2_per_cu_data *this_cu,
>  					     ? rcuh_kind::TYPE
>  					     : rcuh_kind::COMPILE));
>  
> +  if (parent_cu)
> +    {
> +      cu.str_offsets_base = parent_cu->str_offsets_base;
> +      cu.addr_base = parent_cu->addr_base;
> +    }

Since "parent" is passed only to get the base offsets, I think it would be clearer to pass
the offsets directly.  If I look at the signature of the function and see "dwarf2_cu *parent_cu",
I can't really tell what it's for (though a comment might help).  But if I see
"ULONGEST str_offsets_base" and "ULONGEST addr_base", it's more self-explanatory.


>    this_cu->length = get_cu_length (&cu.header);
>  
>    /* Skip dummy compilation units.  */
> @@ -7800,11 +7877,31 @@ init_cutu_and_read_dies_no_follow (struct dwarf2_per_cu_data *this_cu,
>    init_cu_die_reader (&reader, &cu, section, dwo_file, abbrev_table.get ());
>    info_ptr = read_full_die (&reader, &comp_unit_die, info_ptr, &has_children);
>  
> +  /* DWO_AT_GNU_dwo_name and DW_AT_comp_dir in the stub may be using
> +     DW_FORM_GNU_str_index, so we need to fetch DW_AT_str_offsets_base
> +     ASAP.  */
> +  struct attribute *attr = dwarf2_attr (comp_unit_die,
> +					DW_AT_str_offsets_base, &cu);
> +  if (attr)
> +    {
> +      /* DW_AT_str_offsets_base shouldn't appear in DWOs.  */
> +      if (dwo_file != NULL)
> +	{
> +	  complaint (_("DW_AT_str_offsets_base present in DWO file %s,"
> +		       " ignored"),
> +		     dwo_file->dwo_name);
> +	}
> +      else
> +	{
> +	  cu.str_offsets_base = DW_UNSND (attr);
> +	}

Remove the curly braces.

> +    }
> +
>    die_reader_func (&reader, info_ptr, comp_unit_die, has_children, data);
>  }

> @@ -18825,9 +18952,7 @@ partial_die_info::read (const struct die_reader_specs *reader,
>  	      sect_offset off = dwarf2_get_ref_die_offset (&attr);
>  	      const gdb_byte *sibling_ptr = buffer + to_underlying (off);
>  
> -	      if (sibling_ptr < info_ptr)
> -		complaint (_("DW_AT_sibling points backwards"));
> -	      else if (sibling_ptr > reader->buffer_end)
> +	      if (sibling_ptr > reader->buffer_end)

Improvements like this one look good, but if they are not directly related to the patch,
please submit them in their own patch, it's easier to review and justify that way.

>  		dwarf2_section_buffer_overflow_complaint (reader->die_section);
>  	      else
>  		sibling = sibling_ptr;
> @@ -18882,10 +19007,11 @@ partial_die_info::read (const struct die_reader_specs *reader,
>  	    /* It would be nice to reuse dwarf2_get_pc_bounds here,
>  	       but that requires a full DIE, so instead we just
>  	       reimplement it.  */
> -	    int need_ranges_base = tag != DW_TAG_compile_unit;
> +	    int need_ranges_base = tag != DW_TAG_compile_unit
> +				   && cu->ranges_base.has_value();
>  	    unsigned int ranges_offset = (DW_UNSND (&attr)
>  					  + (need_ranges_base
> -					     ? cu->ranges_base
> +					     ? *cu->ranges_base
>  					     : 0));
>  
>  	    /* Value of the DW_AT_ranges attribute is the offset in the
> @@ -19176,12 +19302,58 @@ partial_die_info::fixup (struct dwarf2_cu *cu)
>    fixup_called = 1;
>  }
>  
> +void read_attribute_reprocess (const struct die_reader_specs *reader,
> +			       struct attribute *attr)

Please document this function.


> +{
> +  struct dwarf2_cu *cu = reader->cu;
> +  switch (attr->form)
> +    {
> +      case DW_FORM_addrx:
> +      case DW_FORM_GNU_addr_index:
> +        DW_ADDR (attr) = read_addr_index (cu, DW_UNSND (attr));
> +        break;
> +      case DW_FORM_strx:
> +      case DW_FORM_strx1:
> +      case DW_FORM_strx2:
> +      case DW_FORM_strx3:
> +      case DW_FORM_strx4:
> +      case DW_FORM_GNU_str_index:
> +        unsigned int str_index = DW_UNSND (attr);
> +	if (reader->dwo_file != NULL)
> +	  {
> +	    DW_STRING (attr) = read_dwo_str_index (reader, str_index);
> +	    DW_STRING_IS_CANONICAL (attr) = 0;
> +	    DW_STRING_IS_STR_INDEX (attr) = false;
> +	  }
> +	else if (cu->str_offsets_base.has_value ())
> +	  {
> +	    DW_STRING (attr) = read_stub_str_index (cu, str_index);
> +	    DW_STRING_IS_CANONICAL (attr) = 0;
> +	    DW_STRING_IS_STR_INDEX (attr) = false;
> +	  }
> +	else
> +	  {
> +	    if (attr->name != DW_AT_comp_dir
> +		&& attr->name != DW_AT_GNU_dwo_name
> +		&& attr->name != DW_AT_dwo_name)
> +	      {
> +		error (_("Dwarf Error: %s/%s found in non-DWO CU"),
> +		       dwarf_form_name (attr->form),
> +		       dwarf_attr_name (attr->name));
> +	      }
> +	    DW_UNSND (attr) = str_index;
> +	    DW_STRING_IS_STR_INDEX (attr) = true;
> +	  }

I'm a bit confused with those different cases, could you please add a bit
of comments to explain the logic for each case?  Why is it not possible to
obtain the string value at this point?  If cu->str_offsets_base has no value,
doesn't it mean that the offset is 0?

Maybe it would be more obvious if I saw the actual DWARF being parsed.  Could
you suggest some gcc or clang commands to obtain some DWARF where we have a
DW_AT_dwo_name that is an indirect string?

Thanks,

Simon



More information about the Gdb-patches mailing list