dwarf_aggregate_size doesn't work with arrays in partial CUs

KJ Tsanaktsidis ktsanaktsidis@zendesk.com
Sun Oct 3 05:05:22 GMT 2021


On Thu, Sep 30, 2021 at 12:27 AM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi KJ,
>
> On Sat, 2021-09-25 at 17:21 +1000, KJ Tsanaktsidis via Elfutils-devel
> wrote:
> > I'm writing a program that uses ptrace to poke at internal OpenSSL
> > data structures for another process. I'm using libdw to parse the
> > DWARF data for the copy of OpenSSL actually linked in to the target
> > process, so I can extract struct offsets, member sizes and the like
> > and poke at the right places.
> >
> > I've run into an issue where dwarf_aggregate_size can't calculate the
> > size of an array, when the array is included in a partial CU
> > (DW_TAG_partial_unit). If the array unit includes a DW_AT_upper_bound
> > attribute, but not a DW_AT_lower_bound attribute, then
> > dwarf_aggregate_size will infer the lower bound based on the
> > DW_AT_language attribute of the enclosing CU (i.e. whether the language
> > uses zero- or one-based indexing).
> >
> > However, the debug symbols I'm looking at for OpenSSL from the Ubuntu
> > repositories have the DW_AT_language on the full compilation unit
> > entries, but not in the partial ones included in them. This means that
> > calling dwarf_aggregate_size on the array type DIE does not work.
>
> That is indeed a problem, since dwarf_aggregate_size doesn't provide
> another way to provide the language to use for the
> dwarf_default_lower_bound call. And the default is to return a
> DWARF_E_UNKNOWN_LANGUAGE error.
>
> Maybe we should change the default to assume the lower bound is zero?
>
> > The DWARF spec doesn't really seem to have anything to say on the
> > matter (all it says is "A full or partial compilation unit entry may
> > have the following attributes", but doesn't say what it logically
> > means if an attribute is present on the complete CU but not a partial
> > one).
>
> I think it is assumed that it inherits those attributes from the CU
> from which the partial one was imported and/or from the CU of the DIE
> that referenced the DIE in the partial unit. But I don't think it is
> easy to track that with libdw currently.
>
> > I guess it doesn't really make sense for a single compilation unit to
> > contain multiple languages? So I wonder if dwarf_srclang (called by
> > dwarf_aggregate_size) should crawl through the list of CUs to see if
> > the DIE's CU is included in a CU that _does_ specify DW_AT_language
> > (recursively, I suppose). Then, we can infer that the partial CU's
> > language is the same as the enclosing one.
> >
> > If people reckon this is a good idea (or, have a better one!), I'm
> > happy to try and put together a patch.
>
> I think that suggestion is sound, but really expensive. It also is
> somewhat tricky if you have alt files, you'll have to track back to the
> original Dwarf to see if it imports one of the partial units from the
> alt file.
>
> But I also don't have a good alternative idea. We could maybe have a
> variant of dwarf_aggregate_size that takes a language default value,
> but that doesn't seem like a very generic solution. Or maybe a variant
> of dwarf_srclang that takes any DIE (not just a CU DIE) and which tries
> to figure out the best language to use, which falls back to some
> default value if it cannot figure out what the language is that can be
> used with dwarf_default_lower_bound to get a default (most likely
> zero)?
>
> We could also ask producers (like dwz) to always include a
> DW_AT_language for partial units they create. But that of course makes
> the partial units bigger (and at least dwz creates them to make the
> full debuginfo smaller).
>
> Cheers,
>
> Mark
>

I guess we don't want to hide some really expensive traversal
operation inside a simple call to dwarf_aggregate_size, no...

What if we instead provide a way for the user to specify what language
a CU is? Like "dwarf_cu_report_language(Dwarf_Die *cu, int lang)".
That would get saved with the (partial) CU, and dwarf_srclang could
retrieve this information (if DW_AT_language isn't set). Then, the
user could recursively traverse all CUs and call
dwarf_cu_report_language on each partial CU. And as a bonus, we could
even wrap that up in dwarf_cu_traverse_partial_cu_set_language or
something (OK, the name needs a bit of workshopping).
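
To make the idea concrete, here is a rough sketch of the lookup order I
have in mind. It is a toy model, not libdw code: struct toy_cu and all
of the toy_* functions are made-up stand-ins, and the only things taken
from DWARF are the DW_LANG_* code values and the default lower bounds
(0 for C, 1 for Fortran).

```c
#include <assert.h>
#include <stddef.h>

/* A few DW_LANG_* code values from the DWARF spec; 0 is used here to
   mean "no language known" (an assumption of this sketch). */
enum {
  TOY_LANG_UNKNOWN   = 0,
  TOY_LANG_C89       = 0x0001,
  TOY_LANG_FORTRAN90 = 0x0008,
  TOY_LANG_C99       = 0x000c
};

/* Toy stand-in for a (partial) CU. This is NOT libdw's Dwarf_CU. */
struct toy_cu {
  int attr_language;      /* value of DW_AT_language, or TOY_LANG_UNKNOWN */
  int reported_language;  /* value supplied by the user, or TOY_LANG_UNKNOWN */
};

/* Hypothetical analogue of the proposed dwarf_cu_report_language. */
static void toy_cu_report_language(struct toy_cu *cu, int lang)
{
  assert(cu != NULL);
  cu->reported_language = lang;
}

/* Hypothetical analogue of dwarf_srclang: prefer the attribute found in
   the DWARF data, fall back to the user-reported value, else return -1. */
static int toy_srclang(const struct toy_cu *cu)
{
  assert(cu != NULL);
  if (cu->attr_language != TOY_LANG_UNKNOWN)
    return cu->attr_language;
  if (cu->reported_language != TOY_LANG_UNKNOWN)
    return cu->reported_language;
  return -1;
}

/* Default array lower bound for the languages listed above;
   -1 when the language is unknown or not covered by this sketch. */
static int toy_default_lower_bound(int lang)
{
  switch (lang) {
  case TOY_LANG_C89:
  case TOY_LANG_C99:
    return 0;
  case TOY_LANG_FORTRAN90:
    return 1;
  default:
    return -1;
  }
}
```

With this, a partial CU that has no DW_AT_language starts out as
"unknown", and after one toy_cu_report_language call the lower bound can
be inferred the same way as for a full CU. A DW_AT_language that is
actually present always wins over the reported value.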

That way, the expensive thing is in a separate call that's marked as
being very expensive (and cached, so it only needs to be done once).
Sound like a reasonable approach?
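
Roughly, the expensive-but-cached traversal could look like the sketch
below. Again this is a self-contained toy, not libdw code: the toy_unit
and toy_dwarf structs, the import index lists and the function names are
all invented for illustration, and alt files are ignored entirely. It
pushes the language of each full CU down into the partial units it
imports (recursively), and a flag makes repeat calls free.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_IMPORTS 4

/* Toy stand-in for a unit; imports[] models DW_TAG_imported_unit
   references as indices into the unit table. */
struct toy_unit {
  int language;                     /* 0 means no DW_AT_language known */
  int is_partial;                   /* 1 for DW_TAG_partial_unit */
  size_t n_imports;
  size_t imports[TOY_MAX_IMPORTS];
};

struct toy_dwarf {
  struct toy_unit *units;
  size_t n_units;
  int languages_resolved;           /* cache flag: traversal already done */
};

/* Copy the language of unit idx into every imported unit that has none,
   then recurse. Units that already have a language are skipped, which
   also stops the recursion on shared or cyclic imports.
   Returns the number of units updated. */
static size_t toy_push_language(struct toy_dwarf *dw, size_t idx)
{
  size_t updated = 0;
  const struct toy_unit *u = &dw->units[idx];
  for (size_t i = 0; i < u->n_imports; i++) {
    size_t child_idx = u->imports[i];
    assert(child_idx < dw->n_units);
    struct toy_unit *child = &dw->units[child_idx];
    if (child->language != 0)
      continue;
    child->language = u->language;
    updated += 1 + toy_push_language(dw, child_idx);
  }
  return updated;
}

/* The "very expensive" call: walk all full CUs once and remember that
   it has been done. Returns the number of units updated by this call. */
static size_t toy_resolve_partial_languages(struct toy_dwarf *dw)
{
  if (dw->languages_resolved)
    return 0;
  size_t updated = 0;
  for (size_t i = 0; i < dw->n_units; i++)
    if (!dw->units[i].is_partial && dw->units[i].language != 0)
      updated += toy_push_language(dw, i);
  dw->languages_resolved = 1;
  return updated;
}
```

One consequence of this shape is that a partial unit imported by full
CUs with different languages simply gets the language of whichever full
CU is visited first, and a partial unit that nothing imports stays
unknown. Whether that is acceptable for real DWARF is exactly the kind
of thing I'd want feedback on.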


