This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [PATCH] Add C++ wrappers for iterating through various libdw and libdwfl objects

From: Mark Wielaard <mjw at redhat dot com>
To: elfutils-devel at lists dot fedorahosted dot org
Date: Mon, 30 Mar 2015 15:55:55 +0200
Subject: Re: [PATCH] Add C++ wrappers for iterating through various libdw and libdwfl objects
On Fri, 2015-03-27 at 21:32 +0100, Petr Machata wrote:
> Mark Wielaard <mjw@redhat.com> writes:
> > On Wed, 2015-03-18 at 00:55 +0100, Petr Machata wrote:
> The iterators are essentially glorified structs.  The usual C++ ABI
> concerns don't apply (reordering, adding virtuals).  They only use
> single inheritance off system classes, and those only serve for tagging
> (so that algorithms can dispatch based on iterator type).

But won't you have issues when some program uses one version of the
elfutils iterators and it also uses a library which was compiled against
an elfutils version that uses a different one? Or can/should/will they
never mix anyway (some of your answers below hint at that).

> >> Error conditions are communicated through exceptions.  For better or
> >> worse, that's how things are done in C++, and there's simply no way to
> >> communicate failures out of constructors or the likes of operator++.
> >> Currently I'm simply throwing an std::runtime_error, but maybe it would
> >> make sense to have a custom family of libelf, libdw, libdwfl exceptions,
> >> all rooted in a general elfutils exception type.
> >
> > Can we add those later, inheriting from std::runtime_error for example
> > to keep ABI?
> 
> Sure.

Great. I wouldn't mind if you prototyped how that might look in the
future. But again, see below. I think my idea of exceptions is slightly
skewed.

> >> One non-iterator-like concept that I included as well is a wrapper
> >> around dwfl_report.  It's an RAII class that ends the reporting when it
> >> goes out of scope.  As soon as one calls dwfl_report_begin, they assume
> >> responsibility to call dwfl_report_end eventually.  But in C++ many
> >> error conditions are communicated through exceptions, and making sure
> >> these cleanups are not skipped is messy.  RAII as a way to express these
> >> responsibilities in C++.
> >
> > The example here can be extended a little. It isn't immediately clear
> > what the advantage is from the two lines. It feels like a wrapper for
> > creating a Dwfl is missing. But maybe it is natural to mix-and-match the
> > C libdwfl.h and c++/libdwfl calls. Expanding the example might show why
> > it is really cool. Maybe one that wraps everything in a function or
> > block to show the RAII and cleanup triggering?
> 
> How about these two?
> 
>   //
>   // Example usage:
>   // {
>   //   Dwfl_Module *mod;
>   //   {
>   //     elfutils::dwfl_report r (dwfl);
>   //     mod = r.report_offline (fn, fn, fd);
>   //     assert (mod != NULL);
>   //   }
>   //   work_with (mod);
>   // }
>   //
>   // Or:
>   // {
>   //   elfutils::dwfl_report r (dwfl);
>   //   for (...; it; ...)
>   //     {
>   //       // open file at *it, potentially throw exception
>   //       r.report_offline (fn, fn, fd);
>   //     }
>   // }
> 
> The value added by the wrapper is not great.  It abstracts away one NULL
> check, and makes sure that dwfl_report_end is called even if you get
> exceptions from wherever (e.g. std::bad_alloc).
> 
> But since it needs a file descriptor, which can't be RAII'd (because
> close can signal errors), there would be enough silly dances around the
> interface anyway.  So it's questionable whether it's as useful as I
> thought when extracting it from dwlocstat.

The examples are much nicer/complete. But maybe we should keep these out
of the public header if they don't really add much value for now.

> >> It's all C++98.  We have had "the new C++" out there on the streets
> >> since 2009 or so, and the iterators presented here were all written with
> >> varying degrees of C++11 in them.  But that is still sufficiently new
> >> that I felt adhering to the C++98 subset is the prudent way forward, and
> >> backported the code.
> >
> > Are there any issues building this with C++11? Does that change ABI?
> 
> You mean e.g. a third-party C++98 library embedding these classes in
> public interfaces, and another C++11 application relying on that?

Right. Or the other way around. I assume these iterators might get
exposed by other libraries build upon elfutils that might get used by
applications that might use that library and elfutils directly.

> Hmm, I think there shouldn't be.  The GCC folks will move (moved) C++98
> to the new ABI as well, and GCC should include backward compatible
> symbols for the old ABI.  I think.

It might not be an issue. Maybe we can just assume that a whole system
is completely using the gcc c++89 abi or the gcc c++11 abi but never two
at the same time.

> >> +namespace elfutils
> >> +{
> >> +  // Helper functions.  Not for client consumption.
> >> +  namespace libdw_impl
> >> +  {
> >> +    inline void throw_libdw (int dwerr = 0);
> >> +    inline bool dwpp_child (Dwarf_Die &die, Dwarf_Die &result);
> >> +    inline bool dwpp_siblingof (Dwarf_Die &die, Dwarf_Die &result);
> >> +    inline Dwarf_Die dwpp_offdie (Dwarf *dbg, Dwarf_Off offset);
> >> +  }
> >
> > So this is just a convention?
> > Any namespace ending in _impl isn't really part of the API?
> 
> It is a convention in these header files.  It could be named just
> "impl", or maybe "aux", or something else still, I don't think there is
> convention to speak of.  Point is somehow to signal to the client that
> there be lions.

It might be my own confusion. I am still thinking this is exposing an
abi. And so we might want to have it checked by some libabigail
invocation. But since it is header only, maybe there really isn't any
abi?

> >> +  // Helper structure with data about each compile unit.
> >> +  struct cu_info
> >> +  {
> >> +    Dwarf_Die cudie;
> >> +    Dwarf_Off abbrev_offset;
> >> +    uint8_t address_size;
> >> +    uint8_t offset_size;
> >> +  };
> >
> > This is all useful information that we get from dwarf_nextcu () anyway,
> > so it is cheap to provide. But don't we also want to provide any type
> > unit information? I see the code uses dwarf_nextcu () and dwarf_offdie
> > () so it cannot really handle type units (as long as they are in
> > separate sections, DWARF5 proposes to merge them into the
> > main .debug_info instead of the separate .debug_type). Was it a
> > deliberate choice to ignore type units?
> 
> No.  The CU iterator was written when there were no type units around.
> dwgrep currently doesn't type units either.  Then again, the name makes
> it clear that this goes through CU's.  I imagine tu_iterator would be
> similarly useful, or all-encompassing unit_iterator.  But I haven't
> written it.

It kind of depends what you call a CU. This goes through compile_units,
and also through partial_units, but not type_units. DWARF describes
these as three kinds of compilation units. Normal, partial and type.

I think it is kind of inconsistent to have the iterator go through
normal and partial units, but not type units. I think normally a user
wants to go through the normal units and then have a tree iterator that
follows DW_TAG_import_unit to virtually create the DIE trees. Or they
might be interested in all type definitions (but don't want to have to
follow DW_AT_type references) so they want to iterate through normal and
type units (but not partial units, because they would be walked
virtually in a tree iterator).

It feels like the iterator is slightly too low-level and/or doesn't give
enough ways to ask how/what you really want to iterate over. I think a
unit_iterator where you can select which unit kinds you would like to
iterate over would be very useful.

> >> +  class cu_iterator
> >> +    : public std::iterator <std::input_iterator_tag, cu_info>
> >> +  {
> >> +    friend class die_tree_iterator;
> >> +    Dwarf *m_dw;
> >> +    Dwarf_Off m_offset;
> >> +    Dwarf_Off m_old_offset;
> >> +    cu_info m_info;
> >> +
> >> +    explicit cu_iterator (Dwarf_Off off)
> >> +      : m_dw (NULL)
> >> +      , m_offset (off)
> >> +      , m_old_offset (0)
> >> +    {}
> >
> > A NULL m_dw seems dangerous/sloppy.
> 
> This is for constructing end iterator.  The constructor is private.

I guess that just shows my limited C++ knowledge. It feels wrong that
the end iterator can have partial NULL references. But see below, I
might just have unreasonable expectations of end iterators (they
cannot/shouldn't be compared in general it seems).

> >> +  public:
> >> +    explicit cu_iterator (Dwarf *dw)
> >> +      : m_dw (dw)
> >> +      , m_offset (0)
> >> +      , m_old_offset (0)
> >> +    {
> >> +      move ();
> >> +    }
> >
> > Just a simple c++ question. What kind of stuff would convert as argument
> > if we don't use explicit here?
> 
> Dwarf CV-* only I think.  And things that have operator Dwarf CV-*.
> nullptr_t in C++11.  The reason I add explicit is that I don't want
> Dwarf *'s implicitly convert to iterators.  I want to see the iterator
> being constructed in the code.
> 
> >> +    // Construct a CU iterator for DW such that it points to a compile
> >> +    // unit represented by CUDIE.
> >> +    cu_iterator (Dwarf *dw, Dwarf_Die cudie)
> >> +      : m_dw (dw)
> >> +      , m_offset (dwarf_dieoffset (&cudie) - dwarf_cuoffset (&cudie))
> >> +      , m_old_offset (0)
> >> +    {
> >> +      move ();
> >> +    }
> >
> > And why don't we need explicit here?
> 
> It's two-argument constructor.  Technically we could add explicit if we
> want to disable the brace-initialization in C++11.

I admit I don't follow. But trust that the cases aren't similar for some
reason.

> >> +    bool
> >> +    operator== (cu_iterator const &that) const
> >> +    {
> >> +      return m_offset == that.m_offset;
> >> +    }
> >
> > Don't you also need to compare m_dw?
> > hmmm, that doesn't work with end () then.
> 
> Well, you are not supposed to compare iterators coming from different
> sources, so this doesn't need to work.

Is that common C++ knowledge? How do you define "source"? I assume
people will expect that if you compare two "wrong/incompatible"
iterators anything can happen (including crashes)? I guess this just
shows my inexperience with C++ iterators. Once you hit an "end" iterator
you always stop processing and won't try comparing against anything.

> Though there could be something like this for paranoia's sake:
> 
>   if (m_dw != NULL && that.m_dw != NULL)
>     assert (m_dw == that.m_dw);

Well, that still crashes on bad uses, so I don't think it is very
helpful. I guess what you have is fine.

> >> +    Dwarf_Off
> >> +    offset () const
> >> +    {
> >> +      return m_old_offset;
> >> +    }
> >
> > Why do we explicitly expose this? Is this more convenient than having it
> > as part of cu_info? And does the user really care about this offset? It
> > could also just be queried with dwarf_cu_die for example.
> 
> The bare offset does come up in eu-readelf.  But you are right that so
> does the CU DIE offset, so this is not a key piece of information.  We
> can drop the interface.  (dwgrep uses it, but it's not essential.)

I think we should either put it in cu_info or drop it. We have other
ways to retrieve this information if the user really needs it. And it
seems to make the interface bigger than necessary.

> > Just thinking of naming here. I assume we would also want to provide a
> > child iterator that transparently handles import_unit/partial_units.
> > Given that often that is the one people might want to use instead of
> > having to deal with "raw" DIE trees. How would we call that one?
> 
> cooked_child_iterator of course! (Just kidding.)
> 
> This sort of stuff is usually handled by policy classes.  Policies are
> template arguments (often template template arguments) that specify some
> detail of behavior of the primary template.  The child_iterator would
> then be configured by a policy class that would or would not do the
> inlining.
> 
> But that's an overkill, since there are only two variants.
> 
> So, um... importing_child_iterator?
> 
> It could also be a flag in template argument that defaults to raw mode.
> So elfutils::child_iterator<elfutils::import_partial_units>.
> 
> We could future-proof it by doing something like:
> 
> namespace elfutils
> {
>   enum partial_import_policy
>   {
>     raw,
>   };
> 
>   template <partial_import_policy ImportPol = elfutils::raw>
>   class child_iterator /* ... */;
> }
> 
> Then later you could add new policies to the enum as they are
> implemented.

I think we will only need "raw" and "logical" walks. IMHO it is more
natural to default to logical. But that kind of depends on what the
default for the CU iterator is (does it include partial units by default
or not).

> >> +    die_tree_iterator (die_tree_iterator const &that)
> >> +      : m_cuit (that.m_cuit)
> >> +      , m_stack (that.m_stack)
> >> +      , m_die (that.m_die)
> >> +    {}
> >
> > No way to walk the type units?
> 
> No.
> 
> >> +    die_tree_iterator
> >> +    operator++ ()
> >> +    {
> >> +      Dwarf_Die child;
> >> +      if (elfutils::libdw_impl::dwpp_child (m_die, child))
> >> +	{
> >> +	  m_stack.push_back (dwarf_dieoffset (&m_die));
> >> +	  m_die = child;
> >> +	  return *this;
> >> +	}
> >> +
> >> +      do
> >> +	if (elfutils::libdw_impl::dwpp_siblingof (m_die, m_die))
> >> +	  return *this;
> >> +	else
> >> +	  // No sibling found.  Go a level up and retry, unless this
> >> +	  // was a sole, childless CU DIE.
> >> +	  if (! m_stack.empty ())
> >> +	    {
> >> +	      m_die = elfutils::libdw_impl::dwpp_offdie (m_cuit.m_dw,
> >> +							 m_stack.back ());
> >> +	      m_stack.pop_back ();
> >> +	    }
> >> +      while (! m_stack.empty ());
> >> +
> >> +      m_die = (++m_cuit)->cudie;
> >> +      return *this;
> >> +    }
> >
> > If we had a "logical cu child iterator" would there be a way to extend
> > this to use either the logical child or low level DIE tree child
> > iterator instead of direct dwpp_siblingof/dwpp_offdie calls?
> 
> I should check out boost iterators, this whole thing looks like ripe for
> grand schemes like generic tree iterator specialized by iterator type.
> 
> In the mean time, we could either adopt the same approach as with
> child_iterator (viz. the policy enum) that the iterator would use as
> necessary.  Or go fully abstract and have child iterator as one of the
> template parameters.

/me would like to avoid making things too abstract if possible. I am
mostly worried that "raw" is too low-level in general and will not be
too useful for people because they will always have to add a logical
view on top of the raw iterators. I don't believe there will be more
than one logical view though. But maybe I am not ambitious enough.

> >> +    // Return a CU iterator representing a CU that the current DIE
> >> +    // comes from.
> >> +    cu_iterator
> >> +    cu () const
> >> +    {
> >> +      return m_cuit;
> >> +    }
> >
> > This interface might make it hard to also support "logical tree walking"
> > if we want to also handle import_unit/partial_unit. Or we could just say
> > "this always returns the low-level DIE tree CU".
> 
> It's not that important.  I use it for progress reports in dwlocstat,
> but that can use some other trick.

Lets drop it!

> >> +  };
> >> +
> >> +  // An attribute iterator goes through attributes of a given DIE.
> >> +  class attr_iterator
> >> +    : public std::iterator <std::input_iterator_tag, Dwarf_Attribute>
> >> +  {
> >> +    Dwarf_Die *m_die;
> >> +    Dwarf_Attribute m_at;
> >> +    ptrdiff_t m_offset;
> >> +
> >> +    struct cb_data
> >> +    {
> >> +      Dwarf_Attribute *at;
> >> +      bool been;
> >> +    };
> >
> > It would be nice to document these fields/structures a little to make
> > reading the implementation easier for noobs like me.
> 
> How about:
> 
> diff --git a/libdw/c++/libdw b/libdw/c++/libdw
> index f884d2e..43bdcd5 100644
> --- a/libdw/c++/libdw
> +++ b/libdw/c++/libdw
> @@ -451,7 +451,10 @@ namespace elfutils
>  
>      struct cb_data
>      {
> +      // The visited attribute.
>        Dwarf_Attribute *at;
> +
> +      // Whether this is second pass through the callback.
>        bool been;
>      };

Yes, a quick note on why we need to keep track of the two passes would
also be appreciated.

> >> +inline void
> >> +elfutils::libdw_impl::throw_libdw (int dwerr)
> >> +{
> >> +  if (dwerr == 0)
> >> +    dwerr = dwarf_errno ();
> >> +  assert (dwerr != 0);
> >> +  throw std::runtime_error (dwarf_errmsg (dwerr));
> >> +}
> >
> > This seems expensive (it copies the string doesn't it?).
> > Which might suggest having an explicit dw_error might be nicer.
> > Then one can call (or not call) dwarf_errmsg when catching.
> 
> It does, but this is error path--you are already losing.
> 
> But the custom error class (inherited off std::exception) would make
> sense.  We could simply expose the error code and do the translation to
> char const * on-demand in what().

I would like the "lazy" on-demand what() approach.
But this probably shows my ignorance on how exceptions are used in C++.
In java exceptions (throw-catch-finally) are often used as normal part
of control flow, so you want the creation of an exception as cheap as
possible, because you might not use it except for type matching to get
into the right catch clause. It looks like in C++ exceptions are much
less common and normally only used for semi-fatal issues.

> >> @@ -39,6 +39,9 @@ noinst_LIBRARIES += libdwfl_pic.a
> >>  
> >>  pkginclude_HEADERS = libdwfl.h
> >>  
> >> +cppdir = $(pkgincludedir)/c++
> >> +cpp_HEADERS = c++/libdwfl
> >
> > Is installing in a c++ subdir common?
> > Why not in the main include dir?
> > The file names don't conflict.
> 
> I don't care either way.  I did it this way because that's what the
> Roland's DWARF writer branch had back then.

Isn't there any convention where c++ headers should be installed?
I just dislike the extra directory name. But if other packages also have
explicit c++ directories for their c++ headers, then lets use that. If
not, lets not and just install under $(pkgincludedir).

> >> +  public:
> >> +    explicit dwfl_report (Dwfl *dwfl)
> >> +      : m_dwfl ((assert (dwfl != NULL), dwfl))
> >> +    {
> >> +      dwfl_report_begin_add (m_dwfl);
> >> +    }
> >
> > Is assert the best idiom for a NULL pointer check?
> 
> So, elsewhere in elfutils we would silently tolerate the NULL and just
> return a -1 or something.  But in a constructor that's not an option.
> 
> We could throw an exception, I suppose.

It might again be my limited C++ experience. If an exception is really
like an assert in most cases anyway, I don't really care. Just trying to
make sure I understand C++ idom around NULL pointers.

Thanks,

Mark
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]