Re: dwarfcmp performance


> about two weeks ago I let dwarfcmp spin on one of the difficult cases. 
> After a week or so it answered that the files are equal.  Another case 
> finished after about a day.  So I suppose the algorithms are sound, it's 
> just that it takes ages to get through it all.

Ok, that's good to know.

> So the problem is we simply end up doing walks over those same dies 
> again and again and again, as we look for some far-ahead die.  I don't 
> know if there are other time sinks, but getting rid of this traversing 
> should help a great deal.

We already knew that, but measurement is always good.

> The obvious solution is to do an initial pass over all dies, noting 
> forward references, and when we hit the die that is forward-referred to, 
> remembering the path for later.  That way we would avoid all the 
> repetitive walking that context extraction entails.  I suspect that 
> should help a great deal with the overall performance.

I think you have misconstrued the data here.  dwarf_path_finder
(libdw/c++/dwarf_tracker) already does caching for these context paths.
It's not done by a preemptive walk, but by opportunistic caching along the
way of any walks being done.  When a forward reference necessitates finding
a context not already known, it clones and spins forward the current walk,
caching all the nodes seen along the way, until it finds the referent.
Unless there are bugs in that logic, we visit each node at most twice (once
in the comparison walk, and once on a side walk looking for a referent) for
the mere purpose of finding the context.

With a little more instrumentation, you can investigate whether there are
really redundant walks being kicked off from {left,right}_context calls.
Those calls use dwarf_path_finder::path_to.  That starts with a lookup in
the _m_seen map, which is an unordered_map, i.e. a hash table.  (Each key
in this table is a referent DIE's identity, which is a unique pointer and
thus its own perfect hash.)  If that lookup fails--as it always will for
the first forward reference--then it kicks off a side walk.  So, you can
instrument path_to and collect the data on success/failure of that hash
lookup (i.e. cache hit/miss) for each identity.  If there is ever more than
one cache miss for a given identity, then there is a bug somewhere.  If
there is ever any cache miss for an identity that is not a forward
reference, then there is a bug somewhere.  If there are cache hits for
identities that are indeed forward references, then the side-caching done
along the extra walks for prior cache misses is doing a good job.
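
Something along these lines (all names made up, and assuming the referent
identity can be treated as an opaque pointer key, which is what the _m_seen
table itself already relies on) could be called from path_to right after
the _m_seen lookup:

    // Hypothetical instrumentation: count path_to cache hits and misses
    // per referent identity.  More than one miss for the same identity,
    // or any miss for an identity that is not a forward reference,
    // indicates a bug.
    #include <unordered_map>
    #include <cstdio>

    struct path_to_stats
    {
      std::unordered_map<const void *, unsigned int> hits;
      std::unordered_map<const void *, unsigned int> misses;

      // Call with the referent identity and whether the _m_seen lookup hit.
      void record (const void *identity, bool found)
      {
        if (found)
          ++hits[identity];
        else
          ++misses[identity];
      }

      // Report identities that missed more than once.
      void report () const
      {
        for (std::unordered_map<const void *, unsigned int>::const_iterator
               i = misses.begin (); i != misses.end (); ++i)
          if (i->second > 1)
            std::printf ("identity %p missed %u times: bug?\n",
                         i->first, i->second);
      }
    };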

I believe the reason we have repeated walks is not due to looking for
context, it's due to repeated actual comparison walks.  There is the main
tree walk comparing DIEs.  When that hits reference attributes that are
forward references, dwarf_comparator::reference_match does another subtree
walk rooted at the referent DIEs.  It's all those walks where we have the
massive redundancy.

It could well be that the most costly individual component of those extra
walks is dwarf_path_finder::step.  It's worth looking into how costly that
is for avoidable reasons.  But it's not the chief problem.  The chief
problem is an algorithmic one, which I will get back to later.

If the dwarf_path_finder::step constructor is indeed costly, let's look
into that.  (This will help everything, though it won't be the big fix we
need.)  Here's that method:

      {
	// Append this DIE to the path we'll record for it and its children.
	_m_walker->_m_path.push (here);

	// Record the path down from the CU to see this DIE.
	assert (!bad_die_path (_m_walker->_m_path));
	if (record)
	  _m_walker->_m_seen->insert (std::make_pair ((*here).identity (),
						      _m_walker->_m_path));
      }

So, it does two things.  

First, it pushes the HERE iterator onto the _m_path stack.  That does some
allocation, but at least with subr::sharing_stack, it shouldn't be too
costly.  It's also somewhat hard to avoid (but more on that in a moment).

Second, it does the insertion in the _m_seen map (the aforementioned hash
table).  The hash lookup part of that ought to be quick enough, and if it's
not I'm not sure what we could do about that.  Perhaps it's the case that
the C++ hooey involved in the unordered_map::insert API we're using there
has too much hidden cost.  That is, we are constructing a pair containing a
copy of the path object, only to throw it away if there is already a match.
Perhaps we should instead be using some different part of the unordered_map
API to avoid all that copy construction when it's already in the table, or
perhaps std::move helps or something.  Maybe you know the C++/STL magic
enough better than I do to see an improvement there, or maybe we can ask
bkoz for suggestions.  Perhaps it's as simple as just checking with
unordered_map::find before doing the insertion, and the hash function
(which is trivial here) and hash lookup are fast enough that naively
repeating the hashing for insertion when the find fails is superior to
having any of that copy-construction when the find succeeds (i.e. the
insertion is going to be redundant and ignored).
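
For concreteness, a minimal sketch of that last variant, reusing the names
from the step constructor quoted above (a sketch only, not tested):

      // Only construct the pair (and copy the path object) when the
      // identity is not already in the table.  On a hit we skip the
      // insert entirely; on a miss we pay for one extra trivial hash
      // lookup before inserting.
      if (record
          && (_m_walker->_m_seen->find ((*here).identity ())
              == _m_walker->_m_seen->end ()))
        _m_walker->_m_seen->insert (std::make_pair ((*here).identity (),
                                                    _m_walker->_m_path));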

So, back to the first part and whether we could avoid it.  Well, if that hash
lookup does hit an existing entry, then it should be an entry that is equal
to what our _m_path is after .push (here).  So, theoretically we could do
the hash lookup first, and if it succeeds, then we replace _m_path by
copying the one in the table rather than doing the push that makes our
separate copy equal to it.  Given subr::sharing_stack (which is a linked
list with shared tails), that not only avoids the fresh allocation for the
new top-of-stack element, but also frees all the duplicative elements in
our status-quo-ante _m_path and replaces them with the shared list that's
in the table, further reducing memory use.  However, the memory use is
temporary and my guess is that the allocation cost is not all that high.
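
A sketch of that reordering (again untested; seen_map here just stands in
for the real type of the _m_seen table, and the record flag is ignored for
brevity):

      // Look the identity up first; on a hit, adopt the cached path so
      // its tail is shared, instead of pushing a fresh top-of-stack
      // element and inserting a duplicate.
      seen_map::iterator i = _m_walker->_m_seen->find ((*here).identity ());
      if (i != _m_walker->_m_seen->end ())
        _m_walker->_m_path = i->second;
      else
        {
          _m_walker->_m_path.push (here);
          _m_walker->_m_seen->insert (std::make_pair ((*here).identity (),
                                                      _m_walker->_m_path));
        }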

What argues against doing that is that the found entry only "should" be
equal to our current _m_path.  If the input data uses DW_TAG_imported_unit
then the same DIE identity could appear in multiple disparate paths.  They
should be "equal enough" paths.  But since one of the purposes of all this
is for dwarfcmp to verify the correctness of sharing decisions made by
rewriting, it does not seem wise to presume that.  The cost to compare the
paths for equality before deciding to reuse the die_path object in the
table probably overwhelms any savings.


Now, back to the larger point.  Optimizing context collection is all well
and good and worth a little bit of investigation, but it's not the big
thing.  The big thing is the walks done inside the "subcomparator" DIE
comparisons from dwarf_comparator::reference_match.

When comparison hits two reference attributes with the same attribute name,
it compares them by comparing their referent DIEs.  That's where the
"subcomparator" comes in.  This is where we do another comparison walk on
the left-hand vs right-hand referent subtrees.  For trivial mismatches
(different tag, one has children and the other doesn't), we always
short-circuit this walk quickly at the top of reference_match.  For
nontrivial cases, including all comparisons that will ultimately succeed,
we have to use a subcomparator walk.
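
Schematically, the shape of that logic is (this is pseudocode, not the real
code or signature, and compare_referent_subtrees is a made-up name for the
subcomparator walk):

      if (ref1.tag () != ref2.tag ()
          || ref1.has_children () != ref2.has_children ())
        return false;           // Trivial mismatch: short-circuit cheaply.
      // ... cached-match and circularity checks elided ...
      return compare_referent_subtrees (ref1, ref2);  // Full subtree walk.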

When we have only non-circular backward references, then the reference
attribute comparison could short-circuit quickly and safely, at this code:

      // Maybe the tracker has already cached a correspondence of references.
      typename tracker::reference_match matched;
      if (_m_tracker.reference_matched (matched, ref1, ref2))
	return true;

That works when we've previously called tracker::notice_match to record a
successful comparison.  If we only had non-circular backward references,
then that's what we'd be doing.  However, you'll notice in dwarf_tracker:

    inline bool notice_match (reference_match &/*matched*/,
			      const die1 &, const die2 &/*b*/, bool matches)
    {
      /* XXX not reliable!
	 This match could be predicated on a tentative match of a
	 circular ref inside.  We can't cache that!
      if (matches && matched._m_lhs != NULL)
	matched._m_lhs->second.insert (b);
      */
      return matches;
    }

That is, the recording of a match is actually disabled.  That's because, as
it is, it really only works reliably for those simple, non-circular
backward reference cases.  In cases of circularity, it can have false
positives.  So, for non-algorithmic performance issues, another tack you
could take is to re-enable that code by uncommenting it, and then analyze
the performance of known-correct matches (which IIRC is all we test in the
test suite right now) while not worrying about arcane mismatches that could
have false positives.
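
That is, the re-enabled body would just be the currently commented-out
lines with the parameter names restored, roughly:

    inline bool notice_match (reference_match &matched,
			      const die1 &, const die2 &b, bool matches)
    {
      // Unsafe in the general (circular) case, as explained below, but
      // good enough for measuring the cost of redundant walks on
      // known-correct matches.
      if (matches && matched._m_lhs != NULL)
	matched._m_lhs->second.insert (b);
      return matches;
    }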

I've posted here before about the reasons this caching is not correct in
the general case and hence is now disabled in the main dwarf branch.
Perhaps Mark has the archive link closer to hand than I do.
I'll just repeat it now in summary.

First, consider a simple backward reference (as mentioned above).  In the
main comparison walk you came across subtrees A1 and B1, and found them
equal.  Now, later in that walk you are comparing A2 and B2.  A2 has an
attribute like type=A1, while B2 has type=B1.  To compare these two
DW_AT_type attributes, you compare their referents, A1 vs B1.  Since you
already did this before in the main walk, it would be redundant to compare
them again.  (That's what today's dwarf branch is doing, since the caching
is disabled.  For the rest of this example, imagine the code above is not
commented out.)  So, in the original comparison of
A1 and B1 in the main walk, you cached that they are equal.  This happens
in dwarf_comparator::match_deref, where it calls tracker::notice_match.
Now we're in dwarf_comparator::reference_match looking at type=A1 vs
type=B1.  Here we call tracker::reference_matched to look up that cached
result, so we short-circuit right there and say these references are equal.
That is good.
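
In the same simplified notation as the example further below, that
backward-reference situation looks roughly like this (tags and names are
made up; only the reference structure matters):

A1:<base_type name="int"/>		B1:<base_type name="int"/>
   ...					   ...
A2:<variable name="v" type=A1/>		B2:<variable name="v" type=B1/>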

Next, consider a simple forward reference (no circularity involved).  In
the main comparison walk you come across two attributes like type=A3 vs
type=B3.  The main walk has not come across A3 yet, hence it's a forward
reference.  So, dwarf_comparator::reference_match does the trivial tests,
A3.tag () == B3.tag () and A3.has_children () == B3.has_children (), and
those pass.  We don't know anything else about A3/B3 yet, so we do a
recursive comparison, using a subcomparator.  (Before that, we've used
tracker::left_context (A3) and tracker::right_context (B3) to see that they
were in equal-enough contexts.)  The subcomparator does a whole subtree
walk on A3 and B3, and finds they are equal.  So that is good.  But later,
the main comparison walk will eventually find A3 and B3 in the CU trees.
We don't want to repeat that subtree comparison, because we already did it
in the side walk in the subcomparator from reference_match.  So, instead,
the last thing reference_match does is record the results of that subtree
comparison.  That's where it calls tracker::notice_match.  Since we did
that on the side walk, when the main walk comes across A3 vs B3 to compare
those subtrees (here we are back in dwarf_comparator::match_deref), the
first thing it does is check for a cached match.  That's the call to
tracker::prematch.  This sees that A3 and B3 were previously compared and
found equal, so it short-circuits the subtree walk of them in the main
comparison walk.  That is good.

Also notice that in the original
subcomparator walk on A3 and B3 we compared all their children and their
children's children, and so on, down to the leaves.  So e.g. if A4 is the
first child of A3 and B4 is the first child of B3, then we already compared
A4 vs B4 and found them equal as part of finding A3 and B3 equal.  When we
did that, we cached this fact too (and every level on down).  Hence, if
there is somewhere else a pair of references like type=A4 vs type=B4, our
cache will hit for that reference_match too and avoid another redundant
walk, which is also good.
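
In the same made-up notation, the forward-reference case looks roughly like
this, with the referring attributes encountered by the main walk before the
referent subtrees:

   <variable name="v" type=A3/>		   <variable name="v" type=B3/>
   ...					   ...
A3:<structure_type name="s">		B3:<structure_type name="s">
A4:  <member name="m"/>			B4:  <member name="m"/>
   </structure_type>			   </structure_type>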

Now, consider a weirder case that I'll call a "garden path circularity".
This looks something like the following.  (This is a simplified example
that is not quite like a real case would look, but has the important graph
properties that are the issue.)

A5:<class_type name="foo">		B5:<class_type name="foo">
A6:  <subprogram name="foo1" type=A5/>	B6:  <subprogram name="foo1" type=B5/>
A7:  <subprogram name="foo2"/>		   </class_type>
   </class_type>

Now, you can see that A5 and B5 are not actually equal, because there is no
child B7 to match A7.  But try following the algorithm we were using for
the forward reference with caching above.  When comparing A6 vs B6 we come
to type=A5 vs type=B5.  The circularity-detection logic (which I didn't go
into, but you can look at the list archives and/or at the comments in the
dwarf_ref_tracker::reference_matched method) sees that we are looking at a
circularity because a reference to A5 appears inside the A5 subtree.  It
then checks that the B5 circularity is congruent, and says, "Ok, type=A5
and type=B5 look like they match."  So far so good.  That's fine as a
tentative match, because as we continue with the subtree comparison we will
come to A7, see there is no B7, and thus the whole thing doesn't match.
But!  Along the way, we will have cached A6==B6 as a noticed match, and
then later on, when looking at some other references to A6 and B6, we will
wrongly conclude they are equal.  There's the false positive.

The real cases are much more complex than this, and I think I used better
examples in the past list postings about this.  But I think this
illustrates the essential character of the danger of false positives we get
from the naive caching plan that I implemented (and which is now disabled).

For dwarfcmp to perform well, we need either a version of caching that
really is reliable in all cases, or a drastically different way of dealing
with reference comparisons and circular references that is not susceptible
to this kind of problem.  That's what made me throw up my hands and draft
Mark into the project. :-)


Thanks,
Roland
