This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.



dwarf branch


The status of my dwarf branch work is that it has sat untouched for some
time now while I was otherwise occupied (mostly by vacation and being sick).
All the work I'd committed anywhere was on the dwarf branch as of f7b29ae.
I've now merged in the 0.146 trunk and verified it still builds.

It's been far too long and I've lost most of my mental state about the
work.  Here's things as I recall them.  I should get back to working on
this by Monday or so and will probably spend quite a while just
recovering everything I knew before about what I wrote.

The current state of 'make check' is that the dwarfcmp tests will take
forever.  The tests on the all-C binaries complete.  The tests on the
binaries that use C++ all come last, and those may take forever.  I
don't think this has anything magical to do with C++, it's just that the
reference graphs are massively more complex than anything our C code
produces.  (As described below, I'm not sure if "forever" just means
"way, way too long", but it might well really mean forever.)

I recommend configuring with 'CXXFLAGS=-D_GLIBCXX_DEBUG -g'.
That enables libstdc++ magic that gives clean errors for various
buggy uses of containers and iterators and such that otherwise
just lead to strange clobberation (and maybe still do anyway).
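For reference, a build using those flags could look like this (the
sequence is illustrative; it's the standard configure/make dance with
the debug-mode CXXFLAGS from above):

```shell
# Illustrative: build with libstdc++ debug mode enabled.
# _GLIBCXX_DEBUG turns buggy container/iterator use into clean,
# immediate errors instead of silent memory corruption.
./configure CXXFLAGS='-D_GLIBCXX_DEBUG -g'
make
make check
```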

Before commit f7b29ae I was observing some memory leaks that I could
never make sense of.  Possibly some dangling pointer crashes too
(opposite of leaks).  I don't recall any more which cases exhibited
those problems, but I think it will come up somewhere in the
dwarfcmp-self or dwarfcmp-test runs on the C++ binaries.  (Of course,
the binaries we have today are not the same binaries I tested before,
but I bet we will find variants of the same problems.  Once you find
one, it's good to copy that binary off and preserve it as test data,
rather than testing against the src/dwarf* binaries that change with
every hack to the C++ stuff.)

That commit disables using subr::sharing_stack from libdw/c++/subr.hh,
which apparently has some kind of reference counting braino, but I just
kept not understanding it.  The purpose of sharing_stack was to reduce the
massive memory bloat we get from the vanilla copying flavor of std::stack.
In our uses, we always have common tails among many copied stack objects,
so it saves a lot of memory.  Also, they're proper stacks and so there are
never any cycles, so reference-counting is viable.  I imagine there is some
canonical and far better C++ thingabob to use for this rather than whipping
it up as I did.  A reference-counted linked list should not be so hard at
my age, but apparently I can't get it right.
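For the record, a minimal sketch of the shared-tail idea, with invented
names (this is not the actual subr::sharing_stack code): std::shared_ptr
does the reference counting, copies share their common tail, and since
stacks are acyclic, plain refcounting reclaims everything.

```cpp
#include <memory>

// Hypothetical sketch of a tail-sharing stack (not subr::sharing_stack).
// Copying a shared_stack copies one pointer; pushes onto a copy never
// disturb the original, and all copies share their common tail nodes.
template <typename T>
class shared_stack
{
  struct node
  {
    T value;
    std::shared_ptr<const node> next;
  };
  std::shared_ptr<const node> top_;

public:
  bool empty () const { return !top_; }
  const T &top () const { return top_->value; }

  void push (const T &v)
  {
    // New node points at the old top; existing copies are unaffected.
    top_ = std::make_shared<const node> (node { v, top_ });
  }

  void pop () { top_ = top_->next; }  // O(1); remaining tail stays shared.
};
```

A copy followed by a push diverges from the original while still sharing
everything below the divergence point, which is exactly the common-tails
pattern described above.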

The memory bloat, and the bugs from trying to reduce it, were all a
side show to make the test runs less ridiculously slow and memory-intensive.

Commit 4ce456a was supposed to be a "correct but slower" fix.  That
disabled a flavor of caching I had added to speed things up.  But that
caching turned out to have false positives and so wound up causing a
later assertion failure cascading from that false match.  As I recall
understanding it at the time, disabling the caching should have been the
conservative and correct change to fix the semantic bug, just real slow.
But on the complex cases, it takes so long now that I'm not sure whether
it is actually in an infinite loop (albeit one without leaks or infinite
recursion) or just unbelievably, monumentally slow (i.e. going to take
weeks to complete).

The caching involved is of an equivalence between two DIEs.  In plain
dwarfcmp, this is all part of plain comparison.  In the dwarfcmp-test
cases, i.e. writer bits, it is comparison done as part of the intrinsic
duplication elimination of instantiating a CU in dwarf_output.  In
nontrivial examples, there are long and circular chains of references
between DIEs.  The comparison has a main walk that goes along the DIE
tree structure, and then separate jump-started walks it takes when
following a pair of reference attributes to see if they point to
equivalent things.  In practice, you have already walked most of the
subtrees in the latter part of the main walk, as part of a subtree walk
taken following a reference.  Usually there are many references to the
same DIEs, so you would visit each subtree many times this way.  Hence,
it makes sense during the walk to cache the match results found during
side trips, and short-circuit repeating the same side trip again.  This
caching sped things up a lot.

All the trouble comes from circular references.  The reference-chasing
code has logic to handle circularities.  When a walk becomes circular,
it short-circuits.  But it's part of the comparison of attributes, so it
still has to yield true or false as to whether these two references are
"equal".  The logic is that if both sides of the comparison are
congruent circularities, then they are "equal".  By "congruent" I mean
that if the left-hand side refers to A and the right-hand side refers to
B, then the outer subwalk we are part of that has A as its left-hand
side also has B as its right-hand side.  If that is true, and the whole
rest of the outer comparison subwalk rooted at A vs B also comes up
matching, then A and B really do match and so these two references
really do match and so their containing DIEs really do match and on up
to A and B really do match and ...

Never quote me for graph theory, but this seems to work right for the
main comparison.  However, at the time of the short-circuit, you don't
actually know that A and B match, only that these two reference
attributes do match if A and B do otherwise match.  But, the caching
code works by remembering the true/false result of the comparison
between the DIEs containing these two reference attributes.  When a DIE
had one or more reference attributes that were deemed to match because
of circularity, then you don't really know that this DIE matches (or
that it doesn't).  You only know that the result of that reference
attribute comparison, and all the results that flow back up the walk
from it, match the eventual result of completing the comparison subwalk
for the DIE that's the root of that circularity.

In some of the hairiest C++ cases, this bit me.  There will be a large
subtree like a class_type of the same name from two CUs we're comparing.
The two sides will have many large matching subtrees, including circular
references such as to each other or the containing class_type DIE.
Then, far on down the line, there will be some subtrees that don't match
or are missing on one side.  That makes the whole class_type not match.
Because of the circular references, that makes all those pairs of
"matching" subtrees actually not match.  But along the way, we'd thought
they did match and cached that as true.  I think this was fine enough
for plain dwarfcmp, since those subtrees with the false cached results
are only considered again for later references, and before that matters
you will have failed the main comparison anyway.  But the dwarf_output
code uses those subtree matches along the way to decide what is and
isn't a duplicate with some other subtree already collected, so this
later causes things to blow up.  

IIRC the actual examples were of what should have been the same
class_type (was the same in the source) in two different CUs, but the
compiler omitted some method decls from one and not the other, or
something like that.  This seemed like it might well have been a
compiler bug just on the semantics, and even if not definitely wrong per
se is certainly an impediment to ideal deduplication.  But before we
worry about the compiler, we should make our code semantically correct
for all kinds of input.  Once it works, then these cases will show up
when we instrument a little to look for similar-looking DIEs like a type
with the same name (and path, i.e. qualified name) that do not wind up
matching as duplicates.

I think when last I was really hacking on this, I was trying to
determine if the current code (with the caching disabled) really has
infinite loops or actually just repeats walks so much that it can take
literally days.  I don't think I ever quite managed an answer to that.
If it's really infinite, then something I don't understand is going on
with the whole comparison algorithm.

Assuming that would come out in the wash, my next game plan then (and
now) is to revamp the caching.  AFAICT this will entail reworking the
whole result-propagation part of the main comparison control flow so
that what propagates back up walks as they unwind is not just "true" but
"match contingent on ..." where ... are the "pending" circularities.
Then those results can be cached.  When you reach one of those nodes
that is the root of a detected circularity, then you can remove that
contingency from all the cache entries.  I haven't yet come to exactly
how all that would be done.
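The shape I have in mind is something like this sketch (names and
representation invented, not existing code): a cache entry carries the
set of still-pending pairs its "match" is contingent on, and resolving a
pending pair either discharges that contingency or invalidates every
entry that carried it.

```cpp
#include <map>
#include <set>
#include <utility>

// Stand-in for a (DIE, DIE) pair; the real thing would be DIE handles.
using pair_key = std::pair<int, int>;

// Hypothetical contingent cache: each cached "match" records which
// pending circularities it depends on.  Only contingency-free entries
// are definitive.
struct contingent_cache
{
  std::map<pair_key, std::set<pair_key>> entries;

  // Cache "P matched, contingent on these still-pending pairs".
  void record_match (pair_key p, std::set<pair_key> pending)
  {
    entries[p] = std::move (pending);
  }

  // PAIR's own comparison walk finally completed with MATCHED.
  void resolve (pair_key pair, bool matched)
  {
    for (auto it = entries.begin (); it != entries.end (); )
      {
	if (it->second.erase (pair) && !matched)
	  it = entries.erase (it);	// contingency failed: entry bogus
	else
	  ++it;				// discharged (or not involved)
      }
  }

  // Definitive match only once no contingencies remain.
  bool known_match (pair_key p) const
  {
    auto it = entries.find (p);
    return it != entries.end () && it->second.empty ();
  }
};
```

The point is that dwarf_output would only trust known_match results, so
a class_type that later fails deep in its subtree takes all its
tentatively "matching" subtrees down with it instead of poisoning the
duplicate collection.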

It's either that or punt my whole ad hoc algorithm in favor of some
fancy proper method of comparing directed cyclic graphs for equivalence.
But I don't know enough about that sort of thing to find one.


Thanks,
Roland
