This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: dwarf_edit_output testcase sanity checking

From: Roland McGrath <roland at redhat dot com>
To: elfutils-devel at lists dot fedorahosted dot org
Date: Wed, 17 Nov 2010 10:49:46 -0800
Subject: Re: dwarf_edit_output testcase sanity checking

> I think I (once again) forgot about the importance of the fact that when
> a reference attribute points to a DIE the context of that DIE always
> matters. That makes any comparison of DIEs not very "simple enough". 

Ah, yes.  I managed to forget about this when reviewing your plan, too.

> So then you get to drag in the reference tracker and use dwarf_comparator
> for correctness. But I would really like to have the dwarf_output
> algorithm be independent from the dwarf_comparator implementation. Not
> just because we then have an independent checker. But also because it
> feels that what dwarf_output should be simpler and more concrete.

I agree with those sentiments.  But I'm not sure what it means in practice.

> Since in the proposed algorithm the first pass does a full walk over the
> DIE tree already, it should be simple to just store the full context of
> every DIE. Then in the second pass when we hash all reference attributes
> we can hash in this context. Which should reduce collisions again.

This does complicate things somewhat, since we have always said before
that we want to use an "equal enough" predicate for context matching,
rather than an "identical" predicate.  It so happens the first cut at
"equal enough" is to compare basic attributes and ignore the references.
But we want to leave the door open to tweak that predicate later on to
improve sharing.  (In fact, I am fairly sure we'll need to tweak it to
get sufficient sharing for C++ because of how the namespace declarations
are done in the libstdc++ headers.)

So in the first pass, we'll have two different hash functions.  To
begin with, "context hash" can be just the combination of the
basic-attributes-only hash (the part of the first pass hash that we
compute before looking at children) with the parent's context hash.
Later on, we'll most likely want to replace that with a heuristic
that excludes certain basic attributes on certain tags.

> I am pondering if we could just rely on the fact that a pure reference
> chain (ignoring child DIEs) will never loop.  I am not sure what to do
> if we ever encounter a real reference chain loop. But it should be easy
> to detect. But if we can then we would have for each DIE a full hash for
> the whole "child chain" and for the whole "reference chain", which
> should give us "perfect" hashes, since they would describe each DIE
> completely.

I don't think we should have the plan of just blowing up when there is
such a loop.  But I don't think we'll encounter one in the real world.
So it should be fine to start out with detecting it and barfing
coherently with an "implement me" error.

Of course this won't ever be a "perfect hash" in the normal sense of
that term, since it's still a combinatorial function that can always
produce the same hash for unrelated inputs if the bits happen that way.

> But I am note sure we can get rid of ignoring hash-collisions though.
> Because I don't see how you ever make a distinction between a hash
> collision because two DIEs are indeed completely identical and when
> there is false collision because the hash function isn't perfect.

I don't see how it could ever be avoided completely, except for the
cryptographic hash idea (which I don't especially like).


Thanks,
Roland

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]