[PATCH] Fix leb128 reading

Mark Wielaard mark@klomp.org
Tue Oct 27 15:23:52 GMT 2020


Hi Tom,

On Mon, 2020-10-26 at 16:54 -0600, Tom Tromey wrote:
> > > In this patch, I chose to try to handle weird leb128 encodings by
> > > preserving their values when possible; or returning the maximum value
> > > on overflow.  It isn't clear to me that this is necessarily the best
> > > choice.
> 
> Mark> I think it is a good thing to accept "padded" leb128 numbers, but
> Mark> probably up to a point. The padding will probably be either mod-4 or
> Mark> mod-8 bytes. So lets at most read up to 16 bytes total.
> 
> Now I think all this is just overkill.  Actually-existing leb128
> relocations (see commit 7d81bc93 in binutils) seem to shrink the
> representation.  There was some talk in the Rust leb128 decoder issues
> about this topic, but that was a wish-list thing possibly for wasm -- so
> not exactly a need.

OK. Lets drop it then. It wasn't really supported earlier (not for more
than 10 bytes anyway). So it shouldn't regress anything.

> Mark> Is the really just (1u << (8 * 8 - 7 * 9)) - 1 = (1u << 1) - 1 = 1
> Mark> So we are really only interested in the lowest bit?
> 
> Yeah.  We could hard-code that if you prefer, but I wrote it this way to
> pretend that maybe the types in the code would be changed (widened) some
> day.

I would like at least a comment. It took me a while to see it really
was just 1, and it might not be immediately clear why that is the
correct value.

> > > +  if (unlikely (value == UINT64_MAX))
> > > +    return INT64_MAX;
> 
> Mark> Here you need to help me out a bit, wouldn't UINT64_MAX also be the
> Mark> representation of -1?
> 
> Yes.  Normally in sleb128 I guess -1 would be written {0x7f}.  However
> one could write it with any number of 0xff prefix bytes first.  This
> weirdness is one reason I changed my mind on handling over-long leb128
> sequences.
> 
> I guess the only solution here is to write 2 copies of each function.

That is what I would do, simply because I get headaches when trying to
reason about these kind of signed/unsigned conversions. Of course you
would still have some of that in the sleb128 variant because you'll
have to use unsigned arithmetic and cast to signed at the end.

> Mark> But what I don't fully understand is how this works with a
> "padded"
> Mark> leb128, won't we possibly forget to negate in that case?
> 
> It isn't clear to me what to do when a sleb128 specifies more than 64
> bits.  Another reason perhaps to avoid this area.

There is not much we can do, except flag it as overflow. Any constant
value in DWARF that represents a > 64bit value is problematic (see for
example the DWARF5 DW_FROM_data16, which the spec claims is a
"constant", but we have to treat it as block form).

Cheers,

Mark


More information about the Elfutils-devel mailing list