RFC: .sleb128 and bignums

Tue Jan 11 14:41:00 GMT 2005

gas/read.c has two sets of functions for handling .sleb128, one for
O_constants and one for O_bigs.  The O_big version seems to be
horribly broken.  For example, an i686-linux-gnu configuration will
assemble:

        .data
        .sleb128        0x123456789a

as:

---------------------------------------------------------------------
Contents of section .data:
 0000 1a                                   .               
---------------------------------------------------------------------

This is because the continuation bit is only set by:

      if (size == 0)
        {
          if ((val == 0 && (byte & 0x40) == 0)
              || (~(val | ~(((valueT) 1 << loaded) - 1)) == 0
                  && (byte & 0x40) != 0))
            byte |= 0x80;
        }

where "size" is the number of littlenums left to read.  This whole
condition is inverted: it's setting the continuation bit in exactly
the situations where it _shouldn't_ be set!  The net effect is that
every bignum .sleb128 is one byte long.
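
For comparison, the usual termination rule for a signed LEB128 encoder
can be sketched on a plain 64-bit value.  This is only an illustration,
not the gas routine: sleb128_encode is a hypothetical helper, and it sets
the continuation bit on every byte except the one after which the
remaining bits are pure sign extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of gas: encode VALUE as signed LEB128
   into OUT (room for at least 10 bytes) and return the byte count.  */
size_t
sleb128_encode (int64_t value, uint8_t *out)
{
  size_t n = 0;
  int more = 1;

  assert (out != NULL);
  while (more)
    {
      uint8_t byte = (uint8_t) (value & 0x7f);

      /* Arithmetic shift right by 7, written so that it does not rely
         on implementation-defined >> of a negative value.  */
      value = value < 0 ? ~(~value >> 7) : value >> 7;

      /* Stop once the bits left in VALUE merely repeat bit 6 of BYTE.  */
      if ((value == 0 && (byte & 0x40) == 0)
          || (value == -1 && (byte & 0x40) != 0))
        more = 0;
      else
        byte |= 0x80;           /* More bytes follow.  */

      out[n++] = byte;
    }
  return n;
}
```

Under this rule 0x123456789a encodes to the six bytes 9a f1 d9 a2 a3 02,
rather than the single 1a shown above.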

Even with that fixed, there are a couple of problems.  First, the
routine uses unsigned values and arithmetic shifts and never does
any form of sign-extension.  Thus if (a) the number is negative and
(b) every bit of the final littlenum is added to the leb128,
the result won't necessarily be negative.  Example:

	.data
	.sleb128	-0x7000000000000000

gives:

---------------------------------------------------------------------
Contents of section .data:
 0000 80808080 80808080 9001               ..........      
---------------------------------------------------------------------

...where the final byte should be 0x7f instead.
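
One way to get the sign extension right is to treat every bit beyond the
top littlenum as a copy of the sign bit.  The sketch below assumes 16-bit
littlenums stored least significant first in two's complement; LIMB_BITS,
big_bit and big_sleb128_encode are hypothetical names, not taken from
read.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIMB_BITS 16    /* Assumed littlenum width.  */

/* Return bit I of the two's-complement number NUM[0..SIZE-1].  Bits
   beyond the top littlenum read as copies of the sign bit.  */
static unsigned
big_bit (const uint16_t *num, size_t size, size_t i)
{
  if (i >= size * LIMB_BITS)
    i = size * LIMB_BITS - 1;
  return (num[i / LIMB_BITS] >> (i % LIMB_BITS)) & 1;
}

/* Encode NUM as signed LEB128 into OUT and return the byte count.
   OUT needs room for (SIZE * LIMB_BITS) / 7 + 1 bytes.  */
size_t
big_sleb128_encode (const uint16_t *num, size_t size, uint8_t *out)
{
  size_t top, ngroups, g;
  unsigned sign, b;

  assert (num != NULL && out != NULL && size > 0);
  sign = big_bit (num, size, size * LIMB_BITS - 1);

  /* TOP ends up as the number of low bits that must be kept because
     the highest of them differs from the sign.  */
  top = size * LIMB_BITS;
  while (top > 0 && big_bit (num, size, top - 1) == sign)
    top--;

  /* TOP value bits plus one sign bit, in groups of seven.  */
  ngroups = (top + 1 + 6) / 7;

  for (g = 0; g < ngroups; g++)
    {
      uint8_t byte = 0;

      for (b = 0; b < 7; b++)
        byte |= (uint8_t) (big_bit (num, size, g * 7 + b) << b);
      if (g + 1 < ngroups)
        byte |= 0x80;           /* More bytes follow.  */
      out[g] = byte;
    }
  return ngroups;
}
```

Fed the littlenums { 0, 0, 0, 0x9000 } for -0x7000000000000000, this
yields eight 0x80 bytes followed by 0x90 0x7f.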

Second, the routine strips off sign-filled littlenums using:

  /* Strip leading sign extensions off the bignum.  */
  while (size > 0 && bignum[size - 1] == (LITTLENUM_TYPE) -1)
    size--;

This fails to check the sign of bignum[size - 2] and can therefore
convert a negative number into a positive one.  Example:

	.data
	.sleb128	-0x100000000

gives:

---------------------------------------------------------------------
Contents of section .data:
 0000 808000                               ...             
---------------------------------------------------------------------
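
A stripping loop that preserves the sign could look something like the
sketch below.  It assumes 16-bit littlenums stored least significant
first; strip_sign_littlenums is a hypothetical name, and the loop only
handles the negative case that the quoted code handles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the number of littlenums of BIGNUM[0..SIZE-1] that must be
   kept.  An all-ones top littlenum is dropped only when the littlenum
   below it is itself negative, so the number keeps its sign.  */
size_t
strip_sign_littlenums (const uint16_t *bignum, size_t size)
{
  assert (bignum != NULL && size > 0);
  while (size > 1
         && bignum[size - 1] == 0xffff
         && (bignum[size - 2] & 0x8000) != 0)
    size--;
  return size;
}
```

For the littlenums { 0, 0, 0xffff, 0xffff } (that is, -0x100000000 in
64 bits) this keeps three littlenums instead of two, so one sign
littlenum survives.  The usual sleb128 rules would then encode the
value as 80 80 80 80 70.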

The first patch below should fix these cases.  That's only part of the
story though.  I was originally looking at this because the meaning of:

        .sleb128 0xffffffff

depends on whether you have a BFD64 toolchain or not.  If you don't
(as with powerpc-eabisim, for example) the constant is treated as -1.
If you do (as with mips-elf, which is still a 32-bit toolchain)
the constant is treated as the positive value 0xffffffff, which needs
33 bits as a signed number.

After the fix for gcc PR debug/14238, gcc will use the sleb128 above
for enumeration value FOO in:

        enum x { FOO = 0xffffffff } x;

gcc's HOST_WIDE_INT is 64 bits wide for powerpc-eabisim and so it can
store 0xffffffff as a positive value.  dwarf2asm.c:size_of_sleb128()
will therefore expect the sleb128 to be 5 bytes long, but gas emits the
single-byte encoding of -1 instead, which throws off the offset
calculations for the rest of the .debug_info section.
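
The size mismatch is easy to reproduce in isolation.  The sketch below
is an illustration only: sleb128_size and wrap_to_int32 are hypothetical
helpers, not the gcc or gas functions, and wrap_to_int32 stands in for
what happens when the constant is squeezed into a 32-bit signed value.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes in the signed LEB128 encoding of VALUE.  */
size_t
sleb128_size (int64_t value)
{
  size_t n = 0;
  int more;

  do
    {
      unsigned byte = (unsigned) (value & 0x7f);

      /* Arithmetic shift right by 7, avoiding >> on a negative value.  */
      value = value < 0 ? ~(~value >> 7) : value >> 7;
      more = !((value == 0 && (byte & 0x40) == 0)
               || (value == -1 && (byte & 0x40) != 0));
      n++;
    }
  while (more);
  return n;
}

/* Reinterpret the low 32 bits V as a signed 32-bit quantity.  */
int64_t
wrap_to_int32 (uint32_t v)
{
  return v >= UINT32_C (0x80000000)
         ? (int64_t) v - INT64_C (0x100000000)
         : (int64_t) v;
}
```

Here sleb128_size (0xffffffff) is 5 when the constant is kept as a
64-bit positive value, but sleb128_size (wrap_to_int32 (0xffffffff))
is 1, because the wrapped value is -1.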

I suppose this is really a gcc bug (it should probably be sign-extending
the constant) but I was curious to know whether the gas behaviour was
expected or not.  It seems odd for:

        .sleb128 0xffffffff

to be truncated but for a bigger number like:

        .sleb128 0x100000000

to be kept as-is.  The second patch below gets gas to treat the constant
as unsigned instead, but I'm not sure whether that's desirable.  It's also
something of a half-measure since any arithmetic involving 0xffffffff will
be done on O_constants, and thus:

        .sleb128 -0xffffffff

will still be treated as ".sleb128 1" by !BFD64 assemblers.
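
The wraparound itself can be shown with plain 32-bit unsigned
arithmetic.  A small sketch, assuming the !BFD64 value type behaves like
uint32_t; negate32 and negate64 are hypothetical helpers used only for
the comparison.

```c
#include <stdint.h>

/* Negate V modulo 2^32, as 32-bit unsigned arithmetic would.  */
uint32_t
negate32 (uint32_t v)
{
  return (uint32_t) (UINT32_C (0) - v);
}

/* Negate V with 64 bits to work in, so nothing wraps.  */
int64_t
negate64 (uint32_t v)
{
  return -(int64_t) v;
}
```

negate32 (0xffffffff) is 1, whereas negate64 (0xffffffff) is
-4294967295.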

SO... to finally get to the point ;)  I guess there are three options:

  (1) Declare the treatment of 0xffffffff to be correct but fix the
      bignum problems (i.e. apply something like the first patch but
      not the second).

  (2) Treat values as unsigned if they were written that way (i.e. apply
      both patches, or variations of them).

  (3) Get rid of sleb128 bignum support altogether.  Treat everything
      as 32-bit if !BFD64.

(1) seems better than the status quo but the signed/unsigned thing is a
little odd.  I'm uneasy about (2) because of the arithmetic problem
described above.

According to the Cygnus repository, the bugs fixed by the first patch
have been around since the code was first added in Aug 1997.  That suggests
that bignum .sleb128s have never worked, yet as far as I know, no-one has
ever complained before.  Perhaps (3) really is a viable option?

At the moment, I'm really just posting these patches for comments.
I'll do proper testing, add testcases, etc., once I know what the
desired behaviour is.

Richard

gas/
	* read.c (output_big_sleb128): Fix setting of continuation bit.
	Check whether the final byte needs to be sign-extended.
	Fix size-shrinking loop.


Attachment: p1.diff (text/x-patch, 2090 bytes)
URL: <https://sourceware.org/pipermail/binutils/attachments/20050111/ed200409/attachment.bin>

	* read.c (convert_to_bignum): New function, split out from...
	(emit_expr): ...here.
	(emit_leb128_expr): When generating a signed value, see whether an
	unsigned constant has been represented as a negative O_constant.
	Convert it to an unsigned bignum if so.

Attachment: p2.diff (text/x-patch, 2572 bytes)
URL: <https://sourceware.org/pipermail/binutils/attachments/20050111/ed200409/attachment-0001.bin>