Bug 15032

Summary:	GNU/Linux backtrace fails to use eh_frame information when built with --enable-64-bit-bfd
Product:	gdb	Reporter:	Joseph Kain <jkain>
Component:	backtrace	Assignee:	Not yet assigned to anyone <unassigned>
Status:	CLOSED WONTFIX
Severity:	normal	CC:	jan, jkain, pedro
Priority:	P2
Version:	7.5
Target Milestone:	---
Host:		Target:
Build:		Last reconfirmed:

Description Joseph Kain 2013-01-17 21:07:55 UTC

I'm unable to provide a reproduction case for this bug but I did spend some time debugging the problem. I debugged the problem using the gdb-7.5.1 sources and then confirmed the same problem occurs with the gdb-git sources from commit 3174fd02b667571ba97f88f6d48705dc0b009a86.

I found that dwarf2_frame_find_fde() may fail to accept the unwind information in an objfile because fde_table->entries[0]->initial_location is invalid. When debugging a 32 bit inferior, initial_location would hold a value like 0x10003b8f0, which is clearly out of bounds. The initial_location should have been 0x3b8f0.

I tracked this back to a problem in read_encoded_value(), which is called by decode_frame_entry_1() to compute initial_location. In the case I looked at, read_encoded_value() went through the DW_EH_PE_pcrel and DW_EH_PE_udata4 cases. In one particular call I found that:

* DW_EH_PE_pcrel case computed base = 0xbfd60.
* DW_EH_PE_udata4 case computed 0xfff7bb90 via bfd_get_32() and added base for a result of 0x10003b8f0.

When read_encoded_value() summed the two values it used 64 bit math because CORE_ADDR is a 64 bit type. This gave the result 0x10003b8f0 when it should instead have computed the result using 32 bit math and rolling over to 0x3b8f0.
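
The arithmetic described above can be reproduced in isolation. The following is a minimal sketch, not GDB's code: `core_addr`, `sum_as_core_addr` and `sum_as_32bit` are hypothetical names, and `uint64_t` is assumed to stand in for CORE_ADDR in a build configured with --enable-64-bit-bfd.

```c
#include <stdint.h>

/* Assumption: CORE_ADDR is a 64-bit unsigned type in this build.  */
typedef uint64_t core_addr;

/* Sum in 64-bit math: the carry out of bit 31 is kept.  */
static core_addr
sum_as_core_addr (core_addr base, uint32_t udata4)
{
  return base + udata4;
}

/* Sum as a 32-bit target would compute it: the carry is discarded.  */
static core_addr
sum_as_32bit (core_addr base, uint32_t udata4)
{
  return (uint32_t) (base + udata4);
}
```

With base = 0xbfd60 and udata4 = 0xfff7bb90, the first helper yields 0x10003b8f0 and the second yields 0x3b8f0.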

Environment:
Linux dhcp-172-16-174-205.nvidia.com 2.6.35.6-45.fc14.i686 #1 SMP Mon Oct 18 23:56:17 UTC 2010 i686 i686 i386 GNU/Linux

$ gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i686-redhat-linux/4.5.1/lto-wrapper
Target: i686-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,lto --enable-plugin --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch=i686 --build=i686-redhat-linux
Thread model: posix
gcc version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC)

This GDB was configured as "i686-pc-linux-gnu".

I built gdb from the git tree at commit 3174fd02b667571ba97f88f6d48705dc0b009a86. gdb 7.5.1 behaves the same way.
I configured gdb as: ../../gdb-git/configure --prefix /home/joseph/gdb/install/git --enable-64-bit-bfd

Comment 1 Jan Kratochvil 2013-01-18 19:57:33 UTC

The problem is 0xfff7bb90 should have been 64-bit extended as 0xfffffffffff7bb90 and then it would all work.
But I do not see where the bug is without a reproducer, couldn't you provide some such 32-bit binary?
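
A minimal sketch of the sign extension suggested here, with hypothetical helper names (this is not GDB's code). It assumes the usual two's complement conversion when a `uint32_t` above INT32_MAX is cast to `int32_t`, which is implementation-defined in C but holds on common compilers.

```c
#include <stdint.h>

/* Widen a 4-byte value to 64 bits, treating it as signed.  */
static uint64_t
sign_extend_32 (uint32_t raw)
{
  return (uint64_t) (int64_t) (int32_t) raw;
}

/* Add the sign-extended displacement to the base.  Unsigned 64-bit
   addition wraps modulo 2^64, so a negative displacement subtracts.  */
static uint64_t
add_sign_extended (uint64_t base, uint32_t raw)
{
  return base + sign_extend_32 (raw);
}
```

Here sign_extend_32 (0xfff7bb90) gives 0xfffffffffff7bb90, and adding that to the base 0xbfd60 gives 0x3b8f0 in plain 64-bit math.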

Comment 2 Joseph Kain 2013-01-18 20:29:40 UTC

> couldn't you provide some such 32-bit binary?

I'll see if I can prepare such a 32 bit binary.  At the moment I don't have a releasable version, as I was in the process of adding missing .eh_frame data to the binary when I discovered this problem.

> The problem is 0xfff7bb90 should have been 64-bit extended as 
> 0xfffffffffff7bb90 and then it would all work.

The DW_EH_PE_udata4 case won't sign extend, as it's supposed to represent unsigned data.  I believe the DW_EH_PE_udata4 case was selected in the block


  if ((encoding & 0x07) == 0x00)
    {
      encoding |= encoding_for_size (ptr_len);
      if (bfd_get_sign_extend_vma (unit->abfd))
	encoding |= DW_EH_PE_signed;
    }

Clearly, DW_EH_PE_signed wasn't set.  I can try to look into why.

Comment 3 Jan Kratochvil 2013-01-18 20:38:00 UTC

The x86* architectures do not use bfd_get_sign_extend_vma, so it is correct that DW_EH_PE_signed is not set.

The file probably got relocated by BFD. I do not have an answer now, but I am curious how this setup could arise: the base address is a small positive increment and the 0xfff7bb90 address overflows when it is added, which should not happen.
Maybe the base should have been 0xffffffff000bfd60.

Comment 4 Joseph Kain 2013-01-18 21:25:41 UTC

I stepped through read_encoded_value again looking for any relocation.  Note the values have changed due to changes to the inferior.

When base is computed in the DW_EH_PE_pcrel case:

* bfd_get_section_vma => 0xbfd80.  This matches the value of the VMA of the .eh_frame section as reported by objdump -x.

* (buf - unit->dwarf_frame_buffer) => 0x20.  I think this seems reasonable for the first FDE.

* base = 0xbfda0.


When the final value is computed in the DW_EH_PE_udata4 case:

* bfd_get_32() => 0xfff7bbe0.  bfd_get_32() reads from offset 0xbfda0 from the unit.  I confirmed that buf points to offset 0xbfda0 in the correct .so via /proc/<pid>/maps.  I also confirmed with a hex editor that the value 0xfff7bbe0 appears at offset 0xbfda0 in the .so's image on disk.  I don't think there was any relocation done.

* base + bfd_get_32 (...) => 0x10003b980.

So I don't think there was any relocation done.  I also don't see any relocation entries in the .so for offset 0xbfda0.  Should there be one?
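
The two steps traced in this comment can be sketched as follows. This is a simplified illustration under stated assumptions, not the body of read_encoded_value(): `read_le32` and `decode_pcrel_udata4` are hypothetical names, the file is assumed to be little-endian (i386 ELF), and the sum is done in 64 bits as in the affected build.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 4-byte little-endian value from a byte buffer.  */
static uint32_t
read_le32 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* PC-relative base is the address of the field itself: the section VMA
   plus the field's offset into the section.  The 4-byte field is then
   added unsigned, in 64-bit math.  */
static uint64_t
decode_pcrel_udata4 (uint64_t section_vma, const unsigned char *section,
                     size_t offset)
{
  uint64_t base = section_vma + offset;
  return base + read_le32 (section + offset);
}
```

With a section VMA of 0xbfd80, an offset of 0x20 and the bytes e0 bb f7 ff stored at that offset, the base is 0xbfda0 and the result is 0x10003b980.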

Comment 5 Joseph Kain 2013-01-18 21:42:28 UTC

Sorry, I think I've left out some important information.

Using the values from my last update, read_encoded_value returns an initial_location of 0x10003b980, whereas the correct value is 0x3b980.  I know this value is correct because I have confirmed it is the VMA of the .text section, which is where the initial location of the first FDE would be expected to point.

The value read by bfd_get_32() in the DW_EH_PE_udata4 case must be a negative value like 0xfff7bbe0, because PC-relative addressing is being used and the .eh_frame section comes after the .text section.  I've included the section header dump from objdump -x below.

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .hash         000044d8  000000b4  000000b4  000000b4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .dynsym       000092f0  0000458c  0000458c  0000458c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .dynstr       0000b32a  0000d87c  0000d87c  0000d87c  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .gnu.version  0000125e  00018ba6  00018ba6  00018ba6  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .gnu.version_r 00000070  00019e04  00019e04  00019e04  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .rel.dyn      00021b00  00019e74  00019e74  00019e74  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .text         00068858  0003b980  0003b980  0003b980  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  7 .rodata       0001bba0  000a41e0  000a41e0  000a41e0  2**5
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .eh_frame     00010d54  000bfd80  000bfd80  000bfd80  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 .ctors        00000004  000d1ad4  000d1ad4  000d0ad4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 10 .dynamic      000000e0  000d1ad8  000d1ad8  000d0ad8  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 11 .got.plt      0000000c  000d1bb8  000d1bb8  000d0bb8  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 12 .data         00003fac  000d1be0  000d1be0  000d0be0  2**5
                  CONTENTS, ALLOC, LOAD, DATA
 13 .writetext    0001bb80  000d5ba0  000d5ba0  000d4ba0  2**5
                  CONTENTS, ALLOC, LOAD, CODE
 14 .bss          0000f674  000f1720  000f1720  000f0720  2**5
                  ALLOC
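
The stored displacement can be cross-checked against the VMAs in the dump above. A minimal sketch, with a hypothetical helper name, assuming the first FDE's initial location field sits at offset 0x20 into .eh_frame as observed in comment 4:

```c
#include <stdint.h>

/* PC-relative displacement from the address of an encoded field to its
   target, as it would be stored in a 4-byte field.  The subtraction
   wraps modulo 2^32 when the target lies below the field.  */
static uint32_t
pcrel_disp32 (uint32_t target, uint32_t field_addr)
{
  return target - field_addr;
}
```

For the .text VMA 0x3b980 and the field address 0xbfd80 + 0x20 = 0xbfda0, this gives 0xfff7bbe0, which is the value found in the file.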

Comment 6 Jan Kratochvil 2013-01-21 18:39:13 UTC

(In reply to comment #2)
> as I was in the process of adding missing .eh_frame data

I believe the problem is in your added data.  How do you add it?

gas has the .cfi_* directives which assemble .eh_frame the right way for you.

gcc -fasynchronous-unwind-tables -m32 production uses augmentation flag 'R' providing encoding 0x1b, therefore DW_EH_PE_pcrel | DW_EH_PE_sdata4.

As the values are 0xff.. they are really signed and you should use DW_EH_PE_sdata*.
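
To make the encoding bytes concrete, here is a small sketch that splits a pointer encoding into its parts. The DW_EH_PE_* values are the standard ones; the three helper functions are hypothetical names for illustration and are not part of GDB or BFD.

```c
#include <stdint.h>

/* Subset of the DW_EH_PE_* pointer-encoding values.  */
enum
{
  DW_EH_PE_absptr = 0x00,
  DW_EH_PE_udata4 = 0x03,
  DW_EH_PE_signed = 0x08,
  DW_EH_PE_sdata4 = 0x0b,
  DW_EH_PE_pcrel  = 0x10
};

/* Nonzero if the value is relative to the address of the field.  */
static int
encoding_is_pcrel (uint8_t encoding)
{
  return (encoding & 0x70) == DW_EH_PE_pcrel;
}

/* Nonzero if the encoding itself marks the value as signed.  */
static int
encoding_is_signed (uint8_t encoding)
{
  return (encoding & DW_EH_PE_signed) != 0;
}

/* Nonzero if the low format bits are zero, so that the consumer has to
   pick the size (and possibly the sign) from the pointer size.  */
static int
encoding_uses_pointer_format (uint8_t encoding)
{
  return (encoding & 0x07) == 0x00;
}
```

Encoding 0x1b is PC-relative and explicitly signed with a 4-byte size. An encoding of plain DW_EH_PE_pcrel (0x10) is PC-relative but leaves the format bits zero, which is the situation handled by the `(encoding & 0x07) == 0x00` block quoted in comment 2.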

I find it NOT-A-BUG on the GDB side.

Comment 7 Joseph Kain 2013-01-22 20:11:41 UTC

> > as I was in the process of adding missing .eh_frame data
> I believe the problem is in your added data.  How do you add it?

I added it simply by passing -fasynchronous-unwind-tables to my gcc invocations.

Looking over the gcc source, I see that for non-PIC builds gcc doesn't include the 'R' augmentation at all.  However, ld adds it when linking my objects together.  I'm using binutils 2.18, whose bfd uses encoding 0x10 (DW_EH_PE_pcrel).  Newer binutils releases' bfds use 0x1b (DW_EH_PE_pcrel | DW_EH_PE_sdata4).  So, is it the case that gdb doesn't support binaries built with binutils 2.18?

Comment 8 Jan Kratochvil 2013-01-22 20:18:38 UTC

Could you git bisect which binutils check-in started to add DW_EH_PE_sdata4?
There could be some discussion about it on the binutils mailing list.

So far I see this not as "gdb doesn't support" but rather as a bug in older binutils.

Comment 9 Joseph Kain 2013-01-22 20:26:47 UTC

I haven't done a bisect, but by code inspection/blame I found this:

http://sourceware.org/git/?p=binutils.git;a=commitdiff;h=e8a3a9901149ac2fe85f11b3c8bc66cf4074b94c;hp=7093cb064f881aef599b5f12ab2b8286315bb889

Does this give you what you need?

Comment 10 Jan Kratochvil 2013-01-27 17:49:26 UTC

Sorry I missed your mail.

Therefore it is:
RFC: Using DW_EH_PE_sdata* when converting absolute encodings to PC-relative form
http://sourceware.org/ml/binutils/2009-09/msg00490.html

I do not agree (c) is a bug, according to previous GDB discussions it is correct behavior:

http://sourceware.org/ml/gdb-patches/2010-02/msg00287.html
But as I see now fixing few GDB places to always sign-extend the displacement
CORE_ADDR will permit using the current standard 64bit math operators even for
32bit inferiors.

So this is not a GDB bug, the issue is already fixed in very old binutils.

If you need GDB compatibility with such old binaries/systems, I find a vendor/downstream patch in the GDB port there more suitable.

Comment 11 Pedro Alves 2013-02-04 19:06:02 UTC

The MIPS case makes things trickier since that arch has signed addresses.
However, if we're adding two 4-byte unsigned integers (the address and the DW_EH_PE_udata4 value), overflow is usually defined as wrapping around.
If we look at this from a purely arithmetic perspective, for targets
with unsigned 32-bit addresses, I'd argue that 32-bit wrap around would
actually be correct if the result is intended to be an address.  That's what the target machine would do if presented with the same computation.
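
One way to express this idea generically is to reduce the 64-bit sum to the target's address width. The sketch below is only an illustration: `fit_to_address` is a hypothetical helper, not an existing GDB function, and the sign-extending branch is an assumption about how a target with signed addresses might want the result represented.

```c
#include <stdint.h>

/* Reduce VALUE to an address of ADDR_BITS bits (1..64).  With
   SIGNED_ADDRS zero the result is zero-extended, i.e. plain
   wrap-around.  With SIGNED_ADDRS nonzero the result is sign-extended
   from bit ADDR_BITS - 1.  */
static uint64_t
fit_to_address (uint64_t value, unsigned addr_bits, int signed_addrs)
{
  uint64_t mask;

  if (addr_bits >= 64)
    return value;

  mask = (UINT64_C (1) << addr_bits) - 1;
  value &= mask;
  if (signed_addrs && (value >> (addr_bits - 1)) != 0)
    value |= ~mask;
  return value;
}
```

For a 32-bit unsigned target, fit_to_address (0x10003b980, 32, 0) gives 0x3b980, the value the inferior itself would compute.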

Comment 12 Joseph Kain 2013-02-07 18:18:32 UTC

Pedro, I share your opinion on the matter.  However, I've upgraded the linker used in my project so this isn't a pressing issue for me.