Bug 18025

Summary: dwarf2 debug info after rebasing DLLs unusable
Product: binutils
Reporter: Corinna Vinschen <corinna>
Component: binutils
Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED
Severity: normal
CC: amodra, jon.turney, nickc
Priority: P2
Version: 2.26
Target Milestone: ---
Host:
Target: cygwin, i686 and x86_64
Build:
Last reconfirmed:
Attachments:
  Proposed patch - detects rebasing and computes an address bias
  example of a large object file demonstrating 'nm -l' slowdown
  Proposed patch

Description Corinna Vinschen 2015-02-25 15:46:42 UTC
Hi,

we're encountering a problem evaluating Dwarf2 debug info in DLLs after
rebasing the DLL.  Rebasing, that is, moving the image base address of a
DLL and adjusting the relocation information, is an essential part of
DLL handling in the Cygwin distro, required for smooth operation of
the fork emulation.

Consider a DLL built with debug info, unstripped.  As an example, I'm
using the latest file-5.22-1 package which comes with a DLL called
cygmagic-1.dll.  The output of objdump -h on the built DLL looks like this:

$ objdump -h cygmagic-1.dll

cygmagic-1.dll:     file format pei-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00013618  00000004d9221000  00000004d9221000  00000600  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE, DATA
  1 .data         00000068  00000004d9235000  00000004d9235000  00013e00  2**5
                  CONTENTS, ALLOC, LOAD, DATA
  2 .rdata        00005258  00000004d9236000  00000004d9236000  00014000  2**6
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 [...]
 10 .debug_aranges 00000510  00000004d9243000  00000004d9243000  0001c400  2**4
                  CONTENTS, READONLY, DEBUGGING
 11 .debug_info   0002b4ab  00000004d9244000  00000004d9244000  0001ca00  2**0
                  CONTENTS, READONLY, DEBUGGING
 12 .debug_abbrev 00003d2b  00000004d9270000  00000004d9270000  00048000  2**0
                  CONTENTS, READONLY, DEBUGGING
 13 .debug_line   00006046  00000004d9274000  00000004d9274000  0004be00  2**0
                  CONTENTS, READONLY, DEBUGGING
 14 .debug_frame  00002b68  00000004d927b000  00000004d927b000  00052000  2**3
                  CONTENTS, READONLY, DEBUGGING
 15 .debug_str    00000302  00000004d927e000  00000004d927e000  00054c00  2**0
                  CONTENTS, READONLY, DEBUGGING
 16 .debug_loc    000259a2  00000004d927f000  00000004d927f000  00055000  2**0
                  CONTENTS, READONLY, DEBUGGING
 17 .debug_ranges 00003230  00000004d92a5000  00000004d92a5000  0007aa00  2**0
                  CONTENTS, READONLY, DEBUGGING

Notice the VMA addresses.  The DLL is based at 0x4d9220000.  This DLL
works and evaluating the debug info works nicely.  This is the `nm -l'
output before rebasing:

  $ nm -l cygmagic-1.dll | grep usr/src/debug
  [...]
  00000004d922bd80 T file_fmttime /usr/src/debug/file-5.22-1/src/print.c:232
  00000004d922c520 T file_fsmagic /usr/src/debug/file-5.22-1/src/fsmagic.c:104
  00000004d922d3f0 T file_getbuffer       /usr/src/debug/file-5.22-1/src/funcs.c:321
  00000004d922b120 T file_is_tar  /usr/src/debug/file-5.22-1/src/is_tar.c:64
  00000004d922a1d0 T file_looks_utf8      /usr/src/debug/file-5.22-1/src/encoding.c:295
  [...]

So it shows the sources and line numbers for the symbols as expected:

  $ nm -l cygmagic-1.dll | grep usr/src/debug | wc -l
  198

And here's what happens after rebase to some arbitrary address:

  $ rebase -b 0x300000000 cygmagic-1.dll
  $ nm -l cygmagic-1.dll | grep usr/src/debug | wc -l
  0

nm lost all connection between the symbols and their sources.

Checking with GDB: before rebasing, you can set a breakpoint on
file_fsmagic and the DLL is loaded at the address the debug info
expects.  When the breakpoint is hit, GDB shows the function, its
arguments, and the source line:
  Breakpoint 1, file_fsmagic (ms=ms@entry=0x6000394f0,
      fn=fn@entry=0x23cb75 "./file", sb=sb@entry=0x23c8f0)
      at /usr/src/debug/file-5.22-1/src/fsmagic.c:104
  104     {

After rebasing to, e.g., 0x300000000 as above, it looks like this:

  $ rebase -b 0x300000000 cygmagic-1.dll
  $ nm cygmagic-1.dll | grep file_fsmagic
  000000030000c520 T file_fsmagic

  (gdb) r ./file
  Starting program: /usr/bin/file ./file
  [New Thread 2696.0xecc]
  Warning:
  Cannot insert breakpoint 1.
  Cannot access memory at address 0x4d922c520

So, apparently the debug info uses absolute addresses, rather than
image base relative or section relative addresses.  After rebasing,
the info points to invalid addresses.
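
The address arithmetic behind this can be checked with a short sketch.  It
uses only the addresses quoted in this report; the function and variable
names are illustrative and not part of any tool:

```python
# Addresses taken from the report above; the names are illustrative only.
OLD_IMAGE_BASE = 0x4D9220000   # image base before rebasing
NEW_IMAGE_BASE = 0x300000000   # image base passed to `rebase -b`
DWARF_ADDR = 0x4D922C520       # file_fsmagic as still recorded in the debug info


def rebased_address(dwarf_addr, old_base, new_base):
    """Translate a stale absolute address to its location after rebasing."""
    return dwarf_addr - old_base + new_base


addr = rebased_address(DWARF_ADDR, OLD_IMAGE_BASE, NEW_IMAGE_BASE)
print(hex(addr))  # 0x30000c520, the address nm reports after rebasing
```

The offset of file_fsmagic from the image base (0xc520) is unchanged by
rebasing; only the absolute value stored in the debug info is stale.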

Shouldn't binutils be aware of the effect of rebasing, and make sure
that the existing debug info is correctly evaluated even after rebase?

Even better, shouldn't the dwarf2 debug info be defined image base
relative rather than using absolute addressing, to make sure it
still contains valid information even after rebasing a DLL?  At least
on PE/COFF targets?


Thanks,
Corinna
Comment 1 Alan Modra 2015-02-26 01:05:39 UTC
This isn't a binutils problem.  "rebase" isn't part of binutils, and it is your compiler that chooses whether dwarf debug info uses absolute or relative addresses.
Comment 2 Nick Clifton 2015-02-27 12:31:02 UTC
Created attachment 8154 [details]
Proposed patch - detects rebasing and computes an address bias
Comment 3 Nick Clifton 2015-02-27 12:35:40 UTC
Hi Alan,

> This isn't a binutils problem.  "rebase" isn't part of binutils,

True.  It would be best if the rebase program correctly updated the DWARF debug information itself.  Although without relocs to guide it, it would be necessary to add a DWARF parser to the program.

A small patch to the binutils, such as the one uploaded, would allow the tools to detect rebasing and allow for it, without requiring a rewrite of that tool.

> and it is
> your compiler that chooses whether dwarf debug info uses absolute or
> relative addresses.

Not really.  The DWARF format does not support base-address relative addressing, which is what would be needed in this case.


This problem could be fixed outside of the binutils, true.  But it would be nice if the binutils could cope.

Cheers
  Nick
Comment 4 Alan Modra 2015-02-27 23:16:59 UTC
> Not really.  The DWARF format does not support base-address relative addressing, which is what would be needed in this case.

Oh right, I was confusing eh_frame, where relative addresses are supported, with general debug info.
Comment 5 Nick Clifton 2015-03-04 14:49:31 UTC
Hi Alan,

  I assume that you still object to the patch, on the grounds that it is fixing a problem that is not the binutils' fault?

  One thing that did occur to me was that it might be useful to add a new option to objcopy so that it could adjust the absolute addresses inside DWARF sections.  Sort of like --change-section-address, but for debug info.  Then if someone does use rebase to alter the load address of a DLL, they will have the option of updating the debug info inside the DLL as well.

  I looked at the rebase sources, but adapting them seemed a bit hairy to me.  (Probably because of unfamiliarity).

Cheers
  Nick
Comment 6 Alan Modra 2015-03-04 22:43:28 UTC
> I assume that you still object to the patch, on the grounds that it is fixing a problem that is not the binutils fault?

No, no objections from me.
Comment 7 Sourceware Commits 2015-03-05 12:16:22 UTC
The master branch has been updated by Nick Clifton <nickc@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=425bd9e1bb32b25881dd20d76678d041f7bf04f8

commit 425bd9e1bb32b25881dd20d76678d041f7bf04f8
Author: Nick Clifton <nickc@redhat.com>
Date:   Thu Mar 5 12:14:26 2015 +0000

    Allows the binutils to cope with PE binaries where the section addresses have been changed, but the DWARF debug info has not been altered.
    
    	PR binutils/18025
    	* coffgen.c (coff_find_nearest_line_with_names): If the dwarf2
    	lookup fails, check for an address bias in the dwarf info, and if
    	one exists, retry the lookup with the biased value.
    	* dwarf2.c (_bfd_dwarf2_find_symbol_bias): New function.
    	Determines if a bias exists between the addresses of functions
    	based on DWARF information vs symbol table information.
    	* libbfd-in.h (_bfd_dwarf2_find_symbol_bias): Prototype.
    	* libbfd.h: Regenerate.
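
The idea described in this commit message can be sketched roughly as
follows.  This is a simplified illustration, not the binutils code: the
dictionaries stand in for the symbol table, the DWARF function list and
the line table, and every name and the sign convention of the bias are
assumptions made for the example.

```python
def find_symbol_bias(symbols, dwarf_funcs):
    """Return dwarf_addr - symbol_addr for the first function present in
    both tables, or 0 if no function name matches (hypothetical helper)."""
    for name, sym_addr in symbols.items():
        if name in dwarf_funcs:
            return dwarf_funcs[name] - sym_addr
    return 0


def find_nearest_line(addr, line_table, symbols, dwarf_funcs):
    """Look up addr directly; if that fails, retry with the bias applied."""
    hit = line_table.get(addr)
    if hit is None:
        bias = find_symbol_bias(symbols, dwarf_funcs)
        if bias:
            hit = line_table.get(addr + bias)
    return hit


# Data from the report: the symbol table was rebased, the debug info was not.
symbols = {"file_fsmagic": 0x30000C520}
dwarf_funcs = {"file_fsmagic": 0x4D922C520}
line_table = {0x4D922C520: ("/usr/src/debug/file-5.22-1/src/fsmagic.c", 104)}

result = find_nearest_line(0x30000C520, line_table, symbols, dwarf_funcs)
print(result)
```

The retry only happens when the direct lookup fails, so binaries that were
never rebased take the normal path first.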
Comment 8 Nick Clifton 2015-03-05 12:18:05 UTC
OK, I have applied the patch which will allow the binutils to cope with rebased binaries.

If it turns out to be needed I will look into adding the dwarf info rebasing feature to objcopy, but for now I would prefer not to add new features unless they are really needed.

Cheers
  Nick
Comment 9 Corinna Vinschen 2015-03-05 13:18:16 UTC
Hi Nick,

thanks a lot for the patch.  I'll look into getting a new binutils for
the Cygwin distro soon.


Thanks,
Corinna
Comment 10 Jon Turney 2017-03-24 14:17:43 UTC
(In reply to Corinna Vinschen from comment #9)
> thanks a lot for the patch.  I'll look into getting a new binutils for
> the Cygwin distro soon.

So, that update never happened, because 'nm -l' was reported to be very slow
for large programs (e.g. an attempt at 'nm -l' for libstdc++ was abandoned after 48hrs), and I've just bisected that regression to this commit.

Looking at the commit, even if the object isn't rebased, every symbol without
line number information (e.g. type 'N' symbols) will cause an exhaustive, linear search for the symbol (I guess leading to polynomial runtime).

I'm not sure how to improve this. 

I guess the bias for all symbols in a compilation unit is going to be the same, so there's some scope for caching that.
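
A minimal sketch of that caching idea, assuming the bias really is constant
for the unit being cached.  The class and the stand-in scan function are
hypothetical and only illustrate the technique:

```python
class BiasCache:
    """Remembers the bias once computed, so the expensive scan runs at most
    once, even when the computed bias is 0 (object not rebased)."""

    def __init__(self, compute):
        self._compute = compute  # expensive function, e.g. a full symbol scan
        self._known = False      # separate flag: a bias of 0 is a valid result
        self._bias = 0
        self.scans = 0           # how many times the expensive scan ran

    def get(self):
        if not self._known:
            self._bias = self._compute()
            self._known = True
            self.scans += 1
        return self._bias


def slow_scan():
    # Stand-in for the exhaustive search; returns the difference between the
    # old and new image bases from the example in this report.
    return 0x4D9220000 - 0x300000000


cache = BiasCache(slow_scan)
for _ in range(1000):  # e.g. 1000 symbols without line number information
    bias = cache.get()
print(cache.scans)  # 1
```

Keeping a "known" flag separate from the bias value matters here, because
the slow case described above also occurs when the object is not rebased
and the bias is therefore zero.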
Comment 11 Nick Clifton 2017-03-27 08:18:22 UTC
Hi Jon,

> Looking at the commit, even if the object isn't rebased, every symbol without
> linenumber information (e.g. type 'N' symbols) will cause an exhaustive,
> linear search for the symbol (I guess leading to polynomial runtime)
> 
> I'm not sure how to improve this. 

Have you tried using the latest binutils ?  We did make some improvements to
line number caching recently.  (Although I think that this was with DWARF not STABS.  Darn).

> I guess the bias for all symbols in a compilation unit is going to be the
> same, so there's some scope for caching that.

Is there a way to reproduce this problem in a Linux environment ?  (I ask
because it is much easier for me to debug and fix problems in this environment).
I did try uploading a copy of libstdc++.a from an old Cygwin installation (based upon gcc 5.4.0) and then running "nm -l" on it with a newly built x86_64-pc-cygwin toolchain.  It took 7 seconds to complete.  So there must be something that I am doing that does not match what you are doing.

Cheers
  Nick
Comment 12 Jon Turney 2017-03-27 15:23:48 UTC
Created attachment 9949 [details]
example of a large object file demonstrating 'nm -l' slowdown
Comment 13 Jon Turney 2017-03-27 15:28:16 UTC
(In reply to Nick Clifton from comment #11)
> Have you tried using the latest binutils ?  We did make some improvements to
> line number caching recently.  (Although I think that this was with DWARF not
> STABS.  Darn).

Yes, I'm testing with binutils git master.

> Is there a way to reproduce this problem in a Linux environment ?  (I ask
> because it is much easier for me to debug and fix problems in this
> environment).
> I did try uploading a copy of libstdc++.a from an old Cygwin installation
> (based upon gcc 5.4.0) and then running "nm -l" on it with a newly built
> x86_64-pc-cygwin toolchain.  It took 7 seconds to complete.  So there must be
> something that I am doing that does not match what you are doing.

I would think that binutils built to handle pe-i386 targets would show the same
behaviour on any host.

I've attached a tar file containing an unstripped x86 cygstdc++.dll.

For me, 'nm -l' on that is very slow, only managing to output a few symbols
per second.  Reverting 425bd9e1 makes it fast again.
Comment 14 Nick Clifton 2017-03-28 09:52:57 UTC
Hi Jon,

  The problem, I believe, is not the line number lookup code, but rather the
  bias computation code.  This is performing a linear scan of the symbol table
  for every functional unit.

  Please could you try out the uploaded patch.  I found that with this applied
  the "nm -l" scan of the cygstdc++6.dll took just 15 seconds, which I hope is
  fast enough.

Cheers
  Nick
Comment 15 Nick Clifton 2017-03-28 09:53:16 UTC
Created attachment 9953 [details]
Proposed patch
Comment 16 Jon Turney 2017-03-28 17:31:47 UTC
(In reply to Nick Clifton from comment #14)
>   The problem, I believe, is not the line number lookup code, but rather the
>   bias computation code.  This is performing a linear scan of the symbol table
>   for every functional unit.

Yes.

>   Please could you try out the uploaded patch.  I found that with this applied
>   the "nm -l" scan of the cygstdc++6.dll took just 15 seconds, which I hope is
>   fast enough.

Thanks very much for looking at this.

This seems to work and is at least 10,000 times faster! Awesome :)
Comment 17 Sourceware Commits 2017-03-29 11:28:51 UTC
The master branch has been updated by Nick Clifton <nickc@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=e643cb45bf85fa5c8c49a89ff177de246af4212e

commit e643cb45bf85fa5c8c49a89ff177de246af4212e
Author: Nick Clifton <nickc@redhat.com>
Date:   Wed Mar 29 12:27:44 2017 +0100

    Improve the speed of scanning PE binaries for line number information.
    
    	PR binutils/18025
    	* coff-bfd.h (struct coff_section_data): Add new fields:
    	saved_bias and bias.
    	* coffgen.c (coff_find_nearest_line_with_names): Cache the bias
    	computed for PE binaries.
    	* dwarf2.c (scan_unit_for_symbols): Only warn once about each
    	missing abbrev.
Comment 18 Nick Clifton 2017-03-29 11:29:13 UTC
Patch applied.