Bug 2342 - linkonce debug is broken
Summary: linkonce debug is broken
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-15 22:16 UTC by H.J. Lu
Modified: 2006-05-24 15:51 UTC
CC List: 3 users

See Also:
Host: i686-pc-linux-gnu
Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu
Last reconfirmed:


Attachments
A testcase (978 bytes, application/octet-stream)
2006-02-15 22:19 UTC, H.J. Lu
A small hack which can be applied. (645 bytes, patch)
2006-04-24 15:22 UTC, Michael Matz

Comment 1 H.J. Lu 2006-02-15 22:19:53 UTC
Created attachment 868
A testcase

The old linker:

[hjl@gnu xx]$ make
g++  -g   -c -o test01a.o test01a.cpp
g++  -g   -c -o test01b.o test01b.cpp
g++  -g   -c -o test01c.o test01c.cpp
ld -r -o test.o test01a.o test01b.o test01c.o
g++  -g -o test test.o
gdb -batch -x gdb.cmd test > gdb.log
Function "internal_error" not defined.
Function "info_command" not defined.
.gdbinit:8: Error in sourced command file:
No breakpoint number 0.
bp2=`grep -i Breakpoint gdb.log | grep test01a.h | grep "line 4"` || exit 1; \
echo $bp2 | grep 0x0:; \
if [ $? = 0 ]; then exit 1; else true; fi
[hjl@gnu xx]$

The new linker:

[hjl@gnu xx]$ make LD=../ld
g++  -g   -c -o test01a.o test01a.cpp
g++  -g   -c -o test01b.o test01b.cpp
g++  -g   -c -o test01c.o test01c.cpp
../ld -r -o test.o test01a.o test01b.o test01c.o
g++  -g -o test test.o
gdb -batch -x gdb.cmd test > gdb.log
Function "internal_error" not defined.
Function "info_command" not defined.
.gdbinit:8: Error in sourced command file:
No breakpoint number 0.
bp2=`grep -i Breakpoint gdb.log | grep test01a.h | grep "line 4"` || exit 1; \
echo $bp2 | grep 0x0:; \
if [ $? = 0 ]; then exit 1; else true; fi
Breakpoint 2 at 0x0: file test01a.h, line 4.
make: *** [all] Error 1
[hjl@gnu xx]$
Comment 2 Alan Modra 2006-02-15 23:06:15 UTC
I think we all agree that the ideal ld behaviour would be to strip out debug
info corresponding to removed linkonce sections.  Failing that, I claim that
marking the debug info in some way as invalid is the next best solution.  At
least that gives gdb a chance to discard the info.  That is what we are trying
to do by putting a zero in debug info address fields.

Your patch caused other gdb problems, specifically, gdb miscalculated the
address range for a compilation unit, which led to gdb reporting file and line
number wrongly for a breakpoint set in another compilation unit.
Comment 3 H.J. Lu 2006-02-15 23:50:34 UTC
I know my patch isn't ideal, but at least it fixes the testcase with gdb,
and your change breaks it. Do you have a testcase showing the gdb problem you
are trying to address? Then we can try to make both testcases work correctly
with gdb by modifying ld, gdb, or both.

BTW, I am using gdb 6.4.50.20060201-cvs.
Comment 4 Alan Modra 2006-02-16 00:39:04 UTC
The problem occurred while debugging a huge C++ Oracle application, using current
CVS gdb as well as older gdbs.  My attempts to generate a reduced testcase have
so far failed.
Comment 5 Alan Modra 2006-02-16 00:47:52 UTC
At least, I haven't managed to generate a testcase that fails on x86.  On
powerpc64-linux, the testcase you have here also shows the problem I saw with
the Oracle app.

With debug info for removed linkonce sections pointing to the kept sections:
(gdb) b foo1
Breakpoint 1 at 0x10000734: file test01c.cpp, line 4.

With debug info for removed linkonce sections zeroed:
(gdb) b foo1
Breakpoint 1 at 0x10000734: file test01b.cpp, line 5.
Comment 6 H.J. Lu 2006-02-16 01:43:28 UTC
On x86, the linker without your change gave me

(gdb) b foo1
Breakpoint 1 at 0x80484a7: file test01b.cpp, line 5.
(gdb) r
Starting program: /export/home/hjl/bugs/binutils/linkonce/test

Breakpoint 1, foo1 (b=1) at test01b.cpp:5
5        B *pb= new B ;
(gdb)

What compiler are you using? Why doesn't it fail on x86?
Comment 7 Alan Modra 2006-02-16 04:56:46 UTC
As I said, I haven't found a testcase that fails on x86.  The ppc64 compiler is:

Target: powerpc-linux
Configured with: /src/gcc-current/configure --prefix=/usr/local
--build=powerpc-linux --host=powerpc-linux --target=powerpc-linux
--enable-targets=powerpc64-linux --with-cpu=default64 --disable-nls
--enable-__cxa_atexit --enable-languages=all
Thread model: posix
gcc version 4.2.0 20060213 (experimental)
Comment 8 H.J. Lu 2006-02-16 05:45:00 UTC
Since my testcase works on x86 but fails on ppc64, can you compare the
output of "readelf -w" between x86 and ppc64? The problem you have seen
may be specific to ppc64 in ld, gcc and/or gdb.
Comment 9 Michael Matz 2006-04-24 14:33:02 UTC
Btw, HJ: your patch to revert Alan's change also makes ld quite slow on huge
testcases.  I put a tarball at http://www.suse.de/~gcctest/slowld.tar.gz .
link command: 
 
% g++ -g -o ff3d trapFPE.o main.o FFThread.o StaticCenter.o  
language/libfflanguage.a solver/libffsolve.a language/libpovlanguage.a 
geometry/libffgeometry.a algebra/libffalgebra.a utils/libffutils.a -pthread 
 
This will take from five to ten minutes to link depending on the machine. 
These are i386 .o and .a files. 
 
The problem is that _bfd_elf_check_kept_section is quadratic (N^2) in the number of
sections, and furthermore repeatedly does the same work (e.g. sorting all
symbols of a BFD again and again).  Not having PRETEND in action avoids this
work completely here (though of course the N^2 problem is still there).
This reduces link time to about 2 to 3 seconds.
 
We were trying to work around this by noting that the
_bfd_elf_check_kept_section() function is basically const, i.e. given
the same discarded input section it will give the same result every time.
Hence we can cache the result in struct bfd_section or in the ELF-specific part
of a section.  That still leaves multiple sorts over the same set of
symbols for each BFD (one for each section needing that handling).
With that, ld needs only 17 seconds, which is still much better.
 
When I saw that this actually was not a problem in FSF binutils, but only
in your version, I stopped polishing the patch for submission, so I am
adding it here only to demonstrate what I mean:
 
-------------------------------------------------- 
@@ -7512,7 +7438,13 @@ elf_link_input_bfd (struct elf_final_lin 
                        { 
                          asection *kept; 
 
-                         kept = _bfd_elf_check_kept_section (sec); 
+                         if (sec->hack_foo == NULL) 
+                           { 
+                             sec->hack_foo = _bfd_elf_check_kept_section (sec);
+                           } 
+                         if (sec->hack_foo == NULL) 
+                           sec->hack_foo = (void*)-1; 
+                         kept = sec->hack_foo == (void*)-1 ? NULL : sec->hack_foo;
                          if (kept != NULL) 
                            { 
                              *ps = kept; 
--------------------------------------------------------- 
 
It probably can't be applied as-is because of whitespace changes.  You also
need to add a 'hack_foo' member to asection ;-)  Perhaps you could use that
idea in your reversal patch to make HJ binutils less slow.
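The per-section caching idea can be sketched outside of BFD. This is a minimal illustration under stated assumptions: struct section, find_kept_slow and find_kept are hypothetical stand-ins for asection and _bfd_elf_check_kept_section, not BFD code. Unlike the hack above, it records "already computed" in a separate flag, so a NULL result is cached without a (void*)-1 sentinel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for asection; only the fields needed here.  */
struct section
{
  const char *name;
  struct section *match;        /* what the expensive search would find */
  int kept_cached;              /* nonzero once kept_cache is valid */
  struct section *kept_cache;   /* cached result, may be NULL */
};

/* Counts how often the expensive search actually runs.  */
static unsigned long expensive_calls;

/* Hypothetical stand-in for a deterministic, expensive search such as
   _bfd_elf_check_kept_section: same input section, same answer.  */
static struct section *
find_kept_slow (struct section *sec)
{
  expensive_calls++;
  return sec->match;
}

/* Memoized wrapper: the slow search runs at most once per section,
   and a NULL result ("no kept section") is cached as well.  */
static struct section *
find_kept (struct section *sec)
{
  assert (sec != NULL);
  if (!sec->kept_cached)
    {
      sec->kept_cache = find_kept_slow (sec);
      sec->kept_cached = 1;
    }
  return sec->kept_cache;
}
```

Calling find_kept twice on the same section runs the slow search once, including when it finds nothing. As noted in the comment, the per-BFD symbol sorting would still be repeated; this sketch does not address that.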
 
Another thing I noticed while reading the code is some obvious funniness
in match_group_member(), which reads as follows:
 
match_group_member (asection *sec, asection *group) 
{ 
  asection *first = elf_next_in_group (group); 
  asection *s = first; 
  while (s != NULL) 
    { 
      if (bfd_elf_match_symbols_in_sections (s, sec)) 
        return s; 
      if (s == first) 
        break; 
    } 
  return NULL; 
} 
 
This was obviously designed to loop over all sections in a section group
when given one; the loop structure and the use of elf_next_in_group
indicate this.  But the loop never actually iterates, because "s" is
never changed.
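Presumably the intent was to advance "s" through the group's circular member list before the wrap-around check. A minimal sketch of such a loop follows; it uses hypothetical types (struct gsec, match_symbols) in place of asection, elf_next_in_group() and bfd_elf_match_symbols_in_sections(), and is an illustration rather than the BFD code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: members of a section group form a circular list
   through next_in_group, as with elf_next_in_group() in BFD.  */
struct gsec
{
  const char *sym;              /* stand-in for the section's symbols */
  struct gsec *next_in_group;
};

/* Hypothetical stand-in for bfd_elf_match_symbols_in_sections().  */
static int
match_symbols (const struct gsec *a, const struct gsec *b)
{
  return strcmp (a->sym, b->sym) == 0;
}

static struct gsec *
match_group_member (struct gsec *sec, struct gsec *group)
{
  struct gsec *first = group->next_in_group;
  struct gsec *s = first;

  assert (sec != NULL);
  while (s != NULL)
    {
      if (match_symbols (s, sec))
        return s;
      /* The step missing from the quoted code: advance to the next
         member, and stop once the circular list wraps back to FIRST.  */
      s = s->next_in_group;
      if (s == first)
        break;
    }
  return NULL;
}
```

With the quoted version, a match on any member other than the first is never found; with the extra step, every member is visited exactly once.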
 
Comment 10 H.J. Lu 2006-04-24 15:02:48 UTC
How often does it happen? What is the worst case link time you have seen
so far?
Comment 11 Michael Matz 2006-04-24 15:17:59 UTC
10 minutes was the worst, I think, with the tarball (it might not be
synced yet; I don't know how often the webserver does that).
Comment 12 Michael Matz 2006-04-24 15:22:13 UTC
Created attachment 979
A small hack which can be applied.

So this is a more complete patch which works for us.  For now we tested only
the testsuite and the huge tarball, not any packages.
Comment 13 H.J. Lu 2006-04-25 03:23:46 UTC
Here is the patch to speed it up:

http://sourceware.org/ml/binutils/2006-04/msg00329.html
Comment 14 H.J. Lu 2006-05-24 15:51:09 UTC
Fixed by

http://sourceware.org/ml/binutils/2006-05/msg00183.html