This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



Re: Strange ld.gold segmentation error issues.


Hi,

Thank you for the comment.

(2014/06/03 3:30), Cary Coutant wrote:
Oh, I am using the -gsplit-dwarf switch with gcc and g++, and
pass -gdb-index to gold. Does it matter? (With the older version of
GNU gold, it did not cause this segmentation fault.)

The crash is in the code that creates the .gdb_index section, so that
option is definitely significant.

I see. It gets interesting :-)


Initially, I suspected an OOM (out-of-memory) issue since many
processes are running under "make -j4 ...", but then I found that more
than 2.5 GB of main memory was free at the time the GNU gold binary was
invoked to produce the .so library, just before the segfault occurred.
The library is NOT THAT big.

Also, to my surprise, changing make's "-j4" switch to "-j3" did not
change anything. Even with fewer processes invoked by make, the
segfault still occurred. So OOM is unlikely. And, come to think of it,
had an OOM occurred, the kernel would have logged it, but I did not
see any such messages in the kernel logs.

Anyway, I modified my local version of "ld" to invoke a shell
script, which checks whether the particular library (here libnspr4.so)
is about to be created and, if so, invokes ld.new (the gold binary)
under gdb so I can see what happens. Otherwise it simply invokes the
gold binary with the passed arguments.
(In the previous posting, I thought it was libmozalloc.so that caused
the blowup, but as it turned out it is the next target, libnspr4.so.)
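A minimal sketch of such a wrapper, written in Python rather than as a shell script, might look like the following. It assumes the wrapper is installed as the "ld" that the build actually invokes, that the output file is passed as "-o FILE" or "--output=FILE", and that GOLD points at the freshly built ld.new; that path is a hypothetical placeholder, not the one used here.

```python
import os
import sys

# Hypothetical location of the freshly built gold binary; adjust as needed.
GOLD = "/path/to/binutils-build/gold/ld.new"
# Library whose link crashes; only this link is run under gdb.
TARGET = "libnspr4.so"


def output_name(args):
    """Return the output file named by "-o FILE" or "--output=FILE", if any.

    Assumption: other spellings of the output option are not handled.
    """
    for i, arg in enumerate(args):
        if arg == "-o" and i + 1 < len(args):
            return args[i + 1]
        if arg.startswith("--output="):
            return arg[len("--output="):]
    return None


def build_command(args):
    """Build the argv to exec: gold under gdb for TARGET, plain gold otherwise."""
    out = output_name(args)
    if out is not None and os.path.basename(out) == TARGET:
        # "gdb --args PROG ARGS..." loads PROG with ARGS; then type "run" and "bt".
        return ["gdb", "--args", GOLD] + list(args)
    return [GOLD] + list(args)


def main(args):
    cmd = build_command(args)
    # Replace this process, so make sees the exit status of gold (or gdb).
    os.execvp(cmd[0], cmd)


# The build always passes linker arguments; a bare invocation does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Using exec instead of spawning a child keeps every other link behaving exactly as if gold had been called directly.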


Funny, the first few times it did not trip (?!).
Maybe I was doing something wrong.
On the third try, I was able to capture the stack trace.

This is puzzling. I'd think a problem like this would be reproducible,
unless there's some sort of race going on with the .o files. Does the
problem go away if you change make to use "-j1"?

Yes, it was indeed puzzling. I thought it was an error in my manual operation, but I will check this in parallel with my effort to track down which object file is causing the trouble.

It is possible that Mozilla Thunderbird has gone through a major build system revision in the last 12 months, and there may still be some issues in
building the binaries. But since the linked objects belong to a single library
under its own directory tree, I am not sure how object creation could be mixed up by "-j4" (IF the makefile is written/generated correctly, that is).

I obtained three dumps.
The first was obtained with my stock ~/.gdbinit tailored to Mozilla
Thunderbird debugging, but it contained a set of spurious warnings
related to files referenced in .gdbinit.
The second was obtained after renaming this .gdbinit file to
.gdbinit.save to remove the spurious warnings.
The third was obtained after I cleared ccache completely.
I cleared ccache's cache to make sure that I am not using corrupt
object files (for some mysterious reason). I use a version of ccache
enhanced to support -gsplit-dwarf:
https://bitbucket.org/zephyrus00jp/ccache-gsplit-dwarf-support
https://bugzilla.samba.org/show_bug.cgi?id=10005

The second and third stack traces matched completely (except for the
process ID printed at the end), so I am sure ccache is not
involved in the problem.
I am showing the third dump below.

The funny thing is that I can re-invoke the top-most "make -f client.mk"
with suitable environment variable settings, etc., and create a working
Mozilla Thunderbird (!?). I wonder what state the leftover
libnspr4.so is in. Maybe the Mozilla Thunderbird build system is
clever enough to use libnspr4.a instead (?), but I
digress.

Since the linker is crashing early during the first pass, it will not
have even created the output file yet, so you are probably left with
an older copy left over from a link that did not crash.


I see. That may explain it.
I suppose .PRECIOUS or something similar is used in the Makefile(s) so
that the library file is not removed before the command is executed.

Program received signal SIGSEGV, Segmentation fault.
gold::Gdb_index::add_symbol (this=0x901e90, cu_index=3,
     sym_name=0x2aaaaaaec000 <Address 0x2aaaaaaec000 out of bounds>,
     flags=0 '\000') at gdb-index.cc:1128
1128          reinterpret_cast<const unsigned char*>(sym_name));
(gdb) #0  gold::Gdb_index::add_symbol (this=0x901e90, cu_index=3,
     sym_name=0x2aaaaaaec000 <Address 0x2aaaaaaec000 out of bounds>,
     flags=0 '\000') at gdb-index.cc:1128
#1  0x0000000000517602 in gold::Gdb_index_info_reader::read_pubtable (
     this=0x7fffffff5a30, table=0x9022d0, offset=<optimized out>)
     at gdb-index.cc:879

This is definitely helpful -- thanks for going through so much trouble
to get these stack traces. This shows that we are in the middle of
hashing a name from the .debug_pubnames (or .debug_gnu_pubnames)
table, but for some reason we have a name that runs off the end of the
table with no null-termination. That should not happen, and suggests a
corrupt .o file. It would be helpful to figure out which .o file we're
reading at this point, but I'll need you to do a bit more to collect
that...
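To illustrate what "a name that runs off the end of the table" means, here is a minimal Python sketch (not gold's actual reader in gdb-index.cc) of parsing one set of a .debug_pubnames or .debug_gnu_pubnames table. It assumes 32-bit little-endian DWARF and the DWARF 2-4 pubnames layout: a header of unit_length, version, debug_info_offset and debug_info_length, followed by (DIE offset, NUL-terminated name) pairs ending with a zero offset; the GNU variant adds one flags byte after each offset. Instead of reading whatever follows, it rejects a name with no NUL before the end of the set.

```python
import struct

HEADER = struct.Struct("<IHII")  # unit_length, version, debug_info_offset, debug_info_length
OFFSET = struct.Struct("<I")     # 32-bit DIE offset


def read_pubnames_set(data, gnu_style=False):
    """Parse one pubnames/pubtypes set starting at data[0].

    Returns a list of (die_offset, flags, name); flags stays 0 unless
    gnu_style is set. Raises ValueError rather than reading past the set.
    """
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    unit_length, _version, _info_off, _info_len = HEADER.unpack_from(data, 0)
    if unit_length == 0xFFFFFFFF:
        raise ValueError("64-bit DWARF is not handled in this sketch")
    end = 4 + unit_length  # unit_length excludes its own 4 bytes
    if end > len(data):
        raise ValueError("unit_length runs past the end of the section")
    pos = HEADER.size
    entries = []
    while True:
        if pos + OFFSET.size > end:
            raise ValueError("set ends without a terminating zero offset")
        (die_offset,) = OFFSET.unpack_from(data, pos)
        pos += OFFSET.size
        if die_offset == 0:  # a zero offset terminates the list
            return entries
        flags = 0
        if gnu_style:  # .debug_gnu_pubnames: one flags byte after the offset
            if pos >= end:
                raise ValueError("missing flags byte")
            flags = data[pos]
            pos += 1
        nul = data.find(b"\0", pos, end)
        if nul < 0:
            # The corrupt case: no NUL before the end of the table.
            raise ValueError("unterminated name at byte %d" % pos)
        entries.append((die_offset, flags, data[pos:nul].decode("utf-8", "replace")))
        pos = nul + 1
```

Fed a set whose last name lacks its NUL byte, this raises ValueError instead of walking past the end of the data.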


OK

#2  0x00000000005176c9 in gold::Gdb_index_info_reader::read_pubnames_and_pubtypes (
     this=0x7fffffff5a30, die=0x7fffffff5960) at gdb-index.cc:942
#3  0x0000000000518009 in gold::Gdb_index_info_reader::visit_top_die (
     this=0x7fffffff5a30, die=0x7fffffff5960) at gdb-index.cc:379
#4  0x00000000005180d3 in gold::Gdb_index_info_reader::visit_compilation_unit (
     this=0x7fffffff5a30, cu_offset=<optimized out>,
     cu_length=<optimized out>, root_die=<optimized out>) at gdb-index.cc:326
#5  0x000000000062a8f2 in gold::Dwarf_info_reader::do_parse<false> (
     this=this@entry=0x7fffffff5a30) at dwarf_reader.cc:1363
#6  0x000000000062746e in gold::Dwarf_info_reader::parse (
     this=this@entry=0x7fffffff5a30) at dwarf_reader.cc:1234
#7  0x00000000005187b1 in gold::Gdb_index::scan_debug_info (this=0x901e90,
     is_type_unit=is_type_unit@entry=false, object=object@entry=0x946f90,
     symbols=0x2aaaaaaeb150 "", symbols@entry=0xb <Address 0xb out of bounds>,
     symbols_size=symbols_size@entry=504, shndx=<optimized out>,
     reloc_shndx=9, reloc_type=4) at gdb-index.cc:1119
#8  0x0000000000550939 in gold::Layout::add_to_gdb_index<64, false> (
     this=this@entry=0x7fffffff6f30, is_type_unit=is_type_unit@entry=false,
     object=object@entry=0x946f90, symbols=0xb <Address 0xb out of bounds>,
     symbols@entry=0x2aaaaaaeb150 "", symbols_size=symbols_size@entry=504,
     shndx=<optimized out>, reloc_shndx=9, reloc_type=4) at layout.cc:1569

In frame #8, the value of object->name_ would tell you which .o file
it's reading. If you can find this and send me a copy of that .o file,
I'd like to take a look at it. (Since you say this is actually a
fairly small link, you could just send me all the .o files.)

I will try to do this. I know nothing about the innards of
ld.gold, so the suggestion about object->name_ is very helpful.

And if the objects do not add up to too much in total, maybe I can send you the whole set after compressing them.

Thank you again for your help.


-cary


Chiaki

