Bug 11488 - fsf gdb x86_64-apple-darwin crashes when loading libraries due to an endless loop
Summary: fsf gdb x86_64-apple-darwin crashes when loading libraries due to an endless...
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: shlibs
Version: HEAD
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-04-12 10:09 UTC by Andre'
Modified: 2014-05-28 19:45 UTC
CC List: 9 users

See Also:
Host: x86_64-apple-darwin
Target:
Build:
Last reconfirmed:


Attachments
workaround for this bug on mac (347 bytes, application/octet-stream)
2011-06-20 10:28 UTC, Fawzi Mohamed
patch on the top for 7.2 branch to skip stub libs or encrypted images (1.42 KB, application/octet-stream)
2011-06-20 10:44 UTC, Fawzi Mohamed
reformatted workaround for this bug on mac (344 bytes, patch)
2011-06-20 11:05 UTC, Fawzi Mohamed
reformatted patch to skip encrypted and stub libs (1.40 KB, patch)
2011-06-20 11:10 UTC, Fawzi Mohamed
ignores the routines_64 section (250 bytes, patch)
2011-06-23 17:54 UTC, Fawzi Mohamed
fixes read of mmaped sections (537 bytes, patch)
2011-06-23 18:53 UTC, Fawzi Mohamed
fixes mmap read of sections (C90 reformat) (531 bytes, patch)
2011-06-23 19:40 UTC, Fawzi Mohamed
ensures that the cie ptr of a fde is really a cie (1.07 KB, patch)
2011-06-23 19:48 UTC, Fawzi Mohamed
ensures that the cie ptr of a fde is really a cie (reformatted) (1.08 KB, patch)
2011-06-27 15:46 UTC, Fawzi Mohamed

Description Andre' 2010-04-12 10:09:25 UTC
FSF CVS gdb reports "unable to read unknown load command 0x1a"
and crashes afterwards.

This happens for code as simple as

-----------------------------------
//#include <QtCore/QCoreApplication>

int main(int argc, char *argv[])
{
//    QCoreApplication a(argc, argv);
//    QString name(argv[0]);

    return 0;
}
-----------------------------------

[I.e. basically a 'return 0;']

for a binary built with

g++ -c -pipe -g -gdwarf-2 -arch x86_64 -Xarch_x86_64 -mmacosx-version-min=10.5
-Wall -W -DQT_CORE_LIB -DQT_SHARED -I../../../git/qt/qt-4.6/mkspecs/macx-g++
-I../../qt-test-app
-I../../../git/qt/qt-4.6/lib/QtCore.framework/Versions/4/Headers
-I../../../git/qt/qt-4.6/include/QtCore -I../../../git/qt/qt-4.6/include -I.
-I../../qt-test-app -I. -F/data/git/qt/qt-4.6/lib -o main.o
../../qt-test-app/main.cpp

g++ -headerpad_max_install_names -arch x86_64 -Xarch_x86_64
-mmacosx-version-min=10.5 -o qt-test-app main.o -F/data/git/qt/qt-4.6/lib
-L/data/git/qt/qt-4.6/lib -framework QtCore -L/data/git/qt/qt-4.6/lib

That's a plain Qt based application using a default 4.6 install of Qt.

The qmake file generating those command lines is:

-----------------------------------
QT       += core
QT       -= gui

TARGET = qt-test-app
CONFIG   += console
CONFIG   -= app_bundle

SOURCES += main.cpp
-----------------------------------


The back trace:


Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00007fff5f3fffd0
0x0000000100145fd4 in decode_frame_entry (unit=0x1033a0580, start=0x102c77570
"\f", eh_frame_p=1, cie_table=0x7fff5fbfed50, fde_table=0x7fff5fbfed40) at
dwarf2-frame.c:1904
1904    {
(gdb) bt
#0  0x0000000100145fd4 in decode_frame_entry (unit=0x1033a0580,
start=0x102c77570 "\f", eh_frame_p=1, cie_table=0x7fff5fbfed50,
fde_table=0x7fff5fbfed40) at dwarf2-frame.c:1904
#1  0x000000010014668a in decode_frame_entry (unit=0x1033a0580,
start=0x102c77570 "\f", eh_frame_p=1, cie_table=0x7fff5fbfed50,
fde_table=0x7fff5fbfed40) at dwarf2-frame.c:1857
#2  0x000000010014668a in decode_frame_entry (unit=0x1033a0580,
start=0x102c77570 "\f", eh_frame_p=1, cie_table=0x7fff5fbfed50,
fde_table=0x7fff5fbfed40) at dwarf2-frame.c:1857
#3  0x000000010014668a in decode_frame_entry (unit=0x1033a0580,
start=0x102c77570 "\f", eh_frame_p=1, cie_table=0x7fff5fbfed50,
fde_table=0x7fff5fbfed40) at dwarf2-frame.c:1857
#4  0x000000010014668a in decode_frame_entry (unit=0x1033a0580,
start=0x102c77570 "\f", eh_frame_p=1, cie_table=0x7fff5fbfed50,
fde_table=0x7fff5fbfed40) at dwarf2-frame.c:1857
#5  0x000000010014668a in decode_frame_entry (unit=0x1033a0580,
start=0x102c77570 "\f", eh_frame_p=1, cie_table=0x7fff5fbfed50,
fde_table=0x7fff5fbfed40) at dwarf2-frame.c:1857
#6  0x000000010014668a in decode_frame_entry (unit=0x1033a0580,
start=0x102c77570 "\f", eh_frame_p=1, cie_table=0x7fff5fbfed50,
fde_table=0x7fff5fbfed40) at dwarf2-frame.c:1857

[snip]

#52396 0x000000010014668a in decode_frame_entry (unit=0x1033a0580,
start=0x102c77570 "\f", eh_frame_p=1, cie_table=0x7fff5fbfed50,
fde_table=0x7fff5fbfed40) at dwarf2-frame.c:1857
#52397 0x000000010014668a in decode_frame_entry (unit=0x1033a0580,
start=0x102c77570 "\f", eh_frame_p=1, cie_table=0x7fff5fbfed50,
fde_table=0x7fff5fbfed40) at dwarf2-frame.c:1857
#52398 0x000000010014668a in decode_frame_entry (unit=0x1033a0580,
start=0x102c77570 "\f", eh_frame_p=1, cie_table=0x7fff5fbfed50,
fde_table=0x7fff5fbfed40) at dwarf2-frame.c:1857
#52399 0x0000000100146ad3 in dwarf2_build_frame_info (objfile=0x10334ee00) at
dwarf2-frame.c:2067
#52400 0x000000010001b66f in macho_symfile_read (objfile=0x10334ee00,
symfile_flags=8) at machoread.c:664
#52401 0x00000001000ca7c9 in syms_from_objfile (objfile=0x10334ee00,
addrs=0x1007cfb90, offsets=0x0, num_offsets=0, add_flags=8) at symfile.c:990
#52402 0x00000001000caa75 in symbol_file_add_with_addrs_or_offsets (abfd=<value
temporarily unavailable, due to optimizations>, add_flags=8, addrs=0x1007cfb90,
offsets=0x0, num_offsets=0, flags=2) at symfile.c:1082
#52403 0x0000000100016b88 in solib_read_symbols (so=0x100944800, flags=8) at
solib.c:484
#52404 0x00000001000173e7 in solib_add (pattern=0x0, from_tty=0, target=<value
temporarily unavailable, due to optimizations>, readsyms=1) at solib.c:776
#52405 0x00000001000dd850 in handle_inferior_event (ecs=0x7fff5fbff360) at
infrun.c:4099
#52406 0x00000001000e04b2 in wait_for_inferior (treat_exec_as_sigtrap=0) at
infrun.c:2483
#52407 0x00000001000e0848 in proceed (addr=0, siggnal=TARGET_SIGNAL_0, step=0)
at infrun.c:2005
#52408 0x00000001000d6dad in run_command_1 (args=0x0, from_tty=1,
tbreak_at_main=<value temporarily unavailable, due to optimizations>) at
infcmd.c:585
#52409 0x000000010018c98b in execute_command (p=0x100708af3 "", from_tty=1) at
top.c:441
#52410 0x00000001000f2a94 in command_handler (command=0x100708af0 "") at
event-top.c:511
#52411 0x00000001000f3655 in command_line_handler (rl=<value temporarily
unavailable, due to optimizations>) at event-top.c:736
#52412 0x00000001001c46b9 in rl_callback_read_char () at callback.c:205
#52413 0x00000001000f2bf9 in rl_callback_read_char_wrapper (client_data=<value
temporarily unavailable, due to optimizations>) at event-top.c:178
#52414 0x00000001000f17af in process_event () at event-loop.c:393
#52415 0x00000001000f2306 in gdb_do_one_event (data=<value temporarily
unavailable, due to optimizations>) at event-loop.c:458
#52416 0x00000001000eb349 in catch_errors (func=0x1000f20f0 <gdb_do_one_event>,
func_args=0x0, errstring=0x1002645d8 "", mask=<value temporarily unavailable,
due to optimizations>) at exceptions.c:510
#52417 0x0000000100058a36 in tui_command_loop (data=<value temporarily
unavailable, due to optimizations>) at ./tui/tui-interp.c:171
#52418 0x00000001000ed809 in captured_command_loop (data=<value temporarily
unavailable, due to optimizations>) at ./main.c:229
#52419 0x00000001000eb349 in catch_errors (func=0x1000ed800
<captured_command_loop>, func_args=0x0, errstring=0x1002645d8 "", mask=<value
temporarily unavailable, due to optimizations>) at exceptions.c:510
#52420 0x00000001000eceea in captured_main (data=<value temporarily unavailable,
due to optimizations>) at ./main.c:907
#52421 0x00000001000eb349 in catch_errors (func=0x1000ec560 <captured_main>,
func_args=0x7fff5fbff9c0, errstring=0x1002645d8 "", mask=<value temporarily
unavailable, due to optimizations>) at exceptions.c:510
#52422 0x00000001000ec32b in gdb_main (args=<value temporarily unavailable, due
to optimizations>) at ./main.c:916
#52423 0x00000001000015cd in main (argc=<value temporarily unavailable, due to
optimizations>, argv=<value temporarily unavailable, due to optimizations>) at
gdb.c:33
Comment 1 David Anderson 2010-04-22 08:57:13 UTC
It looks like this happens for binaries that are linked against a Framework.
Comment 2 Arvid Picciani 2010-09-21 15:51:55 UTC
I can reproduce this without Qt:

g++ -g main.cpp -framework ApplicationServices

where main.cpp contains only: int main(){}

The resulting ./a.out fails to load in gdb.
Comment 3 Alexander 2011-05-24 13:59:31 UTC
Hello,

Is there any update on this problem? I ran into the same problem using the latest GDB (7.2) from MacPorts.
Comment 4 Andre' 2011-05-24 14:55:17 UTC
I don't think so. The last few times the topic came up on Freenode's #gdb channel (and it does every few weeks), the result was one tenth "works for me" (presumably because the person was not using C++ or not using Frameworks) and nine tenths "can't reproduce, as I don't have a Mac". Given that Apple has already moved to LLDB, this doesn't look like it will ever be fixed.
Comment 5 Fawzi Mohamed 2011-06-09 14:32:34 UTC
I traced this problem to libobjc.A.dylib (i.e. the Objective-C runtime).
It seems that something in it cannot be parsed correctly by decode_frame_entry.
Apple's gdb simply doesn't run it on that library (in fact, on any Mach-O library without debug info).
A simple hack would be to skip it as well, and indeed that works, but I suppose that FSF gdb runs it on all objects by default on purpose (unlike Apple's gdb, which falls back to the minimal info and exported symbols), so the parsing itself should be fixed.
I will try to look into it in more detail.
Comment 6 Fawzi Mohamed 2011-06-20 10:28:34 UTC
Created attachment 5806 [details]
workaround for this bug on mac

A minimal workaround for the 7.2 branch that avoids the infinite loop triggered by libobjc.A.dylib.
This is what Apple's gdb does: it skips dwarf2_build_frame_info when there is no DWARF information. Apple's version may still run it if the user forces the use of eh info, which is off by default.
Comment 7 Fawzi Mohamed 2011-06-20 10:44:49 UTC
Created attachment 5807 [details]
patch on the top for 7.2 branch to skip stub libs or encrypted images

This is what Apple's gdb does. It may be that gdb's parsing is robust enough not to need it (I haven't built test libraries that trigger this case), but it might be useful to skip parsing in these situations.
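The attached patch is not reproduced here. As a rough illustration of what "stub libs or encrypted images" means, the sketch below assumes a reader that already knows the Mach-O filetype and whether an LC_ENCRYPTION_INFO command with a non-zero cryptid is present. The function name `should_skip_frame_info` is hypothetical; the MH_DYLIB_STUB value is taken from Apple's <mach-o/loader.h> and redefined locally.

```c
#include <stdint.h>

/* Value of MH_DYLIB_STUB in Apple's <mach-o/loader.h>, redefined so the
   sketch compiles without that header. */
enum { MACHO_MH_DYLIB_STUB = 0x9 };

/* Hypothetical predicate: return 1 if frame info should not be parsed.
   filetype:            the Mach-O header's filetype field.
   has_encryption_info: non-zero if an LC_ENCRYPTION_INFO command was seen.
   cryptid:             that command's cryptid (0 means not encrypted). */
static int should_skip_frame_info(uint32_t filetype, int has_encryption_info,
                                  uint32_t cryptid)
{
  if (filetype == MACHO_MH_DYLIB_STUB)
    return 1;   /* stub dylibs have no section contents to parse */
  if (has_encryption_info && cryptid != 0)
    return 1;   /* encrypted bytes on disk are not the real contents */
  return 0;
}
```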
Comment 8 Fawzi Mohamed 2011-06-20 11:05:46 UTC
Created attachment 5808 [details]
reformatted workaround for this bug on mac
Comment 9 Fawzi Mohamed 2011-06-20 11:10:39 UTC
Created attachment 5809 [details]
reformatted patch to skip encrypted and stub libs
Comment 10 Fawzi Mohamed 2011-06-20 11:12:29 UTC
I am now looking at the Mach-O format to understand what is parsed incorrectly in libobjc.A.dylib.
Comment 11 Fawzi Mohamed 2011-06-21 15:03:12 UTC
I have found the problem, and a way to detect it.
decode_frame_entry and decode_frame_entry_1 decode either a CIE or an FDE.
An FDE has a back pointer to its CIE.
Sometimes on Mac this pointer is broken and points back to an FDE.
In libobjc.A.dylib this is particularly broken, and points back to 
Since decode_frame_entry is simply called again to parse the CIE when the pointer is not yet known, this inconsistency is not detected. One should either split the function, so that a decoder that handles only CIEs is used, or peek at the target and ensure that the CIE pointer actually points to a CIE.

I think there is a strong argument for introducing such a check, as it is simply the correct thing to do, and if done correctly it doesn't slow down the parsing of the .eh_frame section.

Then one has to decide what to do in that case: set the CIE to NULL (which is detected later on), or use the last CIE.

That is not the end of the problem, because I noticed that the data gdb parses as .eh_frame differs from the output of
  otool -s __TEXT __eh_frame libobjc.A.dylib
I don't know whether this is expected, but the __TEXT __eh_frame section of the Mach-O file looks much more sensible. This might be connected with fat binaries, but I haven't investigated it yet.
Comment 12 Fawzi Mohamed 2011-06-23 17:54:16 UTC
Created attachment 5814 [details]
ignores the routines_64 section

Patch against the 7.2 branch that ignores the load command holding the dylib initialization routine on 64-bit architectures (LC_ROUTINES_64), as is already done for 32-bit (LC_ROUTINES).
Removes the "unable to read unknown load command 0x1a" message.
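A minimal sketch of the idea, using a hypothetical dispatcher rather than the actual BFD/gdb code: LC_ROUTINES_64 (0x1a) is handled alongside LC_ROUTINES (0x11) and skipped, instead of falling through to the "unknown load command" path. The constant values come from Apple's <mach-o/loader.h> and are redefined locally so the code compiles without that header; the other recognized commands are illustrative only.

```c
#include <stdint.h>

/* Load-command values from Apple's <mach-o/loader.h>. */
enum
{
  MACHO_LC_SEGMENT = 0x1,
  MACHO_LC_SYMTAB = 0x2,
  MACHO_LC_DYSYMTAB = 0xb,
  MACHO_LC_ROUTINES = 0x11,
  MACHO_LC_SEGMENT_64 = 0x19,
  MACHO_LC_ROUTINES_64 = 0x1a,
  MACHO_LC_UUID = 0x1b
};

enum lc_action { LC_ACTION_PARSE, LC_ACTION_IGNORE, LC_ACTION_UNKNOWN };

/* Hypothetical: decide what a Mach-O reader does with load command CMD. */
static enum lc_action classify_load_command(uint32_t cmd)
{
  switch (cmd)
    {
    case MACHO_LC_SEGMENT:
    case MACHO_LC_SEGMENT_64:
    case MACHO_LC_SYMTAB:
    case MACHO_LC_DYSYMTAB:
    case MACHO_LC_UUID:
      return LC_ACTION_PARSE;
    case MACHO_LC_ROUTINES:      /* 32-bit: already ignored */
    case MACHO_LC_ROUTINES_64:   /* 64-bit: the missing case (0x1a) */
      return LC_ACTION_IGNORE;
    default:
      /* A reader would warn "unable to read unknown load command 0x..". */
      return LC_ACTION_UNKNOWN;
    }
}
```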
Comment 13 Josh Matthews 2011-06-23 18:08:53 UTC
Thank you so much for doing this work. Applying all three patches to trunk allowed me to run a debug build of Firefox for the first time. I did see a bunch of output like this at first:

warning: can't find section '.const' in OSO file /Users/jdm/src/mozilla-central/obj-ff-dbg/nsprpub/pr/src/io/./prscanf.o
warning: can't find section '.const' in OSO file /Users/jdm/src/mozilla-central/obj-ff-dbg/nss/nssutil/dertime.o
warning: can't find section '*UND*' in OSO file /Users/jdm/src/mozilla-central/obj-ff-dbg/toolkit/library/../../content/smil/nsSMILRepeatCount.o
warning: can't find section '*UND*' in OSO file /Users/jdm/src/mozilla-central/obj-ff-dbg/toolkit/library/../../dist/lib/libcrmf.a(crmftmpl.o)

but it ran. The next time I tried, I saw this assertion after that earlier spew:

machoread.c:392: internal-error: macho_add_oso_symfile: Assertion `current_oso.symbol_table == NULL' failed.

But that's progress!
Comment 14 Fawzi Mohamed 2011-06-23 18:53:30 UTC
Created attachment 5815 [details]
fixes read of mmaped sections

Patch against the 7.2 branch that fixes the mmap-based reading of sections in fat (multi-architecture) files: the mmap ignored the offset of the Mach-O file within the fat file. I also sneaked in a fix to support file offsets larger than 32 bits on 64-bit architectures; better to be prepared ;)

This should fix many errors, not only in the eh_frame section, hopefully making the debugger fully functional on Mac.
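The arithmetic behind this fix can be sketched as follows. This is not the patch itself; `compute_section_map` and `struct section_map` are hypothetical names. It assumes the slice offset has already been read from the fat_arch entry of the selected architecture, and the section offset from the section header, which is relative to the start of the Mach-O slice. The two are added before page-aligning for mmap, and everything stays in 64-bit arithmetic.

```c
#include <stdint.h>

/* Hypothetical description of where to mmap one section. */
struct section_map
{
  uint64_t map_offset;   /* page-aligned file offset to pass to mmap */
  uint64_t map_length;   /* bytes to map, starting at map_offset */
  uint64_t data_delta;   /* section data starts this far into the mapping */
};

/* slice_offset: offset of the Mach-O slice in the fat file (0 if thin).
   sect_offset, sect_size: from the section header, relative to the slice.
   page_size: a power of two, e.g. from sysconf(_SC_PAGESIZE). */
static struct section_map compute_section_map(uint64_t slice_offset,
                                              uint64_t sect_offset,
                                              uint64_t sect_size,
                                              uint64_t page_size)
{
  struct section_map m;
  /* Adding the slice offset is the step that was missing for fat files. */
  uint64_t file_offset = slice_offset + sect_offset;

  m.map_offset = file_offset & ~(page_size - 1);
  m.data_delta = file_offset - m.map_offset;
  m.map_length = m.data_delta + sect_size;
  return m;
}
```

The mapping would then be created with something like `mmap(NULL, m.map_length, PROT_READ, MAP_PRIVATE, fd, (off_t) m.map_offset)`, with the section data starting `m.data_delta` bytes into the returned address.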
Comment 15 Fawzi Mohamed 2011-06-23 19:40:06 UTC
Created attachment 5816 [details]
fixes mmap read of sections (C90 reformat)
Comment 16 Fawzi Mohamed 2011-06-23 19:48:14 UTC
Created attachment 5817 [details]
ensures that the cie ptr of a fde is really a cie

Patch against the 7.2 branch that checks that the CIE pointer in an FDE points to a CIE and not to an FDE. Otherwise, when for example an FDE points to itself, you get a recursive call that exhausts the stack (the FDE tries to decode its CIE, since it is not yet in the table, which, being an FDE, tries to decode its CIE, and so on).
This is what happened in libobjc.A.dylib.

There are various ways to detect this; I implemented one, but I think that in some form the check should go into gdb, as it makes it more robust.
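One possible form of the check, sketched for the 32-bit .eh_frame encoding only (the 64-bit extended-length form is rejected rather than handled). It is not the attached patch, and the function names are hypothetical. It relies on two .eh_frame rules: a CIE has an ID field of 0, and an FDE's CIE pointer is subtracted from the offset of the CIE-pointer field itself to locate the CIE. Values are read as little-endian, as on x86_64.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value (x86_64 Mach-O files are little-endian). */
static uint32_t read_u32_le(const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Classify the .eh_frame entry at ENTRY_OFF: 1 for a CIE, 0 for an FDE,
   -1 for a terminator, an out-of-bounds offset, or a 64-bit length. */
static int eh_frame_is_cie_at(const unsigned char *buf, size_t size,
                              size_t entry_off)
{
  uint32_t length;

  if (entry_off > size || size - entry_off < 8)
    return -1;
  length = read_u32_le(buf + entry_off);
  if (length == 0 || length == 0xffffffffu)
    return -1;
  /* The field after the length is 0 for a CIE, a CIE pointer for an FDE. */
  return read_u32_le(buf + entry_off + 4) == 0 ? 1 : 0;
}

/* Resolve the CIE of the FDE at FDE_OFF, peeking at the target first.
   Returns the CIE offset, or (size_t) -1 if the target is not a CIE. */
static size_t eh_frame_fde_cie_offset(const unsigned char *buf, size_t size,
                                      size_t fde_off)
{
  size_t field_off = fde_off + 4;   /* offset of the CIE-pointer field */
  size_t cie_off;
  uint32_t cie_ptr;

  if (eh_frame_is_cie_at(buf, size, fde_off) != 0)
    return (size_t) -1;             /* not an FDE in the first place */
  cie_ptr = read_u32_le(buf + field_off);
  if (cie_ptr > field_off)
    return (size_t) -1;             /* would point before the section */
  cie_off = field_off - cie_ptr;
  if (eh_frame_is_cie_at(buf, size, cie_off) != 1)
    return (size_t) -1;             /* e.g. an FDE pointing at itself */
  return cie_off;
}
```

An FDE whose CIE pointer is 4 resolves to its own start; because the target's ID field is non-zero, the lookup fails instead of recursing, which is the self-referencing case from libobjc.A.dylib.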
Comment 17 Fawzi Mohamed 2011-06-23 20:10:56 UTC
Hi Josh, you're welcome. It should be even better now; I hope I have fixed all the "blocking" bugs.

An overview for the reviewers:

Apple's gdb simply ignores the eh_frame sections of libraries without embedded DWARF info and does not use the eh information (that was http://sourceware.org/bugzilla/attachment.cgi?id=5808&action=diff ), but I think one should be able to use them, so I looked further and found the "correct" fix.
As the path to it was a bit convoluted, I also made a few improvements to pieces of code that failed as a consequence of the original bug.

The "main" fix is
   http://sourceware.org/bugzilla/attachment.cgi?id=5816&action=diff
which fixes the mmaped read of sections.

I feel that
	http://sourceware.org/bugzilla/attachment.cgi?id=5817
which adds a check on the CIE pointer, is important, improves gdb's robustness, and should also go in.

	http://sourceware.org/bugzilla/attachment.cgi?id=5814
adds the previously forgotten ignore of the routines_64 (LC_ROUTINES_64) load command and should also go in

	http://sourceware.org/bugzilla/attachment.cgi?id=5809
is something that I have seen Apple do and seems reasonable, but I have not needed it, so I am not sure whether it should go in
Comment 18 Ehsan Akhgari 2011-06-24 16:07:44 UTC
Fawzi, I tried your patches, but it seems that something is broken for me:

ehsanakhgari:~/src/gdb (ehsan-trial) [12:06:32]$ ./objdir/gdb/gdb ls
GNU gdb (GDB) 7.3.50.20110623-cvs
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin10.7.0".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
"/bin/ls": not in executable format: File format not recognized
(gdb) r
Starting program:  
No executable file specified.
Use the "file" or "exec-file" command.
(gdb) quit

Do you have any ideas what's going wrong here?
Comment 19 Fawzi Mohamed 2011-06-27 10:41:44 UTC
Hi Ehsan, that looks like a different bug; if I had to guess, I would say that something in gdb doesn't support fat main executables.
That is a separate issue, so please open another bug report (if there isn't one already).

Note that on 2011-06-24 Tristan Gingold submitted a patch that also contains my main fix; it has been approved and checked in, so the core of this bug should basically be fixed.
I haven't changed the status because I wasn't able to confirm the fix: master seems to have been broken for a while, at least on Mac, due to the removal of ENUM_BITFIELD, which wasn't carried out consistently.
Comment 20 Fawzi Mohamed 2011-06-27 15:46:09 UTC
Created attachment 5823 [details]
ensures that the cie ptr of a fde is really a cie (reformatted)
Comment 21 Fawzi Mohamed 2011-07-04 18:10:59 UTC
I confirmed that the current master has fixed this bug (previously my environment included an external ansidecl.h file, which broke my build).

Basically both "ignores the routines_64 section" and "fixes mmap read of sections (C90 reformat)" (as done by Tristan) are in master.
Thus I am closing this bug.

I am still submitting the ciePtr patch as it improves the robustness of gdb.

Skipping encrypted and stub libs should probably wait if/until someone finds it useful.
Comment 22 Jonathan Watt 2011-09-06 17:41:31 UTC
Can someone who knows the gdb release process/schedule clarify which version of gdb this fix will be in please?
Comment 23 Andre' 2011-09-07 06:09:27 UTC
I am not involved in the gdb release process in any way, but according to my current understanding of the process I would expect this to show up in the gdb 7.4 release. Until then you can build from cvs or git.
Comment 24 Jonathan Watt 2011-09-08 08:35:35 UTC
Thanks, Andre.