Bug 16563 - Corrupt .eh-frame section created when linking LTO and non-LTO objects
Summary: Corrupt .eh-frame section created when linking LTO and non-LTO objects
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: 2.25
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-12 12:03 UTC by Nick Clifton
Modified: 2014-09-22 09:28 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Files needed to reproduce problem (25.51 KB, application/x-xz)
2014-02-12 12:03 UTC, Nick Clifton
Details
Second test case (26.70 KB, application/force-download)
2014-08-12 13:19 UTC, Nick Clifton
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Nick Clifton 2014-02-12 12:03:00 UTC
Created attachment 7405 [details]
Files needed to reproduce problem

Linking together LTO and non-LTO objects can result in a corrupt
.eh_frame section.  To reproduce this run the following command on the
uploaded x86_64 object files:

  % g++ -o broken.exe -O0 -flto -fno-fat-lto-objects -flto-partition=none \
    a.o b.o -Wl,--no-demangle

Then examine the contents of the .eh_frame section with:

  % readelf -w broken.exe > /dev/null

  Warning: Invalid CIE pointer 0xfffffccc in FDE at 0x000040

The a.o and b.o object files were produced as follows (sources also
uploaded):

  % g++ -c -O0 -flto -fno-fat-lto-objects a.cpp
  % g++ -c -O0 b.cpp

Note - compiling both a.cpp and b.cpp with LTO results in a working
binary.  So does compiling them both without LTO.  Also if the
-Wl,-traditional linker command line option is included on the g++
command line then the link works.  (This is because traditional linking
disables the optimization of the .eh_frame section).
Comment 1 Alan Modra 2014-07-28 04:16:26 UTC
Hi Nick, this seems to no longer be a problem.  With current mainline gcc and ld I see nothing unusal in .eh_frame.
Comment 2 Nick Clifton 2014-08-12 13:19:20 UTC
Created attachment 7743 [details]
Second test case

Second test case.
Comment 3 Nick Clifton 2014-08-12 13:21:04 UTC
The first test case no longer reproduces the problem - but the problem itself still exists.  I have now uploaded a second test case that still demonstrates the problem.
Comment 4 Alan Modra 2014-08-13 14:46:27 UTC
There are a couple of things going on here, but the "Invalid CIE pointer" is a readelf bug I'd say.  There's nothing prohibiting a FDE to be located before its CIE.  The .eh_frame cie_id is a signed relative offset (gcc's unwinder treats it that way), and .debug_frame cie_id is section relative.  So readelf ought to be able to handle this ordering.

Of course it's unexpected to find CIEs and FDEs like this.  The reason appears to be that the elf-eh-frame.c code finds a duplicate CIE in a.o and b.o, and therefore causes the b.o copy to be removed and some b.o FDE's then use an a.o CIE.  That's all quite nornal, use the CIE the elf-eh-frame.c sees as "earlier".
The problem is that a.o's lto recompiled object actually gets linked into the output .eh_frame *after* b.o's .eh_frame.
Comment 5 Alan Modra 2014-08-13 16:38:38 UTC
I got that wrong.  a.o and b.o appear in the correct order in the output .eh_frame.  Apparently it is the elf-eh-frame.c code that sees them in reverse order.
Comment 6 Alan Modra 2014-08-13 22:25:30 UTC
I'll leave the readelf part of this bug to Nick.  Fixing ld ordering of .eh_frame.
Comment 7 cvs-commit@gcc.gnu.org 2014-08-14 04:51:43 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gdb and binutils".

The branch, master has been updated
       via  18cd5bce47dc4a33dd1d8e3036b99d2fa7e3234f (commit)
      from  b879806f2fdd2eca7092d7b854d6cbbbbbd0493b (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=18cd5bce47dc4a33dd1d8e3036b99d2fa7e3234f

commit 18cd5bce47dc4a33dd1d8e3036b99d2fa7e3234f
Author: Alan Modra <amodra@gmail.com>
Date:   Thu Aug 14 13:49:31 2014 +0930

    Linker part of PR16563 fix
    
    Presents .eh_frame input sections to the optimisation machinery in
    elf-eh-frame.c in the order they are given by the linker script.
    
    	PR 16563
    bfd/
    	* elflink.c (bfd_elf_discard_info): Process .eh_frame and .stab
    	in the order they are mapped to output sections.
    ld/
    	* ldlang.c (map_head_is_link_order): Rename from
    	stripped_excluded_sections.
    	(lang_clear_os_map): New function, extracted from..
    	(strip_excluded_output_sections): ..here.
    	* ldlang.h (lang_clear_os_map): Declare.
    	* ldwrite.c (ldwrite): Call lang_clear_os_map.
    	* emultempl/sh64elf.em (sh64_elf_${EMULATION_NAME}_after_allocation):
    	Likewise.

-----------------------------------------------------------------------

Summary of changes:
 bfd/ChangeLog           |    6 +++
 bfd/elflink.c           |  109 ++++++++++++++++++++++++++--------------------
 ld/ChangeLog            |   12 +++++
 ld/emultempl/sh64elf.em |    3 +
 ld/ldlang.c             |   42 +++++++++++++++---
 ld/ldlang.h             |    2 +
 ld/ldwrite.c            |    1 +
 7 files changed, 121 insertions(+), 54 deletions(-)
Comment 8 Alan Modra 2014-08-14 04:53:23 UTC
Linker problem fixed mainline.
Comment 9 Nick Clifton 2014-08-15 13:39:13 UTC
Hi Alan,

  Thanks for the patch - it does indeed solve the problem.

  I believe however that readelf is/was correct in saying the CIE pointer was invalid.  These pointers are offsets into the .debug_frame section of the CIE entry to be used by the FDE that is being decoded.  But the offsets can never be negative (ie pointing to before the start of .debug_frame), so readelf was correct to complain with this particular testcase.

  You are correct however in saying that readelf does have a potential bug - it will complain about FDEs that reference a CIE that is defined later on in the .debug_frame section.  But I have never yet seen a binary that does this.  When one turns up I will fix readelf. :-)

Cheers
  Nick
Comment 10 cvs-commit@gcc.gnu.org 2014-09-22 09:28:43 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gdb and binutils".

The branch, master has been updated
       via  49727e46175419c638251b39091f24c575568bee (commit)
      from  aa8f4d1e5e6c01420489a2dfba72495bbd8489be (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=49727e46175419c638251b39091f24c575568bee

commit 49727e46175419c638251b39091f24c575568bee
Author: Alan Modra <amodra@gmail.com>
Date:   Mon Sep 22 17:53:15 2014 +0930

    Readelf: Handle forward references to CIEs
    
    The linker side of pr16563 was fixed with commit 18cd5bce, but
    unfortunately people continue to use older linkers with -flto.  This
    means we have binaries with working .eh_frame that can't be dumped by
    readelf, and I'm seeing internal IBM bug reports about this fact.
    
    	PR 16563
    	* dwarf.c (GET): Remove semicolon.
    	(read_cie): New function, extracted from..
    	(display_debug_frames): ..here.  Correctly handle signed offset
    	from FDE to CIE in .eh_frame.  Decode forward referenced CIEs too.

-----------------------------------------------------------------------

Summary of changes:
 binutils/ChangeLog |    8 ++
 binutils/dwarf.c   |  241 +++++++++++++++++++++++++++++++++++-----------------
 2 files changed, 169 insertions(+), 80 deletions(-)