Bug 23839 - ld generates a corrupted relocation table for .ARM.exidx sections
Summary: ld generates a corrupted relocation table for .ARM.exidx sections
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: 2.30
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-10-28 13:44 UTC by Philippe Daouadi
Modified: 2019-07-01 10:19 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Example of relocation outside the section (70.58 KB, application/x-zip-compressed)
2019-06-24 08:36 UTC, lorisnardo
Details
Example containing also the boot.o object file (70.61 KB, application/x-zip-compressed)
2019-06-27 14:04 UTC, lorisnardo
Details
Proposed patch (261 bytes, patch)
2019-06-27 15:26 UTC, Nick Clifton
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Philippe Daouadi 2018-10-28 13:44:40 UTC
I have a bug where my unwind tables are corrupted because of a faulty relocation in the .rel.ARM.exidx section of my output executable.

I just discovered https://sourceware.org/bugzilla/show_bug.cgi?id=20595 which is fixed, but it seems like this is the exact bug I am seeing.

I have applied the following patch to catch my issue:

diff -ru binutils-2.30.orig/bfd/elf32-arm.c binutils-2.30/bfd/elf32-arm.c
--- binutils-2.30.orig/bfd/elf32-arm.c	2018-10-28 14:32:50.434311154 +0100
+++ binutils-2.30/bfd/elf32-arm.c	2018-10-28 14:33:48.771218065 +0100
@@ -12275,6 +12275,9 @@
   relend = relocs + input_section->reloc_count;
   for (; rel < relend; rel++)
     {
+      if (rel->r_offset < 0 || rel->r_offset >= input_section->size)
+        printf("that's an overflow !!!\n");
+
       int			   r_type;
       reloc_howto_type *	   howto;
       unsigned long		   r_symndx;

And indeed, some relocations are applied to addresses that are outside of the concerned section. This is my first time digging in ld's code so please tell me if this is actually normal behavior.

So I assumed that's because of exidx entries merging and I reached the same workaround described here: https://sourceware.org/bugzilla/show_bug.cgi?id=20595#c1 .

I can see some rests of #20595's fix around elf32-arm.c:14823 (still on binutils 2.30), but I don't understand what those external or internal relocations and swap in and out functions are.

I could reproduce my issue on binutils 2.31, but I did all my investigation on 2.30.

Please tell me if I can help investigate the issue. I can provide a reproduction case but I couldn't simplify it so it implies lots of objects, libraries and a custom toolchain.
Comment 1 Nick Clifton 2019-03-18 13:42:51 UTC
(In reply to Philippe Daouadi from comment #0)

Hi Philippe,

  I am very sorry that it has taken me so long to get around to looking
  at this PR.  Is it still a problem ?  If so, do you have a way that
  I can reproduce it myself, so that I can investigate ?

Cheers
  Nick
Comment 2 Philippe Daouadi 2019-03-19 20:11:02 UTC
Hi,

Unfortunately, I couldn't manage to produce a simple test to reproduce this issue. It happens to me with a big enough C++ codebase, and the actual bug appears on PS Vita which is hard to debug. I noticed that the bigger the codebase became, the more the probability of this bug manifesting increased. I guess with more code, there are more exidx entries that get merged and thus more relocations that occur outside of the concerned section.

If you have a C++ project at hand that cross-compiles to ARM, maybe you can try to link it with the patch I wrote in my previous comment to see if the issue triggers.

Regards,
Philippe
Comment 3 Nick Clifton 2019-03-20 11:31:48 UTC
(In reply to Philippe Daouadi from comment #2)

Hi Philippe,

> If you have a C++ project at hand that cross-compiles to ARM, maybe you can

Sorry - I do not have one of those. :-(  I did consider adding in the patch
anyway, although converting the printf() to an abort(), but there are cases
where it is valid for a relocation to have an offset that is larger than the
section size, so that would not work.

Unfortunately it seems that in order to fix this problem we are really
going to need a reproducer.

By the way, what configuration are you using ?

Cheers
  Nick

PS.  It may turn out that these large C++ programs really are too large
and that they cannot be linked successfully.  Obviously the linker should
not silently be producing broken executables, but in the end it may turn
out that there is no real solution to the underlying problem...
Comment 4 Philippe Daouadi 2019-03-21 20:07:55 UTC
Hi Nick,

> there are cases
> where it is valid for a relocation to have an offset that is larger than the
> section size

I didn't know that, what kind of case would that be? Is it really possible to predict what will be the next section?

> By the way, what configuration are you using ?

I'm not sure about what you mean by 'configuration'. This is how binutils is built:

https://github.com/vitasdk/buildscripts/blob/592a603585e1939fe60c0226a47d6d01a88f3192/CMakeLists.txt#L309

You can look at the rest of the file and repository to see how the rest is built.

> Unfortunately it seems that in order to fix this problem we are really
> going to need a reproducer.

I'll be busy next week, but I will try to set up a small reproduction case when I come back the week after. No guarantees though :)
Comment 5 lorisnardo 2019-06-24 08:36:48 UTC
Created attachment 11861 [details]
Example of relocation outside the section
Comment 6 lorisnardo 2019-06-24 08:42:40 UTC
Comment on attachment 11861 [details]
Example of relocation outside the section

In this zip there are two object files (main.o and crtbegin.o) which have been linked using the following command:
ld --cref --verbose -d -nostdlib -EL -marmelf -o output.o -r -T lnk.cmd boot.o main.o crtbegin.o

The linker version used is 2.28.0.20170620 which is part of the GNU Tools for ARM Embedded Processors 6-2017-q2-update.

In the output.o object file, the last relocation in .rel.ARM.exidx, is outside the ARM.exidx section which is referenced
Comment 7 Nick Clifton 2019-06-27 13:21:42 UTC
Hi Lorisnardo,

(In reply to lorisnardo from comment #6)
> ld --cref --verbose -d -nostdlib -EL -marmelf -o output.o -r -T lnk.cmd
> boot.o main.o crtbegin.o

Unfortunately the boot.o file is missing from the zip archive.  Would you
mind uploading a copy please ?

> In the output.o object file, the last relocation in .rel.ARM.exidx, is
> outside the ARM.exidx section which is referenced

Also, if you could, please can you upload the output.o file ?  I would
like to be able to compare it to the versions I produce locally, in
case there is something different between our build environments.

Thanks.

Nick
Comment 8 lorisnardo 2019-06-27 13:59:19 UTC
(In reply to Nick Clifton from comment #7)
> Hi Lorisnardo,
> 
> (In reply to lorisnardo from comment #6)
> > ld --cref --verbose -d -nostdlib -EL -marmelf -o output.o -r -T lnk.cmd
> > boot.o main.o crtbegin.o
> 
> Unfortunately the boot.o file is missing from the zip archive.  Would you
> mind uploading a copy please ?
> 
> > In the output.o object file, the last relocation in .rel.ARM.exidx, is
> > outside the ARM.exidx section which is referenced
> 
> Also, if you could, please can you upload the output.o file ?  I would
> like to be able to compare it to the versions I produce locally, in
> case there is something different between our build environments.
> 
> Thanks.
> 
> Nick


Hi, thanks for the replay.

The boot.o file in the command line is not necessary to produce the output.o file provided in the zip, the command line included it because I erroneously copied it from a previous attempt to reproduce the error.

In any case I will provide both the boot.o and the corresponding output.o file, which also contains the same relocation problem.

Loris
Comment 9 lorisnardo 2019-06-27 14:04:31 UTC
Created attachment 11871 [details]
Example containing also the boot.o object file
Comment 10 Nick Clifton 2019-06-27 15:26:00 UTC
Created attachment 11872 [details]
Proposed patch

Hi Lorisnardo,

  Thanks for the files.  I think that I may have found the problem.
  Please could you try out this patch and let me know if it works for you ?

Cheers
  Nick
Comment 11 lorisnardo 2019-06-27 17:06:09 UTC
(In reply to Nick Clifton from comment #10)
> Created attachment 11872 [details]
> Proposed patch
> 
> Hi Lorisnardo,
> 
>   Thanks for the files.  I think that I may have found the problem.
>   Please could you try out this patch and let me know if it works for you ?
> 
> Cheers
>   Nick

Hi, I have applied the patch and in the provided test case worked, tomorrow I will try it with a more complex case and I'll let you know
Comment 12 lorisnardo 2019-06-28 09:44:24 UTC
Hi,
I have tested the patch with more cases and it works correctly.
Comment 13 Sourceware Commits 2019-07-01 10:18:37 UTC
The master branch has been updated by Nick Clifton <nickc@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=539300fb929ada830ec7c4002fd59f0d86c823b1

commit 539300fb929ada830ec7c4002fd59f0d86c823b1
Author: Nick Clifton <nickc@redhat.com>
Date:   Mon Jul 1 11:17:01 2019 +0100

    Correct the calculation of offsets for ARM exidx relocs when performing a partial link.
    
    	PR 23839
    bfd	* elf32-arm.c (elf32_arm_update_relocs): Do not include the
    	section VMA in the offset used to update exidx relocs.
    
    ld	* testsuite/ld-arm/unwind-4.d: Adjust for corrected calculation of
    	exidx relocs.
Comment 14 Nick Clifton 2019-07-01 10:19:05 UTC
Great - patch applied.

Cheers
  Nick