Bug 18668

Summary: relocation truncated to fit: R_AARCH64_CALL26 (veneers not inserted)
Product: binutils Reporter: David Abdurachmanov <david.abdurachmanov>
Component: ld Assignee: Jiong Wang <jiwang>
Status: RESOLVED FIXED    
Severity: normal CC: jiwang, pbrobinson
Priority: P2    
Version: 2.26   
Target Milestone: ---   
Host: aarch64-*-linux-gnu Target: aarch64-*-linux-gnu
Build: Last reconfirmed:

Description David Abdurachmanov 2015-07-14 09:30:39 UTC
The issue was discovered in the OpenLoops package, and it has also been raised elsewhere on the internet earlier. IIUC, this works fine if the ARM linker is used.

OpenLoops generates a DSO of ~150 MB, and the +/-128 MB range of a direct branch is not enough for direct calls. ld.bfd does not insert veneers in these cases.
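
For illustration, the range limit can be sketched in C. B and BL encode a signed 26-bit offset counted in 4-byte instructions, so the reachable window is -128 MiB to +128 MiB minus one instruction. The macro and function names below are hypothetical and are not taken from BFD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names. B/BL carry a signed 26-bit word offset,
   which is a signed 28-bit byte offset.  */
#define CALL26_MAX_FWD ((INT64_C (1) << 27) - 4)  /* +128 MiB - 4 */
#define CALL26_MAX_BWD (-(INT64_C (1) << 27))     /* -128 MiB     */

static_assert (CALL26_MAX_FWD == 0x7fffffc,
               "forward limit is 128 MiB minus one instruction");

/* Return true if a direct B/BL located at PLACE can reach DESTINATION
   without a veneer.  Both arguments are byte addresses.  */
static bool
call26_in_range (uint64_t place, uint64_t destination)
{
  int64_t offset = (int64_t) (destination - place);

  return offset <= CALL26_MAX_FWD && offset >= CALL26_MAX_BWD;
}
```

A caller ~150 MiB away from its callee falls outside this window, so `call26_in_range` returns false and a veneer (or an indirect call) would be required.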

The smallest reproducer was posted on the LLVM mailing list:

    void  foo ();
    int main () {foo();}

$ gcc -Wl,--defsym=foo=0x80000000 -o main main.c
/tmp/cc0zWsNY.o: In function `main':
main.c:(.text+0x8): relocation truncated to fit: R_AARCH64_CALL26 against symbol `foo' defined in *ABS* section in main
collect2: error: ld returned 1 exit status

Note that this works fine on ARMv7/AArch32, where a veneer is inserted. Tested on Jetson TK1 + binutils 2.24 + GCC 4.8.

Looking at the AArch64 backend, aarch64_type_of_stub contains this check:

    if ((r_type == AARCH64_R (CALL26) || r_type == AARCH64_R (JUMP26))
        && (branch_offset > AARCH64_MAX_FWD_BRANCH_OFFSET
            || branch_offset < AARCH64_MAX_BWD_BRANCH_OFFSET))
      {
        stub_type = aarch64_stub_long_branch;
      }

I guess the mechanism for adding a stub/veneer exists, but somehow it's not applied here.

Binutils commit: d5131498a57d1789ff0fea2cfeb1af90802c8dad (Mon Jul 13 17:14:13 2015 +0100)

### OpenLoops failures ###

process_obj/pplljjj/virtual_6_pplljjj_eexuuxggg_1_qp.os: In function `__ol_vamp_6_pplljjj_eexuuxggg_1_qp_MOD_vamp_6':
virtual_6_pplljjj_eexuuxggg_1_qp.f90:(.text+0x22820): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__ol_last_step_qp_MOD_check_last_aq_v' defined in .text section in lib/libopenloops.so
virtual_6_pplljjj_eexuuxggg_1_qp.f90:(.text+0x22f94): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__ol_last_step_qp_MOD_check_last_aq_v' defined in .text section in lib/libopenloops.so
virtual_6_pplljjj_eexuuxggg_1_qp.f90:(.text+0x23228): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__ol_vert_interface_qp_MOD_loop_av_q' defined in .text section in lib/libopenloops.so
virtual_6_pplljjj_eexuuxggg_1_qp.f90:(.text+0x234d0): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__ol_prop_interface_qp_MOD_loop_a_q' defined in .text section in lib/libopenloops.so
virtual_6_pplljjj_eexuuxggg_1_qp.f90:(.text+0x23764): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__ol_vert_interface_qp_MOD_loop_av_q' defined in .text section in lib/libopenloops.so
virtual_6_pplljjj_eexuuxggg_1_qp.f90:(.text+0x23a0c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__ol_prop_interface_qp_MOD_loop_a_q' defined in .text section in lib/libopenloops.so
virtual_6_pplljjj_eexuuxggg_1_qp.f90:(.text+0x23c34): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__ol_last_step_qp_MOD_check_last_aq_v' defined in .text section in lib/libopenloops.so
virtual_6_pplljjj_eexuuxggg_1_qp.f90:(.text+0x24180): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__ol_prop_interface_qp_MOD_loop_a_q' defined in .text section in lib/libopenloops.so
virtual_6_pplljjj_eexuuxggg_1_qp.f90:(.text+0x24414): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__ol_vert_interface_qp_MOD_loop_av_q' defined in .text section in lib/libopenloops.so
virtual_6_pplljjj_eexuuxggg_1_qp.f90:(.text+0x246bc): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__ol_prop_interface_qp_MOD_loop_a_q' defined in .text section in lib/libopenloops.so
Comment 1 Jiong Wang 2015-07-14 11:10:58 UTC
thanks for reporting this, will double check the status
Comment 2 Jiong Wang 2015-07-14 16:27:42 UTC
> 
> Looking at the AArch64 backend, aarch64_type_of_stub contains this check:
> 
>     if ((r_type == AARCH64_R (CALL26) || r_type == AARCH64_R (JUMP26))
>         && (branch_offset > AARCH64_MAX_FWD_BRANCH_OFFSET
>             || branch_offset < AARCH64_MAX_BWD_BRANCH_OFFSET))
>       {
>         stub_type = aarch64_stub_long_branch;
>       }
> 
> I guess the mechanism for adding a stub/veneer exists, but somehow it's
> not applied here.

It's not applied because of the "st_type == STT_FUNC" check at the head of the function.

Can I reproduce this issue by building OpenLoops svn head?
Comment 3 David Abdurachmanov 2015-07-14 16:36:41 UTC
OpenLoops is suffering from another issue: the offset between a load instruction and its constant pool exceeds the 1 MB limit. Yes, you cannot have huge functions on AArch64.

PR63304 on GCC (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304). OpenLoops compilation most likely will fail with these errors first.

After I explained the problem to the authors, they started working on cutting down the size of function bodies. Some of that work landed in http://openloops.hepforge.org/svn/OpenLoops/branches/public_beta . That's the branch that hit "relocation truncated to fit: R_AARCH64_CALL26" a number of times, because the DSO is huge.

Compilation instructions can be taken from here: https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_7_6_X/stable/openloops.spec
Comment 4 David Abdurachmanov 2015-07-15 08:17:24 UTC
I stripped all process libraries except one from OpenLoops; that's enough to trigger the issue.

You can get it here (4.7M): http://davidlt.web.cern.ch/davidlt/vault/openloops-1.1.1-stripped.tar.bz2

$ sha256sum openloops-1.1.1-stripped.tar.bz2
e8f55e404b1076f8fd8d2f42390e74dad5024d7a3b0a30dd905e4cc981628911  openloops-1.1.1-stripped.tar.bz2

You can trigger compilation via

    cd openloops-1.1.1-stripped
    ./openloops update --processes generator=0

It takes <6 min before the linking issue is hit on an 8-core APM X-Gene 1. You also need at least GCC 4.9; with 4.8 you would hit an ICE.


### OpenLoops failure ###

process_obj/ppajjj/born_ppajjj_ddxagggg_1_qp.os: In function `colourvector.5130':
born_ppajjj_ddxagggg_1_qp.f90:(.text+0x7418c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__subtf3@@GCC_3.0' defined in .text section in /home/davidlt/build/b/slc7_aarch64_gcc493/external/gcc/4.9.3/bin/../lib/gcc/aarch64-unknown-linux-gnu/4.9.3/../../../../lib64/libgcc_s.so
born_ppajjj_ddxagggg_1_qp.f90:(.text+0x74208): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__subtf3@@GCC_3.0' defined in .text section in /home/davidlt/build/b/slc7_aarch64_gcc493/external/gcc/4.9.3/bin/../lib/gcc/aarch64-unknown-linux-gnu/4.9.3/../../../../lib64/libgcc_s.so
born_ppajjj_ddxagggg_1_qp.f90:(.text+0x7421c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__subtf3@@GCC_3.0' defined in .text section in /home/davidlt/build/b/slc7_aarch64_gcc493/external/gcc/4.9.3/bin/../lib/gcc/aarch64-unknown-linux-gnu/4.9.3/../../../../lib64/libgcc_s.so
born_ppajjj_ddxagggg_1_qp.f90:(.text+0x74330): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__subtf3@@GCC_3.0' defined in .text section in /home/davidlt/build/b/slc7_aarch64_gcc493/external/gcc/4.9.3/bin/../lib/gcc/aarch64-unknown-linux-gnu/4.9.3/../../../../lib64/libgcc_s.so
born_ppajjj_ddxagggg_1_qp.f90:(.text+0x74344): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__subtf3@@GCC_3.0' defined in .text section in /home/davidlt/build/b/slc7_aarch64_gcc493/external/gcc/4.9.3/bin/../lib/gcc/aarch64-unknown-linux-gnu/4.9.3/../../../../lib64/libgcc_s.so
born_ppajjj_ddxagggg_1_qp.f90:(.text+0x7461c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__subtf3@@GCC_3.0' defined in .text section in /home/davidlt/build/b/slc7_aarch64_gcc493/external/gcc/4.9.3/bin/../lib/gcc/aarch64-unknown-linux-gnu/4.9.3/../../../../lib64/libgcc_s.so
born_ppajjj_ddxagggg_1_qp.f90:(.text+0x74630): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__subtf3@@GCC_3.0' defined in .text section in /home/davidlt/build/b/slc7_aarch64_gcc493/external/gcc/4.9.3/bin/../lib/gcc/aarch64-unknown-linux-gnu/4.9.3/../../../../lib64/libgcc_s.so
born_ppajjj_ddxagggg_1_qp.f90:(.text+0x746a8): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__subtf3@@GCC_3.0' defined in .text section in /home/davidlt/build/b/slc7_aarch64_gcc493/external/gcc/4.9.3/bin/../lib/gcc/aarch64-unknown-linux-gnu/4.9.3/../../../../lib64/libgcc_s.so
born_ppajjj_ddxagggg_1_qp.f90:(.text+0x746bc): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__subtf3@@GCC_3.0' defined in .text section in /home/davidlt/build/b/slc7_aarch64_gcc493/external/gcc/4.9.3/bin/../lib/gcc/aarch64-unknown-linux-gnu/4.9.3/../../../../lib64/libgcc_s.so
born_ppajjj_ddxagggg_1_qp.f90:(.text+0x7473c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__subtf3@@GCC_3.0' defined in .text section in /home/davidlt/build/b/slc7_aarch64_gcc493/external/gcc/4.9.3/bin/../lib/gcc/aarch64-unknown-linux-gnu/4.9.3/../../../../lib64/libgcc_s.so
born_ppajjj_ddxagggg_1_qp.f90:(.text+0x74750): additional relocation overflows omitted from the output
collect2: error: ld returned 1 exit status
scons: *** [proclib/libopenloops_ppajjj_lt.so] Error 1
./openloops update --processes generator=0  2404.52s user 13.93s system 693% cpu 5:48.57 total
Comment 5 Jiong Wang 2015-07-15 10:38:12 UTC
(In reply to David Abdurachmanov from comment #4)
> I stripped all process libraries except one from OpenLoops; that's enough
> to trigger the issue.
> 
> You can get it here (4.7M):
> http://davidlt.web.cern.ch/davidlt/vault/openloops-1.1.1-stripped.tar.bz2
> 
> $ sha256sum openloops-1.1.1-stripped.tar.bz2
> e8f55e404b1076f8fd8d2f42390e74dad5024d7a3b0a30dd905e4cc981628911 
> openloops-1.1.1-stripped.tar.bz2
> 
> You can trigger compilation via
> 
>     cd openloops-1.1.1-stripped
>     ./openloops update --processes generator=0
> 
> It takes <6 min before the linking issue is hit on an 8-core APM X-Gene 1.
> You also need at least GCC 4.9; with 4.8 you would hit an ICE.

I hit an ICE with gcc 4.9.1 on my aarch64 board. I then rebuilt the above package using gcc built from svn trunk; the build finished with no linking errors.

I guess this may be because some new features in gcc trunk reduce code size slightly, so that the branches now fit within range.

Anyway, as documented in the AArch64 ELF spec, chapter 4.6.7, we still need that st_type == STT_FUNC check, but it looks like we are missing support for the other "or" cases:

  * The target symbol has type STT_FUNC
  * Or, the target symbol and relocated place are in separate sections input to the linker.
  * Or, the target symbol is undefined (external to the link unit).

Your small main/foo case can definitely be fixed by adding a simple input section check, as the symbol defined via "--defsym" is in the *ABS* section while the code itself is in the text section.
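
As a rough sketch only, the three conditions listed above can be written as a single predicate. The struct, its fields, and the function name are hypothetical simplifications and are not BFD's data structures; input sections are identified here by an arbitrary integer id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified description of a branch target.  */
struct branch_target
{
  bool is_func;         /* symbol type is STT_FUNC */
  bool is_undefined;    /* symbol is external to the link unit */
  unsigned section_id;  /* id of the input section defining the symbol */
};

/* Return true if the linker may route a CALL26/JUMP26 branch to TARGET
   through a veneer, given the input section id of the relocated place.  */
static bool
veneer_permitted (const struct branch_target *target,
                  unsigned place_section_id)
{
  assert (target != NULL);

  if (target->is_func)
    return true;   /* the target symbol has type STT_FUNC */
  if (target->is_undefined)
    return true;   /* the target symbol is undefined */
  /* Target and relocated place are in separate input sections.  */
  return target->section_id != place_section_id;
}
```

In the main/foo reproducer, foo is not STT_FUNC and is not undefined, but it lives in *ABS* while the call sits in .text, so under this sketch the section comparison is what makes a veneer permissible.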

For the OpenLoops failure, though, I guess fixing only the first "Or" is not enough.

But it should not fall into the second "Or" scenario either: if you are building a shared library, those undefined symbols should go through a PLT stub, and if you are building an executable, there should be no undefined symbols, otherwise a linking error will occur.

My understanding is that you have run into an issue where:
  * you are building a shared library.
  * the function call goes through a PLT stub.
  * the call to the PLT stub itself is out of range, which triggered
    this relocation truncation error.

A further look at the AArch64 BFD code then shows that we don't insert a veneer for a call to a PLT stub, although we should.

    if (via_plt_p)
      return stub_type;  /* which is stub_type_none */
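
One way to picture the alternative to that early return, purely as a sketch and not the actual patch: when the call resolves through the PLT, treat the PLT entry address as the branch destination and then run the usual range check. All names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; these do not match BFD's enum or macros.  */
enum stub_kind { STUB_NONE, STUB_LONG_BRANCH };

#define MAX_FWD_OFFSET ((INT64_C (1) << 27) - 4)  /* +128 MiB - 4 */
#define MAX_BWD_OFFSET (-(INT64_C (1) << 27))     /* -128 MiB     */

static_assert (MAX_BWD_OFFSET == -0x8000000, "backward limit is 128 MiB");

/* Decide whether the branch at PLACE needs a long-branch veneer.
   If the call goes through the PLT, the PLT entry is the real
   destination of the branch, so it is the address that must be
   within range.  */
static enum stub_kind
pick_stub (uint64_t place, uint64_t sym_addr,
           bool via_plt, uint64_t plt_entry_addr)
{
  uint64_t destination = via_plt ? plt_entry_addr : sym_addr;
  int64_t offset = (int64_t) (destination - place);

  if (offset > MAX_FWD_OFFSET || offset < MAX_BWD_OFFSET)
    return STUB_LONG_BRANCH;
  return STUB_NONE;
}
```

With this shape, a call site more than 128 MiB away from its PLT entry gets a veneer instead of a truncated relocation, while nearby PLT calls are left as direct branches.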

I will fix this and construct a hand-written testcase for it. I will also try gcc 4.9.2 to see if I can reproduce your OpenLoops error and confirm that my fix is OK.
Comment 6 David Abdurachmanov 2015-07-15 10:55:05 UTC
My last test was with GCC 4.9.3, which did not ICE during OpenLoops compilation.

Thanks for looking into this!
Comment 7 Jiong Wang 2015-07-15 14:00:43 UTC
(In reply to David Abdurachmanov from comment #6)
> My last test was with GCC 4.9.3, which did not ICE during OpenLoops
> compilation.
> 
> Thanks for looking into this!

gcc 4.9.3 still ICEs for me, but after applying the relevant GCC fix I can reproduce this error.

I can confirm that it's caused by the call to the PLT stub being out of range, and my local patch fixes it.

I will send out the patch after cleanup.
Comment 8 David Abdurachmanov 2015-07-15 14:09:37 UTC
Could you point me to the GCC PR you mentioned? I just want to cross-check why my toolchain build works.

I will give your patch a spin once it lands, though OpenLoops still won't fully compile (the offset between the load instruction and the constant pool is too large).

BTW, does your patch also solve the main/foo test case?

It would be nice to see this fixed in 2.25.1 if the deadline hasn't yet passed.
Comment 9 Jiong Wang 2015-07-15 14:13:38 UTC
(In reply to David Abdurachmanov from comment #8)
> Could you point me to the GCC PR you mentioned? I just want to cross-check
> why my toolchain build works.

the gcc fix is

  https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/optabs.c?r1=216765&r2=216764&pathrev=216765
 
> BTW, does your patch also solve the main/foo test case?

yes, that's resolved also.

> 
> It would be nice to see this fixed in 2.25.1 if the deadline hasn't yet passed.

Hopefully we can.
Comment 10 Jiong Wang 2015-07-16 14:36:49 UTC
The patch that fixes the OpenLoops build has been sent out for review.

  https://sourceware.org/ml/binutils/2015-07/msg00137.html

NOTE: it addresses only the OpenLoops issue, which is the root cause of what David ran into.

A separate patch that fixes the new issue exposed by the small foo/main testcase will be sent out later.
Comment 11 David Abdurachmanov 2015-08-07 11:35:15 UTC
I tested it in the last couple of days and it worked fine. Details on GCC bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304
Comment 12 David Abdurachmanov 2015-08-07 11:38:57 UTC
For the record, the patch that fixes the "-Wl,--defsym=foo=0x80000000" case is here (already approved):

https://sourceware.org/ml/binutils/2015-07/msg00210.html
Comment 13 Sourceware Commits 2015-08-11 21:00:25 UTC
The master branch has been updated by Jiong Wang <jiwang@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=07f9ddfeba5b572451471f905473f7ddbba1d472

commit 07f9ddfeba5b572451471f905473f7ddbba1d472
Author: Jiong Wang <jiong.wang@arm.com>
Date:   Tue Aug 11 21:44:31 2015 +0100

    [AArch64] PR18668, repair long branch veneer for plt stub
    
    2015-08-11  Jiong Wang  <jiong.wang@arm.com>
    bfd/
       PR ld/18668
       * elfnn-aarch64.c (aarch64_type_of_stub): Update destination for
       calls go through plt stub.
       (elfNN_aarch64_final_link_relocate): Adjust code logic for CALL26,
       JUMP26 relocation to support inserting veneer for call to plt stub.
    
    ld/testsuite/
       * ld-aarch64/farcall-b-gsym.s: New test.
       * ld-aarch64/farcall-b-plt.s: Likewise.
       * ld-aarch64/farcall-bl-plt.s: Likewise.
       * ld-aarch64/farcall-b-gsym.d: New expect file.
       * ld-aarch64/farcall-b-plt.d: Likewise.
       * ld-aarch64/farcall-bl-plt.d: Likewise.
Comment 14 Jiong Wang 2015-08-11 21:02:07 UTC
mark as fixed