Bug 21062 - Building GCC in LTO mode fails when using ld.gold on AARCH64
Status: RESOLVED MOVED
Alias: None
Product: binutils
Classification: Unclassified
Component: gold
Version: 2.28
Importance: P2 normal
Target Milestone: ---
Assignee: Cary Coutant
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-18 18:44 UTC by PeteVine
Modified: 2017-05-15 18:49 UTC

See Also:
Host: aarch64-linux-gnu
Target: aarch64-linux-gnu
Build: aarch64-linux-gnu
Last reconfirmed:


Description PeteVine 2017-01-18 18:44:48 UTC
Hi,

I've never had to report any binutils bugs (the wonders of old platforms ;)) so I'm not exactly sure how to proceed.

I've encountered the following bug related to gold and LTO:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105

which only happens during an LTO bootstrap on a Cortex-A53 CPU (S905 SoC). A normal bootstrap completes, and switching to ld.bfd is enough for the LTO build to complete as well.

I rebuilt binutils yesterday (2.28.51.20170117) but the produced code still crashes.

What I missed previously is the following kernel debug message, which might be relevant here. Any suggestions for exonerating gold and implicating the kernel are welcome:

[ 4802.696081] cc1[14840]: unhandled level 2 translation fault (11) at 0x00000000, esr 0x92000006
[ 4802.696092] pgd = ffffffc055a2c000
[ 4802.700620] [00000000] *pgd=0000000055693003, *pmd=0000000000000000

[ 4802.709039] CPU: 0 PID: 14840 Comm: cc1 Not tainted 3.14.29-amlogics905x-gb122fc2-dirty #9
[ 4802.709043] task: ffffffc054b62000 ti: ffffffc065e3c000 task.ti: ffffffc065e3c000
[ 4802.709052] PC is at 0xffb8d0
[ 4802.709059] LR is at 0xffb638
[ 4802.709063] pc : [<0000000000ffb8d0>] lr : [<0000000000ffb638>] pstate: 60000000
[ 4802.709066] sp : 0000007ff62c4d00
[ 4802.709069] x29: 0000007ff62c5c30 x28: 0000000000000000 
[ 4802.709074] x27: 0000007ff62c51f0 x26: 000000000148fb40 
[ 4802.709079] x25: 0000007ff62c55b0 x24: 00000000013dd000 
[ 4802.709083] x23: 0000000000743c10 x22: 0000007ff62c5c50 
[ 4802.709087] x21: 0000007ff62c55b0 x20: 0000007ff62c4d70 
[ 4802.709092] x19: 0000007ff62c55b0 x18: 000000000000001d 
[ 4802.709096] x17: 0000007f8c274a40 x16: 00000000013e0058 
[ 4802.709100] x15: 0000000000000001 x14: 0000000000000000 
[ 4802.709104] x13: 0000000000000000 x12: 0000000000000000 
[ 4802.709109] x11: 0000000000000000 x10: 0000000000000000 
[ 4802.709113] x9 : 0000000000000000 x8 : 0000000000000000 
[ 4802.709117] x7 : 0000000000000000 x6 : 0000007ff62c5130 
[ 4802.709121] x5 : ffffffffffffffc0 x4 : 0000000000000000 
[ 4802.709126] x3 : 0000000000743e34 x2 : 0000000000000008 
[ 4802.709130] x1 : 00000000014bad88 x0 : 0000000000000008
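
For readers who want to interpret the esr value in the trace above, here is a minimal sketch that splits it into its architectural fields. It assumes the standard AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and DFSC in ISS bits 5:0). The function and table names are made up for illustration and only a few exception classes are covered.

```python
# Sketch: decode an AArch64 ESR_ELx value as printed in a kernel fault message.
# Assumed layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
# For data aborts: WnR = ISS bit 6, DFSC = ISS bits [5:0].

# Only the abort classes relevant here; this table is deliberately incomplete.
EXCEPTION_CLASSES = {
    0x20: "instruction abort from a lower exception level",
    0x21: "instruction abort from the same exception level",
    0x24: "data abort from a lower exception level",
    0x25: "data abort from the same exception level",
}

# DFSC values below 0x10 encode a fault kind in bits [5:2] and a
# translation-table level in bits [1:0].
FAULT_KINDS = {
    0b00: "address size fault",
    0b01: "translation fault",
    0b10: "access flag fault",
    0b11: "permission fault",
}


def decode_esr(esr):
    """Return the main fields of an ESR_ELx value as a dictionary."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    info = {
        "ec": ec,
        "ec_name": EXCEPTION_CLASSES.get(ec, "not covered by this sketch"),
        "il": il,
        "iss": iss,
    }
    if ec in (0x24, 0x25):  # data aborts carry WnR and DFSC in the ISS
        dfsc = iss & 0x3F
        info["is_write"] = bool((iss >> 6) & 0x1)
        if dfsc < 0x10:
            kind = FAULT_KINDS[dfsc >> 2]
            info["fault"] = f"{kind}, level {dfsc & 0x3}"
        else:
            info["fault"] = "not covered by this sketch"
    return info


if __name__ == "__main__":
    for key, value in decode_esr(0x92000006).items():
        print(f"{key}: {value}")
```

For 0x92000006 this decodes to a data abort taken from a lower exception level (user space), caused by a read that hit a level 2 translation fault. Together with the fault address of 0x00000000, that is consistent with cc1 dereferencing a null pointer, though it does not by itself say whether the linker or the kernel is at fault.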
Comment 1 PeteVine 2017-01-19 13:35:47 UTC
Just in case, here's the configure command line used to build binutils:

             --prefix=/usr 
             --enable-plugins 
             --enable-shared  
             --disable-werror 
             --with-system-zlib
             --enable-gold=yes
             --enable-lto
Comment 2 Edward Vielmetti 2017-02-01 20:30:16 UTC
I'm seeing what I think is the same bug; it manifests itself building Swift on aarch64 (96-core Packet 2A server, with a Cavium ThunderX CPU).

/usr/bin/ld.gold: internal error in relocate_tls, at ../../gold/aarch64.cc:7419

is the error message. Suggestions where to go to help report this more carefully would be welcomed.
Comment 3 PeteVine 2017-02-01 21:42:16 UTC
That's a different issue; ld.gold never crashes in my case.
Comment 4 PeteVine 2017-03-05 21:46:07 UTC
OK, so I've managed to identify the root cause of this bug: the -mfix-cortex-a53-843419 erratum workaround, which is applied by the linker.

To recap, ld.gold produces crashing code if both LTO and -mfix-cortex-a53-843419 are used at the same time. Removing the workaround switch or using ld.bfd allows GCC to be bootstrapped with LTO (--with-build-config=bootstrap-lto).

So, a compiler built with --enable-fix-cortex-a53-843419 needs just -mcpu/mtune=cortex-a53 to trigger the workaround. 

In my tests, the workaround also made the generated code slower on a currently available revision of the A53, a cost the documentation does not mention. Had it been documented, I would have checked long ago whether the workaround was still necessary :)
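
The recap in this comment can be turned into a small test matrix for anyone trying to reproduce the problem on something smaller than a full GCC bootstrap. The sketch below only generates GCC driver flags for each combination of linker, LTO, and erratum workaround, and marks the one combination reported here as crashing. The helper names are hypothetical, and the "reported to crash" column encodes only what this comment states; it has not been verified independently.

```python
# Sketch: enumerate linker / LTO / erratum-workaround combinations.
# The flags are standard GCC driver options; the helper names are made up.
from itertools import product


def build_flags(linker, lto, erratum_fix):
    """Return GCC driver flags for one configuration."""
    flags = [f"-fuse-ld={linker}"]
    if lto:
        flags.append("-flto")
    # Pass the workaround switch explicitly in both directions, because a
    # compiler configured with --enable-fix-cortex-a53-843419 may enable it
    # without being asked.
    if erratum_fix:
        flags.append("-mfix-cortex-a53-843419")
    else:
        flags.append("-mno-fix-cortex-a53-843419")
    return flags


def reported_to_crash(linker, lto, erratum_fix):
    """Encode the observation from this report (an assumption, not a test)."""
    return linker == "gold" and lto and erratum_fix


def configuration_matrix():
    """Return (flags, reported_to_crash) for all eight combinations."""
    rows = []
    for linker, lto, fix in product(("bfd", "gold"), (False, True), (False, True)):
        rows.append((build_flags(linker, lto, fix), reported_to_crash(linker, lto, fix)))
    return rows


if __name__ == "__main__":
    for flags, crashes in configuration_matrix():
        status = "reported to crash" if crashes else "reported OK or untested"
        print(" ".join(flags), "->", status)
```

Only the combination of gold, LTO, and the workaround is marked, which matches the recap: dropping any one of the three was enough to get a working build.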
Comment 5 Cary Coutant 2017-05-15 18:43:35 UTC
This is binutils PR gold/21491.
Comment 6 Cary Coutant 2017-05-15 18:49:54 UTC
(In reply to Edward Vielmetti from comment #2)
> I'm seeing what I think is the same bug; it manifests itself building Swift
> on aarch64 (96-core Packet 2A server, with a Cavium ThunderX CPU).
> 
> /usr/bin/ld.gold: internal error in relocate_tls, at
> ../../gold/aarch64.cc:7419
> 
> is the error message. Suggestions where to go to help report this more
> carefully would be welcomed.

This may be PR gold/19353. It's already been fixed, but perhaps your version of gold is older. If you still see the problem in the latest gold, please open a new PR at sourceware.org/bugzilla.