During an LTO build of Chromium with gcc trunk using a single partition
(-flto --param lto-partitions=1), the assembler fails with the following error:
Thumb2 branch out of range.
LTO builds of Chromium with multiple partitions have no issues;
the error occurs only with a single partition or with no partitioning.
Assembly file: https://www.dropbox.com/s/yl171mwqd9lad9c/chrome.ltrans0.s?dl=0
Options passed to gas: -march=armv7-a -mfloat-abi=hard -mfpu=neon -meabi=5
I get the following backtrace under gdb:
#0 as_bad_where (file=0x7fffffffe29f
format=0x552ed2 "Thumb2 branch out of range")
#1 0x000000000045e7dd in md_apply_fix (fixP=0x931ea6e0,
#2 0x000000000042f0fd in fixup_segment (fixP=0x931ea6e0, this_segment=0x7cf7c0)
#3 0x000000000042f36e in fix_segment (abfd=0x7b81f0, sec=0x7cf7c0,
xxx=0x0) at /home/prathamesh.kulkarni/gnu-toolchain/src/binutils-gdb.git/gas/write.c:1132
#4 0x0000000000473540 in bfd_map_over_sections (abfd=0x7b81f0,
operation=0x42f334 <fix_segment>, user_storage=0x0)
#5 0x0000000000430f9c in write_object_file () at
#6 0x0000000000406123 in main (argc=2, argv=0x7b85c0) at
The program aborts from here by calling as_bad_where() in
    if ((value & ~0x3fffff) && ((value & ~0x3fffff) != ~0x3fffff))
      if (!(ARM_CPU_HAS_FEATURE (cpu_variant, arm_arch_t2)))
        as_bad_where (fixP->fx_file, fixP->fx_line, BAD_RANGE);
      else if ((value & ~0x1ffffff)
               && ((value & ~0x1ffffff) != ~0x1ffffff))
        as_bad_where (fixP->fx_file, fixP->fx_line,
                      _("Thumb2 branch out of range"));
In the case where the branch is out of range, "value" equals ffffffda.
This probably happens because the section generated during the LTO build is too large, so a function call exceeds the range of a short call.
Forcing gcc to emit long calls for every function call makes the error go away. Is this a gas bug or a gcc bug?
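
The range test quoted above can be exercised in isolation. The following is only a sketch: the helper names are hypothetical, and it assumes "value" is a 64-bit signed integer, which the thread does not state. It shows that each half of the test asks whether the bits above the mask are neither all zero nor all one.

```c
#include <stdint.h>

/* Hypothetical helper: true when the bits of `value` above `mask`
   are neither all zero nor all one, i.e. `value` does not fit in
   the signed field described by `mask`. */
static int upper_bits_mixed(int64_t value, int64_t mask)
{
    int64_t upper = value & ~mask;
    return upper != 0 && upper != ~mask;
}

/* Mirrors the quoted condition for a CPU that has arm_arch_t2:
   the "Thumb2 branch out of range" path is taken only when both
   the 0x3fffff test and the 0x1ffffff test see mixed upper bits.
   The 64-bit width of `value` is an assumption of this sketch. */
static int thumb2_check_fails(int64_t value)
{
    return upper_bits_mixed(value, 0x3fffff)
           && upper_bits_mixed(value, 0x1ffffff);
}
```

With these definitions, thumb2_check_fails(0x2000000) is true, while thumb2_check_fails(-0x2000000) and thumb2_check_fails(0x1fffffe) are false.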
This is a 1.5GB assembly file, which creates an 18MB text section. Thumb2 branches have a range of +/-16MB. So yes, the branches are out of range.
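
To make the arithmetic concrete, here is a small sketch. It assumes "18MB" means 18 * 2^20 bytes and ignores the small PC offset that the hardware adds to the branch address; the function names are made up for illustration. The limits used are those of the Thumb-2 BL encoding, which holds a signed 25-bit, halfword-aligned byte offset.

```c
#include <stdint.h>

/* Reach of a Thumb-2 BL: signed 25-bit byte offset, always even. */
#define THUMB2_BL_MIN (-(INT64_C(1) << 24))     /* -16777216 */
#define THUMB2_BL_MAX ((INT64_C(1) << 24) - 2)  /* +16777214 */

/* Hypothetical helper: can a BL encode this byte displacement? */
static int fits_thumb2_bl(int64_t disp)
{
    return disp >= THUMB2_BL_MIN && disp <= THUMB2_BL_MAX
           && (disp % 2) == 0;
}

/* Hypothetical helper: a section can contain an unreachable call
   when a call from its first byte to its last halfword does not fit. */
static int section_can_overflow_bl(int64_t text_size)
{
    return !fits_thumb2_bl(text_size & ~INT64_C(1));
}
```

For an 18MB section, section_can_overflow_bl(18 * 1024 * 1024) is true, while an 8MB section stays within reach of any call inside it.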
One thing the assembler can do is branch relaxation: it can convert an out-of-range branch into a different, longer sequence with a longer range. You can see examples in gas/config/tc-mips.c after the RELAX_ENCODE macro definition. The mips gas port can convert a conditional branch with 16 bits of offset into an unconditional branch (jump) with 28 bits (?) of offset.
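
As a rough illustration of the idea (not the gas implementation, which works on frags and fixups), relaxation can be sketched as a loop that widens any branch whose target is out of reach and repeats until nothing changes, because widening one branch moves everything after it. All names and sizes below are hypothetical.

```c
#include <stddef.h>

enum { SHORT_SIZE = 4, LONG_SIZE = 12 };  /* illustrative sizes only */

struct item {
    long size;      /* bytes, used only for non-branch items */
    int is_branch;  /* nonzero when this item is a branch */
    size_t target;  /* index of the item the branch jumps to */
    int is_long;    /* set once the branch has been widened */
};

static long item_size(const struct item *it)
{
    if (!it->is_branch)
        return it->size;
    return it->is_long ? LONG_SIZE : SHORT_SIZE;
}

/* Address of item `idx`, given the current short/long choices. */
static long address_of(const struct item *items, size_t idx)
{
    long addr = 0;
    for (size_t i = 0; i < idx; i++)
        addr += item_size(&items[i]);
    return addr;
}

/* Widen every short branch that cannot reach its target; repeat
   until a full pass makes no change. Branches only ever grow, so
   the loop terminates. Returns the number of branches widened. */
static int relax(struct item *items, size_t n, long max_reach)
{
    int widened = 0, changed;
    do {
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            if (!items[i].is_branch || items[i].is_long)
                continue;
            long disp = address_of(items, items[i].target)
                        - address_of(items, i);
            if (disp > max_reach || disp < -max_reach) {
                items[i].is_long = 1;
                changed = 1;
                widened++;
            }
        }
    } while (changed);
    return widened;
}
```

A branch followed by 100 bytes of filler and then its target is widened when max_reach is 50 and left alone when max_reach is 200.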
We could do something similar in the ARM port. Except in the arm case, we are trying to fix a branch/call, which would have to be converted into a sequence to load the address into a register and then use a branch on register instruction. It looks like we can use the IP register for this. The linker may use IP for trampolines and PLTs, but that happens after the call, and we are doing this before the call, so it appears that it will work. Unless perhaps there are other places in the ABI where IP is used, I don't know the ARM ABI very well. This is an enhancement request, not an assembler bug, as this kind of relaxation is optional. This would be a bit of work in the assembler, and the gain would be small, since it is rare to see text sections larger than 16MB.
The arm gas port already has relaxation to handle the selection between 4-byte and 2-byte thumb encodings. See inst.relax and output_relax_insn. This could be extended to handle the out-of-range bl to blx (register) case. This would require a little reorganization to support more than one type of relaxation.
Otherwise, you need the compiler to emit long calls to avoid out-of-range calls, which you already mentioned works.
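
For reference, GCC on ARM can be asked for long calls either globally with -mlong-calls or per function with the long_call attribute. The snippet below is a minimal sketch of the per-function form; the names FAR_CALL, far_helper and caller are made up, and the macro expands to nothing on other targets so the file still compiles there. Note that GCC documents that it may still emit a short call when the callee's definition has already been seen in the same compilation unit, so in a real build far_helper would live in a different file; it is defined here only so the sketch links.

```c
/* Use the ARM-specific attribute only where it is meaningful. */
#if defined(__arm__) && defined(__GNUC__)
#define FAR_CALL __attribute__((long_call))
#else
#define FAR_CALL
#endif

/* Hypothetical callee that may end up far away in the text section. */
FAR_CALL int far_helper(int x);

/* The call below is a candidate for a long-call sequence (address
   loaded into a register, then an indirect call) instead of a
   plain bl. */
int caller(int x)
{
    return far_helper(x) + 1;
}

/* Defined last, and only so that this sketch is self-contained. */
int far_helper(int x)
{
    return x * 2;
}
```

Compiling every file with -mlong-calls has the same effect on all eligible calls, at the cost of a longer call sequence everywhere.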
I would not call this a gas or gcc bug. It is more of a feature conflict. You are trying to create a single text section which is larger than the hardware and/or ABI directly supports, and you can't expect this to work without a workaround of some sort.