+++ This bug was initially created as a clone of Bug #22903 +++
Hi there. It looks like bug 22903 was only fixed for ld and is still an issue in gold. As an experiment I increased STUB_ADDR_ALIGN to 8 for Reloc_stub, and that does resolve the issue. I'm not sure whether increasing the alignment is appropriate for all reloc stub types, so this was just a proof of concept.
diff --git a/gold/aarch64.cc b/gold/aarch64.cc
index 07abe44931..7626fcad4a 100644
@@ -1317,7 +1317,7 @@ class Reloc_stub : public Stub_base<size, big_endian>
}; // End of Reloc_stub
template<int size, bool big_endian>
-const int Reloc_stub<size, big_endian>::STUB_ADDR_ALIGN = 4;
+const int Reloc_stub<size, big_endian>::STUB_ADDR_ALIGN = 8;
// Write data to output file.
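For context, the effect of STUB_ADDR_ALIGN is essentially to round each stub's start address up to the given boundary. This is only an illustrative sketch of that round-up (it is not gold's actual stub-table code, and the function name align_up is made up for this example):

```cpp
#include <cstdint>

// Round addr up to the next multiple of align (align must be a power of two).
// With align == 8, a stub placed at 0x...4 moves to the next 8-byte boundary,
// which is what keeps the stub's 64-bit literal naturally aligned.
inline std::uint64_t align_up(std::uint64_t addr, std::uint64_t align) {
  return (addr + align - 1) & ~(align - 1);
}
```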
Original description from bug #22903 follows.
It is not currently possible to specify an alignment requirement that will be used for generated veneer stubs (i.e. far calls for -fpic, -fpie etc. builds).
Currently, the alignment for the stubs is 4 bytes. While this works just fine on the majority of systems, it works only because several prerequisites happen to have been met beforehand (and a hint of luck, too).
The problematic veneer template (aarch64_long_branch_stub at bfd/elfnn-aarch64.c) uses LDR to load the far address. The address itself is stored after the veneer code block, which does the address loading (via LDR/ADD) and branching. The template looks like this:
ldr ip0, 1f # <-- ip0, i.e. X16, i.e. 64-bit register
adr ip1, #0
add ip0, ip0, ip1
br  ip0
1: .xword <address>
While the address is 8-byte aligned within the stub itself, it will be misaligned unless the veneer lands on an 8-byte (or more) aligned address. The ARMv8-A ARM clearly states that unless an access is aligned to the size of the data element being accessed (i.e. N-bit accesses must be N-bit aligned), either an Alignment fault is generated or an unaligned access is performed.
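The arithmetic can be modeled in a few lines. This is a hypothetical sketch (the constant kLiteralOffset and the function literal_is_aligned are made up for illustration), assuming the .xword literal sits at offset 16, after the four 4-byte instructions of the stub:

```cpp
#include <cstdint>

// Offset of the 8-byte literal within the stub: four 4-byte instructions
// (ldr / adr / add / br) precede it.  Assumption for this sketch only.
constexpr std::uint64_t kLiteralOffset = 16;

// A 64-bit LDR of the literal is naturally aligned only if the literal's
// absolute address is a multiple of 8.
inline bool literal_is_aligned(std::uint64_t stub_addr) {
  return (stub_addr + kLiteralOffset) % 8 == 0;
}
```

With a 4-byte stub alignment, a stub can legally start at an address that is 4 mod 8, which puts the literal at 4 mod 8 as well, i.e. misaligned for a 64-bit load.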
It is possible to disable the alignment check, and thus perform an unaligned access, via system register SCTLR_ELx.A (this is the case for Linux, for example). However, there is a small catch-22 buried in the fine details of the ARM ARM. If stage 1 address translation is disabled (e.g. the MMU is disabled), the Device-nGnRnE memory type is assigned to all data accesses (or the address may simply happen to be some type of Device memory, which is nothing unusual with SoCs). Unlike Normal memory, all accesses to any type of Device memory *must* be aligned, period.
So, if the code has to deal with a large memory area and cannot use the MMU (say, it is unavailable or still being set up), and thus no address translation is enabled, or for whatever reason the code uses Device memory, ld's current approach will generate code that is highly prone to intermittent failures. These can be difficult to track down (without proper JTAG tools), because no matter how carefully the user does everything else, the generated code is the source of the failure. It should also be understood that trying to recover from this sort of exception would be overkill and highly complex (one must interpret the faulting instruction, perform the aligned access(es), perhaps patch the code, etc.), while the proper thing to do is simply not to perform unaligned accesses where such accesses are not possible.
Obviously, one can always generate the long branches by hand, or use static linking where possible, so this is by no means a roadblock. But as the subject is rather undocumented and there is apparently a patch readily available, this should be fixed. Perhaps there is no need to change the default alignment (without further study), but it should nevertheless be possible to change the alignment.
I hope I provided enough background information for this rare, but indeed curious case!