Consider the following asm input:
.thumb
.text
ldr r1, 0f
0: .word 0x12345678
In this case, we report that the word is misaligned and fail, even though
the section is only aligned to 2-byte boundaries, so the word *might* be
properly aligned after all, if the previously linked section happened to
make the text section above start 2 bytes after a 4-byte aligned address.
Anyway, we probably don't want to worry about this case.
However, I think we should be concerned about the converse case:
.thumb
.text
ldr r1, 0f
ldr r2, 1f
0: .word 0x01234567
1: .word 0x89abcdef
nop
We do NOT report an error here, but if this text segment gets placed at
a 2-byte offset from a 4-byte aligned address (e.g., when the object
file is linked in twice: the section is 14 bytes long, so a second copy
placed right after the first one lands at a 2-byte offset), both words
in the second copy will be misaligned, and the PC-relative offsets will
resolve to aligned words that contain only part of the word to be loaded.
The patchlet at the end of this message arranges for us to complain when
the segment holding such an ldr doesn't ensure the alignment its target
needs. However, it's not quite enough to solve the general problem.
Consider:
.thumb
.text
ldr sp, 0f
0: .word 0x80000000
This extended form of ldr takes 4 bytes, and it neither requires nor
ensures that the target word is aligned to a 4-byte boundary. It just so
happens that, if the word is not aligned, the value loaded into the
register is a rotated version of the aligned word containing the
misaligned address.
I'm not sure it would be appropriate for us to reject potentially
misaligned words: there might be (obfuscated) code intended to detect
whether it ends up at an even or odd half-word and to behave differently
depending on that.
However, I think it would be nice for us to at least warn that the code
might behave differently depending on the actual alignment it gets. I'm
thinking of something as simple as tracking the maximum natural alignment
used in each segment, and warning of potential linker-induced behavior
changes if the alignment recorded for the segment is lower than that.
Tracking symbols with their natural alignments, and maybe even
references to them that expect a certain alignment, might go too far on
the one hand, while still missing relevant cases of separate compilation
or complex address computations on the other.
Is this something we might want to pursue, so as to warn even for e.g.:
.text
.word 0
but limited to once per segment?
Or should we track PC-relative addressing requirements, and warn about
segments containing PC-relative addressing (in whatever form) whose
expected alignment exceeds the section's? (This could miss, e.g.,
setting a register to PC + offset and then loading a word from the
address stored in that register.)
A combination of these?
Thoughts?
Here's the patchlet that covers only the PCrel-load-to-low-reg case:
--- gas/config/tc-arm.c 2020-01-28 12:50:34.000000000 +0100
+++ gas/config/tc-arm.c 2020-02-18 00:13:11.486184639 +0100
@@ -28755,6 +28755,9 @@
(((unsigned long) fixP->fx_frag->fr_address
+ (unsigned long) fixP->fx_where) & ~3)
+ (unsigned long) value);
+ else if (get_recorded_alignment (seg) < 2)
+ as_warn_where (fixP->fx_file, fixP->fx_line,
+ _("segment does not ensure enough alignment for target word"));
if (value & ~0x3fc)
as_bad_where (fixP->fx_file, fixP->fx_line,