This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.
Should the assembler pad a section to its alignment?
- From: Roland McGrath <mcgrathr at google dot com>
- To: "binutils at sourceware dot org" <binutils at sourceware dot org>
- Date: Fri, 16 Aug 2013 16:17:06 -0700
- Subject: Should the assembler pad a section to its alignment?
Consider this input file:
.text
.p2align 4
nop
GAS for i386 and x86-64 (ELF) produces a .text section whose sh_size is
exactly the length of the instruction stream (here, one byte).
GAS for ARM (ELF) produces a .text section whose sh_size is padded out
to the alignment even though the actual instruction stream is much
shorter (here, 4 bytes of actual instruction become 16 bytes of section).
Is one more correct than the other?
Just on principle, it seems to me that the assembler ought to be
consistent across all ELF targets in this regard. (I have not
investigated what code in the assembler causes the difference.)
The practical reason I am concerned with this is slightly complex. For
*-nacl* targets, it's an ABI violation if a single instruction straddles
a specified (per-machine) alignment boundary (32 bytes on x86, 16 bytes
on ARM). For x86-64, the linker uses a variety of different-length nop
instructions when it does code filling. If, for example, one input
.text section is 31 bytes (with sh_addralign=32) and the next input
.text section has sh_addralign=64, then the linker will do code fill of
33 bytes starting at address 31. It will do this with three 10-byte nop
instructions (the longest it knows) followed by one 3-byte nop
instruction. The result is that a 10-byte instruction straddles the
alignment boundary at address 32, violating the ABI constraint.
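The arithmetic in this example can be checked with a small model. This is only an illustrative sketch, not the linker's code: the names `fill_nops` and `straddlers` are hypothetical, and the greedy longest-first policy with a 10-byte maximum is assumed from the description above.

```python
def fill_nops(start, end, max_nop=10):
    """Fill [start, end) greedily with the longest nop that still fits.

    Returns a list of (address, length) pairs. The greedy policy is an
    assumption taken from the description in this message.
    """
    nops = []
    addr = start
    while addr < end:
        length = min(max_nop, end - addr)
        nops.append((addr, length))
        addr += length
    return nops


def straddlers(nops, bundle=32):
    """Return the nops whose first and last bytes fall in different bundles."""
    return [(a, n) for a, n in nops if a // bundle != (a + n - 1) // bundle]


# 31-byte section followed by a section with sh_addralign=64:
# the fill covers addresses 31..63 (33 bytes).
nops = fill_nops(31, 64)
print(nops)              # [(31, 10), (41, 10), (51, 10), (61, 3)]
print(straddlers(nops))  # [(31, 10)]
```

The first 10-byte nop occupies addresses 31 through 40, so it crosses the boundary at 32.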
I can address this in one of two ways. One would be to teach the linker
to use only one-byte nop instructions for my x86 targets. (That is the
obvious thing to do, but it is more cumbersome than it ought to be
because the fill hook is in the "per-CPU" stuff rather than the
"per-target" stuff.) The other would be to make the assembler pad out
its code sections to their sh_addralign (as it already does for ARM), so
that in practice the linker would only ever be attempting code fill that
starts on a 32-byte boundary. I am inclined at first blush to go the
latter route because that both seems easier to do and would make the
assembler's behavior more consistent between CPUs.
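To compare the two options on the 31-byte example, here is a rough sketch. All names are hypothetical, the greedy fill policy is an assumption, and the model ignores how the assembler would fill its own padding; it only shows where the linker's fill would start.

```python
def pad_to_alignment(size, align):
    """Round size up to the next multiple of align (align is a power of two)."""
    return (size + align - 1) & ~(align - 1)


def greedy_fill(start, end, max_nop):
    """Fill [start, end) with nops of at most max_nop bytes, longest first."""
    nops = []
    addr = start
    while addr < end:
        length = min(max_nop, end - addr)
        nops.append((addr, length))
        addr += length
    return nops


def straddlers(nops, bundle):
    """Return the nops whose first and last bytes fall in different bundles."""
    return [(a, n) for a, n in nops if a // bundle != (a + n - 1) // bundle]


BUNDLE = 32

# Option 1: the linker uses only one-byte nops; nothing can straddle.
one_byte = greedy_fill(31, 64, max_nop=1)
print(straddlers(one_byte, BUNDLE))   # []

# Option 2: the assembler pads the 31-byte section to sh_addralign=32,
# so the linker's fill starts on a bundle boundary.
fill_start = pad_to_alignment(31, 32)
multi = greedy_fill(fill_start, 64, max_nop=10)
print(fill_start, multi)              # 32 [(32, 10), (42, 10), (52, 10), (62, 2)]
print(straddlers(multi, BUNDLE))      # []
```

In this model both options avoid the straddle. The second depends on the input section itself starting on a bundle boundary.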
What say you?
Thanks,
Roland