Bug 29145 - GAS always generates padding instructions regardless of `--no-pad-sections`
Status: UNCONFIRMED
Alias: None
Product: binutils
Classification: Unclassified
Component: gas
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2022-05-13 07:05 UTC by LIU Hao
Modified: 2022-06-14 14:16 UTC

Target: x86_64-w64-mingw32

Description LIU Hao 2022-05-13 07:05:18 UTC
Given this simple C program:

```
int foo(int a);
int bar(int a) { return foo(a);  }
```

We compile it with GCC, targeting Linux:

```
lh_mouse@lhmouse-dev ~ $ x86_64-linux-gnu-gcc -Os test.c -S -o test.s && cat test.s
        .file   "test.c"
        .text
        .globl  bar
        .type   bar, @function
bar:
.LFB0:
        .cfi_startproc
        jmp     foo@PLT
        .cfi_endproc
.LFE0:
        .size   bar, .-bar
        .ident  "GCC: (Debian 8.3.0-6) 8.3.0"
        .section        .note.GNU-stack,"",@progbits
```

The function contains only a `jmp` instruction. We assemble this file:

```
lh_mouse@lhmouse-dev ~ $ x86_64-linux-gnu-as test.s -o test.o && objdump -h test.o

test.o:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000005  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  0000000000000000  0000000000000000  00000045  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  0000000000000000  0000000000000000  00000045  2**0
                  ALLOC
  3 .comment      0000001d  0000000000000000  0000000000000000  00000045  2**0
                  CONTENTS, READONLY
  4 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000062  2**0
                  CONTENTS, READONLY
  5 .eh_frame     00000030  0000000000000000  0000000000000000  00000068  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
```

The `.text` section has a size of 5 bytes and an alignment of 1 byte, which looks good and is what `-Os` is expected to produce.


We attempt to compile the same code targeting mingw-w64:

```
lh_mouse@lhmouse-dev ~ $ x86_64-w64-mingw32-gcc -Os test.c -S -o test.s && cat test.s
        .file   "test.c"
        .text
        .globl  bar
        .def    bar;    .scl    2;      .type   32;     .endef
        .seh_proc       bar
bar:
        .seh_endprologue
        jmp     foo
        .seh_endproc
        .ident  "GCC: (GNU) 8.3-win32 20190406"
        .def    foo;    .scl    2;      .type   32;     .endef
```

The function still contains only a `jmp` instruction. But when we assemble it:

```
lh_mouse@lhmouse-dev ~ $ x86_64-w64-mingw32-as test.s -o test.o && objdump -h test.o

test.o:     file format pe-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000010  0000000000000000  0000000000000000  00000104  2**4
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  0000000000000000  0000000000000000  00000000  2**4
                  ALLOC, LOAD, DATA
  2 .bss          00000000  0000000000000000  0000000000000000  00000000  2**4
                  ALLOC
  3 .xdata        00000004  0000000000000000  0000000000000000  00000114  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .pdata        0000000c  0000000000000000  0000000000000000  00000118  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  5 .rdata$zzz    00000020  0000000000000000  0000000000000000  00000124  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
```

This time the `.text` section has a size of 16 and alignment of 16 bytes, which is undesired.


This also results in extra NOPs in the object file:

```
lh_mouse@lhmouse-dev ~ $ objdump -d test.o

test.o:     file format pe-x86-64


Disassembly of section .text:

0000000000000000 <bar>:
   0:   e9 00 00 00 00          jmpq   5 <bar+0x5>
   5:   90                      nop
   6:   90                      nop
   7:   90                      nop
   8:   90                      nop
   9:   90                      nop
   a:   90                      nop
   b:   90                      nop
   c:   90                      nop
   d:   90                      nop
   e:   90                      nop
   f:   90                      nop
```

which LD will not be able to remove.
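The numbers above follow from rounding the section size up to the section alignment. As a minimal sketch (the helper names `align_up` and `padding_bytes` are hypothetical and are not taken from binutils), the arithmetic can be written as:

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `align`.
   `align` must be a non-zero power of two. */
static size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Number of filler bytes needed to reach the aligned size. */
static size_t padding_bytes(size_t size, size_t align)
{
    return align_up(size, align) - size;
}
```

With these helpers, `padding_bytes(5, 16)` is 11, which matches the eleven `nop` bytes at offsets 0x5 through 0xf, while `padding_bytes(5, 1)` is 0, which matches the ELF object.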

Adding `.p2align` directives to 'test.s' only seems able to increase the alignment (requests for 32, 64, etc. work); decreasing it is not possible (requests for 1, 2, 4, 8 are ignored). Could this be improved a little?
Comment 1 Nick Clifton 2022-05-18 16:13:59 UTC
(In reply to LIU Hao from comment #0)
  
> We attempt to compile the same code targeting mingw-w64:

> This time the `.text` section has a size of 16 and alignment of 16 bytes,
> which is undesired.

Unfortunately this is required by the PE specification, so it cannot
be changed.  (The alignment of the .text section can be increased, but
it cannot be decreased below 16 bytes).

> This also results in extra NOPs in the object file:

True - but it is not quite as bad as you might assume.  If you have a
second function in your test file, it will take up some/all of these
bytes, thus potentially reducing the number of NOPs.  For example:

  % cat test2.s
        .text
        .globl  bar
        .def    bar; .scl 2; .type 32; .endef
        .seh_proc       bar
bar:
        .seh_endprologue
        jmp     foo
        .seh_endproc

        .globl  barf
        .def    barf; .scl 2; .type 32; .endef
        .seh_proc       barf
barf:
        .seh_endprologue
        jmp     foo
        .seh_endproc

        .def    foo; .scl 2; .type 32; .endef

  % as test2.s -o test2.o
  % objdump -d test2.o

  Disassembly of section .text:

  0000000000000000 <bar>:
     0:	e9 00 00 00 00       	jmp    5 <barf>

  0000000000000005 <barf>:
     5:	e9 00 00 00 00       	jmp    a <barf+0x5>
     a:	90                   	nop
     b:	90                   	nop
     c:	90                   	nop
     d:	90                   	nop
     e:	90                   	nop
     f:	90                   	nop

So this version of the test case only has 6 nop instructions in it...
Comment 2 LIU Hao 2022-05-18 16:48:54 UTC
(In reply to Nick Clifton from comment #1)
> (In reply to LIU Hao from comment #0)
>   
> > We attempt to compile the same code targeting mingw-w64:
> 
> > This time the `.text` section has a size of 16 and alignment of 16 bytes,
> > which is undesired.
> 
> Unfortunately this is required by the PE specification, so it cannot
> be changed.  (The alignment of the .text section can be increased, but
> it cannot be decreased below 16 bytes).
> 

Thank you for the reply.

Would you please provide a link to this specification? I doubt whether it is necessary to keep a 16-byte aligned .text section in every object.

If I were to include all source files in one big .c file and compile it with `-Os`, there would be no such padding, as expected. However, that is pretty bad when building a static library, because requesting only a few symbols would then pull in all of them.

Normally we have quite a few .o files; each of them has its own .text section. Does it really matter that each individual such section isn't aligned? During the linking phase, LD will join them together, and the final, merged .text section will have a 512-byte alignment in files and a 4096-byte alignment in memory. I don't see how that can malfunction.
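To put a rough number on this, here is a simplified model of the two layouts. It assumes each object's `.text` is padded to the alignment on its own, and that a single merged translation unit is padded only once at the end; the function names are hypothetical and the model ignores everything else the linker does.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `align` (a power of two). */
static size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Total .text bytes when every object pads its own section to `align`. */
static size_t padded_total(const size_t *sizes, size_t count, size_t align)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += align_up(sizes[i], align);
    return total;
}

/* Total .text bytes when all code sits in one section that is padded once. */
static size_t merged_total(const size_t *sizes, size_t count, size_t align)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += sizes[i];
    return align_up(total, align);
}
```

For four objects that each contain one 5-byte `jmp`, this model gives 64 bytes with per-object 16-byte padding versus 32 bytes for the merged case, and 20 bytes in both cases if the alignment were 1.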
Comment 3 Nick Clifton 2022-05-23 11:19:15 UTC
Hi lh_mouse,

>> Unfortunately this is required by the PE specification, so it cannot

> Would you please provide a link to this specification? I doubt whether it is
> necessary to keep a 16-byte aligned .text section in every object.

Here is a link:

   https://docs.microsoft.com/en-us/windows/win32/debug/pe-format

Although I agree that I cannot find an actual requirement for 16-byte alignment
in that document.   I am not an expert on the PE format however, so the requirement
may be in there, just hidden.

I would suggest that you ask about this on the cygwin mailing list as there
are many PE experts there:

   https://www.cygwin.com/lists.html

Cheers
   Nick
Comment 5 LIU Hao 2022-05-23 13:27:59 UTC
On the page you've referenced, there is

> 20 4 PointerToRawData
> The file pointer to the first page of the section within the COFF file. 
> For executable images, this must be a multiple of FileAlignment from the 
> optional header. For object files, the value should be aligned on a 4-byte 
> boundary for best performance. When a section contains only uninitialized 
> data, this field should be zero.
which says raw data '_should_ be aligned on a 4-byte boundary', and may imply
that other alignments are acceptable as well.


There is also the section flag

> IMAGE_SCN_ALIGN_1BYTES 0x00100000
> Align data on a 1-byte boundary. Valid only for object files.
which says explicitly that data in object files may be aligned to a 1-byte
boundary, i.e. not aligned at all.


Therefore I suspect it is doable.
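For reference, a small sketch of how the alignment is encoded in the section flags. The `IMAGE_SCN_ALIGN_*` values follow the section flags table on the referenced page; `IMAGE_SCN_ALIGN_MASK` is the name used in the Windows SDK headers rather than on that page, and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IMAGE_SCN_ALIGN_1BYTES   0x00100000u
#define IMAGE_SCN_ALIGN_16BYTES  0x00500000u
#define IMAGE_SCN_ALIGN_MASK     0x00F00000u

/* Decode the alignment in bytes from a section's Characteristics field.
   Returns 0 when the alignment field is empty or holds the unused value 15. */
static uint32_t section_alignment(uint32_t characteristics)
{
    uint32_t field = (characteristics & IMAGE_SCN_ALIGN_MASK) >> 20;
    if (field == 0 || field > 14)
        return 0;
    return (uint32_t)1 << (field - 1);
}

/* Encode an alignment in bytes (a power of two from 1 to 8192) as a flag. */
static uint32_t alignment_flag(uint32_t bytes)
{
    assert(bytes != 0 && (bytes & (bytes - 1)) == 0 && bytes <= 8192);
    uint32_t field = 1;
    while (((uint32_t)1 << (field - 1)) != bytes)
        field++;
    return field << 20;
}
```

Under this encoding, `alignment_flag(1)` yields `IMAGE_SCN_ALIGN_1BYTES` and `alignment_flag(16)` yields `IMAGE_SCN_ALIGN_16BYTES`, so a 1-byte alignment is at least representable in an object file's section header.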
Comment 6 LIU Hao 2022-06-12 07:26:33 UTC
Is there any progress on this issue? If this behavior can't be changed by default, it's also nice to have an experimental option.
Comment 7 Nick Clifton 2022-06-14 14:16:12 UTC
(In reply to LIU Hao from comment #6)
> Is there any progress on this issue? If this behavior can't be changed by
> default, it's also nice to have an experimental option.

Sorry, no.  It is on my list, but it is at a very low priority for me.  Of course you could always have a go yourself...

For example in bfd/pei-x86_64.c around line 50 there is:

  { COFF_SECTION_NAME_PARTIAL_MATCH (".text"), \
    COFF_ALIGNMENT_FIELD_EMPTY, COFF_ALIGNMENT_FIELD_EMPTY, 4 }, \

If you changed the 4 to a 1 that might then give you 1-byte alignment for the .text section.  I have not tried this myself, but it is worth having a go if you feel inclined.

Cheers
  Nick
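
A note on the table entry quoted above: the `2**4` that objdump prints for `.text` suggests that the final field is a power-of-two exponent rather than a byte count. If that assumption holds, a value of 1 would give 2-byte alignment and a value of 0 would give 1-byte alignment. The sketch below uses a hypothetical, simplified structure to illustrate that reading; it is not the real BFD definition and has not been checked against the binutils source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one row of a section alignment
   table. This is NOT the real BFD definition. */
struct align_entry {
    const char *name_prefix;       /* matched against the start of the section name */
    unsigned int alignment_power;  /* assumed to be log2 of the alignment in bytes */
};

/* Convert an alignment power to a byte count. */
static unsigned int power_to_bytes(unsigned int power)
{
    assert(power < 32);
    return 1u << power;
}

/* Return the alignment in bytes for `section`, or `fallback` when no row matches. */
static unsigned int lookup_alignment(const struct align_entry *table, size_t rows,
                                     const char *section, unsigned int fallback)
{
    for (size_t i = 0; i < rows; i++) {
        size_t len = strlen(table[i].name_prefix);
        if (strncmp(section, table[i].name_prefix, len) == 0)
            return power_to_bytes(table[i].alignment_power);
    }
    return fallback;
}
```

With a single row `{ ".text", 4 }`, a lookup for `.text` returns 16, which agrees with the objdump output in the description; changing the 4 to 0 would make the same lookup return 1.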