30844 – ld riscv: --emit-relocs does not retain the original relocation type

Bug 30844 - ld riscv: --emit-relocs does not retain the original relocation type

Summary: ld riscv: --emit-relocs does not retain the original relocation type

Status:	NEW

Alias:	None

Product:	binutils
Classification:	Unclassified
Component:	ld (show other bugs)
Version:	unspecified

Importance:	P2 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2023-09-12 22:34 UTC by Fangrui Song
Modified:	2023-09-19 07:02 UTC (History)
CC List:	6 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Fangrui Song 2023-09-12 22:34:25 UTC

From https://maskray.me/blog/2021-03-14-the-dark-side-of-riscv-linker-relaxation#ld---emit-relocs (with updated instructions)

For GNU ld's AArch64/PPC64/x86-64 ports, the --emit-relocs code retains the original relocation type even if a linker optimization is applied. This is partly to communicate more information to the analysis tool and partly because the transformation may not be described with any existing relocation type.

However, the RISC-V port changes the relocation type, including relocation types for relaxable instructions (e.g. R_RISCV_CALL_PLT), and markers (R_RISCV_ALIGN and R_RISCV_RELAX). I believe this was not a deliberate decision as there is no --emit-relocs test at all in ld/testsuite/ld-riscv-elf/. So far, the unintentional change seems fine but certain relaxations (such as TLSDESC) may not have relocation types associated with the relaxed instructions.

cat > aarch64.s <<'eof'
.global _start; _start:
  adrp    x1, :got:x
  ldr     x1, [x1, #:got_lo12:x]
.data; .globl x; .hidden x; x: .word 0
eof
cat > ppc64.s <<'eof'
.globl _start; _start:
  addis 3, 2, .Lhidden@toc@ha
  ld    3, .Lhidden@toc@l(3)
  lwa   3, 0(3)
.data; .globl hidden; .hidden hidden; hidden: .long 0
.section .toc,"aw",@progbits
.Lhidden: .tc hidden[TC], hidden
eof
cat > x86-64.s <<'eof'
.globl _start; _start:
  movq foo@gotpcrel(%rip), %rax
foo: nop
eof

aarch64-linux-gnu-gcc -fuse-ld=lld -B/tmp/Rel/bin -nostdlib aarch64.s -Wl,--emit-relocs -o aarch64 && aarch64-linux-gnu-objdump -dr aarch64 | sed -E '/^\s*$/d'
powerpc64le-linux-gnu-gcc -nostdlib ppc64.s -Wl,--emit-relocs -o ppc64 && powerpc64le-linux-gnu-objdump -dr ppc64 | sed -E '/^\s*$/d'
gcc -nostdlib x86-64.s -Wl,--emit-relocs -o x86-64 && objdump -dr x86-64 | sed -E '/^\s*$/d'

cat > riscv64.s <<'eof'
.global _start; _start:
  call f
  call f
  .balign 8
f: ret
eof

riscv64-linux-gnu-gcc -nostdlib riscv64.s -Wl,--emit-relocs -o riscv64 && riscv64-linux-gnu-objdump -dr riscv64 | sed -E '/^\s*$/d'

Output (removed blank lines for compacter output):
```
aarch64:     file format elf64-littleaarch64
Disassembly of section .text:
00000000000102f8 <_start>:
   102f8:       d503201f        nop
                        102f8: R_AARCH64_ADR_GOT_PAGE   x
   102fc:       10100661        adr     x1, 303c8 <x>
                        102fc: R_AARCH64_LD64_GOT_LO12_NC       x
ppc64:     file format elf64-powerpcle
Disassembly of section .text:
0000000000000240 <_start>:
 240:   00 00 00 60     nop
                        240: R_PPC64_TOC16_HA   hidden
 244:   00 81 62 38     addi    r3,r2,-32512
                        244: R_PPC64_TOC16_LO   hidden
 248:   02 00 63 e8     lwa     r3,0(r3)
x86-64:     file format elf64-x86-64
Disassembly of section .text:
00000000000012dd <_start>:
    12dd:       48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 12e4 <foo>
                        12e0: R_X86_64_REX_GOTPCRELX    foo-0x4
00000000000012e4 <foo>:
    12e4:       90                      nop
riscv64:     file format elf64-littleriscv
Disassembly of section .text:
00000000000002a0 <_start>:
 2a0:   008000ef                jal     2a8 <f>
                        2a0: R_RISCV_JAL        f
 2a4:   004000ef                jal     2a8 <f>
                        2a4: R_RISCV_NONE       *ABS*+0x4
                        2a4: R_RISCV_JAL        f
00000000000002a8 <f>:
 2a8:   8082                    ret
                        2a8: R_RISCV_NONE       *ABS*+0x4
                        2a8: R_RISCV_NONE       *ABS*+0x6
```

I have a section "ld --emit-relocs" with very brief information on my website for a while. Recently https://reviews.llvm.org/D159082 motivated me to elaborate and bring this up:)

Comment 1 Palmer Dabbelt 2023-09-13 01:10:58 UTC

Nelson and I are just chatting about this.  It's not intentional, but we also don't quite know what the right answer is here: there's some relocs we can touch less (like leaving R_RISCV_ALIGN in outputs even after we've aligned, for example) but others we'd have to avoid relaxing (like R_RISCV_CALL, which requires two instructions).  Certainly producing binaries with R_RISCV_NONE is a bug, it's not even in the psABI so we can't be producing binaries with it.

So I think we're happy to fix bugs here, but we really need to know what the desired output is.

Do you have something more concrete about what you're looking for?

Comment 2 Fangrui Song 2023-09-13 06:19:01 UTC

(In reply to Palmer Dabbelt from comment #1)
> Nelson and I are just chatting about this.  It's not intentional, but we
> also don't quite know what the right answer is here: there's some relocs we
> can touch less (like leaving R_RISCV_ALIGN in outputs even after we've
> aligned, for example) but others we'd have to avoid relaxing (like
> R_RISCV_CALL, which requires two instructions).  Certainly producing
> binaries with R_RISCV_NONE is a bug, it's not even in the psABI so we can't
> be producing binaries with it.
> 
> So I think we're happy to fix bugs here, but we really need to know what the
> desired output is.
> 
> Do you have something more concrete about what you're looking for?

Thanks for looking into this. My feeling is that --emit-relocs should switch to preserve the original relocation type, including R_RISCV_CALL_PLT(etc), R_RISCV_RELAX, and R_RISCV_ALIGN.

Analysis tools can check whether R_RISCV_RELAX is present in a section. If present, they need to do disassembly work to figure out whether a R_RISCV_CALL_PLT relocations is associated with an un-relaxed code sequence of a relaxed code sequence. Yes, it may look dirty but I don't think there is a better way. (x86-32 relocation handling inspects the relocated relocation as well)

FWIW I left a comment on https://lore.kernel.org/linux-riscv/20230912233331.zejmknzcm5mwzzcz@google.com/T/#t on the only --emit-relocs use I can find, which is relatively new in the Linux kernel.

Comment 3 Palmer Dabbelt 2023-09-13 14:35:23 UTC

(In reply to Fangrui Song from comment #2)
> (In reply to Palmer Dabbelt from comment #1)
> > Nelson and I are just chatting about this.  It's not intentional, but we
> > also don't quite know what the right answer is here: there's some relocs we
> > can touch less (like leaving R_RISCV_ALIGN in outputs even after we've
> > aligned, for example) but others we'd have to avoid relaxing (like
> > R_RISCV_CALL, which requires two instructions).  Certainly producing
> > binaries with R_RISCV_NONE is a bug, it's not even in the psABI so we can't
> > be producing binaries with it.
> > 
> > So I think we're happy to fix bugs here, but we really need to know what the
> > desired output is.
> > 
> > Do you have something more concrete about what you're looking for?
> 
> Thanks for looking into this. My feeling is that --emit-relocs should switch
> to preserve the original relocation type, including R_RISCV_CALL_PLT(etc),
> R_RISCV_RELAX, and R_RISCV_ALIGN.

So we leave the R_RISCV_ALIGN in there and delete the excess bytes until it's aligned?  That will result in a subtly different meaning for the output binary, as the alignment will be smaller.  Maybe that's fine as the actual target will already be aligned to the originally correct amount, but if something's trying to then do more relaxation after we might have subtle issues.

> Analysis tools can check whether R_RISCV_RELAX is present in a section. If
> present, they need to do disassembly work to figure out whether a
> R_RISCV_CALL_PLT relocations is associated with an un-relaxed code sequence
> of a relaxed code sequence. Yes, it may look dirty but I don't think there
> is a better way. (x86-32 relocation handling inspects the relocated
> relocation as well)

Ya, that's ugly -- but if it's the way it's done then maybe that's just the only answer.  We could add some sort of R_RISCV_CALL_SHORT type relocation so we can emit a correct relocation for relaxed calls without relying on disassembly hueristics.

We could also just stop relaxing calls under --emit-relocs, then we get a correct output binary with the long calls.

> FWIW I left a comment on
> https://lore.kernel.org/linux-riscv/20230912233331.zejmknzcm5mwzzcz@google.
> com/T/#t on the only --emit-relocs use I can find, which is relatively new
> in the Linux kernel.

Thanks, Alex just pointed Nelson and I at it as well.

Comment 4 Nelson Chu 2023-09-13 17:34:43 UTC

> The --emit-relocs should switch to preserve the original
> relocation type, including R_RISCV_CALL_PLT(etc),
> R_RISCV_RELAX, and R_RISCV_ALIGN.

Looks reasonable, so based on this rule when setting --emit-reloc,

1. R_RISCV_CALL_PLT

auipc, [R_RISCV_CALL_PLT][R_RISCV_RELAX]
jalr
(relax and emit to) ->
jal, [R_RISCV_CALL_PLT][R_RISCV_RELAX]

2. R_RISCV_ALIGN

.align 3, 6 nops under rvc, [R_RISCV_ALIGN] with addend 6
->
less then 6 nops (may be 0), [R_RISCV_ALIGN] with addend 6 or less?

3. R_RISCV_HI20/LO12

Even not all people like gp relaxation, we still need a rule in GNU ld for --emit-reloc.

lui, [R_RISCV_HI20][R_RISCV_RELAX]
addi, [R_RISCV_LO12][R_RISCV_RELAX]
->
add with gp, [R_RISCV_HI20][R_RISCV_RELAX][R_RISCV_LO12][R_RISCV_RELAX], four relocs point to same instruction?

4, R_RISCV_PCREL_HI20/PCREL_LO12

Likewise, all the four relocs point to the same pcrel_lo instruction.

5. R_RISCV_GOT_HI20/PCREL_LO12

Likewise.

6. ... future relaxations, TLS, ...

Comment 5 Alan Modra 2023-09-15 02:51:12 UTC

(In reply to Fangrui Song from comment #0)
> For GNU ld's AArch64/PPC64/x86-64 ports, the --emit-relocs code retains the
> original relocation type even if a linker optimization is applied.

No, ppc64 adjusts relocations to match the emitted code.  See for example 
R_PPC64_GOT16_LO_DS handling in ppc64_elf_relocate_section, adjusted to R_PPC64_TOC16_LO when a got indirect code sequence can be edited to got pointer relative.

> This is partly to communicate more information to the analysis tool

This is exactly why relocations for ppc64 (and ppc32) were edited.  IBM's FDPR post-link optimisation tool used them.  ppc64 even emits relocs for linker generated stub code.

The fact that other targets emit the original relocations is not a good argument for saying that riscv should do so.  Most maintainers of other targets simply didn't see a need to correct the relocs when editing code.

Comment 6 Fangrui Song 2023-09-19 07:02:59 UTC

(In reply to Alan Modra from comment #5)
> (In reply to Fangrui Song from comment #0)
> > For GNU ld's AArch64/PPC64/x86-64 ports, the --emit-relocs code retains the
> > original relocation type even if a linker optimization is applied.
> 
> No, ppc64 adjusts relocations to match the emitted code.  See for example 
> R_PPC64_GOT16_LO_DS handling in ppc64_elf_relocate_section, adjusted to
> R_PPC64_TOC16_LO when a got indirect code sequence can be edited to got
> pointer relative.
> 
> > This is partly to communicate more information to the analysis tool
> 
> This is exactly why relocations for ppc64 (and ppc32) were edited.  IBM's
> FDPR post-link optimisation tool used them.  ppc64 even emits relocs for
> linker generated stub code.
> 
> The fact that other targets emit the original relocations is not a good
> argument for saying that riscv should do so.  Most maintainers of other
> targets simply didn't see a need to correct the relocs when editing code.

Thanks for the reply! I did not know. I have now made some notes
on https://maskray.me/blog/2023-02-26-linker-notes-on-power-isa#emit-relocs

    For example, a TOC-indirect to TOC-relative optimization uses a pair of relocations `R_PPC64_TOC16_HA(.toc)+R_PPC64_TOC16_LO_DS(.toc)`.
    After optimization, they will become `R_PPC64_TOC16_HA(sym)+R_PPC64_TOC16_LO(sym)`. The `R_PPC64_TOC16_HA` relocation is present even if the first instruction is converted to a NOP.
    
    A general-dynamic TLS model code sequence may use relocations `R_PPC64_GOT_TLSGD16_HA+R_PPC64_GOT_TLSGD16_LO+R_PPC64_TLSGD+R_PPC64_REL24`.
    After optimization, they will become:
    
    * `R_PPC64_NONE+R_PPC64_TPREL16_HA+R_PPC64_TPREL16_LO+R_PPC64_NONE` after general-dynamic to local-exec TLS optimization.
    * `R_PPC64_GOT_TPREL16_HA+R_PPC64_GOT_TPREL16_LO_DS+R_PPC64_NONE+R_PPC64_NONE` after general-dynamic to initial-exec TLS optimization.

So it seems that ppc performed conversion can all be described by existing relocation types, which is nice.
However, I do not know whether the property will hold for all current and future RISC-V relaxation schemes.

When investigating relocation overflow pressure for x86-64 small code model, I have found that preserving the original relocation type gives me more information: I can tell how
many R_X86_64_PC32/R_X86_64_GOTPCRELX/R_X86_64_REX_GOTPCRELX are problematic. If they are converted to R_X86_64_PC32/R_X86_64_32, I'd lose some information.

Perhaps whether the --emit-relocs uses the original relocation type or the transformed relocation type , does not matter for the majority of use cases.
E.g. Linux kernel's objtool, seems to perform a sanity check on relocations. It just needs to know the categories of relocations, e.g. absolute/PC-relative, not the exact type.