Bug 27180 - RISC-V far relocations for auipc instructions may cause segfault with --emit-relocs
Summary: RISC-V far relocations for auipc instructions may cause segfault with --emit-...
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: 2.35
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-01-14 02:54 UTC by Julius Werner
Modified: 2024-06-28 05:55 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Proposed solution (2.04 KB, patch)
2021-04-29 09:42 UTC, Nelson Chu
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Julius Werner 2021-01-14 02:54:11 UTC
I have run into a segfault in GNU ld for the RISC-V architecture. I'm using 2.35.1, but I also briefly tried out the latest Git tree and saw it there as well. The following minimal testcase can reproduce it:

======== test.c ========
extern int mysymbol[0];
int *_start(void) {
        return mysymbol;
}
========================

======= test.ld ========
ENTRY(_start)
SECTIONS
{
        mysymbol = 0x1000;

        .text 0x90000000 : {
                *(.text)
        }
}
========================

$ riscv64-elf-gcc -nostdlib -T test.ld -Wl,--emit-relocs -mcmodel=medany -o test test.c
collect2: fatal error: ld terminated with signal 11 [Segmentation fault]
compilation terminated.

I have debugged the segfault a bit and found that it happens at `rh = elf_sym_hashes (input_bfd)[indx];` in elflink.c:11232, because indx is completely out of range. The problem is that this code assumes that r_symndx (computed from `irela->r_info >> r_sym_shift`) can always be trusted to be a symbol index. However, for some absolute relocations in the RISC-V architecture, it is actually an absolute offset.

In my testcase, the problematic relocation starts out as an R_RISCV_PCREL_HI20 relocation (where the high 32 bytes would be a symbol index, so this code would work correctly). However, when calling into the riscv_elf_relocate_section() backend, the relocation ends up being transformed into an R_RISCV_HI20 absolute relocation, where the high 32 bytes of r_info are simply an absolute address. (It's possible that this bug also exists for normal R_RISCV_HI20 relocations in the source file, or maybe that case would've been caught earlier and it's only the late transformation of the relocation that causes the problem... I didn't test that far.)
Comment 1 Nelson Chu 2021-04-29 09:42:56 UTC
Created attachment 13402 [details]
Proposed solution
Comment 2 Nelson Chu 2021-04-29 09:53:14 UTC
Hi Julius,

Thanks for reporting this.  Your assumption is correct, the PCREL relocs are converted to the directly access relocs, but we don't update them to the relocation table, so we will get segment fault when the --emit-relocs is set.

Attached is my proposed solution, without fully testing.  My idea is that try to update the PCREL relocs and their referenced symbols to the corresponding HI20/ LO12_I/S relocs and correct target symbols.  But there is a problem for the R_RISCV_GOT_HI20 + PCREL_LO12.  If we convert it to HI20 + LO12, then we should update the symbol to the got entry (maybe need to add one, but seems hard to get the dynindx or ...).  Or alternatively, we probably can encode the symbol value to the r_addend directly, without referring to a symbol, like what RELATIVE relocs do.  I don't know which one is better, so I just report the dangerous relocs for those changed R_RISCV_GOT_HI20, rather than report segment fault.
Comment 3 Sourceware Commits 2024-06-28 05:51:24 UTC
The master branch has been updated by Nelson Chu <nelsonc1225@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=81c353cb9cbba572b73ea4a3949571c44282b0df

commit 81c353cb9cbba572b73ea4a3949571c44282b0df
Author: Nelson Chu <nelson@rivosinc.com>
Date:   Tue Jun 11 16:35:58 2024 +0800

    RISC-V: PR27180, Update relocation for riscv_zero_pcrel_hi_reloc.
    
    When pcrel access overflow, the riscv_zero_pcrel_hi_reloc may convert pcrel
    relocation to absolutly access if possible at the relocate stage.  We used to
    encode the target address into r_sym of R_RISCV_HI20 if it is converted from
    R_RISCV_PCREL_HI20.  But that may cause segfault if --emit-relocs is set,
    since r_sym becomes an address rather than a symbol index.  Although the
    relocate result is correct, it does not meet the definition, so may cause
    unexpected behaviors.
    
    This patch encodes the target address into r_addend, rather than r_sym, if
    riscv_zero_pcrel_hi_reloc converts the relocation.  Besdies, since the
    corresponding pcrel_lo relocation are also changed to absolutly access,
    we should also update them to R_RISCV_LO12_I/S.
    
    bfd/
            PR 27180
            * elfnn-riscv.c (riscv_pcrel_hi_reloc): New boolean `absolute', to
            inform corresponding pcrel_lo that the pcrel_hi relocation was already
            converted to hi20 relocation.
            (riscv_record_pcrel_hi_reloc): Likewise, record `absolute'.
            (riscv_pcrel_lo_reloc): Removed `const' for Elf_Internal_Rela *reloc,
            since we may need to convert it from pcrel_lo to lo relocation.
            (riscv_record_pcrel_lo_reloc): Likewise.  Convert pcrel_lo to lo
            relocation if corresponding pcrel_hi was converted to hi relocation.
            (riscv_zero_pcrel_hi_reloc): Encode target absolute address into
            r_addend rather than r_sym.  Clear the `addr' to avoid duplicate
            relocate in the perform_relocation.
            (riscv_elf_relocate_section): Updated.
    ld/
            PR 27180
            * testsuite/ld-riscv-elf/pcrel-lo-addend-3a-emit-relocs.d: New testcase.
            Segfault without applying this patch.
            * testsuite/ld-riscv-elf/ld-riscv-elf.exp: Updated.
Comment 4 Nelson Chu 2024-06-28 05:55:33 UTC
Marked as resolved and fixed.