Bug 31317 - [RISCV] static PIE crashes during self relocation
Summary: [RISCV] static PIE crashes during self relocation
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link (show other bugs)
Version: 2.39
: P2 normal
Target Milestone: 2.40
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-30 13:25 UTC by Andreas Schwab
Modified: 2024-03-26 13:47 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andreas Schwab 2024-01-30 13:25:27 UTC
When running a static PIE created by the mold linker, it crashes during self relocation while it tries to look up the __global_pointer$ symbols.  That happens because the mold linker puts the relocation addend of a RELATIVE also in the relocated field (unlike the BFD linker, which sets it to zero).

From sysdeps/riscv/dl-machine.h:

  if (l->l_type == lt_executable && l->l_scope != NULL)

In elf_machine_runtime_setup the condition l->l_scope != NULL references an unrelocated pointer (where l points to the statically initialized _dl_main_map) which is never NULL in a static PIE produced by ld.mold.

The failure can be seen in the mold testsuite (riscv64-ifunc-static-pie and riscv64-static-pie).

https://build.opensuse.org/package/live_build_log/devel:gcc:next/mold/openSUSE_Factory_RISCV/riscv64
Comment 1 Palmer Dabbelt 2024-02-22 18:44:05 UTC
IIUC this is an ABI grey area.  I guess we could write down this as a requirement and then call it a mold bug, but it seems better to just stop taking advantage of the behavior if we can.  I guess something like

diff --git a/sysdeps/riscv/dl-machine.h b/sysdeps/riscv/dl-machine.h
index 0cbb476c05..a540760f6f 100644
--- a/sysdeps/riscv/dl-machine.h
+++ b/sysdeps/riscv/dl-machine.h
@@ -348,7 +348,7 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
       gotplt[1] = (ElfW(Addr)) l;
     }
 
-  if (l->l_type == lt_executable && l->l_scope != NULL)
+  if (l->l_type == lt_executable && l->l_relocated && l->l_scope != NULL)
     {
       /* The __global_pointer$ may not be defined by the linker if the
 	 $gp register does not be used to access the global variable

would squash the crash.  Maybe that's good enough for static-PIE?  We also load up GP in _start, and it's not like MIPS where we have per-object GP values so I think we should be safe?
Comment 2 Andreas Schwab 2024-02-22 21:47:18 UTC
l_scope is never null, this condition needs to be removed again.  It's simply the wrong check.
Comment 3 Palmer Dabbelt 2024-02-22 22:53:11 UTC
(In reply to Andreas Schwab from comment #2)
> l_scope is never null, this condition needs to be removed again.  It's
> simply the wrong check.

Sorry, I think I'm a bit lost here.  IIUC we don't actually need GP set at this point for static-PIE, as it's done in _start.  So I maybe something like this

diff --git a/sysdeps/riscv/dl-machine.h b/sysdeps/riscv/dl-machine.h
index 0cbb476c05..ae38df4bc1 100644
--- a/sysdeps/riscv/dl-machine.h
+++ b/sysdeps/riscv/dl-machine.h
@@ -348,7 +348,7 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
       gotplt[1] = (ElfW(Addr)) l;
     }
 
-  if (l->l_type == lt_executable && l->l_scope != NULL)
+  if (l->l_type == lt_executable && scope != NULL)
     {
       /* The __global_pointer$ may not be defined by the linker if the
 	 $gp register does not be used to access the global variable
@@ -360,7 +360,7 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
 
       const ElfW(Sym) *ref = &gp_sym;
       _dl_lookup_symbol_x ("__global_pointer$", l, &ref,
-			   l->l_scope, NULL, 0, 0, NULL);
+			   scope, NULL, 0, 0, NULL);
       if (ref)
         asm (
           "mv gp, %0\n"

should do it?  This relies on the caller to obtain scope, the static-PIE call sets it to NULL and the others set it to l_scope.
Comment 4 Andreas Schwab 2024-02-22 23:16:15 UTC
No, l_relocated is probably the correct condition.  The point is that l_scope is always initialized, and it is only null because of undefined behaviour.
Comment 5 Palmer Dabbelt 2024-02-22 23:26:26 UTC
(In reply to Andreas Schwab from comment #4)
> No, l_relocated is probably the correct condition.  The point is that
> l_scope is always initialized, and it is only null because of undefined
> behaviour.

OK, that makes sense.  IIUC the relocated check short-circuits that UB, so it'd just be useless to also check l_scope?  Either way I think we're far enough along for a real patch, I sent https://inbox.sourceware.org/libc-alpha/20240222232400.6326-1-palmer@rivosinc.com/

Thanks!
Comment 6 Wang, Yanzhang 2024-02-23 03:06:56 UTC
I think the l_relocated is enough too.

I have did some investigation about this in the GNU ld and mold. If I understand  correctly, the mold will fill the relocated field if the command option --apply-dynamic-relocs is true, whose default value is true.

But for GNU ld, this is ignored and skipped at elfnn-riscv.c:riscv_elf_relocate_section

	      sreloc = elf_section_data (input_section)->sreloc;
	      riscv_elf_append_rela (output_bfd, sreloc, &outrel);
	      if (!relocate)
		continue;

Although the relative rel has been computed but the relocate is true, so we do nothing.

It seems this is not defined explicit somewhere and all the two different behaviors seems right.

For this case, the l_scope in dl_main_map is a pointer and need relocate. In the mold linker, the value in relocated field will be the value of global dl_main_map's location. But in GNU ld, the value will be 0.

So we need only to use the l_relocated, it's a value which will not be relocaed and will be set at the end of _dl_relocate_static_pie.
Comment 7 Sourceware Commits 2024-03-25 14:20:17 UTC
The master branch has been updated by Andreas Schwab <schwab@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=96d1b9ac2321b565f340ba8f3674597141e3450d

commit 96d1b9ac2321b565f340ba8f3674597141e3450d
Author: Palmer Dabbelt <palmer@rivosinc.com>
Date:   Thu Feb 22 15:24:00 2024 -0800

    RISC-V: Fix the static-PIE non-relocated object check
    
    The value of l_scope is only valid post relocation, so this original
    check was triggering undefined behavior.  Instead just directly check to
    see if the object has been relocated, at which point using l_scope is
    safe.
    
    Reported-by: Andreas Schwab <schwab@suse.de>
    Closes: BZ #31317
    Fixes: e0590f41fe ("RISC-V: Enable static-pie.")
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Comment 8 Andreas Schwab 2024-03-26 13:47:50 UTC
Fixed.