I'm sorry to report that commit f64f4ce069300f33e26b025ebb0233d5ca3957a5, which removed the ELF_DURING_STARTUP optimization, broke ld.so on m68k-unknown-linux-gnu. It segfaults immediately, even when calling ld.so with no arguments or just --help. I found this by bisecting. Reverting this commit against 2.35 or the latest master (23102686ec67b856a2d4fd25ddaa1c0b8d175c4f) allows it to work again.
@ Adhemerval: is that commit safe to (temporarily) revert for all arches?
Can you provide a reproduce? I cannot reproduce the failure at master 23102686ec67b856a2d4fd25ddaa1c0b8d175c4f (2022-04-15) or release/2.35/master ba9c42ac0e265bf1e4ec1075fa20e7166fda8bfc (2022-04-15). The output looks like: ``` % m68k-linux-gnu-gcc --version m68k-linux-gnu-gcc (Debian 11.2.0-10) 11.2.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. % mkdir -p out/m68k && cd out/m68k % ../../configure --prefix=/tmp/glibc/m68k && make -j 50 && make -j 50 install && 'cp' -f /usr/m68k-linux-gnu/lib/libgcc_s.so.2 /tmp/glibc/m68k/lib ... % /tmp/glibc/m68k/lib/ld-linux-x86-64.so.2 --version ld.so (GNU libc) development release version 2.35.9000. Copyright (C) 2022 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. % /tmp/glibc/m68k/bin/localedef --version localedef (GNU libc) 2.35.9000 Copyright (C) 2022 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Written by Ulrich Drepper. ```
Sorry, the command should specify --host. I confirm the breakage. ../../configure --prefix=/tmp/glibc/m68k --host=m68k-glibc-linux-gnu CC=m68k-linux-gnu-gcc CXX=m68k-linux-gnu-g++ && make -j 50 && make -j 50 install && 'cp' -f /usr/m68k-linux-gnu/lib/libgcc_s.so.2 /tmp/glibc/m68k/lib /tmp/glibc/m68k/lib/ld.so.1 does crash and reverting f64f4ce069300f33e26b025ebb0233d5ca3957a5 fixes it. Unfortunately I don't have a m68k machine to debug this. % /tmp/glibc/m68k/lib/ld.so.1 /tmp/glibc/m68k/lib/ld.so.1: missing program name Try '/tmp/glibc/m68k/lib/ld.so.1 --help' for more information.
I think the issue is pre-existing. `elf: Assume disjointed .rela.dyn and .rela.plt for loader` just tickled the compiler to generate the code which breaks. The m68k ld.so runs the following instructions in order. switch (r_type) # load jump table from GOT to fp. note the GOT entry hasn't been relocated by a R_68K_RELATIVE below 15738: 2c6d 08d4 moveal %a5@(2260),%fp *reloc_addr = l_addr + reloc->r_addend; # process relative relocations 1576a: 2784 c800 movel %d4,%a3@(0000000000000000,%a4:l) switch (r_type) # load a value from the jump table 157c8: 3c36 6800 movew %fp@(0000000000000000,%d6:l),%d6
We are(In reply to Fangrui Song from comment #4) > I think the issue is pre-existing. `elf: Assume disjointed .rela.dyn and > .rela.plt for loader` just tickled the compiler to generate the code which > breaks. > The m68k ld.so runs the following instructions in order. > > switch (r_type) # load jump table from GOT to fp. note the GOT entry > hasn't been relocated by a R_68K_RELATIVE below > 15738: 2c6d 08d4 moveal %a5@(2260),%fp > > *reloc_addr = l_addr + reloc->r_addend; # process relative relocations > 1576a: 2784 c800 movel %d4,%a3@(0000000000000000,%a4:l) > > switch (r_type) # load a value from the jump table > 157c8: 3c36 6800 movew %fp@(0000000000000000,%d6:l),%d6 We are most likely hitting a compiler bug here, unrolling the loops fixes the issue: diff --git a/elf/dynamic-link.h b/elf/dynamic-link.h index 25dd7ca4f2..28cb25bdcd 100644 --- a/elf/dynamic-link.h +++ b/elf/dynamic-link.h @@ -114,13 +114,18 @@ elf_machine_lazy_rel (struct link_map *map, struct r_scope_elem *scope[], } \ } \ \ - for (int ranges_index = 0; ranges_index < 2; ++ranges_index) \ - elf_dynamic_do_##reloc ((map), scope, \ - ranges[ranges_index].start, \ - ranges[ranges_index].size, \ - ranges[ranges_index].nrelative, \ - ranges[ranges_index].lazy, \ - skip_ifunc); \ + elf_dynamic_do_##reloc ((map), scope, \ + ranges[0].start, \ + ranges[0].size, \ + ranges[0].nrelative, \ + ranges[0].lazy, \ + skip_ifunc); \ + elf_dynamic_do_##reloc ((map), scope, \ + ranges[1].start, \ + ranges[1].size, \ + ranges[1].nrelative, \ + ranges[1].lazy, \ + skip_ifunc); \ } while (0) # if ELF_MACHINE_NO_REL || ELF_MACHINE_NO_RELA It does increase the loader size, though.
The compiler is perfectly entitled to reorder the insns as above. It cannot know that it is in a special environment where the GOT isn't relocated yet.
Created attachment 14069 [details] avoid switch statement when performing relocations Please try this patch.
Created attachment 14070 [details] handle less relocation types during ld.so bootstrap Please try this instead.
Unrolling this may mitigate the issue likely because the optimization is related to loop-invariant code motion. gcc somehow decides hoisting the GOT address of the jump table is beneficial (for such a non-PI_STATIC_AND_HIDDEN arch). musl ld.so uses a function call as a barrier (__dls2) to avoid such code generation problems. (In reply to Andreas Schwab from comment #8) > Created attachment 14070 [details] > handle less relocation types during ld.so bootstrap > > Please try this instead. Thanks. This looks good to me. `case R_68K_NONE` can also be placed in `#ifdef RTLD_BOOTSTRAP` if it helps reduce the possibility that gcc uses a jump table (which needs a relocated address in a GOT entry).
I can confirm that the patch works here. Many thanks for your speedy fix.
(In reply to James Le Cuirot from comment #10) > I can confirm that the patch works here. Many thanks for your speedy fix. Thanks for testing. Pushed https://sourceware.org/git/?p=glibc.git;a=commit;h=a8e9b5b8079d18116ca69c9797e77804ecf2ee7e to master and cherry picked it into release/2.35/master This is fixed, but I don't have permission to change the status.
Thanks! Marking as fixed then.