Bug 29071 - m68k: Removal of ELF_DURING_STARTUP optimization broke ld.so
Summary: m68k: Removal of ELF_DURING_STARTUP optimization broke ld.so
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link (show other bugs)
Version: 2.35
: P2 normal
Target Milestone: 2.36
Assignee: Adhemerval Zanella
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-04-17 16:55 UTC by James Le Cuirot
Modified: 2022-04-21 07:58 UTC (History)
4 users (show)

See Also:
Host: m68k-unknown-linux-gnu
Target: m68k-unknown-linux-gnu
Build: m68k-unknown-linux-gnu
Last reconfirmed:
fweimer: security-


Attachments
avoid switch statement when performing relocations (1.45 KB, patch)
2022-04-18 14:18 UTC, Andreas Schwab
Details | Diff
handle less relocation types during ld.so bootstrap (915 bytes, patch)
2022-04-18 15:14 UTC, Andreas Schwab
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description James Le Cuirot 2022-04-17 16:55:02 UTC
I'm sorry to report that commit f64f4ce069300f33e26b025ebb0233d5ca3957a5, which removed the ELF_DURING_STARTUP optimization, broke ld.so on m68k-unknown-linux-gnu. It segfaults immediately, even when calling ld.so with no arguments or just --help.

I found this by bisecting. Reverting this commit against 2.35 or the latest master (23102686ec67b856a2d4fd25ddaa1c0b8d175c4f) allows it to work again.
Comment 1 Andreas K. Huettel 2022-04-17 17:06:43 UTC
@ Adhemerval: is that commit safe to (temporarily) revert for all arches?
Comment 2 Fangrui Song 2022-04-17 18:54:38 UTC
Can you provide a reproduce?
I cannot reproduce the failure at master 23102686ec67b856a2d4fd25ddaa1c0b8d175c4f (2022-04-15) or
release/2.35/master ba9c42ac0e265bf1e4ec1075fa20e7166fda8bfc (2022-04-15).
The output looks like:

```
% m68k-linux-gnu-gcc --version
m68k-linux-gnu-gcc (Debian 11.2.0-10) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

% mkdir -p out/m68k && cd out/m68k
% ../../configure --prefix=/tmp/glibc/m68k && make -j 50 && make -j 50 install && 'cp' -f /usr/m68k-linux-gnu/lib/libgcc_s.so.2 /tmp/glibc/m68k/lib
...
% /tmp/glibc/m68k/lib/ld-linux-x86-64.so.2 --version
ld.so (GNU libc) development release version 2.35.9000.
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
% /tmp/glibc/m68k/bin/localedef --version
localedef (GNU libc) 2.35.9000
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Ulrich Drepper.
```
Comment 3 Fangrui Song 2022-04-17 20:52:44 UTC
Sorry, the command should specify --host. I confirm the breakage.

../../configure --prefix=/tmp/glibc/m68k --host=m68k-glibc-linux-gnu CC=m68k-linux-gnu-gcc CXX=m68k-linux-gnu-g++ && make -j 50 && make -j 50 install && 'cp' -f /usr/m68k-linux-gnu/lib/libgcc_s.so.2 /tmp/glibc/m68k/lib

/tmp/glibc/m68k/lib/ld.so.1 does crash and reverting f64f4ce069300f33e26b025ebb0233d5ca3957a5 fixes it. Unfortunately I don't have a m68k machine to debug this.

% /tmp/glibc/m68k/lib/ld.so.1
/tmp/glibc/m68k/lib/ld.so.1: missing program name
Try '/tmp/glibc/m68k/lib/ld.so.1 --help' for more information.
Comment 4 Fangrui Song 2022-04-18 02:09:33 UTC
I think the issue is pre-existing. `elf: Assume disjointed .rela.dyn and .rela.plt for loader` just tickled the compiler to generate the code which breaks.
The m68k ld.so runs the following instructions in order.

   switch (r_type)   # load jump table from GOT to fp. note the GOT entry hasn't been relocated by a R_68K_RELATIVE below
15738:       2c6d 08d4       moveal %a5@(2260),%fp

  *reloc_addr = l_addr + reloc->r_addend;  # process relative relocations
1576a:       2784 c800       movel %d4,%a3@(0000000000000000,%a4:l)

   switch (r_type)   # load a value from the jump table
157c8:       3c36 6800       movew %fp@(0000000000000000,%d6:l),%d6
Comment 5 Adhemerval Zanella 2022-04-18 11:39:03 UTC
We are(In reply to Fangrui Song from comment #4)
> I think the issue is pre-existing. `elf: Assume disjointed .rela.dyn and
> .rela.plt for loader` just tickled the compiler to generate the code which
> breaks.
> The m68k ld.so runs the following instructions in order.
> 
>    switch (r_type)   # load jump table from GOT to fp. note the GOT entry
> hasn't been relocated by a R_68K_RELATIVE below
> 15738:       2c6d 08d4       moveal %a5@(2260),%fp
> 
>   *reloc_addr = l_addr + reloc->r_addend;  # process relative relocations
> 1576a:       2784 c800       movel %d4,%a3@(0000000000000000,%a4:l)
> 
>    switch (r_type)   # load a value from the jump table
> 157c8:       3c36 6800       movew %fp@(0000000000000000,%d6:l),%d6

We are most likely hitting a compiler bug here, unrolling the loops fixes the issue:

diff --git a/elf/dynamic-link.h b/elf/dynamic-link.h
index 25dd7ca4f2..28cb25bdcd 100644
--- a/elf/dynamic-link.h
+++ b/elf/dynamic-link.h
@@ -114,13 +114,18 @@ elf_machine_lazy_rel (struct link_map *map, struct r_scope_elem *scope[],
          }                                                                   \
       }                                                                              \
                                                                              \
-      for (int ranges_index = 0; ranges_index < 2; ++ranges_index)           \
-        elf_dynamic_do_##reloc ((map), scope,                                \
-                               ranges[ranges_index].start,                   \
-                               ranges[ranges_index].size,                    \
-                               ranges[ranges_index].nrelative,               \
-                               ranges[ranges_index].lazy,                    \
-                               skip_ifunc);                                  \
+      elf_dynamic_do_##reloc ((map), scope,                                  \
+                             ranges[0].start,                                \
+                             ranges[0].size,                                 \
+                             ranges[0].nrelative,                            \
+                             ranges[0].lazy,                                 \
+                             skip_ifunc);                                    \
+      elf_dynamic_do_##reloc ((map), scope,                                  \
+                             ranges[1].start,                                \
+                             ranges[1].size,                                 \
+                             ranges[1].nrelative,                            \
+                             ranges[1].lazy,                                 \
+                             skip_ifunc);                                    \
   } while (0)
 
 # if ELF_MACHINE_NO_REL || ELF_MACHINE_NO_RELA


It does increase the loader size, though.
Comment 6 Andreas Schwab 2022-04-18 12:21:22 UTC
The compiler is perfectly entitled to reorder the insns as above.  It cannot know that it is in a special environment where the GOT isn't relocated yet.
Comment 7 Andreas Schwab 2022-04-18 14:18:06 UTC
Created attachment 14069 [details]
avoid switch statement when performing relocations

Please try this patch.
Comment 8 Andreas Schwab 2022-04-18 15:14:19 UTC
Created attachment 14070 [details]
handle less relocation types during ld.so bootstrap

Please try this instead.
Comment 9 Fangrui Song 2022-04-18 15:57:25 UTC
Unrolling this may mitigate the issue likely because the optimization is related to loop-invariant code motion. gcc somehow decides hoisting the GOT address of the jump table is beneficial (for such a non-PI_STATIC_AND_HIDDEN arch).

musl ld.so uses a function call as a barrier (__dls2) to avoid such code generation problems.

(In reply to Andreas Schwab from comment #8)
> Created attachment 14070 [details]
> handle less relocation types during ld.so bootstrap
> 
> Please try this instead.

Thanks. This looks good to me. `case R_68K_NONE` can also be placed in `#ifdef RTLD_BOOTSTRAP` if it helps reduce the possibility that gcc uses a jump table (which needs a relocated address in a GOT entry).
Comment 10 James Le Cuirot 2022-04-18 21:52:32 UTC
I can confirm that the patch works here. Many thanks for your speedy fix.
Comment 11 Fangrui Song 2022-04-20 21:57:25 UTC
(In reply to James Le Cuirot from comment #10)
> I can confirm that the patch works here. Many thanks for your speedy fix.

Thanks for testing.

Pushed 
https://sourceware.org/git/?p=glibc.git;a=commit;h=a8e9b5b8079d18116ca69c9797e77804ecf2ee7e to master and cherry picked it into release/2.35/master 

This is fixed, but I don't have permission to change the status.
Comment 12 James Le Cuirot 2022-04-20 21:59:47 UTC
Thanks! Marking as fixed then.