Bug 30020 - segfault in ld-linux after aug 2022
Status: RESOLVED NOTABUG
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link
Version: 2.35
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2023-01-18 12:46 UTC by Pete Lomax
Modified: 2023-02-21 10:26 UTC
fweimer: security-


Description Pete Lomax 2023-01-18 12:46:26 UTC
My manually built ELF x64 file started segfaulting somewhere deep inside ld-linux in August 2022; the exact same file had been fine for the preceding 10 months.
You can download the offending file (a single plain 4MB ELF x64) from http://phix.x10.mx/p64
Oddly, running /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 ./p64 works fine, i.e. the exact same files but loaded the other way round.
One user upgraded their Linux kernel to 5.4.0-131-generic, which fixed the issue, but suspects 5.15.x won’t work.
I'm still (5 months on) trying to find a fix. Apparently it wasn't the DT_HASH thing. The ELF headers are as simple as I could make them; should they need changing, I need at the very least some clues.
Let me know if a 1.3K sys_write "It works!" with an easy-to-read, byte-by-byte, 300-line listing would help any (no idea why it would); alas, it is not input for gcc/ld/etc.
Comment 1 Pete Lomax 2023-01-19 11:47:43 UTC
PS It has previously worked for about a decade without any plt/got sections.
Comment 2 Pete Lomax 2023-02-12 12:52:39 UTC
Is there anything else I can/should do to get this looked at?
Comment 3 Adhemerval Zanella 2023-02-13 14:23:34 UTC
With glibc master I see:

$ ./testrun.sh ./p64
Phix hybrid interpreter/compiler.

Version 1.0.1 (64 bit Linux) Copyright Pete Lomax 2006..2016

Enter ? for options or filename to execute:

Is it the expected output?

However, when running it through execve, the kernel log shows that the binary cannot be loaded:

kernel: Code: ff ff 00 45 31 db 48 8d 15 c9 ac 00 00 4c 8d 05 46 8f 01 00 4c 8d 2d 1f 8f 01 00 49 89 c2 48 8d 58 ff 48 89 f8 49>
kernel: p64[54765]: segfault at 0 ip 00007fc7ce59c350 sp 00007fffd17839c0 error 4 in ld-linux-x86-64.so.2[7fc7ce57c000+2a000]

So I am not sure it is really a glibc error. The binary is also stripped and not fully conformant (readelf -a shows a lot of bogus information), so it is really difficult to debug.  I think what might be happening is that the kernel ELF binfmt is seeing something unexpected and thus raising the error.
Comment 4 Pete Lomax 2023-02-13 19:30:38 UTC
Thank you, that is a massive step in the right direction, I think.
Yes, that is what success should look like. I had (and meant it):
  Size of section headers:           64 (bytes)
  Number of section headers:         0 (64)
Previous versions of readelf (I just checked, 3.5.0-54-generic) did not mistreat that 0 as 64.
Anyway, I patched that e_shentsize to 0 which got readelf down to 2 complaints (vs 0 previously):

readelf: Error: Size (0xb0) of section <no-strings> is not a multiple of its sh_entsize (0x10)
readelf: Error: Corrupt DT_SYMTAB dynamic entry

I've also got:
  LOAD           0x0000000000000228 0x0000000000400228 0x0000000000400228

and patching away that 40 made readelf happier, but spannered ld-linux-x86-64.so.2 ./p64
So, while that was probably wrong, it made readelf show the 10 dynamic section entries it used to,
along with an 11th (NULL) which it never did before, and (also new) followed by ten times:
    
0000004004b0  000100000001 R_X86_64_64      readelf: Error:  bad symbol index: 00000001 in reloc

PS: There really are no section headers, and there never was any debug info to be stripped.
These headers all worked perfectly for years before August 2022.
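
Editorial sketch: since the disagreement in this comment is about what the header really contains versus what readelf prints, it can help to dump the raw section-related fields directly. The following is a minimal Python sketch, assuming a little-endian ELF64 file; the helper names and the example header values (entry point, program header count) are hypothetical and only mimic the "e_shentsize 64, e_shnum 0" situation described above.

```python
import struct

# Elf64_Ehdr layout (little-endian), 64 bytes in total.
EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = ("e_ident", "e_type", "e_machine", "e_version", "e_entry",
          "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
          "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx")


def read_ehdr64(data: bytes) -> dict:
    """Unpack the raw Elf64_Ehdr fields from the first 64 bytes of `data`."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    return dict(zip(FIELDS, EHDR64.unpack_from(data)))


def section_header_fields(data: bytes) -> dict:
    """Return only the fields that describe the section header table."""
    hdr = read_ehdr64(data)
    return {k: hdr[k] for k in ("e_shoff", "e_shentsize", "e_shnum", "e_shstrndx")}


# Hypothetical header for illustration: no section table, but e_shentsize = 64.
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
example = EHDR64.pack(ident, 2, 62, 1, 0x400228, 64, 0, 0, 64, 56, 3, 64, 0, 0)
print(section_header_fields(example))
```

To inspect a real file, the same function could be called as section_header_fields(open("p64", "rb").read()). This only shows the stored values; it says nothing about how any particular readelf version chooses to interpret them.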
Comment 5 Pete Lomax 2023-02-15 17:45:00 UTC
To quickly reiterate, I believe that "0 now means 64" must be a big clue.

Either it should not be doing that, or those changes should be documented somewhere.
Comment 6 Adhemerval Zanella 2023-02-15 17:51:37 UTC
In any case, is this really a glibc bug? As I said, I think the issue is an ill-formed ELF file that is preventing the kernel binfmt loader from starting the process.
Comment 7 Pete Lomax 2023-02-15 18:38:48 UTC
If not glibc, what should this bug be classified as?
Comment 8 Adhemerval Zanella 2023-02-15 19:57:02 UTC
(In reply to Pete Lomax from comment #7)
> If not glibc, what should this bug be classified as?

It is what I tried to show in comment #3, where I see that the *kernel* is throwing the following error:

Feb 15 19:48:58 ubuntu22-vm kernel: Code: ff ff 00 45 31 db 48 8d 15 c9 ac 00 00 4c 8d 05 46 8f 01 00 4c 8d 2d 1f 8f 01 00 49 89 c2 48 8d 58 ff 48 89 f8 49 f7 da 66 90 <8b> 08 83 f9 07 77 19 85 c9 74 45 83 f9 07 77 40 48 63 0c 8a 48 01
Feb 15 19:48:58 ubuntu22-vm kernel: p64[1403]: segfault at 0 ip 00007ff10dcb7350 sp 00007ffd5a54b570 error 4 in ld-linux-x86-64.so.2[7ff10dc97000+2a000]

This means the resulting image can not execute the entry point after the kernel loads both dynamic loader and the binary for some reason. You can check that even with gdb, I can even start to execute the first instruction (starti).  

I think you will need to check what the binfmt loading code in the kernel is doing differently; or, most likely, fix the bogus binary.
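
Editorial sketch: the kernel segfault line quoted in comments #3 and #8 can be unpacked a little further. The Python below is a minimal sketch, assuming the x86 message format shown in this thread; the function name and result keys are made up for illustration. It subtracts the mapping base from the instruction pointer to get an offset inside ld-linux-x86-64.so.2, and decodes the low bits of the x86 page-fault error code (bit 0: page was present, bit 1: write, bit 2: user mode, bit 4: instruction fetch).

```python
import re

# Pattern for the kernel's user-space segfault message as quoted in this thread.
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

# Line copied from comment #3.
EXAMPLE = ("kernel: p64[54765]: segfault at 0 ip 00007fc7ce59c350 "
           "sp 00007fffd17839c0 error 4 in "
           "ld-linux-x86-64.so.2[7fc7ce57c000+2a000]")


def decode_segfault(line: str) -> dict:
    """Split one segfault log line into its fields and decode the error bits."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("no segfault record found")
    addr, ip, err, base = (int(m[k], 16) for k in ("addr", "ip", "err", "base"))
    return {
        "object": m["obj"],
        "fault_address": addr,
        "offset_in_object": ip - base,   # where in the mapped object the fault hit
        "page_present": bool(err & 1),
        "write_access": bool(err & 2),
        "user_mode": bool(err & 4),
        "instruction_fetch": bool(err & 16),
    }


info = decode_segfault(EXAMPLE)
print(info["object"], hex(info["offset_in_object"]), info)
```

For both log lines in this thread the offset works out to 0x20350 within the mapped region, and "error 4" decodes as a user-mode read of an unmapped page at address 0. In other words, code inside the dynamic loader read through a null pointer; the log line alone does not say why that pointer was null.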
Comment 9 Pete Lomax 2023-02-16 11:36:07 UTC
Therein lies the nub. I would love to "fix the bogus binary", but have no idea what that segfault actually means, "for some reason" just isn't quite enough.

I have already patched the binary to appease readelf, as best I can, but it does not fix the problem, and it also breaks running it with ld-linux first (and the fact that that way still works proves, at least to me, that there is nothing fundamentally wrong with the binary anyway).

What I am really here for is to find out what precisely changed in August 2022 and what precisely I can do about it.
Comment 10 Adhemerval Zanella 2023-02-16 11:59:15 UTC
(In reply to Pete Lomax from comment #9)
> Therein lies the nub. I would love to "fix the bogus binary", but have no
> idea what that segfault actually means, "for some reason" just isn't quite
> enough.
> 
> I have already patched the binary to appease readelf, as best I can, but it
> does not fix the problem, and further also fubars the running of it ld-linux
> first (and the fact that way still works proves at least to me there is
> nothing fundamentally wrong with that binary anyway).
> 
> What I am really here for is to find out what precisely changed in August
> 2022 and what precisely can I do about it.

You can try to bisect by rebuilding your binary with -Wl,-dynamic-linker= (similar to what --enable-hardcoded-path-in-tests does) so you don't mess with your system. It is quite time-consuming to debug because patchelf does not work on this binary (and hex-editing is quite tedious); if you could provide source code that reproduces it, that would be immensely better.
Comment 11 Pete Lomax 2023-02-16 22:03:37 UTC
I might struggle to bisect the ZERO changes I made at this end. I rather presumed someone on your end would figure out which kernel build (circa August 2022) this suddenly went wrong on, and examine the source code changes that went in since it worked on the previous day's build.
Comment 12 Pete Lomax 2023-02-16 22:33:33 UTC
Here's a 32-bit nasm example that works fine on 3.2.0-126-generic-pae but segfaults on 5.15.0-58-generic. I have narrowed it down to PT_LOAD 3 or 4: if you put 4 of 4 back in, it'll work again. I'll continue playing with that to see whether I can get what I need out of it (and make myself a 64-bit version).

; tiny.asm

  BITS 32

  %define ET_EXEC       2
  %define EM_386        3
  %define EV_CURRENT    1

  %define PT_LOAD       1
  %define PT_DYNAMIC    2
  %define PT_INTERP     3

  %define PF_X          1
  %define PF_W          2
  %define PF_R          4

  %define STT_FUNC      2

  %define STB_GLOBAL    1

  %define R_386_PC32    2

  %define DT_NULL       0
  %define DT_NEEDED     1
  %define DT_HASH       4
  %define DT_STRTAB     5
  %define DT_SYMTAB     6
  %define DT_STRSZ      10
  %define DT_SYMENT     11
  %define DT_REL        17
  %define DT_RELSZ      18
  %define DT_RELENT     19

  %define ST_INFO(b, t) (((b) << 4) | (t))
  %define R_INFO(s, t)  (((s) << 8) | (t))

  phentsz       equ     0x20
;  shentsz      equ     0x28
  shentsz       equ     0x0

                org     0x08048000

  ;; The ELF header

  ehdr:                                                 ; Elf32_Ehdr
                db      0x7F, "ELF", 1, 1, 1            ;   e_ident
        times 9 db      0
                dw      ET_EXEC                         ;   e_type
                dw      EM_386                          ;   e_machine
                dd      EV_CURRENT                      ;   e_version
                dd      _start                          ;   e_entry
                dd      phdr - $$                       ;   e_phoff
                dd      0                               ;   e_shoff
                dd      0                               ;   e_flags
                dw      ehdrsz                          ;   e_ehsize
                dw      phentsz                         ;   e_phentsize
                dw      3                               ;   e_phnum
                dw      shentsz                         ;   e_shentsize
                dw      0                               ;   e_shnum
                dw      0                               ;   e_shstrndx
  ehdrsz        equ     $ - ehdr

  ;; The program segment header table

  phdr:                                                 ; Elf32_Phdr
                dd      PT_INTERP                       ;   p_type
                dd      interp - $$                     ;   p_offset
                dd      interp                          ;   p_vaddr
                dd      interp                          ;   p_paddr
                dd      interpsz                        ;   p_filesz
                dd      interpsz                        ;   p_memsz
                dd      PF_R                            ;   p_flags
                dd      0                               ;   p_align
;  phentsz      equ     $ - phdr

                dd      PT_DYNAMIC                      ;   p_type
                dd      dyntab - $$                     ;   p_offset
                dd      dyntab                          ;   p_vaddr
                dd      dyntab                          ;   p_paddr
                dd      dyntabsz                        ;   p_filesz
                dd      dyntabsz                        ;   p_memsz
                dd      PF_R | PF_W                     ;   p_flags
                dd      4                               ;   p_align

;               dd      PT_LOAD                         ;   p_type
;               dd      symtab - $$                     ;   p_offset
;               dd      symtab                          ;   p_vaddr
;               dd      symtab                          ;   p_paddr
;               dd      symtabsz                        ;   p_filesz
;               dd      symtabsz                        ;   p_memsz
;               dd      PF_R | PF_W                     ;   p_flags
;               dd      4                               ;   p_align
;
;               dd      PT_LOAD                         ;   p_type
;               dd      data - $$                       ;   p_offset
;               dd      data                            ;   p_vaddr
;               dd      data                            ;   p_paddr
;               dd      datasz                          ;   p_filesz
;               dd      datasz                          ;   p_memsz
;               dd      PF_R | PF_W                     ;   p_flags
;               dd      4                               ;   p_align

                dd      PT_LOAD                         ;   p_type
                dd      code - $$                       ;   p_offset
                dd      code                            ;   p_vaddr
                dd      code                            ;   p_paddr
                dd      codesz                          ;   p_filesz
                dd      codesz                          ;   p_memsz
                dd      PF_R | PF_W | PF_X              ;   p_flags
                dd      0x1000                          ;   p_align

;               dd      PT_LOAD                         ;   p_type
;               dd      0                               ;   p_offset
;               dd      $$                              ;   p_vaddr
;               dd      $$                              ;   p_paddr
;               dd      filesz                          ;   p_filesz
;               dd      memsz                           ;   p_memsz
;               dd      PF_R | PF_W | PF_X              ;   p_flags
;               dd      0x1000                          ;   p_align

  ;; The interpreter segment

  interp:
                db      '/lib/ld-linux.so.2', 0
  interpsz      equ     $ - interp
                db      0   ; pad/dword-align

  ;; The dynamic section

  dyntab:
                dd      DT_STRTAB, strtab
                dd      DT_STRSZ,  strtabsz
                dd      DT_SYMTAB, symtab
                dd      DT_SYMENT, symentsz
                dd      DT_REL,    reltab
                dd      DT_RELSZ,  reltabsz
                dd      DT_RELENT, relentsz
                dd      DT_HASH,   hashtab
                dd      DT_NEEDED, libc_name
                dd      DT_NULL,   0
  dyntabsz      equ     $ - dyntab

  ;; The symbol table

  symtab:                                               ; Elf32_Sym
                dd      0                               ;   st_name
                dd      0                               ;   st_value
                dd      0                               ;   st_size
                db      0                               ;   st_info
                db      0                               ;   st_other
                dw      0                               ;   st_shndx
  symentsz      equ     $ - symtab
                dd      exit_name                       ;   st_name
                dd      0                               ;   st_value
                dd      0                               ;   st_size
                db      ST_INFO(STB_GLOBAL, STT_FUNC)   ;   st_info
                db      0                               ;   st_other
                dw      0                               ;   st_shndx

  ;; The hash table

  hashtab:
                dd      1                               ; no. of buckets
                dd      2                               ; no. of symbols
                dd      1                               ; the bucket: symbol #1
                dd      0, 0                            ; two links, both zero

  ;; The string table

  strtab:
                db      0
  libc_name     equ     $ - strtab
                db      'libc.so.6', 0
  exit_name     equ     $ - strtab
                db      '_exit', 0
  strtabsz      equ     $ - strtab

  ;; The relocation table

  reltab:                                               ; Elf32_Rel
                dd      exit_call                       ;   r_offset
                dd      R_INFO(1, R_386_PC32)           ;   r_info
  relentsz      equ     $ - reltab
  reltabsz      equ     $ - reltab

  symtabsz      equ     $ - symtab

  ;; Data section

  data          db      'Phix'
;  exit         dd      0

  datasz        equ     $ - data

  ;; Our program

  _start:
                push    byte 42
                call    exit_call
  exit_call     equ     $ - 4

  code          equ     _start
  codesz        equ     $ - code

  ;; End of the file image.

  filesz        equ     $ - $$
  memsz         equ     filesz
Comment 13 Pete Lomax 2023-02-16 22:52:45 UTC
PS this is what success would look like:
$ nasm -f bin -o tiny tiny.asm
$ chmod +x tiny
$ ./tiny ; echo $?
42
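
Editorial sketch: one visible difference between the failing variant of tiny.asm (only the code PT_LOAD) and the variant reported to work (the fourth PT_LOAD, starting at file offset 0) is that only the latter maps the ELF header and the program header table into memory. Whether that is what the newer kernel or loader actually trips over is not established in this thread; the Python below is just a minimal way to check that property for a little-endian ELF32 image. The function names and the two synthetic program header lists are hypothetical.

```python
import struct

EHDR32 = struct.Struct("<16sHHIIIIIHHHHHH")   # Elf32_Ehdr, 52 bytes
PHDR32 = struct.Struct("<IIIIIIII")           # Elf32_Phdr, 32 bytes
PT_LOAD = 1


def loads_covering_phdrs(data: bytes) -> list:
    """Return indexes of PT_LOAD entries whose file range contains the whole
    program header table of a little-endian ELF32 image."""
    (_, _, _, _, _, e_phoff, _, _, _,
     e_phentsize, e_phnum, _, _, _) = EHDR32.unpack_from(data)
    table_end = e_phoff + e_phentsize * e_phnum
    covering = []
    for i in range(e_phnum):
        p_type, p_offset, _, _, p_filesz, _, _, _ = PHDR32.unpack_from(
            data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD and p_offset <= e_phoff and table_end <= p_offset + p_filesz:
            covering.append(i)
    return covering


def make_image(phdrs: list, size: int = 0x100) -> bytes:
    """Build a hypothetical ELF32 image holding only a header and `phdrs`."""
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    ehdr = EHDR32.pack(ident, 2, 3, 1, 0, EHDR32.size, 0, 0,
                       EHDR32.size, PHDR32.size, len(phdrs), 0, 0, 0)
    body = ehdr + b"".join(PHDR32.pack(*p) for p in phdrs)
    return body.ljust(size, b"\0")


# (p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align)
code_only = [(PT_LOAD, 0xC0, 0x080480C0, 0x080480C0, 0x10, 0x10, 7, 0x1000)]
whole_file = [(PT_LOAD, 0x00, 0x08048000, 0x08048000, 0x100, 0x100, 7, 0x1000)]

print("code-only PT_LOAD :", loads_covering_phdrs(make_image(code_only)))
print("whole-file PT_LOAD:", loads_covering_phdrs(make_image(whole_file)))
```

On the assembled output of the listing above, the call would be loads_covering_phdrs(open("tiny", "rb").read()); an empty list means no loadable segment contains the program headers.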
Comment 14 Adhemerval Zanella 2023-02-17 12:14:01 UTC
Trying to debug whether this is a glibc issue, I am now even more convinced the issue is in the binary itself.  Testing with glibc 2.32 to 2.35 on a simple sysroot still segfaults.

It also does work on an older 4.4.0 kernel (Ubuntu 16).
Comment 15 Pete Lomax 2023-02-20 18:26:31 UTC
Got it! It seems simply that p_align is now more strictly applied (fair enough, I guess). Many thanks, I've updated the binary at http://phix.x10.mx/p64 in case you still need it.
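
Editorial sketch: the comment does not say which p_align constraint was involved, so the following only encodes the generic ELF rules for a loadable segment: p_align of 0 or 1 means no alignment is required, otherwise it should be a power of two, and p_vaddr should equal p_offset modulo p_align. The function name is hypothetical, and the p_align value 0x1000 in the first example is an assumption (the thread shows only the offset and address of that LOAD entry).

```python
def check_load_alignment(p_offset: int, p_vaddr: int, p_align: int) -> list:
    """Return a list of problems with one PT_LOAD's alignment fields,
    following the generic ELF rules for p_align."""
    problems = []
    if p_align in (0, 1):
        return problems              # 0 and 1 both mean "no alignment required"
    if p_align & (p_align - 1):
        problems.append("p_align is not a power of two")
    if (p_vaddr - p_offset) % p_align:
        problems.append("p_vaddr and p_offset differ modulo p_align")
    return problems


# Offset and address from the LOAD line quoted in comment #4; p_align assumed.
print(check_load_alignment(0x228, 0x400228, 0x1000))

# Hypothetical mismatch: address rounded down to a page, file offset left alone.
print(check_load_alignment(0x228, 0x400000, 0x1000))
```

A loader or kernel may enforce more or less than this; the sketch is only a quick sanity check on header values before looking elsewhere.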

There is one small matter remaining, readelf -a still displays these errors:

readelf: Error: Size (0xb0) of section <no-strings> is not a multiple of its sh_entsize (0x10)
readelf: Error: Corrupt DT_SYMTAB dynamic entry

My best guess for that is it is comparing the number of entries in the Dynamic Link Info with the number of entries in the Symbol Table, for reasons that utterly escape me. It may also be "entries of 16 bytes per vs. entries of 24 bytes per", but I'm not sure.

It also says "There are no sections in this file." which is correct, so I guessed it must be somehow faking a section, and indeed I can see that it is - search readelf.c for "overkill".
I could live with this, but would rather know whether there is something I could do better.
I've still got lots of work to do, but the original crisis is now officially over, and of course words are somehow completely inadequate to express my gratitude for all your help.
Comment 16 Pete Lomax 2023-02-20 18:39:46 UTC
I should have said those are clearly bogus errors: 0xb0 is a multiple of 0x10, and the program now runs fine, which it certainly would not do were DT_SYMTAB actually corrupt. Also I am pretty sure any tweaks needed would be fairly close to that "overkill" comment, or the things it is calling.
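
Editorial sketch: the "16 bytes per vs. 24 bytes per" guess from comment #15 can be checked with plain arithmetic. The Python below assumes the standard ELF64 layouts (Elf64_Dyn is 16 bytes, Elf64_Sym is 24 bytes) and divides the 0xb0 size from the readelf message by each. It does not show what readelf actually compares internally.

```python
import struct

ELF64_DYN = struct.Struct("<qQ")       # d_tag, d_un
ELF64_SYM = struct.Struct("<IBBHQQ")   # st_name, st_info, st_other, st_shndx, st_value, st_size

size = 0xB0  # the size quoted in the readelf error message
for name, layout in (("Elf64_Dyn", ELF64_DYN), ("Elf64_Sym", ELF64_SYM)):
    entries, remainder = divmod(size, layout.size)
    print(f"{name}: {layout.size} bytes per entry -> {entries} entries, remainder {remainder}")
```

0xb0 is exactly 11 entries of 16 bytes, which matches the 10 dynamic entries plus a NULL terminator mentioned in comment #4, but it is not a whole number of 24-byte symbol entries (7 entries with 8 bytes left over). That is at least consistent with the idea that a dynamic-section size is being tested against a symbol-table entry size, or the other way round.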
Comment 17 Adhemerval Zanella 2023-02-20 19:09:20 UTC
(In reply to Pete Lomax from comment #15)
> Got it! It seems simply that p_align is now more strictly applied (fair
> enough, I guess). Many thanks, I've updated the binary at
> http://phix.x10.mx/p64 in case you still need it.
> 
> There is one small matter remaining, readelf -a still displays these errors:
> 
> readelf: Error: Size (0xb0) of section <no-strings> is not a multiple of its
> sh_entsize (0x10)
> readelf: Error: Corrupt DT_SYMTAB dynamic entry
> 
> My best guess for that is it is comparing the number of entries in the
> Dynamic Link Info with the number of entries in the Symbol Table, for
> reasons that utterly escape me. It may also be "entries of 16 bytes per vs.
> entries of 24 bytes per", but I'm not sure.
> 
> It also says "There are no sections in this file." which is correct, so I
> guessed it must be somehow faking a section, and indeed I can see that it is
> - search readelf.c for "overkill".
> I could live with this, but would rather know whether there is something I
> could do better.
> I've still got lots of work to do, but the original crisis is now officially
> over, and of course words are somehow completely inadequate to express my
> gratitude for all your help.

So I take it this is not a glibc issue in the end.  I will close this bug as NOTABUG then.
Comment 18 Florian Weimer 2023-02-21 10:26:16 UTC
Closing as indicated.