Summary: | segfault in ld-linux after aug 2022 | | |
---|---|---|---|
Product: | glibc | Reporter: | Pete Lomax <petelomax> |
Component: | dynamic-link | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED NOTABUG | | |
Severity: | normal | CC: | adhemerval.zanella, fweimer |
Priority: | P2 | Flags: | fweimer: security- |
Version: | 2.35 | | |
Target Milestone: | --- | | |
Host: | | Target: | |
Build: | | Last reconfirmed: | |

Description
Pete Lomax
2023-01-18 12:46:26 UTC
PS It has previously worked for about a decade without any plt/got sections.

Is there anything else I can/should do to get this looked at?

With glibc master I see:

$ ./testrun.sh ./p64
Phix hybrid interpreter/compiler.
Version 1.0.1 (64 bit Linux) Copyright Pete Lomax 2006..2016
Enter ? for options or filename to execute:

Is it the expected output? However, running through execve the kernel shows the binary can not be loaded:

kernel: Code: ff ff 00 45 31 db 48 8d 15 c9 ac 00 00 4c 8d 05 46 8f 01 00 4c 8d 2d 1f 8f 01 00 49 89 c2 48 8d 58 ff 48 89 f8 49>
kernel: p64[54765]: segfault at 0 ip 00007fc7ce59c350 sp 00007fffd17839c0 error 4 in ld-linux-x86-64.so.2[7fc7ce57c000+2a000]

So I am not sure it is really a glibc error. The binary is also stripped and not fully conformant (readelf -a shows a lot of bogus information), so it is really difficult to debug. I think what might be happening is that the kernel ELF binfmt is seeing something unexpected and thus throwing the error.

Thank you, that is a massive step in the right direction, I think. Yes, that is what success should look like. I had (and meant it):

Size of section headers: 64 (bytes)
Number of section headers: 0 (64)

Previous versions of readelf (I just checked, 3.5.0-54-generic) did not mistreat that 0 as 64.
Anyway, I patched that e_shentsize to 0 which got readelf down to 2 complaints (vs 0 previously):

readelf: Error: Size (0xb0) of section <no-strings> is not a multiple of its sh_entsize (0x10)
readelf: Error: Corrupt DT_SYMTAB dynamic entry

I've also got:

LOAD 0x0000000000000228 0x0000000000400228 0x0000000000400228

and patching away that 40 made readelf happier, but spannered

ld-linux-x86-64.so.2 ./p64

So, while that was probably wrong, it made readelf show the 10 dynamic section entries it used to, along with an 11th (NULL) which it never did before, and (also new) followed by ten times:

0000004004b0 000100000001 R_X86_64_64 readelf: Error: bad symbol index: 00000001 in reloc

PS: There really are no section headers, and there never was any debug info to be stripped. These headers all worked perfectly for years before August 2022.

To quickly reiterate, I believe that "0 now means 64" must be a big clue. Either it should not be doing that, or those changes should be documented somewhere.

In any case, is this really a glibc bug? As I said, I think the issue is an ill-formatted ELF file that is preventing the kernel binfmt loader from starting the process.

If not glibc, what should this bug be classified as?

(In reply to Pete Lomax from comment #7)
> If not glibc, what should this bug be classified as?

It is what I tried to show on comment #3, where I see that the *kernel* is throwing the following error:

Feb 15 19:48:58 ubuntu22-vm kernel: Code: ff ff 00 45 31 db 48 8d 15 c9 ac 00 00 4c 8d 05 46 8f 01 00 4c 8d 2d 1f 8f 01 00 49 89 c2 48 8d 58 ff 48 89 f8 49 f7 da 66 90 <8b> 08 83 f9 07 77 19 85 c9 74 45 83 f9 07 77 40 48 63 0c 8a 48 01
Feb 15 19:48:58 ubuntu22-vm kernel: p64[1403]: segfault at 0 ip 00007ff10dcb7350 sp 00007ffd5a54b570 error 4 in ld-linux-x86-64.so.2[7ff10dc97000+2a000]

This means the resulting image can not execute the entry point after the kernel loads both dynamic loader and the binary for some reason.
You can check that even with gdb, I can even start to execute the first instruction (starti). I think you will need to check what the binfmt loading code in the kernel is doing differently; or, most likely, fix the bogus binary.

Therein lies the nub. I would love to "fix the bogus binary", but have no idea what that segfault actually means, "for some reason" just isn't quite enough.

I have already patched the binary to appease readelf, as best I can, but it does not fix the problem, and further also fubars the running of it with ld-linux first (and the fact that way still works proves at least to me there is nothing fundamentally wrong with that binary anyway).

What I am really here for is to find out what precisely changed in August 2022 and what precisely can I do about it.

(In reply to Pete Lomax from comment #9)
> Therein lies the nub. I would love to "fix the bogus binary", but have no
> idea what that segfault actually means, "for some reason" just isn't quite
> enough.
>
> I have already patched the binary to appease readelf, as best I can, but it
> does not fix the problem, and further also fubars the running of it ld-linux
> first (and the fact that way still works proves at least to me there is
> nothing fundamentally wrong with that binary anyway).
>
> What I an really here for is to find out what precisely changed in August
> 2022 and what precisely can I do about it.

You can try to bisect by rebuilding your binary with -Wl,-dynamic-linker= (similar to what --enable-hardcoded-path-in-tests does) so you don't mess with your system. It is quite time-consuming to debug because patchelf does not work on this binary (and hex-editing is quite tedious). If you could provide source code to reproduce it, that would be immensely better.

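Since the discussion above turns on individual ELF header fields (e_shentsize, e_shnum, e_phnum) and on readelf's interpretation of them, it may help to read those fields directly. The following is a minimal Python sketch and was not posted in the thread; the function name `read_elf_header` is hypothetical, and it decodes only the fixed-size ELF header, assuming the standard ELF32/ELF64 layouts.

```python
import struct

# Field names of the ELF header that follow the 16-byte e_ident array.
FIELDS = ("e_type", "e_machine", "e_version", "e_entry", "e_phoff",
          "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
          "e_shentsize", "e_shnum", "e_shstrndx")


def read_elf_header(data):
    """Decode the fixed-size ELF header from the first bytes of a file.

    Hypothetical helper for illustration: returns a dict of raw field
    values exactly as stored, with no interpretation applied.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                     # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    # e_entry, e_phoff and e_shoff are 8 bytes wide on ELF64, 4 on ELF32.
    layout = "HHIQQQIHHHHHH" if is64 else "HHIIIIIHHHHHH"
    values = struct.unpack_from(endian + layout, data, 16)
    return dict(zip(FIELDS, values))
```

Reading the first 64 bytes of a file opened in binary mode and passing them to `read_elf_header` is enough for either class. Because the values are printed as stored, a stored 0 in e_shentsize or e_shnum shows up as 0.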
I might struggle to bisect the ZERO changes I made this end, I rather presumed someone on your end would figure out which kernel build (circa August 2022) this suddenly went wrong on and examine the source code changes that went in since it worked on the previous day's build.

Here's a 32 bit nasm example that works fine on 3.2.0-126-generic-pae but segfaults on 5.15.0-58-generic. I have narrowed it down to the PT_LOAD 3 or 4, if you put 4 of 4 back in it'll work again, I'll continue playing with that to see whether I can get what I need out of it (and make me a 64-bit version).

; tiny.asm

BITS 32

%define ET_EXEC         2
%define EM_386          3
%define EV_CURRENT      1

%define PT_LOAD         1
%define PT_DYNAMIC      2
%define PT_INTERP       3

%define PF_X            1
%define PF_W            2
%define PF_R            4

%define STT_FUNC        2

%define STB_GLOBAL      1

%define R_386_PC32      2

%define DT_NULL         0
%define DT_NEEDED       1
%define DT_HASH         4
%define DT_STRTAB       5
%define DT_SYMTAB       6
%define DT_STRSZ        10
%define DT_SYMENT       11
%define DT_REL          17
%define DT_RELSZ        18
%define DT_RELENT       19

%define ST_INFO(b, t)   (((b) << 4) | (t))
%define R_INFO(s, t)    (((s) << 8) | (t))

phentsz         equ     0x20
; shentsz       equ     0x28
shentsz         equ     0x0

                org     0x08048000

;; The ELF header

ehdr:                                           ; Elf32_Ehdr
                db      0x7F, "ELF", 1, 1, 1    ; e_ident
        times 9 db      0
                dw      ET_EXEC                 ; e_type
                dw      EM_386                  ; e_machine
                dd      EV_CURRENT              ; e_version
                dd      _start                  ; e_entry
                dd      phdr - $$               ; e_phoff
                dd      0                       ; e_shoff
                dd      0                       ; e_flags
                dw      ehdrsz                  ; e_ehsize
                dw      phentsz                 ; e_phentsize
                dw      3                       ; e_phnum
                dw      shentsz                 ; e_shentsize
                dw      0                       ; e_shnum
                dw      0                       ; e_shstrndx
ehdrsz          equ     $ - ehdr

;; The program segment header table

phdr:                                           ; Elf32_Phdr
                dd      PT_INTERP               ; p_type
                dd      interp - $$             ; p_offset
                dd      interp                  ; p_vaddr
                dd      interp                  ; p_paddr
                dd      interpsz                ; p_filesz
                dd      interpsz                ; p_memsz
                dd      PF_R                    ; p_flags
                dd      0                       ; p_align
; phentsz       equ     $ - phdr

                dd      PT_DYNAMIC              ; p_type
                dd      dyntab - $$             ; p_offset
                dd      dyntab                  ; p_vaddr
                dd      dyntab                  ; p_paddr
                dd      dyntabsz                ; p_filesz
                dd      dyntabsz                ; p_memsz
                dd      PF_R | PF_W             ; p_flags
                dd      4                       ; p_align

;               dd      PT_LOAD                 ; p_type
;               dd      symtab - $$             ; p_offset
;               dd      symtab                  ; p_vaddr
;               dd      symtab                  ; p_paddr
;               dd      symtabsz                ; p_filesz
;               dd      symtabsz                ; p_memsz
;               dd      PF_R | PF_W             ; p_flags
;               dd      4                       ; p_align
;
;               dd      PT_LOAD                 ; p_type
;               dd      data - $$               ; p_offset
;               dd      data                    ; p_vaddr
;               dd      data                    ; p_paddr
;               dd      datasz                  ; p_filesz
;               dd      datasz                  ; p_memsz
;               dd      PF_R | PF_W             ; p_flags
;               dd      4                       ; p_align

                dd      PT_LOAD                 ; p_type
                dd      code - $$               ; p_offset
                dd      code                    ; p_vaddr
                dd      code                    ; p_paddr
                dd      codesz                  ; p_filesz
                dd      codesz                  ; p_memsz
                dd      PF_R | PF_W | PF_X      ; p_flags
                dd      0x1000                  ; p_align

;               dd      PT_LOAD                 ; p_type
;               dd      0                       ; p_offset
;               dd      $$                      ; p_vaddr
;               dd      $$                      ; p_paddr
;               dd      filesz                  ; p_filesz
;               dd      memsz                   ; p_memsz
;               dd      PF_R | PF_W | PF_X      ; p_flags
;               dd      0x1000                  ; p_align

;; The interpreter segment

interp:
                db      '/lib/ld-linux.so.2', 0
interpsz        equ     $ - interp
                db      0                       ; pad/dword-align

;; The dynamic section

dyntab:
                dd      DT_STRTAB, strtab
                dd      DT_STRSZ,  strtabsz
                dd      DT_SYMTAB, symtab
                dd      DT_SYMENT, symentsz
                dd      DT_REL,    reltab
                dd      DT_RELSZ,  reltabsz
                dd      DT_RELENT, relentsz
                dd      DT_HASH,   hashtab
                dd      DT_NEEDED, libc_name
                dd      DT_NULL,   0
dyntabsz        equ     $ - dyntab

;; The symbol table

symtab:                                         ; Elf32_Sym
                dd      0                       ; st_name
                dd      0                       ; st_value
                dd      0                       ; st_size
                db      0                       ; st_info
                db      0                       ; st_other
                dw      0                       ; st_shndx
symentsz        equ     $ - symtab
                dd      exit_name               ; st_name
                dd      0                       ; st_value
                dd      0                       ; st_size
                db      ST_INFO(STB_GLOBAL, STT_FUNC) ; st_info
                db      0                       ; st_other
                dw      0                       ; st_shndx

;; The hash table

hashtab:
                dd      1                       ; no. of buckets
                dd      2                       ; no. of symbols
                dd      1                       ; the bucket: symbol #1
                dd      0, 0                    ; two links, both zero

;; The string table

strtab:
                db      0
libc_name       equ     $ - strtab
                db      'libc.so.6', 0
exit_name       equ     $ - strtab
                db      '_exit', 0
strtabsz        equ     $ - strtab

;; The relocation table

reltab:                                         ; Elf32_Rel
                dd      exit_call               ; r_offset
                dd      R_INFO(1, R_386_PC32)   ; r_info
relentsz        equ     $ - reltab
reltabsz        equ     $ - reltab
symtabsz        equ     $ - symtab

;; Data section

data            db      'Phix'
; exit          dd      0
datasz          equ     $ - data

;; Our program

_start:
                push    byte 42
                call    exit_call
exit_call       equ     $ - 4
code            equ     _start
codesz          equ     $ - code

;; End of the file image.

filesz          equ     $ - $$
memsz           equ     filesz

PS this is what success would look like:

$ nasm -f bin -o tiny tiny.asm
$ chmod +x tiny
$ ./tiny ; echo $?
42

Trying to debug whether this is a glibc issue, I am now even more convinced the issue is in the binary itself. Testing with glibc 2.32 to 2.35 on a simple sysroot still segfaults. It also does work on an older 4.4.0 kernel (ubuntu16).

Got it! It seems simply that p_align is now more strictly applied (fair enough, I guess). Many thanks, I've updated the binary at http://phix.x10.mx/p64 in case you still need it.

There is one small matter remaining, readelf -a still displays these errors:

readelf: Error: Size (0xb0) of section <no-strings> is not a multiple of its sh_entsize (0x10)
readelf: Error: Corrupt DT_SYMTAB dynamic entry

My best guess for that is it is comparing the number of entries in the Dynamic Link Info with the number of entries in the Symbol Table, for reasons that utterly escape me. It may also be "entries of 16 bytes per vs. entries of 24 bytes per", but I'm not sure.

It also says "There are no sections in this file." which is correct, so I guessed it must be somehow faking a section, and indeed I can see that it is - search readelf.c for "overkill".
I could live with this, but would rather know whether there is something I could do better.
I've still got lots of work to do, but the original crisis is now officially over, and of course words are somehow completely inadequate to express my gratitude for all your help.

I should have said those are clearly bogus errors: 0xb0 is a multiple of 0x10, and the program now runs fine, which it certainly would not do were DT_SYMTAB actually corrupt. Also I am pretty sure any tweaks needed would be fairly close to that "overkill" comment, or the things it is calling.

(In reply to Pete Lomax from comment #15)
> Got it! It seems simply that p_align is now more strictly applied (fair
> enough, I guess). Many thanks, I've updated the binary at
> http://phix.x10.mx/p64 in case you still need it.
>
> There is one small matter remaining, readelf -a still displays these errors:
>
> readelf: Error: Size (0xb0) of section <no-strings> is not a multiple of its
> sh_entsize (0x10)
> readelf: Error: Corrupt DT_SYMTAB dynamic entry
>
> My best guess for that is it is comparing the number of entries in the
> Dynamic Link Info with the number of entries in the Symbol Table, for
> reasons that utterly escape me. It may also be "entries of 16 bytes per vs.
> entries of 24 bytes per", but I'm not sure.
>
> It also says "There are no sections in this file." which is correct, so I
> guessed it must be somehow faking a section, and indeed I can see that it is
> - search readelf.c for "overkill".
> I could live with this, but would rather know whether there is something I
> could do better.
> I've still got lots of work to do, but the original crisis is now officially
> over, and of course words are somehow completely inadequate to express my
> gratitude for all your help.

So I take it this is not a glibc issue in the end then. I will close this bug as NOTABUG then.

Closing as indicated.
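
The thread resolves on p_align but does not say which value was wrong or which check became stricter. As a hedged illustration only, the sketch below tests the generic ELF rule for loadable segments: when p_align is greater than 1 it should be a power of two, and p_vaddr should equal p_offset modulo p_align. The function name `check_load_alignment` is hypothetical, and passing this check is not claimed to be sufficient for a binary to load.

```python
import struct

PT_LOAD = 1


def check_load_alignment(data):
    """Return (index, problem) pairs for PT_LOAD headers breaking the p_align rule.

    Hypothetical helper: `data` is the whole file image as bytes.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                     # EI_CLASS: 2 = ELF64
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    if is64:
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 32)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", data, 28)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 42)

    problems = []
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        if is64:
            # Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz, memsz, align
            p_type, _, p_offset, p_vaddr, _, _, _, p_align = struct.unpack_from(
                endian + "IIQQQQQQ", data, base)
        else:
            # Elf32_Phdr: type, offset, vaddr, paddr, filesz, memsz, flags, align
            p_type, p_offset, p_vaddr, _, _, _, _, p_align = struct.unpack_from(
                endian + "IIIIIIII", data, base)
        if p_type != PT_LOAD or p_align in (0, 1):
            continue                        # 0 and 1 mean "no alignment required"
        if p_align & (p_align - 1):
            problems.append((i, "p_align is not a power of two"))
        elif (p_vaddr - p_offset) % p_align:
            problems.append((i, "p_vaddr and p_offset differ modulo p_align"))
    return problems
```

An empty list means no PT_LOAD header violated this particular rule; the kernel and the dynamic loader may still reject or mishandle a file for other reasons, such as data that no PT_LOAD segment covers.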