This is with latest gdb 8.0, configured solely with '--target=powerpc-linux-gnu' on a GNU/Linux 64bit host system. When I try to remote debug through gdbserver on a PowerPC e500v2 system, I get the following assertion:

(gdb) target remote 172.20.5.224:2345
Remote debugging using 172.20.5.224:2345
Reading symbols from /bsp/sysroot/lib/ld.so.1...(no debugging symbols found)...done.
gdbarch.c:3228: internal-error: int gdbarch_elf_make_msymbol_special_p(gdbarch*): Assertion `gdbarch != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

Version 7.12.1 has the same problem, but version 7.11 works, so this must be due to a change between those two versions.
I get the same with:
* MPC8309E
* Debugged SW compiled with GCC 4.8.2 -ggdb3
* Remote debugging or core dump analysis
* Custom GDB 8.0 build with --enable-targets=powerpc-linux,powerpc-freebsd,powerpc-elf,powerpc-eabi

Attaching a sample binary and sample core file for easy recreation.
Created attachment 10364
Core file created by running on PowerPC
Created attachment 10365
Crashing demo program for PowerPC
Created attachment 10366
Source code for the crashing sample program
It would be very important to get this fixed, as it prevents usage on PowerPC.
Still happens on 8.1.1
Out of curiosity, could you please try with git HEAD?
Yes, it still happens with git HEAD. Building on Ubuntu 14.04 64bit, if relevant.
Decided to dig a bit deeper:

The problematic scenario starts from function add_vsyscall_page, which calls symbol_file_add_from_memory with this strange "filename" 'system-supplied DSO at %s'. This call chain ends up in gdbarch_elf_make_msymbol_special_p(gdbarch*) and hits the assert.

symbol_file_add_from_bfd
symbol_file_add_with_addrs
syms_from_objfile
reread_symbols
read_symbols
sym_read
elf_read_minimal_symbols
elf_symtab_read ST_REGULAR
get_objfile_arch returns NULL --> passed to gdbarch_elf_make_msymbol_special_p

Big question for a complete gdb dev noob like me is whether or not the gdbarch should be filled for this kind of strange DSO object? If it should, is the correct place get_objfile_bfd_data, which eventually queries it from gdbarch_find_by_info?

The built-in debug prints of gdbarch_find_by_info are the following during the problematic scenario:
gdbarch_find_by_info: info.bfd_arch_info powerpc:vle
gdbarch_find_by_info: info.byte_order 0 (big)
gdbarch_find_by_info: info.osabi 5 (GNU/Linux)
gdbarch_find_by_info: info.abfd 0x3d5dbf0
gdbarch_find_by_info: info.tdep_info 0x0
gdbarch_find_by_info: Target rejected architecture

Whereas earlier, when symbols are processed for my main binary, it has printed:
gdbarch_find_by_info: info.bfd_arch_info powerpc:common
gdbarch_find_by_info: info.byte_order 0 (big)
gdbarch_find_by_info: info.osabi 5 (GNU/Linux)
gdbarch_find_by_info: info.abfd 0x3cd0e90
gdbarch_find_by_info: info.tdep_info 0x0
gdbarch_find_by_info: New architecture 0x3dc7380 (powerpc:common) selected

So for whatever reason the DSO symbol has different architecture info and this architecture is "not supported" by my GDB build? Even though I build it with --enable-targets=all.
(In reply to Lassi Niemistö from comment #9)
> Decided to dig a bit deeper:
>
> The problematic scenario starts from function add_vsyscall_page, which calls
> symbol_file_add_from_memory with this strange "filename" 'system-supplied
> DSO at %s'. This call chain ends up to
> gdbarch_elf_make_msymbol_special_p(gdbarch*) and hits the assert.
>
> symbol_file_add_from_bfd
> symbol_file_add_with_addrs
> syms_from_objfile
> reread_symbols
> read_symbols
> sym_read
> elf_read_minimal_symbols
> elf_symtab_read ST_REGULAR
> get_objfile_arch returns NULL --> passed to
> gdbarch_elf_make_msymbol_special_p
>
> Big question for a complete gdb dev noob like me is whether or not the
> gdbarch should be filled for this kind of strange DSO object? If it should,
> is the correct place get_objfile_bfd_data which eventually queries it from
> gdbarch_find_by_info?
>
> The built-in debug prints of gdbarch_find_by_info are the following during
> the problematic scenario:
> gdbarch_find_by_info: info.bfd_arch_info powerpc:vle
> gdbarch_find_by_info: info.byte_order 0 (big)
> gdbarch_find_by_info: info.osabi 5 (GNU/Linux)
> gdbarch_find_by_info: info.abfd 0x3d5dbf0
> gdbarch_find_by_info: info.tdep_info 0x0
> gdbarch_find_by_info: Target rejected architecture
>
> Whereas earlier when symbols are processed for my main binary, it has
> printed:
> gdbarch_find_by_info: info.bfd_arch_info powerpc:common
> gdbarch_find_by_info: info.byte_order 0 (big)
> gdbarch_find_by_info: info.osabi 5 (GNU/Linux)
> gdbarch_find_by_info: info.abfd 0x3cd0e90
> gdbarch_find_by_info: info.tdep_info 0x0
> gdbarch_find_by_info: New architecture 0x3dc7380 (powerpc:common) selected
>
> So for some whatever reason the DSO symbol has different architecture info
> and this architecture is "not supported" by my GDB build? Even though I
> build it with --enable-targets=all.

The first one (the rejected one) has "powerpc:vle", whereas the second one has "powerpc:common". Why did you expect "powerpc:vle" to be chosen?
Instinctively, I would expect the same arch to be chosen for the vsyscall page as for the main objfile. Can you see if the architecture "powerpc:common" has also been rejected for the vsyscall page? If so, can you step through the decision process to see why?
(In reply to Simon Marchi from comment #10)
> The first one (the rejected one) has "powerpc:vle", whereas the second one
> has "powerpc:common". Why did you expect "powerpc:vle" to be chosen?
> Instinctively, I would expect the same arch to be chosen for the vsyscall
> page than for the main objfile. Can you see if the architecture
> "powerpc:common" has also been rejected for the vsyscall page? If so, can
> you step in the decision process to see why?

Actually, forget about this, I did not understand the process right.

I was able to reproduce the crash using the core you provided. The "vle" mach comes from the BFD library. We open the BFD from memory, and the BFD library decides it's of the "powerpc" arch, "vle" microarch (the numerical value is 84). Then, we try to look up a gdbarch corresponding to that BFD arch. However, GDB knows nothing about bfd_mach_ppc_vle. The powerpc gdbarch init code looks in this table of variants:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/rs6000-tdep.c;h=e78de49b2e69808966fa77d0e1ba3b071dfe540e;hb=HEAD#l3029

but vle is not present there. So either:

1. BFD is wrong about the micro architecture, it should not be vle
2. GDB should know about the vle microarchitecture
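The failure mode described above can be modelled in a few lines of Python. This is only an illustrative sketch, not GDB code: the table contents, the function names find_variant and make_msymbol_special_p, and the machine numbers other than 84 are assumptions made for the example. It shows how a machine number missing from a variants table yields no architecture, and how a later caller that requires one then fails.

```python
# Illustrative model only; none of this is taken from the GDB sources.
BFD_MACH_PPC_VLE = 84  # numerical value reported in the comment above

# Hypothetical subset of a variants table: machine number -> arch name.
KNOWN_VARIANTS = {
    32: "powerpc:common",    # assumed value, for illustration
    64: "powerpc:common64",  # assumed value, for illustration
}


def find_variant(mach):
    """Return the arch name for a machine number, or None if it is unknown."""
    return KNOWN_VARIANTS.get(mach)


def make_msymbol_special_p(gdbarch):
    """Mimic a callee whose precondition is a non-null architecture."""
    if gdbarch is None:
        raise RuntimeError("Assertion `gdbarch != NULL' failed.")
    return True


if __name__ == "__main__":
    arch = find_variant(BFD_MACH_PPC_VLE)
    print("lookup result:", arch)
    try:
        make_msymbol_special_p(arch)
    except RuntimeError as exc:
        print("internal-error:", exc)
```

Either option listed above removes the failure in this model: if the lookup is never asked about 84 (option 1), or if 84 is added to the table (option 2), the callee receives a usable architecture.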
Sounds related to bug 19797.
Thanks for the comments. Some more findings:
* The executable file load does not cause issues, it is the core file load part
* gdb 7.9.1 (the last version working fine) also loads this "system supplied DSO" as the last thing upon core file load, but there it ends up searching for powerpc:common and not powerpc:vle
And I can confirm our architecture has nothing to do with vle, so it would mean it is the BFD library that gets this wrong. Is the BFD library statically built into GDB, as I can see at least some of its sources under binutils-gdb?
The file bfd/elf32-ppc.c has been modified between the versions and there is now a new function

/* When defaulting arch/mach, decode apuinfo to find a better match. */
_bfd_elf_ppc_set_arch (bfd *abfd)

..which believes it has found PPC_APUINFO_VLE
Adding Alan Modra to the CC list if we could get some comment on this.
The core file load1 segment contains an image of a kernel vdso that has a .PPC.EMB.apuinfo section of size 24. That section contains

p/x contents[0]@24
$2 = {0x0, 0x0, 0x0, 0x8, 0x0, 0x0, 0x0, 0x4, 0x0, 0x0, 0x0, 0x2, 0x41, 0x50, 0x55, 0x69, 0x6e, 0x66, 0x6f, 0x0, 0x1, 0x4, 0x0, 0x1}

So there is a single apuinfo word, 0x01040001, saying PPC_APUINFO_VLE (high 16 bits) revision 1 (low 16 bits). BFD is therefore correctly setting arch/mach to "powerpc:vle" for this object.

So the question becomes how did PPC_APUINFO_VLE become set? I wonder how old a toolchain was used to build your kernel. If gas had VLE support but lacked git commit fbd940576f 2014-08-22 then that might be the cause. See https://sourceware.org/ml/binutils/2014-08/msg00217.html
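For anyone who wants to check their own vdso image, the 24 bytes above can be decoded with a short Python sketch. It assumes the big-endian layout visible in the dump (three 32-bit header words for name size, descriptor size and type, then the name, then the descriptor words); the function name decode_apuinfo is made up for this example, and the constant 0x104 is simply the high half of the word 0x01040001 that the comment above identifies as PPC_APUINFO_VLE.

```python
import struct

# High 16 bits of 0x01040001, identified above as PPC_APUINFO_VLE.
PPC_APUINFO_VLE = 0x104


def decode_apuinfo(section):
    """Decode one big-endian APUinfo note into (name, type, [(apu, rev)])."""
    namesz, descsz, note_type = struct.unpack_from(">III", section, 0)
    name = section[12:12 + namesz].rstrip(b"\0").decode("ascii")
    # The name field is padded to a 4-byte boundary before the descriptor.
    desc_start = 12 + ((namesz + 3) & ~3)
    words = struct.unpack_from(">%dI" % (descsz // 4), section, desc_start)
    # Each word carries the APU identifier (high half) and revision (low half).
    entries = [(word >> 16, word & 0xFFFF) for word in words]
    return name, note_type, entries


# The section contents exactly as printed by gdb above.
contents = bytes([
    0x0, 0x0, 0x0, 0x8, 0x0, 0x0, 0x0, 0x4, 0x0, 0x0, 0x0, 0x2,
    0x41, 0x50, 0x55, 0x69, 0x6e, 0x66, 0x6f, 0x0, 0x1, 0x4, 0x0, 0x1,
])

name, note_type, entries = decode_apuinfo(contents)

if __name__ == "__main__":
    for apu, revision in entries:
        tag = "VLE" if apu == PPC_APUINFO_VLE else "other"
        print("%s: apu=0x%x (%s) revision=%d" % (name, apu, tag, revision))
```

On the dump above this yields a single entry with identifier 0x104 and revision 1, matching the analysis. It is a sketch for a single note only and does not handle several notes in one section.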
Thanks Alan for the analysis!

We are running binutils 2.24 plus some patches on top of it. The git log for the 2.24 tag shows that it has at least the main commit of VLE support:

b9c361e0ad33f2c841067fd4bf0959a72ad5a265 Add support for PowerPC VLE.

And indeed the fixup commit fbd940576f seems absent (2.24 is dated 2013-12-02). The first version with the fix seems to be 2.25.

This should explain the results, and we shall primarily mitigate the issue in our project by updating the toolchain. To achieve a one-gdb-for-all build that works with legacy branches, we might patch gdb to skip parsing apuinfo altogether, as we know the architecture anyway in this case.
Yes, our toolchain is very old as well, we're still using Sourcery G++ Lite 2011.03-38, which has binutils 2.20. Thank you Lassi for digging into this!