Bug 21603 - powerpc-linux-gnu-gdb throws internal error on remote debugging: 'gdbarch!=NULL' failed
Status: UNCONFIRMED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb
Version: 8.0
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2017-06-16 12:41 UTC by David Engster
Modified: 2018-08-08 07:46 UTC


Attachments
Core file created by running on PowerPC (14.92 KB, application/octet-stream)
2017-08-25 09:44 UTC, Lassi Niemistö
Crashing demo program for PowerPC (22.62 KB, application/octet-stream)
2017-08-25 09:46 UTC, Lassi Niemistö
Source code for the crashing sample program (741 bytes, text/plain)
2017-08-25 09:47 UTC, Lassi Niemistö

Description David Engster 2017-06-16 12:41:02 UTC
This is with the latest gdb 8.0, configured solely with '--target=powerpc-linux-gnu' on a 64-bit GNU/Linux host system. When I try to remote debug through gdbserver on a PowerPC e500v2 system, I get the following assertion failure:

(gdb) target remote 172.20.5.224:2345
Remote debugging using 172.20.5.224:2345
Reading symbols from /bsp/sysroot/lib/ld.so.1...(no debugging symbols found)...done.
gdbarch.c:3228: internal-error: int gdbarch_elf_make_msymbol_special_p(gdbarch*): Assertion `gdbarch != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

Version 7.12.1 has the same problem, but version 7.11 works, so this must be due to a change between those two versions.
Comment 1 Lassi Niemistö 2017-08-25 09:43:01 UTC
I get the same with:
* MPC8309E
* Debugged SW compiled with GCC 4.8.2 -ggdb3
* Remote debugging or core dump analysis
* Custom GDB 8.0 build with --enable-targets=powerpc-linux,powerpc-freebsd,powerpc-elf,powerpc-eabi

Attaching a sample binary and a sample core file for easy reproduction.
Comment 2 Lassi Niemistö 2017-08-25 09:44:41 UTC
Created attachment 10364 [details]
Core file created by running on PowerPC
Comment 3 Lassi Niemistö 2017-08-25 09:46:15 UTC
Created attachment 10365 [details]
Crashing demo program for PowerPC
Comment 4 Lassi Niemistö 2017-08-25 09:47:34 UTC
Created attachment 10366 [details]
Source code for the crashing sample program
Comment 5 Lassi Niemistö 2018-07-10 07:48:54 UTC
It would be very important to get this fixed; it prevents using gdb on PowerPC.
Comment 6 Lassi Niemistö 2018-08-02 09:09:42 UTC
Still happens on 8.1.1
Comment 7 Sergio Durigan Junior 2018-08-03 04:34:11 UTC
Out of curiosity, could you please try with git HEAD?
Comment 8 Lassi Niemistö 2018-08-03 06:34:21 UTC
Yes, it happens with git HEAD too. I am building on 64-bit Ubuntu 14.04, in case that is relevant.
Comment 9 Lassi Niemistö 2018-08-03 13:05:12 UTC
Decided to dig a bit deeper:

The problematic scenario starts in the function add_vsyscall_page, which calls symbol_file_add_from_memory with the strange "filename" 'system-supplied DSO at %s'. This call chain ends up in gdbarch_elf_make_msymbol_special_p(gdbarch*) and hits the assert:

symbol_file_add_from_bfd
symbol_file_add_with_addrs
syms_from_objfile
reread_symbols
read_symbols
sym_read
elf_read_minimal_symbols
elf_symtab_read ST_REGULAR
get_objfile_arch returns NULL --> passed to gdbarch_elf_make_msymbol_special_p

The big question for a complete gdb dev noob like me is whether the gdbarch should be filled in for this kind of strange DSO object. If it should, is get_objfile_bfd_data, which eventually queries it from gdbarch_find_by_info, the correct place to do that?

The built-in debug prints of gdbarch_find_by_info are the following during the problematic scenario:
gdbarch_find_by_info: info.bfd_arch_info powerpc:vle
gdbarch_find_by_info: info.byte_order 0 (big)
gdbarch_find_by_info: info.osabi 5 (GNU/Linux)
gdbarch_find_by_info: info.abfd 0x3d5dbf0
gdbarch_find_by_info: info.tdep_info 0x0
gdbarch_find_by_info: Target rejected architecture

Whereas earlier, when symbols were processed for my main binary, it printed:
gdbarch_find_by_info: info.bfd_arch_info powerpc:common
gdbarch_find_by_info: info.byte_order 0 (big)
gdbarch_find_by_info: info.osabi 5 (GNU/Linux)
gdbarch_find_by_info: info.abfd 0x3cd0e90
gdbarch_find_by_info: info.tdep_info 0x0
gdbarch_find_by_info: New architecture 0x3dc7380 (powerpc:common) selected

So for some reason the DSO has different architecture info, and this architecture is "not supported" by my GDB build, even though I built it with --enable-targets=all?
Comment 10 Simon Marchi 2018-08-03 16:09:24 UTC
(In reply to Lassi Niemistö from comment #9)

The first one (the rejected one) has "powerpc:vle", whereas the second one has "powerpc:common". Why did you expect "powerpc:vle" to be chosen? Instinctively, I would expect the same arch to be chosen for the vsyscall page as for the main objfile. Can you see whether the architecture "powerpc:common" has also been rejected for the vsyscall page? If so, can you step through the decision process to see why?
Comment 11 Simon Marchi 2018-08-03 16:22:40 UTC
(In reply to Simon Marchi from comment #10)
> The first one (the rejected one) has "powerpc:vle", whereas the second one
> has "powerpc:common".  Why did you expect "powerpc:vle" to be chosen? 
> Instinctively, I would expect the same arch to be chosen for the vsyscall
> page than for the main objfile.  Can you see if the architecture
> "powerpc:common" has also been rejected for the vsyscall page?  If so, can
> you step in the decision process to see why?

Actually, forget about this; I had not understood the process correctly. I was able to reproduce the crash using the core you provided.

The "vle" mach comes from the BFD library. We open the BFD from memory, and the BFD library decides it is of the "powerpc" arch, "vle" microarchitecture (the numerical value is 84). Then we try to look up a gdbarch corresponding to that BFD arch. However, GDB knows nothing about bfd_mach_ppc_vle. The powerpc gdbarch init code looks in this table of variants:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/rs6000-tdep.c;h=e78de49b2e69808966fa77d0e1ba3b071dfe540e;hb=HEAD#l3029

but vle is not present there.  So either:

1. BFD is wrong about the microarchitecture, and it should not be vle
2. GDB should know about the vle microarchitecture
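
To make the failure mode described above concrete, here is a minimal Python sketch of it. It is a toy model under stated assumptions, not GDB's code: KNOWN_VARIANTS stands in for the variants table in rs6000-tdep.c and lists only the one architecture name seen in this thread, and find_gdbarch and elf_make_msymbol_special_p are hypothetical stand-ins for gdbarch_find_by_info and gdbarch_elf_make_msymbol_special_p.

```python
# Toy model of the lookup-then-assert sequence; not GDB source code.
# Hypothetical stand-in for the variants table; the real table has many
# more entries, but (per the analysis above) no entry for vle.
KNOWN_VARIANTS = {"powerpc:common"}


def find_gdbarch(bfd_arch_name):
    """Return a stand-in gdbarch for a known variant, or None when the
    architecture is rejected ("Target rejected architecture")."""
    if bfd_arch_name in KNOWN_VARIANTS:
        return {"name": bfd_arch_name}
    return None


def elf_make_msymbol_special_p(gdbarch):
    """Mimic the failing precondition: the predicate needs a gdbarch."""
    if gdbarch is None:
        raise AssertionError("gdbarch != NULL")
    return False


# Main binary: BFD reports powerpc:common, so a gdbarch is found.
main_arch = find_gdbarch("powerpc:common")
elf_make_msymbol_special_p(main_arch)

# vsyscall page: BFD reports powerpc:vle, the lookup yields nothing,
# and passing that result on would trip the assertion.
vdso_arch = find_gdbarch("powerpc:vle")
```

In this model, either adding "powerpc:vle" to the table (option 2) or having BFD report "powerpc:common" (option 1) would avoid the None result.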
Comment 12 Pedro Alves 2018-08-06 11:28:00 UTC
Sounds related to bug 19797.
Comment 13 Lassi Niemistö 2018-08-07 04:53:53 UTC
Thanks for the comments. Some more findings:
* Loading the executable file does not cause issues; the problem is in the core file load.
* gdb 7.9.1 (the last version that works fine) also loads this "system-supplied DSO" as the last step of the core file load, but it ends up searching for powerpc:common rather than powerpc:vle.
Comment 14 Lassi Niemistö 2018-08-07 05:18:41 UTC
And I can confirm that our architecture has nothing to do with vle, which would mean it is the BFD library that gets this wrong. Is the BFD library statically built into GDB? I can see at least some of its sources under binutils-gdb.
Comment 15 Lassi Niemistö 2018-08-07 07:57:28 UTC
The file bfd/elf32-ppc.c has been modified between the versions, and there is now a new function:

/* When defaulting arch/mach, decode apuinfo to find a better match.  */
_bfd_elf_ppc_set_arch (bfd *abfd)

...which thinks it has found PPC_APUINFO_VLE.
Comment 16 Lassi Niemistö 2018-08-07 09:12:22 UTC
Adding Alan Modra to the CC list in the hope of getting a comment on this.
Comment 17 Alan Modra 2018-08-08 00:23:43 UTC
The core file load1 segment contains an image of a kernel vdso that has a .PPC.EMB.apuinfo section of size 24.  That section contains

p/x contents[0]@24
$2 = {0x0, 0x0, 0x0, 0x8, 0x0, 0x0, 0x0, 0x4, 0x0, 0x0, 0x0, 0x2, 0x41, 0x50, 0x55, 0x69, 0x6e, 0x66, 0x6f, 0x0, 0x1, 0x4, 0x0, 0x1}

So there is a single apuinfo word, 0x01040001 saying PPC_APUINFO_VLE (high 16 bits) revision 1 (low 16 bits).

BFD is therefore correctly setting arch/mach to "powerpc:vle" for this object.
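
The decoding above can be reproduced with a short Python sketch. It assumes the section holds a single big-endian note laid out as three 32-bit words (name size, data size, type) followed by the name and the data words, which matches the 24 bytes shown. decode_apuinfo is a hypothetical helper name, and the value 0x104 for PPC_APUINFO_VLE is taken from the analysis above rather than looked up in the headers.

```python
import struct

# The 24 bytes of .PPC.EMB.apuinfo printed above.
contents = bytes([
    0x0, 0x0, 0x0, 0x8, 0x0, 0x0, 0x0, 0x4, 0x0, 0x0, 0x0, 0x2,
    0x41, 0x50, 0x55, 0x69, 0x6e, 0x66, 0x6f, 0x0, 0x1, 0x4, 0x0, 0x1,
])

# Assumption: APU identifier implied by the analysis above.
PPC_APUINFO_VLE = 0x104


def decode_apuinfo(data):
    """Split one APUinfo note into its name, type, and a list of
    (apu, revision) pairs, one pair per 32-bit data word."""
    namesz, descsz, note_type = struct.unpack_from(">III", data, 0)
    name = data[12:12 + namesz].rstrip(b"\0").decode("ascii")
    # Assumes namesz is already a multiple of 4, as it is here (8).
    desc_start = 12 + namesz
    words = struct.unpack_from(">%dI" % (descsz // 4), data, desc_start)
    # High 16 bits identify the APU, low 16 bits carry the revision.
    return name, note_type, [(w >> 16, w & 0xFFFF) for w in words]


name, note_type, entries = decode_apuinfo(contents)
has_vle = any(apu == PPC_APUINFO_VLE for apu, _rev in entries)
print(name, note_type, [(hex(apu), rev) for apu, rev in entries], has_vle)
```

The same helper could be pointed at the section contents of any other object to check whether it carries the VLE marker, as long as the single-note layout assumed here holds.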

So the question becomes: how did PPC_APUINFO_VLE become set? I wonder how old the toolchain used to build your kernel is. If gas had VLE support but lacked git commit fbd940576f (2014-08-22), then that might be the cause. See https://sourceware.org/ml/binutils/2014-08/msg00217.html
Comment 18 Lassi Niemistö 2018-08-08 06:14:08 UTC
Thanks Alan for the analysis!

We are running binutils 2.24 plus some patches on top of it. The git log for the 2.24 tag shows that it has at least the main VLE support commit:
b9c361e0ad33f2c841067fd4bf0959a72ad5a265 Add support for PowerPC VLE.

And indeed the fixup commit fbd940576f seems absent (2.24 is dated 2013-12-02). The first version with the fix seems to be 2.25.

This should explain the results, and we shall primarily mitigate the issue in our project by updating the toolchain. To get a one-gdb-for-all build that also works with legacy branches, we might patch gdb to skip parsing apuinfo altogether, since we know the architecture anyway in this case.
Comment 19 David Engster 2018-08-08 07:46:08 UTC
Yes, our toolchain is very old as well; we're still using Sourcery G++ Lite 2011.03-38, which has binutils 2.20. Thank you, Lassi, for digging into this!