Bug 26876 - gdb error: internal-error: Unknown CFA rule when debugging the linux kernel with qemu
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb
Version: 10.1
Importance: P2 normal
Target Milestone: 10.2
Assignee: Not yet assigned to anyone
Reported: 2020-11-13 09:58 UTC by robert
Modified: 2020-12-03 20:49 UTC

Last reconfirmed: 2020-11-14 00:00:00


Attachments
linux kernel config (29.13 KB, text/plain)
2020-11-13 09:58 UTC, robert

Description robert 2020-11-13 09:58:18 UTC
Created attachment 12952 [details]
linux kernel config

Hi all,

I'm trying to debug a linux kernel issue using gdb. I'm essentially following this tutorial: https://www.kernel.org/doc/html/v5.9/dev-tools/gdb-kernel-debugging.html

When I execute "lx-symbols", I get the error mentioned in the header.

I'm using GNU gdb (GDB) 10.1 on archlinux with gcc version 10.2.0 (GCC) and QEMU emulator version 5.1.0

I have attached the kernel config. The binary exceeds the upload limit. I can provide it upon request.
The kernel is based on commit: 9ff9b0d392ea08090cd1780fb196f36dbb586529

Here's the exact output of my gdb debugging session:

[I] ~/S/R/S/linux (master|…) $ gdb vmlinux                                     
GNU gdb (GDB) 10.1                                                             
Copyright (C) 2020 Free Software Foundation, Inc.                            
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.                             
There is NO WARRANTY, to the extent permitted by law.                                          
Type "show copying" and "show warranty" for details.           
This GDB was configured as "x86_64-pc-linux-gnu".                                              
Type "show configuration" for configuration details.                                           
For bug reporting instructions, please see:                                                    
<https://www.gnu.org/software/gdb/bugs/>.                       
Find the GDB manual and other documentation resources online at:                               
    <http://www.gnu.org/software/gdb/documentation/>.                                          
                                               
For help, type "help".                                                         
Type "apropos word" to search for commands related to "word"...                
Reading symbols from vmlinux...        
(gdb) tar rem :1234                            
Remote debugging using :1234                                                                   
0x000000000000fff0 in exception_stacks ()                                      
(gdb) c                                                                        
Continuing.                                                                    
^C                                                                                             
Program received signal SIGINT, Interrupt.                                                     
default_idle () at arch/x86/kernel/process.c:688                                               
688             safe_halt();                                                   
(gdb) lx-symbols                                                               
loading vmlinux                                                                                
scanning for modules in /home/robert/Sec-T/Research/SEV-dev-fuzz/linux
../../gdb/dwarf2/frame.c:1085: internal-error: Unknown CFA rule.
A problem internal to GDB has been detected,                                                   
further debugging may prove unreliable.
Quit this debugging session? (y or n) y                                        
                                                                               
This is a bug, please report it.  For instructions, see:                                       
<https://www.gnu.org/software/gdb/bugs/>.                                                      
                                                                                               
../../gdb/dwarf2/frame.c:1085: internal-error: Unknown CFA rule.                               
A problem internal to GDB has been detected,                                                   
further debugging may prove unreliable.        
Create a core file of GDB? (y or n) y                                                          
fish: “gdb vmlinux” terminated by signal SIGABRT (Abort)

I'm happy to provide additional details if needed.

Regards,

Robert
Comment 1 Simon Marchi 2020-11-13 21:35:34 UTC
I've built the kernel using the commit and the config you provided. Can you summarize how to run the resulting kernel in qemu? Do I have to create a VM, install a distribution in it and install my built kernel in it?
Comment 2 Simon Marchi 2020-11-14 02:52:17 UTC
Well, to my surprise, I managed to reproduce!

1. I installed an ubuntu 20.04 guest in qemu
2. I built and installed a kernel in it
3. I debugged the qemu guest (add -s to the qemu command line, don't forget to pass nokaslr to the Linux kernel)

First, the backtrace:

#0  0x000055befa524260 in execute_cfa_program (fde=0x621000f84c90, insn_ptr=0x7fab8d86da86 <error: Cannot access memory at address 0x7fab8d86da86>, insn_end=0x7fab8d86da90 <error: Cannot access memory at address 0x7fab8d86da90>, gdbarch=0x621000be3d10, pc=0xffffffff81b3318e, fs=0x7ffe0a288d10, text_offset=0x0) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/frame.c:367
#1  0x000055befa52bf02 in dwarf2_frame_cache (this_frame=0x6210006cfde0, this_cache=0x6210006cfdf8) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/frame.c:1025
#2  0x000055befa52ea38 in dwarf2_frame_this_id (this_frame=0x6210006cfde0, this_cache=0x6210006cfdf8, this_id=0x6210006cfe40) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/frame.c:1226
#3  0x000055befa8dde95 in compute_frame_id (fi=0x6210006cfde0) at /home/smarchi/src/binutils-gdb/gdb/frame.c:588
#4  0x000055befa8de53e in get_frame_id (fi=0x6210006cfde0) at /home/smarchi/src/binutils-gdb/gdb/frame.c:636
#5  0x000055befa8ecf33 in get_prev_frame (this_frame=0x6210006cfde0) at /home/smarchi/src/binutils-gdb/gdb/frame.c:2504
#6  0x000055befb1ff582 in frame_info_to_frame_object (frame=0x6210006cfde0) at /home/smarchi/src/binutils-gdb/gdb/python/py-frame.c:364
#7  0x000055befb201016 in gdbpy_newest_frame (self=0x7fabbcb11a40, args=0x0) at /home/smarchi/src/binutils-gdb/gdb/python/py-frame.c:599
#8  0x00007fabc25f01aa in cfunction_vectorcall_NOARGS (func=0x7fabbca78d60, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/methodobject.c:459
#9  0x00007fabc2405d6d in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ../Include/cpython/abstract.h:127
#10 call_function (tstate=0x612000009940, pp_stack=0x7ffe0a289370, oparg=<optimized out>, kwnames=0x0) at ../Python/ceval.c:4963
#11 0x00007fabc240def6 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3469
#12 0x00007fabc241106b in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at ../Objects/call.c:283
#13 0x00007fabc2405d6d in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ../Include/cpython/abstract.h:127
#14 call_function (tstate=0x612000009940, pp_stack=0x7ffe0a289540, oparg=<optimized out>, kwnames=0x0) at ../Python/ceval.c:4963
#15 0x00007fabc240def6 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3469
#16 0x00007fabc241106b in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=2, globals=<optimized out>) at ../Objects/call.c:283
#17 0x00007fabc2405d6d in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ../Include/cpython/abstract.h:127
#18 call_function (tstate=0x612000009940, pp_stack=0x7ffe0a289710, oparg=<optimized out>, kwnames=0x0) at ../Python/ceval.c:4963
#19 0x00007fabc2407018 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3486
#20 0x00007fabc255bd3b in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=1, kwnames=0x0, kwargs=0x7fabbcb17d98, kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x7fabbca970a0, name=0x7fabbcbc9470, qualname=0x7faba48cfc90) at ../Python/ceval.c:4298
#21 0x00007fabc2638de4 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:435
#22 0x00007fabc2405d6d in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ../Include/cpython/abstract.h:127
#23 call_function (tstate=0x612000009940, pp_stack=0x7ffe0a2899c8, oparg=<optimized out>, kwnames=0x0) at ../Python/ceval.c:4963
#24 0x00007fabc240746d in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3500
#25 0x00007fabc255bd3b in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=1, kwnames=0x0, kwargs=0x7faba4734c50, kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7faba4742a30, qualname=0x7faba4736210) at ../Python/ceval.c:4298
#26 0x00007fabc2638de4 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:435
#27 0x00007fabc2405d6d in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ../Include/cpython/abstract.h:127
#28 call_function (tstate=0x612000009940, pp_stack=0x7ffe0a289c70, oparg=<optimized out>, kwnames=0x0) at ../Python/ceval.c:4963
#29 0x00007fabc2407018 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3486
#30 0x00007fabc241106b in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=3, globals=<optimized out>) at ../Objects/call.c:283
#31 0x00007fabc2639da8 in _PyObject_Vectorcall (kwnames=0x0, nargsf=3, args=0x7ffe0a289d80, callable=0x7faba48d8430) at ../Include/cpython/abstract.h:127
#32 _PyObject_FastCall (nargs=3, args=0x7ffe0a289d80, func=0x7faba48d8430) at ../Include/cpython/abstract.h:147
#33 object_vacall (base=base@entry=0x7faba48da080, callable=0x7faba48d8430, vargs=vargs@entry=0x7ffe0a289e10) at ../Objects/call.c:1186
#34 0x00007fabc263a14c in PyObject_CallMethodObjArgs (obj=0x7faba48da080, name=<optimized out>) at ../Objects/call.c:1214
#35 0x000055befb1f0966 in cmdpy_function (command=0x6110001d2d00, args=0x55befd892b20 "", from_tty=1) at /home/smarchi/src/binutils-gdb/gdb/python/py-cmd.c:141
#36 0x000055befa256cb4 in cmd_func (cmd=0x6110001d2d00, args=0x0, from_tty=1) at /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:2181
#37 0x000055befb9c2ace in execute_command (p=0x60200005109b "", from_tty=1) at /home/smarchi/src/binutils-gdb/gdb/top.c:668
#38 0x000055befa858fca in command_handler (command=0x602000051090 "lx-symbols ") at /home/smarchi/src/binutils-gdb/gdb/event-top.c:589
#39 0x000055befa859cc5 in command_line_handler (rl=...) at /home/smarchi/src/binutils-gdb/gdb/event-top.c:774
#40 0x000055befa856f93 in gdb_rl_callback_handler (rl=0x602000051010 "lx-symbols ") at /home/smarchi/src/binutils-gdb/gdb/event-top.c:219
#41 0x000055befbda9a4d in rl_callback_read_char () at /home/smarchi/src/binutils-gdb/readline/readline/callback.c:281
#42 0x000055befa8569d4 in gdb_rl_callback_read_char_wrapper_noexcept () at /home/smarchi/src/binutils-gdb/gdb/event-top.c:177
#43 0x000055befa856c02 in gdb_rl_callback_read_char_wrapper (client_data=0x60f000000310) at /home/smarchi/src/binutils-gdb/gdb/event-top.c:194
#44 0x000055befa85856c in stdin_event_handler (error=0, client_data=0x60f000000310) at /home/smarchi/src/binutils-gdb/gdb/event-top.c:516
#45 0x000055befcec2717 in handle_file_event (file_ptr=0x606000069260, ready_mask=1) at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:575
#46 0x000055befcec2f58 in gdb_wait_for_event (block=1) at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:701
#47 0x000055befcec0cf6 in gdb_do_one_event () at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:237
#48 0x000055befadfda4e in start_event_loop () at /home/smarchi/src/binutils-gdb/gdb/main.c:347
#49 0x000055befadfde7d in captured_command_loop () at /home/smarchi/src/binutils-gdb/gdb/main.c:407
#50 0x000055befae02861 in captured_main (data=0x7ffe0a28ab80) at /home/smarchi/src/binutils-gdb/gdb/main.c:1234
#51 0x000055befae02944 in gdb_main (args=0x7ffe0a28ab80) at /home/smarchi/src/binutils-gdb/gdb/main.c:1249
#52 0x000055bef9c3e442 in main (argc=2, argv=0x7ffe0a28acf8) at /home/smarchi/src/binutils-gdb/gdb/gdb.c:32

So we are executing the CIE of this FDE:

(top-gdb) p *fde
$4 = {
  cie = 0x621000f84bb0,
  initial_location = 0xffffffff81b33180,
  address_range = 0xf,
  instructions = 0x7fab8d86db08 <error: Cannot access memory at address 0x7fab8d86db08>,
  end = 0x7fab8d86db08 <error: Cannot access memory at address 0x7fab8d86db08>,
  eh_frame_p = 0 '\000'
}

It's really strange that top-gdb can't access the memory of the program we are supposed to be executing:

(top-gdb) p insn_ptr
$9 = (const gdb_byte *) 0x7fab8d86da86 <error: Cannot access memory at address 0x7fab8d86da86>
(top-gdb) p fde.cie.initial_instructions 
$10 = (const gdb_byte *) 0x7fab8d86da85 <error: Cannot access memory at address 0x7fab8d86da85>
Comment 3 Simon Marchi 2020-11-14 03:29:51 UTC
It works on GDB 9.2, so I bisected it.  The first failing commit is 3d4560f707b077adfb54759df5efbd96301ca2d8 ("Move the frame data to the BFD when possible").  It makes sense, because there appears to be some lifetime problem or something like that related to the frame data.
Comment 4 robert 2020-11-14 09:06:42 UTC
Hi, 
thanks for looking into this. I forgot to include my qemu setup, sorry for that. There is no need to use any kind of guest OS to trigger the bug:

qemu-system-x86_64 -enable-kvm -m 4096 -smp 1 -kernel linux/arch/x86/boot/bzImage -append "console=ttyS0 nokaslr" -S -s -nographic -monitor tcp::4444,server,nowait -serial stdio

I let the kernel boot (it will panic because there is no /init to execute) and then simply stop it via "CTRL-C": 

GNU gdb (GDB) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/g>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file vmlinux
Reading symbols from vmlinux...
(gdb) tar rem :1234
Remote debugging using :1234
0x000000000000fff0 in exception_stacks ()
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0xffffffff814222e0 in rdtsc_ordered ()
    at ./arch/x86/include/asm/msr.h:234
234             asm volatile(ALTERNATIVE_2("rdtsc",
(gdb) lx-symbols 
loading vmlinux
../../gdb/dwarf2/frame.c:1085: internal-error: Unknown CFA rule.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y
This is a bug, please report it.  For instructions, see:
<https://www.gnu.org/software/gdb/bugs/>.

../../gdb/dwarf2/frame.c:1085: internal-error: Unknown CFA rule.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) y
fish: “gdb” terminated by signal SIGABRT (Abort)

Regards,
Robert
Comment 5 Simon Marchi 2020-11-14 17:23:34 UTC
Strange, I can't reproduce using that technique.  It just says "loading vmlinux" and goes back to the prompt.

When I use the guest OS, it either finds some modules:

loading vmlinux
scanning for modules in /home/smarchi/src/linux
loading @0xffffffffc0119000: /home/smarchi/src/linux/arch/x86/kvm/kvm-amd.ko
loading @0xffffffffc0103000: /home/smarchi/src/linux/drivers/crypto/ccp/ccp.ko
loading @0xffffffffc00fe000: /home/smarchi/src/linux/crypto/sha1_generic.ko
loading @0xffffffffc004d000: /home/smarchi/src/linux/arch/x86/kvm/kvm.ko
loading @0xffffffffc0041000: /home/smarchi/src/linux/virt/lib/irqbypass.ko
loading @0xffffffffc003a000: /home/smarchi/src/linux/drivers/virtio/virtio_balloon.ko
loading @0xffffffffc002d000: /home/smarchi/src/linux/drivers/char/virtio_console.ko
loading @0xffffffffc0027000: /home/smarchi/src/linux/drivers/block/virtio_blk.ko
loading @0xffffffffc0019000: /home/smarchi/src/linux/drivers/net/virtio_net.ko
loading @0xffffffffc0014000: /home/smarchi/src/linux/drivers/net/net_failover.ko
loading @0xffffffffc0048000: /home/smarchi/src/linux/net/core/failover.ko
loading @0xffffffffc000c000: /home/smarchi/src/linux/drivers/virtio/virtio_pci.ko
loading @0xffffffffc0005000: /home/smarchi/src/linux/drivers/virtio/virtio_ring.ko
loading @0xffffffffc0000000: /home/smarchi/src/linux/drivers/virtio/virtio.ko

or crashes.
Comment 6 Tom Tromey 2020-11-14 22:46:58 UTC
It looks like find_comp_unit and set_comp_unit have their tests reversed:

  bfd *abfd = objfile->obfd;
  if (gdb_bfd_requires_relocations (abfd))
    return dwarf2_frame_bfd_data.set (abfd, unit);
  return dwarf2_frame_objfile_data.set (objfile, unit);

Seems like it should use the per-BFD when relocations are *not* required.
Comment 7 Simon Marchi 2020-11-14 23:38:12 UTC
(In reply to Tom Tromey from comment #6)
> It looks like find_comp_unit and set_comp_unit have their tests reversed:
> 
>   bfd *abfd = objfile->obfd;
>   if (gdb_bfd_requires_relocations (abfd))
>     return dwarf2_frame_bfd_data.set (abfd, unit);
>   return dwarf2_frame_objfile_data.set (objfile, unit);
> 
> Seems like it should use the per-BFD when relocations are *not* required.

Oh, interesting!

When I swap that, I no longer get an ASan crash, but I get:

loading vmlinux
scanning for modules in /home/smarchi/build/linux
loading @0xffffffffc011a000: /home/smarchi/build/linux/arch/x86/kvm/kvm-amd.ko
Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0xffff868177dcbff0: 
Error occurred in Python: Cannot access memory at address 0xffff868177dcbff0

I should be able to fix set_comp_unit/find_comp_unit in the commit that introduced those changes and bisect again to find when GDB starts giving this MemoryError...
Comment 8 Simon Marchi 2020-11-15 02:48:31 UTC
I bisected the second problem; it points to this commit:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=7c6f271296319576fa00587928e5ff52ced9c1bb

GDB trips on the array with no size in the structure.  I filed a new bug here:

https://sourceware.org/bugzilla/show_bug.cgi?id=26901
Comment 9 Simon Marchi 2020-12-03 19:09:33 UTC
I sent a patch for the set_comp_unit/find_comp_unit problem:

https://sourceware.org/pipermail/gdb-patches/2020-December/173725.html

I also sent a patch for 26901, so once the two are merged, it should all be working.
Comment 10 cvs-commit@gcc.gnu.org 2020-12-03 20:48:29 UTC
The master branch has been updated by Simon Marchi <simark@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=0bc2e38dd71e24676e180b6b37e0a2cd1186994d

commit 0bc2e38dd71e24676e180b6b37e0a2cd1186994d
Author: Simon Marchi <simon.marchi@polymtl.ca>
Date:   Thu Dec 3 15:47:56 2020 -0500

    gdb: fix logic of find_comp_unit and set_comp_unit
    
    The logic in find_comp_unit and set_comp_unit is reversed.  When the BFD
    requires relocation, we want to put the comp_unit structure in the
    map where the comp_unit objects are not shared, that is the one indexed
    by objfile.  If the BFD does not require relocation, then, we can share
    a single comp_unit structure for all users of that BFD, so we want to
    put it in the BFD-indexed map.  The comments on top of
    dwarf2_frame_bfd_data and dwarf2_frame_objfile_data make that clear.
    
    Fix it by swapping the two in find_comp_unit and set_comp_unit.
    
    I don't have a test for this, because I don't see how to write one in a
    reasonable amount of time.
    
    gdb/ChangeLog:
    
            PR gdb/26876
            * dwarf2/frame.c (find_comp_unit, set_comp_unit): Reverse use of
            dwarf2_frame_bfd_data and dwarf2_frame_objfile_data.
    
    Change-Id: I80c1ee7ad8425fa4947de65b170973d05f5a52ec
Comment 11 Simon Marchi 2020-12-03 20:49:00 UTC
Fixed.