31878 – Inferior kernel function calls on x86_64 cause kernel panics

Summary: Inferior kernel function calls on x86_64 cause kernel panics

Status:	UNCONFIRMED

Product:	gdb
Classification:	Unclassified
Component:	gdb
Version:	HEAD

Importance:	P2 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

Reported:	2024-06-11 12:00 UTC by Ilya Leoshkevich
Modified:	2024-06-12 12:53 UTC
CC List:	2 users

Description Ilya Leoshkevich 2024-06-11 12:00:11 UTC

I'm debugging the Linux kernel on x86_64 and trying to perform an inferior function call, which causes a kernel panic:

(gdb) call kmsan_get_metadata($rip, 0)
BUG: unable to handle page fault for address: ffffffff85403d4f
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0011) - permissions violation
[...]
RSP: 0018:ffffffff85403d40 EFLAGS: 00010282

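The reason is that x86_64 uses call_dummy_location=ON_STACK, unlike, say, s390x, where inferior kernel function calls work fine. With ON_STACK, the called function returns to a dummy location on the stack. In userspace, fetching an instruction from the non-executable stack raises SIGSEGV, which gdb handles just fine; in the kernel, the same fetch is a page fault in kernel mode, and the kernel panics instead.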
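
As a quick sanity check of this explanation against the oops above: the faulting instruction address ffffffff85403d4f is only 0xf (15) bytes above the reported RSP ffffffff85403d40, i.e. on the stack rather than in kernel text. A minimal C sketch of that arithmetic; the helper names and the 4 KiB "near RSP" window are illustrative assumptions, not anything gdb or the kernel defines:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: distance from the stack pointer reported in the
 * oops to the faulting instruction address. */
static uint64_t offset_from_sp(uint64_t fault_addr, uint64_t rsp)
{
    return fault_addr - rsp;
}

/* Heuristic (assumption): an address within `window` bytes above RSP is
 * treated as lying on the current stack. */
static bool looks_like_stack_addr(uint64_t fault_addr, uint64_t rsp,
                                  uint64_t window)
{
    return fault_addr >= rsp && fault_addr - rsp < window;
}
```

With the oops values, offset_from_sp(0xffffffff85403d4f, 0xffffffff85403d40) is 0xf, while a kernel text address such as 0xffffffff81000000 is not classified as a stack address.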

A naive fix, switching to AT_ENTRY_POINT (which is less versatile than ON_STACK), does not help on its own, because the vmlinux entry point is specified as a physical load address (the LMA of .text) rather than a virtual one:

start address 0x0000000001000000
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         03600000  ffffffff81000000  0000000001000000  00200000  2**12
                  CONTENTS, ALLOC, LOAD, READONLY, CODE

but inferior function calls happen with paging enabled, so the entry point has to be translated to its virtual address (here 0xffffffff81000000, the VMA of .text).
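
The translation itself is a simple lookup: find the loadable segment whose physical range contains the address and apply that segment's virtual-minus-physical offset. A minimal sketch, assuming a hypothetical `struct segment` that mirrors the p_vaddr/p_paddr/p_memsz fields of an ELF program header (this is not gdb code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the relevant ELF program header fields. */
struct segment {
    uint64_t vaddr;  /* p_vaddr: address once paging is on */
    uint64_t paddr;  /* p_paddr: physical (load) address */
    uint64_t memsz;  /* p_memsz: size in memory */
};

/* Translate a physical load address (such as the vmlinux entry point)
 * to its virtual address.  Returns false if no segment contains it. */
static bool phys_to_virt(const struct segment *segs, size_t nsegs,
                         uint64_t paddr, uint64_t *vaddr)
{
    for (size_t i = 0; i < nsegs; i++) {
        const struct segment *s = &segs[i];
        /* Subtract first to avoid overflow in paddr < s->paddr + memsz. */
        if (paddr >= s->paddr && paddr - s->paddr < s->memsz) {
            *vaddr = s->vaddr + (paddr - s->paddr);
            return true;
        }
    }
    return false;
}
```

Using the .text row above as a stand-in for the corresponding load segment (paddr 0x1000000, vaddr 0xffffffff81000000, memsz 0x3600000), the entry point 0x1000000 translates to 0xffffffff81000000.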

With these two changes in place, inferior kernel function calls work:

(gdb) call kmsan_get_metadata($rip, 0)
$1 = (void *) 0xffff88822d70b813

Here is the diff. It is not a real fix (I'm not sure how to solve this properly); it only highlights where the problem may be:

diff --git a/gdb/i386-tdep.c b/gdb/i386-tdep.c
index f1f909e1616..4866fdb4ded 100644
--- a/gdb/i386-tdep.c
+++ b/gdb/i386-tdep.c
@@ -8586,7 +8586,7 @@ i386_gdbarch_init (struct gdbarch_info info, struct gdbarch_list *arches)
   set_gdbarch_get_longjmp_target (gdbarch, i386_get_longjmp_target);
 
   /* Call dummy code.  */
-  set_gdbarch_call_dummy_location (gdbarch, ON_STACK);
+  set_gdbarch_call_dummy_location (gdbarch, AT_ENTRY_POINT);
   set_gdbarch_push_dummy_code (gdbarch, i386_push_dummy_code);
   set_gdbarch_push_dummy_call (gdbarch, i386_push_dummy_call);
   set_gdbarch_frame_align (gdbarch, i386_frame_align);
diff --git a/gdb/infcall.c b/gdb/infcall.c
index edac9a74179..cc0401db328 100644
--- a/gdb/infcall.c
+++ b/gdb/infcall.c
@@ -1271,7 +1271,7 @@ call_function_by_hand_dummy (struct value *function,
        CORE_ADDR dummy_addr;
 
        real_pc = funaddr;
-       dummy_addr = entry_point_address ();
+       dummy_addr = 0xffffffff81000000;
 
        /* A call dummy always consists of just a single breakpoint, so
           its address is the same as the address of the dummy.

Comment 1 Tom Tromey 2024-06-11 14:50:27 UTC

Thanks for the report.

I'm also not sure how to fix this one. Maybe a knob that lets the user, or some Python code, affect the call dummy location? That seems very obscure, though.
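
Such a knob could be modelled as a user-controlled override that is consulted before the architecture default and before the entry_point_address () call seen in the infcall.c hunk. A minimal, hypothetical C sketch; none of these names exist in gdb, and a real implementation would presumably expose the setting as a set/show command:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of gdb's ON_STACK / AT_ENTRY_POINT choices. */
enum dummy_location { LOC_ON_STACK, LOC_AT_ENTRY_POINT };

/* Hypothetical user-settable knob. */
struct dummy_knob {
    bool override_location;        /* user picked a location... */
    enum dummy_location location;  /* ...and this is it */
    bool override_addr;            /* user supplied a dummy address */
    uint64_t addr;                 /* e.g. a virtual kernel text address */
};

/* Location actually used: the user's choice wins over the arch default. */
static enum dummy_location
effective_location(enum dummy_location arch_default,
                   const struct dummy_knob *knob)
{
    return knob->override_location ? knob->location : arch_default;
}

/* Address used for AT_ENTRY_POINT: the user's address, if any, instead
 * of the ELF entry point. */
static uint64_t
effective_dummy_addr(uint64_t elf_entry, const struct dummy_knob *knob)
{
    return knob->override_addr ? knob->addr : elf_entry;
}
```

With the knob unset, behaviour is unchanged; a kernel helper script could set it to AT_ENTRY_POINT with address 0xffffffff81000000, which reproduces the effect of the hardcoded diff without touching other targets.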

Comment 2 Ilya Leoshkevich 2024-06-12 12:53:56 UTC

I think such a knob would be good enough. It should be possible to use it in vmlinux-gdb.py, so that it's transparent to most end users.

I had some other ideas, but I think they are worse:

- If the target provides the CR0 register and CR0.PG is set, translate the entry point address from its load address to its virtual address using the program headers.
- Introduce a "SEGV" interface, similar to the JIT interface, that the kernel would call before panicking. GDB could then handle the panic as if it were a SEGV.
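
The first idea boils down to a check of CR0 bit 31 (PG, paging enable) before picking the dummy address. A minimal sketch; for brevity a single hypothetical virtual-minus-physical offset stands in for the program-header lookup, and the function name is illustrative:

```c
#include <stdint.h>

/* CR0.PG (paging enable) is bit 31 of CR0 on x86. */
#define CR0_PG (UINT64_C(1) << 31)

/* Hypothetical: choose the address for the call dummy.  With paging off
 * the physical entry point can be used as-is; with paging on it must be
 * shifted into the kernel's virtual mapping.  `virt_minus_phys` stands in
 * for a real program-header lookup. */
static uint64_t dummy_addr_for_cr0(uint64_t cr0, uint64_t phys_entry,
                                   uint64_t virt_minus_phys)
{
    if (cr0 & CR0_PG)
        return phys_entry + virt_minus_phys;
    return phys_entry;
}
```

For the vmlinux above the offset is 0xffffffff81000000 - 0x1000000 = 0xffffffff80000000, so with PG set the entry point becomes 0xffffffff81000000. This is also where the x86-specific part shows up: the register name and bit position only make sense on this architecture.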

The problem with both is that they would require some amount of x86-specific code, whereas the knob you propose would also solve the problem on all the other affected architectures.
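
The second idea would mirror gdb's JIT interface, where the program calls an empty, non-inlined function on which gdb sets a breakpoint, after filling in a descriptor that gdb reads. A minimal userspace sketch of what the kernel side might look like; every name here (the descriptor, gdb_fault_hook, report_fault_to_debugger) is hypothetical:

```c
#include <stdint.h>

/* Hypothetical record filled in before panicking, analogous to the
 * descriptor used by gdb's JIT interface. */
struct gdb_fault_descriptor {
    uint64_t fault_addr;   /* faulting address (CR2 on x86) */
    uint64_t ip;           /* instruction pointer at the fault */
    uint32_t error_code;   /* page-fault error code */
};

/* Hypothetical global the debugger would read at the breakpoint. */
volatile struct gdb_fault_descriptor gdb_fault_info;

/* Hypothetical hook.  It does nothing by itself: the debugger sets a
 * breakpoint here.  noinline plus the empty asm keep the compiler from
 * inlining or eliding the call. */
void __attribute__((noinline)) gdb_fault_hook(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Hypothetical call site in the fault path, just before panicking. */
void report_fault_to_debugger(uint64_t fault_addr, uint64_t ip,
                              uint32_t error_code)
{
    gdb_fault_info.fault_addr = fault_addr;
    gdb_fault_info.ip = ip;
    gdb_fault_info.error_code = error_code;
    gdb_fault_hook();
}
```

For the oops above, the hook would be reached with fault address and IP ffffffff85403d4f and error code 0x11; gdb would then have to recognize that address as its call dummy, much as it does for a userspace SEGV.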