I'm debugging the Linux kernel on x86_64 and trying to perform an inferior function call. This causes a kernel panic:

(gdb) call kmsan_get_metadata($rip, 0)
BUG: unable to handle page fault for address: ffffffff85403d4f
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0011) - permissions violation
[...]
RSP: 0018:ffffffff85403d40 EFLAGS: 00010282

The reason is that x86_64 uses call_dummy_location=ON_STACK, unlike, say, s390x, where inferior kernel function calls work fine. In userspace, jumping to a non-executable stack causes a SEGV, which gdb happily handles. That is not the case for the kernel, where the same jump ends in a panic.

The naive solution of switching to AT_ENTRY_POINT (which is less versatile than ON_STACK) doesn't immediately help, since the vmlinux entry point is specified as a linear address:

start address 0x0000000001000000

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         03600000  ffffffff81000000  0000000001000000  00200000  2**12
                  CONTENTS, ALLOC, LOAD, READONLY, CODE

but inferior function calls happen when paging is on, so the entry point needs to be translated. With these two things changed, inferior kernel function calls begin to work:

(gdb) call kmsan_get_metadata($rip, 0)
$1 = (void *) 0xffff88822d70b813

Here is the diff (not a real fix - I'm not sure how to solve this properly, this only highlights where the problem may be):

diff --git a/gdb/i386-tdep.c b/gdb/i386-tdep.c
index f1f909e1616..4866fdb4ded 100644
--- a/gdb/i386-tdep.c
+++ b/gdb/i386-tdep.c
@@ -8586,7 +8586,7 @@ i386_gdbarch_init (struct gdbarch_info info, struct gdbarch_list *arches)
   set_gdbarch_get_longjmp_target (gdbarch, i386_get_longjmp_target);

   /* Call dummy code.  */
-  set_gdbarch_call_dummy_location (gdbarch, ON_STACK);
+  set_gdbarch_call_dummy_location (gdbarch, AT_ENTRY_POINT);
   set_gdbarch_push_dummy_code (gdbarch, i386_push_dummy_code);
   set_gdbarch_push_dummy_call (gdbarch, i386_push_dummy_call);
   set_gdbarch_frame_align (gdbarch, i386_frame_align);
diff --git a/gdb/infcall.c b/gdb/infcall.c
index edac9a74179..cc0401db328 100644
--- a/gdb/infcall.c
+++ b/gdb/infcall.c
@@ -1271,7 +1271,7 @@ call_function_by_hand_dummy (struct value *function,
        CORE_ADDR dummy_addr;

        real_pc = funaddr;
-       dummy_addr = entry_point_address ();
+       dummy_addr = 0xffffffff81000000;

        /* A call dummy always consists of just a single breakpoint, so
           its address is the same as the address of the dummy.

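For readers who want to check the panic against the ON_STACK explanation, here is a small Python sketch that decodes the page-fault error code from the log. The bit positions follow the x86 page-fault error code layout and the constant names mirror the kernel's X86_PF_* flags. The helper names and the 64-byte "near RSP" window are assumptions made for this illustration, not anything GDB or the kernel provides.

```python
# Low bits of the x86 page-fault error code.
X86_PF_PROT = 1 << 0   # 1: page present, access violated permissions; 0: page not present
X86_PF_WRITE = 1 << 1  # 1: write access; 0: read access
X86_PF_USER = 1 << 2   # 1: fault taken in user mode; 0: supervisor mode
X86_PF_RSVD = 1 << 3   # 1: reserved bit set in a paging-structure entry
X86_PF_INSTR = 1 << 4  # 1: fault was caused by an instruction fetch

_FLAGS = {
    "PROT": X86_PF_PROT,
    "WRITE": X86_PF_WRITE,
    "USER": X86_PF_USER,
    "RSVD": X86_PF_RSVD,
    "INSTR": X86_PF_INSTR,
}


def decode_pf_error_code(code):
    """Return the names of the flags set in a page-fault error code."""
    return {name for name, bit in _FLAGS.items() if code & bit}


def fault_is_near_stack_pointer(fault_addr, rsp, window=64):
    """Heuristic (assumed for this sketch): is the fault just above RSP?"""
    return rsp <= fault_addr < rsp + window


# Values copied from the panic in the report.
flags = decode_pf_error_code(0x0011)
on_stack = fault_is_near_stack_pointer(0xFFFFFFFF85403D4F, 0xFFFFFFFF85403D40)
print(sorted(flags), on_stack)
```

With the values from the report, the set bits are PROT and INSTR with USER clear, i.e. a supervisor-mode instruction fetch from a present but non-executable page, and the faulting address lies 15 bytes above the reported RSP. Both are consistent with the call dummy having been placed on the stack.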
Thanks for the report. I'm also not sure how to fix this one. Maybe a knob to let the user or some Python code affect the call dummy location? That seems very obscure though.
I think such a knob would be good enough. It should be possible to use it in vmlinux-gdb.py, so that it's transparent to most end users.

I had some alternative random ideas, but I think they are worse:

- If a target provides the CR0 register and CR0.PG is set, then translate the entry point address from linear to virtual using program headers.
- Introduce a "SEGV" interface, similar to the JIT interface, that the kernel will call before panicking. GDB can then handle the panic as if it were a SEGV.

The problem with both is that they will require some amount of Intel-specific code, whereas the knob you propose would solve the problem on all the other affected architectures too.
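To make the first alternative (the CR0.PG check) concrete, here is a rough Python sketch of the translation it implies. It is only an illustration under stated assumptions: the Segment type and translate_entry_point are hypothetical names, the segment list stands in for whatever GDB would read from the ELF program headers (p_vaddr, p_paddr, size), and no runtime relocation of the kernel is taken into account.

```python
from typing import NamedTuple, Optional, Sequence

CR0_PG = 1 << 31  # paging-enable bit of CR0


class Segment(NamedTuple):
    """Hypothetical stand-in for one loadable segment or section."""
    vaddr: int  # address when paging is on (p_vaddr / VMA)
    paddr: int  # load address (p_paddr / LMA)
    size: int   # size in bytes


def translate_entry_point(entry: int, cr0: int,
                          segments: Sequence[Segment]) -> Optional[int]:
    """Map a load-address entry point to its virtual address.

    Returns the entry point unchanged when paging is off, and None when
    paging is on but no segment covers the entry point.
    """
    if not cr0 & CR0_PG:
        return entry
    for seg in segments:
        if seg.paddr <= entry < seg.paddr + seg.size:
            return entry - seg.paddr + seg.vaddr
    return None


# Values taken from the objdump output in the report.
text = Segment(vaddr=0xFFFFFFFF81000000, paddr=0x01000000, size=0x03600000)
print(hex(translate_entry_point(0x01000000, CR0_PG, [text])))
```

For the vmlinux in the report this yields 0xffffffff81000000, the same address that the experimental diff hard-codes as dummy_addr. The CR0 check is the Intel-specific part mentioned above. A knob would avoid it by letting the user or script supply the address directly.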