Bug 23160 - 4.17 breaks syscalls tapset
Summary: 4.17 breaks syscalls tapset
Alias: None
Product: systemtap
Classification: Unclassified
Component: tapsets
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Jafeer Uddin
Depends on: 23391
Reported: 2018-05-10 18:00 UTC by Frank Ch. Eigler
Modified: 2018-10-11 13:27 UTC

adaptation example for syscall.read (631 bytes, patch)
2018-06-15 21:23 UTC, Frank Ch. Eigler

Description Frank Ch. Eigler 2018-05-10 18:00:39 UTC
Kindly reported by jmoyer@rh: kernel 4.17-rc0, as in rawhide, changes the syscall wrapper functions in ways that our tapset cannot currently adapt to.  https://lwn.net/Articles/752422/  This kills all the syscall.* probes.

One possible fix is to switch our kprobes over to the new set of per-arch wrapper functions that carry parameters inside a pt_regs* pointer.  Coincidentally, that is the same way that the sys_enter/sys_exit tracepoints carry individual parameters.  (The __do_sys* family of functions does carry normal DWARF parameters, but their .return cannot be probed because they're declared inline.)

% uname -a
Linux vm-rawhide-64 4.17.0-0.rc3.git4.1.fc29.x86_64 #1 SMP Fri May 4 19:41:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

% stap -L 'kernel.function("__*_accept").*'     
kernel.function("__do_sys_accept@net/socket.c:1629").callee("__sys_accept4@net/socket.c:1542") $fd:int $upeer_sockaddr:struct sockaddr* $upeer_addrlen:int* $flags:int $err:int $fput_needed:int $address:struct __kernel_sockaddr_storage
kernel.function("__do_sys_accept@net/socket.c:1629").inline $upeer_addrlen:int* $upeer_sockaddr:struct sockaddr* $fd:int
kernel.function("__ia32_sys_accept@net/socket.c:1629").call $regs:struct pt_regs const*
kernel.function("__ia32_sys_accept@net/socket.c:1629").callee("__se_sys_accept@net/socket.c:1629") $upeer_addrlen:long int $upeer_sockaddr:long int $fd:long int
kernel.function("__ia32_sys_accept@net/socket.c:1629").exported $regs:struct pt_regs const*
kernel.function("__ia32_sys_accept@net/socket.c:1629").return $return:long int $regs:struct pt_regs const*
kernel.function("__se_sys_accept@net/socket.c:1629").inline $upeer_addrlen:long int $upeer_sockaddr:long int $fd:long int
kernel.function("__x64_sys_accept@net/socket.c:1629").call $regs:struct pt_regs const*
kernel.function("__x64_sys_accept@net/socket.c:1629").callee("__se_sys_accept@net/socket.c:1629") $upeer_addrlen:long int $upeer_sockaddr:long int $fd:long int
kernel.function("__x64_sys_accept@net/socket.c:1629").exported $regs:struct pt_regs const*
kernel.function("__x64_sys_accept@net/socket.c:1629").return $return:long int $regs:struct pt_regs const*
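
The layering visible in the listing above can be sketched as a small userspace mock. This is only an illustration under stated assumptions: the mock_* names and the trimmed struct pt_regs are hypothetical stand-ins, the argument-to-register mapping (di, si, dx) is the x86_64 one, and the body of the innermost function is a placeholder. It shows why only the outer wrapper offers a probe-able entry and return, and why its sole parameter is the register pointer.

```c
/* Userspace stand-in for the x86_64 register set. Only the six syscall
 * argument registers are modelled; the real struct pt_regs has more. */
struct pt_regs {
    unsigned long di, si, dx, r10, r8, r9;
};

/* Innermost body with properly typed parameters. In the kernel this is
 * inlined, so it has no return point of its own to probe.
 * Placeholder body: it just hands back the fd it was given. */
static inline long mock_do_sys_accept(int fd, void *upeer_sockaddr,
                                      int *upeer_addrlen)
{
    (void)upeer_sockaddr;
    (void)upeer_addrlen;
    return fd;
}

/* Middle shim: receives every argument as a long and casts each one
 * to its declared type. Also inlined. */
static inline long mock_se_sys_accept(long fd, long upeer_sockaddr,
                                      long upeer_addrlen)
{
    return mock_do_sys_accept((int)fd, (void *)upeer_sockaddr,
                              (int *)upeer_addrlen);
}

/* Outer wrapper: the only out-of-line function. Its single parameter is
 * the pt_regs pointer, from which the arguments are unpacked. */
long mock_x64_sys_accept(const struct pt_regs *regs)
{
    return mock_se_sys_accept((long)regs->di, (long)regs->si,
                              (long)regs->dx);
}
```

A probe placed on the outer wrapper therefore sees only the register pointer, and has to dig the individual syscall arguments out of it.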
Comment 1 Frank Ch. Eigler 2018-05-11 21:27:25 UTC
As a curiosity, I have a little prototype hacky solution to this problem.

It involves:
- hooking into the __$ARCH_sys_$SYSCALL function (__x64_sys_read etc.)
- grabbing its $regs (pt_regs*) parameter
- reusing the nd_$SYSCALL probe alias parameter handling (int_arg(2) etc.)

... but how?  Add this to some common .stp file:

function set_user_mode(r) %{
    c->uregs = (void*)STAP_ARG_r;
    c->user_mode_p = 1;
%}

... and a variant of this to every sysc_*.stp file:

probe __nd_syscall.read = kernel.function("__x64_sys_read")
{ set_user_mode($regs) }

Then the preexisting nd_syscall.read alias works unmodified:

probe nd_syscall.read = __nd_syscall.read
{
        fd = int_arg(1)
        buf_uaddr = pointer_arg(2)
}

i.e., the set_user_mode function tricks probes built upon this alias
into thinking that the pt_regs* given to the new syscall wrapper is the
new proper register set for later registers.stp function calls to read from.

(Season to taste; adjust kernel.function -> kprobe.function() and int_arg(2)
to fetch $regs probably.)

One big downside: no access to individual parameters as context variables.
I guess we missed that with nd_syscall probes already.  But that means that
it's not possible to modify the parameters in the stap probe before they
get relayed to the real __do_sys_FUNCTION.
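
The redirection trick from this comment can be mocked in plain C to make the mechanism concrete. Everything here is an assumption made for illustration: the context struct is a trimmed-down stand-in with only the fields the snippet touches, and mock_int_arg is a hypothetical accessor that is not the registers.stp implementation. It simply reads the n-th argument (x86_64 syscall order: di, si, dx, r10, r8, r9) from whichever register set the context currently selects.

```c
#include <stddef.h>

/* Trimmed register set: only the six argument registers. */
struct pt_regs {
    unsigned long di, si, dx, r10, r8, r9;
};

/* Hypothetical, minimal probe context. */
struct context {
    const struct pt_regs *kregs; /* registers at the probe point */
    const struct pt_regs *uregs; /* registers treated as the user set */
    int user_mode_p;             /* nonzero: accessors read from uregs */
};

/* Mock of the embedded-C helper: point the context at the pt_regs that
 * was handed to the syscall wrapper and flag it as the active set. */
static void mock_set_user_mode(struct context *c, const struct pt_regs *r)
{
    c->uregs = r;
    c->user_mode_p = 1;
}

/* Mock int_arg(n)-style accessor, 1-based. Returns 0 for an
 * out-of-range index or when no register set is available. */
static long mock_int_arg(const struct context *c, int n)
{
    const struct pt_regs *r = c->user_mode_p ? c->uregs : c->kregs;

    if (r == NULL)
        return 0;
    switch (n) {
    case 1: return (long)r->di;
    case 2: return (long)r->si;
    case 3: return (long)r->dx;
    case 4: return (long)r->r10;
    case 5: return (long)r->r8;
    case 6: return (long)r->r9;
    default: return 0;
    }
}
```

Before the redirect, argument 1 at the wrapper's entry is the pt_regs pointer itself. After it, the same accessor yields the syscall's own first argument, which is why the existing alias bodies need no changes.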
Comment 2 Frank Ch. Eigler 2018-06-15 21:23:08 UTC
Created attachment 11073
adaptation example for syscall.read

This patch is an example of the adaptation of syscall.read to both 4.17 and the tracepoint (bug #14690) as fallbacks.  The gist is to factor out the registers.stp-based argument fetching code into a separate macro, and to introduce a new embedded-c function to set the context uregs to the pt_regs* struct we get with either the 4.17 style syscall wrapper or the tracepoints.

... now to repeat some 330 times ... eww
Comment 3 Jafeer Uddin 2018-10-11 13:27:19 UTC
All syscalls have been adapted for kernel 4.17+ in commit 8b8c9b636389b67a2288e31eb1f9b14a3992bc18