Fedora rawhide recently switched to linux-6.5 kernels. Systemtap from the git repo works with the earlier linux-6.4, but on linux-6.5 the build of any systemtap instrumentation fails. This can be seen with "make installcheck": the smoke test fails with the following messages.

In file included from /home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/runtime.h:271,
                 from /home/wcohen/systemtap_write/install/share/systemtap/runtime/runtime.h:26,
                 from /tmp/stapndYBKY/stap_34ef71e6e2ed0e338834720b8dff538e_1736_src.c:21:
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/access_process_vm.h: In function '__access_process_vm_':
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/access_process_vm.h:57:36: error: passing argument 1 of 'get_user_pages_remote' from incompatible pointer type [-Werror=incompatible-pointer-types]
   57 |       ret = get_user_pages_remote (tsk, mm, addr, 1, write, 1, &page, &vma);
      |                                    ^~~
      |                                    |
      |                                    struct task_struct *
In file included from ./include/linux/kallsyms.h:13,
                 from ./include/linux/ftrace.h:13,
                 from ./include/linux/kprobes.h:28,
                 from /home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/runtime.h:21:
./include/linux/mm.h:2377:46: note: expected 'struct mm_struct *' but argument is of type 'struct task_struct *'
 2377 | long get_user_pages_remote(struct mm_struct *mm,
      |                            ~~~~~~~~~~~~~~~~~~^~
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/access_process_vm.h:57:41: error: passing argument 2 of 'get_user_pages_remote' makes integer from pointer without a cast [-Werror=int-conversion]
   57 |       ret = get_user_pages_remote (tsk, mm, addr, 1, write, 1, &page, &vma);
      |                                         ^~
      |                                         |
      |                                         struct mm_struct *
./include/linux/mm.h:2378:42: note: expected 'long unsigned int' but argument is of type 'struct mm_struct *'
 2378 |                            unsigned long start, unsigned long nr_pages,
      |                            ~~~~~~~~~~~~~~^~~~~
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/access_process_vm.h:57:54: error: passing argument 5 of 'get_user_pages_remote' makes pointer from integer without a cast [-Werror=int-conversion]
   57 |       ret = get_user_pages_remote (tsk, mm, addr, 1, write, 1, &page, &vma);
      |                                                      ^~~~~
      |                                                      |
      |                                                      int
./include/linux/mm.h:2379:66: note: expected 'struct page **' but argument is of type 'int'
 2379 |                            unsigned int gup_flags, struct page **pages,
      |                                                    ~~~~~~~~~~~~~~^~~~~
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/access_process_vm.h:57:61: error: passing argument 6 of 'get_user_pages_remote' makes pointer from integer without a cast [-Werror=int-conversion]
   57 |       ret = get_user_pages_remote (tsk, mm, addr, 1, write, 1, &page, &vma);
      |                                                             ^
      |                                                             |
      |                                                             int
./include/linux/mm.h:2380:33: note: expected 'int *' but argument is of type 'int'
 2380 |                            int *locked);
      |                            ~~~~~^~~~~~
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/access_process_vm.h:57:13: error: too many arguments to function 'get_user_pages_remote'
   57 |       ret = get_user_pages_remote (tsk, mm, addr, 1, write, 1, &page, &vma);
      |             ^~~~~~~~~~~~~~~~~~~~~
./include/linux/mm.h:2377:6: note: declared here
 2377 | long get_user_pages_remote(struct mm_struct *mm,
      |      ^~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[4]: *** [scripts/Makefile.build:252: /tmp/stapndYBKY/stap_34ef71e6e2ed0e338834720b8dff538e_1736_src.o] Error 1
make[3]: *** [Makefile:2050: /tmp/stapndYBKY] Error 2
WARNING: kbuild exited with status: 2
Pass 4: compilation failed.  [man error::pass4]
child process exited abnormally

The specific upstream linux kernel git commit causing the issue is:

commit ca5e863233e8f6acd1792fd85d6bc2729a1b2c10
Author: Lorenzo Stoakes <lstoakes@gmail.com>
Date:   Wed May 17 20:25:39 2023 +0100

    mm/gup: remove vmas parameter from get_user_pages_remote()

    The only instances of get_user_pages_remote() invocations which used the
    vmas parameter were for a single page which can instead simply look up the
    VMA directly. In particular:-

    - __update_ref_ctr() looked up the VMA but did nothing with it so we simply
      remove it.

    - __access_remote_vm() was already using vma_lookup() when the original
      lookup failed so by doing the lookup directly this also de-duplicates the
      code.

    We are able to perform these VMA operations as we already hold the
    mmap_lock in order to be able to call get_user_pages_remote().

    As part of this work we add get_user_page_vma_remote() which abstracts the
    VMA lookup, error handling and decrementing the page reference count should
    the VMA lookup fail.

    This forms part of a broader set of patches intended to eliminate the vmas
    parameter altogether.

    [akpm@linux-foundation.org: avoid passing NULL to PTR_ERR]
    Link: https://lkml.kernel.org/r/d20128c849ecdbf4dd01cc828fcec32127ed939a.1684350871.git.lstoakes@gmail.com
    Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> (for arm64)
    Acked-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Janosch Frank <frankja@linux.ibm.com> (for s390)
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Christian König <christian.koenig@amd.com>
    Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Jarkko Sakkinen <jarkko@kernel.org>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
    Cc: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

The vma argument has been dropped from get_user_pages_remote(), and get_user_page_vma_remote() should be used instead wherever the vma pointer passed to get_user_pages_remote() was non-NULL. The plan to fix this is to add an autoconf test for get_user_page_vma_remote() and to use that function in the systemtap runtime when it is available.

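To illustrate the shape of that plan, here is a minimal userspace mock, not kernel code and not the actual patch. The macro name STAPCONF_GET_USER_PAGE_VMA_REMOTE, the wrapper lookup_page(), the mock_ helpers and the simplified struct definitions are all assumptions made up for this sketch. The mocks only mirror the argument lists seen in the build log and the commit message.

```c
/* Userspace mock of the compatibility pattern; this is NOT kernel code. */
#include <assert.h>
#include <stddef.h>

/* Stand-in for the result of the autoconf test (macro name is assumed). */
#define STAPCONF_GET_USER_PAGE_VMA_REMOTE 1

/* Minimal, hypothetical stand-ins for the kernel types involved. */
struct page { int id; };
struct vm_area_struct { unsigned long vm_start, vm_end; };
struct mm_struct { struct vm_area_struct vma; struct page page; };
struct task_struct { struct mm_struct *mm; };

/* Mock of the new helper: returns the page and fills *vmap, or NULL. */
static inline struct page *
mock_get_user_page_vma_remote(struct mm_struct *mm, unsigned long addr,
                              int gup_flags, struct vm_area_struct **vmap)
{
    (void)gup_flags;
    if (addr < mm->vma.vm_start || addr >= mm->vma.vm_end)
        return NULL;
    *vmap = &mm->vma;
    return &mm->page;
}

/* Mock of the old eight-argument call seen in the failing build. */
static inline long
mock_get_user_pages_remote_old(struct task_struct *tsk, struct mm_struct *mm,
                               unsigned long start, unsigned long nr_pages,
                               int write, int force, struct page **pages,
                               struct vm_area_struct **vmas)
{
    (void)tsk; (void)nr_pages; (void)write; (void)force;
    if (start < mm->vma.vm_start || start >= mm->vma.vm_end)
        return 0;
    *pages = &mm->page;
    *vmas = &mm->vma;
    return 1; /* number of pages pinned */
}

/* Wrapper: callers see one interface whichever API the kernel offers.
 * Simplification: real code would translate 'write' into gup flags. */
static inline int
lookup_page(struct task_struct *tsk, struct mm_struct *mm, unsigned long addr,
            int write, struct page **page, struct vm_area_struct **vma)
{
    assert(mm != NULL);
#ifdef STAPCONF_GET_USER_PAGE_VMA_REMOTE
    (void)tsk;
    *page = mock_get_user_page_vma_remote(mm, addr, write, vma);
    return *page ? 0 : -1;
#else
    return mock_get_user_pages_remote_old(tsk, mm, addr, 1, write, 1,
                                          page, vma) > 0 ? 0 : -1;
#endif
}
```

Removing the #define at the top selects the old call path, which is roughly what a failed autoconf test would do on a pre-6.5 kernel.
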
Created attachment 14958
Patch to use new get_user_page_vma_remote function in linux 6.5

This patch allows things to compile on the rawhide linux 6.5 kernel, and things still build on RHEL 8 and 9. However, userspace access via __access_process_vm_ still needs to be tested to confirm that it works.

The following test produced reasonable results on Fedora rawhide and RHEL8:

sudo ../install/bin/stap --example para-callgraph.stp 'process("/usr/bin/ls").function("*")' -c /usr/bin/ls

The following also produced reasonable results on Fedora rawhide and RHEL8:

sudo ../install/bin/stap --example glibc-malloc.stp -c 'stap --dump-functions'

The patch is in the master branch as:

commit e891a37e366362a12ca311439918d69ffe641cec (HEAD -> master, origin/master, origin/HEAD)
Author: William Cohen <wcohen@redhat.com>
Date:   Sun Jul 9 16:46:20 2023 -0400

    Adjust runtime _access_process_vm_ to work with linux 6.5

    Linux kernel commit ca5e863233e8f6acd1792fd85d6bc2729a1b2c10 eliminated
    the vma argument for get_user_pages_remote. For linux 6.5 kernel use
    the get_user_page_vma_remote function in its place like the
    __access_remote_vm function in mm/memory.c of the kernel.