Bug 30617 - Systemtap unable to successfully build kernel modules for linux-6.5
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: runtime
Version: unspecified
Importance: P2 normal
Assignee: Unassigned
Reported: 2023-07-05 14:21 UTC by William Cohen
Modified: 2023-07-10 15:08 UTC
Attachments
Patch to use new get_user_page_vma_remote function in linux 6.5 (1.70 KB, patch)
2023-07-10 14:16 UTC, William Cohen

Description William Cohen 2023-07-05 14:21:35 UTC
Fedora rawhide recently switched to the linux-6.5 kernels.  Systemtap from the git repo works with the earlier linux-6.4, but on linux-6.5 any attempt to run systemtap instrumentation fails when the kernel module is built.  This can be seen with "make installcheck": the smoke test fails with the following messages.


In file included from /home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/runtime.h:271,
                 from /home/wcohen/systemtap_write/install/share/systemtap/runtime/runtime.h:26,
                 from /tmp/stapndYBKY/stap_34ef71e6e2ed0e338834720b8dff538e_1736_src.c:21:
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/access_process_vm.h: In function '__access_process_vm_':
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/access_process_vm.h:57:36: error: passing argument 1 of 'get_user_pages_remote' from incompatible pointer type [-Werror=incompatible-pointer-types]
   57 |       ret = get_user_pages_remote (tsk, mm, addr, 1, write, 1, &page, &vma);
      |                                    ^~~
      |                                    |
      |                                    struct task_struct *
In file included from ./include/linux/kallsyms.h:13,
                 from ./include/linux/ftrace.h:13,
                 from ./include/linux/kprobes.h:28,
                 from /home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/runtime.h:21:
./include/linux/mm.h:2377:46: note: expected 'struct mm_struct *' but argument is of type 'struct task_struct *'
 2377 | long get_user_pages_remote(struct mm_struct *mm,
      |                            ~~~~~~~~~~~~~~~~~~^~
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/access_process_vm.h:57:41: error: passing argument 2 of 'get_user_pages_remote' makes integer from pointer without a cast [-Werror=int-conversion]
   57 |       ret = get_user_pages_remote (tsk, mm, addr, 1, write, 1, &page, &vma);
      |                                         ^~
      |                                         |
      |                                         struct mm_struct *
./include/linux/mm.h:2378:42: note: expected 'long unsigned int' but argument is of type 'struct mm_struct *'
 2378 |                            unsigned long start, unsigned long nr_pages,
      |                            ~~~~~~~~~~~~~~^~~~~
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/access_process_vm.h:57:54: error: passing argument 5 of 'get_user_pages_remote' makes pointer from integer without a cast [-Werror=int-conversion]
   57 |       ret = get_user_pages_remote (tsk, mm, addr, 1, write, 1, &page, &vma);
      |                                                      ^~~~~
      |                                                      |
      |                                                      int
./include/linux/mm.h:2379:66: note: expected 'struct page **' but argument is of type 'int'
 2379 |                            unsigned int gup_flags, struct page **pages,
      |                                                    ~~~~~~~~~~~~~~^~~~~
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/access_process_vm.h:57:61: error: passing argument 6 of 'get_user_pages_remote' makes pointer from integer without a cast [-Werror=int-conversion]
   57 |       ret = get_user_pages_remote (tsk, mm, addr, 1, write, 1, &page, &vma);
      |                                                             ^
      |                                                             |
      |                                                             int
./include/linux/mm.h:2380:33: note: expected 'int *' but argument is of type 'int'
 2380 |                            int *locked);
      |                            ~~~~~^~~~~~
/home/wcohen/systemtap_write/install/share/systemtap/runtime/linux/access_process_vm.h:57:13: error: too many arguments to function 'get_user_pages_remote'
   57 |       ret = get_user_pages_remote (tsk, mm, addr, 1, write, 1, &page, &vma);
      |             ^~~~~~~~~~~~~~~~~~~~~
./include/linux/mm.h:2377:6: note: declared here
 2377 | long get_user_pages_remote(struct mm_struct *mm,
      |      ^~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[4]: *** [scripts/Makefile.build:252: /tmp/stapndYBKY/stap_34ef71e6e2ed0e338834720b8dff538e_1736_src.o] Error 1
make[3]: *** [Makefile:2050: /tmp/stapndYBKY] Error 2
WARNING: kbuild exited with status: 2
Pass 4: compilation failed.  [man error::pass4]
child process exited abnormally
Comment 1 William Cohen 2023-07-06 16:05:43 UTC
The upstream linux kernel git commit that causes the issue is:

commit ca5e863233e8f6acd1792fd85d6bc2729a1b2c10
Author: Lorenzo Stoakes <lstoakes@gmail.com>
Date:   Wed May 17 20:25:39 2023 +0100

    mm/gup: remove vmas parameter from get_user_pages_remote()
    
    The only instances of get_user_pages_remote() invocations which used the
    vmas parameter were for a single page which can instead simply look up the
    VMA directly. In particular:-
    
    - __update_ref_ctr() looked up the VMA but did nothing with it so we simply
      remove it.
    
    - __access_remote_vm() was already using vma_lookup() when the original
      lookup failed so by doing the lookup directly this also de-duplicates the
      code.
    
    We are able to perform these VMA operations as we already hold the
    mmap_lock in order to be able to call get_user_pages_remote().
    
    As part of this work we add get_user_page_vma_remote() which abstracts the
    VMA lookup, error handling and decrementing the page reference count should
    the VMA lookup fail.
    
    This forms part of a broader set of patches intended to eliminate the vmas
    parameter altogether.
    
    [akpm@linux-foundation.org: avoid passing NULL to PTR_ERR]
    Link: https://lkml.kernel.org/r/d20128c849ecdbf4dd01cc828fcec32127ed939a.1684350871.git.lstoakes@gmail.com
    Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> (for arm64)
    Acked-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Janosch Frank <frankja@linux.ibm.com> (for s390)
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Christian König <christian.koenig@amd.com>
    Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Jarkko Sakkinen <jarkko@kernel.org>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
    Cc: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


That commit drops the vmas argument from get_user_pages_remote().  Callers that previously passed a non-NULL vma pointer into get_user_pages_remote() should use the new get_user_page_vma_remote() instead.

The plan to fix this is to add an autoconf test for get_user_page_vma_remote() and to use that function in the systemtap runtime when it is available.
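
As a rough illustration of that plan, here is a user-space mock-up of the compile-time selection.  It is only a sketch, not the attached patch.  The macro name STAPCONF_GET_USER_PAGE_VMA_REMOTE, the wrapper lookup_page(), and all the struct definitions are assumptions made up for this example, and both kernel functions are replaced by simplified stand-ins so that the code compiles outside the kernel.

```c
/* User-space mock-up.  Every type and function body here is a stand-in;
 * only the names get_user_pages_remote and get_user_page_vma_remote come
 * from the kernel.  Nothing below is the real kernel implementation. */
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro that an autoconf test would define when the helper
 * exists.  Defined here to model a linux 6.5 build. */
#define STAPCONF_GET_USER_PAGE_VMA_REMOTE 1

/* Minimal stand-ins for the kernel structures. */
struct vm_area_struct { int unused; };
struct page { char data[16]; };
struct mm_struct {
    struct page pg;            /* the single page this mock "maps" */
    struct vm_area_struct vma; /* the VMA covering that page */
    int mapped;                /* nonzero when the lookup should succeed */
};

#ifdef STAPCONF_GET_USER_PAGE_VMA_REMOTE
/* Mock of the new helper: returns the page and reports its VMA through
 * vmap, or NULL on failure.  The kernel helper may also return an
 * error-encoded pointer, which real code would have to check. */
static struct page *get_user_page_vma_remote(struct mm_struct *mm,
                                             unsigned long addr,
                                             int gup_flags,
                                             struct vm_area_struct **vmap)
{
    (void)addr;
    (void)gup_flags;
    if (!mm->mapped)
        return NULL;
    *vmap = &mm->vma;
    return &mm->pg;
}
#else
/* Mock of an older form that still hands back the VMA via vmas. */
static long get_user_pages_remote(struct mm_struct *mm, unsigned long start,
                                  unsigned long nr_pages,
                                  unsigned int gup_flags,
                                  struct page **pages,
                                  struct vm_area_struct **vmas, int *locked)
{
    (void)start;
    (void)nr_pages;
    (void)gup_flags;
    (void)locked;
    if (!mm->mapped)
        return 0;
    pages[0] = &mm->pg;
    vmas[0] = &mm->vma;
    return 1;
}
#endif

/* Hypothetical wrapper: returns 1 when a page and its VMA were obtained,
 * 0 otherwise.  The API is chosen at compile time. */
int lookup_page(struct mm_struct *mm, unsigned long addr, int write,
                struct page **page, struct vm_area_struct **vma)
{
    assert(mm != NULL && page != NULL && vma != NULL);
#ifdef STAPCONF_GET_USER_PAGE_VMA_REMOTE
    *page = get_user_page_vma_remote(mm, addr, write, vma);
    return *page != NULL;
#else
    return get_user_pages_remote(mm, addr, 1, write, page, vma, NULL) == 1;
#endif
}
```

Calling lookup_page() on a mock mm_struct with mapped set to 1 fills in both the page and the VMA pointers; with mapped set to 0 it returns 0.  Removing the #define exercises the older path instead, which is the kind of switch the autoconf test is meant to make automatically.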
Comment 2 William Cohen 2023-07-10 14:16:49 UTC
Created attachment 14958 [details]
Patch to use new get_user_page_vma_remote function in linux 6.5

This patch allows the runtime to compile against the rawhide linux 6.5 kernel, and it still builds on RHEL 8 and 9.  However, userspace access via __access_process_vm_ still needs to be tested to confirm that it works.
Comment 3 William Cohen 2023-07-10 15:08:38 UTC
The following test produced reasonable results on Fedora rawhide and RHEL8:

sudo ../install/bin/stap --example para-callgraph.stp 'process("/usr/bin/ls").function("*")' -c /usr/bin/ls

The following also produced reasonable results on Fedora rawhide and RHEL8:

 sudo ../install/bin/stap --example glibc-malloc.stp -c 'stap --dump-functions'


The patch is in the master branch as:

commit e891a37e366362a12ca311439918d69ffe641cec (HEAD -> master, origin/master, origin/HEAD)
Author: William Cohen <wcohen@redhat.com>
Date:   Sun Jul 9 16:46:20 2023 -0400

    Adjust runtime _access_process_vm_ to work with linux 6.5
    
    Linux kernel commit ca5e863233e8f6acd1792fd85d6bc2729a1b2c10
    eliminated the vma argument for get_user_pages_remote.  For linux 6.5
    kernel use the get_user_page_vma_remote function in its place like the
    __access_remote_vm function in mm/memory.c of the kernel.