Bug 24875

Summary: VMA tracker is broken on Fedora 29
Product: systemtap Reporter: agentzh <agentzh>
Component: runtime Assignee: Unassigned <systemtap>
Status: RESOLVED FIXED    
Severity: normal CC: fche
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:

Description agentzh 2019-08-02 23:35:28 UTC
On my fully updated Fedora 29 system, the latest master (commit ebfc300ec) of systemtap fails to do VMA tracking correctly. Below is a minimal test case:

File a.stp:

probe process.function("main") {
  printf("%#x\n", @var("blah"));
  exit();
}

File a.c:


long blah = 0xdeadbeefL;

int main(void) {
    return 0;
}

Compile the C program a.c like this:

    gcc -fpic -pie -Wall -g a.c

Then run a.stp like this:

    stap -c './a.out' a.stp

I got this error:

ERROR: read fault [man error::fault] at 0x0 near operator '@var' at a.stp:2:25
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/stap/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]

With the -DDEBUG_SYMBOLS option enabled:

_stp_do_relocation:74: found kernel _stext load address: 0xffffffffad000000
_stp_usermodule_check:847: build-id validation [26087 /home/agentzh/git/systemtap-plus/a.out] address=0x559926150000 build_id_offset=0x2f4
_stp_umodule_relocate:78: [26087] /home/agentzh/git/systemtap-plus/a.out, 4028
_stp_umodule_relocate:78: [26087] /home/agentzh/git/systemtap-plus/a.out, 4028
ERROR: read fault [man error::fault] at 0x0 near operator '@var' at a.stp:2:25
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/stap/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]

Apparently the VMA tracker resolver returns the address zero.
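
For reference, here is a rough sketch of what a correct resolution would be expected to produce. For a position-independent executable, the runtime address of a global is normally the load base of the mapping plus the symbol's link-time offset. This rests on two assumptions the log does not state explicitly: that the address printed in the build-id validation line is the load base of a.out, and that the 4028 printed by _stp_umodule_relocate is the hexadecimal offset of blah. The helper relocate_user_addr is hypothetical and is not a SystemTap function.

```c
#include <stdint.h>

/* Hypothetical helper (not part of SystemTap): runtime address of a symbol
 * in a PIE, given the mapping's load base and the symbol's link-time offset. */
uint64_t relocate_user_addr(uint64_t load_base, uint64_t offset)
{
    return load_base + offset;
}
```

Under those assumptions, relocate_user_addr(0x559926150000, 0x4028) yields 0x559926154028, whereas the failing run reads from 0x0.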

For comparison, the same example works flawlessly on CentOS 7:

$ stap -c './a.out' a.stp
4 0xdeadbeef
blah: 0xdeadbeef

And the same example and same version of stap also worked fine on Fedora 28.

I dug into this a bit on Fedora 29. It seems that the stap_find_vma_map_info_user function fails to find the matching entry in __stp_tf_vma_map. There *is* a map entry with the matching PID, but the task->user pointer definitely differs. Even after forcibly bypassing the task->user check, the returned nonzero address is still very wrong:

_stp_do_relocation:74: found kernel _stext load address: 0xffffffffad000000
_stp_usermodule_check:856: build-id validation [29814 /home/agentzh/git/systemtap-plus/a.out] address=0x55b4b87d8000 build_id_offset=0x2f4
_stp_umodule_relocate:79: [29814] /home/agentzh/git/systemtap-plus/a.out, 4028
_stp_umodule_relocate:84: checking module (path /home/agentzh/git/systemtap-plus/a.out) and num secs 1
_stp_umodule_relocate:92: checking section .dynamic
stap_find_vma_map_info_user:334: stp tf vma map: 0000000008530af7
stap_find_vma_map_info_user:343: found pid 29814 (user: 00000000894ada10 vs 00000000b7a8c989)
_stp_umodule_relocate:101: find vma map info user returned 0
_stp_umodule_relocate:104: address=7ffca2a02028
_stp_umodule_relocate:79: [29814] /home/agentzh/git/systemtap-plus/a.out, 4028
_stp_umodule_relocate:84: checking module (path /home/agentzh/git/systemtap-plus/a.out) and num secs 1
_stp_umodule_relocate:92: checking section .dynamic
stap_find_vma_map_info_user:334: stp tf vma map: 0000000008530af7
stap_find_vma_map_info_user:343: found pid 29814 (user: 00000000894ada10 vs 00000000b7a8c989)
_stp_umodule_relocate:101: find vma map info user returned 0
_stp_umodule_relocate:104: address=7ffca2a02028
ERROR: read fault [man error::fault] at 0x7ffca2a02028 near operator '@var' at a.stp:2:25
	a.stp:2:25 in probe process("/home/agentzh/git/systemtap-plus/a.out").function("main@/home/agentzh/git/systemtap-plus/b.c:3")
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/stap-plus/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]
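
The lookup failure described above can be illustrated with a small user-space sketch. It is not the SystemTap implementation: struct vma_entry and find_entry are hypothetical names, and the real __stp_tf_vma_map layout differs. It only shows how a table keyed on a PID plus a second identity pointer reports "not found" when the PID matches but the pointer does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified map entry; the real __stp_tf_vma_map differs. */
struct vma_entry {
    int pid;            /* process the mapping belongs to */
    const void *user;   /* second identity key, compared after the PID */
    uint64_t vm_start;  /* start address of the mapping */
};

/* Return the first entry matching both pid and user, or NULL if none does. */
const struct vma_entry *find_entry(const struct vma_entry *tab, size_t n,
                                   int pid, const void *user)
{
    for (size_t i = 0; i < n; i++) {
        if (tab[i].pid == pid && tab[i].user == user)
            return &tab[i];
    }
    return NULL;
}
```

With a single entry registered for PID 29814 under one pointer, a lookup for the same PID under a different pointer returns NULL, which corresponds to the "found pid 29814 (user: ... vs ...)" lines in the log above.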

Some more info for the Fedora 29 system:

$ uname -a
Linux glass 5.1.20-200.fc29.x86_64 #1 SMP Fri Jul 26 15:15:46 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ gcc --version
gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ stap --version
Systemtap translator/driver (version 4.2/0.176, commit release-4.1-58-gebfc300ec2ad)
Copyright (C) 2005-2019 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
tested kernel versions: 2.6.18 ... 5.1-rc2
enabled features: AVAHI BPF PYTHON2 LIBSQLITE3 LIBXML2 NLS NSS READLINE

Comment 1 agentzh 2019-08-02 23:52:56 UTC
Just for the record, using Fedora 28's older kernels (5.0 and 4.20) on Fedora 29 gives the same error, so it seems that this is not a kernel incompatibility but more likely a toolchain issue (ld.so, etc.), as fche suggested.

Comment 2 Frank Ch. Eigler 2019-08-05 00:12:31 UTC
On Fedora 30, absence or presence of -pie in the gcc flags makes or breaks this test.

Comment 3 Frank Ch. Eigler 2019-08-22 00:12:02 UTC
Commit 4ae4592f1106e941023a5768d34c2381cc869631 fixes this.