When trying out the Fedora 17 systemtap-1.8 rpms on x86_64 machine the systemtap.base/buildid.exp test would hang and the cpu use would be pegged. This has been observed both on a physical f17 machine and virtual image. kernel-3.4.4-5.fc17.x86_64 systemtap-1.8-1.fc17.x86_64 systemtap-testsuite-1.8-1.fc17.x86_64 To reproduce. As root: cd /usr/share/systemtap/testsuite/ make installcheck RUNTESTFLAGS="--debug buildid.exp" The test will start running but never finish. Will also see the cpu load be pegged on the machine. Note that this problem has not been observed in the upstream git verions of systemtap. It is suspected that that the following patch is fixing the problem http://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=commit;h=50bd2c22a62981ae8b7b379d48574671228f1446
Created attachment 6533 [details] sysrq-t output when machine is hung with systemtap-1.8-4 rpm This is the sysrq-t output of what is running on the system when the machine gets hung with: make installcheck RUNTESTFLAGS="systemtap.base/buildid.exp" It looks like buildid.x pid 1645 is getting stuck running.
Created attachment 6534 [details] sysrq-l output for buildid.exp hang on fedora 17 Backtrace of the problem process.
From that last attachment: [ 283.666010] <EOI> [ 283.666010] [<ffffffff812c6b48>] ? __rb_rotate_right+0x8/0x90 [ 283.666010] [<ffffffff812c6ce0>] ? rb_insert_color+0x110/0x150 [ 283.666010] [<ffffffff8114b140>] __vma_link_rb+0x30/0x40 [ 283.666010] [<ffffffff8114b1c8>] vma_link+0x78/0xe0 [ 283.666010] [<ffffffff8114ce2d>] mmap_region+0x56d/0x630 [ 283.666010] [<ffffffff8114d23d>] do_mmap_pgoff+0x34d/0x3a0 [ 283.666010] [<ffffffff8114d4ed>] do_mmap+0x2d/0x40 [ 283.666010] [<ffffffff815ed6aa>] elf_map+0x119/0x14a [ 283.666010] [<ffffffff811d04a9>] load_elf_binary+0x5e9/0x18f0 [ 283.666010] [<ffffffff81186eb5>] ? copy_strings.isra.24+0x2b5/0x300 [ 283.666010] [<ffffffff811cfec0>] ? notesize.isra.6+0x30/0x30 [ 283.666010] [<ffffffff811871f7>] search_binary_handler+0x117/0x320 [ 283.666010] [<ffffffff81188734>] do_execve_common.isra.28+0x434/0x500 [ 283.666010] [<ffffffff8118881b>] do_execve+0x1b/0x20 [ 283.666010] [<ffffffff8101c3c7>] sys_execve+0x47/0x70 [ 283.666010] [<ffffffff815fcdec>] stub_execve+0x6c/0xc0 This looks very familiar; it resembles the kinds of failures I was wrestling with just before releasing 1.8, when testing linux-next kernels. The apparent hang appeared to be corruption of the rbtree data structure, to give it loops where there ought be none.
The mainline linux kernel (3.5.0-rc7+) seems to suffer from the same problem.
Created attachment 6549 [details] soft lockup info from console On linux-3.5.0+ kernel see the same problem. This is the output from the console showing a soft lockup which looks similar to the previous backtraces.
Created attachment 6550 [details] patching out test that hangs allows buildid.exp to complete This patch comments out a one of the buildid.exp tests. When this one test is commented out the buildid.exp runs and completes.
Created attachment 6553 [details] Pared down buildid2.exp that replicates the problem This example placed in the /usr/share/systemtap/testsuite/systemtap.base consistently triggers the hange on fedora 17 with: make installcheck RUNTESTFLAGS="--debug --verbose systemtap.base/buildid2.exp" Note that it in buildid.exp a similar stap test that passes follows the problem test: spawn stap -m bid -p4 -DSTP_NO_BUILDID_CHECK $bid_stp $bid_exepath catch {close}; catch {wait} spawn staprun [pwd]/bid.ko -c $bid_exepath
Created attachment 6554 [details] excerpt from /var/log/messages from a debug kernel With the debug kernel (3.4.6-2.fc17.x86_64.debug), I'm seeing a gpf that might be a bit earlier in the timeline. Here's the call trace: ==== Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] Call Trace: Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff812c8beb>] ? file_map_prot_check+0x7b/0xf0 Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff81182a3d>] do_mmap_pgoff+0x35d/0x3b0 Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff81182cfd>] do_mmap+0x2d/0x40 Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff816a5099>] elf_map+0x11c/0x150 Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff812183a9>] load_elf_binary+0x5e9/0x1900 Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff810a4825>] ? sched_clock_local+0x25/0xa0 Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff81217dc0>] ? padzero+0x40/0x40 Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff81217dc0>] ? padzero+0x40/0x40 Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff811c7a44>] search_binary_handler+0x194/0x4f0 Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff811c7911>] ? search_binary_handler+0x61/0x4f0 Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff811c885b>] do_execve_common.isra.30+0x4eb/0x590 Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff811c847d>] ? do_execve_common.isra.30+0x10d/0x590 Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff811cbedb>] ? getname_flags+0x3b/0x250 Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff811c891b>] do_execve+0x1b/0x20 Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff81023377>] sys_execve+0x47/0x70 Jul 24 16:06:55 kvm-f17-64-1 kernel: [ 1245.120023] [<ffffffff816b792c>] stub_execve+0x6c/0xc0 ====
Here's some more info on this bug. The buildid2.exp testcase modifies the buildid.x executable. Here's the original executable: === # eu-readelf -n ./buildid.x Note section [ 2] '.note.ABI-tag' of 32 bytes at offset 0x21c: Owner Data size Type GNU 16 VERSION OS: Linux, ABI: 2.6.32 Note section [ 3] '.note.gnu.build-id' of 36 bytes at offset 0x23c: Owner Data size Type GNU 20 GNU_BUILD_ID Build ID: c62eb8e35b2b8729e42291c813235bc94a20ab68 === After the testcase uses objcopy to modify it: === # eu-readelf -n ./buildid.x Note section [ 2] '.note.ABI-tag' of 32 bytes at offset 0x21c: Owner Data size Type GNU 16 VERSION OS: Linux, ABI: 2.6.32 Note section [ 3] '.note.gnu.build-id' of 73 bytes at offset 0x23c: Owner Data size Type GNU 20 GNU_BUILD_ID Build ID: f9e1c2e8b7ddf2ee5b73eb58f03654be99c945c2 GNU 20 GNU_BUILD_ID Build ID: 57eec9e035f81d85c685c88f31d10f31f6768b8c eu-readelf: cannot get content of note section: offset out of range === So, that's a bit scary - the fact that the offset is out of range and the fact that the .note.gnu.build-id section has doubled in size. When stap is run on the original executable, we get: === # stap -p2 ../../src/testsuite/systemtap.base/buildid.stp ./buildid.x # functions _dwarf_tvar_get_x_0:long () # probes process("/es/scratch/dsmith/systemtap/f17-64/testsuite/1/buildid.x").function("foo@../../src/testsuite/systemtap.base/buildid.c:5") /* pc=.absolute+0x547 */ /* <- process("/es/scratch/dsmith/systemtap/f17-64/testsuite/1/buildid.x").function("foo@../../src/testsuite/systemtap.base/buildid.c:5").label("label") = process("/es/scratch/dsmith/systemtap/f17-64/testsuite/1/buildid.x").function("foo@../../src/testsuite/systemtap.base/buildid.c:5") <- process("./buildid.x").function("foo").label("label") */ === So, the probe is at 0x547. When run on the modified executable, we get: === stap -p2 ../../src/testsuite/systemtap.base/buildid.stp ./buildid.x # functions _dwarf_tvar_get_x_0:long () # probes process("/es/scratch/dsmith/systemtap/f17-64/testsuite/buildid.x").function("foo@../../src/testsuite/systemtap.base/buildid.c:5") /* pc=.absolute+0x567 */ /* <- process("/es/scratch/dsmith/systemtap/f17-64/testsuite/buildid.x").function("foo@../../src/testsuite/systemtap.base/buildid.c:5").label("label") = process("/es/scratch/dsmith/systemtap/f17-64/testsuite/buildid.x").function("foo@../../src/testsuite/systemtap.base/buildid.c:5") <- process("./buildid.x").function("foo").label("label") */ === On the new executable, the probe should be at 0x567, 0x20 bytes past the original address. The hang here might be related to how inode-uprobes work. With the old utrace-based uprobes, when we find the target process we check the buildid, then install the probe(s). With the new inode-based uprobes, we actually install the probe(s) at module startup, then when we find the target process we check the buildid (if I'm understanding the code correctly). I'm wondering if the bad address we're installing the probe at is causing this hang (possibly the bad probe location isn't on an instruction boundary for instance).
Created attachment 6558 [details] An attempt to replicate the problem using perf This script is an attempt to replicate the problem using perf rather than systemtap. However, perf doesn't cause the machine to hang. It gets one sample for the first "perf record", but it doesn't get any samples for the second "perf record ..."
Created attachment 6559 [details] Replicator for uprobes issue This tarball contains everything to replicate the problem on fedor17 on x86_64. The addresses might be different on i686, so this might not fail (or work) in the same manner as x86_64. Unpack the tarball tar xvfz pr14348.tar.gz; cd pr14348 Run the pr14348_perf.sh as root. This demonstrates the problem using the raw perf infrastructure. Running the pr14348.sh as root demonstrates the problem using systemtap-1.8-4.
Found to be kernel bug, corrected by https://lkml.org/lkml/2012/7/28/59
The systemtap side of this problem, bug #14409 (Delay inode-uprobes registration until after buildid checks) has been fixed.