Summary: | uretprobe_instances mis-inherited | ||
---|---|---|---|
Product: | systemtap | Reporter: | Jim Keniston <jkenisto> |
Component: | uprobes | Assignee: | Srikar Dronamraju <srikar> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | ananth, mark, srikar |
Priority: | P1 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Last reconfirmed: |
Description
Jim Keniston
2008-08-19 19:42:32 UTC
(In reply to comment #0) ... > A more general solution is to adjust fork handling in > uprobe_report_clone() as follows: ... > - Call uprobe_mk_process() to create a uprobe_process and uprobe_task > for the child. Leave uproc->uretprobe_trampoline_addr NULL; this > will be initialized along with the SSOL vma on the fly if/when the child > returns from a uretprobed function. ... I'm not sure what I was thinking when I wrote that, but... if the SSOL vma isn't in place when the child tries to return through the trampoline, I think we'll get a SIGSEGV, not a SIGTRAP. The child's uprobe_report_signal() can be trained to handle that, of course. But if somebody registers new probes on the child before all the inherited uretprobe_instances are popped, the child's SSOL vma could get instantiated, and we could get SIGTRAPs instead of SIGSEGVs in the child. Checked in a fix (commit 6c492b1) along the lines of the "more general solution" described in Comment #1. To make sure the child gets SIGTRAPs instead of SIGSEGVs, we allow the SSOL vma to be copied on fork. Regression test is testsuite/systemtap.base/bz6850.exp. Passes on i386 and x86_64. Needs testing on powerpc and s390 (and ia64). On 2.6.18-92.1.13.el5 x86_64 with current git tip (version 0.7.1/0.134 git branch master, commit 46fc0cc4) bz6850.exp gives a FAIL: bz6850 -p5 because the bz6850 program crashes with a SIGSEGV (seen when setting ulimit -c unlimited and inspecting the dumped core file). Running by hand shows: mark@gnu:~/src/systemtap/testsuite$ /usr/local/systemtap/bin/stap -v /home/mark/src/systemtap/testsuite/systemtap.base/bz6850.stp -c ./bz6850 Pass 1: parsed user script and 45 library script(s) in 250usr/30sys/272real ms. Pass 2: analyzed script: 10 probe(s), 1 function(s), 0 embed(s), 0 global(s) in 10usr/0sys/7real ms. Pass 3: translated to C into "/tmp/stap9UlJRP/stap_31504579e335e4ae863f7e5983a0bccc_3630.c" in 110usr/90sys/198real ms. Pass 4, preamble: (re)building SystemTap's version of uprobes. Uprobes (re)build complete. Pass 4: compiled C into "stap_31504579e335e4ae863f7e5983a0bccc_3630.ko" in 4140usr/620sys/5031real ms. Pass 5: starting run. main called fork_and_exec1 called fork_and_exec2 called fork1 called fork2 called fork2 returns fork1 returns fork_and_exec2 returns fork_and_exec1 returns Pass 5: run completed in 10usr/70sys/96real ms. mark@gnu:~/src/systemtap/testsuite$ gdb -core core.20463 ./bz6850 GNU gdb Red Hat Linux (6.5-37.el5_2.2rh) Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/libthread_db.so.1". Reading symbols from /lib64/libc.so.6...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Core was generated by `./bz6850'. Program terminated with signal 11, Segmentation fault. #0 0x00007fff3b07e000 in ?? () (gdb) where #0 0x00007fff3b07e000 in ?? () #1 0x00002b3f00000001 in ?? () #2 0x0000000300000000 in ?? () #3 0x00007fff3b07b910 in ?? () #4 0x00007fff3b07e000 in ?? () #5 0x0000003e70a063a0 in ?? () from /lib64/libc.so.6 #6 0x000000027081abc0 in ?? () #7 0x00007fff3b07b970 in ?? () #8 0x0000003e70612852 in _dl_runtime_resolve () from /lib64/ld-linux-x86-64.so.2 #9 0x00007fff3b07e000 in ?? () #10 0x0100000000000000 in ?? () #11 0x000000007081abc0 in ?? () #12 0x00000000004008c0 in __libc_csu_fini () #13 0x0000000000000000 in ?? () On 2.6.18-92.1.1.el5 i686 the test passes with the same systemtap setup. (In reply to comment #3) > On 2.6.18-92.1.13.el5 x86_64 with current git tip (version 0.7.1/0.134 git > branch master, commit 46fc0cc4) bz6850.exp gives a FAIL: bz6850 -p5 because the > bz6850 program crashes with a SIGSEGV (seen when setting ulimit -c unlimited and > inspecting the dumped core file). > > Running by hand shows: > > mark@gnu:~/src/systemtap/testsuite$ /usr/local/systemtap/bin/stap -v > /home/mark/src/systemtap/testsuite/systemtap.base/bz6850.stp -c ./bz6850 > Pass 1: parsed user script and 45 library script(s) in 250usr/30sys/272real ms. > Pass 2: analyzed script: 10 probe(s), 1 function(s), 0 embed(s), 0 global(s) in > 10usr/0sys/7real ms. > Pass 3: translated to C into > "/tmp/stap9UlJRP/stap_31504579e335e4ae863f7e5983a0bccc_3630.c" in > 110usr/90sys/198real ms. > Pass 4, preamble: (re)building SystemTap's version of uprobes. > Uprobes (re)build complete. > Pass 4: compiled C into "stap_31504579e335e4ae863f7e5983a0bccc_3630.ko" in > 4140usr/620sys/5031real ms. > Pass 5: starting run. > main called > fork_and_exec1 called > fork_and_exec2 called > fork1 called > fork2 called > fork2 returns > fork1 returns > fork_and_exec2 returns > fork_and_exec1 returns > Pass 5: run completed in 10usr/70sys/96real ms. > > mark@gnu:~/src/systemtap/testsuite$ gdb -core core.20463 ./bz6850 > ... > Core was generated by `./bz6850'. > Program terminated with signal 11, Segmentation fault. > ... That's the sort of failure I saw WITHOUT the fix: the parent survives, the child dies with a SIGSEGV. Could you please check out the following? 1. Rebuild uprobes.ko from scratch and try again: $ su - # rmmod uprobes # cd /usr[/local]/share/systemtap/runtime/uprobes # make clean # rm Module.symvers # make # insmod uprobes.ko stap apparently rebuilt uprobes.ko, as desired, in Pass 4, but let's be sure uprobes.ko really reflects the latest bits. 2. Does /var/log/messages report anything interesting when you run the test? E.g., a "kernel tainted" message? Thanks. (In reply to comment #5) > That's the sort of failure I saw WITHOUT the fix: the parent survives, the child > dies with a SIGSEGV. Could you please check out the following? > 1. Rebuild uprobes.ko from scratch and try again: > $ su - > # rmmod uprobes > # cd /usr[/local]/share/systemtap/runtime/uprobes > # make clean > # rm Module.symvers > # make > # insmod uprobes.ko > stap apparently rebuilt uprobes.ko, as desired, in Pass 4, but let's be sure > uprobes.ko really reflects the latest bits. Doh. I didn't know uprobes.ko was never unloaded. And indeed this test machine (uptime > weeks) still had an old one loaded. After removing and rebuilding uprobes as above: Running /home/mark/src/systemtap/testsuite/systemtap.base/bz6850.exp ... === systemtap Summary === # of expected passes 3 >
> Regression test is testsuite/systemtap.base/bz6850.exp. Passes on i386 and
> x86_64. Needs testing on powerpc and s390 (and ia64).
Forgot to update, I have tested it on powerpc. Fix works fine for 64 bit apps.
For 32 bit apps, please refer 6946.
FYI. In git tree, it seems that testsuite/systemtap.base/bz6850.stp lacks its executable permission... --- Running /usr/share/systemtap/testsuite/systemtap.base/bz6850.exp ... ERROR: tcl error sourcing /usr/share/systemtap/testsuite/systemtap.base/bz6850.exp. ERROR: couldn't execute "/usr/share/systemtap/testsuite/systemtap.base/bz6850.stp": permission denied --- Thank you, (In reply to comment #8) > FYI. > In git tree, it seems that testsuite/systemtap.base/bz6850.stp lacks its > executable permission... Fixed in commit 3449f61. This problem may not be completely nipped in the bud yet. # stap -e 'probe process("bash").function("*").return {}' -w -c bash bash# exit There are stopped jobs. (bash hangs indefinitely; killable with -9) also tracked at https://bugzilla.redhat.com/show_bug.cgi?id=468759 (In reply to comment #10) > This problem may not be completely nipped in the bud yet. > > # stap -e 'probe process("bash").function("*").return {}' -w -c bash > bash# exit > There are stopped jobs. > (bash hangs indefinitely; killable with -9) This is apparently not an instance of 6850, but rather 5274 ("uprobes: Handle longjmps"). Srikar's bz5274.patch fixes this for me. (In reply to comment #10) > This problem may not be completely nipped in the bud yet. > > # stap -e 'probe process("bash").function("*").return {}' -w -c bash > bash# exit > There are stopped jobs. > (bash hangs indefinitely; killable with -9) > Frank, Can you confirm that the problem is not seen when fix for pr:5274 applied It seems to be working fine now, thanks. |