Bug 14432

Summary: tempdir isn't always removed
Product: systemtap Reporter: Josh Stone <jistone>
Component: translator Assignee: Unassigned <systemtap>
Status: RESOLVED FIXED    
Severity: normal CC: wcohen
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:

Description Josh Stone 2012-08-04 03:15:21 UTC
There are a number of places in the translator that call exit(), which means that session tempdirs are not removed.  The most obvious case is "stap -V".

Some of these exit() calls may have been an issue for a while, but I think some are also a regression due to PR13516 commit b96901b7, which creates the tempdir much earlier than before.  So for example, now exits during option-parsing need to be more careful and clean up.
Comment 1 Josh Stone 2012-08-04 03:19:16 UTC
Note, there is a call to _exit() in handle_interrupt(), which is sort of an emergency case that doesn't need to clean up.

For all plain exit() calls, we can probably get clever with setting up atexit() or on_exit().  Or perhaps those exit() calls should convert to the new interrupt_exception instead.
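
To illustrate the second option, here is a minimal sketch of replacing an early exit() with an exception that is caught in the top-level function, so the session object's destructor still runs and removes its tempdir. This is not the code from the translator or from any commit mentioned here: exit_exception, demo_session, parse_options, run and demo are hypothetical names, and the "/tmp/demoXXXXXX" template is an assumption for the example. It relies on POSIX mkdtemp() and rmdir().

```cpp
#include <stdlib.h>    // mkdtemp (POSIX)
#include <sys/stat.h>  // stat
#include <unistd.h>    // rmdir
#include <cassert>
#include <string>
#include <vector>

// Hypothetical exception that carries the desired process exit code.
struct exit_exception { int rc; };

// Hypothetical stand-in for a session that owns a temporary directory.
class demo_session {
public:
  demo_session() {
    const std::string tmpl = "/tmp/demoXXXXXX";  // assumed template
    std::vector<char> buf(tmpl.begin(), tmpl.end());
    buf.push_back('\0');
    if (mkdtemp(buf.data()) == nullptr) throw exit_exception{1};
    tmpdir_ = buf.data();
  }
  // The destructor is the only place the directory is removed.
  ~demo_session() { rmdir(tmpdir_.c_str()); }
  const std::string& tmpdir() const { return tmpdir_; }
private:
  std::string tmpdir_;
};

bool dir_exists(const std::string& path) {
  struct stat st;
  return stat(path.c_str(), &st) == 0 && S_ISDIR(st.st_mode);
}

// Option parsing that wants to stop early, e.g. for "-V".
// Calling exit(0) here would skip ~demo_session(); throwing does not.
void parse_options(const std::vector<std::string>& args) {
  for (const auto& a : args)
    if (a == "-V") throw exit_exception{0};
}

// Top-level driver: returns the exit code instead of calling exit().
// The session is destroyed during stack unwinding, before the handler runs.
int run(const std::vector<std::string>& args, std::string* tmpdir_out) {
  try {
    demo_session s;
    if (tmpdir_out) *tmpdir_out = s.tmpdir();
    parse_options(args);
    return 0;
  } catch (const exit_exception& e) {
    return e.rc;
  }
}

// Usage check: the tempdir is gone after an early "-V" exit.
void demo() {
  std::string dir;
  int rc = run({"-V"}, &dir);
  assert(rc == 0);
  assert(!dir_exists(dir));
}
```

A real main() would simply return run(...). The atexit()/on_exit() route would keep the exit() calls and register the cleanup as a handler instead; the sketch above only covers the exception variant.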
Comment 2 Josh Stone 2012-08-04 21:15:58 UTC
commit e2d0f787a648eefe4e5a152058f92c3f3274242e
Comment 3 William Cohen 2012-08-24 19:02:06 UTC
On RHEL5 systems commit e2d0f787a648eefe4e5a152058f92c3f3274242e causes the tests to hang on systemtap.base/cmd_parse.exp.  It reports "PASS: cmd_parse14" but doesn't seem to progress past that message.
Comment 4 William Cohen 2012-08-24 19:36:08 UTC
When comparing the output of "stap -v -v --vp 01020 -h" of the working and hanging versions of stap the hanging one has the following lines at the end of the output:

+Running rm -rf /tmp/stappdKUJd
+Spawn waitpid result (0x0): 0
+Removed temporary directory "/tmp/stappdKUJd"
Comment 5 William Cohen 2012-08-24 19:55:38 UTC
For cmd_parse14, stap doesn't seem to be exiting. I see something like the following pstree output when running the test:

$ pstree -p 15099
make(15099)───sh(15100)───execrc(15136)───expect(15137)─┬─sh(15424)───stap(15427)
                                                        └─{expect}(15158)

Started gdb to see where stap is stuck:

(gdb) where
#0  0x00000034a52c5630 in __write_nocancel () from /lib64/libc.so.6
#1  0x00000034a526a6b3 in _IO_new_file_write (f=0x34a5551860, data=0xfbd14a8, 
    n=29) at fileops.c:1260
#2  0x00000034a526baf3 in _IO_new_file_xsputn (f=0x34a5551860, 
    data=<value optimized out>, n=29) at fileops.c:514
#3  0x00000034a5260dfb in _IO_fwrite (buf=0xfbd14a8, size=1, count=29, 
    fp=0x34a5551860) at iofwrite.c:45
#4  0x00000034aba8e5dd in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6
#5  0x000000000053dcee in stap_waitpid (verbose=<value optimized out>, 
    pid=15428) at ../systemtap/util.cxx:618
#6  0x0000000000540adc in stap_system (verbose=2, description="rm", 
    args=std::vector of length 3, capacity 4 = {...}, null_out=false, 
    null_err=<value optimized out>) at ../systemtap/util.cxx:817
#7  0x000000000041818a in stap_system (this=0x7fffe37b6830)
    at ../systemtap/util.h:73
#8  systemtap_session::remove_tmp_dir (this=0x7fffe37b6830)
    at ../systemtap/session.cxx:1781
#9  0x0000000000418497 in systemtap_session::~systemtap_session (this=0x2, 
    __in_chrg=<value optimized out>) at ../systemtap/session.cxx:360
#10 0x000000000041332b in main (argc=6, argv=0x7fffe37b71f8)
    at ../systemtap/main.cxx:1135
(gdb)
Comment 6 Josh Stone 2012-08-24 21:36:28 UTC
(In reply to comment #4)
> When comparing the output of "stap -v -v --vp 01020 -h" of the working and
> hanging versions of stap the hanging one has the following lines at the end of
> the output:
> 
> +Running rm -rf /tmp/stappdKUJd
> +Spawn waitpid result (0x0): 0
> +Removed temporary directory "/tmp/stappdKUJd"

This much is a good thing - exactly what the commit was intended to solve.

But it doesn't gel with:

(In reply to comment #5)
> #5  0x000000000053dcee in stap_waitpid (verbose=<value optimized out>, 
>     pid=15428) at ../systemtap/util.cxx:618

This line is the clog statement that prints the "Spawn waitpid..." message above, plus

> #8  systemtap_session::remove_tmp_dir (this=0x7fffe37b6830)
>    at ../systemtap/session.cxx:1781

This is a couple of lines before the one that prints "Removed temporary directory".

So I don't see how that backtrace could possibly correspond with the new output.
Comment 7 Josh Stone 2012-09-07 19:31:13 UTC
I believe cmd_parse's issues are more of a testcase bug -- cloned to PR14560.