I just tried out the new procfs probe feature (very cool, BTW) and found a bug. The bug can be reproduced as follows:

1. Run a procfs probe script that inserts an entry in /proc (e.g., /proc/systemtap/stap_d116efa3785a073ecd3b45ae46950a46_72240/foo).
2. cd to /proc/systemtap/stap_d116efa3785a073ecd3b45ae46950a46_72240 in another xterm.
3. Ctrl-c out of the probe script. You'll see messages like the following in /var/log/messages:

   Sep 17 11:51:20 localhost kernel: remove_proc_entry: systemtap/stap_d116efa3785a073ecd3b45ae46950a46_72240 busy, count=1
   Sep 17 11:51:20 localhost kernel: remove_proc_entry: /proc/systemtap busy, count=1

   ...meaning the deletions have been deferred.
4. Run the procfs probe script again, then cd to /proc. There's no /proc/systemtap entry. I don't see any error messages that indicate /proc/systemtap could not be added.

I had to reboot the system before I could get the script to add /proc/systemtap again.
This is just the sort of thing that the file_operations->owner field was built for.
Created attachment 2010
possible fix

Can you try this patch and see what you think?
(In reply to comment #2)
> Created an attachment (id=2010)
> possible fix
>
> Can you try this patch and see what you think?

The behavior is different with this patch, but still not correct. Here's what I did to test:

1. Run "staprun <module>.ko" in an xterm. The module creates /proc/systemtap, /proc/systemtap/<module>, and /proc/systemtap/<module>/<value>.
2. In a second xterm, cd to /proc/systemtap/<module>.
3. Ctrl-c out of staprun. The deferred deletion messages appear in /var/log/messages as before, but staprun doesn't exit yet.
4. cd ../ in the second xterm.
5. The <module> directory is now deleted and staprun exits. From a third xterm, do "ls /proc". /proc/systemtap does not appear. However, pwd in the second xterm indicates it's still there. It must be in some interim state (deleted, but not completely).
6. Reload the module again with "staprun <module>.ko". path_lookup() finds the existing /proc/systemtap and, thus, doesn't recreate it. The module just links <module> and <module>/<value> off of the existing /proc/systemtap.
7. /proc/systemtap can only be accessed in the second xterm. As soon as I cd out of /proc/systemtap, it's no longer accessible from a shell even though it exists in some form.

One solution might be to simply not delete /proc/systemtap. Let the first module create it, then leave it around for other modules to use even if the first module is removed.
Checked in a fix for this. I fixed several related problems, all of which involved directories that were awaiting deletion but were getting reused.

I applied the attached ownership patch because it has the effect of making the deletion finish before the module gets removed. I had thought this was annoying, but it removes the problems caused by running the same script again while the original script's path elements were all marked as awaiting deletion.

The problem reported in this BZ was caused when /proc/systemtap was marked as awaiting deletion while new scripts kept reusing it. path_lookup() could see it, while "ls" could not. There was no easy fix for this problem, so the new behavior is for /proc/systemtap to never be in a deferred deletion state. If it is in use when a module exits, it will not be deleted. The next time a systemtap module exits, /proc/systemtap will be deleted if it is no longer in use.
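For readers following along, the new /proc/systemtap policy described above can be pictured with a toy model. This is only a sketch in Python, not the actual runtime code: the class ProcDir, its methods, and the "users" and "modules" counters are all hypothetical names invented for illustration, and treating "in use" as "some shell is inside the directory or another module still has entries there" is an assumption.

```python
class ProcDir:
    """Toy stand-in for a shared directory such as /proc/systemtap.

    Hypothetical model only; none of these names exist in systemtap.
    """

    def __init__(self):
        self.exists = False
        self.users = 0    # assumed: e.g. shells whose cwd is inside the directory
        self.modules = 0  # assumed: loaded modules with entries under the directory

    def module_load(self):
        # The first module creates the directory; later modules reuse it.
        if not self.exists:
            self.exists = True
        self.modules += 1

    def module_exit(self):
        # New policy: the directory is never marked "awaiting deletion".
        # It is removed only if nothing is using it right now; otherwise
        # it is simply left in place for a later module exit to clean up.
        self.modules -= 1
        if self.modules == 0 and self.users == 0:
            self.exists = False
        return self.exists


if __name__ == "__main__":
    d = ProcDir()
    d.module_load()
    d.users += 1                       # a shell does "cd /proc/systemtap"
    print("after busy exit:", d.module_exit())   # directory is left alone
    d.users -= 1                       # the shell leaves the directory
    d.module_load()                    # next script reuses the directory
    print("after idle exit:", d.module_exit())   # directory is removed now
```

The point of the model is that the directory is always either fully present or fully gone, so a later script never ends up attaching its entries to a directory that path_lookup() can find but "ls" cannot.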
(In reply to comment #4)
> Checked in a fix for this.

Unfortunately, Fedora 8 x86 kernels don't really like this fix. If you run a systemtap script that uses procfs, then immediately kill it (no procfs reads/writes are necessary), you will get the new warning about removal being deferred. Under Fedora 8, I'm not sure you can use procfs at all without getting the warning. Under RHEL5 x86_64, I don't see the warning.

(I'm not sure how much of a problem getting the warning is, but it is causing a spurious test failure for systemtap.base/procfs.exp.)

The following script demonstrates getting the warning when we shouldn't:

# stap -e 'probe procfs("command").read { $value = "100" }' -m foo
Warning: using '-m' disables cache support.
[interrupt script here]
WARNING: Removal of /proc/systemtap/foo is deferred until it is no longer in use.
Systemtap module removal will block.
I'm changing this one to WORKSFORME, since I can't duplicate it any more on kernels 2.6.18-168.el5 or 2.6.32.3-21.fc13.x86_64.